Streaming XML with Rails

Friday, December 11th, 2009

Exposing an API from a Ruby on Rails application is dead simple. The easiest way, the one proposed by the scaffolding generator, is to call ActiveRecord’s to_xml method.

Unfortunately, this approach loads all the records into memory. Furthermore, the XML must be fully generated before a single byte is sent back to the client. Since most real-world applications gather impressive amounts of data over the years, this will become a showstopper at some point in the life of the application. Let’s examine how to solve this predicament.

First, get the excellent will_paginate plugin to scale up the rendering of your HTML. With that taken care of, let’s focus on the XML. I Googled this topic and didn’t find any authoritative answer. The only post I came across is a note from Otto Hilska on the APIdock website. The approach Otto proposes is sound, but hard to grasp without additional explanation.

The solution involves the following concepts. They are some of the best features of Ruby, the ones that made the language stand apart from others (although the innovation is now spreading).

  1. An Enumerator to encapsulate batch loading of records
  2. A Block to render the output (this post is a nice summary)
  3. Duck Typing (Wikipedia sums it up well) to send the XML response back

First, let’s take a look at the standard Rails way to process a gazillion records in batches to conserve memory:

MyItem.find_each { |item| item.do_stuff }

This loads records in batches of 1,000 (the default batch size) and transparently lets you execute some code for each item. However, this code iterates right-now-right-now instead of later-later. Let’s wrap the batch processing logic in an Enumerator and save it for later.

@my_items = Enumerable::Enumerator.new(MyItem, :find_each)

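For Ruby >= 1.9, Enumerator is no longer in the Enumerable module. On recent Ruby versions, where calling Enumerator.new without a block is no longer supported, MyItem.to_enum(:find_each) is the equivalent. For 1.9, use this code snippet instead: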

@my_items = Enumerator.new(MyItem, :find_each)
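
To see the "later-later" behaviour outside Rails, here is a minimal sketch in plain Ruby. FakeItem, its LOADS log and its five-row table are hypothetical stand-ins invented for this illustration; a real model would run a database query per batch. The sketch uses to_enum, which builds the same kind of enumerator as the snippets above.

```ruby
# Hypothetical stand-in for an ActiveRecord model; FakeItem, LOADS and
# ROWS are illustration-only names, not part of Rails.
class FakeItem
  LOADS = []            # records each simulated batch "query"
  ROWS  = (1..5).to_a   # pretend table with five rows

  # Mimics the shape of find_each: fetch rows batch by batch and
  # yield them one at a time to the caller's block.
  def self.find_each(batch_size = 2)
    ROWS.each_slice(batch_size) do |batch|
      LOADS << batch    # a real model would hit the database here
      batch.each { |row| yield row }
    end
  end
end

# Building the enumerator does not load anything yet...
items = FakeItem.to_enum(:find_each)
loads_before = FakeItem::LOADS.size

# ...the batches are only fetched once something iterates over it.
seen = items.map { |row| row * 10 }
loads_after = FakeItem::LOADS.size
```

Because the enumerator only remembers which method to call, it can be assigned to an instance variable in the controller and consumed later by whatever renders the response.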

Next, in order to stream content, you must pass a Block to the render method. The block is responsible for processing the records and sending the XML output on the fly to the response object.

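respond_to do |format|
  format.xml do
    # Rails calls this lambda later, while the response is being sent
    render :text => lambda { |response, output|
      # xml is the local variable the Builder template expects
      xml = Builder::XmlMarkup.new(
        :target => StreamingOutputWrapper.new(output) )
      # Evaluate the default view (the Builder template) in this binding
      eval(default_template.source, binding, default_template.path)
    }
  end
end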

First, the XmlMarkup builder is created with the (wrapped) output object as its target. Next, the eval call renders your view, much as the Builder template handler would. The target is expected to walk like a duck and respond to the << method called by the builder. Unfortunately, ActionController::Response lost its << method in Rails 2.3, hence the StreamingOutputWrapper. The best place to define the wrapper class is in your ApplicationController base class.

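class ApplicationController < ActionController::Base
  # …
  # Adapts the streaming output object to the << interface
  # that the XML builder expects from its target.
  class StreamingOutputWrapper
    def initialize(output)
      @output = output
    end
    # Forward whatever the builder appends straight to the output
    def <<(*args)
      @output.write(*args)
    end
  end
end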
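
The duck typing can also be tried without Rails or Builder. The sketch below is an approximation under stated assumptions: FakeOutput is a hypothetical stand-in for the output object handed to the render block, and emit_items is a hand-rolled emitter (with no XML escaping) used in place of Builder::XmlMarkup. Unlike the wrapper above, this copy of the wrapper returns self from <<, a small optional change that allows chained appends.

```ruby
# Hypothetical stand-in for the output object handed to the render block:
# it only understands #write and remembers every chunk it was given.
class FakeOutput
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  def write(data)
    @chunks << data
  end
end

# Same idea as the wrapper above: translate << into #write.
class StreamingOutputWrapper
  def initialize(output)
    @output = output
  end

  def <<(data)
    @output.write(data)
    self   # returning self allows target << a << b
  end
end

# Any code that only calls << can stream to the output without
# knowing what sits behind the wrapper. No escaping is done here.
def emit_items(target, items)
  target << "<items>"
  items.each { |name| target << "<item>#{name}</item>" }
  target << "</items>"
end

# The lambda is stored first and only runs when something calls it,
# which mirrors how the render block is invoked later.
body = lambda { |output| emit_items(StreamingOutputWrapper.new(output), %w[a b]) }

output = FakeOutput.new
body.call(output)
xml = output.chunks.join
```

Each element reaches the output as its own write, so nothing forces the whole document to exist in memory at once.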

The controller will now stream the output while records are being loaded. If you deploy Rails with Phusion Passenger, Apache can compress the stream on the fly with gzip, provided a compression module such as mod_deflate is enabled. The result is more efficient use of multi-core CPUs, better bandwidth management, and better overall performance of your application.
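
Compression on the fly pairs naturally with streaming because gzip can work on a chunk at a time. The sketch below illustrates the idea with Ruby’s standard zlib library; it is not what Apache runs internally, and the wire buffer and the sample chunks are made up for the illustration.

```ruby
require "zlib"
require "stringio"

# Binary buffer standing in for the network connection (illustrative only).
wire = StringIO.new(String.new)
gzip = Zlib::GzipWriter.new(wire)

chunks = ["<items>", "<item>a</item>", "<item>b</item>", "</items>"]
chunks.each do |chunk|
  gzip.write(chunk)   # compress each chunk as soon as it is produced
  gzip.flush          # push the compressed bytes out without waiting for the end
end
gzip.finish           # write the gzip trailer but leave the buffer open

# Decompressing what went over the "wire" gives back the full document.
restored = Zlib::GzipReader.new(StringIO.new(wire.string)).read
```

Flushing after every chunk costs some compression ratio, so a real server typically buffers a little before sending.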

Note: An issue with Rails 2.3.2 will cause your app to crash in development mode when blocks are used with rendering. You must upgrade to Rails >= 2.3.4 to make it work.