Streaming XML with Rails

December 11th, 2009

Exposing a Ruby on Rails application API is dead simple. The easiest way, which is what the scaffolding generator proposes, is to call ActiveRecord’s to_xml method.

Unfortunately, this approach will load all the records in memory. Furthermore, XML will need to be fully generated before a single bit is sent back to the client. Since most real-world applications will gather impressive amounts of data over the years, this will become a show stopper at some point in the life of the application. Let’s examine how to solve this predicament.

First, get the excellent will_paginate plugin to scale up the rendering of your HTML. That being taken care of, let’s now focus on the XML. I have been Googling this topic and didn’t find any authoritative answers. The only post that I came across is a note from Otto Hilska on the APIdock website. The approach proposed by Otto is sound, but hard to grasp without additional explanation.

The following concepts are involved in the solution. They represent some of the best features of Ruby, the ones that set the language apart from others (although the innovation is now spreading).

  1. An Enumerator to encapsulate batch loading of records
  2. A Block to render the output (this post is a nice summary)
  3. Duck Typing (Wikipedia sums it up well) to send the XML response back

First, let’s take a look at the standard Rails way to process a gazillion records in batches to conserve memory:

MyItem.find_each { |item| item.do_stuff }

This will load records in batches of 1,000 and transparently allow you to execute some code for each item. However, this code will iterate right-now-right-now instead of later-later. Let’s wrap the batch processing logic in an Enumerator and save it for later.

@my_items = Enumerable::Enumerator.new(MyItem, :find_each)

For Ruby >= 1.9, Enumerator is no longer in the Enumerable module. Use this code snippet instead:

@my_items = Enumerator.new(MyItem, :find_each)
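
The block-less Enumerator.new(object, :method) form was later deprecated in favor of Object#to_enum, which does the same job. The sketch below shows the later-later behavior outside Rails. FakeItem and its batch counter are hypothetical stand-ins for an ActiveRecord model; they only imitate the batching of find_each and are not part of Rails.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only find_each is imitated.
class FakeItem
  RECORDS = (1..25).to_a
  @batches_loaded = 0

  class << self
    attr_reader :batches_loaded

    # Yields records one at a time, "loading" them batch by batch.
    def find_each(batch_size: 10)
      RECORDS.each_slice(batch_size) do |batch|
        @batches_loaded += 1
        batch.each { |record| yield record }
      end
    end
  end
end

# Wrap the batch iteration for later; nothing is loaded at this point.
items = FakeItem.to_enum(:find_each)
loaded_before = FakeItem.batches_loaded

# Iteration starts only now, and stops as soon as three records are found.
first_three = items.first(3)
loaded_after = FakeItem.batches_loaded
```

In this sketch, only the first batch is touched, because the enumerator pulls records on demand rather than up front.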

Next, in order to stream content, you must pass a Block to the render method. The block will be responsible for processing records and sending the XML output on the fly to the response object.

respond_to do |format|
  format.xml do
    render :text => lambda { |response, output|
      xml = Builder::XmlMarkup.new(
        :target => StreamingOutputWrapper.new(output) )
      eval(default_template.source, binding, default_template.path)
    }
  end
end

First, the XmlMarkup builder is created and the output object is set as its target. Next, the eval method will render your view, in a similar way as the Builder template would do. The output object is expected to walk like a Duck and respond to the << method called by the builder. Unfortunately, ActionController::Response lost its << method in Rails 2.3, thus we need the StreamingOutputWrapper. The best place to define the wrapper class is in your ApplicationController base class.

class ApplicationController < ActionController::Base
  # …
  class StreamingOutputWrapper
    def initialize(output)
      @output = output
    end
    def <<(*args)
      @output.write(*args)
    end
  end
end
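
To see the duck typing at work without Rails or Builder, here is a minimal sketch. The wrapper has the same shape as the one above; ChunkRecorder is a hypothetical output object invented for this illustration, and the XML is written by hand instead of through Builder::XmlMarkup.

```ruby
# Same idea as the wrapper above: adapt an object that only has #write
# to the #<< interface that a builder expects from its target.
class StreamingOutputWrapper
  def initialize(output)
    @output = output
  end

  def <<(*args)
    @output.write(*args)
    self # assumption for this sketch: returning self allows chained calls
  end
end

# Hypothetical output object that records every chunk as it is written.
class ChunkRecorder
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  def write(text)
    @chunks << text
    text.length
  end
end

recorder = ChunkRecorder.new
target = StreamingOutputWrapper.new(recorder)

# Each fragment reaches the output as soon as it is produced.
target << "<items>"
[1, 2].each { |id| target << "<item id=\"#{id}\"/>" }
target << "</items>"

streamed = recorder.chunks.join
```

The output arrives as four separate chunks rather than one large string, which is the point of streaming.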

The controller will now stream the output while records are being loaded. If you are using Phusion Passenger to deploy Rails, Apache can compress the stream on the fly with gzip (through mod_deflate). This should result in a more efficient usage of multi-core CPUs, better bandwidth management and a better overall performance of your application.

Note: An issue with Rails 2.3.2 will cause your app to crash in development mode when blocks are used with rendering. You must upgrade to Rails >= 2.3.4 to make it work.

The Open Source Paradigm

December 10th, 2009

I haven’t been writing much lately since I am working heads down on an Inventory Tracking system. The system is being built using Ruby on Rails. BTW, Ruby is the greatest language I’ve used in my career so far. I’ve been rubying for more than four years now, and I am still discovering nifty ways to express logic almost every day.

The Open Source movement, combined with distributed version control and repositories such as GitHub, has a profound effect on software development. First and foremost, I am spending most of my time focusing on the specifics of the application being built, rather than Reinventing the Wheel. Many generic problems have already been analyzed and solved. An ingenious integration of an existing solution into the architecture is usually an order of magnitude more effective than a stick-built one.

As a side note, git is a terrific version control tool. Many thanks to all the developers who made it possible. A special “thank you” goes to my dear friend from college, Nicolas Pitre, who is one of the smartest people I have met.

Open Source for America

July 26th, 2009

Open Source for America is a new advocacy group that recently emerged in Washington D.C. Its goal is to promote the use of Open Source Software (OSS) in the government. Prognosoft is now a proud member of the organization, working with other technology companies to make the United States government more accountable and transparent.

With more than 4 million computers (1) used by the federal government, OSS indeed has the potential to provide considerable cost savings to the taxpayer by reducing the licensing fees that the government pays for proprietary software. However, a more significant benefit should be increased transparency of government operations. In addition, we believe that OSS adoption will level the playing field for both small and large businesses, driving more innovation and creating opportunities for small businesses to work with the government.

Notes:

  1. Information from a Memorandum to the heads of departments and agencies dated June 2003.

Ruby Nation 2009 – Second Day

June 14th, 2009

The previous day of the conference provided precious ruby stones of knowledge, wisdom and possible new friendships. I am now looking forward to hearing the rest of the presentations.

Be Careful, Your Java is Showing

I tried to make Ruby more natural, not simple, in a way that mirrors life.
Yukihiro “Matz” Matsumoto

An analogy can be made between cultural immersion and learning a programming language. While learning it, study Ruby’s culture, style and techniques of writing code. Check out some of the most prominent Rubyists on github, immerse yourself in their work and strive to understand their ways. Some of the habits to learn include:

  • Breathe intelligence into your code by passing instructions to methods using blocks. For example, in a.collect {|e| e*e}, the collect method receives instructions on how to generate the elements of the new array. In contrast, an old fashioned loop will reveal the ugliness of code that intertwines data structures and algorithms:

    b = []
    for i in 0...a.length
      b << a[i] * a[i]
    end
  • Follow Ruby formatting conventions (underscore method names, spaces instead of tabs), method punctuations (?=boolean, !=dangerous), language features (unless instead of if !).
  • Use meta programming (belongs_to, etc.) to increase the expressiveness of your code by a whole order of magnitude
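
The habits above can be sketched in a few lines. The has_label macro, the Describable module and the Crate class are hypothetical names made up for this illustration; they only mimic the way a macro such as belongs_to adds methods to a class.

```ruby
# A block tells collect how to build each element of the new array.
a = [1, 2, 3]
squares = a.collect { |e| e * e }

# Hypothetical class-level macro in the spirit of belongs_to:
# it defines a reader and a ?-predicate for the given name.
module Describable
  def has_label(name)
    define_method(name) { instance_variable_get("@#{name}") }
    define_method("#{name}?") { !send(name).nil? }
  end
end

class Crate
  extend Describable
  has_label :owner

  def initialize(owner = nil)
    @owner = owner
  end
end

owned = Crate.new("Ann")
orphan = Crate.new

# unless reads more naturally than if !
status = "unclaimed"
status = "claimed" unless owned.owner.nil?
```

Here owned.owner? answers true and orphan.owner? answers false, although neither method is written out by hand in Crate.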

Also check out:

Distributed Computing with Ruby and TupleSpaces

Some of the options for parallel computing in Ruby include:

  • Green threads in Ruby 1.8
  • Native threads are available in 1.9, but the Global Interpreter Lock makes true parallelism a pipe dream
  • Fibers allow a lightweight threading model with cooperative scheduling, which provides some relief for multi-core programming
  • JRuby has native threads through the JVM
  • Hybrid model with M kernel threads for N user threads (like Erlang)
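
As an illustration of the cooperative scheduling mentioned for Fibers, here is a minimal sketch. The two fibers and the log array are made up for this example; each fiber runs until it voluntarily yields, and nothing runs in parallel.

```ruby
log = []

# Each fiber does a bit of work, then hands control back with Fiber.yield.
ping = Fiber.new do
  log << "ping 1"
  Fiber.yield
  log << "ping 2"
end

pong = Fiber.new do
  log << "pong 1"
  Fiber.yield
  log << "pong 2"
end

# The caller is the scheduler: it decides which fiber resumes next.
2.times do
  ping.resume
  pong.resume
end
```

The log ends up interleaved in the order the fibers were resumed, with no locks involved.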

Presenter's recommendation: threads are hard, use processes. With no shared memory, programming is easier but more expensive. Also, inter-process communication is needed. You can choose any of the following:

  • DRb
  • Sockets
  • Queues
  • Key-Value Databases
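
A minimal sketch of the "use processes" advice follows. It assumes a Unix-like platform where fork is available, and it uses a plain pipe as the communication channel, which is the simplest relative of the socket option listed above. The summing task is arbitrary and only there to give the child something to do.

```ruby
reader, writer = IO.pipe

# The child process shares no memory with the parent;
# the only way to return its result is through the pipe.
pid = fork do
  reader.close
  writer.puts((1..10).sum)
  writer.close
end

# The parent closes its copy of the write end, then reads the answer.
writer.close
total = reader.gets.to_i
reader.close

Process.wait(pid)
exit_ok = $?.success?
```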

Rinda TupleSpace is an interesting implementation of a concurrent repository that enables inter-process communication. It ships with Ruby as a library. However, it has limitations, such as being a single point of failure and not being platform-independent.
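
Here is a minimal sketch of the tuplespace idea, assuming the rinda library is available in your Ruby installation. To keep it short, the worker is a thread in the same process; in real use the space would be shared between processes over DRb. The :square and :result tags are arbitrary names chosen for this example.

```ruby
require 'rinda/tuplespace'

space = Rinda::TupleSpace.new

# The worker waits for a matching tuple (nil is a wildcard),
# removes it from the space and writes back a result tuple.
worker = Thread.new do
  _tag, n = space.take([:square, nil], 5)
  space.write([:result, n, n * n])
end

# The client posts a job, then blocks until a result tuple shows up.
space.write([:square, 7])
_tag, input, output = space.take([:result, nil, nil], 5)
worker.join
```

The second argument to take is a timeout in seconds, so the sketch fails instead of hanging if no matching tuple arrives.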

Luc implemented the Blackboard tuplespace on top of Redis (key-value database). It shows promise, but currently has scalability issues. An Erlang-based implementation is planned for the future.

Building Native Mobile Apps in Ruby

Rhodes targets the five most popular mobile platforms: iPhone, Windows Mobile, RIM, Symbian and Android. It allows building applications in Ruby and it provides interfaces for GPS, PIM and the camera. A sync service is also available to keep some data on the device. The framework borrows heavily from Ruby on Rails, but is more restrictive. However, the restrictions make it easier to be compliant with app stores. In addition, a hosted development service is available. Developers can choose to use the GPL for their apps with no royalty fees, or choose a commercial license. Check it out on Twitter @rhomobile.

From Rails to Rack: Making Rails 3 a Better Ruby Citizen

The Rails project is merging with Merb. Rails 3.0 will make building and extending frameworks easier by being more modular and layered. The improvements will be in:

  • How libraries are used by Rails
  • How Rails works with libraries (middleware)
  • Ease of building libraries on top of Rails

Lightning Talks

  • Jack Dempsey - Beet: project generator that runs a recipe to go beyond what Rails does out of the box.
  • Tim Morton - TDD: Machinist replaces Fixtures, Shoulda is better than Test::Unit
  • Jeff Schoolcraft - RailsBridge: community projects at http://railsbridge.org/.
  • Michael Harrison - Clojure: Lisp-like language on the JVM. Copy operations are more efficient with Software Transactional Memory and data is immutable.

Requires SEO; acts_up

By default, Rails is not search engine-friendly. Simple changes, such as making links human-readable, can make a huge difference. Other techniques, such as faceted navigation and redirection to eliminate duplicate pages will improve both user experience and bot navigation.
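
One common way to get human-readable links is to override to_param, the method Rails calls on a record when it builds a URL. The sketch below runs without Rails; the Article struct and the slug rules are assumptions made for this illustration.

```ruby
# Hypothetical stand-in for a model; in Rails this would be an ActiveRecord class.
Article = Struct.new(:id, :title) do
  # Produce "id-title-words" instead of the bare numeric id.
  def to_param
    slug = title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
    "#{id}-#{slug}"
  end
end

article = Article.new(42, "Streaming XML with Rails!")
param = article.to_param

# String#to_i stops at the first non-digit, so the numeric id
# can still be recovered from the friendly parameter.
recovered_id = param.to_i
```

Keeping the id at the front of the parameter means lookups by id keep working while the URL gains readable keywords.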

Use Google Webmaster Tools and its documentation to your advantage. They contain almost everything you need to know. Avoid the snake oil salesmen for SEO.

Herding Tigers - Software Development and the Art of War

Great presentation that describes how to take agile development to the next level. Remember, Agile is not a methodology, it is a mindset. There are many methodologies to choose from, but without the agile mindset, failure will be waiting around the corner. The key to success is speed of execution. Speed mitigates risk by triggering failures earlier in the project. Deliver quickly, deliver often, deliver a lot.

Ruby Nation 2009 – First Day

June 12th, 2009

Last year, I attended the Ruby Nation conference in DC. It was a great conference that felt like a breeze of fresh ideas. As I sip my coffee waiting for the first speaker to kick off this year’s edition, I am hopeful that it will inspire the 150 developers that signed up for the event, the same way as it did in the past.

Preparing for Ruby 1.9

Ok, I don’t have Ruby 1.9, since Apple still ships 1.8.6. Download source, configure, make and in 5 minutes I have it running on my Mac. This is the beauty of the open source revolution.

Features discussed by David Black that I like:

  • Scratch variables in blocks eliminate fear of clobbering another variable outside the block:
    [1,2,3].each {|n;x| x=n*10; puts x}
  • String encoding can be viewed or changed on the fly
  • String is not enumerable anymore, but it has methods that are:
    "I like Ruby".chars {|c| puts c}
  • Cyclic enumerations:
    ("a".."z").cycle{|c| puts c}
  • Eigenclass has published a comprehensive list of Changes in Ruby 1.9.
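
The features above can be tried out with a short sketch. The variable names are made up for this example, and each_char is used in place of chars with a block, since it behaves the same way across Ruby versions.

```ruby
# Block-local variable: the x after the semicolon is private to the block,
# so the outer x is left alone.
x = "outer"
[1, 2, 3].each { |n; x| x = n * 10 }
outer_x = x

# String is no longer Enumerable, but each_char returns an enumerator.
first_chars = "I like Ruby".each_char.first(3)

# cycle repeats the range forever; first(5) takes a finite slice of it.
looped = ("a".."c").cycle.first(5)

# Encodings can be inspected and changed on the fly.
utf16 = "abc".encode("UTF-16LE")
```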

Design Patterns In Dynamic Languages

Before the conference started, I had a chat with Russ. We quickly came to the conclusion that old school thinking and coherent software development are alive and kicking. Design patterns are one of those things that you should pay attention to.

With that being said, 34,000 books have been written about design patterns (as searched on Amazon). It is a challenge to extract the gist of why we need design patterns, and this is how Russ put it:

Design patterns allow you to extend the space of what a programming language can natively do. For example, they can describe a loop in Assembly or a class factory in Java.

In Ruby, Java patterns can stay the same, disappear or morph:

  • The patterns that stay the same can sometimes be implemented more efficiently in Ruby. This code will give you a head start by transparently calling all the methods of the wrapped object:

    def method_missing(method, *args, &block)
      # Forward any unknown call, including its block, to the wrapped object
      real_object.send(method, *args, &block)
    end
  • Patterns like Singleton may disappear. In Ruby, it is implemented as a module:

    require 'singleton'

    class MyClass
      include Singleton
    end
  • Ruby can specify the acceptable behavior for an object at runtime, so selecting a specific class isn’t the only way to have the right behavior. For example, the belongs_to macro-like method in ActiveRecord adds the methods needed to access the associated record.
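
The first two patterns above can be put together in a runnable sketch. LoggingProxy and Registry are hypothetical names chosen for this illustration, and the respond_to_missing? method is an addition that keeps respond_to? honest for forwarded calls.

```ruby
require 'singleton'

# Proxy: forwards unknown calls to the wrapped object and records their names.
class LoggingProxy
  attr_reader :calls

  def initialize(real_object)
    @real_object = real_object
    @calls = []
  end

  def method_missing(name, *args, &block)
    return super unless @real_object.respond_to?(name)

    @calls << name
    @real_object.send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @real_object.respond_to?(name, include_private) || super
  end
end

# Singleton: the module hides new and provides a single shared instance.
class Registry
  include Singleton
end

proxy = LoggingProxy.new([3, 1, 2])
sorted = proxy.sort
same_instance = Registry.instance.equal?(Registry.instance)
```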

There is no final word in technology. Patterns should evolve and adapt to new realities.

Comics is Hard: Domain Modeling Challenges

What do biological taxonomy, comic books and social networking have in common? They are difficult to model using a relational database. Consider some alternatives when facing this kind of challenge:

  • Column-oriented databases such as Apache Cassandra
  • Document-oriented databases such as CouchDB
  • Graph-based databases such as Neo4j

Reia: The Next Big Thing?

Object Oriented programming and Functional programming may converge into some unified language. It is too soon to tell what that language will look like, but Reia can provide some insights. Reia combines Ruby and Erlang to create a functional, distributed and object oriented language. It is still a work in progress but worth trying.

Lightning Talks

Rails on JRuby: deploying Ruby on Rails in J2EE application containers

Static Types in Ruby: Diamondback Ruby

In-House IT Perspective: Understand the domain knowledge to speak the business language with your customers.

Java Swing in Ruby: Pass behaviors instead of subclassing Actions. Also take a look at Clojure.

specMSMS allows users to upload, analyze, share and compare data (bioproximity.com)

Help to build a more transparent government by joining civic coders: Sunlight Foundation, Twitter: @LuigiMontane and @sunlightlabs

‘Handling’ Legacy Databases with Rails and DataMapper

DataMapper is a great piece of technology to access relational databases. I used it myself to interface QuickBooks with SugarCRM and I appreciated its flexibility to access a legacy database.

The Passionate Programmer

A blog can’t do justice to Chad’s presentation. You need to listen to his talk, or even better, buy his book The Passionate Programmer.