
Python versus plus Ruby

While AJAX thrives on the Web, Service Oriented Architectures (SOA) are making their way into the enterprise. One of the best things SOA has brought is interoperability -- no more proprietary protocols and freakish RPC (though this comes with one disadvantage: the cost of parsing XML).

The key point I'd like to make is that we are entering an era of cooperation rather than competition, at least as far as software development is concerned. How?

blogaqui.com has a crawler that fetches the feeds of all registered blogs and adds them to a local database. This enables several features on blogaqui.com, the most popular being the planets. The crawler was initially built in Ruby. Here's a very brief overview.

RUBY:
  # get URLs to fetch from DB
  # parse Atom or RSS and add a new record in the DB

It served its purpose, but it was far from efficient. A quick profiling run revealed that 80% of the CPU time was spent parsing XML. That parsing was handled by REXML, a Ruby XML library. libxml addresses this issue, being much faster than REXML. Unfortunately, I was using a feed parsing library (do not reinvent the wheel), so there were two options:

  • Change the feed parsing library to use libxml instead of REXML, or build an Adapter that converts REXML calls to libxml;
  • Use a faster feed parsing library.

Although there are other options for parsing feeds, none was as complete as the one I was using (FeedTools, by the way), and most of them also used REXML for the parsing. Then I caught myself thinking: "Why should I insist on writing the crawler in Ruby? It's an offline operation anyway, and it does not interfere with the main Rails application". A light went on, and then there was Python.

Python has a great XML parsing library and an equally great feed parsing library. It's as slick as Ruby and... well, it has everything I could wish for. The script took about 10 minutes to write and another 20 to fine-tune (spent figuring out which exceptions were the most common and handling them).
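
The actual script isn't reproduced here, so what follows is only a rough sketch of the two steps from the overview, not the real crawler. It assumes the standard library alone (xml.etree.ElementTree instead of a dedicated feed parsing library), uses sqlite3 as a stand-in for the real database, and the table names (feeds, entries) and function names (parse_feed, fetch, crawl) are made up for illustration.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace prefix


def parse_feed(xml_data):
    """Return a list of (title, link) tuples from an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_data)
    entries = []
    if root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            title = entry.findtext(ATOM + "title", default="")
            # Simplification: take the first <link>, ignoring its rel attribute.
            link_el = entry.find(ATOM + "link")
            link = link_el.get("href", "") if link_el is not None else ""
            entries.append((title, link))
    else:
        # Anything else is treated as RSS, where entries are <item> elements.
        for item in root.iter("item"):
            entries.append((item.findtext("title", default=""),
                            item.findtext("link", default="")))
    return entries


def fetch(url, timeout=10):
    """Download the raw feed body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(db, fetch=fetch):
    # Step 1: get URLs to fetch from the DB (hypothetical "feeds" table).
    urls = [row[0] for row in db.execute("SELECT url FROM feeds")]
    for url in urls:
        try:
            entries = parse_feed(fetch(url))
        except (OSError, ET.ParseError) as exc:
            # Network errors (URLError is an OSError) and malformed XML:
            # skip the feed rather than abort the whole run.
            print(f"skipping {url}: {exc}")
            continue
        # Step 2: add a new record per entry (hypothetical "entries" table).
        db.executemany(
            "INSERT INTO entries (feed_url, title, link) VALUES (?, ?, ?)",
            [(url, title, link) for title, link in entries],
        )
    db.commit()
```

Calling crawl(sqlite3.connect("crawler.db")) would run one pass, provided both tables exist. The fetch parameter is there only so the download step can be swapped out, for example when trying the parser against canned XML.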

Let the era of cooperation and interoperability commence!

If you're now wondering why I talked about SOA (this had nothing to do with SOA, except perhaps that it also involves parsing XML), it was simply to introduce the concept of cooperation.


3 Responses to “Python versus plus Ruby”

  1. Nuno Mariz
    Published at March 16th, 2007 at 10:07 am

    Welcome to the new world(Python) ;)
    You also have the ElementTree package, take a look in http://effbot.org/zone/element-index.htm

  2. JP Antunes
    Published at March 16th, 2007 at 1:20 pm

    hi Mario,

    i found this article that might be helpful. http://nubyonrails.com/articles/2007/02/19/the-poor-mans-feed-cacher

  3. mlopes
    Published at March 16th, 2007 at 5:07 pm

    Nuno,

    Python is amazing and I never said otherwise :-). The first time I heard about Python was (let me think…) 8 years ago! I didn’t really pay that much attention by that time but I definitely got stuck with that.

    Things change when talking about frameworks. Django still has a lot to eat to reach Rails.. ;-) (kidding! Django is an amazing framework too as I’ve previously told you at FEUP).

    JP Antunes,

    That is really helpful but I don’t think it applies to my specific case. I won’t be building feeds but rather parsing them.