
Ferret or Sphinx?

When you go looking for a full-text search engine, the first question is whether your database will be enough. Past 2M records or so, some databases, MySQL among them, may not handle full-text queries well and can take longer than desired (way longer if you have multiple indexes and crossover queries). PostgreSQL has some advantage here thanks to its built-in tsearch2 engine. But even tsearch2 might not be enough if we’re dealing with complex documents and lots of records.

So you’ve realized you need a real full-text document search engine. One name comes to mind: Lucene. That’s an obvious choice if you’re using Java. If you’re not, integrating Lucene with your project is trickier.

If we’re talking about Ruby projects, two other options come to mind: Ferret and Sphinx. Let’s spoil the ending and quote some authorities in the Ruby community:

We’ve used ferret on past projects… and now use sphinx. We’re not
likely going back to ferret. ;-)

Robby Russell
Founder and Executive Director

And now from EngineYard,

Ferret is unstable in production. Segfaults, corrupted indexes
galore. We’ve switched around 40 clients from ferret to sphinx and
solved their problems this way. I will never use ferret again after
all the problems I have seen it cause people’s production apps.

Plus sphinx can reindex many many times faster than ferret and uses
less cpu and memory as well.

Cheers-
- Ezra Zygmuntowicz
– Founder & Software Architect

OK, when statements as crystal clear as these come from people like these, there are very few questions left as to which one to choose. You should probably stick with Sphinx.

But let’s make it clear why Sphinx will probably be the best choice: it talks to MySQL directly, so there’s no need for you to shuttle data back and forth between the database and the search engine; it’s darn fast and efficient; and it’s used on high-load websites like ThePirateBay.
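
To make the “talks to MySQL directly” point a bit more concrete, here is a minimal sketch of what a sphinx.conf might look like when the indexer pulls rows straight out of MySQL. The host, credentials, database, table, column names and index path are all placeholders made up for illustration, not values from any real setup.

```
# Hypothetical data source: every value below is a placeholder.
source posts_src
{
    type      = mysql
    sql_host  = localhost
    sql_user  = app_user
    sql_pass  = app_password
    sql_db    = blog_db

    # The first column must be a unique integer document ID;
    # the remaining text columns are what gets full-text indexed.
    sql_query = SELECT id, title, body FROM posts
}

# Hypothetical index built from the source above.
index posts
{
    source = posts_src
    path   = /var/data/sphinx/posts
}
```

With something along these lines in place, Sphinx’s indexer tool runs the query itself and builds the index, so the application never has to push documents to the search engine.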

I’ve actually used Ferret in the past and still use it in some of my projects (like blogaqui.com), and it has served me well. But these websites do not hold millions of records. The most complex queries it runs are against blog posts, and I have around 500k of those stored in blogaqui’s database.

Sphinx it is, then.


3 Responses to “Ferret or Sphinx?”

  1. Rob Young
    Published at February 8th, 2008 at 5:22 pm

    Another popular option which is easy to integrate with is Solr, a REST-based search server which sits on top of Java Lucene. A little more work to integrate, but also very cool, is Xapian. Given the choice I think I would usually go for Xapian, but I don’t know how well it plays with Ruby (I’m from PHP land).

  2. face
    Published at February 18th, 2008 at 10:35 am

    I recently came to the same conclusion between ferret and sphinx.

    However, has anyone used Ruby/Odeum by Zed Shaw (http://www.zedshaw.com/projects/ruby_odeum/index.html), or Estraier’s successor Hyper Estraier (http://hyperestraier.sourceforge.net/)? I like Zed’s work, but all my googling finds lots of comparisons between sphinx, ferret, and solr and not much mention or usage of Ruby/Odeum. Which surprises me, as it’s been around since 2005… (perhaps that is the problem: it looks like the last release was in ‘05 as well… stalled?)

  3. bob
    Published at March 22nd, 2008 at 9:49 pm

    I have used Ruby/Odeum in an experimental, research setting (i.e., a non-production environment). It is very easy to use and fast. However, I am looking at alternatives because there appears to be no evidence of Ruby/Odeum in real production use (and ZS seems to have exited the Ruby world). QDBM, which Ruby/Odeum is built on, doesn’t seem to have much activity either.