website stat

Towards Performance

Just a small note I've seen repeated with panache. Database redundancy is usually not a good thing, but sometimes you have to forget about what Codd said and just duplicate information, baby.

For instance, if you want to frequently count a listing of, say, purchases, the bad approach is

CODE:
  1. SELECT COUNT(*) FROM purchases WHERE client_id = 1

Try running this on a 200k lines table and see how long it takes. Also, if you actually want to have 200k purchases you'll have to have a sleek and fast website and doing such a thing is not the way to go.

Now, the good approach is a lit bit of redundancy and transactions. Keep a no_purchases on the clients table and update it accordingly on CRUD operations. Just don't forget to create a transaction between the creation of the purchase and updating the field, otherwise you might end your day with incongruent results, lest the system has a crash.

Anyhow, this is way faster than polling the purchases database each time you want the count.

Update: JP Antunes went a little bit further and actually suggested decoupling business logic from data and create a trigger to update the no_purchases. This actually a good thing to do. I didn't suggest it because until very recently, some databases didn't support triggers (like MySQL <5.0)


13 Responses to “Towards Performance”

  1. JP Antunes
    Published at January 12th, 2008 at 6:59 pm

    “Keep a no_purchases on the clients table”

    Did you ate a word there? I’m guessing you’re speaking of creating view for each state of client/purchase relation. Am i correct?

  2. mlopes
    Published at January 12th, 2008 at 7:06 pm

    JP Antunes, no, I’m referring to having a column named “no_purchases” (number of purchases) on the clients table.

  3. JP Antunes
    Published at January 12th, 2008 at 7:26 pm

    Ok… and this will help you in what way?

    I mean,for one, a transaction will create a row lock on the table and it’s an unnecessary hit on the DB. Although I read the “forget about what Codd” part of your post, do you really think it’s a good idea to keep business logic on the clients table? Aren’t you sacrificing too much for a false sense of performance gain?

    I would create a view(or more…) that gets updated by a trigger on the purchase table. I’m guessing your approach is using the DB only as data storage medium and that doesn’t fly too well in 200k plus record tables.

  4. Nuno Mariz
    Published at January 12th, 2008 at 8:12 pm

    It calls database denormalization, and in my opinion is a good practice.
    I’m using it in all of my projects, even though Postgresql performs well in complex queries.

  5. Vitor
    Published at January 12th, 2008 at 8:20 pm

    A post-insert trigger would avoid the hassle of having a multi-statement transaction (insert + update).

  6. JP Antunes
    Published at January 12th, 2008 at 8:59 pm

    @mariz
    Yes, I think we all get what it is. It’s just a poor example and that’s what I’m criticizing. You should, to the best of my knowledge, use views not add columns to your table. Seemingly you shouldn’t mix business with data (purchases and clients) tables.

  7. Nuno Mariz
    Published at January 12th, 2008 at 9:36 pm

    @JP Antunes
    You are right about the mix business with data, even though a trigger can solve the thing.
    In this case, views don’t solve the performance issue. And denormalization shouldn’t always be the best solution, but in this case I think it should.

  8. mlopes
    Published at January 13th, 2008 at 1:24 am

    JP Antunes, with the view you still have the performance penalty.

    The other viable option is, indeed, a trigger. But when I thought about the problem I tried to give a solution that would work regardless of the database engine. Until very recently, MySQL didn’t have any triggers (< 5.0). Other well known databases still don’t have triggers.

    Finally, as for mixing business logic with the database, it’s the price to pay to gain some extra few miles of performance. When you create a stored procedure you’re most certainly mixing up a little bit of business logic as well. But that’s life, right? :-)

  9. JP Antunes
    Published at January 13th, 2008 at 2:09 am

    mlopes, I think you’re wrong. The cost of a query to a view is lower than to a table in most DB engines, although I’m not sure if this is true in either of MySQL’s engines to be honest. I mean, you can’t just evaluate performance on the number of queries made, right? And then there is the whole clustered index thingie that some engines support…

    Actually, I believe all major DB support triggers now. Even SQLite! :o)

    In a stored procedure (not sure how we got to SPs here) you could definitely find business logic. In fact I had a hand in a major app (read: deployed in several hundreds of desktops) whose business logic was mostly in MSSQL Server SPs. But in no moment business gets mixed in data _tables_.

  10. mlopes
    Published at January 13th, 2008 at 2:34 am

    JP Antunes, actually, AFAIK, a direct query to a table is faster than a query to a view in most, if not all, the database engines. This is because the view encapsulates the SELECT.

    As for triggers, I checked on Wikipedia and all the major known databases support triggers, so I think it’s the way to go.

    Honestly, I don’t see a major hassle for having a global counter on the parent table. Wordpress does that with the comments. The actual number of comments is stored on the article page! There are worst things that mix up business logic and data!

  11. Vitor
    Published at January 13th, 2008 at 6:20 am

    @JP Antunes: “The cost of a query to a view is lower than to a table in most DB engines”

    Care to explain? I don’t see how that can happen. A view is just a logical representation of a dataset and every query in a view needs to be executed in the source table(s), so how can the query in the view be faster than the query in the table itself?

    Unless you are talking about materialized views, but that is a completely different story from regular views.

  12. ricardo
    Published at January 13th, 2008 at 1:33 pm

    um motor q

  13. JP Antunes
    Published at January 24th, 2008 at 3:56 pm

    @vitor: well, that statement came from experience and I can’t really justify it without giving an example. Let’s say you want a report of the top ten selling products grouped by category and some sort of evaluation of cost-of-sale (or the like). Would you say that it would be better querying several tables and performing aggregation operation s and (let’s say) a complicated data-analysis function, or querying a previously created view ?
    Well, I would go with a view, but given your comment the next time I need something like this I’ll try to benchmark it, maybe your right.