Towards Performance
- Published January 12th, 2008 in Tips & Tricks
Just a small note I've seen repeated with panache. Database redundancy is usually not a good thing, but sometimes you have to forget about what Codd said and just duplicate information, baby.
For instance, if you want to frequently count a listing of, say, purchases, the bad approach is
-
SELECT COUNT(*) FROM purchases WHERE client_id = 1
Try running this on a 200k lines table and see how long it takes. Also, if you actually want to have 200k purchases you'll have to have a sleek and fast website and doing such a thing is not the way to go.
Now, the good approach is a lit bit of redundancy and transactions. Keep a no_purchases on the clients table and update it accordingly on CRUD operations. Just don't forget to create a transaction between the creation of the purchase and updating the field, otherwise you might end your day with incongruent results, lest the system has a crash.
Anyhow, this is way faster than polling the purchases database each time you want the count.
Update: JP Antunes went a little bit further and actually suggested decoupling business logic from data and create a trigger to update the no_purchases. This actually a good thing to do. I didn't suggest it because until very recently, some databases didn't support triggers (like MySQL <5.0)




“Keep a no_purchases on the clients table”
Did you ate a word there? I’m guessing you’re speaking of creating view for each state of client/purchase relation. Am i correct?
JP Antunes, no, I’m referring to having a column named “no_purchases” (number of purchases) on the clients table.
Ok… and this will help you in what way?
I mean,for one, a transaction will create a row lock on the table and it’s an unnecessary hit on the DB. Although I read the “forget about what Codd” part of your post, do you really think it’s a good idea to keep business logic on the clients table? Aren’t you sacrificing too much for a false sense of performance gain?
I would create a view(or more…) that gets updated by a trigger on the purchase table. I’m guessing your approach is using the DB only as data storage medium and that doesn’t fly too well in 200k plus record tables.
It calls database denormalization, and in my opinion is a good practice.
I’m using it in all of my projects, even though Postgresql performs well in complex queries.
A post-insert trigger would avoid the hassle of having a multi-statement transaction (insert + update).
@mariz
Yes, I think we all get what it is. It’s just a poor example and that’s what I’m criticizing. You should, to the best of my knowledge, use views not add columns to your table. Seemingly you shouldn’t mix business with data (purchases and clients) tables.
@JP Antunes
You are right about the mix business with data, even though a trigger can solve the thing.
In this case, views don’t solve the performance issue. And denormalization shouldn’t always be the best solution, but in this case I think it should.
JP Antunes, with the view you still have the performance penalty.
The other viable option is, indeed, a trigger. But when I thought about the problem I tried to give a solution that would work regardless of the database engine. Until very recently, MySQL didn’t have any triggers (< 5.0). Other well known databases still don’t have triggers.
Finally, as for mixing business logic with the database, it’s the price to pay to gain some extra few miles of performance. When you create a stored procedure you’re most certainly mixing up a little bit of business logic as well. But that’s life, right? :-)
mlopes, I think you’re wrong. The cost of a query to a view is lower than to a table in most DB engines, although I’m not sure if this is true in either of MySQL’s engines to be honest. I mean, you can’t just evaluate performance on the number of queries made, right? And then there is the whole clustered index thingie that some engines support…
Actually, I believe all major DB support triggers now. Even SQLite! :o)
In a stored procedure (not sure how we got to SPs here) you could definitely find business logic. In fact I had a hand in a major app (read: deployed in several hundreds of desktops) whose business logic was mostly in MSSQL Server SPs. But in no moment business gets mixed in data _tables_.
JP Antunes, actually, AFAIK, a direct query to a table is faster than a query to a view in most, if not all, the database engines. This is because the view encapsulates the SELECT.
As for triggers, I checked on Wikipedia and all the major known databases support triggers, so I think it’s the way to go.
Honestly, I don’t see a major hassle for having a global counter on the parent table. Wordpress does that with the comments. The actual number of comments is stored on the article page! There are worst things that mix up business logic and data!
@JP Antunes: “The cost of a query to a view is lower than to a table in most DB engines”
Care to explain? I don’t see how that can happen. A view is just a logical representation of a dataset and every query in a view needs to be executed in the source table(s), so how can the query in the view be faster than the query in the table itself?
Unless you are talking about materialized views, but that is a completely different story from regular views.
um motor q
@vitor: well, that statement came from experience and I can’t really justify it without giving an example. Let’s say you want a report of the top ten selling products grouped by category and some sort of evaluation of cost-of-sale (or the like). Would you say that it would be better querying several tables and performing aggregation operation s and (let’s say) a complicated data-analysis function, or querying a previously created view ?
Well, I would go with a view, but given your comment the next time I need something like this I’ll try to benchmark it, maybe your right.