Tuesday, February 23, 2010

Improving your SQL Queries

Greetings! You probably found this page either through a Search Engine, an aggregator that doesn't delete blogs for inactivity, or possibly from a site with a very long blogroll, so let me introduce myself to you.

My name is Robert Vollman. Those who were active in the on-line Oracle community from mid-2005 until mid-2007 may remember me either from this site or sites like it, from the Oracle Technology Network, DBASupport.com forums, or the Dizwell forum.

For those who joined our community more recently, I'm essentially your typical database specialist, and one who enjoys both learning more about Oracle, and in sharing that knowledge with others. I am not Oracle certified, but I've been working with Oracle (and other databases) ever since Oracle 8 about 10 years ago, and most recently I was Technical Editor on Alice Rischert's Oracle SQL By Example, 4th Edition. If you'd like to know more, I've kept all of my old articles, so I encourage you to peruse my archive.

Recently I've been of service to some of my colleagues who have been writing SQL Queries. Based on their feedback, I have helped them write faster-performing queries, in a shorter period of time, with fewer mistakes, and in such a way that maintenance was simpler. If you have found any use in what I'm about to present, you can thank them for encouraging me to dust off my blog and make another contribution to the community. Let's begin!

1. Document Clear Requirements

Some database analysts I've known have actually refused to even begin until they're given clear, documented requirements about what information is to be retrieved, and in what format.

While some of you might not be in a position to put your foot down quite so firmly, I think it's reasonable for you to insist that a certain amount of time is spent gathering and documenting requirements so that both you and those that come later know exactly what your SQL Query is/was meant to do.

Taking the time to get a solid understanding of your goal will help you carefully design your work before you begin, simultaneously reducing development time and decreasing the likelihood of errors.

2. Don't Re-invent the Wheel

Think carefully about the requirements you've been given. Unless they're really unusual, it's very possible that someone has already both seen and solved your problem.

Whether it's a pivot or crosstab query, a hierarchical query, or a task to find nearby rows, there's likely already an easy solution available. It could involve analytical functions, or something like translate or rownum/rowid.

If you're not sure, let me give you the same advice that I give frustrated strangers that find my blog and send me emails. Break down your big problem into smaller, more manageable pieces. Once you've mastered the smaller components of the requirements, start adding it all together.

3. Reduce Complexity

I remember a time when I would use the most sophisticated approach to any problem, and go home thinking I was extremely clever. Since then I've learned that sometimes you should choose to forfeit an improvement in performance if it requires an increase in complexity.

Not everyone has as sophisticated a knowledge of databases as you do, and it's foolish to create complicated queries if it's possible to write simpler ones -- especially if there's no trade-off in performance.

For just a couple of suggestions, consider using ANSI joins instead of the old-school joins, because separating your join clauses from your where clauses might make it easier for others to read. Another idea is to use views as a means to reduce the complexity of any one particular query.

4. Handle NULLs Correctly

Perhaps the most common mistake I encounter is caused by forgetting about NULLs. If someone brings me a query that is either returning rows that it shouldn't, or failing to find rows that it should, quite often it's because the query compares a field to a set value in some fashion, and the fields are NULL for the rows in question.

Databases handle NULLs are handled atypically, something I've covered several times in the past (like here and here). Provided you have this awareness, there are a lot of ways to deal with NULLs, including explicit checks (IS NULL), or something like nvl or decode.

5. Check your Types

Most of what I know I've learned either through experience, or by taking advantage of the experience of others. One of my favourites of which has always been Tom Kyte. He taught me to always be aware of my types when making comparisons: never to compare strings to dates, nor dates to strings.

Take his advice, and beware implicit conversions of dates and numbers. For more of his advice on types, check tips 9-13 in this collection of his tips.

6. Correct handling of duplicates

The next most common error when writing SQL is failing to consider how your query will handle duplicates. Sometimes you want to keep them, sometimes you only want one.

An awareness of how duplicates should be handled not only helps you write your query correctly, but it can sometimes even lead to performance boosts. For example, if you don't mind the duplicates, use UNION ALL instead of UNION.

7. Avoid Unnecessary Tuning

Performance tuning may be fun and high-profile, but most of the time it's unnecessary. Obviously you don't want to go out of your way to write sloppy SQL, but at the same time you don't need to tune your query unless you have a performance problem.

There's no harm in adopting some handy rules of thumb. For example, select only the columns you need instead of selecting *, since there may be an index that will cover your query and get you your rows faster. But if you find yourself commenting your SQL query to override the optimizer plan, then there's a very good chance that you've gone too far, and need to take a step back and find the real problem.

And don't bend over backwards trying to avoid joins. "Joins are not evil, databases were born to join," said Tom Kyte.

8. Use Bind Variables

Failure to use bind variables is one of the red flags that stand out when someone shows me a query. If you see code like this, whether it's generated dynamically or not:

SELECT something FROM somewhere WHERE something = 1234;

Then you could have a performance problem. Why? Because the exact same query with a different value (eg 4321) will be considered a different query, will not be found in the shared pool, and will require another hard parse to generate. That may not be a problem in my simple example, but it become a problem later.

Unless there's no chance a similar query will ever be written using a different value, the correct approach would be to write something like this instead:

SELECT something FROM somewhere WHERE something = :certain_something;

Now it will be found in the shared pool, require only a soft parse, and yield faster returns.

9. Use Source Control

This one is more a matter of personal preference. Ever had a query go missing? Ever had questions about a query, but didn't know who wrote it? Ever made a change to a query, screwed something up, and want to go back?

Don't keep a lot of back-up files kicking around, and don't clutter everything with needless documentation. Find and use a good, central source control repository and organise all your queries within.

10. Test, Test, Test

The three most important words in the Database world.

Furthermore, you want to test your queries on real data, with real volumes, with multiple users, and on a system that's either in use, or has simulated use closely matching the real world. Otherwise you might miss issues that only crop up in a live, high-demand system.

Well, that wraps up my list of suggestions. I really hope you found this helpful, and I'd like to thank you for reading. As always, my comments section are open, so please feel free to add more suggestions of your own, or to comment further on some of mine.

This page is powered by Blogger. Isn't yours?