Wednesday, August 17, 2005

Keeping Tables Small

David Aldridge recently put together a list of ways to make table scans faster:

http://oraclesponge.blogspot.com/2005/08/list-ways-to-scan-table-faster.html

There is one really simple, effective but often overlooked way of decreasing the time it takes to scan a table that I want to elaborate on. It's based on this simple premise:

Fewer Rows = Faster Scans.

The time it takes to scan a table can be roughly calculated like so:
Number of Rows TIMES Time to Process One Row

Many people focus on the second part of the question (the time it takes to process a single row), when sometimes it is far more simple to decrease the number of rows.

What that means is, look at the table and determine if some of the data is used infrequently enough that it can be moved aside to an archived version of this table.

You may need to modify your existing applications to account for cases where you truly do need to look at all that data. In that case, you might keep a view of all data.

That is basically my point, so you can stop reading there if you wish. I have put together an example, where I elaborate on this a little further. It is meant only as a demonstration of what I mean, not any kind of compelling argument or proof.

First off, here is an easy way to create a nice, big test table.

CREATE TABLE ReallyBigTable AS SELECT * FROM ALL_OBJECTS;

SELECT COUNT (*) FROM ReallyBigTable;

40763

Now I'm going to create a query that ought to really suffer, performance-wise, the larger the table is:

SELECT SUM(object_id) FROM ReallyBigTable
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:11:42.09

Alright, now let's create an archive and active version of this table. For argument's sake, let's say that anything with an object_id below 40000 is used infrequently enough that it is safe to push aside for our business-specific purposes:

CREATE TABLE ReallyBigTable_Archive AS SELECT * FROM ReallyBigTable
WHERE object_id < 40000;

SELECT COUNT (*) FROM ReallyBigTable_Archive;

38000

CREATE TABLE ReallyBigTable_Active AS SELECT * FROM ReallyBigTable
WHERE object_id >= 40000;

SELECT COUNT (*) FROM ReallyBigTable_Active;

2763

I'm in a single-user environment doing a test, so I really don't have to worry right now about updates taking place. Otherwise we'd have to more clever about the creation of our archive and active tables. But I will say that creating these tables can be lightning-fast!

Ok, now let's try our query on our little table:

SELECT SUM(object_id) FROM ReallyBigTable_Active
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable_Active);

Time Elapsed: 00:00:02.08

So it completed in 0.3% of the time. Awesome!

"OK," you say, "but you're cheating!" So what if I am? Sometimes in the real world we CAN cheat. Sometimes we don't have to go through all the data, every time, every case. Don't get mad at Oracle or your DBA or your application when it's not fast enough, just get creative. You don't always have to apply painful Kolkian techniques or Millsapian analysis to get results.

Moving on, there may still be some applications that want to query on the whole dataset. No problem, that is still possible using UNION. But what kind of price will we pay? Let's see, in this one example:

DROP TABLE ReallyBigTable;

CREATE VIEW ReallyBigTable AS
SELECT * FROM ReallyBigTable_Archive
UNION ALL
SELECT * FROM ReallyBigTable_Active;

Remember, a view is just like a stored query, but we can use it like a table. Which means our applications don't have to change their queries.

http://thinkoracle.blogspot.com/2005/07/use-views.html

One limitation of this kind of view is that we won't be able to insert from our legacy applications anymore:

INSERT INTO ReallyBigTable (object_id) VALUES (NULL);

ORA-01732: data manipulation operation not legal on this view

Let's see how much of a performance penalty we have in this case by using a view instead of the original base table:

SELECT SUM(object_id) FROM ReallyBigTable
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:08:18.02

What??? The use of a view on two smaller sub-tables took less time than the original table? It only took 70% as much time. Probably something to do with having the data in memory, or statistics, or something like that. I'll get back to you in another blog. The main thing I wanted to demonstrate was that the performance would be in the same "order" as the original query. That is, you're not taking a huge performance hit by using a view.

Out of curiousity, let's take a mixed sub-example, where we are using our new, smaller table against that big view.

SELECT SUM (object_id) FROM ReallyBigTable_Active
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:00:43.05

Nice.

One final point, if I may. Notice I used "UNION ALL" instead of "UNION." I did this because UNION rearranges the order of the table, and removes duplicates.

http://thinkoracle.blogspot.com/2005/08/union-all.html

I don't care about duplicates and order, and I didn't want the performance hit, so I used UNION ALL. Of course, I am curious to know how significant the performance hit is in this case. Let's see.

CREATE OR REPLACE VIEW ReallyBigTable AS
SELECT * FROM ReallyBigTable_Archive
UNION
SELECT * FROM ReallyBigTable_Active;

SELECT SUM(object_id) FROM ReallyBigTable
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:13:20.02

So it took about 60% longer than with UNION ALL.

SELECT SUM (object_id) FROM ReallyBigTable_Active
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:00:54.05

And this one about 25% longer. So in this case it was definitely noticeable.

Wrap-up:
1. David Aldridge has a great article on how to make your table scans go faster.
2. In addition to his techniques, I recommend "being creative" (aka "cheating") and trying to split your data.
3. Use Views to maintain applications that SELECT from the big, original table so you don't have to change code and can still query the whole set.
4. Using Views will not have a big performance hit. In fact, in this demonstration it was faster (a topic for another day's blog).
5. Use UNION ALL instead of UNION in this view to avoid the noticeable performance hit.

Comments:
Closer to partitioning than you might think, Doug ... in fact this is how partitioning was implemented in Oracle 7 before partitioned tables existed, through partitioned views (aka "manual partitioning"). Here's where they're documented: http://download-west.oracle.com/docs/cd/A57673_01/DOC/server/doc/A48506/partview.htm#238

You'll see some extra comments in there that show how partition elimination was implemented and how you know whether the optimizer recognizes operations against partitioned views.

The technique is deprecated now in favour of the (much more expensive and efficient) partitioning option, but may still be viable for those on a budget.
 
Yes, I wouldn't like to be fiddling around like that if I regularly could use partitioning. For a one-off event though it probably wouldn't be too bad.
 
This comment has been removed by a blog administrator.
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?