Tuesday, October 04, 2005

Using DECODE to exploit COUNT/NULL feature

Not long ago, I mentioned that if you do a COUNT on a column (as opposed to on * or a constant), the result will not include rows that have a NULL value for that column.

http://thinkoracle.blogspot.com/2005/09/nulls-in-count.html

Apparently you can exploit this situation, by using DECODE, and improve the efficiency of your queries.

http://www.akadia.com/services/ora_decode.html

I found that very interesting. However, I had to check it for myself. Why? Because this assertion may be:
1. Applicable only to a different version than mine (9.2.0.6).
2. Misunderstood by me.
3. Just plain wrong.

Note: My first attempt at this was a case of #2, therefore this article was updated after my error was kindly pointed out by Gary Myers

Here is the test I ran which confirmed the results.

CREATE TABLE ReallyBigTable AS SELECT * FROM all_objects;

**Note: I did this about 15-20 times to get a really big table.

ALTER SESSION SET SQL_TRACE = TRUE;

SELECT rbt1.the_count, rbt1.the_sum,
rbt2.the_count, rbt2.the_sum FROM
(SELECT COUNT(*) the_count, SUM(object_id) the_sum
FROM ReallyBigTable WHERE object_type='TABLE') rbt1,
(SELECT COUNT(*) the_count, SUM(object_id) the_sum
FROM ReallyBigTable WHERE object_type='INDEX') rbt2;

ALTER SESSION SET SQL_TRACE = FALSE;

TKPROF robert_ora_2236.trc robert_ora_2236.prf explain='sys/******** as sysdba'

ALTER SESSION SET SQL_TRACE = TRUE;

SELECT COUNT(DECODE(object_type,'TABLE','*',NULL)) the_count,
SUM(DECODE(object_type,'TABLE',object_id,NULL)) the_sum,
COUNT(DECODE(object_type,'INDEX','*',NULL)) the_count2,
SUM(DECODE(object_type,'INDEX',object_id,NULL)) the_sum2
FROM ReallyBigTable;

ALTER SESSION SET SQL_TRACE = FALSE;

TKPROF robert_ora_2416.trc robert_ora_2416.prf explain='sys/******** as sysdba'

The DECODE version completed twice as quickly, because it only need to make one pass through the data instead of 2.

Excerpt of the results:

SELECT rbt1.the_count, rbt1.the_sum, 
rbt2.the_count, rbt2.the_sum FROM
(SELECT COUNT(*) the_count, SUM(object_id)
the_sum FROM ReallyBigTable
WHERE object_type='TABLE') rbt1,
(SELECT COUNT(*) the_count, SUM(object_id)
the_sum FROM ReallyBigTable
WHERE object_type='INDEX') rbt2

call count cpu elapsed disk query
------- ------ -------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0
Execute 1 0.00 0.00 0 0
Fetch 2 0.64 2.97 16994 17542
------- ------ -------- ---------- ---------- ----------
total 4 0.64 2.97 16994 17542

SELECT COUNT(DECODE(object_type,'TABLE','*',NULL)) the_count,
SUM(DECODE(object_type,'TABLE',object_id,NULL)) the_sum,
COUNT(DECODE(object_type,'INDEX','*',NULL)) the_count2,
SUM(DECODE(object_type,'INDEX',object_id,NULL)) the_sum2
FROM ReallyBigTable

call count cpu elapsed disk query
------- ------ -------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0
Execute 1 0.00 0.00 0 0
Fetch 2 0.60 1.76 8498 8771
------- ------ -------- ---------- ---------- ----------
total 4 0.60 1.76 8498 8771

Comments:
I think the implication of
http://www.akadia.com/services/ora_decode.html
was that, rather than running two queries with similar predicates, you can just run one, using decodes (or CASE) to differentiate the results into separate columns.

Your test could be changed to measure
SELECT COUNT(*), SUM(object_id) FROM ReallyBigTable
WHERE object_type='TABLE'
and
SELECT COUNT(*), SUM(object_id) FROM ReallyBigTable
WHERE object_type='INDEX'

against

SELECT COUNT(DECODE(object_type,'TABLE','*',NULL)) tab_count,
SUM(DECODE(object_type,'TABLE',object_id,NULL)) tab_sum
COUNT(DECODE(object_type,'INDEX','*',NULL)) ind_count,
SUM(DECODE(object_type,'INDEX',object_id,NULL)) ind_sum
FROM ReallyBigTable
where object_type in ('TABLE','INDEX')

What surprised me about your results is the increase in disk. I'd expect both to do a full scan, and so come out the same there.
 
Then I guess this was a case of #2 (misunderstood!)

Expect the corrected version of that post very shortly and thanks for the help Gary and Doug.
 
Thanks Doug:

1. There was no practical business purpose to this example, I just wanted to SUM something, and object_id was the first numerical column I saw.

2. I don't like to "rely" on things (like the NULL being the default case) unless I have to. This makes it (1) obvious what the default case is, and (2) makes it more resilient to changes in future versions. Just personal perference.

3. I like DECODE. :)
 
Ha, I see you have changed the test cases. I have been meaning to comment on the fact that the akadia article was to show that you can use DECODE to do a single pass of the data instead of multiple passes via multiple SELECTS (as Gary mentioned) in which case, you would expect the DECODE (single pass) to be more efficient.

Good job.
 
This comment has been removed by a blog administrator.
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?