Tuesday, May 30, 2006

Windowing Clauses

Answer: "Analytic Functions"

Question: "I have become quite comfortable, perhaps even adept, with basic SQL queries. I am ready to learn how to do more with my data. What do you recommend I study?"

Given how often I write about analytic functions, my answer may not come as a surprise. To those who have been studying along with me, I would like to add one more tool to our toolbelts. Following up on my recent article about finding nearby rows, I would like to touch on windowing clauses.

Life Without Windowing Clauses

Quick, find me the average of a value for all rows in a table. No problem, right? Use the analytic function AVG.

Note: Sample data can be found at the end of the article.

SELECT AVG(votes) FROM elections;

AVG(VOTES)
----------------------
357315.2307692307692307692307692307692308

Great. Now, find me that average only for a certain range of data, say from elections in 1990s only. Again, no problem.

SELECT AVG(votes) FROM elections WHERE election BETWEEN 1990 and 1999;

AVG(VOTES)
----------------------
461947.5

Ok. Now, for each row, find me the average from a 10 year range before and after.

Um... errr...

How about the simply the average including the preceding and succeeding rows?

Um ... errr ... what was the link to your article on finding nearby rows again?

To the Rescue

Well you can relax. I am about to show you how to answer these questions, and others like it. We'll answer these questions using windowing clauses, which are the integral part of analytic functions that define the set of rows you wish to work with.

First let's review analytic functions. You can go read Chapter 6 of the SQL Reference, but generally, an analytic function comes in the following form:

ANALYTIC FUNCTION (ARGUMENT) OVER (QUERY PARTITION CLAUSE, ORDER BY CLAUSE, WINDOWING CLAUSE);

Using windowing clauses, you can define which row to start with, which row to end with including, if you wish, in reference to a specific row (either by a range, or row number).

Now to let you off the hook. Here is an example of how to use windowing clauses to determine the average of a column for a 10-year range, and an average including the rows immediately before and after.

SELECT election, leader, votes,
CEIL(AVG(votes) OVER (ORDER BY election RANGE BETWEEN 10 preceding AND 10 following)) ten_year,
CEIL(AVG(votes) OVER (ORDER BY election ROWS BETWEEN 1 preceding AND 1 following)) before_after
FROM elections;

ELECTION LEADER       VOTES        TEN_YEAR     BEFORE_AFTER           
-------- ------------ ------------ ------------ -------------
1959 Kirby 98730 93184 75004
1963 Harradance 51278 144122 93184
1967 Lougheed 129544 189250 159252
1971 Lougheed 296934 251124 265414
1975 Lougheed 369764 358565 358265
1979 Lougheed 408097 399552 455449
1982 Lougheed 588485 420075 454455
1986 Getty 366783 434118 440838
1989 Getty 367244 442418 391336
1993 Klein 439981 457035 430380
1997 Klein 483914 467097 517049
2001 Klein 627252 492060 509420
2004 Klein 417092 509420 522172


Armed with the knowledge of the windowing clause, we are finally beginning to unleash the full power of analytic functions. With that in mind, a closer look at the other two types of clauses are in order.

Query Partition Clause

Strictly speaking, the windowing clause is not the only way to define the set of rows you wish to work with. The query partition clause can achieve that same goal by grouping your data on a single, specific value (or set of values).

For example, say we wanted the average number of votes by leader. No problem using GROUP BY, right?

SELECT leader, AVG(votes) avg_votes FROM elections GROUP BY leader;

LEADER                           AVG_VOTES              
-------------------------------- ----------------------
Getty 367013.5
Harradance 51278
Kirby 98730
Klein 492059.75
Lougheed 358564.8


By using the partitioning clause we can achieve the same result, but use it in other ways. For example, to compare the average to every row.

SELECT election, leader, votes,
CEIL(votes - AVG (votes) OVER (PARTITION BY leader)) diff
FROM elections ORDER BY diff desc;

ELECTION LEADER        VOTES     DIFF                   
-------- ------------- --------- ---------
1982 Lougheed 588485 229921
2001 Klein 627252 135193
1979 Lougheed 408097 49533
1975 Lougheed 369764 11200
1989 Getty 367244 231
1963 Harradance 51278 0
1959 Kirby 98730 0
1986 Getty 366783 -230
1997 Klein 483914 -8145
1993 Klein 439981 -52078
1971 Lougheed 296934 -61630
2004 Klein 417092 -74967
1967 Lougheed 129544 -229020


Putting the partitioning clause and the windowing clause together is like mixing chocolate and peanut butter. Alone, they're great, together they're even better. We could solve such complex problems as "For each row, show me the average that includes the immediately preceding and succeeding rows that share the same id."

Further Reading

Other than the SQL Reference, and Chapter 19 "SQL for Analysis" of the Data Warehousing Guide, my primary sources include Chapter 12 of Tom Kyte's "Expert One-on-One Oracle" and Daniel Morgan's on-line reference.

Beyond that, there are a lot of fine articles on specific applications. For beginners, I would review an anonymous contribution to Howard Rogers' wiki introduction to analytic functions, which also includes a great example.

I will close with the final ingredient to analytic functions, the order by clause.

Creating Rolling Averages

By using the order by clause within the context of the analytic function, we can create rolling averages.

SELECT election, leader, votes,
CEIL(AVG (votes) OVER (ORDER BY election)) rolling_avg
FROM elections;

ELECTION LEADER        VOTES      ROLLING_AVG              
-------- ------------- ---------- -------------
1959 Kirby 98730 98730
1963 Harradance 51278 75004
1967 Lougheed 129544 93184
1971 Lougheed 296934 144122
1975 Lougheed 369764 189250
1979 Lougheed 408097 225725
1982 Lougheed 588485 277548
1986 Getty 366783 288702
1989 Getty 367244 297429
1993 Klein 439981 311684
1997 Klein 483914 327342
2001 Klein 627252 352334
2004 Klein 417092 357316


--

Sample data used in this article:

CREATE TABLE elections (election NUMBER(4), party VARCHAR2(32), leader VARCHAR2(32), seats NUMBER(2), votes NUMBER(7), tot_seats NUMBER(2), tot_votes NUMBER(7));

INSERT INTO elections VALUES (2004, 'PC', 'Klein', 61, 417092, 83, 890700);
INSERT INTO elections VALUES (2001, 'PC', 'Klein', 74, 627252, 83, 1013152);
INSERT INTO elections VALUES (1997, 'PC', 'Klein', 63, 483914, 83, 845713);
INSERT INTO elections VALUES (1993, 'PC', 'Klein', 51, 439981, 83, 989025);
INSERT INTO elections VALUES (1989, 'PC', 'Getty', 59, 367244, 83, 829189);
INSERT INTO elections VALUES (1986, 'PC', 'Getty', 61, 366783, 83, 713654);
INSERT INTO elections VALUES (1982, 'PC', 'Lougheed', 75, 588485, 79, 944936);
INSERT INTO elections VALUES (1979, 'PC', 'Lougheed', 74, 408097, 79, 710963);
INSERT INTO elections VALUES (1975, 'PC', 'Lougheed', 69, 369764, 75, 590200);
INSERT INTO elections VALUES (1971, 'PC', 'Lougheed', 49, 296934, 75, 639862);
INSERT INTO elections VALUES (1967, 'PC', 'Lougheed', 6, 129544, 65, 498351);
INSERT INTO elections VALUES (1963, 'PC', 'Harradance', 0, 51278, 63, 403444);
INSERT INTO elections VALUES (1959, 'PC', 'Kirby', 1, 98730, 65, 413516);

Comments:
ahh thanks, that certainly did help me a lot
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?