Monday, April 03, 2006
Pivot Queries Using Variable Number of Columns
Pivot queries (also known as crosstab queries) are a special kind of query that can turn rows into columns for special reporting needs. For example, if you have a table/view with three columns (employee name, year, and salary), and you need a view that shows employee name and salary over the years (as a single row), then a pivot query is what you want. Essentially, you want to "pivot" the year column from being rows to being a column.
I presented a simple, introductory example last September that used Canadian Football League teams and their points per season as an example. That example worked fine at the time, but what happened at the end of the CFL season? You entered some new rows of data to reflect the results of the 2005 season, but the query is hard-coded to show only 2002-2004 seasons. What should you do now? Re-write your query after every season? Fortunately, there is a better way. That is why today I would like to expand on this example to show you how to write a pivot query in the common case where you have an undefined number of columns (in this case, seasons/years).
First of all, let's add the 2005 season to our data. Also remember why I like to name my columns when I insert (or call stored procedures):
INSERT INTO CFL (season, team, points) VALUES (2005, 'Argonauts', 22);
INSERT INTO CFL (season, team, points) VALUES (2005, 'Alouettes', 20);
INSERT INTO CFL (season, team, points) VALUES (2005, 'Renegades', 14);
INSERT INTO CFL (season, team, points) VALUES (2005, 'Tiger-Cats', 10);
Our problem is that we don't know how many columns we need when we write the query. We only know how many columns we'll need at the time the query is executed. That sounds like a job for dynamic SQL which, along with REF CURSORs, can build an appropriate query at the time we call it. That technique is described in detail by Tom Kyte in his book Expert One-on-One Oracle, in Chapter 12 on Analytic Functions. I will apply this technique to our specific problem in an upcoming article.
Tom describes another way to get around this issue on his famous AskTom website. You can leave the seasons as rows (allowing you to have as few or as many as you need), and pivot the non-season columns instead. His solution leads us to another way of tackling this "I don't know how many columns I'll need" problem. Make season/points a single column.
Solution #1: Leave Seasons as rows and pivot the Team column.
CREATE OR REPLACE TYPE Teams AS OBJECT
(team VARCHAR2(16), points NUMBER(3));
CREATE OR REPLACE TYPE Seasons AS TABLE OF Teams;
SELECT c1.season,
CAST(MULTISET(SELECT c2.team, sum(c2.points) tot_points
FROM CFL c2 WHERE c1.season = c2.season
GROUP BY c2.team) AS Seasons) Standings
FROM CFL c1
GROUP BY season;
SEASON
----------
STANDINGS(TEAM, POINTS)
--------------------------------------------------------------------------------
2002
SEASONS(TEAMS('Alouettes', 27), TEAMS('Argonauts', 16), TEAMS('Renegades', 10), TEAMS('Tiger-Cats', 15))
2003
SEASONS(TEAMS('Alouettes', 26), TEAMS('Argonauts', 18), TEAMS('Renegades', 14), TEAMS('Tiger-Cats', 2))
2004
SEASON
----------
STANDINGS(TEAM, POINTS)
--------------------------------------------------------------------------------
SEASONS(TEAMS('Alouettes', 28), TEAMS('Argonauts', 21), TEAMS('Renegades', 10), TEAMS('Tiger-Cats', 19))
2005
SEASONS(TEAMS('Alouettes', 20), TEAMS('Argonauts', 22), TEAMS('Renegades', 14), TEAMS('Tiger-Cats', 10))
Solution #2: Pivot Seasons, just like we did before, and create an array for seasons and points.
CREATE OR REPLACE TYPE SeasonResult AS OBJECT
(season NUMBER(4), points NUMBER(3));
CREATE OR REPLACE TYPE TeamResult AS TABLE OF SeasonResult;
SELECT c1.team,
CAST(MULTISET(SELECT c2.season, sum(c2.points) tot_points
FROM CFL c2 WHERE c1.team = c2.team
GROUP BY c2.season) AS TeamResult) Results
FROM CFL c1
GROUP BY team;
TEAM
----------------
RESULTS(SEASON, POINTS)
--------------------------------------------------------------------------------
Alouettes
TEAMRESULT(SEASONRESULT(2002, 27), SEASONRESULT(2003, 26), SEASONRESULT(2004, 28), SEASONRESULT(2005, 20))
Argonauts
TEAMRESULT(SEASONRESULT(2002, 16), SEASONRESULT(2003, 18), SEASONRESULT(2004, 21), SEASONRESULT(2005, 22))
Renegades
TEAM
----------------
RESULTS(SEASON, POINTS)
--------------------------------------------------------------------------------
TEAMRESULT(SEASONRESULT(2002, 10), SEASONRESULT(2003, 14), SEASONRESULT(2004, 10), SEASONRESULT(2005, 14))
Tiger-Cats
TEAMRESULT(SEASONRESULT(2002, 15), SEASONRESULT(2003, 2), SEASONRESULT(2004, 19), SEASONRESULT(2005, 10))
There are two consequences of this approach that are less than ideal. The query itself is a little bit complicated, involves creating types and, secondly, if we made a view from this query, its harder to use. (Example: Query that view to get me the season record for each team). Fortunately there is a simple change we can make that gives us a query as simple as our original example, and where we can create a view that can be more easily used for other queries.
Before I tell you what it is, observe one thing about Tom's solution (#1). In the simpler example where we knew how many columns (seasons) we'd need, we pivoted the seasons. In this example, where we didn't know how many seasons we had, we pivoted the other columns (team, points).
We are in the fortunate position where we know how many teams we have. Why not pivot "team" column, and then do it largely like our simpler example?
CREATE OR REPLACE VIEW CFLBySeason AS
SELECT season,
MAX(DECODE(team, 'Argonauts', points, NULL)) Argonauts,
MAX(DECODE(team, 'Alouettes', points, NULL)) Alouettes,
MAX(DECODE(team, 'Renegades', points, NULL)) Renegades,
MAX(DECODE(team, 'Tiger-Cats', points, NULL)) TigerCats
FROM CFL
GROUP BY season;
SELECT * FROM CFLBySeason;
Not only do we have a simpler query, but we can leverage the power of views to keep things simple. By turning this query into a view (called, for example, CFLBySeason), we can address other requirements very easily.
Example:
1. Show me the most points any team has had in any one season. And it would be simple enough to change this example to show total and/or average season performance as well.
SELECT MAX(Argonauts) Argonauts,
MAX(Alouettes) Alouettes,
MAX(Renegades) Renegades,
MAX(TigerCats) TigerCats
FROM CFLBySeason;
2. Show me how lopsided the division was each season
SELECT Season,
(Argonauts + Alouettes) TorMtl,
(Renegades + TigerCats) OttHam
FROM CFLBySeason;
3. I only care about Toronto. I want to see their season-by-season point totals.
SELECT Season, Argonauts FROM CFLBySeason;
I will admit that pivot queries can get complicated, but given how often they are the easiest solution to complex requirements, it is worth the investment to understand them.
I presented a simple, introductory example last September that used Canadian Football League teams and their points per season as an example. That example worked fine at the time, but what happened at the end of the CFL season? You entered some new rows of data to reflect the results of the 2005 season, but the query is hard-coded to show only 2002-2004 seasons. What should you do now? Re-write your query after every season? Fortunately, there is a better way. That is why today I would like to expand on this example to show you how to write a pivot query in the common case where you have an undefined number of columns (in this case, seasons/years).
First of all, let's add the 2005 season to our data. Also remember why I like to name my columns when I insert (or call stored procedures):
INSERT INTO CFL (season, team, points) VALUES (2005, 'Argonauts', 22);
INSERT INTO CFL (season, team, points) VALUES (2005, 'Alouettes', 20);
INSERT INTO CFL (season, team, points) VALUES (2005, 'Renegades', 14);
INSERT INTO CFL (season, team, points) VALUES (2005, 'Tiger-Cats', 10);
Our problem is that we don't know how many columns we need when we write the query. We only know how many columns we'll need at the time the query is executed. That sounds like a job for dynamic SQL which, along with REF CURSORs, can build an appropriate query at the time we call it. That technique is described in detail by Tom Kyte in his book Expert One-on-One Oracle, in Chapter 12 on Analytic Functions. I will apply this technique to our specific problem in an upcoming article.
Tom describes another way to get around this issue on his famous AskTom website. You can leave the seasons as rows (allowing you to have as few or as many as you need), and pivot the non-season columns instead. His solution leads us to another way of tackling this "I don't know how many columns I'll need" problem. Make season/points a single column.
Solution #1: Leave Seasons as rows and pivot the Team column.
CREATE OR REPLACE TYPE Teams AS OBJECT
(team VARCHAR2(16), points NUMBER(3));
CREATE OR REPLACE TYPE Seasons AS TABLE OF Teams;
SELECT c1.season,
CAST(MULTISET(SELECT c2.team, sum(c2.points) tot_points
FROM CFL c2 WHERE c1.season = c2.season
GROUP BY c2.team) AS Seasons) Standings
FROM CFL c1
GROUP BY season;
SEASON
----------
STANDINGS(TEAM, POINTS)
--------------------------------------------------------------------------------
2002
SEASONS(TEAMS('Alouettes', 27), TEAMS('Argonauts', 16), TEAMS('Renegades', 10), TEAMS('Tiger-Cats', 15))
2003
SEASONS(TEAMS('Alouettes', 26), TEAMS('Argonauts', 18), TEAMS('Renegades', 14), TEAMS('Tiger-Cats', 2))
2004
SEASON
----------
STANDINGS(TEAM, POINTS)
--------------------------------------------------------------------------------
SEASONS(TEAMS('Alouettes', 28), TEAMS('Argonauts', 21), TEAMS('Renegades', 10), TEAMS('Tiger-Cats', 19))
2005
SEASONS(TEAMS('Alouettes', 20), TEAMS('Argonauts', 22), TEAMS('Renegades', 14), TEAMS('Tiger-Cats', 10))
Solution #2: Pivot Seasons, just like we did before, and create an array for seasons and points.
CREATE OR REPLACE TYPE SeasonResult AS OBJECT
(season NUMBER(4), points NUMBER(3));
CREATE OR REPLACE TYPE TeamResult AS TABLE OF SeasonResult;
SELECT c1.team,
CAST(MULTISET(SELECT c2.season, sum(c2.points) tot_points
FROM CFL c2 WHERE c1.team = c2.team
GROUP BY c2.season) AS TeamResult) Results
FROM CFL c1
GROUP BY team;
TEAM
----------------
RESULTS(SEASON, POINTS)
--------------------------------------------------------------------------------
Alouettes
TEAMRESULT(SEASONRESULT(2002, 27), SEASONRESULT(2003, 26), SEASONRESULT(2004, 28), SEASONRESULT(2005, 20))
Argonauts
TEAMRESULT(SEASONRESULT(2002, 16), SEASONRESULT(2003, 18), SEASONRESULT(2004, 21), SEASONRESULT(2005, 22))
Renegades
TEAM
----------------
RESULTS(SEASON, POINTS)
--------------------------------------------------------------------------------
TEAMRESULT(SEASONRESULT(2002, 10), SEASONRESULT(2003, 14), SEASONRESULT(2004, 10), SEASONRESULT(2005, 14))
Tiger-Cats
TEAMRESULT(SEASONRESULT(2002, 15), SEASONRESULT(2003, 2), SEASONRESULT(2004, 19), SEASONRESULT(2005, 10))
There are two consequences of this approach that are less than ideal. The query itself is a little bit complicated, involves creating types and, secondly, if we made a view from this query, its harder to use. (Example: Query that view to get me the season record for each team). Fortunately there is a simple change we can make that gives us a query as simple as our original example, and where we can create a view that can be more easily used for other queries.
Before I tell you what it is, observe one thing about Tom's solution (#1). In the simpler example where we knew how many columns (seasons) we'd need, we pivoted the seasons. In this example, where we didn't know how many seasons we had, we pivoted the other columns (team, points).
We are in the fortunate position where we know how many teams we have. Why not pivot "team" column, and then do it largely like our simpler example?
CREATE OR REPLACE VIEW CFLBySeason AS
SELECT season,
MAX(DECODE(team, 'Argonauts', points, NULL)) Argonauts,
MAX(DECODE(team, 'Alouettes', points, NULL)) Alouettes,
MAX(DECODE(team, 'Renegades', points, NULL)) Renegades,
MAX(DECODE(team, 'Tiger-Cats', points, NULL)) TigerCats
FROM CFL
GROUP BY season;
SELECT * FROM CFLBySeason;
SEASON ARGONAUTS ALOUETTES RENEGADES TIGERCATS
---------- ---------- ---------- ---------- ----------
2002 16 27 10 15
2003 18 26 14 2
2004 21 28 10 19
2005 22 20 14 10
Not only do we have a simpler query, but we can leverage the power of views to keep things simple. By turning this query into a view (called, for example, CFLBySeason), we can address other requirements very easily.
Example:
1. Show me the most points any team has had in any one season. And it would be simple enough to change this example to show total and/or average season performance as well.
SELECT MAX(Argonauts) Argonauts,
MAX(Alouettes) Alouettes,
MAX(Renegades) Renegades,
MAX(TigerCats) TigerCats
FROM CFLBySeason;
2. Show me how lopsided the division was each season
SELECT Season,
(Argonauts + Alouettes) TorMtl,
(Renegades + TigerCats) OttHam
FROM CFLBySeason;
3. I only care about Toronto. I want to see their season-by-season point totals.
SELECT Season, Argonauts FROM CFLBySeason;
I will admit that pivot queries can get complicated, but given how often they are the easiest solution to complex requirements, it is worth the investment to understand them.