Monday, August 29, 2005

New Blogs

Not long ago, I wrote a blog on all the Oracle Blogs out there:

http://thinkoracle.blogspot.com/2005/07/oracle-blogs.html

Not long after that, I included a list of links to all the blogs I follow (see the side, below links).

Well there have been a couple of changes and new additions. The most exciting of which is I found a blog by Laurent Schneider. He is one of my favourite posters from the Oracle Forum:

http://forums.oracle.com/forums/forum.jspa?forumID=75

His posts usually involve how to use Oracle to solve someone's problems. Over the months I've been reading this forum, I've seen him cover a wide spectrum.

Here is a link to one of of his most recent articles:

http://laurentschneider.blogspot.com/2005/08/pivot-table.html

There are a few other changes:

Doug Burns changed his blog's location:
http://oracledoug.blogspot.com/

Radoslav Rusinov has a new blog, which he has kicked off with a discussion of the latest Burleson Boondoggle on PGA:
http://dba-blog.blogspot.com/

Hopefully these blogs will make up for my lack of content recently!

Wednesday, August 24, 2005

COMPUTE

Let's say you wanted to write a query that contained an aggregate (eg: sum, min, max, count). No problem, you can use 'GROUP BY'.

But suppose you wanted to write a query that was a report containing both the aggregate and the detail. That makes it trickier.

You may even know how to do it in other databases:

http://sybasease.blogspot.com/2005/08/compute.html

There is a simple way of accomplishing this in SQL*Plus. SQL*Plus includes a function of the same name (COMPUTE) that will do the formatting for you.

In order to use it, you also have to use 'BREAK'. Let's take a very brief look at it. BREAK, like COMPUTE, is an SQL*Plus command that you would issue at any time prior to your query. For example, the following BREAK command will remove all values in DEPTNO if they are the same as the preceding value, and skip 1 line after each grouping.

scott@Robert> BREAK ON deptno SKIP 1 NODUPLICATES
scott@Robert> SELECT ename, sal, deptno
2 FROM emp
3 ORDER BY deptno;

ENAME SAL DEPTNO
---------- ---------- ----------
CLARK 2450 10
VOLLMAN 5000
MILLER 1300

SMITH 800 20
ADAMS 1100
FORD 3000
SCOTT 3000
JONES 2975

ALLEN 1600 30
BLAKE 2850
MARTIN 1250
JAMES 950
TURNER 1500
WARD 1250


Now let's add our COMPUTE command. This one will calculate the subtotal of salary, just like it would using GROUP BY. The advantage here is that its in the same table.

scott@Robert> COMPUTE SUM LABEL subtotal OF sal ON deptno
scott@Robert> SELECT ename, sal, deptno FROM emp ORDER BY deptno;

ENAME             SAL     DEPTNO
---------- ---------- ----------
CLARK 2450 10
VOLLMAN 5000
MILLER 1300
---------- **********
8750 subtotal

SMITH 800 20
ADAMS 1100
FORD 3000
SCOTT 3000
JONES 2975
---------- **********
10875 subtotal

ALLEN 1600 30
BLAKE 2850
MARTIN 1250
JAMES 950
TURNER 1500
WARD 1250
---------- **********
9400 subtotal


Naturally, this also works with other aggregate functions like SUM, MIN, MAX, AVG, STD, VARIANCE, COUNT, NUMBER.

For reference, check out the SQL*Plus User's Guide:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a90842.pdf

You want chapters 7 and 13:
7 Formatting SQL*Plus Reports
13 SQL*Plus Command Reference

UPDATE: Actually there's a simple way without SQL*Plus, by using GROUPING, see Tom Kyte's Comment. Here is his example of my Oracle9 environment.

scott@Robert> SELECT DECODE(GROUPING_ID(ename), 1, 'Subtotal' ) tag
2 ename, deptno, SUM(sal)
3 FROM emp
4 GROUP BY GROUPING SETS( (ename,deptno), (deptno) );


TAG      ENAME          DEPTNO   SUM(SAL)
-------- ---------- ---------- ----------
CLARK 10 2450
MILLER 10 1300
VOLLMAN 10 5000
Subtotal 10 8750
FORD 20 3000
ADAMS 20 1100
JONES 20 2975
SCOTT 20 3000
SMITH 20 800
Subtotal 20 10875
WARD 30 1250
ALLEN 30 1600
BLAKE 30 2850
JAMES 30 950
MARTIN 30 1250
TURNER 30 1500
Subtotal 30 9400


Read about GROUPING, GROUPING_ID and GROUPING SETS in SQL Reference:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540.pdf

Wednesday, August 17, 2005

Keeping Tables Small

David Aldridge recently put together a list of ways to make table scans faster:

http://oraclesponge.blogspot.com/2005/08/list-ways-to-scan-table-faster.html

There is one really simple, effective but often overlooked way of decreasing the time it takes to scan a table that I want to elaborate on. It's based on this simple premise:

Fewer Rows = Faster Scans.

The time it takes to scan a table can be roughly calculated like so:
Number of Rows TIMES Time to Process One Row

Many people focus on the second part of the question (the time it takes to process a single row), when sometimes it is far more simple to decrease the number of rows.

What that means is, look at the table and determine if some of the data is used infrequently enough that it can be moved aside to an archived version of this table.

You may need to modify your existing applications to account for cases where you truly do need to look at all that data. In that case, you might keep a view of all data.

That is basically my point, so you can stop reading there if you wish. I have put together an example, where I elaborate on this a little further. It is meant only as a demonstration of what I mean, not any kind of compelling argument or proof.

First off, here is an easy way to create a nice, big test table.

CREATE TABLE ReallyBigTable AS SELECT * FROM ALL_OBJECTS;

SELECT COUNT (*) FROM ReallyBigTable;

40763

Now I'm going to create a query that ought to really suffer, performance-wise, the larger the table is:

SELECT SUM(object_id) FROM ReallyBigTable
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:11:42.09

Alright, now let's create an archive and active version of this table. For argument's sake, let's say that anything with an object_id below 40000 is used infrequently enough that it is safe to push aside for our business-specific purposes:

CREATE TABLE ReallyBigTable_Archive AS SELECT * FROM ReallyBigTable
WHERE object_id < 40000;

SELECT COUNT (*) FROM ReallyBigTable_Archive;

38000

CREATE TABLE ReallyBigTable_Active AS SELECT * FROM ReallyBigTable
WHERE object_id >= 40000;

SELECT COUNT (*) FROM ReallyBigTable_Active;

2763

I'm in a single-user environment doing a test, so I really don't have to worry right now about updates taking place. Otherwise we'd have to more clever about the creation of our archive and active tables. But I will say that creating these tables can be lightning-fast!

Ok, now let's try our query on our little table:

SELECT SUM(object_id) FROM ReallyBigTable_Active
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable_Active);

Time Elapsed: 00:00:02.08

So it completed in 0.3% of the time. Awesome!

"OK," you say, "but you're cheating!" So what if I am? Sometimes in the real world we CAN cheat. Sometimes we don't have to go through all the data, every time, every case. Don't get mad at Oracle or your DBA or your application when it's not fast enough, just get creative. You don't always have to apply painful Kolkian techniques or Millsapian analysis to get results.

Moving on, there may still be some applications that want to query on the whole dataset. No problem, that is still possible using UNION. But what kind of price will we pay? Let's see, in this one example:

DROP TABLE ReallyBigTable;

CREATE VIEW ReallyBigTable AS
SELECT * FROM ReallyBigTable_Archive
UNION ALL
SELECT * FROM ReallyBigTable_Active;

Remember, a view is just like a stored query, but we can use it like a table. Which means our applications don't have to change their queries.

http://thinkoracle.blogspot.com/2005/07/use-views.html

One limitation of this kind of view is that we won't be able to insert from our legacy applications anymore:

INSERT INTO ReallyBigTable (object_id) VALUES (NULL);

ORA-01732: data manipulation operation not legal on this view

Let's see how much of a performance penalty we have in this case by using a view instead of the original base table:

SELECT SUM(object_id) FROM ReallyBigTable
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:08:18.02

What??? The use of a view on two smaller sub-tables took less time than the original table? It only took 70% as much time. Probably something to do with having the data in memory, or statistics, or something like that. I'll get back to you in another blog. The main thing I wanted to demonstrate was that the performance would be in the same "order" as the original query. That is, you're not taking a huge performance hit by using a view.

Out of curiousity, let's take a mixed sub-example, where we are using our new, smaller table against that big view.

SELECT SUM (object_id) FROM ReallyBigTable_Active
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:00:43.05

Nice.

One final point, if I may. Notice I used "UNION ALL" instead of "UNION." I did this because UNION rearranges the order of the table, and removes duplicates.

http://thinkoracle.blogspot.com/2005/08/union-all.html

I don't care about duplicates and order, and I didn't want the performance hit, so I used UNION ALL. Of course, I am curious to know how significant the performance hit is in this case. Let's see.

CREATE OR REPLACE VIEW ReallyBigTable AS
SELECT * FROM ReallyBigTable_Archive
UNION
SELECT * FROM ReallyBigTable_Active;

SELECT SUM(object_id) FROM ReallyBigTable
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:13:20.02

So it took about 60% longer than with UNION ALL.

SELECT SUM (object_id) FROM ReallyBigTable_Active
WHERE object_id * 2 NOT IN
(SELECT object_id FROM ReallyBigTable);

Time Elapsed: 00:00:54.05

And this one about 25% longer. So in this case it was definitely noticeable.

Wrap-up:
1. David Aldridge has a great article on how to make your table scans go faster.
2. In addition to his techniques, I recommend "being creative" (aka "cheating") and trying to split your data.
3. Use Views to maintain applications that SELECT from the big, original table so you don't have to change code and can still query the whole set.
4. Using Views will not have a big performance hit. In fact, in this demonstration it was faster (a topic for another day's blog).
5. Use UNION ALL instead of UNION in this view to avoid the noticeable performance hit.

Sunday, August 14, 2005

UTL_HTTP

Let's say you want to pull some real-time information off the Internet and put it in your database. For example:
- Stock Market Quotes
- Temperature
- Sports scores

No need to write a separate application in Java or whatever, you can do it all directly in Oracle.

Your solution will involve using one of the many handy Oracle built-in utilities: UTL_HTTP.

For reference, flip open Chapter 78 of the 'Supplied PL/SQL Packages and Types Reference'.

Tom Kyte also had a discussion in the Appendix of "Expert One-on-One Oracle", and also on Ask Tom:
http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:285215954607

Let's look at a simple example, grabbing a market quotation.

You can get the latest quotes from Yahoo Finance. Using UTL_HTTP.REQUEST, you can get the content of that page, and then search within for the data you want.

SELECT UTL_HTTP.REQUEST('http://finance.yahoo.com/q?s=KMP') FROM DUAL;

If you're behind a firewall, include the IP or name of your proxy server as the second parameter.

For simplicity, I'm not including the output here, but you can see that we don't have our quote in there. That's because we only got the first 2000 bytes of the web page. If we want more, we need to use REQUEST_PIECES.

Down below I have included a working example. It's roughly thrown together, but it does illustrate the point: you can use UTL_HTTP to retrieve information from the Internet that you can put in the database.

As an aside, talk to some lawyers to make sure the data you are mining is not violating any copyrights.

After you write stored procedures to retrieve web pages, and to extract information from within and insert them into the database, you can automate the updates by calling the stored procedures using DBMS_JOB (a topic for another day!).

SET SERVEROUTPUT ON

DECLARE
l_pieces UTL_HTTP.HTML_PIECES;
-- We'll look at two 2000-byte pages at a time
l_two_pages VARCHAR2(4000);
l_start_read NUMBER;
l_end_read NUMBER;
l_quote VARCHAR2(12);
BEGIN
-- Grab up to a maxium of 32 2000-byte pages, and then go through them,
-- looking at 2 pages at a time in case the data we are looking for
-- overlaps a page boundary
l_pieces := UTL_HTTP.REQUEST_PIECES('http://finance.yahoo.com/q?s=KMP', 32);
FOR i IN 1 .. l_pieces.COUNT LOOP
l_two_pages := l_two_pages || l_pieces(i);
-- Look for a string preceding the information we want
-- If we find it, add 52 (magic, Yahoo-specific number)
-- to find the point where the quote will begin
SELECT INSTR(l_two_pages, 'Last Trade', 1, 1) INTO l_start_read FROM dual;
IF (l_start_read > 0) THEN
l_start_read := l_start_read + 52;
IF (l_start_read < 3950) THEN
SELECT INSTR(l_two_pages, '<', l_start_read, 1) INTO l_end_read FROM dual;
IF (l_end_read > 0) THEN
IF ((l_end_read - l_start_read) < 12) THEN
SELECT SUBSTR(l_two_pages, l_start_read, l_end_read - l_start_read) INTO l_quote FROM dual;
DBMS_OUTPUT.PUT_LINE(l_quote);
ELSE
DBMS_OUTPUT.PUT_LINE('Error (Quote more than 12 chars)');
END IF;
EXIT;
END IF;
END IF;
END IF;
l_two_pages := l_pieces(i);
END LOOP;
END;

Wednesday, August 10, 2005

UNION ALL

You want to write a query that contains the rows from 2 or more tables. What you want to use is one of Oracle's set operators: UNION, INTERSECT, or MINUS. (Note: in the ANSI SQL standard, MINUS is referred to as EXCEPT). My example will deal with UNION.

You may have tables containing your employees, contractors and clients, each with their own unique and appropriate columns. For illustrative purposes, however, let's consider only their names and phone numbers.

CREATE TABLE Employee (e_name VARCHAR2(32), e_phonenumber VARCHAR2(16));
CREATE TABLE Contractor (c_name VARCHAR2(32), c_phonenumber VARCHAR2(16), c_startcontract DATE, c_endcontract DATE);
CREATE TABLE Client (c_name VARCHAR2(32), c_phonenumber VARCHAR2(16));

Let's get some sample data:

INSERT INTO Employee VALUES ('Joe Smith', '555-555-1234');
INSERT INTO Contractor VALUES ('Adam Johnson', '555-555-8888', '01-Jan-04', '01-Mar-04');
INSERT INTO Contractor VALUES ('Bob Jackson', '555-555-1111', '01-Jan-04', NULL);
INSERT INTO Contractor VALUES ('Adam Johnson', '555-555-8888', '01-Jan-05', '01-Mar-05');
INSERT INTO Client VALUES ('Bill Taylor', '555-555-6767');
INSERT INTO Client VALUES ('Adam Johnson', '555-555-8888');

What you would like to do is create a view to contain all the phone numbers. You can use the UNION operator, which is very easy. Write your queries however you wish, just make sure that they all have the same number of columns and similar data types.

SELECT e_name, e_phonenumber FROM Employee
UNION
SELECT c_name, c_phonenumber FROM Contractor
UNION
SELECT c_name, c_phonenumber FROM Client;


E_NAME E_PHONENUMBER
-------------------------------- ----------------
Adam Johnson 555-555-8888
Bill Taylor 555-555-6767
Bob Jackson 555-555-1111
Joe Smith 555-555-1234



Excellent!

But observe two things:
1. The order of the results have been re-arranged
2. There are no duplicates.

Actually, 1 and 2 are tied closely together. Oracle re-arranges the results in order to put identical rows next to each other and remove duplicates. On large tables, you may get a bit of a performance hit.

If you don't care about removing duplicates, or especially if you want the duplicates, use UNION ALL instead:

SELECT e_name, e_phonenumber FROM Employee
UNION ALL
SELECT c_name, c_phonenumber FROM Contractor
UNION ALL
SELECT c_name, c_phonenumber FROM Client;

E_NAME                           E_PHONENUMBER
-------------------------------- ----------------
Joe Smith 555-555-1234
Adam Johnson 555-555-8888
Bob Jackson 555-555-1111
Adam Johnson 555-555-8888
Bill Taylor 555-555-6767
Adam Johnson 555-555-8888


If you want them sorted, but don't want duplicates removed, you can include an ORDER BY clause. Use UNION with ORDER BY if you want duplicates removed and you want a guaranteed order.

SELECT e_name, e_phonenumber FROM Employee
UNION ALL
SELECT c_name, c_phonenumber FROM Contractor
UNION ALL
SELECT c_name, c_phonenumber FROM Client
ORDER BY 1;


E_NAME                           E_PHONENUMBER
-------------------------------- ----------------
Adam Johnson 555-555-8888
Adam Johnson 555-555-8888
Adam Johnson 555-555-8888
Bill Taylor 555-555-6767
Bob Jackson 555-555-1111
Joe Smith 555-555-1234


If you just want to remove duplicates within tables, but not in the merged set, try the DISTINCT clause.

SELECT DISTINCT e_name, e_phonenumber FROM Employee
UNION ALL
SELECT DISTINCT c_name, c_phonenumber FROM Contractor
UNION ALL
SELECT DISTINCT c_name, c_phonenumber FROM Client;


E_NAME E_PHONENUMBER
-------------------------------- ----------------
Joe Smith 555-555-1234
Adam Johnson 555-555-8888
Bob Jackson 555-555-1111
Adam Johnson 555-555-8888
Bill Taylor 555-555-6767



I guess it all depends what you want.

Here is Dan Morgan's reference on Built-In Operators:
http://www.psoug.org/reference/ora_operators.html

But you also want to check out the Oracle SQL Reference, Chapter 8: SQL Queries and Subqueries.

For your reading pleasure, check out this article by Jonathan Gennick on set operators, including UNION:
http://five.pairlist.net/pipermail/oracle-article/2003/000003.html

Administrative note: I've added a new section to include the links to all the Oracle blogs I regularly visit, to save you from going here:
http://thinkoracle.blogspot.com/2005/07/oracle-blogs.html

Tuesday, August 09, 2005

OraBlogs!

Brian Duff maintains a blog that brings together all the feeds from the various Oracle blogs that are out there.

http://www.orablogs.com/orablogs/

There are some fine blogs that are not included because they do not support RSS 2.0 feed. Mine was one such case, I used atom. Peter Scott and Doug Burns then pointed me to Feedburner. It converts your feed from one format into another.

http://www.feedburner.com/

Just go to their web site, type in the URL to your feed, and then accept all the defaults. Just be sure that you have these three settings (courtesy of Doug):

1) Do NOT select Smartfeed
2) Select Convert Format Burner
3) Select RSS 2.0 from the list of options.

If all goes well, this should be my first post on OraBlogs. It would be great to kick things off with a great post but, alas, I have nothing to post about today. Instead, just like a bad 80s sitcom, I will do a "flashback highlight episode. Here are my 5 favourite posts from my 3 months of existence:

May 17th: NULL is not nothing
http://thinkoracle.blogspot.com/2005/05/null-vs-nothing.html

June 10th: NULLs in Oracle (kind of a "Part 2")
http://thinkoracle.blogspot.com/2005/06/nulls-in-oracle.html

July 1st (Canada Day): Extra Columns in a Group By
http://thinkoracle.blogspot.com/2005/07/extra-columns-in-group-by.html

July 26th: Use Constraints!
http://thinkoracle.blogspot.com/2005/07/use-constraints.html

July 29th: Using Views and other techniques to solve a particular problem
http://thinkoracle.blogspot.com/2005/07/oracle-by-example.html

Monday, August 01, 2005

Import Export

I recently installed a new Oracle instance on my laptop. I wanted to migrate my small development database (complete with structure and data) from my main workstation to my laptop.

It was very easy using the import (IMP) and export (EXP) tools. They are described in Chapter 1 (Export) and Chapter 2 (Import) of the Oracle Utilities Guide, available here:

http://www.oracle.com/technology/documentation/index.html

Everything you should need to know is in there. For good measure, I read Chapter 8 of Tom Kyte's "Expert One-on-One Oracle" on Import and Export.

Word of caution: Think twice before using IMP/EXP as your back-up strategy on large, complex databases.

Both utilities are located in $ORACLE_HOME/bin directory. You need to run the $ORACLE_HOME/rdbms/admin/catexp.sql file. You can see the commands you need by using the HELP=Y option at your command prompt

C:\> EXP HELP=Y

So for me, the process was as easy as this:

1. Export the data

C:\> EXP USERID=scott/tiger OWNER=scott FILE=scott.dmp

2. Copy the DMP file to the target machine

3. Import the data

C:\> IMP USERID=scott/tiger FILE=scott.dmp FULL=Y

Done! All the tables, triggers, procedures, view and constraints, as well as all the data. While I was at it, I just created a BAT file out of this and added it to the Task Manager to back-up my data regularly. (Unix: SH file and CRON).

As a final note, here is a good FQ, by Frank Naude:
http://www.orafaq.com/faqiexp.htm

This page is powered by Blogger. Isn't yours?