Tuesday, June 27, 2006
Recursion vs Iteration
When writing code to do repetitive tasks, the two primary approaches are iteration and recursion. Generally you can use either one interchangeably, but potentially with different performance and complexity.
A recursive function calls itself (possibly more than once), with different parameters, and defines an exit clause that is guaranteed to be reached.
CREATE OR REPLACE FUNCTION FIBONACCI_REC (in_number IN NUMBER)
RETURN NUMBER DETERMINISTIC
AS
BEGIN
CASE
WHEN in_number = 1 THEN RETURN 1;
WHEN in_number = 2 THEN RETURN 1;
ELSE RETURN (FIBONACCI_REC (in_number - 1) + FIBONACCI_REC (in_number - 2));
END CASE;
END;
An iterative function includes a loop, which iterates a pre-determined number of times, or checks for an exit clause every time through.
CREATE OR REPLACE FUNCTION FIBONACCI_IT (in_number IN NUMBER)
RETURN NUMBER DETERMINISTIC
AS
fib1 NUMBER := 1;
fib2 NUMBER := 1;
fib3 NUMBER := 1;
fib4 NUMBER;
BEGIN
FOR fib4 IN 3 .. in_number
LOOP
fib3 := fib1 + fib2;
fib1 := fib2;
fib2 := fib3;
END LOOP;
RETURN fib3;
END;
The advantages and disadvantages of the two are not always obvious, and you should really take it on a case-by-case basis. When in doubt: test it. But generally:
1. Recursion may be slower, and use greater resources, because of the extra function calls.
2. Recursion may lead to simpler, shorter, easier-to-understand functions, especially for mathematicians who are comfortable with induction formulae.
Either way, one of the most critical aspects of writing either a recursive or an iterative function is to have an exit clause (or "base case" for recursions) that is checked at every recursion/iteration and is guaranteed to be reached at some finite point, given any input. Otherwise you will get either:
Infinite Recursion: More and more functions being spawned, never closing, using up resources until there are none left, possibly crashing the system.
Infinite loop: A loop that cycles forever, burning CPU and never completing.
I can illustrate this point with my recursive example above. Do NOT try this, but if you inserted a negative number, what do you think would happen? Infinite recursion.
Now you have not only the basics on iteration and recursion, but, based on the topic I chose for the sample code, knowledge that I saw "Da Vinci Code" recently.
A recursive function calls itself (possibly more than once), with different parameters, and defines an exit clause that is guaranteed to be reached.
CREATE OR REPLACE FUNCTION FIBONACCI_REC (in_number IN NUMBER)
RETURN NUMBER DETERMINISTIC
AS
BEGIN
CASE
WHEN in_number = 1 THEN RETURN 1;
WHEN in_number = 2 THEN RETURN 1;
ELSE RETURN (FIBONACCI_REC (in_number - 1) + FIBONACCI_REC (in_number - 2));
END CASE;
END;
An iterative function includes a loop, which iterates a pre-determined number of times, or checks for an exit clause every time through.
CREATE OR REPLACE FUNCTION FIBONACCI_IT (in_number IN NUMBER)
RETURN NUMBER DETERMINISTIC
AS
fib1 NUMBER := 1;
fib2 NUMBER := 1;
fib3 NUMBER := 1;
fib4 NUMBER;
BEGIN
FOR fib4 IN 3 .. in_number
LOOP
fib3 := fib1 + fib2;
fib1 := fib2;
fib2 := fib3;
END LOOP;
RETURN fib3;
END;
The advantages and disadvantages of the two are not always obvious, and you should really take it on a case-by-case basis. When in doubt: test it. But generally:
1. Recursion may be slower, and use greater resources, because of the extra function calls.
2. Recursion may lead to simpler, shorter, easier-to-understand functions, especially for mathematicians who are comfortable with induction formulae.
Either way, one of the most critical aspects of writing either a recursive or an iterative function is to have an exit clause (or "base case" for recursions) that is checked at every recursion/iteration and is guaranteed to be reached at some finite point, given any input. Otherwise you will get either:
Infinite Recursion: More and more functions being spawned, never closing, using up resources until there are none left, possibly crashing the system.
Infinite loop: A loop that cycles forever, burning CPU and never completing.
I can illustrate this point with my recursive example above. Do NOT try this, but if you inserted a negative number, what do you think would happen? Infinite recursion.
Now you have not only the basics on iteration and recursion, but, based on the topic I chose for the sample code, knowledge that I saw "Da Vinci Code" recently.
Sunday, June 18, 2006
Meeting Tom Kyte
"You have redeemed my faith, after last year's presentation, which was such a disappointment."- One of Tom Kyte's biggest fans in COUG
Renowned Oracle expert and author of highly popular blog, pictured here with Tom Kyte.
I first met Tom Kyte about twenty minutes before his presentation about his favourite features in Oracle 10g. When I went up to introduce myself, I figured he was putting the final touches on his presentation, but he was actually answering questions for his AskTom web site. I guess you have to find the time for that somewhere!
Due in part to his reputation for fine speeches (his biggest fan notwithstanding), there were quite a few people in attendance. Tom reports having answered 33,000 different Oracle-related questions. I wonder how many people were there hoping they could find that one question that pushed him over the edge. Sadly, no such luck.
I enjoyed the presentation myself, despite still being on Oracle 9. Based on some of the features, including DBMS_SQLTUNE which uses SQL Profiles to find patterns in your data that can help speed up queries, I wonder how long before he comes to town announcing that the next release of Oracle is sentient.
I was a little surprised to hear his excitement about DBMS_ADVANCED_REWRITE, which allows you to tell Oracle "listen, when I write this query, what I REALLY want is this completely different query." I can't believe I'm going to have to learn to start typing "FOR_REAL" after all my queries. Maybe in Oracle 11 we'll need "NO_SERIOUSLY_I_MEAN_IT".
On a more serious note, it sounds like Oracle 10 has a lot more bells and whistles ultimately aimed to speed things up. Take PL/SQL compilation for example. When you "CREATE OR REPLACE" it will check whether a recompilation is really necessary, and will also look for optimizations within your code. I guess the end result is a little bit more overhead to check for things, but faster results, especially for inexperienced programmers.
Afterwards we all retired to Bottlescrew Bill's for COUG's annual social, where I enjoyed Tom's stories almost as much as his presentation. In contrast to the light-hearted ribbing with which I began this post, virtually everyone ranks him as one of the very best Oracle speakers. I'm really glad he paid us a visit, and look forward to hearing him again.
Renowned Oracle expert and author of highly popular blog, pictured here with Tom Kyte.
I first met Tom Kyte about twenty minutes before his presentation about his favourite features in Oracle 10g. When I went up to introduce myself, I figured he was putting the final touches on his presentation, but he was actually answering questions for his AskTom web site. I guess you have to find the time for that somewhere!
Due in part to his reputation for fine speeches (his biggest fan notwithstanding), there were quite a few people in attendance. Tom reports having answered 33,000 different Oracle-related questions. I wonder how many people were there hoping they could find that one question that pushed him over the edge. Sadly, no such luck.
I enjoyed the presentation myself, despite still being on Oracle 9. Based on some of the features, including DBMS_SQLTUNE which uses SQL Profiles to find patterns in your data that can help speed up queries, I wonder how long before he comes to town announcing that the next release of Oracle is sentient.
I was a little surprised to hear his excitement about DBMS_ADVANCED_REWRITE, which allows you to tell Oracle "listen, when I write this query, what I REALLY want is this completely different query." I can't believe I'm going to have to learn to start typing "FOR_REAL" after all my queries. Maybe in Oracle 11 we'll need "NO_SERIOUSLY_I_MEAN_IT".
On a more serious note, it sounds like Oracle 10 has a lot more bells and whistles ultimately aimed to speed things up. Take PL/SQL compilation for example. When you "CREATE OR REPLACE" it will check whether a recompilation is really necessary, and will also look for optimizations within your code. I guess the end result is a little bit more overhead to check for things, but faster results, especially for inexperienced programmers.
Afterwards we all retired to Bottlescrew Bill's for COUG's annual social, where I enjoyed Tom's stories almost as much as his presentation. In contrast to the light-hearted ribbing with which I began this post, virtually everyone ranks him as one of the very best Oracle speakers. I'm really glad he paid us a visit, and look forward to hearing him again.
Tuesday, June 13, 2006
Refreshing Data
Moving data from one database to another is a very common task for a database administrator. So common, in fact, that I have never bothered to write about it, because most DBAs already have their own preferred method.
If you haven't chosen your approach yet, allow me to show you the general flow, some points to keep in mind, and some of the decisions you'll have to make based upon the nature of your data and the tools at your disposal.
1. Lose Your Integrity
What do you do when you have one tables that relies on another, through a foreign key? In order to avoid having your import fail, you would need to have the tables loaded in the right order. Or, you could just disable all constraints.
Why all constraints? Why not just the foreign keys? Well, checking constraints takes time. Presumably this is data you're grabbing from an environment where these constraints have already been checked, why check them again? If you don't trust that data, don't worry, you'll catch the errors when you try to re-enable the constraints later on.
You may want to disable triggers while you're at it. Presumably those have already fired when the data was first entered, and those can be time consuming as well.
While you're in the disabling mood, drop your indexes. It will just take time to update them with every row you insert. Just drop them now and re-create them when you're done. Much faster over-all.
One more thing, you want to do these and all subsequent steps with logging turned off. Why keep a redo log of each transaction? That's just taking time and space. Chances are your mentality is that either all the data gets in, or all of it doesn't.
2. Wipe Out Your Data
You may already have data in the target database that you want to replace. You could insert your data in such a way that it checks if the data already exists before adding it. But that takes time and effort. Instead, you may just want to wipe out and re-insert all your data.
What is the best way to eliminate your data? Don't use "delete." Delete is for selectively deleting a subset of rows, or if you want to be able to rollback your work. Instead, in this case, truncate your data. No logging: much faster.
By the way, when you're truncating, you'll be glad that you disabled your foreign key constraints. Otherwise you'd have to be very careful in the order you truncated your tables.
You may want to consider using "reuse storage" which will keep the space reserved for the data. Since you're probably going to be inserting a roughly equal or superior amount of data, this will save you from having to expand your extents during the import.
An alternative to all this is to drop all the tables and then have your import process re-create the tables with the data involved. That option is only appealing if you're not 100% confident the table structure is the same, otherwise I don't see the point.
Before you wipe out your data, I'm assuming either you already have a backup, or you don't care about this data. Otherwise, you should have done this first:
3. Dump Your Data
Which is outside the scope of this article. :)
4. Copy Your Dump
5. Import the Data
There are many different tools you can use to import your data, and each one has many different methods and options. Here is a brief list of your choices which, of course, depend on how you dumped your data.
SQL*Loader/SQLLDR (Check the Oracle Server Utilities Guide, Part 2)
IMP utility
Oracle 10g Data Pump Export and Import (DBMS_DATAPUMP)
Oracle Data Manager (Oracle Enterprise Manager Console)
Your own custom application (possibly using Oracle Call Interface/OCI).
If your data import is still ridiculously slow, ask yourself if you're using RAID 5. Are you? RAID 5 is slow, man. No getting around that, that's the nature of the beast.
6. Restore Your Integrity
Now that the job is done, it is time to restore your integrity (and lose your faith in man). Re-enable your constraints, and triggers, and restore your indexes. This may take some time, so get a coffee.
7. Recalculate Statistics
In order for the optimizer to work effectively you'll need to re-calculate your statistics (DBMS_STATS.GATHER_SCHEMA_STATISTICS). If it knows how big the tables are, the optimizer will be able to make the best choices in query plans. You can import the statistics, if they were kept up-to-date in your production system, but might as well gather them now.
That's it!
This was by no means meant to be a comprehensive step-by-step guide with an exhaustive list of everything to consider. But at least now I've made an attempt to draw out the basic flow, and some of the things you should keep in mind. Hopefully this will help you design a solution of your own, most appropriate for your data and how you use it.
If you haven't chosen your approach yet, allow me to show you the general flow, some points to keep in mind, and some of the decisions you'll have to make based upon the nature of your data and the tools at your disposal.
1. Lose Your Integrity
What do you do when you have one tables that relies on another, through a foreign key? In order to avoid having your import fail, you would need to have the tables loaded in the right order. Or, you could just disable all constraints.
Why all constraints? Why not just the foreign keys? Well, checking constraints takes time. Presumably this is data you're grabbing from an environment where these constraints have already been checked, why check them again? If you don't trust that data, don't worry, you'll catch the errors when you try to re-enable the constraints later on.
You may want to disable triggers while you're at it. Presumably those have already fired when the data was first entered, and those can be time consuming as well.
While you're in the disabling mood, drop your indexes. It will just take time to update them with every row you insert. Just drop them now and re-create them when you're done. Much faster over-all.
One more thing, you want to do these and all subsequent steps with logging turned off. Why keep a redo log of each transaction? That's just taking time and space. Chances are your mentality is that either all the data gets in, or all of it doesn't.
2. Wipe Out Your Data
You may already have data in the target database that you want to replace. You could insert your data in such a way that it checks if the data already exists before adding it. But that takes time and effort. Instead, you may just want to wipe out and re-insert all your data.
What is the best way to eliminate your data? Don't use "delete." Delete is for selectively deleting a subset of rows, or if you want to be able to rollback your work. Instead, in this case, truncate your data. No logging: much faster.
By the way, when you're truncating, you'll be glad that you disabled your foreign key constraints. Otherwise you'd have to be very careful in the order you truncated your tables.
You may want to consider using "reuse storage" which will keep the space reserved for the data. Since you're probably going to be inserting a roughly equal or superior amount of data, this will save you from having to expand your extents during the import.
An alternative to all this is to drop all the tables and then have your import process re-create the tables with the data involved. That option is only appealing if you're not 100% confident the table structure is the same, otherwise I don't see the point.
Before you wipe out your data, I'm assuming either you already have a backup, or you don't care about this data. Otherwise, you should have done this first:
3. Dump Your Data
Which is outside the scope of this article. :)
4. Copy Your Dump
5. Import the Data
There are many different tools you can use to import your data, and each one has many different methods and options. Here is a brief list of your choices which, of course, depend on how you dumped your data.
SQL*Loader/SQLLDR (Check the Oracle Server Utilities Guide, Part 2)
IMP utility
Oracle 10g Data Pump Export and Import (DBMS_DATAPUMP)
Oracle Data Manager (Oracle Enterprise Manager Console)
Your own custom application (possibly using Oracle Call Interface/OCI).
If your data import is still ridiculously slow, ask yourself if you're using RAID 5. Are you? RAID 5 is slow, man. No getting around that, that's the nature of the beast.
6. Restore Your Integrity
Now that the job is done, it is time to restore your integrity (and lose your faith in man). Re-enable your constraints, and triggers, and restore your indexes. This may take some time, so get a coffee.
7. Recalculate Statistics
In order for the optimizer to work effectively you'll need to re-calculate your statistics (DBMS_STATS.GATHER_SCHEMA_STATISTICS). If it knows how big the tables are, the optimizer will be able to make the best choices in query plans. You can import the statistics, if they were kept up-to-date in your production system, but might as well gather them now.
That's it!
This was by no means meant to be a comprehensive step-by-step guide with an exhaustive list of everything to consider. But at least now I've made an attempt to draw out the basic flow, and some of the things you should keep in mind. Hopefully this will help you design a solution of your own, most appropriate for your data and how you use it.
Wednesday, June 07, 2006
Call For Topics
This morning I read that fellow Oracle blogger Tim Hall is looking for Oracle topics on which to base his next article. It occurs to me that since changing positions a few months ago that I have dealt with Oracle a lot less than previously, and am open to hearing some ideas.
Virtually all my articles are based on questions I get from customers or colleagues, or problems I have to solve in my day-to-day work. There are times when that stream is a trickle, and times when its a raging river.
Right now its a trickle. But I'd like to keep up the momentum rather than have this blog go the way of my Sybase blog.
And don't confine your questions to my area of interest (development), because I happen to know that several other excellent bloggers like Eddie Awad, Gary Myers, Tom Kyte, Doug Burns and others may also pick up on any ideas you leave in my comments.
Well maybe not Tom, I think he gets enough questions. :)
Virtually all my articles are based on questions I get from customers or colleagues, or problems I have to solve in my day-to-day work. There are times when that stream is a trickle, and times when its a raging river.
Right now its a trickle. But I'd like to keep up the momentum rather than have this blog go the way of my Sybase blog.
And don't confine your questions to my area of interest (development), because I happen to know that several other excellent bloggers like Eddie Awad, Gary Myers, Tom Kyte, Doug Burns and others may also pick up on any ideas you leave in my comments.
Well maybe not Tom, I think he gets enough questions. :)