Tuesday, February 24, 2009
Get Rid of NULL
Well, you know what? I've had it. Let's get rid of NULL. There, I said it.
I've written articles explaining NULL before, a couple of times in fact*. So have plenty of other Oracle specialists, in both blogs and books. And it's just not getting through. So let's just get rid of NULLs.
* Links to my previous articles:
NULLs in Oracle
NULL vs Nothing - ANSI SQL is unlike programming languages because NULL is not nothing.
The basic problem is that programmers coming from different backgrounds mistakenly think NULL is nothing. You may think NULL is nothing, you may design your application to treat NULL as nothing, but the problem is that Oracle does not consider NULL to be nothing. NULL is, quite simply, "I don't know."
NULL is not nothing, it's a value. It's a value as real as any other. It's simply an unknown value. So why do we need NULL anyway? Let's just call it UNKNOWN. That should avoid any programmer misconceptions.
Let me give you an example to help clear it up. Let's take a table of employees, and for whatever reason, HR keeps track of everyone's favourite colour. How would you treat the following situations?
1. An employee leaves that section blank.
2. An employee writes "blind since birth - never seen a colour."
3. An employee writes "that's personal" or "what a stupid question."
4. An employee writes "I do not have a favourite colour."
In most cases today, each of these employees would have NULL entered as their favourite colour. But would it be appropriate in all cases? In some cases the employee has a favourite colour, we simply don't know what it is. In at least one case, the employee has no favourite colour (ie "nothing").
What if I asked you if two of these employees had the same favourite colour? You couldn't tell me, could you? What if I asked you if they all had different favourite colours? You can't answer that either.
So what would I do? I'd get rid of NULL and use a value more appropriately named UNKNOWN instead.
"But Robert," some of you are thinking, "What about foreign keys?" You may have a table called colour and employee.colour_id might map to colour.id. UNKNOWN may be appropriate for some of those employees, what about the employee without a favourite colour? We can't put UNKNOWN because we know his favourite colour (ie. none). So what do we do? Create another type called NONE or NOTHING?
No. No we don't. I doubt Oracle would even support that properly anyway. I'll tell you what we'd do. We'd insert "Nothing" into COLOUR. And why not? What's the problem with that?
In more practical terms, think of the "supervisor_id" field in a typical employee table. The supervisor_id column would be self-referential to employee.id. What I'm saying is this:
1. Employees with an unknown supervisor would have supervisor_id = UNKNOWN (which is just a simple re-naming of NULL).
2. There would be a "Nobody" employee. (Make sure not to pay him/her!)
3. Employees with no supervisor (the President, for example) would have a supervisor_id equal to Nobody's id.
Since I'm adding "Nobody" as an employee, would I therefore want to add "Unknown" as an employee too, make the column "NOT NULL", thus avoiding the NULL/UNKNOWN value altogether?
No. No way. Look at those questions I asked above - what if I asked you if two employees had the same boss? If they both had a supervisor_id set to the "Unknown" employee, you'd say "yes," which would be wrong! They might have two completely different supervisors - the point is that we don't know. Which is what NULL/UNKNOWN value is for.
I know I'm not alone in my opinion, but I also know that there is no consensus on this issue. I'm expecting more than a few comments from experienced Oracle experts finding fault with my proposals and/or examples, and backing it up quite convincingly. NULL is no doubt one of the great RDBMS arguments, and while it may be frustrating to work with, I hope it made for interesting reading.
Friday, February 20, 2009
OVERLAPS
"Yep," said Chad. "I just can't seem to get it right."
"Don't put it all on yourself," said the guru. "What can I do to help?"
"Well, I've got it narrowed down to this one procedure, which is responsible for checking if the room is available or not. It takes a start time, and end time, and the room number. For some reason it sometimes says the room is available when it isn't."
The wise guru studied the code, scratched his beard, and finally provided his much anticipated wisdom. "Have you ever heard of OVERLAPS?"
The well-trained database analyst dutifully opened up the Oracle documentation. He knew the Oracle guru strongly believed in the completeness and accuracy of the Oracle documentation, and it would certainly win him some points to be seen going directly to the guru's favourite source of information.
Unfortunately Chad was a little embarrassed when he couldn't find OVERLAPS anywhere. Finally the guru broke the tension with a hearty laugh. "You won't find it in the Oracle documentation, but you were right to look there first!" Chad smiled sheepishly. "For some reason Oracle has chosen not to advertise the existence of this command, but here's how it works."
The guru slid the keyboard over, cracked his knuckles, and typed out this query.
SQL> SELECT 'Yes' FROM DUAL
2 WHERE (DATE '2009-01-01', DATE '2009-01-03')
3 OVERLAPS (DATE '2009-01-02', DATE '2009-01-04');
'YE
---
Yes
"Wow, that's neat! But without documentation, how am I supposed to know how this works?"
"One of the many advantages to using Oracle is its popularity. As a result, there is an abundance of Oracle experts around the world, most of whom are quite generous with their knowledge and experience. You should be sure to include some of the more popular and trusted forums and sites to your daily reading list."
Chad looked optimistic, but still a little shaken. "Hey, don't feel so bad," said the guru as he placed his hand on the shoulder. "Open up your copy of Expert One-on-One Oracle, and check out Chapter 1. Flip to page 31."
Chad studied the code, which attacked the very same problem. Finally he looked up, obviously still confused. The guru smiled at him. "Study the query carefully. What happens when a meeting request comes in that both starts and ends while another meeting is running?"
Chad studied the code with furrowed eyebrows, but eventually his face broke out into a broad smile. "The room gets double-booked! Wow, even Tom Kyte got this wrong!"
"See? It happens to the best of us."
"Once I test this, should I rewrite my procedure to use OVERLAPS?"
"Personally, I wouldn't recommend using any undocumented feature. Furthermore, my own experience with OVERLAPS is that it doesn't consider equality as an overlap, and that may not satisfy your business requirements."
SQL> SELECT 'Yes' FROM DUAL
2 WHERE (DATE '2009-01-01', DATE '2009-01-03')
3 OVERLAPS (DATE '2009-01-03', DATE '2009-01-04');
no rows selected
"Instead, just consider OVERLAPS a handy tool to help you with your testing."
"Thanks!"
The guru turned up to leave, but just as he was about to leave sight, he turned around in a fashion reminiscent of Peter Falk's Columbo. "You know, I could use some lunch ..."
---
To the Chads out there, here are some authors with more information on OVERLAPS:
Eddie Awad
Kevin Meade
David Aldridge
Thursday, January 22, 2009
Cleaning up with ALL_TAB_COLS
I had just gotten off the phone with a former teammate who was moved to another department. They had a new database analyst who was struggling with his new "database independent" application, and he was sent my way. It was just around the time that he leaned his head into my office that I wished I had a second exit.
"You're the Oracle guru?"
I smiled at him, and nodded as modestly as I could. He dragged his feet into my office and fell into a chair. "Wow, you look so ... normal."
"What can I help you with?" I sighed.
He began going over the long list of issues he was trying to tackle, and data he was trying to clean up. He had a very sad story, I'll admit. He had a database shared by a host of buggy applications that had different ideas of how to handle NULLs and other values. He was left with tables that were supposed to be identical, but weren't, data that was supposed to match, but didn't, and felt more optimistic about finding needles in haystacks than the data he needed to fix. After several minutes it was time for one of my trademark interruptions.
"The first thing you need to study is the ALL_TAB_COLS table."
"What's that?" I waited a moment for him to pull out the Oracle documentation, but then I forgot that he's new, and I haven't trained him yet. I helpfully turned my monitor so he could see it, and brought it up on the screen.
"Just as the name suggests, it's your one-stop shop for all the information you might need about all the columns in all the tables," I explained as he read through the details. "For instance, remember your question about how to find all the columns in your database that store the province? Do you see how this table has your answer?"
"Absolutely," he said, sliding away my keyboard and bringing up a SQL*Plus session.
SQL> SELECT table_name, column_name
2 FROM all_tab_cols
3 WHERE column_name like '%PROV%';
"Aw man," he said upon seeing the results. "Some of these aren't even my tables. I don't want to wade through everything!"
"Then don't. Either use USER_TAB_COLS instead, or make use of the OWNER column." I helpfully added one more where clause to his query.
SQL> SELECT table_name, column_name
2 FROM all_tab_cols
3 WHERE column_name LIKE '%PROV%'
4 AND owner = 'NEWBIE';
"And you were trying to find columns that support NULL in one place, but not in others? Think you can manage that?"
He slid the keyboard back, studied the documentation, chewed on my pen for a while (grr!) and came up with something he found satisfying.
SQL> SELECT table_name, column_name FROM all_tab_cols WHERE nullable='Y'
2 AND column_name IN
3 (SELECT column_name FROM user_tab_columns WHERE nullable = 'N');
He smiled, and studied the documentation further. "Hey, I know how else I can use this. I had this import the other day that brought in a bunch of NULLs and my stupid buggy application doesn't handle NULLs very well. This can help me find the tables and columns with NULLs!"
SQL> SELECT table_name, column_name, num_nulls
2 FROM all_tab_cols
3 WHERE num_nulls > 0;
"What are these other tables? ALL_TAB_COLUMNS and ALL_TAB_COL_STATISTICS?" I paused for a moment, wondering whether I should step back and start educating him on NULLs before answering his question, but was gratified that this internal debate gave him enough time to bring up the Oracle documentation and answer the question for himself.
"Oh, I see. ALL_TAB_COLUMNS is the same, except with hidden columns filtered out, and ALL_TAB_COL_STATISTICS is just extracted from it." He leaned back and that's when I first noticed that he was smiling. "This is sweet! My application is also having trouble with duplicates and certain bad values, and I can use this to help me find the tables that have them."
"Yes, and as a bonus, you'll see there's useful information that can help you determine where you need to re-analyze your statistics. I'm glad ALL_TAB_COLS can get you started."
"Well before I go back to my desk and study all of this, I wanted to know if you had any suggestions on how to compare data between tables. I've got two tables that are supposed to have identical data, but I think they don't. How do I get all the rows that are different?"
I wordlessly typed the simplest query he ever expected to see.
SQL> SELECT * from table1
2 MINUS
3 SELECT * from table2;
"Wow, thanks, this was great," he said, as he got up to leave my office.
"Wait. There's one more thing I need to teach you," I interrupted one last time.
"What's that," he asked, eagerly sitting back down.
"Buying lunch."
Wednesday, April 23, 2008
Bashing RDBMS
Why Relational Databases end up being the bottleneck
For the appropriate context of this debate, you can review this forum discussion on Joel's web site as just one example.
Stored Proc to avoid frequent builds
As is evident from the title and content of my blog, I'm a database professional. Nevertheless, I try to stay as objective as possible when evaluating this debate. I think you'll agree that my earlier article on J2EE vs RDBMS was as dispassionate and even-handed as possible.
That being said, the author makes an excellent point that's valid on more levels than merely the most direct.
I'll concede that many people strongly object to leveraging the RDBMS based on ignorance. They program poorly, resulting in poor solutions, so they blame the technology that is so darn hard to understand.
Without a good, solid foundation in RDBMS I can certainly understand how every schema you design and all the PL/SQL you hatch-patch together is probably going to compare quite unfavourably to what you designed in your more familiar language.
I'll also concede that getting the necessary foundation in RDBMS to leverage its power and to design superior solutions (depending on the situation) requires a real investment in time and study.
My final concession is that, generally speaking, while slapping something together in .Net or Java with only the basics under your belt isn't likely going to result in an award-winning application, I'd consider it reasonable to expect that it would probably still wind up being superior than something you slapped together in PL/SQL with a similar domain unfamiliarity.
My point? To leverage the power of RDBMS you obviously need developers with good, solid understandings of RDBMS, and they can be very hard to find. On the other hand, if you do as much as possible in Java or .Net not only would it be easier to find experienced developers, but you could probably get away with a higher ratio of less experienced developers. That won't always get you the best end result, but it will save you time and money.
Of course, I'm not advocating avoiding RDBMS simply because it's harder to find experienced specialists. I've seen how quickly people can become very proficient with even advanced RDBMS like Oracle (by reading awesome blogs, for instance). Furthermore, I've seen the vastly superior solutions you can sometimes come up with when designing database applications after having made that investment.
So while it is valid to claim that the benefits of a technology don't outweigh the investment required to use it properly, it isn't valid to dismiss a technology as being inferior just because you didn't make that investment.
As always I've enabled comments and I'd love to hear your thoughts on Jonathan Holland's article, the debate on Joel's web site, my response, and any of your feelings on this topic. Thanks in advance.
Friday, April 18, 2008
Avoid Deprayments
A deployment is when you deploy a change and verify that it was successfully deployed and functioning properly.
A deprayment is when you "deploy and pray." It's kind of like Dr. Evil in the Austin Powers movie: you don't actually witness that the deployment succeeded, you just walk away and assume everything went according to plan.
Every time you depray something you are picking up the dice and gambling. It's true that sometimes customers request something so quickly and urgently that you feel you don't have the time to prepare a proper verification plan and then to verify your work. But as the domain expert, it's our job to assess the risks and make sure that the work is being done as properly and professionally as possible given each situation.
Verifying Deployments
So how do we verify our deployments?
The first step is certainly to deploy the change on a test server first so that we know what to expect when it comes time to deploy our change to a production system. This is also the best opportunity for the most rigorous testing. Some issues can only surface with a real volume of data and users doing real things simultaneously, so your test instance should be set up appropriately.
Once you've figured out what could go wrong with your deployment on production, create a verification plan for each of those identified failure points. This verification plan should identify:
1. How to test for it.
2. What results you could get from the test, and what each one means.
3. What action to take in each case.
Sample Verification Tests
Here are some ideas you can include in your verification plan to test for various failure points.
1. Check the Oracle alert log files
The Oracle alert log file includes a chronological list of messages and errors including things such as internal errors, block corruption, and deadlock errors. It will also keep track of certain administrative operations such as CREATE. ALTER and DROP.
You can find this log file in the admin/SID/bdump directory (or elsewhere; check your init.ora configuration file), and it's called alertSID.log (where SID is your ORACLE_SID).
Of course, the Oracle alert log file is just one of many log files that can help you. For instance, you can also look at the trace files. There are lots of log files at your disposal, study your Oracle documentation and select the ones that will be most valuable based on the nature of your deployment.
2. TNSPING
Oracle has a utility called tnsping that can be used to determine if a particular Oracle service (such as the database itself) can be reached from a particular client or server, and also how long it took.
Here is an example of how to test your connection to a service named service_name (in tnsnames.ora) 5 times.
TNS Ping Utility for 32-bit Windows: Version 9.2.0.5.0 - Production on 18-APR-20
08 09:42:15
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
C:\ORACLE9\network\admin\sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP) (HOST = 10.72.1.1) (PORT = 1234)) (CONNECT_DATA = (SID = service_name) (GLOBAL_NAME = service_name.domain.com)))
OK (20 msec)
OK (0 msec)
OK (0 msec)
OK (10 msec)
OK (0 msec)
(Yes, I'm still using Oracle 9, stop laughing)
What kind of errors can come back? You might get TNS-03505 if the service name is invalid, or TNS-12541 if that name can't be resolved to a service with a functioning listener.
You may also be interested in the TRCROUTE utility which will give you information about the actual route the test packet took.
For more information about tnsping, check the Net Services Administrator's Guide.
3. Using sqlplus, connect to the database
Don't forget the most obvious test. This surprisingly simple test actually covers a great deal, provided you complete this test using the application's or user's credentials, not sys.
4. Once connected, execute a simple query
Example: SELECT 1 FROM DUAL;
5. Run a standard battery of the obvious tests
For example, if you're deploying a new stored procedure, execute your various test cases. That includes the various failure cases, such as calling it with invalid parameters and/or NULL.
On a production instance you may be somewhat limited to what tests you can reasonably perform. But having flushed it all out on the test instance previously you should know what tests are appropriate.
6. Test using the application itself
Unfortunately I have had experience with deployments where everything worked fine when we did our back-end tests, and we felt that given that success, everything else should work. So we walked away, users came back in, and sure enough there was a show-stopping error when accessing the application using it's interface.
Don't slip up on this. Even though everything "should" work when our back-end tests pass, Murphy's Law insists that something else can (and will) go wrong.
Check all the applications and interfaces out, possibly in conjunction with select users.
7. What else could be affected?
The basic tests I listed are just a start to get you in the right mode of thinking, but there are so many other considerations behind a good verification plan.
Do you have automated scripts that need to be re-tested?
Will this affect your back-up and recovery process?
Did you test any related custom programs or configuration?
Are there any security considerations for which to test?
Have you tested any sizing concerns?
Document Your Results
The documented verification plan you have just created will come in handy in the future, too. That information can be invaluable in troubleshooting whatever issues may arise after the fact.
For example, if something were to fail in the weeks following your deployment, whether it's related to your deployment or not, you can go back and confirm exactly what was tested, how, when and what the results were.
Where to Draw the Line
As the domain experts, we also have some difficult decisions to make about whether a particular verification is advisable. We do that by balancing three things:
1. The likelihood of the failure.
2. The impact of that failure.
3. The cost/complexity of testing for that failure.
In some cases you will examine these three factors and decide against testing for a particular type of failure. In this situation it's important to document your decision and communicate it with all affected parties.
Avoid the Deprayment
Looking at the simplicity and ease of some of the verification tests in one hand, and then looking at the cost and expense of troubleshooting a failed deployment in the other, it makes absolutely no sense to ever "deploy and pray" no matter how urgent the request.
Here is a quick summary of how to avoid the dreaded deprayment:
1. Identify your failure points.
2. Create a verification plan.
3. Review the log files, use tnsping, and connect with sqlplus
4. Test using all interfaces, and with user's credentials
5. Document your results.
As for all my articles, I have enabled comments and I'm appreciative of anyone who can share more tricks on creating verification plans, and stories of how these plans have saved their customers time and money. Thanks!
