Friday, April 18, 2008
Avoid Deprayments
There are deployments and there are deprayments. What's the difference?
A deployment is when you deploy a change and verify that it was successfully deployed and functioning properly.
A deprayment is when you "deploy and pray." It's kind of like Dr. Evil in the Austin Powers movie: you don't actually witness that the deployment succeeded, you just walk away and assume everything went according to plan.
Every time you depray something you are picking up the dice and gambling. It's true that sometimes customers request something so quickly and urgently that you feel you don't have the time to prepare a proper verification plan and then to verify your work. But as the domain expert, it's our job to assess the risks and make sure that the work is being done as properly and professionally as possible given each situation.
Verifying Deployments
So how do we verify our deployments?
The first step is certainly to deploy the change on a test server first so that we know what to expect when it comes time to deploy our change to a production system. This is also the best opportunity for the most rigorous testing. Some issues can only surface with a real volume of data and users doing real things simultaneously, so your test instance should be set up appropriately.
Once you've figured out what could go wrong with your deployment on production, create a verification plan for each of those identified failure points. This verification plan should identify:
1. How to test for it.
2. What results you could get from the test, and what each one means.
3. What action to take in each case.
Sample Verification Tests
Here are some ideas you can include in your verification plan to test for various failure points.
1. Check the Oracle alert log files
The Oracle alert log file includes a chronological list of messages and errors including things such as internal errors, block corruption, and deadlock errors. It will also keep track of certain administrative operations such as CREATE. ALTER and DROP.
You can find this log file in the admin/SID/bdump directory (or elsewhere; check your init.ora configuration file), and it's called alertSID.log (where SID is your ORACLE_SID).
Of course, the Oracle alert log file is just one of many log files that can help you. For instance, you can also look at the trace files. There are lots of log files at your disposal, study your Oracle documentation and select the ones that will be most valuable based on the nature of your deployment.
2. TNSPING
Oracle has a utility called tnsping that can be used to determine if a particular Oracle service (such as the database itself) can be reached from a particular client or server, and also how long it took.
Here is an example of how to test your connection to a service named service_name (in tnsnames.ora) 5 times.
TNS Ping Utility for 32-bit Windows: Version 9.2.0.5.0 - Production on 18-APR-20
08 09:42:15
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
C:\ORACLE9\network\admin\sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP) (HOST = 10.72.1.1) (PORT = 1234)) (CONNECT_DATA = (SID = service_name) (GLOBAL_NAME = service_name.domain.com)))
OK (20 msec)
OK (0 msec)
OK (0 msec)
OK (10 msec)
OK (0 msec)
(Yes, I'm still using Oracle 9, stop laughing)
What kind of errors can come back? You might get TNS-03505 if the service name is invalid, or TNS-12541 if that name can't be resolved to a service with a functioning listener.
You may also be interested in the TRCROUTE utility which will give you information about the actual route the test packet took.
For more information about tnsping, check the Net Services Administrator's Guide.
3. Using sqlplus, connect to the database
Don't forget the most obvious test. This surprisingly simple test actually covers a great deal, provided you complete this test using the application's or user's credentials, not sys.
4. Once connected, execute a simple query
Example: SELECT 1 FROM DUAL;
5. Run a standard battery of the obvious tests
For example, if you're deploying a new stored procedure, execute your various test cases. That includes the various failure cases, such as calling it with invalid parameters and/or NULL.
On a production instance you may be somewhat limited to what tests you can reasonably perform. But having flushed it all out on the test instance previously you should know what tests are appropriate.
6. Test using the application itself
Unfortunately I have had experience with deployments where everything worked fine when we did our back-end tests, and we felt that given that success, everything else should work. So we walked away, users came back in, and sure enough there was a show-stopping error when accessing the application using it's interface.
Don't slip up on this. Even though everything "should" work when our back-end tests pass, Murphy's Law insists that something else can (and will) go wrong.
Check all the applications and interfaces out, possibly in conjunction with select users.
7. What else could be affected?
The basic tests I listed are just a start to get you in the right mode of thinking, but there are so many other considerations behind a good verification plan.
Do you have automated scripts that need to be re-tested?
Will this affect your back-up and recovery process?
Did you test any related custom programs or configuration?
Are there any security considerations for which to test?
Have you tested any sizing concerns?
Document Your Results
The documented verification plan you have just created will come in handy in the future, too. That information can be invaluable in troubleshooting whatever issues may arise after the fact.
For example, if something were to fail in the weeks following your deployment, whether it's related to your deployment or not, you can go back and confirm exactly what was tested, how, when and what the results were.
Where to Draw the Line
As the domain experts, we also have some difficult decisions to make about whether a particular verification is advisable. We do that by balancing three things:
1. The likelihood of the failure.
2. The impact of that failure.
3. The cost/complexity of testing for that failure.
In some cases you will examine these three factors and decide against testing for a particular type of failure. In this situation it's important to document your decision and communicate it with all affected parties.
Avoid the Deprayment
Looking at the simplicity and ease of some of the verification tests in one hand, and then looking at the cost and expense of troubleshooting a failed deployment in the other, it makes absolutely no sense to ever "deploy and pray" no matter how urgent the request.
Here is a quick summary of how to avoid the dreaded deprayment:
1. Identify your failure points.
2. Create a verification plan.
3. Review the log files, use tnsping, and connect with sqlplus
4. Test using all interfaces, and with user's credentials
5. Document your results.
As for all my articles, I have enabled comments and I'm appreciative of anyone who can share more tricks on creating verification plans, and stories of how these plans have saved their customers time and money. Thanks!
A deployment is when you deploy a change and verify that it was successfully deployed and functioning properly.
A deprayment is when you "deploy and pray." It's kind of like Dr. Evil in the Austin Powers movie: you don't actually witness that the deployment succeeded, you just walk away and assume everything went according to plan.
Every time you depray something you are picking up the dice and gambling. It's true that sometimes customers request something so quickly and urgently that you feel you don't have the time to prepare a proper verification plan and then to verify your work. But as the domain expert, it's our job to assess the risks and make sure that the work is being done as properly and professionally as possible given each situation.
Verifying Deployments
So how do we verify our deployments?
The first step is certainly to deploy the change on a test server first so that we know what to expect when it comes time to deploy our change to a production system. This is also the best opportunity for the most rigorous testing. Some issues can only surface with a real volume of data and users doing real things simultaneously, so your test instance should be set up appropriately.
Once you've figured out what could go wrong with your deployment on production, create a verification plan for each of those identified failure points. This verification plan should identify:
1. How to test for it.
2. What results you could get from the test, and what each one means.
3. What action to take in each case.
Sample Verification Tests
Here are some ideas you can include in your verification plan to test for various failure points.
1. Check the Oracle alert log files
The Oracle alert log file includes a chronological list of messages and errors including things such as internal errors, block corruption, and deadlock errors. It will also keep track of certain administrative operations such as CREATE. ALTER and DROP.
You can find this log file in the admin/SID/bdump directory (or elsewhere; check your init.ora configuration file), and it's called alertSID.log (where SID is your ORACLE_SID).
Of course, the Oracle alert log file is just one of many log files that can help you. For instance, you can also look at the trace files. There are lots of log files at your disposal, study your Oracle documentation and select the ones that will be most valuable based on the nature of your deployment.
2. TNSPING
Oracle has a utility called tnsping that can be used to determine if a particular Oracle service (such as the database itself) can be reached from a particular client or server, and also how long it took.
Here is an example of how to test your connection to a service named service_name (in tnsnames.ora) 5 times.
TNS Ping Utility for 32-bit Windows: Version 9.2.0.5.0 - Production on 18-APR-20
08 09:42:15
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
C:\ORACLE9\network\admin\sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP) (HOST = 10.72.1.1) (PORT = 1234)) (CONNECT_DATA = (SID = service_name) (GLOBAL_NAME = service_name.domain.com)))
OK (20 msec)
OK (0 msec)
OK (0 msec)
OK (10 msec)
OK (0 msec)
(Yes, I'm still using Oracle 9, stop laughing)
What kind of errors can come back? You might get TNS-03505 if the service name is invalid, or TNS-12541 if that name can't be resolved to a service with a functioning listener.
You may also be interested in the TRCROUTE utility which will give you information about the actual route the test packet took.
For more information about tnsping, check the Net Services Administrator's Guide.
3. Using sqlplus, connect to the database
Don't forget the most obvious test. This surprisingly simple test actually covers a great deal, provided you complete this test using the application's or user's credentials, not sys.
4. Once connected, execute a simple query
Example: SELECT 1 FROM DUAL;
5. Run a standard battery of the obvious tests
For example, if you're deploying a new stored procedure, execute your various test cases. That includes the various failure cases, such as calling it with invalid parameters and/or NULL.
On a production instance you may be somewhat limited to what tests you can reasonably perform. But having flushed it all out on the test instance previously you should know what tests are appropriate.
6. Test using the application itself
Unfortunately I have had experience with deployments where everything worked fine when we did our back-end tests, and we felt that given that success, everything else should work. So we walked away, users came back in, and sure enough there was a show-stopping error when accessing the application using it's interface.
Don't slip up on this. Even though everything "should" work when our back-end tests pass, Murphy's Law insists that something else can (and will) go wrong.
Check all the applications and interfaces out, possibly in conjunction with select users.
7. What else could be affected?
The basic tests I listed are just a start to get you in the right mode of thinking, but there are so many other considerations behind a good verification plan.
Do you have automated scripts that need to be re-tested?
Will this affect your back-up and recovery process?
Did you test any related custom programs or configuration?
Are there any security considerations for which to test?
Have you tested any sizing concerns?
Document Your Results
The documented verification plan you have just created will come in handy in the future, too. That information can be invaluable in troubleshooting whatever issues may arise after the fact.
For example, if something were to fail in the weeks following your deployment, whether it's related to your deployment or not, you can go back and confirm exactly what was tested, how, when and what the results were.
Where to Draw the Line
As the domain experts, we also have some difficult decisions to make about whether a particular verification is advisable. We do that by balancing three things:
1. The likelihood of the failure.
2. The impact of that failure.
3. The cost/complexity of testing for that failure.
In some cases you will examine these three factors and decide against testing for a particular type of failure. In this situation it's important to document your decision and communicate it with all affected parties.
Avoid the Deprayment
Looking at the simplicity and ease of some of the verification tests in one hand, and then looking at the cost and expense of troubleshooting a failed deployment in the other, it makes absolutely no sense to ever "deploy and pray" no matter how urgent the request.
Here is a quick summary of how to avoid the dreaded deprayment:
1. Identify your failure points.
2. Create a verification plan.
3. Review the log files, use tnsping, and connect with sqlplus
4. Test using all interfaces, and with user's credentials
5. Document your results.
As for all my articles, I have enabled comments and I'm appreciative of anyone who can share more tricks on creating verification plans, and stories of how these plans have saved their customers time and money. Thanks!
Comments:
<< Home
Thanks for this article. Occasionally I have to assume DBA duties and I have never liked performing deployments. I know Oracle PL/SQL, but I am not a trained DBA. There are just too many things that can go wrong.
Post a Comment
<< Home