Friday, March 10, 2006
Handling Performance Issues
Typically, performance issues are handled by looking for common symptoms and trying common solutions. Given sufficient experience, this often succeeds with minimal effort. But there are times when this does not work and you need a more systematic approach.
Let's take a look at one approach, bearing in mind:
1. This is a rough, first, high-level pass
2. It is NOT Oracle (or even database) specific
3. I am not including the details on the HOW
Also, some people may notice Cary Millsap's influence in this approach.
1. Rank the most significant performance issues, by specific application, from the business user's perspective.
2. Carefully measure the total time currently taken for each of these specific applications.
3. Determine exactly how fast the application would need to run in order to meet the business user's needs.
Now, for each business application in order of importance, perform steps 4-9:
4. Break down the specific application into tasks.
5. Carefully measure how often each task is currently being executed and how long each execution currently takes, and from that, compute how much total time each task currently takes and what % of the total time it represents.
Now, for each specific task in order of total % of time taken, perform steps 6-9:
6. If you COMPLETELY ELIMINATED the time taken by this task and all tasks below it, would the performance goal be met?
YES: Continue with steps 7-9.
NO: Stop until the situation changes, and continue to the next application (steps 4-9).
7. Predict how to reduce time spent for this task, either by:
a) Reducing the number of times this task is being done
OR
b) Reducing the time it takes to execute a task once
Note: If required, recursively perform steps 4-9 by breaking the task down further into sub-tasks.
8. Perform a cost-benefit analysis of your plan in step 7. Is it worthwhile? If so, do it.
9. Have you reached the performance goal of this application?
YES: Proceed to the next application and perform steps 4-9.
NO: Proceed to the next task and repeat steps 6-9.
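As a rough sketch, the bookkeeping in steps 5 and 6 above can be expressed in code. This is illustrative only: the task names and timings below are made-up placeholders, and the actual work of steps 7-8 (predicting a fix and weighing cost against benefit) remains human judgment.

```python
def rank_tasks(tasks):
    """Step 5: given (name, call_count, seconds_per_call) tuples,
    return (name, total_seconds, pct_of_total) sorted by total time,
    largest first."""
    totals = [(name, calls * per_call) for name, calls, per_call in tasks]
    grand_total = sum(t for _, t in totals)
    totals.sort(key=lambda x: x[1], reverse=True)
    return [(name, t, 100.0 * t / grand_total) for name, t in totals]

def goal_reachable(ranked, index, goal_seconds):
    """Step 6: if the task at `index` and every smaller task were
    completely eliminated, could the goal still be met? The time spent
    in the larger tasks already processed is the floor we cannot beat."""
    floor = sum(t for _, t, _ in ranked[:index])
    return floor <= goal_seconds

# Hypothetical measurements for one application:
ranked = rank_tasks([("parse", 100, 0.5),   # 50 s total
                     ("fetch", 10, 2.0),    # 20 s total
                     ("render", 1, 30.0)])  # 30 s total
# ranked = [("parse", 50.0, 50.0), ("render", 30.0, 30.0), ("fetch", 20.0, 20.0)]
```

With a 40-second goal, step 6 passes for the first task (nothing is fixed yet, so the floor is zero) but fails once "parse" at 50 seconds is treated as untouchable, which is exactly the "stop until the situation changes" branch.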
Notes:
1. By doing things in order of business importance
a) the users are most likely to experience results
b) anything we "harm" is, by definition, less important
2. By addressing tasks in order of time spent
a) we get the best results first
b) adverse results will only affect tasks that take less time
3. By defining success up front, and performing a cost-benefit analysis before taking action, we can avoid wasting time by stopping either
a) when the goal is met
b) when the goal is proven to be impossible.
Since this is a rough sketch, I especially invite people's thoughts.
Comments:
I think this methodology misses some important issues:
It presupposes everything is close. This may be reasonable for a 10g installation, may be way off for an older one that has never been looked at. For example, some vendor slapped in a generic db and their app, not doing any capacity planning beyond gross disk size, then threw physical memory at it.
It presupposes the business user's perspective is correct. Personally, I've noted that a person is brought in to solve a particular problem, but the range of problems that are already there is much larger. Users have simply accepted that is how the system is and don't see problems. There must be a technical evaluation independent of the business analysis to see these things.
Not useable for greenfield systems.
A system with a broad range of applications on it may actually have the worst problems further down the list as made by the business concerns. One example I've seen is the batched report that no one cares takes 4 hours to run - then putting the relevant objects in the recycle pool brings it down to minutes and severely increases performance for everything else.
That's why as an experienced person I take some time to just poke around and see what strikes me as "wrong" before doing any formal analysis. Usually what happens is I give some recommendations, people kind of go "uh huh" and ignore it, then I fix the actual problem they got me for, then they realize I actually do know what I'm talking about, then they keep me around for several years after the 3 month contract.
Perhaps the difference is between sites that are sorta well run and sites that aren't. Easy pickin's in the latter.
This is very close to the approach I try to teach here.
Joel has a good point, there is no exact guide to follow, just a list of good practices and procedures you evolve per application.
One thing I do here, which is a slight departure from your thoughts but which Joel pointed out, is that I first do a very detailed technical analysis of the database: waits, slow SQL, and other factors that impact database performance. Then I have a face-to-face with the users and try to relate their experience to the technical issues. That assumes you have a good understanding of the business application; if you do not, then it is a large amount of work, but you learn the application.
Joel and Herod,
Thanks for your comments. You raise an important point.
When I was referring to the "common symptoms" you should look for and the "common solutions" you should attempt first, I meant to include efforts that may include looking at waits, or slow SQL, etc. So we're in agreement there.
I also agree that sometimes reducing the resources used by something no one cares about can actually speed up tasks people do care about. So it is a good practice to tune "unimportant" things, but at the same time, if it ain't broke don't fix it. Plus, to get the best bang for your buck, it's generally best to focus on the things that matter most to the end users.
Cheers,
Robert
Depends on your definition of "broke." :-)
Saw one of those $800K custom motorhomes on the freeway today, with something flapping underneath. I'm sure the people inside had no clue. It wouldn't be good for them to get a clue when it breaks off and jams up the third axle.
While I acknowledge Compulsive Tuning Disorder, I don't think that applies to just looking at everything that is going on, because you just can't tell with any methodology I've seen all the things that could go/are going wrong. Preventative Maintenance is not a bad thing. I think it should be a formalized DBA duty.
Getting back to the OP, Craig Shallahammer has four interesting points in his Modern Performance Myths paper at orapub.