Monthly database extract at a large health
company
A client had a monthly database extract that was
running for 16 hours total. It ran in parallel, but one ran 3 hours, one 4 hours and the other for 9
hours. Many people looked into it and could find nothing to help. I looked into the top SQL
statements. Their cpu times were excellent with no room for significant improvement. However,
they were accessing the index and data. They were not index-only. I recommended adding all the
columns the SQL needed to the indexes, to make them index-only. The cpu time went up by 1-2% and disk
space increased accordingly, but the elapsed time was cut in half to 8 hours. The client was very
happy.
Story over? Not quite... What could
possibly make an extract run for 8 hours on a mainframe? Accessing the database
wrong...
This database (multiple tables) is in sequence by
company. They have 100 or so companies involved and many clients for each company. Each client
has a primary salesperson. However, the extract does not run by company. The extract runs by
salesperson. They have 2,000+ salespeople. This means the entire database is being
processed using "skip-sequential" access for each salesperson. They are literally walking thru the
entire database 2,000+ times. If the extract was done by company, they would walk thru the database
once. This client is currently very happy their 16 hours were cut to 8 hours, but is considering
changing their extract. Hopefully I'll be able to add "the rest of the story" someday. I expect
this to run in 10 minutes or less if they change their extract to work by company, the way the database is
designed.
~~~~~
|