Sales transactions at a large department store
We had a sales history database with tables for
sales totals for each item at each store by day and week. The tables were in sequence by day/week, item and
then store. We processed individual transactions, summarized them by day, week, item and store, then updated
the database by doing either inserts or updates, depending upon whether the row already existed (do an update
first; the row usually already exists; if it fails, do an insert). By accident, one of the programmers
realized the files we were processing were sorted by day, store and then item, instead of day, item and then
store. A quick review of the program showed that it worked the same with either sequence (it looked for
changes in either store or item to trigger updating the database; it didn't care which changed.) This process
ran for approximately 55 minutes every night. We simply changed the sort to the correct sequence and the
runtime was cut to 15 minutes per night.
Why? Because the input sequence was store, then
item, we were basically doing "skip-sequential" processing through the entire tables once per store (we had
100 stores). Switching the sequence meant we walked thru the tables one time, which dramatically increased
the amount of updates/inserts done in the bufferpool, instead of requiring disk I/O. The cpu time did not
change much, but the elapse time was cut by 73%.