Credit card transactions at a large regional bank
One
of my key concepts in database design is that a row of data gets inserted once and deleted once, but can be
accessed trillions of times or more in between. Because of this, databases should be designed for the
best performance of the applications that use them, not for ease of inserting or deleting data. This
bank broke that rule. They partitioned their tables of Mastercard and Visa transactions on transaction
date, so they could load a partition every day (more or less) and delete one every day. But how is this
information used? When you go online to look at your transactions, you know your account number, and transactions
are listed with the newest ones first. If you use your card a lot, you could have transactions from all
30 or so days in the month. That means that, to build your list, one or more rows must be retrieved from
30 different partitions. Had the data instead been clustered by account number and then transaction date in
descending order, all of the rows would have sat together in one or two pages, retrieved at the same time
with no sort performed. That's easily 100 times less work.
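To make that page-count argument concrete, here is a small Python sketch with synthetic data. The
page size (50 rows) and account counts are assumptions for illustration; the point is only to compare
how many partitions a date-partitioned layout touches for one account's statement versus how many
pages an account-clustered layout touches:

```python
import random
from datetime import date, timedelta

random.seed(0)
PAGE_ROWS = 50   # hypothetical rows per data page
DAYS = 30

# Synthetic month of transactions for 100 accounts, 1-3 per account per day.
txns = []
for day in range(DAYS):
    d = date(2024, 1, 1) + timedelta(days=day)
    for acct in range(1, 101):
        for _ in range(random.randint(1, 3)):
            txns.append((acct, d))

target = 42  # the account whose statement we are building

# Layout 1: partitioned by transaction date -> one account's rows are
# scattered across a separate partition for every active day.
by_date_partitions = {d for (acct, d) in txns if acct == target}

# Layout 2: clustered by (account, date descending) -> the same rows sit
# contiguously and already in display order; pages touched is just the
# row count divided by rows per page, and no sort is needed.
rows = sorted((d for (acct, d) in txns if acct == target), reverse=True)
pages_clustered = -(-len(rows) // PAGE_ROWS)   # ceiling division

print(f"date-partitioned layout: touches {len(by_date_partitions)} partitions")
print(f"account-clustered layout: touches {pages_clustered} page(s), pre-sorted")
```

With the numbers above, the date-partitioned layout touches all 30 daily partitions, while the
clustered layout reads one or two contiguous pages and skips the sort entirely.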
But how do you insert and delete data efficiently? These tables should be partitioned by ranges of
account number, and all indexes on the tables need to be partitioned the same way. The
transactions to be loaded could be split into multiple files using the same account-number ranges,
with concurrent online loads performed for each partition. Or you could rely on DB2's built-in parallel
processing and load one large file. Or somewhere in between, using ranges of partitions. These tables
must also be reorganized periodically to reduce fragmentation, so the deletes can be done as part of the
reorganizations. A simple clause to discard any row where the current date minus the transaction date
exceeds a certain number of days can be added to the online reorganizations. These reorganizations will
also run concurrently by partition, or by range of partitions, with no interference or conflicts.
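The load-splitting step can be sketched in Python. The account-number boundaries, file shapes, and
row layout here are purely illustrative assumptions, not the bank's actual scheme; the idea is just
that each row is routed to the file for its partition's account range so the loads can run concurrently:

```python
import bisect

# Hypothetical inclusive upper bounds for partitions 0-2; everything
# above the last boundary falls into partition 3.
boundaries = [25000, 50000, 75000]

def partition_for(acct_num: int) -> int:
    """Return the 0-based partition whose account range holds this account."""
    return bisect.bisect_left(boundaries, acct_num)

# One big extract split into per-partition row lists, ready to be written
# to separate files and loaded concurrently, one load per partition.
extract = [(12000, "2024-01-15", 19.99),
           (51234, "2024-01-15", 250.00),
           (99999, "2024-01-15", 7.50)]

files = {}
for row in extract:
    files.setdefault(partition_for(row[0]), []).append(row)

for part, rows_out in sorted(files.items()):
    print(f"partition {part}: {len(rows_out)} row(s)")
```

Loading fewer, larger files over ranges of partitions is the same routine with coarser boundaries.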
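The age-based discard predicate used during reorganization can be mirrored in a few lines of Python.
The 730-day retention window and the sample dates are assumptions for illustration only:

```python
from datetime import date

RETENTION_DAYS = 730   # hypothetical retention window, about two years

def keep(txn_date: date, today: date) -> bool:
    """Mirror of the reorg discard rule: a row is dropped once the current
    date minus the transaction date exceeds the retention window."""
    return (today - txn_date).days <= RETENTION_DAYS

today = date(2024, 6, 1)
partition_rows = [date(2024, 5, 20), date(2022, 1, 1), date(2023, 6, 1)]

kept = [d for d in partition_rows if keep(d, today)]
print(f"kept {len(kept)} of {len(partition_rows)} rows")
```

Because each partition holds a disjoint account range, this filter can run inside every partition's
reorganization at once without touching any other partition's rows.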