Increasing MySQL Performance through Aggressive Data Archiving
Many strategies exist for increasing the performance of a database system. Some attempt to make each access to the data faster: for example, faster hardware may be used, or system settings may be tweaked. Other approaches attempt to decrease the total number of database operations: for example, by caching database queries or entire pages. Nevertheless, generally, performance decreases as the sytem grows larger, and as a result, access times increase; after at certain point, it may be difficult to offset this increase with additional hardware.
It is also true, however, that "not all data is created equal" - some data is far more relevant than others. Although MySQL and other RDBMses attempt to give priority to frequently accessed data, it can be said that MySQL performance still decreases with table size - older, infrequently accessed data can slow down systems significantly. Archiving this data can be a powerful and important approach to speeding up database systems - by decreasing the size of the working data set, nearly all operations become faster.
This talk will outline the use of pt-archiver to archive database tables; the design of simple archiving scripts in an application's host langauge, such as Ruby or Python, will also be briefly covered. A number of archiving strategies, from very simple to relatively complex, will be covered; for example, an enterprise application may decide to archive records after a certain period of activity. Some types of child records (i.e. unaccepted bids) may be deleted or moved to "deep storage", whereas other types of child records (accepted bids) may be moved to a "staging area" - ready to be unarchived transparently when requested by a user. With some thought, it is nearly always possible to develop an archiving scheme that balances business needs and the desire for performance.
Finally, some consideration will be given to the various options for storing the archived data, including archive tables, flat files (CSV, XML, SQL), and document stores (MongoDB, et al.) Each have different advantages and disadvantages, and choosing an appropriate approach can be very important.