Lin is a graduate student in the Computer Science Department at Carnegie Mellon University, where he is advised by Andy Pavlo. His research interests include database systems, data management, and machine learning.
He obtained his Bachelor's degree from Peking University, majoring in Computer Science and Technology. At PKU, he worked on data and information management with Prof. Bin Cui.
Database management systems (DBMSs) are notoriously difficult to deploy and administer because of their long list of functionalities. If a system could optimize itself automatically, it would remove many of the complications and costs involved in its deployment. Most of the advisory tools built by researchers and vendors are incomplete because they require humans to make the final decisions about any database change and only fix problems after they occur. Recent work has proposed "self-driving" DBMSs that optimize the system for both the application's current workload and its expected future workload. These systems will support existing tuning techniques and capacity planning without requiring a human to determine the right way and proper time to deploy them.
The first step towards such an autonomous DBMS is the ability to model and predict the target application's workload. In this talk, I present a robust forecasting framework called "QueryBot 5000" that we designed for self-driving operations. The framework integrates with any DBMS to predict the expected arrival rate of queries in the future based on historical data. It provides multiple prediction horizons (short- vs. long-term) with varying aggregation intervals. I also discuss our vision and progress on how a self-driving DBMS uses these forecast models to optimize its performance.
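
To make the idea of arrival-rate forecasting concrete, here is a minimal Python sketch. It is not the QueryBot 5000 implementation. It assumes query arrivals are available as integer timestamps in seconds, and it swaps in a deliberately simple seasonal-average predictor in place of the framework's actual models. The function names `arrival_rates` and `seasonal_forecast`, and the toy trace, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def arrival_rates(timestamps, interval, start, end):
    """Count query arrivals per fixed-width interval over [start, end)."""
    n_bins = (end - start) // interval
    counts = Counter(
        (t - start) // interval for t in timestamps if start <= t < end
    )
    return [counts.get(i, 0) for i in range(n_bins)]


def seasonal_forecast(series, horizon, period):
    """Predict each future bin as the mean of past bins at the same phase.

    This is a stand-in baseline (an assumption for this sketch), not the
    forecasting model described in the talk.
    """
    if len(series) < period:
        raise ValueError("need at least one full period of history")
    forecasts = []
    for h in range(horizon):
        phase = (len(series) + h) % period
        history = series[phase::period]  # past values at the same phase
        forecasts.append(sum(history) / len(history))
    return forecasts


# Toy trace: a quiet 10-second window alternating with a busy one.
trace = [0, 10, 11, 12, 20, 30, 31, 32]

# Fine-grained aggregation for a short horizon.
fine = arrival_rates(trace, interval=10, start=0, end=40)    # [1, 3, 1, 3]
short_term = seasonal_forecast(fine, horizon=3, period=2)    # [1.0, 3.0, 1.0]

# Coarser aggregation of the same trace for a longer horizon.
coarse = arrival_rates(trace, interval=20, start=0, end=40)  # [4, 4]
long_term = seasonal_forecast(coarse, horizon=2, period=1)   # [4.0, 4.0]
```

The same trace is aggregated at two interval widths, which mirrors the trade-off in the abstract: a fine interval preserves short-lived spikes for near-term predictions, while a coarse interval smooths the series for longer horizons.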