Knowing the Unknowable: Per-Query Metrics

2 April 4:50PM - 5:40PM @ Ballroom F

50-minute conference session
What if you'd like to know something about your SQL statements, but it isn't possible to measure it? Consider the relationship between a statement's execution and work it causes that can't be measured inside the server. In this talk, Baron will demonstrate how to infer these relationships with statistical techniques. This lets you avoid measurements that are intrusive or expensive, and learn things you can't measure at all. Performance matters here: running a multiple linear regression over a million samples of 500,000 independent variables is not feasible, and wouldn't produce good results even if it ran to completion, so a much faster algorithm is necessary. Baron will cover the statistical background for the algorithm, the classic solutions (including standard regression and its variations) and why they work poorly, and exactly how the technique works. This session includes some math. A white paper, supporting source code, and sample data files will be available so you can reproduce and study the technique and its results.
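For reference, here is a minimal sketch of the classic standard-regression baseline that the abstract says breaks down at scale. It is not Baron's technique; the abstract doesn't describe that. The setup is an assumption for illustration: each sample is a one-second interval, each independent variable is the number of executions of one query type during that interval, and the dependent variable is total server CPU time. Ordinary least squares then estimates per-query CPU cost without timing any individual query. The helper names (`least_squares`, `dense_matrix_bytes`) and the cost figures in `TRUE_COST` are hypothetical.

```python
import random


def dense_matrix_bytes(rows, cols, bytes_per_value=8):
    """Memory for a dense matrix of float64 values (8 bytes each)."""
    return rows * cols * bytes_per_value


def least_squares(X, y):
    """Ordinary least squares: solve (X^T X) b = X^T y by Gaussian
    elimination with partial pivoting. Fine for a handful of columns,
    hopeless for 500,000 of them."""
    n = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (a[i][n] - sum(a[i][j] * b[j] for j in range(i + 1, n))) / a[i][i]
    return b


# Hypothetical workload: hidden per-execution CPU cost (ms) of three query types.
TRUE_COST = [2.0, 0.5, 5.0]

rng = random.Random(42)
X, y = [], []
for _ in range(200):  # 200 hypothetical one-second sampling intervals
    counts = [rng.randint(0, 50) for _ in TRUE_COST]  # executions per query type
    # Only the server-wide CPU total is "measured"; per-query cost is not.
    total = sum(c * k for c, k in zip(counts, TRUE_COST)) + rng.gauss(0, 1)
    X.append(counts)
    y.append(total)

estimated = least_squares(X, y)
print([round(e, 2) for e in estimated])

# Scaling the same approach to 1,000,000 samples x 500,000 variables:
print(dense_matrix_bytes(1_000_000, 500_000) / 1e12, "TB for X")
print(dense_matrix_bytes(500_000, 500_000) / 1e12, "TB for X^T X")
```

With three query types the fit is instant and the estimates land close to `TRUE_COST`. The last two lines show the arithmetic behind the feasibility claim: at 8 bytes per float64 value, a dense 1,000,000 × 500,000 design matrix needs 4 TB, and the 500,000 × 500,000 normal-equation matrix alone needs 2 TB before any solving begins.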