RAW: Fast queries on JIT databases
Integrating and ingesting data into databases is quickly becoming a bottleneck in the face of massive data volumes and increasingly heterogeneous data formats. Queries, on the other hand, are often ad hoc, yet they are served by pre-cooked operators that are not adaptive enough to optimize access to data. As data formats and queries increasingly vary, there is a need to depart from the status quo of static query processing primitives and build dynamic, fully adaptive architectures. Data virtualization, i.e., abstracting data out of its form and manipulating it regardless of the way it is stored or structured, is a promising step in the right direction. To offer unconditional data virtualization, however, a database engine must abandon static components such as data pre-loading and "pre-cooked" query operators. At the same time, users must be able to express data analysis processes in a query language of their own choice. I will present RAW, a query engine which reads data in its raw format and processes queries using adaptive, just-in-time operators. The key insight is the use of virtualization and the dynamic generation of operators. RAW's query engine is generated just-in-time; its caches and its query operators adapt to the current query and the workload, and it treats raw datasets as its native storage structures. Finally, RAW features a language that looks like SQL but is extended to support heterogeneous data models, and to which existing languages can be translated. I will demonstrate some innovative language features and show how a few lines of SQL can accomplish tasks that would normally require multiple systems and heavy scripting.
Professor and CEO, EPFL and RAW Labs
Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland. Her research interests are in data-intensive systems and applications, and in particular (a) in strengthening the interaction between the database software and emerging hardware and I/O devices, and (b) in automating data management to support computationally demanding, data-intensive scientific applications. She has received an ERC Consolidator Award (2013), a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), eight best-paper awards in database, storage, and computer architecture conferences (2001-2012), and an NSF CAREER award (2002). She received her Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She is the vice chair of the ACM SIGMOD community, a senior member of the IEEE, and an ACM Fellow. She is a member of the Global Agenda Council for Data, Society and Development of the World Economic Forum.