MySQL and Impala ecosystem: SQL-friendly Hadoop
Seznam.cz is the largest and most visited web portal and search engine in the Czech Republic. It is one of the few search engines in the world that successfully compete with Google in local full-text search. Besides the search engine, Seznam runs over 40 different web services, such as news portals, a map portal, an email service and many more. Because of this variety of services, we have many projects that need different data warehouses. The data warehouse presented here was designed for the PPC advertising system Sklik.cz, where it serves as an internal BI tool. Historically, the warehouse was implemented as a single MySQL instance holding tens of billions of rows across many tables, and critical analytical queries took up to several hours to run. We had to choose an appropriate open-source solution that would provide query acceleration, nearly 100% SQL compatibility, easy scaling and long-term potential. We selected Cloudera Impala and successfully migrated the critical parts of the original data warehouse to it. Impala is designed to execute queries on data in Hadoop in real time using the well-known SQL standard, and it fits very well into our existing Hadoop and MySQL ecosystem. In this presentation we will introduce Impala to users who haven’t had a chance to meet this distributed BI tool yet (the whole presentation is conceived from a MySQL user’s point of view). We will focus on the architecture, on how Impala runs different types of queries (with a comparison of Impala and MySQL), briefly on database schema design, on how Impala fits into the Hadoop ecosystem, on how to choose the proper data storage (HDFS text files, Parquet, Avro, HBase or Kudu), on types of tuning, on the best ways to import data, and on other topics. At the end we will mention possible competing solutions such as Hive, Shark and Druid.
SW Architect, Seznam.cz
Tomas is a big-data architect and database specialist at the Czech company Seznam.cz. He has over eight years of experience in the design, development and optimization of database systems, with a focus on MySQL, HBase, Impala, Solr and Hive. He organizes MySQL and Hadoop training sessions and workshops for his colleagues at Seznam and externally for other companies and Czech universities.
Sklik.cz Development Leader, Seznam.cz
Lukas is an experienced developer and database specialist at Seznam.cz, the largest Czech web service company. He started programming at the age of 13 and later became very enthusiastic about databases. In his career he has worked mostly with technologies such as MySQL, HBase, Hadoop, MongoDB, Python, C++ and Java. Lukas organizes MySQL training sessions and workshops for his colleagues at Seznam and externally for other companies and Czech universities.