Building a Data Warehouse with Hadoop and MySQL

Trends in Architecture and Design
Experience level: 
50-minute conference session
Hadoop is gaining popularity as an integration point for various data sources inside a company. Its scalability and batch-oriented architecture also make it a strong candidate for a data warehousing solution. In this session we will review a practical use case for building a Hadoop data warehouse with MySQL as one of the data sources. Building a robust data platform on Hadoop poses many challenges. First, you need to decide on the optimal combination of Hadoop components, such as the SQL engine, file formats and job coordination. We will review how Hive, Azkaban and various column-oriented file formats fit into a data warehouse model. Another problem with using Hadoop as a data warehouse platform is its write-once nature. This presents an issue for volatile datasets and breaks the idea of "slowly changing dimensions". We will review possible solutions to this problem, including data versioning and pushing additional data pre-processing to MySQL. Additionally, this session will address importing and exporting data between MySQL and Hadoop.
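To illustrate the data-versioning idea in general terms, here is a minimal sketch. It assumes that each load from MySQL is appended as a new batch tagged with an increasing batch number, rather than updating existing rows in place. Plain Python lists stand in for files in HDFS or Hive tables. The `Row` type and the function names are hypothetical and are not the approach presented in the session. Both the "current" view and a slowly-changing-dimension (Type 2 style) history are derived from the append-only data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout: a plain list of Row objects stands in for
# append-only files in HDFS / a Hive table.
@dataclass(frozen=True)
class Row:
    key: int      # business key, e.g. a primary key exported from MySQL
    version: int  # batch number of the load that produced this row
    attrs: tuple  # attribute values as captured in that batch

def append_batch(store: List[Row], batch: Iterable[Row]) -> List[Row]:
    """Write-once: existing rows are never modified; a new list is returned."""
    return store + list(batch)

def current_view(store: Iterable[Row]) -> Dict[int, Row]:
    """Latest version of each key, i.e. the 'current' state of the dimension."""
    latest: Dict[int, Row] = {}
    for row in store:
        prev = latest.get(row.key)
        if prev is None or row.version > prev.version:
            latest[row.key] = row
    return latest

def scd2_history(store: Iterable[Row]) -> List[Tuple[int, tuple, int, Optional[int]]]:
    """(key, attrs, valid_from, valid_to) rows; valid_to is None for the current row."""
    by_key: Dict[int, List[Row]] = {}
    for row in store:
        by_key.setdefault(row.key, []).append(row)
    history = []
    for key in sorted(by_key):
        rows = sorted(by_key[key], key=lambda r: r.version)
        # Keep only versions where the attributes actually changed,
        # so repeated full snapshots do not create duplicate history rows.
        kept: List[Row] = []
        for r in rows:
            if not kept or kept[-1].attrs != r.attrs:
                kept.append(r)
        for i, r in enumerate(kept):
            valid_to = kept[i + 1].version if i + 1 < len(kept) else None
            history.append((key, r.attrs, r.version, valid_to))
    return history
```

For example, loading two snapshots in which only key 1 changes city yields two history rows for key 1 and a single open-ended row for key 2. The trade-off of this design is that storage grows with every load and readers must resolve the latest version themselves. Pushing change detection into MySQL before export, the other option mentioned above, would shrink each batch.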


Big Data Consultant / Solutions Architect, Pythian
Danil Zburivsky, Big Data Consultant/Solutions Architect, Pythian. Danil has been working with databases and information systems since his early years at university, where he received a Master's Degree in Applied Math. Danil has 7 years of experience architecting, building and supporting large mission-critical data platforms using various flavours of MySQL, Hadoop and MongoDB. He is also the author of the book “Hadoop Cluster Deployment”. Besides databases, Danil is interested in functional programming, machine learning and rock climbing.