Building a Data Warehouse with Hadoop and MySQL
Trends in Architecture and Design
50-minute conference session
Hadoop is gaining popularity as an integration point for the various data sources inside a company. Its scalability and batch-oriented architecture also make it a strong candidate for a data warehousing solution. In this session we will review a practical use case for building a Hadoop data warehouse with MySQL as one of the data sources.

There are many challenges in building a robust data platform on Hadoop. First of all, you need to decide on the optimal combination of Hadoop components: the SQL engine, file formats, job coordination, and so on. We will review how Hive, Azkaban, and various column-oriented file formats fit into a data warehouse model.

Another problem with using Hadoop as a data warehouse platform is its write-once nature. This presents an issue for volatile datasets and breaks the idea of "slowly changing dimensions". We will review possible solutions to this problem, including data versioning and pushing additional data pre-processing down to MySQL. Additionally, this session will address importing and exporting data between MySQL and Hadoop.
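To make the data-versioning idea concrete: since files on HDFS cannot be updated in place, each load of a volatile dimension can write a complete new snapshot under a dated partition-style directory, and readers resolve the latest one. The sketch below is plain Python simulating that layout on the local filesystem, not actual Hadoop code; the `snapshot=YYYY-MM-DD` directory naming mirrors Hive-style partitioning, and all names here are illustrative assumptions, not part of the session material.

```python
import os
import csv
import tempfile

def write_snapshot(base_dir, snapshot_id, rows):
    """Write a full, immutable copy of the dimension as a new snapshot.

    Existing snapshots are never modified, which matches the
    write-once semantics of files on HDFS.
    """
    snap_dir = os.path.join(base_dir, f"snapshot={snapshot_id}")
    os.makedirs(snap_dir)  # fails if the snapshot already exists: no rewrites
    with open(os.path.join(snap_dir, "part-00000.csv"), "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_latest(base_dir):
    """Readers always resolve the most recent snapshot directory."""
    snapshots = sorted(d for d in os.listdir(base_dir) if d.startswith("snapshot="))
    latest = os.path.join(base_dir, snapshots[-1], "part-00000.csv")
    with open(latest, newline="") as f:
        return list(csv.reader(f))

base = tempfile.mkdtemp()
# Day 1: initial load of a hypothetical "customers" dimension.
write_snapshot(base, "2013-11-01", [["1", "Alice", "Kyiv"]])
# Day 2: the row changed; instead of an in-place update, load a new snapshot.
write_snapshot(base, "2013-11-02", [["1", "Alice", "Donetsk"]])

print(read_latest(base))  # current state comes from the newest snapshot
```

The cost of this approach is storage (every snapshot is a full copy), which is why the abstract also mentions pushing some pre-processing down to MySQL, where in-place updates are cheap.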
Consultant / MySQL DBA, Pythian
Originally from Ukraine, Danil has been working with various RDBMSs since his early years at university. He holds a Master's degree in Applied Mathematics from Donetsk National University in Ukraine, where he also worked as a database developer on an accounting system. Danil got his first taste of MySQL while working as a DBA for Sonopia Corp, where he participated in the database design and support for a social mobile network application. In his current role as Team Lead, he helps people all over the world solve problems with their MySQL systems at Pythian (www.pythian.com), a leading global database and application infrastructure services company.