From Relational to Hadoop - Migrating your data pipeline

1 April 9:30AM - 12:30PM @ Ballroom F

Experience level: 
3-hour tutorial
This tutorial is for experienced MySQL database professionals who are interested in using Hadoop to scale their ETL processes. We will walk through an entire ETL process and, for each stage, demonstrate how to implement it in Hadoop. The tutorial includes live demonstrations of Hadoop ETL techniques. We will provide demo Hadoop servers in the cloud, loaded with prepared datasets, so that all attendees can run the examples. Attendees will need to bring their own laptops. Knowledge of Hadoop is not required.

We will begin with a quick Hadoop primer that explains Hadoop's architecture and its core components: MapReduce and HDFS. The rest of the tutorial will build on this understanding and show how Hadoop's unique architecture can be applied to optimize ETL processes. We will continue with a discussion of different data ingestion methods and a demonstration of using Sqoop to pull data from MySQL into Hadoop. Once the data is loaded into the system, we will demonstrate the use of Hive for data transformation. We will compare HiveQL (HQL) with the more familiar SQL, and attendees will experiment with running Hive queries and using Hive UDFs. We will also demonstrate the use of Oozie to manage the ETL workflow.

Attendees will leave knowing how to take the first steps in using Hadoop as part of their data processing pipeline, and familiar with some of the popular tools they can use in their implementation.


System Architect, Confluent
Gwen Shapira has 15 years of experience in database engineering, working with Oracle, MySQL and, more recently, Hadoop. Gwen enjoys sharing her knowledge on her blog, on Twitter and by speaking at conferences. She is a co-author of the book Hadoop Application Architectures.
Big Data Consultant/Solutions Architect, Pythian
Danil Zburivsky, Big Data Consultant/Solutions Architect, Pythian. Danil has been working with databases and information systems since his university years, where he received a Master's Degree in Applied Math. Danil has 7 years of experience architecting, building and supporting large, mission-critical data platforms using various flavours of MySQL, Hadoop and MongoDB. He is also the author of the book “Hadoop Cluster Deployment”. Besides databases, Danil is interested in functional programming, machine learning and rock climbing.