From Relational to Hadoop - Migrating your data pipeline

Tutorials
1 April 9:30AM - 12:30PM @ Ballroom F

Experience level: Intermediate
Duration: 3-hour tutorial
This tutorial is for experienced MySQL database professionals who are interested in using Hadoop as a way to scale ETL processes. We will walk through an entire ETL process and, for each stage, demonstrate how to implement it in Hadoop. The tutorial will include live demonstrations of Hadoop ETL techniques, and we will provide demo Hadoop servers in the cloud so that all attendees can run the examples against prepared datasets. Attendees will need to bring their own laptops. Knowledge of Hadoop is not required.

We will begin with a quick Hadoop primer explaining Hadoop's architecture and its core components: MapReduce and HDFS. The rest of the tutorial will build on this understanding and show how Hadoop's unique architecture can be applied to optimize ETL processes.

The tutorial will continue with a discussion of different data ingestion methods and a demonstration of Sqoop pulling data from MySQL into Hadoop (a sketch of such a command appears below). Once the data is loaded into the system, we will demonstrate the use of Hive for data transformation. We'll compare HQL to the more familiar SQL, and attendees will experiment with running Hive queries and using Hive UDFs. We will also demonstrate the use of Oozie to manage the ETL workflow.

Attendees will leave knowing how to take the first steps in using Hadoop as part of their data processing pipeline, and familiar with some of the popular tools they can use in their implementation.


Speakers

Gwen Shapira
Solutions Architect, Cloudera
Biography: 
Gwen Shapira is a Solutions Architect at Cloudera, where she helps customers build production applications using Hadoop ecosystem components. With 15 years of data warehouse experience, Gwen loves showing customers how open source tools can be used to build a faster and more scalable data warehouse. Gwen shares her experience on her blog (http://prodlife.wordpress.com), on Twitter (@gwenshap), and at conferences.
Danil Zburivsky
Big Data Consultant / Solutions Architect, Pythian
Biography: 
Danil Zburivsky is a Big Data Consultant and Solutions Architect at Pythian. Danil has been working with databases and information systems since his early years in university, where he received a Master's degree in Applied Math. He has 7 years of experience architecting, building, and supporting large mission-critical data platforms using various flavours of MySQL, Hadoop, and MongoDB, and is the author of the book “Hadoop Cluster Deployment”. Besides databases, Danil is interested in functional programming, machine learning, and rock climbing.