From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication

3 April 1:00PM - 1:50PM @ Ballroom D

Experience level: 
50 minutes conference
Getting data into Hadoop is not difficult, but it becomes complex if you want to load 'live' or semi-live data into your Hadoop cluster from your MySQL databases. There are plenty of solutions available, from manually dumping and loading to the good and bad sides of using a tool like Sqoop. Neither is easy, and both are prone to lag between the moment you perform the dump and the moment the data is loaded into Hadoop. Replicating into Hadoop with Tungsten Replicator enables you to stream replication data from your MySQL servers straight into Hadoop. The Hadoop applier uses the replication service built into Tungsten Replicator and supports all of its topology and reliability features. This session will include a look at the existing methods of loading data into Hadoop, an examination of how the Hadoop replicator works, and a live demo of replicating data from MySQL into Hadoop.

+ Traditional Loading Methods
+ Sqoop: Your Data Loading Frenemy
+ Replicating from MySQL
+ How the Hadoop Replicator Works
+ Live Demo of Replication


Director of Documentation, Continuent
A professional writer for over 15 years, Martin 'MC' Brown is the author of, and contributor to, over 26 books covering an array of topics, including programming, system management and web technologies. His expertise spans myriad development languages and platforms – Perl, Python, Java, JavaScript, C, C++, shell scripting, Windows, Solaris, Linux, BeOS, Mac OS and more. The combination has resulted in expertise in web programming, systems management and integration, and XML and DocBook technologies for writing and publishing documentation. He is a former LAMP Technologies Editor for LinuxWorld magazine and a regular contributor to LinuxPlanet, ComputerWorld and IBM developerWorks. As a Subject Matter Expert for Microsoft he provided technical input to their Windows Server and certification projects. He draws on a rich and varied background as a founding member of a leading UK ISP, systems manager and IT consultant for an advertising agency and Internet solutions group, technical specialist for an intercontinental ISP network, and database designer and programmer – and as a self-confessed compulsive consumer of computing hardware and software. In his pre-writing life he spent more than 10 years designing and managing mixed platform environments, developing the rare talent of being able to convey the benefits and intricacies of his subject with equal measures of enthusiasm, professionalism, in-depth knowledge and insight. Most recently he has concentrated on building high quality user-focused information through his books, articles, and work with MySQL and the MySQL groups within Sun and Oracle. In addition to producing the content, he has also developed the documentation systems to improve the quality and efficiency of the documentation being written. MC is currently the Director of Documentation for Continuent and is responsible for building the documentation and supporting information.
Senior Software Engineer, Continuent Inc.
Linas has extensive experience in developing heterogeneous replication solutions between MySQL, Oracle and PostgreSQL. He implemented support for MySQL to Oracle/PostgreSQL/Greenplum replication, as well as a replication proof of concept (POC) from PostgreSQL to other DBMSs. In addition to development, he helps heterogeneous replication customers get deployed. Before joining Continuent, Linas was Head of IT at FBC "Finasta".