Time series collection and processing in the cloud: integrating OpenTSDB with Google Cloud Bigtable
Data comes in different shapes, and one of these shapes is time series data. Time series is an important abstraction because it can describe many different processes: you can discover patterns in your website users' behavior, capture sensor metrics from industrial equipment, or track the movement of celestial bodies. The real power of this abstraction lies in providing a simple mechanism for different types of aggregation and analytics: it is easy to compute the minimum and maximum values over a given period of time, as well as averages, sums and other statistics.

OpenTSDB is a popular open source project that provides a unified way to ingest high volumes of time series data. OpenTSDB relies on HBase for scalable and reliable storage, but implements its own logic layer for storing and retrieving data on top of it. One of the challenges of using OpenTSDB at scale is the need to deploy and maintain a large HBase cluster. With the public release of Google Cloud Bigtable, it is now possible to leverage the flexibility of the cloud for HBase-like deployments. Since HBase is based on Google's original Bigtable work, the two systems are compatible at the API level. Cloud Bigtable is a managed service, so it requires minimal maintenance effort even for large installations.

We would like to introduce the result of a collaboration between Pythian and Google: an open source add-on for OpenTSDB that enables integration between OpenTSDB and Google Cloud Bigtable. We will demonstrate how to set up a powerful and scalable time series collection system in several minutes. We believe this project will be attractive to anyone dealing with monitoring systems, sensor data capture or any other time series workload.

During this presentation we will cover the following topics:

- Overview of time series data and OpenTSDB use cases
- Google Cloud Bigtable properties and features
- OpenTSDB and Bigtable integration details
- Deployment approach
- Future development
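To make the aggregation claim above concrete, here is a minimal sketch (not taken from the talk or the add-on) of the kind of windowed min/max/avg/sum computation a time series store performs; the metric samples and function names are hypothetical examples:

```python
# Illustrative sketch: windowed aggregation over time series samples,
# the style of query OpenTSDB answers over stored data points.
from statistics import mean

# (timestamp, value) samples for a hypothetical metric such as sys.cpu.user
samples = [
    (1356998400, 42.5),
    (1356998460, 40.0),
    (1356998520, 47.5),
    (1356998580, 45.0),
]

def aggregate(points, start, end):
    """Compute min/max/avg/sum over samples with timestamp in [start, end)."""
    window = [v for ts, v in points if start <= ts < end]
    return {
        "min": min(window),
        "max": max(window),
        "avg": mean(window),
        "sum": sum(window),
    }

stats = aggregate(samples, 1356998400, 1356998520)
print(stats)  # aggregates over the first two samples
```

Because samples are keyed by timestamp, any window of interest reduces to a simple filter followed by standard statistics, which is what makes the time series abstraction so amenable to analytics at scale.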
Big Data Consultant / Solutions Architect, Pythian
Danil Zburivsky, Big Data Consultant/Solutions Architect, Pythian. Danil has been working with databases and information systems since his early years in university, where he received a Master's degree in Applied Math. Danil has 7 years of experience architecting, building and supporting large mission-critical data platforms using various flavours of MySQL, Hadoop and MongoDB. He is also the author of the book “Hadoop Cluster Deployment”. Besides databases, Danil is interested in functional programming, machine learning and rock climbing.
Big Data Architect, Pythian
Christos is a principal architect at Pythian, creating and delivering Big Data platforms for some of the world's top tech organizations. With more than 15 years of experience in designing and implementing software, he has a strong interest in building scalable, high-throughput systems.