Issue addressed: Ad hoc analytics on clickstream data arriving too fast for InnoDB or NoSQL to handle.
The Company: Headquartered in New York, Intent Media is a fast-growing online advertising startup. The company helps some of the largest online retailers monetize their traffic more efficiently at scale by showing highly relevant and targeted advertising to the 97+% of e-commerce visitors who do not transact.
The Challenge: The Intent Media platform processes hundreds of millions of events a day generated by media placements across leading e-commerce sites — a textbook “Big Data” challenge. Intent Media’s data is used to optimize media placements, drive segmentation models, and create analytics reports supporting publishers, advertisers, and internal business processes. Intent Media hosts its systems on Amazon EC2, and its TokuDB database contains tables approaching a billion records.
Intent Media turned to TokuDB to support ad hoc analysis for two core reasons: performance at scale with massive data volumes, and its analysts’ familiarity with the SQL toolset.
Several alternatives the company considered fell short:
InnoDB – Familiar toolset, but not fast enough. “InnoDB performance breaks down quickly when tables get very large, but our testing demonstrated that TokuDB could provide a familiar SQL environment to analysts that continues to perform superbly as data sizes grow,” according to CTO Josh Hartmann.
Pig/Hive, backed by MapReduce – Not fast enough for ad hoc reporting, with limited support. “Analysts need a responsive toolset that can bring back answers in seconds or minutes – not hours,” Hartmann said. “While Pig and Hive on top of MapReduce can handle very large datasets, it comes at a big cost. Both tools are much less responsive in the hands of analysts, and in the case of Pig requires retooling the team to learn a new language.”
Other NoSQL solutions – Promising performance in limited situations, but with big functional limitations. “We looked at a variety of NoSQL engines, but the ability for our data analyst team to stick with what they know was key for us,” Hartmann said. “Our analysts can write more complex queries with joins without having to fall back to implementing logic in software.”
The Solution: Intent Media imports its data into TokuDB.
Intent Media’s original installation of TokuDB in 2010 was completed in a matter of hours. Since then, they have upgraded to TokuDB v5.0 to take advantage of its rich feature set, including Hot Column Addition and Deletion (HCAD).
For a growing business with an evolving data model, HCAD was a big win. Intent Media can now modify its schema quickly and painlessly, without taking the database offline.
“Column additions in the past simply were not practical, taking days to complete,” Hartmann said. “They now take a matter of seconds, and can be accomplished in a non-disruptive fashion. This has dramatically improved our ability to adapt to the changing needs of our business, without fear that a schema change would lock up a table for a week or more, blocking other time-sensitive analyses.”
Performance: When evaluating TokuDB, Intent Media looked at several metrics. “Insert performance was important to us, but even more critical was how fast queries run after the fact,” Hartmann said. TokuDB’s query speed on data that had already been loaded helped drive the decision to adopt it.
Scalability: “Tokutek has been with us from the beginning, starting with a few million rows at the start, to scaling with us now for a database with tables approaching billions of rows. With TokuDB, we’ve been able to keep up with this growth with consistently fast performance,” Hartmann said. “Managing terabytes of data now is as easy as managing 50 gigabytes was at the beginning.”
Flexibility: “This dramatic reduction in time it takes to add a column will allow us to continually and dynamically test and adopt algorithms on a daily basis,” Hartmann said. “It gives our business the agility that our competitors lack and allows us to maximize performance for our customers.”
SQL Interface: Because it did not have to switch to a NoSQL solution or a MapReduce toolset such as Pig, Intent Media could keep using capabilities such as rich indexing and a powerful high-level query language, SQL.
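To make the point about joins and indexing concrete, here is a minimal sketch of the kind of ad hoc query an analyst might run. It is illustrative only: Python's built-in sqlite3 module stands in for a MySQL/TokuDB server so the example runs anywhere, and the table names, column names, and sample rows are hypothetical, not Intent Media's actual schema.

```python
import sqlite3

# Assumption: SQLite is a stand-in engine; the real system is MySQL with TokuDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: ad placements and the events they generate.
    CREATE TABLE placements (placement_id INTEGER PRIMARY KEY, publisher TEXT);
    CREATE TABLE events (
        event_id     INTEGER PRIMARY KEY,
        placement_id INTEGER,
        event_type   TEXT
    );
    -- Secondary index covering the join key and the filter column.
    CREATE INDEX idx_events_placement_type ON events (placement_id, event_type);
""")
conn.executemany(
    "INSERT INTO placements VALUES (?, ?)",
    [(1, "site_a"), (2, "site_b")],
)
conn.executemany(
    "INSERT INTO events (placement_id, event_type) VALUES (?, ?)",
    [
        (1, "impression"), (1, "impression"), (1, "click"),
        (2, "impression"), (2, "click"), (2, "click"),
    ],
)


def clicks_per_publisher(connection):
    """Ad hoc report: number of click events per publisher, using a join."""
    return connection.execute("""
        SELECT p.publisher, COUNT(*) AS clicks
        FROM events AS e
        JOIN placements AS p ON p.placement_id = e.placement_id
        WHERE e.event_type = 'click'
        GROUP BY p.publisher
        ORDER BY p.publisher
    """).fetchall()


report = clicks_per_publisher(conn)
print(report)
```

The join and aggregation are expressed in one declarative statement, which is the kind of logic that, per Hartmann, would otherwise have to be implemented in application software.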