Over the past few months, I’ve seen an increase in the following use case while working on performance and schema review engagements:
I need to store exponentially increasing amounts of data and analyze all of it in real time.
This is also known simply as: “We have big data.” Typically, this data is used for user interaction analysis, ad tracking, or other common clickstream applications. However, it also shows up in threat assessment (DDoS mitigation, etc.), financial forecasting, and other applications. While MySQL (and other OLTP systems) can handle this workload to a degree, it is by no means their forte. Some of the pain points include:
While there are many approaches to this problem (and often the solution is actually a hybrid of many individually tailored components), one that I have seen more frequently in recent work is HP Vertica.
From a 30,000-foot view, Vertica is built around the following principles:
Over the next few weeks, I’ll discuss several aspects of Vertica including:
While Vertica is by no means a silver bullet that will solve all of your needs, it may prove to be a very valuable tool in your overall approach to managing big data.