I need to store exponentially increasing amounts of data and analyze all of it in real-time.
This is also known simply as: “We have big data.” Typically, this data is used for user interaction analysis, ad tracking, or other common clickstream applications, but it also shows up in threat assessment (DDoS mitigation, etc.), financial forecasting, and more. While MySQL (and other OLTP systems) can handle this to a degree, it is by no means their forte. Some of the pain points include:
- Cost of rapidly increasing, expensive disk storage (OLTP disks need to be fast == $$)
- Performance degradation as the data size increases
- Wasted hardware resources (excess I/O, etc)
- Impact on other time-sensitive transactions (i.e., the OLTP workload)
There are many approaches to this problem, and oftentimes the solution is actually a hybrid of many individually tailored components, but one solution I have seen more frequently in recent work is HP Vertica.
From a 30,000-foot view, Vertica is built around the following principles:
- Columnar data store
- Highly compressed data
- Clustered solution for both availability and scalability
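To give a feel for the first two principles, here is a minimal sketch (in Python, and in no way Vertica's actual storage format) of why a columnar layout pairs so well with compression: values within a single column tend to be similar, so a simple run-length encoding (RLE) collapses them far better than it would row-ordered data. The table layout and encoder below are hypothetical illustrations, not Vertica internals.

```python
def rle(values):
    """Run-length encode a list into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

# Row-oriented clickstream events: (user_id, country, action)
rows = [
    (1, "US", "click"), (2, "US", "click"),
    (3, "US", "view"),  (4, "CA", "view"),
]

# Column-oriented: each attribute is stored contiguously,
# so repeated values sit next to each other.
countries = [r[1] for r in rows]
actions   = [r[2] for r in rows]

print(rle(countries))  # [('US', 3), ('CA', 1)]
print(rle(actions))    # [('click', 2), ('view', 2)]
```

A column scan for, say, all "click" actions also reads only that column's (compressed) data, rather than dragging every row's full width through the I/O path, which is where much of the OLTP waste listed above comes from.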
Over the next few weeks, I’ll discuss several aspects of Vertica including:
- Underlying architecture and concepts
- Basic installation and use
- Different data loading techniques
- Various maintenance/operational procedures
- Performance comparisons against traditional OLTP (MySQL)
- Some potential use-cases
- Integration with other tools (such as Hadoop)
While Vertica is by no means a silver bullet that will solve all of your problems, it may prove to be a very valuable tool in your overall approach to managing big data.