Tackling machine data on the ground to ensure successful operations for NASA in space
The Company: Southwest Research Institute (SwRI) is an independent, nonprofit applied research and development organization. The staff of more than 3,000 specializes in the creation and transfer of technology in engineering and the physical sciences. Currently, SwRI is part of an international team working on the NASA Magnetospheric Multiscale (MMS) mission. MMS is a Solar Terrestrial Probes mission comprising four identically instrumented spacecraft that will use Earth’s magnetosphere as a laboratory to study the microphysics of three fundamental plasma processes: magnetic reconnection, energetic particle acceleration, and turbulence.
The Challenge: SwRI is responsible for archiving an enormous quantity of data generated by the Hot Plasma Composition Analyzer (HPCA). The device is used to count hydrogen, helium, and oxygen ions in space at different energy levels. These instruments require extensive calibration data and each one is a customized, high precision device that is built, tested, and integrated by hand. SwRI must capture and store all the test and calibration data during the 2-3 week bursts activity that are required for each of the 4 devices.
“During each of these calibration runs, there are several data sources flowing into the server, each one leading to an index in the database,” said Greg Dunn, a Senior Research Engineer at SWRI. “Each packet that arrives gets a timestamp, message type, file name and location associated with it. A second process goes through that data and parses it out – information such as voltage, temperature, pressure, current, ion energy, particle counts, and instrument health must be inserted into the database for every record. This can load the database with up to 400 or 500 inserts per second.”
“Being able to monitor the performance of the instrument and judge the success of the tests and calibrations in near real time is critical to the project,” noted Dunn. “There are limited windows to do testing cycles and make adjustments for any issues that arise. Any significant slip in the testing could cost tens of thousands of dollars and jeopardize the timing of the satellite launch.”
“We started seeing red flags with InnoDB early in the ramp-up phase of the project, as our initial data set hit 400GB,” said Dunn. “Size was the first issue. Each test run was generating around 94 million inserts or around 90GB of data, quickly exceeding the capacity allocated for the program. In addition, as our database grew to 800M records, we saw InnoDB insertion performance drop off to a trickle. Even with modest data streams at 100 records per second, InnoDB was topping out at 45 insertions per second. Being able to monitor these crucial calibration activities in a timely fashion and in a cost effective manner was at risk.”
To keep up with the workload and data set, SwRI considered several options, but they failed to meet program performance and price goals. These included:
Partitioning / Separate Databases – “We considered partitioning, but this can be a challenge to set up and it introduces additional complexity,” said Dunn. “We also looked at putting each calibration into its own database, but that would have made it much more difficult to correlate across different databases.”
Additional RAM – “Increasing the available RAM from 12 GB up to 100 GB was not enough by itself,” claimed Dunn. “We briefly considered keeping everything in RAM, but that was not a realistic or efficient way to address a data set size that was promising to grow to several terabytes by the end of the program.”
The Solution: Once TokuDB was installed, SwRI’s big data management headache quickly subsided. “The impact to our required storage was dramatic,” noted Dunn. “We benefited from over 9x compression. In our comparison benchmarks, we went from 452GB with InnoDB to 49GB with TokuDB.”
There was also a dramatic improvement in performance. “Suddenly, we no longer had to struggle to keep up with hundreds of insertions per second,” stated Dunn. “Our research staff could immediately see whether or not the experiment was running correctly and whether the test chamber was being used effectively. We didn’t have to worry that insufficient data analysis horsepower might lead to downstream schedule delays.”
Cost Savings: “The hardware savings were impressive,” noted Dunn. “With InnoDB, going to larger servers, adding 100s of GBs of additional RAM along with many additional drives would have easily cost $20,000 or more, and still would not have addressed all our needs. TokuDB was by far both a cheaper and simpler solution.”
Hot Column Addition: “As we continue to build out the system and retool the experiments, flexibility in schema remains important,” stated Dunn. “TokuDB’s capability to quickly add columns of data is a good match for our environment, where our facility is still evolving and sometimes has new sensors or monitors installed that need to be added to existing large tables.”
Fast Loader: “The open source toolset that Tokutek designed to parallelize the loading of the database was very helpful,” said Dunn. “We were able to bring down the load of the database from MySQL dump backup from 30 hours to 7 hours.”