Peter Spradling, Jerry Zhilin Jiang
Original paper: R. Agarwal, A. Khandelwal, and I. Stoica. Succinct: Enabling queries on compressed data. In USENIX Symposium on Networked Systems Design and Implementation, 2015.
This is a reproduction of Succinct, a distributed in-memory data store that uses compression to fit more data into memory. Succinct’s compression scheme enables fast queries directly on the compressed representation. According to the original authors, Succinct exposes “a minimal, but powerful API,” while requiring much less memory than traditional databases. Below, we attempt to reproduce some of the original paper’s findings and evaluate certain claims. Specifically, we attempt to reproduce the paper’s comparison of Succinct and MongoDB in terms of memory usage, throughput, and latency.
The original paper found that MongoDB’s in-memory indexes took up 8x more memory than the actual data, whereas Succinct’s total in-memory representation was actually smaller than the raw data, leading to dramatically higher memory usage for MongoDB given the same amount of input data. The original paper also found that Succinct displayed measurably better throughput and latency than MongoDB on a workload of 10,000 search requests.
Our reproduction produced mixed results. We were able to reproduce similar Succinct throughput and latency metrics, but due to complications with measuring memory usage in Spark and compiler issues with the C++ version of Succinct, we were not able to obtain a post-compression memory usage metric that we feel confident is correct. Of all our attempts to measure memory usage, our most promising method yielded a figure 28% larger than the original data, which would contradict many of the paper’s results, but as we discuss below, we believe the issue is likely with our measurement.
We also could not verify the MongoDB memory usage metrics. We tested both MongoDB 2.7 and MongoDB 4.0, and like the original paper, we created indexes on every column in the smallkv dataset. Our results show metadata-to-data ratios of 4:1 in MongoDB 2.7 and 3:1 in MongoDB 4.0, not the 8:1 ratio that the original authors reported.
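To make the metadata-to-data ratio concrete, here is a minimal Python sketch of the arithmetic. It assumes the ratio is total index bytes divided by uncompressed data bytes, read from the `totalIndexSize` and `size` fields that MongoDB's `collStats` command reports. The function name and the example numbers are hypothetical, and this is not necessarily how the figures above were collected.

```python
def index_to_data_ratio(stats):
    """Return index bytes per byte of uncompressed document data.

    `stats` is assumed to be a dict shaped like MongoDB's collStats output,
    with `size` (data bytes) and `totalIndexSize` (bytes across all indexes).
    """
    data_bytes = stats["size"]
    index_bytes = stats["totalIndexSize"]
    if data_bytes <= 0:
        raise ValueError("collection holds no data")
    return index_bytes / data_bytes


# Hypothetical numbers chosen only to illustrate the arithmetic.
example_stats = {"size": 1_000_000, "totalIndexSize": 4_000_000}
ratio = index_to_data_ratio(example_stats)
print(f"index:data = {ratio:.1f}:1")
```

With pymongo, a dict of this shape could come from something like `db.command("collstats", "smallkv")`, where the collection name is an assumption. Note that `totalIndexSize` describes the stored size of the indexes, which may differ from their resident memory footprint.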