sauron database design

The Eye of Sauron sees all market activity.

Learning SQL would add too much friction right now. The NoSQL database I have the most experience with is MongoDB. I found:

  • a cheap MongoDB database plan from ScaleGrid

  • Arctic, a currently maintained OSS Python library for uploading timeseries data with 2.5k stars on GitHub

I just want my collected data to be fresh within ~15 minutes for backtesting. My requirements for data storage are thus:

  • cheap overall

  • robust, so the data doesn’t get lost

  • easy to query repeatedly

  • easy to upload new data to

My requirements are currently NOT:

  • lightning-fast uploads and queries

  • low-latency, nanosecond optimization

I don’t have any scaling requirements yet (I just want to start with funding rates), but I need to keep an eye on the future ramifications of my current design decisions.

Looking at ScaleGrid’s current pricing plans, I’d need to upgrade once I reach storage requirements over 18 GB. What will it take for me to reach this point?

  • from Bybit, the CSV of the platform’s October 1, 2021 ETH transactions is ~2 MB

  • given that 1 GB is 1k MB, I can store ~500 days of all daily ETH transactions per GB

  • this means the 18 GB plan can hold ~9k days of all daily ETH transactions, or roughly 24 years of data
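The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constants come straight from the bullets (2 MB per day, 1 GB treated as 1,000 MB, an 18 GB plan limit), the function name is made up for illustration, and it assumes the data takes about as much space in MongoDB as it does in CSV, which may not hold once indexes and storage-engine compression are involved.

```python
# Figures taken from the notes above; all of them are rough assumptions.
MB_PER_DAY = 2        # size of one day of ETH transactions as CSV
MB_PER_GB = 1_000     # 1 GB treated as 1k MB
PLAN_LIMIT_GB = 18    # storage limit before an upgrade is needed


def days_of_storage(limit_gb, mb_per_day=MB_PER_DAY):
    """Return how many days of data fit in limit_gb at mb_per_day."""
    return limit_gb * MB_PER_GB / mb_per_day


days = days_of_storage(PLAN_LIMIT_GB)
years = days / 365
print(f"{days:.0f} days, about {years:.1f} years")
# prints "9000 days, about 24.7 years"
```

Collecting more symbols scales this down linearly: ten symbols of similar size would fill the same plan in roughly two and a half years.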

At some point, if I somehow outgrow my MongoDB cluster, I can simply export the data to CSV and then clear the database. I can compress the files and store them on Kaggle or in my Columbia Google Drive for later backtesting.
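A minimal sketch of that archive step, using only the standard library. The function name and the field names in the usage note are hypothetical; the function takes any iterable of dict-like documents, so it does not depend on how the data was read out of the database.

```python
import csv
import gzip


def export_to_csv_gz(docs, path, fields):
    """Write an iterable of dict-like documents to a gzip-compressed CSV.

    Only the columns listed in `fields` are written; any other keys
    (for example MongoDB's `_id`) are ignored. Returns the row count.
    """
    count = 0
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for doc in docs:
            writer.writerow(doc)
            count += 1
    return count
```

With plain pymongo, a cursor from `collection.find({})` could be passed in as `docs`, for example `export_to_csv_gz(cursor, "funding_rates.csv.gz", ["timestamp", "symbol", "rate"])`, and `collection.delete_many({})` would clear the collection once the file has been verified. If the data was written through Arctic instead, the export would more likely go through Arctic's own read call and a pandas DataFrame, since Arctic manages its own storage layout.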

The conclusion is: sign up for the ScaleGrid plan, and store whatever data I want. I’ll design the backtesting system to query from MongoDB for now!