Over at Backblaze, they have created a custom solution for a petabyte of storage that costs only a bit more than the hard disk drives themselves. They even tell you how to do it yourself in their blog post Petabytes On A Budget.
Their platform runs on open source solutions, so there are no additional software costs. The 4U form factor is designed to fit in modern data centers. I think there are several interesting takeaways from Backblaze’s work:
- Storage is cheap. Putting it together can cost a lot, especially if you use some of the popular commercial solutions.
- It’s still possible to come up with elegant designs using off-the-shelf components.
- Elegant designs still take an amazing amount of detail to pull off (just read the part of the post on vibration issues and how they were handled).
Software and Storage
The post glosses over the software and access details, but points out that both control and data access are through HTTPS, running through Tomcat and Backblaze’s own software. We are at a point in time where petabytes of storage are a commodity accessed through a simple protocol.
For me, the post gets really interesting at the point where they start talking about “Cloud Storage: The Next Step”. Quoting from the post:
Building a cloud includes not only deploying a large quantity of hardware, but, critically, deploying software to manage it. At Backblaze we have developed software that de-duplicates and chops data into blocks; encrypts and transfers it for backup; reassembles, decrypts, re-duplicates, and packages the data for recovery; and monitors and manages the entire cloud storage system. This process is proprietary technology that we have developed over the years.
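Backblaze's software is proprietary, so the post does not show how any of this is done. To make the "de-duplicates and chops data into blocks" step a little more concrete, here is a minimal sketch of one common approach: split the data into fixed-size blocks, identify each block by its SHA-256 hash, and store each unique block only once. The names `store`, `restore`, `block_store` and the tiny `BLOCK_SIZE` are my own assumptions for illustration, not anything taken from Backblaze, and encryption and transfer are left out entirely.

```python
import hashlib

# Hypothetical block size, kept tiny so the example is easy to follow.
# Real systems use far larger blocks.
BLOCK_SIZE = 4


def store(data: bytes, block_store: dict) -> list:
    """Chop data into fixed-size blocks and keep each unique block once.

    Returns a "recipe": the ordered list of block hashes needed to
    rebuild the original data.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        # Only the first copy of a given block is actually stored.
        block_store.setdefault(digest, block)
        recipe.append(digest)
    return recipe


def restore(recipe: list, block_store: dict) -> bytes:
    """Reassemble the original data by looking up each block in order."""
    return b"".join(block_store[digest] for digest in recipe)


block_store = {}
original = b"ABCDABCDABCDWXYZ"
recipe = store(original, block_store)
restored = restore(recipe, block_store)

print(len(recipe), "blocks referenced,", len(block_store), "blocks stored")
print(restored == original)
```

In this toy run the input contains the block `ABCD` three times, yet it is stored only once; the recipe is what lets the data be put back together ("re-duplicated", in the post's words) on recovery.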
As storage needs keep expanding, the amount of duplicated data stored around the planet grows exponentially. Yet removing duplicated data is just one of the things that the Backblaze software has to deal with. For Cloud Computing to take off, we need to move a couple of layers up the application stack and deal with data as more than just storage.
Do you know where your data is being stored?