If you run a small or medium-sized business that requires a lot of digital storage, the time will come when you need your own cloud. Whether it’s storage for clients or backups for company data, the big name providers don’t come cheap. You’re probably considering Amazon S3, Dell, Sun, EMC, and NetApp, among others.
But if you really want to cut costs, take the challenge into your own hands. With a few tricks and a bit of know-how, you can get petabytes of storage for as little as a tenth of the price of any of those big-name brands.
In this guide, we’ll show you how, using a few key ingredients:
- Consumer-grade hard drives to keep costs low
- Readily available commodity components
- Space-efficient racks and boxes
- Free software
- HTTPS to store and retrieve data
This method has been tried and tested by Backblaze, a cloud backup company that pioneered this scheme. We’ll borrow heavily from the knowledge Backblaze has generously shared with the world.
Drives and components
When you strip away everything but the core function of what all the big enterprise cloud brands do, what you get is as simple as transferring data to and from a hard drive over the internet. If you’re building your own cloud server, the hard drives you purchase will largely determine the price point and make up the bulk (estimate at least half, and as much as 80 percent) of your investment.
The software is free, so the remainder of the cost comes from the enclosure, racks, and components. Backblaze claims the open-source design for its newest enclosure can bring the cost down to as little as 3.6 cents per GB of storage, totaling 240TB with all 60 drives accounted for. The pods sit in 4U racks, and the 6.0 design extends a few inches past the edge, so consider how much space you have in your server room.
You can buy one of these “pods” pre-built for between $3,000 and $7,000, depending on how many hard drive slots you need, from 45 Drives or Backblaze. With the pre-built pod, the only other thing you need to buy is the actual hard drives.
Or you can build one yourself. For a 6th-gen pod with 60 hard drives, the full parts lists along with estimated prices can be found in this PDF. Most components can be found on Newegg and Amazon, but some will come from special distributors or contract assemblers.
While the parts vary for each version of Backblaze’s design, here’s a generic rundown of everything you’ll need:
- 4U chassis
- Power supply
- On/Off switch
- Case fan
- Fan mounts
- CPU Fan
- CPU (Intel)
- 8GB RAM
- Port multiplier backplanes
- SATA III cards
- SATA III cables
- Cable harnesses
- Screws and cable ties
Once you have all the parts, it’s time to start assembly. You can download the following from Backblaze:
- Wiring diagrams (ZIP file)
- Wiring routes (ZIP file)
- Build book (PDF)
All in all, the 60-drive setup costs an estimated $3,500, according to Backblaze. Remember that doesn’t include the drives. With 45 4TB hard drives, the total bill comes out to about $10,500, the company says.
Note that in an earlier design, Backblaze used a direct-wire setup, in which all of the hard drives are wired directly to a Rocket 750 SATA card. Later, the company switched back to its original configuration, which uses port multiplier backplanes that hold five drives each. Depending on what hardware is available at what price, both are good options.
Creating a cloud
So now you’ve built a huge storage server for a fraction of what it would have cost you to use someone else’s servers, but you still need to make it into a cloud that’s accessible to clients, staff, and/or applications.
Let’s work from the bottom up. Backblaze recommends 64-bit Debian Linux as the operating system. (If you bought one of the pre-made pods, many of the drivers will come pre-installed.) Use the fdisk tool to create one partition per drive, then cluster the drives in sets of 15 into RAID6 volumes, each with two parity drives, using the mdadm utility.
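Assembling one of those 15-drive RAID6 arrays might look like the following sketch. The device names are placeholders for illustration; your drive letters will differ, and the commands need root on the pod itself:

```shell
# Create one RAID6 volume from 15 partitioned drives (13 data + 2 parity).
# /dev/sd[b-p]1 is a placeholder range; substitute your actual devices.
mdadm --create /dev/md0 --level=6 --raid-devices=15 /dev/sd[b-p]1

# Record the array layout so it reassembles automatically at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
```

A 60-drive pod repeats this four times, yielding /dev/md0 through /dev/md3.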
Now you must choose between the JFS or ext4 filesystem. Ext4 is more common, but JFS is what Backblaze uses. Note that while ext4 supports volumes up to 1EiB, the copy of e2fsprogs shipped with the distro only supported 16TB; building it from source with the 64bit flag solves this. Each pod is accessed over HTTPS at its own IP address.
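If you go the ext4 route, formatting a RAID6 volume with large-volume support is a one-liner once a recent enough e2fsprogs is in place (device and mount point are illustrative):

```shell
# The 64bit feature lifts ext4's old 16TB cap on a single filesystem
mkfs.ext4 -O 64bit /dev/md0

# Mount the volume under a predictable path for the storage software
mkdir -p /mnt/pod0
mount /dev/md0 /mnt/pod0
```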
Once all that is in place, you end up with about 83 percent of the raw capacity as usable space. This is where we have to stop relying on Backblaze for advice, as its cloud software is proprietary.
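That figure is easy to sanity-check: RAID6 dedicates 2 of every 15 drives to parity, and the filesystem takes a further slice:

```shell
# RAID6 over 15 drives leaves 13 for data: roughly 86.7% raw efficiency.
# Filesystem metadata and reserved blocks shave off a few more points,
# landing near the 83 percent figure quoted above.
awk 'BEGIN { printf "%.3f\n", 13 / 15 }'
```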
You have a couple of options for cloud software. NFS is tried and tested on Linux, but not all that compatible with mobile devices.
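If NFS suits your clients, exporting a pod’s storage is a short exercise. A minimal sketch, assuming the mount point from earlier and an office subnet of 192.168.1.0/24 (both are assumptions):

```shell
# /etc/exports — share the pod's mount point with the office subnet,
# read-write, with root squashed for safety
echo '/mnt/pod0 192.168.1.0/24(rw,sync,root_squash)' >> /etc/exports

# Reload the export table and confirm what is being served
exportfs -ra
exportfs -v
```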
Another option is Oxygen Cloud, which uses the Oxygen Storage Connector to convert used storage on the server into storage that can be used with Oxygen Cloud apps. Oxygen Cloud encrypts data in transit end-to-end. You also get access to Oxygen Tunnel Gateways, which allow you to access your storage from outside your own firewall without having to change your configuration.
Before you go to Newegg and start filling up your shopping cart with hard drives and components, it’s important to consider the potential drawbacks of not going with a provider like Amazon S3 or EMC.
The biggest risk is that you could lose data, which could cost you your job and harm your company and coworkers. The system uses a single disk for the host operating system, some designs lack redundant or failover power supplies, and any health or monitoring software must be built, installed, and configured by hand.
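Even a simple cron script goes a long way toward filling the monitoring gap. A minimal sketch using smartmontools; the drive range and alert address are assumptions:

```shell
#!/bin/sh
# Check each array member's SMART health and the RAID state;
# mail an alert if anything looks wrong. Run hourly from cron.
for disk in /dev/sd[b-p]; do
    smartctl -H "$disk" | grep -q PASSED || \
        echo "SMART failure on $disk" | mail -s "pod alert" admin@example.com
done

# An underscore in /proc/mdstat's status brackets marks a failed member
grep -q '_' /proc/mdstat && \
    echo "degraded RAID array" | mail -s "pod alert" admin@example.com
```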
It’s also not as easy to expand or maintain. When you pay for Amazon S3, maintenance is all taken care of for you. But a task as simple as swapping out a failed drive in a DIY storage pod requires you to remove 12 screws and the top cover, not to mention setting up custom wiring harnesses. Much of the necessary maintenance will require the system to be taken offline and possibly powered down.
The simplest hedge against both of these problems is to build one or more redundant servers that act as failovers.
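A nightly replication job to a second pod is the most basic version of that idea. A minimal sketch, assuming a standby host reachable as pod-standby with matching paths (both hostname and paths are illustrative):

```shell
# Mirror the primary pod's data to the standby over SSH.
# --delete keeps the replica exact; drop it for an additive copy.
rsync -a --delete /mnt/pod0/ root@pod-standby:/mnt/pod0/
```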
All images by Backblaze licensed under CC BY 2.0