File server: a researcher’s best friend
Academics create and store a large portion of their work as digital data. Research data these days are either born digital or digitized for analysis and long-term preservation. Our data repositories keep growing. We write programs, record video footage, make audio recordings, create countless text documents, web pages, drawings, etc. It is no longer possible to store all of our data on a single laptop or workstation hard drive.
Long-term preservation: the 3:2:1 backup
All of the data need to be backed up. The popular 3:2:1 rule, developed by Peter Krogh for digital photographers, applies here as well: keep three copies of each digital file, store them on at least two different types of media, and keep one copy offsite. Many academics already take advantage of the numerous backup options, including external USB hard drives, memory sticks, even DVDs. Increasingly, we have become dependent on various types of "cloud" storage, especially free options such as Dropbox and Microsoft SkyDrive. I imagine that most of us have figured out a method that works well for our needs and have managed to avoid major data loss disasters. However, judging by the considerable number of data recovery requests I receive from friends and colleagues, backup perhaps does not work as effectively as it should.
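To make the rule concrete, here is a small Python sketch that checks whether a backup plan meets all three conditions. The `Copy` record, its fields, and the example locations are hypothetical labels chosen for illustration, not part of any real backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a file: where it lives and what kind of medium holds it."""
    location: str   # hypothetical label, e.g. "laptop", "usb-drive", "dropbox"
    medium: str     # hypothetical medium type, e.g. "internal-hdd", "cloud"
    offsite: bool   # True if the copy is stored away from the primary site


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Return True if the copies meet the 3:2:1 rule:
    at least 3 copies, on at least 2 distinct media types, at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    Copy("laptop", "internal-hdd", offsite=False),
    Copy("usb-drive", "external-hdd", offsite=False),
    Copy("dropbox", "cloud", offsite=True),
]

if __name__ == "__main__":
    print(satisfies_3_2_1(plan))      # three copies, three media, one offsite
    print(satisfies_3_2_1(plan[:2]))  # only two copies and none offsite
```

Note that the check treats "offsite" and "media type" as facts you supply; deciding whether, say, two USB drives count as different media is still a judgment call.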
Long-term data preservation is obviously a priority for academics. However, it is equally important, I argue, to use an appropriate data access solution. We have already established that only a fraction of our data can live on the machine we happen to be using. The rest needs to be stored somewhere else, within easy reach. This is particularly important for those of us who prefer lightweight devices (including tablets) for at least some types of work. Such devices have very little internal storage, so we must be able to access all of the data we need quickly and easily. Good data access solutions, much like backup, require seamless implementation. Slow and cumbersome data access can be a huge headache, especially when working from home or traveling.
Researchers often need to share data. We seem to be stuck in the past here as well. Even though the floppy disk has finally disappeared, email attachments remain the most popular data-sharing medium. My Inbox is filled with gigantic attachments ranging from text documents to uncompressed audio and video files. I am sure you have received multiple emails with attachments because someone accidentally hit "Reply to all." The email protocol was not designed with binary data sharing in mind; it is as inefficient as it is insecure for this purpose. Some of us use online collaboration suites, such as Adobe Connect Pro, Office 365, or Google Docs. They are fine tools, but, again, they are not ideal for sharing data, especially in large-scale, long-term projects.
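One concrete source of that inefficiency is encoding: email is a text-based medium, so binary attachments are typically converted to Base64 text (MIME), which turns every 3 bytes into 4 characters. The Python sketch below measures that overhead on a hypothetical 3 MB binary file, using random bytes as a stand-in for audio or video data.

```python
import base64
import os

# Stand-in for a binary attachment (e.g. uncompressed audio): 3 MB of random bytes.
payload = os.urandom(3_000_000)

# MIME email commonly carries binary attachments as Base64 text.
encoded = base64.b64encode(payload)

overhead = len(encoded) / len(payload)
print(len(payload), len(encoded), f"{overhead:.2f}x")
# 3000000 4000000 1.33x
# Real messages are slightly larger still, because MIME also inserts
# a line break every 76 characters of encoded text.
```

Every recipient of a "Reply to all" thread that carries the file pays this roughly one-third penalty again.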
Fragmentation in data backup, access, and sharing used to drive me nuts. Using different methods for different purposes, and, worse, different methods with different colleagues or collaborators, can be a very frustrating experience. Most importantly, it is one that, due to its inherent complexity and poor implementation, can lead to data loss and security breaches. Even though a file server is not going to solve the fragmentation problem completely, it will help make your digital life much easier.
What is a file server?
File servers vary greatly in both form and function, but, minimally, a file server is a networked computing device that provides file storage, access, and sharing. The file server we are building in this series resembles a modern computer in that it has a fast CPU, ample RAM, and a large data storage capacity. However, it is designed around data management rather than raw computational speed or 3D rendering capability. The hardware runs a specialized operating system, also known as a "server" OS, as opposed to "client" operating systems such as Windows 7 or Mac OS X. The hardware and software are tightly integrated and optimized to perform certain types of data-centric operations. Over the course of this work log, it will become clear what those features are and how they are implemented to serve our particular needs.
The importance of the chassis can easily be overlooked. After all, it is just an enclosure that houses all of the components. However, the chassis needs to be spacious, functional, well ventilated, and quiet, and it should offer at least some room for expansion. Those who have read articles on my website are probably familiar with my obsession with silence. The file server is no exception. We want the chassis to offer superb cooling as well as sound dampening. The server is likely to reside somewhere in the lab or home office (I don't recommend shoving it in a closet!) and stay on 24/7, so it needs to be quiet. Finally, the chassis must offer easy access to the hard drives. Ideally, we would want an enclosure with hot-swappable hard drive bays, but a case in which the hard drives are easily accessible may suffice.
For this build, I chose the Fractal Design Define Mini case (Figure 1). Fractal Design is known for making computer enclosures of superb quality and for paying special attention to the needs of quiet computing. The Mini is no exception. It also comes with six easily accessible 3.5-inch hard drive bays, which is unusual for a case of this size. Speaking of size, it is a small mATX case, so it only fits a micro-ATX motherboard. That is no problem for a file server, since the only expansion slot we will need is a PCI-e slot for the RAID controller. Finally, the case is gorgeous! It has a fine, understated look that won't disturb even the classiest of laboratories or offices.
Figure 1. Fractal Design Define Mini mATX chassis
As I mentioned earlier, we are limited to mATX motherboards, but this is hardly a limitation, since there are many quality motherboards available in this form factor. The more important decision is which platform to use, and the first choice is the manufacturer: Intel or AMD? I decided to go with AMD, but an Intel platform would be just as good. The platform needs to be stable, energy efficient, and, most importantly, compatible with the server operating system. I chose the Gigabyte 78LMT-S2P motherboard, which is compatible with the latest six-core AMD processors and will also support the upcoming "Bulldozer" CPUs (it uses the so-called AM3+ socket). The motherboard has built-in Gigabit Ethernet, audio, and video, and a total of six SATA II (3 Gb/s) ports. These ports are important for those of you who do not plan to use a dedicated RAID controller and will, therefore, need to connect the hard drives directly to the motherboard. Having six of them means you will be able to connect one DVD drive (for installing the operating system) and five hard drives.
Finally, there is one special capability that only some motherboards have, namely Wake on LAN (WOL). Wake on LAN is extremely useful if you want to be able to start the server remotely, from another networked machine. Say you are traveling overseas and your server is off. You can use any computer with Internet access to send a small data packet to your server and wake it up, provided your home or office router is configured to pass that packet on to the server. Handy, isn't it?
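For the curious, the WOL "magic packet" itself is very simple: six bytes of 0xFF followed by the target network card's MAC address repeated 16 times, usually sent as a UDP broadcast to port 9. The Python sketch below builds and sends one. The MAC address is a hypothetical placeholder, and the packet only works if Wake on LAN is enabled in the server's BIOS and network adapter settings; waking a machine from outside the local network also depends on router configuration, which this sketch does not cover.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by
    the target MAC address repeated 16 times (102 bytes in total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be explicitly enabled on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    pkt = build_magic_packet("00:11:22:33:44:55")  # hypothetical MAC address
    print(len(pkt))  # 102
    # On a real LAN, with the server's actual MAC address:
    # send_magic_packet("00:11:22:33:44:55")
```

Building the packet separately from sending it keeps the format easy to inspect without touching the network.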
Out of the vast array of AMD CPUs I chose a six-core Phenom II X6 1090T chip. It is a modern, 64-bit, fast CPU that is energy efficient and performs particularly well with multi-threaded applications. It is also substantially less expensive than an Intel equivalent. If you buy the retail version, you will also get a very capable heatsink and fan, though, for optimal noise reduction, you should probably install an aftermarket heatsink, such as the Gelid Tranquillo.
I would suggest a minimum of 8 GB of RAM. I chose two 4 GB ADATA DDR3-1333 modules for a total of 8 GB. This particular motherboard has only two RAM slots, so if you plan to use RAM-intensive applications, such as large databases, you should probably go with an mATX motherboard that offers four RAM slots. For this particular build, 8 GB of RAM is going to be sufficient.
Choosing hard drives for a server is always a bit tricky. Your applications and computing needs should determine the type of drive you want. For this build, we are mostly interested in typical file server applications, so our priority is cool and quiet operation rather than the highest speed. The Western Digital Caviar Green series of hard drives has a proven track record, particularly in file servers. These are excellent storage drives. I chose four 1 TB WD drives for storage.
You will also need a hard drive for the operating system. I considered using an SSD (as I did in my recent silent workstation build), but it would have substantially increased the cost of the build. Therefore, I went with a 500 GB Seagate 7200 RPM hard drive. It is a bit faster than the WD drives and is perfect as a system drive.
We end up with a total of 4.5 TB of raw storage. However, not all of it is going to be usable. The 500 GB drive is going to have a separate partition for the operating system and a small storage partition, which I will leave empty for now. All of the data storage is going to reside on the four-disk array. I cannot overemphasize the importance of choosing a redundant storage configuration. Volumes have been written about redundant storage, so I will only talk about it briefly here. The idea is to dedicate one drive's worth of capacity to redundancy (parity information spread across the four drives), so that if any one drive fails, no data is lost. This configuration is known as RAID 5, and with four 1 TB drives it leaves 3 TB of usable space on the array. It is a simple, yet effective strategy.
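Both the capacity arithmetic and the redundancy idea fit in a few lines of Python. This is a simplified sketch, not how the Areca controller works internally: it shows a single stripe with toy 4-byte blocks, whereas real RAID 5 rotates the parity block across all drives from stripe to stripe. Parity is the byte-wise XOR of the data blocks, so any one missing block can be rebuilt by XOR-ing the surviving blocks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable RAID 5 capacity: one drive's worth of space holds parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_tb


# This build: four 1 TB drives in RAID 5, plus a 500 GB system drive.
print(raid5_usable_tb(4, 1.0))  # 3.0 TB usable on the array
print(4 * 1.0 + 0.5)            # 4.5 TB raw in total

# One stripe: three data blocks plus one parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the surviving blocks with the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered)                # b'EFGH'
```

The same XOR trick also explains the limit of RAID 5: with two blocks missing from a stripe, there is no longer enough information to rebuild either one.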
There are two different implementations of RAID 5 that you can choose from: (1) implemented by the operating system, also known as "software RAID" and (2) maintained by a dedicated RAID controller, also known as "hardware RAID." Neither is truly software-only or hardware-only, but I will spare you the debate. The rule of thumb is that a hardware-based RAID 5 is going to perform much faster and is independent of the server operating system. I believe that it is a better option, despite being more expensive. Therefore, for this build I chose the Areca ARC-1210 PCI-Express x8 SATA II (3.0Gb/s) Controller Card. I'd never used an Areca card before, but I had read good reviews and found it to be compatible with both Linux and Windows-based file servers.
The RAID controller has its own firmware, a tiny operating system of sorts. The firmware creates and maintains the RAID array. The controller also provides dedicated processing (for example, parity calculations) to "help" the OS, so that both read and write performance are significantly better than with a software-only solution. The firmware can be accessed before the server OS boots, without any involvement of the OS. It is a simple process:
- Install the controller in an available PCI-e slot
- Connect the hard drives to it
- Start the computer
- When the RAID firmware prompt shows up on the screen, press the required key combination
- Enter the firmware and choose the configuration you want (Figure 2)
- Initialize the array
Figure 2. Areca setup screen
You're done, for now. The controller must now initialize the drives, which, in my case, took a couple of hours. You only do this once. Now, you are ready to install the operating system on the 500 GB Seagate drive.
The power supply must be reliable, cool, and quiet. I chose a modular OCZ power supply with 80 PLUS efficiency certification, but any good PSU from a reputable brand should be sufficient.
There are three commonly used types of operating systems for file servers: (1) Unix/Linux-based systems, (2) Mac OS X Server, and (3) Microsoft Windows Server. Mac OS X is, for most practical purposes, limited to Apple hardware, so we have a choice between a Linux- and a Windows-based system. It's an old debate, which I always prefer to avoid. Go with whichever system suits your needs better. Both systems are very good, so we should feel fortunate that we have a choice. The person for whom I am building this server prefers a Microsoft OS because it fits perfectly with their work environment. Windows Home Server 2011 is a particularly good choice, especially for people who are not experienced system administrators. It is a solid OS, based on Windows Server 2008 R2, with an overlay of very useful and user-friendly features, such as automatic backup and restore of workgroup computers, media serving, remote music and video streaming (including to smartphones and tablets), and much more. In terms of speed and stability, it is an excellent choice as well. Finally, it is rather inexpensive at $60 for the OEM version.
Putting it together
Building a computer requires patience and attention to detail. It is always a good idea to test the build on a bench before installing all the components inside a chassis. That way, you can easily change things around without having to deal with the confines of the chassis. An open bench will make your work much easier.
The Areca RAID controller turned out to be a pleasure to work with. It installed without the slightest problem. Once you've installed the OS, you can set up a very handy web utility that lets you manage the array from within a web browser (Figure 3). The WD drives work nicely, and the entire array is fast. Very fast, indeed.
Figure 3. Areca controller browser setup
The Fractal Design case comes with two very quiet fans (intake and exhaust) and a fan speed controller that mounts in one of the rear expansion slots. You can fine-tune the system to run really quietly. The file server is noticeably louder than my recent silent workstation, in part because I opted not to install a third-party CPU heatsink and ultra-quiet case fans. Overall, I would rank the system as home-office-noise-floor quiet, which means that it should not be audible in a typical office from about 1 meter away.
Figure 4. A look at the final build inside the Fractal Design Define Mini case
The file server is an incredibly useful piece of technology for researchers, engineers, architects, programmers, or anyone whose work requires large quantities of data stored in a secure and fast repository. Windows Home Server 2011 is a very nice piece of software. Again, others have written extensive reviews (e.g., Paul Thurrott's article), so I will limit my thoughts to a few features that I found particularly useful. First of all, it is very stable, since it is built on a proven server platform. It has a wonderfully rich set of features for home users, such as streaming to DLNA devices, an Xbox, an HTPC, or a web-enabled TV. The automatic backup of all workgroup PCs is very useful, indeed.
Figure 5. The WHS 2011 web access interface
Please email me if you have any questions about this article. Thanks!