Las Solanas Consulting


How To Build A Low-Cost iSCSI SAN using Dell PowerEdge PE840 Servers

A White Paper by Tim Warden

Dell calls it an "Essential". You call it a "Tower Server". I call it a "SAN Storage Array".

I just took delivery of a couple of "Essential" Dell PE840s. In this cookbook, I demonstrate how to turn them into iSCSI SAN Storage Arrays implementing Synchronous Data Mirroring (i.e. RAID-1). We also prove scalability by expanding our SAN storage using an MD1000 SAS shelf. Our objectives are:

  • Implement a highly-available shared storage environment or SAN
  • Use standard Dell servers
  • Assure scalability with the new SAN solution
  • Keep the costs of the project "under the radar"

The choice of the Dell PowerEdge is completely arbitrary; we could build this same type of SAN storage array using servers from IBM, HP, Sun, etc.

The server was configured using Dell's online store on February 26, 2007.

DEMYSTIFYING THE WORLD OF SAN STORAGE

First off, let's demystify the world of SAN storage; it ain't rocket science, folks. The features associated with those expensive and sophisticated SAN storage arrays are all implemented in software running on a proprietary, closed box machine. They call it "Firmware" or "Flare Code"; we call it "Software" because that's what it is: software.

Clariion ST34371N Jumper
Third jumper from the top...
pull it and reformat...
the Clariion drive becomes a "normal" Seagate drive...

The disk drives cost 4 times more than the equivalent you buy at Fry's or CompUSA, but they are manufactured by the same two or three companies, on the same assembly line. Want to turn a Clariion / Seagate SCSI drive into a regular drive? Pull the "Reserved" jumper off and re-format it with 512 byte blocks instead of 520. The drive will still identify itself as a Clariion drive, but it will behave exactly like the far less expensive Seagate drive it fundamentally is.

The Clariion-badged version of a drive is nothing more than the "Reserved" jumper, 520-byte blocks, and a Clariion signature on the drive. The Clariion signature assures that you won't try to put a non-EMC drive in the Clariion. The 520-byte block is for their "sniffer" application, a background utility running on the Storage Processor that walks every sector of all the drives, comparing a computed parity over the 512 bytes with the saved parity in the extra 8 bytes of the sector. Of course, Clariion SPs would reject any disk that didn't have the signature and 520-byte sectors. And it goes without saying that the Clariion division wouldn't release the firmware application to the field; if you toasted the firmware, only Webo could get you a drive. Voila, the justification for the ludicrous expense.
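To make the 520-byte idea concrete, here is a minimal Python sketch of a sniffer-style check. The real Flare check format isn't described in this paper, so the 8-byte XOR fold below is only a stand-in I picked for illustration, and the function names are hypothetical.

```python
DATA_BYTES = 512                          # payload the host sees
CHECK_BYTES = 8                           # extra bytes on an array-formatted sector
SECTOR_BYTES = DATA_BYTES + CHECK_BYTES   # 520


def compute_check(data: bytes) -> bytes:
    """XOR-fold 512 data bytes into 8 check bytes.

    Illustrative stand-in only; this is NOT the vendor's actual algorithm.
    """
    if len(data) != DATA_BYTES:
        raise ValueError("expected exactly 512 data bytes")
    check = bytearray(CHECK_BYTES)
    for i, byte in enumerate(data):
        check[i % CHECK_BYTES] ^= byte
    return bytes(check)


def make_sector(data: bytes) -> bytes:
    """Build a 520-byte sector: 512 data bytes followed by 8 check bytes."""
    return data + compute_check(data)


def sector_is_clean(sector: bytes) -> bool:
    """Recompute the check over the data and compare it with the stored one."""
    if len(sector) != SECTOR_BYTES:
        return False
    data, stored = sector[:DATA_BYTES], sector[DATA_BYTES:]
    return compute_check(data) == stored


def sniff(sectors) -> list:
    """Walk every sector and return the indices that fail the check."""
    return [i for i, s in enumerate(sectors) if not sector_is_clean(s)]
```

A plain 512-byte sector fails `sector_is_clean` simply because it is the wrong length, which is the same reason an array expecting 520-byte sectors would turn away an off-the-shelf drive.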

And why do they charge so much for cache? Have you looked closely at the cache? It's standard RAM. The exact same type of ECC DIMMs that Dell, IBM and HP put in their servers.

Incidentally, have you done a Flare "firmware" upgrade on a Clariion? Now tell me, was that "software" or "firmware" you were downloading? Did you happen to notice that "Start" menu in the lower left hand corner? That's called "Windows-Embedded".

Clariion Seagate
This Clariion 4GB SCSI-2 drive cost over $3,000 new...
How much does a 300GB SAS drive cost today?

"You must remember this:
a disk is still a disk,
a drive is just a drive..."

You see, Data General understood the value of integrating commodity parts to build products — that was their heritage, dating back to De Castro's Nova[1] days. And that was the direction Data General began moving in around 1999. They were starting to build the Clariions on standard Intel boards and soon would be porting their proprietary "Flare" code into Services running on Windows. Apparently EMC saw the light at the end of the tunnel and realized it was indeed an oncoming train. Rather than wait for the impending wreck, they jumped on the train[2] and are still the number one storage vendor, selling Clariions. And let's face it: the latest generation of Windows-embedded Clariion is a far more powerful box than the DMX with its PowerPC channel directors.[3] EMC has accepted the fact that Shared Storage is an application suitable for high-performance commodity servers running Windows; now it's our turn to come to the same conclusion.[4]

They call it an "Appliance"; we call it "Obsolescence". Years ago, you chose your Mini Computer based on the office automation program the vendor offered. Some liked "All In One" on their DEC VAXes (running VMS). Some preferred "CEO" on their Data General Eclipses. No one today would consider buying an "appliance" Mail or Database server based on a closed, proprietary Mini Computer architecture. But when we buy a closed SAN storage array, that's essentially what we're doing.

And with every expensive array based on commodity hardware, we are re-purchasing the software that gives that array its "SAN Storage Array" feature set or personality. We'd be better off finding an open software solution that runs on commodity hardware. You buy Dell servers? Run your Storage Service on a Dell server. Want to upgrade from iSCSI to 4Gb Fibre Channel? Buy the HBA and install it in your commodity Dell server. The value of the software transcends the hardware; it maintains its value long after the Pentium 3 is obsolete, the PCI-X slots have been abandoned for PCI Express, and the generations-old 1Gb Fibre Channel HBA has been forgotten.

Even if the traditional SAN storage array vendor is still implementing their "firmware" in ASICs, you can be sure that shortly after they ship their machine, AMD processors and faster memory buses will eclipse their performance advantage. The easiest way to benefit from Moore's Law is to stay within the norm of the critical mass. That's why Apple abandoned the PowerPC G5 for Intel Dual Core processors...

PowerEdge 840

But I digress. Enough philosophizing — let's build the SAN storage array... using commodity hardware and portable software.

THE SAN STORAGE PROCESSOR

For this project, we are going to use DataCore Software Corporation's SANmelody™ software to implement our SAN Storage Array.

This Storage Virtualization software offers many features associated with High-End storage systems, such as Synchronous Data Mirroring and Asynchronous Replication.

When you consider the service of Shared Storage as an application solution much like your Mail or Database applications, the benefits become evident.

CONFIGURING THE SERVER HARDWARE

Dell PE840 iSCSI SAN Storage Array

PowerEdge 840, Qty 1
FREE UPGRADE to Dual Core Intel® Pentium®D 915, 2.8GHz, No Operating System
Unit Price: $2,342.00
Save $50 on PowerEdge 840 Servers through Dell Small Business! Expires Thursday, March 01, 2007
Catalog Number: 4 BECWMK1

Module                            Description
PowerEdge 840                     FREE UPGRADE to Dual Core Intel® Pentium®D 915, 2.8GHz
Memory                            DISCOUNTED UPGRADE to 2GB DDR2, 533MHz, 4x512MB Single Ranked DIMMs
Network Adapter                   Onboard Single Gigabit Network Adapter
TCP/IP Offload Engine Enablement  Broadcom® Dual Port TCP/IP Offload Engine Enabled
CD/DVD Drive                      48X CD-ROM, 680MB, Internal
Primary Hard Drive Controller     Onboard SATA Controller - No RAID
Hard Drive Configuration          Onboard SATA, 4 Drive connected to Onboard SATA Controller - No RAID
Drive Cage Configuration          Chassis with Non-Hot Swap Drives for PE840
Primary Hard Drive                500GB, SATA, 3.5-inch, 7.2K RPM Hard Drive
2nd Hard Drive                    500GB, SATA, 3.5-inch, 7.2K RPM Hard Drive
3rd Hard Drive                    500GB, SATA, 3.5-inch, 7.2K RPM Hard Drive
4th Hard Drive                    500GB, SATA, 3.5-inch, 7.2K RPM Hard Drive
Hardware Support Services         3Yr BASIC SUPPORT: 5x10 HW-Only, 5x10 NBD Onsite
System Documentation              No Hard Copy Documentation, E-Docs only and OpenManage CD kit
Operating System                  No Operating System
Keyboard                          No Keyboard Option
Floppy Drive                      No Floppy Drive
Mouse                             NO MOUSE OPTION
Installation Support Services     No Installation Assessment


TURNING THE PE840 INTO A SAN STORAGE ARRAY

Let's pop open the hood and have a look inside our PE840 SAN Storage Array.

Cache is RAM, Disks is Disks
Cache and Internal SATA Drives

We've configured our SAN storage processor hardware with 2GB of cache and a Gigabit iSCSI port, as well as 2 TB of raw storage, all in a 5U enclosure. The one thing that's missing is the intelligence to turn this ordinary high-power server into a SAN: the SAN volume, caching, and LUN masking management, as well as the drivers to run the Gigabit Ethernet ports in "target" mode so that they will behave like a storage array's iSCSI ports.

For this functionality we will use a unique software package from DataCore Software Corporation called SANmelody™. This software will turn the Dell PowerEdge 840 into a SAN storage array far more powerful than any appliance.

PCI, PCI-X and PCI Express for Expansion
Ample slots for adding NICs and an External RAID Controller

SANmelody will install its iSCSI target drivers on any available Gig-E ports on the server. But SANmelody isn't just a simple iSCSI Target driver. SANmelody will use the server's 2GB of RAM as storage processor cache. The SANmelody software also implements advanced Storage Pooling, Thin Provisioning and Over-Subscription, to help us maximize the utilization of our 2TB of internal storage. Finally, SANmelody offers optional advanced features such as Snapshots, Asynchronous Replication and Synchronous Data Mirroring with Auto-Failover to secure our data. We'll use this latter feature to build High Availability beyond what you would expect from a traditional "Appliance" SAN Storage Array.
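If Thin Provisioning and Over-Subscription are new to you, the general idea fits in a few lines of Python. This is a toy model of the concept, not SANmelody's implementation: the class names, the chunk granularity and the in-memory "disk" are all assumptions I made for the sketch. Virtual volumes promise more chunks than the pool physically holds, and a physical chunk is only claimed the first time a virtual chunk is written.

```python
class PoolExhausted(Exception):
    """Raised when every physical chunk in the pool has been claimed."""


class StoragePool:
    """Toy pool of fixed-size physical chunks shared by thin volumes."""

    def __init__(self, physical_chunks: int):
        self.physical_chunks = physical_chunks
        self.allocated = 0
        self.store = {}      # physical chunk id -> payload
        self.volumes = []

    def create_volume(self, virtual_chunks: int) -> "ThinVolume":
        volume = ThinVolume(self, virtual_chunks)
        self.volumes.append(volume)
        return volume

    def claim_chunk(self) -> int:
        if self.allocated >= self.physical_chunks:
            raise PoolExhausted("pool is out of physical chunks")
        chunk_id = self.allocated
        self.allocated += 1
        return chunk_id

    def subscription_ratio(self) -> float:
        """Promised virtual capacity divided by real capacity (>1 is over-subscribed)."""
        promised = sum(v.virtual_chunks for v in self.volumes)
        return promised / self.physical_chunks


class ThinVolume:
    """Virtual volume that consumes pool space only for chunks actually written."""

    def __init__(self, pool: StoragePool, virtual_chunks: int):
        self.pool = pool
        self.virtual_chunks = virtual_chunks
        self.chunk_map = {}  # virtual chunk index -> physical chunk id

    def write(self, index: int, payload: bytes) -> None:
        if not 0 <= index < self.virtual_chunks:
            raise IndexError("chunk index outside the virtual volume")
        if index not in self.chunk_map:          # first write: allocate on demand
            self.chunk_map[index] = self.pool.claim_chunk()
        self.pool.store[self.chunk_map[index]] = payload

    def read(self, index: int) -> bytes:
        if not 0 <= index < self.virtual_chunks:
            raise IndexError("chunk index outside the virtual volume")
        physical = self.chunk_map.get(index)
        return self.pool.store[physical] if physical is not None else b""
```

With a 4-chunk pool and two 4-chunk volumes, the subscription ratio is 2.0, yet nothing is consumed until the hosts start writing. The catch, of course, is that you have to add physical capacity before the pool runs dry.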

IMPLEMENTING HIGH-AVAILABILITY

To address our High Availability requirement, we turn again to the DataCore SANmelody software. The traditional midrange storage array achieves High-Availability through redundancy: two storage processors share a common backplane over which they can mirror their data (i.e. Mirrored Write Caching). Most of the traditional arrays implement mirrored write caching over a traditional medium, such as a shared FC bus on the backplane. Both storage processors are attached to the physical spindles in a "dual-ported" fashion. While at first this appears to be an excellent means of protecting the data and keeping it available, the storage array itself is a single point of failure. I have heard numerous accounts first-hand from customers who have had downtime on their highly-available systems. A few have involved broken water pipes flooding an inadequately insulated data center. One involved a failed fibre channel drive in a StorageTek-badged array. The drive's failed behavior resulted in a LIP storm, which caused both storage processors to begin taking down RAID groups.

Such interruptions to production could be avoided by simply separating the dual storage processors, each with its own physical storage attached. Mirrored write caching could be implemented over a Fibre Channel or iSCSI interconnect. This is exactly what we propose to do using the SANmelody software. The SANmelody software has an optional "Auto Failover" feature that creates a partnership between two SANmelody servers. For any given virtual volume, a synchronously mirrored pair can be created, effectively a RAID-1 mirror. If any component fails (cable, HBA, SANmelody server, etc.), the application servers can fail over to the surviving SANmelody node and continue to access their data. The SANmelody failover feature is an intelligent implementation with characteristics similar to those one would find on a mid-range or Tier-1 system. However, SANmelody takes High-Availability to the extreme, separating the storage processors and physical spindles.

Synchronous Mirroring with iSCSI
  2 x $20 Gig-E Cards
+ 1 CAT-6 crossover
+ SANmelody Auto-Failover Option
= iSCSI Synchronous Mirrors

SANmelody's synchronous data mirroring is implemented on a virtual volume basis. This is to say that you are not required to mirror all of the storage -- you only mirror those volumes for which you want extreme availability.
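For readers who want to see the principle rather than take my word for it, here is a minimal Python sketch of a synchronously mirrored volume with failover. It models the general technique only. The class names are hypothetical, the "nodes" are in-memory dictionaries, and resynchronizing a node that comes back is deliberately left out; none of this is DataCore's code.

```python
class NodeDown(Exception):
    """Raised when a storage node (or the whole mirror) is unreachable."""


class StorageNode:
    """Stand-in for one storage server; blocks live in a dict."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.blocks = {}

    def write(self, lba: int, data: bytes) -> None:
        if not self.online:
            raise NodeDown(self.name)
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        if not self.online:
            raise NodeDown(self.name)
        return self.blocks[lba]


class MirroredVolume:
    """One virtual volume mirrored across two nodes.

    A write returns (the acknowledgement to the host) only after every
    reachable node has stored the block. Blocks written while a partner
    is down are remembered so they could be resynchronized later.
    """

    def __init__(self, primary: StorageNode, secondary: StorageNode):
        self.nodes = [primary, secondary]
        self.needs_resync = set()

    def write(self, lba: int, data: bytes) -> int:
        copies = 0
        for node in self.nodes:
            try:
                node.write(lba, data)
                copies += 1
            except NodeDown:
                self.needs_resync.add(lba)   # degraded: partner missed this write
        if copies == 0:
            raise NodeDown("no surviving mirror member")
        return copies

    def read(self, lba: int) -> bytes:
        for node in self.nodes:              # fail over to the survivor
            try:
                return node.read(lba)
            except NodeDown:
                continue
        raise NodeDown("no surviving mirror member")
```

Because the mirror is defined per volume, an unmirrored volume in this model is just a `StorageNode` used directly; only the volumes you wrap in `MirroredVolume` pay for the second copy.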

Of course, two SANmelody nodes also mean double the number of front-end ports and double the processing power.

We will create a second storage array using another, similarly configured PowerEdge 840. We will separate the two servers, either across the data center, across the building, or across campus, depending on the availability of real estate and fibre access. In my lab, I just have them plugged into separate circuits on either side of the room.

We will interconnect the two SANmelody storage arrays using a CAT-6 crossover cable between the two $20 TrendNet Gig-E cards I added, one in each server. This interconnect will serve as a mirror channel and will transport the iSCSI synchronous writes for implementing our mirrored write cache between the two storage processors.

Total cost of this Extreme Highly-Available iSCSI SAN? The two PE840 servers together price retail at about $4700 for 4TB raw. The software required to implement the synchronously mirrored, dual storage processor SAN will come in below $10K, including 24/7/365 support. Total cost around $14K.

STORAGE CAPACITY EXPANSION

To help us get an idea of the real cost of this solution, let's add a shelf of additional capacity to our SANmelody SAN servers. To begin with, we'll need to add a RAID controller to each of the Dell PE840s. We'll install a PERC 5/E SAS external RAID adapter in the PE840's 16 Lane PCI Express slot.

Here are the config details for the RAID controller and MD1000; note we've only populated the shelf with 7 x 500GB SATA-II drives, leaving us 8 free slots for future expansion. This gives us 3.5TB raw for under $8,000. Those 500GB 7200RPM SATA-II drives go for $374 each. Filling out the 8 free slots, we can add an additional 4.0TB raw for less than $3000.

PowerVault® MD1000
Unit Price: $7,993.00
Catalog Number: 4 BVCWKK1

Module                       Description
PowerVault MD1000            PowerVault MD1000 External Storage Array, SAS and SATA support
Enclosure Management Module  Two Enclosure Management Modules, PowerVault MD1000, SAS/SATA
Server RAID Controller       PERC 5/E SAS external RAID adapter, PCI-Express, for MD1000
Cables                       SAS cable, 1 meter, connects MD1000 to PERC 5/E or another MD1000
1st Hard Drive               500GB 7200 RPM SATA II Hard Drive, Universal Carrier
2nd Hard Drive               500GB 7200 RPM SATA II Hard Drive, Universal Carrier
3rd Hard Drive               500GB 7200 RPM SATA II Hard Drive, Universal Carrier
4th Hard Drive               500GB 7200 RPM SATA II Hard Drive, Universal Carrier
5th Hard Drive               500GB 7200 RPM SATA II Hard Drive, Universal Carrier
6th Hard Drive               500GB 7200 RPM SATA II Hard Drive, Universal Carrier
7th Hard Drive               500GB 7200 RPM SATA II Hard Drive, Universal Carrier
8th Hard Drive               Single Blank Hard Drive Filler
9th Hard Drive               Single Blank Hard Drive Filler
10th Hard Drive              Single Blank Hard Drive Filler
11th Hard Drive              Single Blank Hard Drive Filler
12th Hard Drive              Single Blank Hard Drive Filler
13th Hard Drive              Single Blank Hard Drive Filler
14th Hard Drive              Single Blank Hard Drive Filler
15th Hard Drive              Single Blank Hard Drive Filler
Hardware Support Services    3Yr BASIC NBD: L1 Hardware Queue, Next Business Day Onsite, M-F 8am-6pm
Rack Rails                   No Rails Included
Installation Services        PowerVault Installation Declined

As we are building this as an Extreme Highly-Available configuration, we'll need two of these to implement the synchronous data mirroring. I used the example with only 7 drives to illustrate that you can grow the solution incrementally. If we fill out each MD1000 tray with 15 x 500GB drives, that brings the price of the mirrored 15TB raw expansion to about $22,000. Not bad at all, especially compared to EqualLogic!
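If you'd like to check my arithmetic, or re-run it with today's drive prices, the expansion math can be sketched in a few lines of Python. The dollar figures come straight from the Dell quote above; the function names are mine, and the sketch assumes each extra drive simply adds $374 to the quoted 7-drive bundle.

```python
# Figures quoted in this paper (USD, Dell online store, February 2007).
SHELF_QUOTE = 7993       # MD1000 + PERC 5/E + SAS cable + 7 drives
DRIVES_IN_QUOTE = 7
DRIVE_PRICE = 374        # one 500GB 7200RPM SATA-II drive
DRIVE_TB = 0.5
SHELF_SLOTS = 15


def shelf_price(drives: int) -> int:
    """Price of one MD1000 bundle holding `drives` disks (assumes linear drive pricing)."""
    if not DRIVES_IN_QUOTE <= drives <= SHELF_SLOTS:
        raise ValueError("this sketch only covers 7 to 15 drives")
    return SHELF_QUOTE + (drives - DRIVES_IN_QUOTE) * DRIVE_PRICE


def raw_tb(drives: int) -> float:
    """Raw capacity of one shelf in TB."""
    return drives * DRIVE_TB


if __name__ == "__main__":
    for drives in (7, 15):
        one = shelf_price(drives)
        print(f"{drives:2d} drives: ${one:,} per shelf, ${2 * one:,} mirrored, "
              f"{2 * raw_tb(drives):.1f}TB raw across both shelves")
```

A fully populated shelf works out to $10,985, so the mirrored pair lands at $21,970 for 15TB raw across the two shelves, in line with the $22,000 figure.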

STARTING OUT SIMPLE

Of course, if you aren't interested in building the Extreme Highly-Available system with Synchronous Data Mirroring and total redundancy, you can slash the price by more than half with only one PE840 and a single-node SANmelody license. The price will be less than $4000 for the simple 2TB iSCSI SAN solution. Adding the MD1000 with 7 drives (3.5TB raw) for additional capacity brings the total in under $13,000. If you want a fully populated MD1000, yielding a total of 9.5TB, that brings your cost to around $16,000. This single-head solution will be as rock solid as any single-head PS50e.

If your EqualLogic dealer complains the solution I've built only has 2 iSCSI ports, remind him that there is room for expansion. You've got three available slots (PCI, PCI-X and PCIe 1x slot) ready to take additional Gig-E NICs.

When you're ready to upgrade to the fully redundant, Extreme Highly-Available system, it will be a question of adding the additional PE840, connecting a mirror channel (in our example above, we were using a crossover cable), and upgrading your SANmelody license to include the "Auto-Failover" option, which implements the Synchronous Data Mirroring. The whole procedure can be accomplished without downtime, without disrupting production.

WORLD-CLASS CUSTOMER SERVICE

DataCore Software Corporation has invested a lot of money and resources to assure its Customer Service organization is second to none. The Customer Support Engineers are all DataCore employees with extensive SAN and OS experience — they won't balk at your ESX questions; they won't lead off your incident with questions like "Is it plugged in?". Support is 24/7/365, "follow the sun".

DataCore is a member of TSANet (www.tsanet.org), to assure that any inter-vendor issues are quickly resolved without finger-pointing. DataCore can open an incident on your behalf with any TSANet members in order to jointly resolve your problem. TSANet membership is extensive, and includes all the major vendors: Microsoft, VMware, IBM, HP, Dell, EMC, Intel, Apple, Novell, Sun, Veritas, etc.[5] A complete member list is available from TSANet (www.tsanet.org).

NEXT STEPS

Las Solanas Consulting is not a DataCore or Dell reseller, and DataCore does not publish their prices on the web. Please contact DataCore™ or a DataCore reseller for pricing information.

If you want to take a test drive of my recommended SANmelody configuration, contact Las Solanas Consulting or DataCore Software Corporation directly. Optionally, you can download a free, no-obligation 30-day evaluation. The evaluation software includes iSCSI and FC support, as well as support for their unique Thin Provisioning feature — however, it does not have support for the Synchronous Mirroring feature. Nonetheless, you'll see for yourself how SANmelody out-performs the traditional storage vendor's SANs or Storage Arrays.

Attend A Live Online Demo of SANmelody

Want to see a live demo of SANmelody? Every Monday at 2pm Eastern, DataCore hosts a no-obligation, no-hassle online demo of the SANmelody product. The author of this paper is frequently one of the presenters. The presentation shows the various product features in the form of a demo of the installed product. Following the demo, the presenter opens the floor to questions.

[FOOT NOTES]

[1] Visionary Edson De Castro left a successful career at Digital Equipment to found Data General in order to pursue his idea of building computers by integrating commodity components. The Nova was Data General's first Mini Computer and was based on the commodity components of the day.

[2] EMC purchased Data General in 1999, acquiring their successful Clariion business for mere peanuts.

[3] The DMX Fibre Channel Directors indeed have two PowerPC chips. The first was a 603 chip, I believe... not even a G4 or G5. But hey, it's better than the Motorola 68060s on the older Symmetrix boards. You need performance? Don't buy a DMX...

[4] No joke! The Clariion CX Series runs on Windows Embedded.

[5] TSANet's member list includes several of DataCore's competitors. Curious why those same competitors never mention TSANet in their sales calls.