Las Solanas Consulting

Should I defragment SAN-based volumes?

A White Paper by Tim Warden

This is a complicated question that cannot be answered with a simple yes or no. The objective of volume defragmentation is to improve I/O performance principally by reducing seek latency on an inherently slow physical medium: the disk drive or "spindle".

In a single physical disk environment, such as the hard drive on your laptop, occasional volume defragmentation can significantly improve performance, as most of us have observed. Reading a large file or document from disk is much faster when the file is stored on consecutive blocks. If the file is heavily fragmented, the actuator assembly (actuator, arms, heads, etc.) of the physical drive may have to "travel" some distance and wait for the disk platter to rotate into position before the read can continue. Defragmentation in this environment allows reading an entire file without "thrashing the disk". The blocks are contiguous, and so seek and rotational latency are minimized.
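
To put rough numbers on this, here is a minimal back-of-the-envelope sketch. The function name and the default figures (9 ms average seek, 4.2 ms average rotational latency, 60 MB/s sustained transfer) are illustrative assumptions, not measurements of any particular drive; the model simply charges one seek plus one rotational delay per fragment.

```python
def read_time_ms(file_mb, fragments, seek_ms=9.0, rotation_ms=4.2, transfer_mb_s=60.0):
    """Estimate the time to read a file stored as `fragments` separate extents.

    Assumed model: each extent costs one seek plus one rotational delay,
    and the data itself streams at the sustained transfer rate.
    """
    if fragments < 1:
        raise ValueError("a file occupies at least one extent")
    positioning_ms = fragments * (seek_ms + rotation_ms)
    transfer_ms = file_mb / transfer_mb_s * 1000.0
    return positioning_ms + transfer_ms


# A 100 MB file, first contiguous, then split into 500 fragments.
contiguous = read_time_ms(100, fragments=1)
fragmented = read_time_ms(100, fragments=500)
print(f"contiguous: {contiguous:.0f} ms, fragmented: {fragmented:.0f} ms")
```

Under these assumed figures the fragmented read spends far more time positioning the heads than transferring data, which is the effect defragmentation removes on a single spindle.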

In a Storage Area Network (SAN) or Shared Storage Array, the physical disk or spindle has been virtualized; your server volumes are based on logical entities or "LUNs" that often have no direct relationship with the geometry of a particular physical spindle. We must therefore take into account many other factors affecting volume performance in light of defragmenting.

To begin with, consider the nature of a Shared Storage Array. Generally speaking, its purpose is to offer a set of disk services to consolidate and enhance data storage. The array is typically shared by several storage clients (i.e. Application Servers) that may all be concurrently vying for the shared resources. These resources include:

  • Network bandwidth (Fibre Channel or IP/iSCSI Cables and Switches)
  • Storage Array Front-End bandwidth (the Fibre or iSCSI ports on the array)
  • Storage Processor CPU cycles
  • Storage Processor cache
  • RAID controller cycles and cache
  • Bandwidth of Back-End physical disk connections
  • Physical disks
  • Synchronous Mirroring and/or Asynchronous Replication Bandwidth

How these resources are configured and how the disk space is allocated all influence the decision on whether to defragment. Let us look at defragmenting from two points of view: the process of defragmentation itself, and the end result of a defragmented logical volume.

Defragmenting is Very I/O Intensive

Anyone who has defragmented a laptop or desktop computer's internal hard disk knows that defragmentation is extremely disk intensive and can disrupt other concurrent processes.

This is exacerbated in a shared storage environment. The storage processor begins allotting the shared resources (cache, channels, RAID controller activity) to the heavy I/O load generated by the incessant reads and writes of the defragmentation process. Additionally, if the volume you are attempting to defragment is a partition on a RAID group, other servers owning volumes attributed to other partitions on that same RAID group will all suffer seriously degraded performance.

Some defragmentation programs quiesce or throttle their activity when they sense other I/O on the server, but alas, they can only listen for I/O activity on the server on which they are running. They have no visibility into I/O activity on other servers in the SAN.

If the volume you are defragmenting is on a RAID-5 group, performance may be further degraded, depending on whether the storage processor implements write cache coalescing and on the size of the defragmentation program's writes. The size of a RAID-5 stripe is, among other things, dictated by the number of disks in the stripe. If the defragmenter's writes are smaller than the stripe, the blocks at the write destination and the corresponding parity block must first be read from their physical disks into cache, the parity recalculated, and those same disk blocks re-written.

Worse still, if these small writes cross RAID-5 stripe boundaries, the result will be two stripes' worth of reads, parity calculations, and writes. (For a detailed discussion of RAID concepts, consult the Wikipedia article on RAID.)
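
The read-modify-write penalty can be sketched with a simplified counting model. Everything here is an assumption for illustration: the function name, a group of 4 data disks plus parity, 64 KB chunks, no write cache coalescing, and a controller that reads the old data and old parity for any partial-stripe write. Real arrays differ in the details.

```python
def raid5_write_ios(offset_kb, length_kb, data_disks=4, chunk_kb=64):
    """Return (stripes touched, disk reads, disk writes) for one host write.

    Simplified model: a full-stripe write needs no reads because parity is
    computed from the new data; a partial write reads the old data chunks
    and the old parity, then rewrites those same chunks and the parity.
    """
    if length_kb <= 0:
        raise ValueError("length must be positive")
    stripe_kb = data_disks * chunk_kb
    end_kb = offset_kb + length_kb
    first = offset_kb // stripe_kb
    last = (end_kb - 1) // stripe_kb
    reads = writes = 0
    for stripe in range(first, last + 1):
        start = stripe * stripe_kb
        lo = max(offset_kb, start)
        hi = min(end_kb, start + stripe_kb)
        if hi - lo == stripe_kb:
            writes += data_disks + 1           # new data chunks + new parity
        else:
            chunks = (hi - 1) // chunk_kb - lo // chunk_kb + 1
            reads += chunks + 1                # old data chunks + old parity
            writes += chunks + 1               # new data chunks + new parity
    return last - first + 1, reads, writes


full_stripe = raid5_write_ios(0, 256)      # (1, 0, 5)
small_write = raid5_write_ios(0, 16)       # (1, 2, 2)
crossing = raid5_write_ios(248, 16)        # (2, 4, 4)
```

In this model the same 16 KB write costs twice as many disk operations when it happens to straddle a stripe boundary, and a defragmenter issuing many such small writes multiplies that cost.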

Of course, if the volume you are defragmenting is small, the immediate impact on performance will be temporary. However, if the target volume is over 10GB, the time required to complete defragmentation can be disruptive to a production environment, having an adverse effect on other critical, I/O-intensive servers. Your only option is to defragment during "off hours", if your organization has such a concept in this 24/7 world.

Does Defragmenting Always Improve Performance On Logical Volumes?

Not necessarily. While defragmentation will certainly reduce File System processing for a given volume, it may not improve the responsiveness of retrieving the data from the spindles. In fact, when multiple volumes are placed on partitions sharing a particular RAID group of spindles, defragmentation may actually degrade overall performance of the group of volumes, depending on how thorough the defragmentation process was. Depending on the RAID type employed, defragmented volumes may actually increase the distance the heads of the individual spindles have to travel (seek latency and subsequent rotational latency) to respond to various-sized concurrent read and write requests for the multiple volumes. It is difficult to draw a general conclusion as to whether regrouping sectors of a logical volume into contiguous "logical" blocks on a RAID group will have a positive or adverse effect on performance.

Advocates of defragmenting will point out that some defrag programs optimize the file system organization to improve performance. While their assertion has merit in terms of the number of I/O requests of meta-data potentially required to satisfy a read or write, the impact of a file system reorg in a SAN shared storage environment is again difficult to measure as defragmenting could actually result in some frequently accessed blocks being placed on slower sectors of the RAID group's spindles — which could override any anticipated gains of the File System optimization. Remember that in a shared storage environment the logical blocks are often scattered across the physical spindles in a RAID group and not necessarily on the "fastest" sectors — unless you have gone to great lengths to isolate your most speed-critical volumes to known fast areas (e.g., the "Hyper-0" sectors on a Symmetrix DMX). Unfortunately, most shops don't have the time or resources to micro-manage or plan their SAN storage at that level.

Defragmenting Not Recommended for Virtual Storage Pools

In Virtual Storage Pools, the physical storage and/or logical volumes of RAID groups are completely abstracted from the virtual volumes mapped to storage clients. For instance, consider a Windows server in a SAN. In the Logical Disk Manager, a Virtual Volume may show up as, say, Disk 2. Format that volume with an NTFS file system. Write several gigabytes worth of data onto it. If Disk 2 is a volume coming from a Virtual Storage Pool based on several RAID-5 stripes, the Disk 2 data may in fact be spread across multiple disks on multiple RAID groups. Storage Allocation Units are assigned to specific groups of contiguous sectors for a volume on an as-needed (e.g. as written) basis, and may be distributed across the storage pool members on a round-robin or striped basis.

Virtual Storage Pools that employ striping often yield outstanding performance results, similar to striping used in RAID-10, RAID-50, or RAID-100 type configurations. However, defragmentation of a logical volume becomes meaningless because there is no linear sector to sector, block to block correlation between the logical and physical disks. Defragmenting will typically only result in additional latency as the "logically contiguous" blocks become even more scattered across the pool.
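
A toy model makes the lack of correlation concrete. The class below is a hypothetical sketch, not any vendor's allocator: it assumes allocation units are handed out on first write, round-robin across pool members, and the names (VirtualPool, locate, the member labels) are invented for the example.

```python
class VirtualPool:
    """Toy virtual storage pool: allocation units assigned on first touch."""

    def __init__(self, members, sau_blocks):
        self.members = members
        self.sau_blocks = sau_blocks                 # blocks per allocation unit
        self.next_member = 0                         # round-robin cursor
        self.next_free = {m: 0 for m in members}     # next unused slot per member
        self.table = {}   # (volume, logical unit index) -> (member, slot)

    def locate(self, volume, lba):
        """Map a logical block to (member, physical block), allocating if new."""
        key = (volume, lba // self.sau_blocks)
        if key not in self.table:
            member = self.members[self.next_member]
            self.next_member = (self.next_member + 1) % len(self.members)
            self.table[key] = (member, self.next_free[member])
            self.next_free[member] += 1
        member, slot = self.table[key]
        return member, slot * self.sau_blocks + lba % self.sau_blocks


pool = VirtualPool(["raid5-A", "raid5-B", "raid5-C"], sau_blocks=4)
# Twelve logically contiguous blocks of "Disk 2" land on three RAID groups.
placement = [pool.locate("Disk 2", lba) for lba in range(0, 12, 4)]
# Another server's write takes the next unit, so Disk 2's next unit
# is no longer physically adjacent to its previous ones.
other = pool.locate("Disk 3", 0)
later = pool.locate("Disk 2", 12)
```

Because placement depends on the order in which servers first write to the pool, "contiguous" at the file system level says little about where the blocks sit on the spindles.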

Finally, defragmenting a Thin-Provisioned or Sparse Volume has the undesirable effect of causing unnecessary allocation of storage from the pool.
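
The allocation effect can be illustrated with a few lines of arithmetic. This sketch assumes a thin volume that backs an allocation unit the first time any block inside it is written, and that never reclaims units whose blocks the file system later frees; the unit size and block numbers are made up.

```python
def allocated_units(written_blocks, unit_blocks):
    """Allocation units a thin volume must back, given every block ever written."""
    return {block // unit_blocks for block in written_blocks}


UNIT = 8  # blocks per allocation unit (illustrative)

# A fragmented file: 8 blocks scattered over allocation units 0 and 1.
before = {0, 3, 5, 6, 9, 11, 12, 15}
# The defragmenter copies them into a contiguous free run at blocks 32-39.
moved = set(range(32, 40))

units_before = allocated_units(before, UNIT)
units_after = allocated_units(before | moved, UNIT)
print(len(units_before), "units before,", len(units_after), "units after")
```

The file holds the same amount of data afterwards, yet the pool now backs an extra unit, because the vacated blocks are free only in the file system's view.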

Defragmenting Not Recommended for Replicated Volumes

If your volumes are the source of a Synchronous Data Mirror — or worse still, Asynchronous Data Replication — then you should definitely avoid defragmentation. Each write associated with the defrag process will result in corresponding mirror writes to the destination or replication site. This can easily consume the bandwidth of the interconnect. In the case of Asynch Replication over a slower link (T1, etc.), it can break the model, potentially causing a near or full resynchronization of the entire volume across the slow link (cf. "Internet Tubes").
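
A quick estimate shows why a slow link cannot absorb defragmentation traffic. The sketch assumes every block the defragmenter moves must cross the link once, takes a T1 as 1.544 Mbit/s, and uses a guessed 80% link efficiency for protocol overhead; the function name and the 10 GB figure are illustrative.

```python
def replication_hours(moved_gb, link_mbit_s, efficiency=0.8):
    """Hours needed to ship `moved_gb` of rewritten data over a replication link."""
    bits = moved_gb * 8 * 1000**3
    usable_bits_per_s = link_mbit_s * 1e6 * efficiency
    return bits / usable_bits_per_s / 3600


# A defrag pass that relocates 10 GB on a volume replicated over a T1.
t1_hours = replication_hours(10, 1.544)
print(f"about {t1_hours:.0f} hours of link time")
```

That is link time consumed by writes that changed no application data, on top of whatever the production workload needs to replicate.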

Defragmenting Not Recommended for Snapshot Source Volumes

Defragmenting a volume which has active snapshots associated will have the adverse effect of causing the snapshot manager to save the original blocks as the blocks are reorganized on the source volume. In the common "Copy On Write" implementations, this can create an additional performance lag.

In those implementations that simply write the new blocks into "reserve space" and re-direct pointers (such as in NetApp's Snapshot implementation), the defragmentation process can easily eat up all the reserve space in the volume.
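
The Copy On Write case can be sketched in a few lines. This is a deliberately simplified, hypothetical snapshot manager (the class and method names are invented): it works at the block level, so it cannot tell that a defragmenter is only relocating data, nor that a destination block is free in the file system's view.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a list of blocks."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}    # block number -> contents at snapshot time
        self.copies = 0    # extra copy operations caused by the snapshot

    def write(self, block, data):
        if block not in self.saved:
            # First overwrite since the snapshot: preserve the original.
            self.saved[block] = self.volume[block]
            self.copies += 1
        self.volume[block] = data

    def read_snapshot(self, block):
        return self.saved.get(block, self.volume[block])


original = ["a0", "", "a1", "", "a2", "", "a3", ""]   # "" marks a free block
volume = list(original)
snap = CowSnapshot(volume)

# Defragment file "a": move its pieces into blocks 1, 2 and 3.
for src, dst in [(2, 1), (4, 2), (6, 3)]:
    snap.write(dst, volume[src])

snapshot_view = [snap.read_snapshot(b) for b in range(len(volume))]
```

Here three relocations trigger three preserve-the-original copies, even though no file content changed and two of the overwritten blocks held nothing of value.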

Defragmenting Not Recommended for CDP Volumes

If you've implemented a CDP (Continuous Data Protection) system, you'll want to avoid defragmenting those volumes, as the procedure will quickly eat up your CDP journal space. Hopefully, you will have chosen to implement CDP at the storage controller level and not as an agent on your server. However, if you are using a host-based CDP agent that runs at the volume level (but not necessarily the file level), you will further impair performance on the server as each defrag write will correspond to two physical writes: one to the disk, and another agent write to the CDP server.

Alternatives For Improving Performance

If poor SAN I/O performance is causing you to study articles on disk defragmentation, you might want to consider looking for other ways to reduce I/O latency.

Use Intelligent Caching Storage Processors

Caching is a common and effective means of improving performance for just about any application. Caching is used ubiquitously from the CPU (with 3 levels of cache), to the RAM in your application servers, to the expensive cache on your storage processors. The idea is always to avoid accessing data over slow media, whether that be the slow spindles, the 1, 2, or 4Gb fibres, the slow PCI, PCI-X or PCI Express busses, the slow 667MHz memory busses or the slow L3 and L2 CPU cache. Slow is relative, with L2 being the fastest, and the spindles being the slowest.

Most spindles give I/O response times in thousands of microseconds. The typical storage processor gives I/O response times in hundreds of microseconds on a cache hit (e.g. where the data you are looking for is already in the storage processor's memory).
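
The benefit of cache is easy to estimate with a weighted average. The sketch below assumes a simple two-level model (a request is served either from cache or from the spindle) and uses 300 and 8000 microseconds as stand-ins for "hundreds" and "thousands" of microseconds; these defaults are illustrative, not vendor figures.

```python
def effective_latency_us(hit_ratio, cache_us=300.0, disk_us=8000.0):
    """Average I/O response time for a given cache hit ratio (two-level model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_us + (1.0 - hit_ratio) * disk_us


for ratio in (0.0, 0.5, 0.9):
    print(f"hit ratio {ratio:.0%}: {effective_latency_us(ratio):.0f} us")
```

In this model, raising the hit ratio moves the average response time far more than any plausible reduction in seek distance would.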

One Storage Virtualization vendor, DataCore Software Corporation, measures their I/O response times in tens of microseconds. This is a result of their sophisticated caching engine combined with a real-time driven I/O Subsystem. The fact that the software runs on common x86 servers is also a contributing factor, given Time-To-Market considerations, not to mention that you have a choice in what hardware you use to build the Storage Processor on.

Placing their SANsymphony product in front of your existing SAN Storage Arrays will have the effect of turbo-charging performance for a variety of reasons:

  • Additional layer of high performance adaptive cache (e.g. server RAM) that will change the character of both front-end and back-end I/Os on your existing SAN storage array, reducing contention.
  • Real-Time I/O subsystem uses I/O polling instead of interrupt handlers to complete I/Os
  • Takes advantage of new technologies your existing SAN doesn't have: PCI-Express, 4Gbps Fibre Channel, 667MHz memory, etc. For example, SANsymphony can talk 4Gb FC between your application servers and its cache, and on the backend talk 2Gb FC between its cache and your older SAN storage arrays.
  • Allows channel fan-out, effectively creating a Network Storage Processor that has more FC and/or iSCSI front-end ports than your existing SAN storage array.
  • Can be used to implement higher performance nested RAID levels on top of your existing SAN storage array or arrays: RAID 10, RAID 100, RAID 10+1, RAID 0+1, RAID 50+1, etc.

Distribute the Work Load Across Storage Processors and Channels

Consult your SAN Storage Arrays' utilities to determine if you have adequately distributed (or "Load Balanced") the I/O load across your HBAs, fibres, switches, and storage processors.

Choose Optimized RAID levels for performance-critical volumes

Depending on the characteristics of your I/Os, optimizing RAID levels and / or re-organizing the physical layout of your LUNs can improve performance. If, for example, your database tables and logs are located on the same RAID 5 spindles and you are having performance issues, you should consider moving the logs onto another set of spindles, perhaps in a RAID 1 or RAID 10 configuration. Keeping small transaction write-intensive volumes off RAID-5 will definitely improve performance.

While attempting to avoid contention for the spindles, you should also consult your SAN Storage Arrays' documentation on how the spindles are physically connected to the storage processor. Keep in mind that in a dual-storage processor array, the disks are dual-ported. Laying out the spindles of different RAID groups across different back-end channels and keeping performance-critical volumes on separate channels and spindles can reduce contention on the back end resources.

Use Storage Virtualization to Implement QoS in your SAN Environment

DataCore Software Corporation offers a unique QoS or Quality of Service feature in their SANsymphony product. SANsymphony implements a Network Storage Processor at the heart of the SAN, caching I/Os and abstracting the storage from the servers. The QoS feature allows you to create performance groups, where the group members are mappings of volumes to channels (a la LUN masking). These Performance Groups (known as QoS Domains) can be I/O throttled by both IOPS and MB/s, allowing you to reserve SAN resources for your most critical applications.

For instance, you can place snapshot mappings into a MB/s-restricted "Backup Domain" to ensure that the backup server doesn't cause undue storage array thrashing or use up switch / fibre bandwidth if run during production hours.
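
To illustrate what throttling by both IOPS and MB/s means in general, here is a generic token-bucket sketch. It is not SANsymphony's implementation, which this paper does not describe; the class names, the one-second burst allowance, and the limits are all assumptions made for the example.

```python
class TokenBucket:
    """Generic rate limiter: refills at `rate_per_s`, holds at most one second's worth."""

    def __init__(self, rate_per_s):
        self.rate = rate_per_s
        self.tokens = rate_per_s
        self.last = 0.0

    def refill(self, now):
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now


class QosDomain:
    """Hypothetical performance group limited by both IOPS and MB/s."""

    def __init__(self, max_iops, max_mb_s):
        self.iops = TokenBucket(max_iops)
        self.mb = TokenBucket(max_mb_s)

    def admit(self, io_mb, now):
        """Admit one I/O of `io_mb` megabytes at time `now` (seconds), or refuse it."""
        self.iops.refill(now)
        self.mb.refill(now)
        # Check both limits before consuming, so a refused I/O costs nothing.
        if self.iops.tokens >= 1 and self.mb.tokens >= io_mb:
            self.iops.tokens -= 1
            self.mb.tokens -= io_mb
            return True
        return False


backup = QosDomain(max_iops=100, max_mb_s=20)
# The backup server bursts thirty 1 MB reads at t=0, then fifteen more at t=0.5s.
admitted_t0 = sum(backup.admit(1, now=0.0) for _ in range(30))
admitted_t05 = sum(backup.admit(1, now=0.5) for _ in range(15))
```

In this run the MB/s limit, not the IOPS limit, is what holds the backup traffic back, leaving the remaining bandwidth for production servers.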

Use Tiered Virtual Storage Pools

Tiering your storage through virtualization can allow you to move less critical volumes off onto lower cost, lower performance storage, freeing up your higher performance, higher cost resources to focus on your most critical applications. For instance, rather than implement snapshots on your Tier-1 storage, using a Storage Virtualization platform such as SANsymphony can allow you to take snapshots of your Tier-1 volumes on another storage platform. Thus you recover expensive Tier-1 storage space otherwise reserved for snapshots, and the Tier-1 Array is no longer involved in the Snapshot processing.

Optimize Your Application Server's I/O Subsystem

Quite often, the performance bottleneck occurs in the I/O Subsystem of the Application Server (or storage client). If your Windows-based application servers are I/O bound, you should consider trying a utility from DataCore Software Corporation called "UpTempo". This software package replaces the standard Windows I/O Subsystem with DataCore's unique caching and high-performance I/O engine. Several benchmarks available on the DataCore site indicate the product can improve I/O response times by a factor of 3 to 4x, but of course, YMMV.

UpTempo's interface is a simple Windows Control Panel and allows you to select the volumes (via "Drive Letters") you want UpTempo to cache. You can choose Read Caching (the default) or Read/Write Caching. You can also tune the amount of memory assigned to UpTempo cache via a slider control. (Hint: Be conservative and leave the slider low (256MB or more) to start; a common error when first evaluating the product is to set the slider unnecessarily high, turning all the server's RAM into UpTempo I/O cache, leaving little for the applications themselves, and thereby causing excessive paging.)

The good news is you can evaluate UpTempo for free. DataCore offers a free 30-day evaluation of the product, available for download from their website.
