Should I defragment SAN-based volumes?
A White Paper by Tim Warden
This is a complicated question that cannot be answered with a simple yes or no.
The objective of volume defragmentation is to improve I/O performance principally
by reducing seek latency on an inherently slow physical medium: the disk drive
or "spindle".
In a single physical disk environment, such as the hard drive on your laptop,
occasional volume defragmentation can significantly improve performance, as most
of us have observed. Reading a large file or document from disk is much faster
when the file is stored on consecutive blocks. If the file is heavily fragmented,
the actuator assembly (actuator, arms, heads, etc.) of the physical drive may have
to "travel" some distance and wait for the disk platter to rotate into position
before the read can continue. Defragmentation in this environment allows reading
an entire file without "thrashing the disk". The blocks are contiguous, and so
seek and rotational latency are minimized.
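As a rough illustration of the point above, the following sketch adds up how far the heads would travel when a file's blocks are read in order. It is a toy model under stated assumptions: block numbers stand in for physical positions on a single disk, rotational latency is ignored, and the function name and sample layouts are invented for this example.

```python
def head_travel(block_positions):
    """Total distance the heads move when reading blocks in file order."""
    return sum(abs(b - a) for a, b in zip(block_positions, block_positions[1:]))


# Hypothetical layouts of the same five-block file.
contiguous = [100, 101, 102, 103, 104]
fragmented = [100, 9000, 250, 7000, 400]

print(head_travel(contiguous))
print(head_travel(fragmented))
```

On a single spindle the block number is a fair proxy for head position, which is exactly the assumption that stops holding once the volume is virtualized.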
In a Storage Area Network (SAN) or Shared Storage Array, the physical disk or
spindle has been virtualized; your server volumes are based on logical entities or
"LUNs" that often have no direct relationship with the geometry of a particular
physical spindle. We must therefore take into account many other factors
that affect volume performance when deciding whether to defragment.
To begin with, consider the nature of a Shared Storage Array. Generally
speaking, its purpose is to offer a set of disk services to consolidate and enhance
data storage. The array is typically shared by several storage clients
(i.e. Application Servers) that may all be concurrently vying for the shared
resources. These resources include:
- Network bandwidth (Fibre Channel or IP/iSCSI Cables and Switches)
- Storage Array Front-End bandwidth (the Fibre or iSCSI ports on the array)
- Storage Processor CPU cycles
- Storage Processor cache
- RAID controller cycles and cache
- Bandwidth of Back-End physical disk connections
- Physical disks
- Synchronous Mirroring and/or Asynchronous Replication Bandwidth
How these resources are configured and how the disk space is allocated
all influence the decision on whether to defragment. Let us look at
defragmenting from two points of view: the process of defragmentation
itself, and the end result of a defragmented logical volume.
Defragmenting is Very I/O Intensive
Anyone who has defragmented a laptop or desktop computer's internal
hard disk knows that defragmentation is extremely disk intensive and
can disrupt other concurrent processes.
This is exacerbated in a shared storage environment. The storage
processor begins allotting the shared resources (cache, channels, RAID
controller activity) to the heavy I/O load generated by the incessant
reads and writes of the defragmentation process. Additionally, if the
volume you are attempting to defragment is a partition on a RAID
group, other servers owning volumes attributed to other partitions
on that same RAID group will all suffer seriously degraded performance.
Some defragmentation programs quiesce or throttle their activity
when they sense other I/O on the server, but alas, they can only
listen for I/O activity on the server on which they are running.
They have no visibility on I/O activity on other servers in the
SAN.
If the volume you are defragmenting is on a RAID-5 group,
performance is further degraded depending on whether the storage
processor implements write cache coalescing and the size of the
defragment program's writes. The size of a RAID-5 stripe is,
among other things, dictated by the number of disks in the stripe.
If the defragmenter's writes are smaller than the stripe, those
physical disks containing the write destination blocks as well
as the disk containing the parity information will need to be
read into cache, the parity recalculated, and those same disk
blocks re-written.
Worse still, if these small writes cross RAID-5 stripe boundaries,
the result will be two stripes worth of reads, parity calculations,
and writes. (For a detailed discussion of RAID concepts, consult the RAID
tutorial on Wikipedia.)
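The RAID-5 write penalty described above can be sketched as a simple count of disk operations. This is a minimal model, assuming the controller always uses read-modify-write for partial-stripe writes and performs no write cache coalescing; the function name, the 64 KiB chunk size, and the four data disks are assumptions chosen for illustration, not values from any particular array.

```python
def raid5_write_ios(offset, length, chunk_size, data_disks):
    """Count (reads, writes) on the disks for one host write.

    Assumed model: a full-stripe write needs no reads; a partial-stripe
    write reads and rewrites each touched data chunk plus the parity chunk.
    """
    stripe_size = chunk_size * data_disks
    end = offset + length
    reads = writes = 0
    stripe = offset // stripe_size
    while stripe * stripe_size < end:
        s_start = stripe * stripe_size
        s_end = s_start + stripe_size
        lo, hi = max(offset, s_start), min(end, s_end)
        if lo == s_start and hi == s_end:
            # Full stripe: parity is computed from the new data alone.
            writes += data_disks + 1
        else:
            first = (lo - s_start) // chunk_size
            last = (hi - 1 - s_start) // chunk_size
            touched = last - first + 1
            reads += touched + 1    # old data chunks + old parity
            writes += touched + 1   # new data chunks + new parity
        stripe += 1
    return reads, writes


KIB = 1024
CHUNK, DISKS = 64 * KIB, 4          # hypothetical geometry: 256 KiB stripe

full_stripe = raid5_write_ios(0, 256 * KIB, CHUNK, DISKS)
small_write = raid5_write_ios(0, 4 * KIB, CHUNK, DISKS)
straddling = raid5_write_ios(252 * KIB, 8 * KIB, CHUNK, DISKS)
print(full_stripe, small_write, straddling)
```

In this model the 8 KiB write that straddles a stripe boundary costs twice the disk operations of the 4 KiB write that stays inside one stripe, while the full-stripe write needs no reads at all.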
Of course, if the volume you are defragmenting is small, the
immediate impact on performance will be temporary. However, if the
target volume is over 10GB, the time required to complete defragmentation
can be disruptive to a production environment, having an
adverse effect on other critical, I/O intensive servers. Your only
option is to defragment during "off hours" if your organization has
such a concept in this 24/7 world.
Does Defragmenting Always Improve Performance On Logical Volumes?
Not necessarily. While defragmentation will certainly reduce File System processing
for a given volume, it may not improve responsiveness of the retrieving of the data
from the spindles. In fact, when multiple volumes are placed on partitions
sharing a particular RAID group of spindles, defragmentation may actually
degrade overall performance of the group of volumes, depending on how
thorough the defragmentation process was. Depending on the RAID type
employed, defragmented volumes may actually increase the distance the heads
of the individual spindles have to travel (seek latency and subsequent
rotational latency) to respond to various-sized concurrent read and write
requests for the multiple volumes. It is difficult to draw a general conclusion
as to whether regrouping sectors of a logical volume into contiguous "logical"
blocks on a RAID group will have a positive or adverse effect
on performance.
Advocates of defragmenting will point out that some defrag
programs optimize the file system organization to improve performance.
While their assertion has merit in terms of the number of I/O requests of
meta-data potentially required to satisfy a read or write, the impact of
a file system reorg in a SAN shared storage environment is again
difficult to measure as defragmenting could actually result in some
frequently accessed blocks being placed on slower sectors of the RAID group's
spindles which could override any anticipated gains of the File
System optimization. Remember that in a shared storage environment the
logical blocks are often scattered across the physical spindles in a RAID
group and not necessarily on the "fastest" sectors unless you have
gone to great lengths to isolate your most speed-critical volumes to known
fast areas (e.g., the "Hyper-0" sectors on a Symmetrix DMX). Unfortunately,
most shops don't have the time or resources to micro-manage or plan their
SAN storage at that level.
Defragmenting Not Recommended for Virtual Storage Pools
In Virtual Storage Pools, the physical storage and/or logical volumes
of RAID groups are completely abstracted from the virtual volumes
mapped to storage clients. For instance, consider a Windows server
in a SAN. In the Logical Disk Manager, a Virtual Volume may show
up as, say, Disk 2. Format that volume with an NTFS file system.
Write several gigabytes worth of data onto it. If Disk 2 is a
volume coming from a Virtual Storage Pool based on several
RAID-5 stripes, the Disk 2 data may in fact be spread across multiple
disks on multiple RAID groups. Storage Allocation Units are
assigned to specific groups of contiguous sectors for a volume
on an as-needed (i.e. as-written) basis, and may be distributed across
the storage pool members on a round-robin or striped basis.
Virtual Storage Pools that employ striping often yield outstanding
performance results, similar to striping used in RAID-10, RAID-50, or
RAID-100 type configurations. However, defragmentation of a logical
volume becomes meaningless because there is no linear sector-to-sector,
block-to-block correlation between the logical and physical
disks. Defragmenting will typically only result in additional latency
as the "logically contiguous" blocks become even more scattered across
the pool.
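To make the lack of correlation concrete, here is a small sketch of a pool that hands out allocation units on first write, round-robin across its members. It is a simplified model and not any vendor's allocator: the class name, the three-member pool, and the write order are all assumptions made for illustration.

```python
class StoragePool:
    """Hypothetical pool: allocation units are assigned on first write,
    round-robin across the pool members."""

    def __init__(self, members):
        self.members = members
        self.next_slot = 0
        self.mapping = {}   # logical unit -> (pool member, slot on that member)

    def locate(self, logical_unit):
        if logical_unit not in self.mapping:
            member = self.next_slot % self.members
            slot = self.next_slot // self.members
            self.mapping[logical_unit] = (member, slot)
            self.next_slot += 1
        return self.mapping[logical_unit]


pool = StoragePool(members=3)

# The server happens to write logical units in this order.
placements = [pool.locate(unit) for unit in (10, 3, 11, 4)]
print(placements)
```

Logical units 10 and 11 are neighbours as far as the file system is concerned, yet in this model they land on different pool members, and their placement was decided by write order rather than by logical address.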
Finally, defragmenting a Thin-Provisioned or Sparse Volume
has the undesirable effect of causing unnecessary allocation
of storage from the pool.
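The allocation effect can be sketched with a toy thin volume. The model assumes that an allocation unit is taken from the pool the first time any block in it is written, and that units are not returned to the pool when the file system later frees those blocks; the class name, the 256-block unit size, and the block addresses are hypothetical.

```python
class ThinVolume:
    """Hypothetical thin-provisioned volume that allocates on first write."""

    def __init__(self, unit_blocks):
        self.unit_blocks = unit_blocks
        self.allocated = set()   # allocation units drawn from the pool

    def write(self, lba):
        self.allocated.add(lba // self.unit_blocks)

    def allocated_units(self):
        return len(self.allocated)


vol = ThinVolume(unit_blocks=256)

# A file stored in four fragments.
for lba in (0, 1000, 2000, 3000):
    vol.write(lba)
units_before = vol.allocated_units()

# The defragmenter rewrites the file contiguously in previously unused space.
for lba in (5000, 5001, 5002, 5003):
    vol.write(lba)
units_after = vol.allocated_units()

print(units_before, units_after)
```

The file is no larger after the defragmentation pass, but in this model the volume now holds one more allocation unit than before, because the old units stay allocated.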
Defragmenting Not Recommended for Replicated Volumes
If your volumes are the source of a Synchronous Data Mirror
or, worse still, Asynchronous Data Replication, then you should
definitely avoid defragmentation. Each write associated with the
defrag process will result in corresponding mirror writes to the
destination or replication site. This can easily consume the bandwidth
of the interconnect. In the case of Asynch Replication over a slower
link (T1, etc.), it can break the model, potentially causing a near or
full resynchronization of the entire volume across the slow link
(cf. "Internet Tubes").
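A back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes a T1 line at its nominal 1.544 Mbit/s, dedicated entirely to replication, with no protocol overhead or compression; the 10 GB figure for data rewritten by the defragmenter is a hypothetical example.

```python
T1_BITS_PER_SECOND = 1_544_000   # nominal T1 line rate


def transfer_hours(bytes_written, link_bits_per_second):
    """Hours needed to push the given number of bytes across the link."""
    return bytes_written * 8 / link_bits_per_second / 3600


# Hypothetical: the defragmenter rewrites 10 GB on the source volume.
hours = transfer_hours(10 * 10**9, T1_BITS_PER_SECOND)
print(round(hours, 1))
```

Under these assumptions the replication link would be saturated for roughly 14 hours, carrying blocks whose contents have not actually changed.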
Defragmenting Not Recommended for Snapshot Source Volumes
Defragmenting a volume that has active snapshots associated with it will
have the adverse effect of causing the snapshot manager to save
the original blocks as the blocks are reorganized on the source
volume. In the common "Copy On Write" implementations, this can
create an additional performance lag.
In those implementations that simply write the new blocks
into "reserve space" and re-direct pointers (such as in NetApp's
Snapshot implementation), the defragmentation process can easily
eat up all the reserve space in the volume.
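The Copy On Write case can be sketched as follows. This is a simplified model, assuming a block-level snapshot manager that knows nothing about file-system free space and therefore preserves any block the first time it is overwritten; the class, the block addresses, and the contents are invented for the example.

```python
class CowSnapshot:
    """Hypothetical block-level copy-on-write snapshot of a volume."""

    def __init__(self):
        self.saved = {}       # lba -> original contents kept for the snapshot
        self.extra_ios = 0    # I/Os added on top of the host's own writes

    def write(self, volume, lba, data):
        if lba not in self.saved:
            # First overwrite since the snapshot: preserve the old block.
            self.saved[lba] = volume.get(lba)
            self.extra_ios += 2   # read the original, write it to the snapshot store
        volume[lba] = data


volume = {100: "A", 200: "B", 300: "C"}   # a file in three fragments
snap = CowSnapshot()

# Defragmenting: rewrite the three blocks at contiguous addresses 0, 1, 2.
for new_lba, old_lba in enumerate([100, 200, 300]):
    snap.write(volume, new_lba, volume[old_lba])

print(len(snap.saved), snap.extra_ios)
```

No user data changed, yet in this model every relocated block costs the snapshot manager an additional read and write, and takes up space in the snapshot store.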
Defragmenting Not Recommended for CDP Volumes
If you've implemented a CDP (Continuous Data Protection) system, you'll
want to avoid defragmenting those volumes, as the procedure will quickly
eat up your CDP journal space. Hopefully, you will have chosen to
implement CDP at the storage controller level and not as an agent on
your server. However, if you are using a host-based CDP agent that runs
at the volume level (but not necessarily the file level), you will
further impair performance on the server as each defrag write will
correspond to two physical writes: one to the disk, and another agent
write to the CDP server.
Alternatives For Improving Performance
If poor SAN I/O performance is causing you to study articles on
disk defragmentation, you might want to consider looking for
other ways to reduce I/O latency.
Use Intelligent Caching Storage Processors
Caching is a common and effective means of improving performance for
just about any application. Caching is used ubiquitously from the
CPU (with 3 levels of cache), to the RAM in your application servers,
to the expensive cache on your storage processors. The idea is always
to avoid accessing data over slow media, whether that be the slow
spindles, the 1, 2, or 4Gb fibres, the slow PCI, PCI-X or PCI Express
busses, the slow 667MHz memory busses or the slow L3 and L2 CPU cache.
Slow is relative, with L2 being the fastest, and the spindles being
the slowest.
Most spindles give I/O response times in thousands of microseconds.
The typical storage processor gives I/O response times in hundreds of
microseconds on a cache hit (e.g. where the data you are looking for
is already in the storage processor's memory).
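The benefit of a cache hit can be expressed as a weighted average of the two response times. The figures below are illustrative assumptions picked from the ranges just mentioned (200 microseconds for a cache hit, 5,000 microseconds for a trip to the spindles), and the 90% hit ratio is hypothetical.

```python
def average_response_us(hit_ratio, hit_us, miss_us):
    """Expected I/O response time for a given cache hit ratio."""
    return hit_ratio * hit_us + (1 - hit_ratio) * miss_us


HIT_US = 200      # assumed response time on a cache hit
MISS_US = 5000    # assumed response time when the spindles must be accessed

no_cache = average_response_us(0.0, HIT_US, MISS_US)
with_cache = average_response_us(0.9, HIT_US, MISS_US)
print(no_cache, round(with_cache))
```

Even with these rough numbers, the misses dominate the average, which is why raising the hit ratio tends to pay off more than shaving time off the hits.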
One Storage Virtualization vendor, DataCore Software Corporation,
measures their I/O response times in tens of microseconds. This is a
result of their sophisticated caching engine combined with a real-time
driven I/O Subsystem. The fact that the software runs on common x86 servers
is also a contributing factor, given Time-To-Market considerations, not
to mention that you have a choice in what hardware you use to build
the Storage Processor on.
Placing their SANsymphony product in front of your existing SAN
Storage Arrays will have the effect of turbo-charging performance for
a variety of reasons:
- Additional layer of high performance adaptive cache (e.g. server RAM) that
will change the character of both front-end and back-end I/Os on your existing
SAN storage array, reducing contention.
- Real-Time I/O subsystem uses I/O polling instead of interrupt handlers to complete I/Os
- Takes advantage of new technologies your existing SAN doesn't have: PCI-Express, 4Gbps
Fibre Channel, 667MHz memory, etc. For example, SANsymphony can talk 4Gb FC between your
application servers and its cache, and on the backend talk 2Gb FC between its cache and
your older SAN storage arrays.
- Allows channel fan-out, effectively creating a Network Storage Processor that
has more FC and/or iSCSI front-end ports than your existing SAN storage array.
- Can be used to implement higher performance nested RAID levels on top of your
existing SAN storage array or arrays: RAID 10, RAID 100, RAID 10+1, RAID 0+1, RAID 50+1, etc.
Distribute the Work Load Across Storage Processors and Channels
Consult your SAN Storage Arrays' utilities to determine if you have
adequately distributed (or "Load Balanced") the I/O load across
your HBAs, fibres, switches, and storage processors.
Choose Optimized RAID levels for performance-critical volumes
Depending on the characteristics of your I/Os, optimizing RAID levels
and / or re-organizing the physical layout of your LUNs can improve
performance. If, for example, your database tables and logs are
located on the same RAID 5 spindles and you are having performance
issues, you should consider moving the logs onto another set of
spindles, perhaps in a RAID 1 or RAID 10 configuration. Keeping
small transaction write-intensive volumes off RAID-5 will definitely
improve performance.
While attempting to avoid contention for the spindles, you should
also consult your SAN Storage Arrays' documentation on how the spindles
are physically connected to the storage processor. Keep in mind that
in a dual-storage processor array, the disks are dual-ported. Laying
out the spindles of different RAID groups across different back-end
channels and keeping performance-critical volumes on separate channels
and spindles can reduce contention on the back end resources.
Use Storage Virtualization to Implement QoS in your SAN Environment
DataCore Software Corporation
offers a unique QoS or Quality of Service feature in their
SANsymphony product. SANsymphony implements a Network Storage Processor
at the heart of the SAN, caching I/Os and abstracting the storage from the
servers. The QoS feature allows you to create performance groups, where
the group members are mappings of volumes to channels (a la LUN masking).
These Performance Groups (known as QoS Domains) can be I/O throttled by
both IOPS and MB/s, allowing you to reserve SAN resources for your most
critical applications.
For instance, you can place snapshot mappings into a MB/s-restricted
"Backup Domain" to ensure that the backup server doesn't cause undue
storage array thrashing or use up switch / fibre bandwidth if run during
production hours.
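The general idea of throttling by both IOPS and MB/s can be sketched with a pair of token buckets. This is a generic illustration only and is not a description of how SANsymphony implements its QoS Domains; the class name, parameters, and limits are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class Throttle:
    """Hypothetical dual token bucket limiting both IOPS and MB/s."""

    def __init__(self, iops_limit, mbps_limit):
        self.iops_limit = iops_limit
        self.bytes_limit = mbps_limit * 1_000_000
        self.io_tokens = float(iops_limit)
        self.byte_tokens = float(self.bytes_limit)
        self.last = 0.0

    def admit(self, now, io_bytes):
        """Return True if an I/O of io_bytes may proceed at time now (seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill both buckets, capped at one second's worth of budget.
        self.io_tokens = min(self.iops_limit,
                             self.io_tokens + elapsed * self.iops_limit)
        self.byte_tokens = min(self.bytes_limit,
                               self.byte_tokens + elapsed * self.bytes_limit)
        if self.io_tokens >= 1 and self.byte_tokens >= io_bytes:
            self.io_tokens -= 1
            self.byte_tokens -= io_bytes
            return True
        return False


backup_domain = Throttle(iops_limit=2, mbps_limit=1)
decisions = [
    backup_domain.admit(0.0, 4096),   # within budget
    backup_domain.admit(0.0, 4096),   # within budget
    backup_domain.admit(0.0, 4096),   # IOPS budget exhausted
    backup_domain.admit(1.0, 4096),   # budget refilled one second later
]
print(decisions)
```

An I/O is admitted only when both budgets allow it, so a stream of large backup reads is held back by the MB/s limit while a stream of small I/Os is held back by the IOPS limit.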
Use Tiered Virtual Storage Pools
Tiering your storage through virtualization can allow you to move less critical
volumes off onto lower cost, lower performance storage, freeing up your higher
performance, higher cost resources to focus on your most critical applications.
For instance, rather than implement snapshots on your Tier-1 storage, using
a Storage Virtualization platform such as
SANsymphony can allow you to take snapshots of your Tier-1 volumes on another
storage platform. Thus you recover expensive Tier-1 storage space otherwise
reserved for snapshots, and the Tier-1 Array is no longer involved in the
Snapshot processing.
Optimize Your Application Server's I/O Subsystem
Quite often, the performance bottleneck occurs in the I/O Subsystem of
the Application Server (or storage client). If your Windows-based
application servers are I/O bound, you should consider trying a utility
from DataCore Software Corporation called "UpTempo".
This software package replaces the standard Windows I/O Subsystem with DataCore's
unique caching and high-performance I/O engine. Several benchmarks available
on the DataCore site indicate the product can improve I/O response times by
a factor of 3 to 4x, but of course, YMMV.
UpTempo's interface is a simple Windows Control Panel and
allows you to select the volumes (via "Drive Letters") you want UpTempo
to cache. You can choose Read Caching (the default) or Read/Write Caching.
You can also tune the amount of memory assigned to UpTempo cache via a
slider control. (Hint: Be conservative and leave the slider low (256MB
or more) to start; a common error when first evaluating the product is
to set the slider unnecessarily high, turning all the server's RAM into
UpTempo I/O cache, leaving little for the applications themselves,
and thereby causing excessive paging.)
The good news is you can evaluate UpTempo for free. DataCore offers a
free 30-day evaluation of the product, available for download from their
website.