Las Solanas Consulting


DR and Asynchronous Replication - Tutorial and Best Practices

A White Paper by Tim Warden

Disaster Recovery (or DR as the industry has come to call it) refers to a contingency plan to salvage one's business in the event of a "smoking crater" event — a catastrophe. The scope of the topic is broad and creating a DR plan can involve many parts of the organization, including management, facilities, etc.

In the scope of IT, Disaster Recovery refers specifically to the implementation of some mechanism to protect the company's data, usually transporting it or replicating it offsite where it can then be used to get critical business systems back up and running. This alternate site can be a cold site, a "colo" (or colocation facility), or even a production data center at another of your organization's sites.

In the event a major incident occurs at the primary datacenter, a DR site provides some level of recoverability that could be measured in minutes, hours, or days depending on how much planning and resources have gone into the DR implementation.

DR compared to HA or BC

To begin with, let's delineate between a few of the common terms often employed when discussing matters of data protection and availability of service: H/A, BC and DR. The concepts are different, but they are not mutually exclusive.

High Availability or H/A refers to systems and components designed to withstand a variety of non-catastrophic local failures. The vendors implement H/A in servers and storage arrays via redundancy: redundant power, cooling, cabling, switches, RAID groups, dual processors, etc. The idea is that a fault of a component (such as a GBIC on an HBA) or a pulled cable shouldn't stop the show. An alternate path or component can take over without missing a beat. With H/A, users shouldn't notice any disruption in service when such a failure occurs.

Business Continuity / Continuance or BC takes this one step further. It is the idea of adding some additional level of redundancy to the architecture so that it can withstand the failure of entire systems without stopping production. Often when storage vendors talk about Business Continuity, they are implying the use of Synchronous Mirroring between two of their high-end storage arrays, perhaps separated over some short distance, such as between two buildings on a campus.

Whereas Business Continuity is usually implemented on a local campus or metro basis, DR is understood to imply geographical separation. Like BC, that separation can be "across the parking lot" or "across campus", but more typically it's "across the state" or "across the country".

As I said, H/A, BC and DR are not mutually exclusive. Many shops have requirements to be both Highly Available and have a DR plan in place; in some cases the two may be deployed using the same mechanisms, such as a stretch cluster across a campus or metropolitan area, where the two ends are both active production sites, synchronous mirrors of each other.

However, in the majority of cases, DR sites are not just BC extensions of the main site. Many shops with critical availability requirements will implement BC locally, and have a contingency DR site in another state or part of the country.

Bringing the DR site into production is a decision taken only when the primary site is deemed unavailable because of a disaster. How quickly that site is brought online depends on how much effort and resources have gone into the planning. Bringing the site online typically requires network changes (DNS, etc.), and an understanding that once you've flipped the switch, some planning will be required to flip it back once the primary site is restored to service.

How Much DR is Enough?

A DR project requires planning and the planning phase should not be taken lightly. While my intention is to orient this paper towards the technical aspects of a DR implementation, the choices you will make should, obviously enough, be guided by a risk assessment involving input from those "powers that be" in your organization.

Example. You run a small datacenter for a manufacturing subsidiary of a large company whose headquarters is overseas. Your local management expects you to keep systems running even if a large fire consumes the datacenter. You decide to implement your "DR plan" using pairs of VMotion/DRS enabled ESX servers and Synchronous Data Mirroring between pairs of storage servers. You physically separate the pairs of servers and storage arrays at extreme ends of the plant, separated by firewalls. You provide distinct redundant paths for both network and disk traffic. You install an external power generator at the site. You've architected a solution that combines aspects of Business Continuance with DR. This architecture can withstand a major fire or the destruction of half the plant. In fact, this type of solution should provide continuous service even if one half of the datacenter is down.

On the other hand, if a terrorist act or a so-called "Act of God" takes out the entire site, you probably don't care whether there is a DR site at the overseas headquarters or not — you reason that the plant is "history", so you and your colleagues will be out looking for new jobs. The subsidiary's data is not your fiduciary responsibility and therefore not your headache.

Perhaps the plant's management (who happen to be officers of the company) wouldn't agree with your plan. Depending on the perspective, one could argue you have not at all implemented DR...

What Do You Have To Lose?

...Which brings us to the topic of risk assessment. The risk assessment can be relatively straightforward. What external or internal forces could create an unrecoverable incident at the datacenter and how will the DR plan address that? For instance, if your datacenter is located in a seismically active area, you will want your DR site to be at least a few hundred miles away in a seismically stable zone, rather than 50 miles south near the same fault line.

As a part of the planning process, you'll need to define exactly what the DR site is supposed to provide. Which systems will you replicate? You may have some business systems that are "nice to have" but not critical or essential. And are you only interested in having a recent copy of the data offsite in order to satisfy regulatory laws? Or is the DR site supposed to provide a restoral of service within a defined window? Partial or full? And at what quality of service?

The common terms you will hear are MTRS (Mean Time to Restore Service) or RTO (Recovery Time Objective), meaning "How quickly can we flip the switch and get production running again at the DR site?" Another acronym you will need to be familiar with is RPO or Recovery Point Objective. The RPO defines the acceptable rollback point in case of disaster, or how much recent data loss your organization can tolerate. The RPO is typically expressed in minutes or hours, depending on the importance of the data you are replicating. You may have different RPO's for different systems, determined by the nature of the data and bandwidth / cost limitations.

You will obviously want to consider the feasibility of using existing resources (another of your organization's sites located in another city) versus building a site or leasing space and bandwidth at a colo. Is there sufficient bandwidth between the sites? Is that bandwidth provisioned for other applications, such as VoIP or Video Conferencing?

[Figure: Three-Way Intersite Replication, showing three geographically distant BC sites inter-replicating]

Have you implemented Server Virtualization or Storage Virtualization? If not, you should consider both, as they facilitate implementing DR and can save you a lot of money.

Does your SAN storage array or Storage Virtualization system offer Thin Provisioning? This can make an enormous difference in bandwidth utilization for a few of the replication tasks.

Do you have databases with multiple tables spanning multiple LUNs? How will your replication strategy ensure consistency across those inter-dependent volumes?

How are you implementing backups today? Are tapes transported offsite? Do you have a VTL? Could the new DR site replace or augment the current backup mechanism? Are you implementing CDP (Continuous Data Protection) and will your CDP system be compatible with the DR implementation?

These are among the questions the consultant or VAR should be asking when you begin planning your DR project.

Synchronous vs. Asynchronous Mirroring

So far, we've discussed the differences between Availability and Disaster Recovery. We've also briefly touched on planning, or determining what you require from a DR implementation. Now let's turn our attention to the implementation and discuss the replication of your data.

Mirroring or replicating data essentially involves writing two copies of the data, one to each of a mirrored half. There are a few ways to implement mirroring. Many OS's have the ability to create disk mirrors, or "Software RAID-1". There are also third party software products that can implement asynch mirroring on behalf of the host. In general, it is preferable to have an independent, shared storage server implement the data mirroring, as it can consolidate the activity for many hosts and offload those hosts from the performance penalty of the extra writes.

As for Synchronous vs Asynchronous, as the terms would indicate, it's all about time. In particular, it's about the time that data written to disk is "committed". In Synchronous Data Mirroring, the write acknowledgement isn't returned to the requesting server until the data has been written to both storage arrays. This is akin to "RAID-1": the two mirror halves stay in "lockstep" — you know that at any moment you have two identical copies of the data on two devices.

Because the acknowledgement isn't returned until both storage devices have a copy of the write, this implies that the bandwidth of the channels used for the writes needs to be sufficient so as not to cause undue latency which would affect the responsiveness of the production servers.

As this latency directly affects the performance, Synchronous Data Mirroring is often not a viable solution for DR where distance and limited bandwidth would cause roundtrip latency to go way up. In such cases, Async Replication is the method of choice.

While not a hard and fast rule, most people associate Synchronous Mirroring with Business Continuity, whereas Asynch Mirroring is typically associated with Disaster Recovery.

Asynchronous Mirroring — An Overview

In Asynchronous Mirroring, we still have the concept of mirror halves, but for the purpose of clarity we will refer to them as a "source" volume and a "destination" or "replicated" volume. The source volume is the local production volume a host (such as your SQL server) writes to and reads from. Some agent then copies any of the writes over to the destination volume at the distant DR or remote site.

The agent is almost always software, implemented either on the host itself, on a SAN-based or DAS-based Storage Array, or on an intermediary engine, perhaps a Storage Virtualization platform.

If the agent is running on the host itself or on a file server (a.k.a. "NAS" storage such as a CIFS or NFS service), the replication can be implemented at the file level. When the agent is run on a Storage Array or Virtualization platform, the replication is invariably implemented at a raw disk or block level, where only the low-level writes or disk blocks that have been modified are copied to the destination volume. The scope of this article is limited to disk- or block-level replication.

With Asynchronous Replication it is understood that the bandwidth between the two sites is limited, and so writes to the local source volume are acknowledged immediately, and the agent subsequently replicates those changes to the destination volume. The destination volume is almost always in a "catch up" mode, anywhere from a few writes to a few gigabytes worth of writes behind the current state of the source. How far the destination falls out of synch with (or lags behind) the source volume is determined by the quantity of changes occurring on the source volume, the bandwidth of the intersite link used for the replication, network QoS policies and/or replication schedules or throttles. There are many other factors affecting overall replication system performance, some of which we will discuss later. Finally, network outages that break the replication link can cause the two volumes to fall further out of synch.
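This "catch up" behavior can be illustrated with a toy model. The sketch below is a simplification with invented numbers, not a model of any particular product: it just tracks how much pending data accumulates when the hourly write rate exceeds what the link can drain.

```python
def backlog_over_time(hourly_writes_gb, link_gb_per_hour):
    """Toy model of replication lag: each entry of `hourly_writes_gb`
    is the GB written to the source volume during one hour; the link
    can drain `link_gb_per_hour`. Returns the GB still pending (how
    far the destination lags) at the end of each hour."""
    backlog, history = 0.0, []
    for writes in hourly_writes_gb:
        backlog = max(0.0, backlog + writes - link_gb_per_hour)
        history.append(round(backlog, 2))
    return history

# Three busy hours writing 2 GB/h, then idle, over a link draining 1 GB/h:
print(backlog_over_time([2, 2, 2, 0, 0, 0], 1.0))  # -> [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
```

The backlog grows during production hours and the destination "catches up" during the quiet hours afterwards, which is exactly the dynamic the replication latency window (discussed below) has to account for.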

Clearly, provisions must be made to keep track of changes to the source volume in order to replicate those changes and bring the source and destination as close to synch as time and bandwidth permit. There are a number of ways to do this, ranging from marking changed blocks that need to be replicated, to buffering the changes in a reserved storage area.

Some vendors offer de-duplicating technologies, which attempt to reduce the replication traffic by deleting buffered writes for blocks that have been written (and buffered) more than once; the idea is that they will only replicate the latest change to the block. There are pros and cons to the different technologies used; I personally prefer at least having the option to buffer every change to every block, as that facilitates snapshotting and newer technologies such as CDP (Continuous Data Protection) at the DR site.

Bandwidth — The Replication Challenge

Bandwidth usually presents the biggest challenge. High-speed connections aren't cheap and if you are an SMB, you may be financially constrained to T1 speeds.

Of course, the asynchronous replication engine is often based on a protocol, protocol converter, or application running on top of IP or TCP. Between the transport protocol overhead and the replication protocol overhead, you never get anything near the full, theoretical bandwidth of the connection.

If you're lucky enough to get 70% of the bandwidth usable, you're likely to see transfer rates for a dedicated connection similar to those in the table below.


Technology        Mb/s      Theoretical GB/h   Expected GB/h
T1                1.536     0.66               0.46
10Base-T LAN      10        4.39               3.08
DS3 (T3)          43.2      18.98              13.29
100Base-T LAN     100       43.95              30.76
OC3               155       68.12              47.68
OC12              622       273.34             191.34
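The conversions in the table are easy to reproduce. A minimal sketch, using the same 70% usable-bandwidth assumption mentioned above:

```python
def gb_per_hour(mbps, efficiency=1.0):
    """Convert a link speed in Mb/s to GB/h of payload, optionally
    derated by an assumed usable fraction of the bandwidth."""
    # Mb/s -> MB/s (divide by 8) -> MB/h (x 3600) -> GB/h (divide by 1024)
    return mbps / 8 * 3600 / 1024 * efficiency

# The DS3 row of the table: theoretical vs. ~70% expected throughput
print(round(gb_per_hour(43.2), 2))       # -> 18.98
print(round(gb_per_hour(43.2, 0.7), 2))  # -> 13.29
```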

Of course, there is also the overhead of Asynchronous Replication housekeeping (reconstruction of replicated blocks in the destination volume, etc.) not to mention the contention with other applications that may be consuming storage and network resources at the source and at the destination sites.

Ultimately, you'll need to quantify the change rate or the cumulative amount of data writes that occur over a period of time for the volumes you need to actively replicate. You will use that value to determine what bandwidth you will need. We will refer to the period of time as your replication latency window. It is determined by such factors as production or peak hours and your target RPO (Recovery Point Objective, the rollback point, or acceptable lag between where you were when the disaster hit and your "last known good" at the DR site).

For instance, if your organization is 24/7, has a very aggressive RPO and you're changing a lot of data on an hourly basis, you'll need more bandwidth. On the other hand, if most of your data writes occur between 8am and 5pm, and your RPO is 24 hours, and you only have about 20GB of data changing daily, then you can tolerate a 24 hour replication latency window. You then determine you only need about a 3Mb/s connection between the two sites... the replication will "catch up" during the off-hours.
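The sizing arithmetic behind that last example can be sketched as follows. The 20 GB/day change rate and 24-hour window come from the example above; the 70% efficiency figure is the same assumption used in the throughput table:

```python
def required_mbps(change_gb, window_hours, efficiency=0.7):
    """Estimate the link speed in Mb/s needed to replicate `change_gb`
    gigabytes of changes within `window_hours`, assuming only
    `efficiency` of the advertised bandwidth is usable payload."""
    gb_per_hour = change_gb / window_hours
    payload_mbps = gb_per_hour * 1024 * 8 / 3600  # GB/h -> Mb/s
    return payload_mbps / efficiency

# 20 GB of daily change with a 24-hour replication latency window:
print(round(required_mbps(20, 24), 2))  # -> 2.71, so a ~3 Mb/s link suffices
```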

Measuring The Change Rate

Many tools are available to measure the rate of change of your data, and you may already have monitoring tools available in your organization, either on the servers themselves or perhaps on a SAN storage array.
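Whatever monitoring tool you use, turning its write-throughput samples into a change figure is simple integration. A sketch, with hypothetical sample values:

```python
def total_change_gb(write_mb_per_s, interval_s):
    """Integrate periodic write-throughput samples (MB/s, one sample
    every `interval_s` seconds) into the total GB written over the
    monitored period."""
    total_mb = sum(rate * interval_s for rate in write_mb_per_s)
    return total_mb / 1024

# Six ten-minute samples averaging 0.5 MB/s across one hour:
print(round(total_change_gb([0.5] * 6, 600), 2))  # -> 1.76 GB written that hour
```

Collect such samples per volume over representative production days, and the daily totals feed directly into the bandwidth sizing above.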

If you need help collecting the change rate, DataCore offers a package called SANmaestro that can be used to determine the quantity of writes per volume. If you already have a DataCore SAN in place, you'll know that SANmaestro can collect and report on hundreds of parameters from the DataCore storage controllers.

If you have not yet virtualized your storage with DataCore, you can still use their Windows-based SANmaestro agents to collect the change rate on any of your Windows servers.

Tweaking Performance

The vendor or consultant with whom you will work to deploy the solution should be able to provide some guidelines for how to maximize performance, based upon the vendor's particular implementation. This may involve choosing a suitable media to buffer replication traffic, making sure the buffering doesn't create contention with production volumes, adjusting buffer sizes and parameters, etc.

In the replication example at the end of this article, I will give a handful of general best practices for the DataCore AIM product.

Know Your Data — Know What To Replicate

We've already discussed determining which applications you need to replicate. Perhaps you have application servers on the floor that are "nice to have" and not "Critical" in a disaster recovery scenario. Your available bandwidth (and budget) will ultimately determine which of those "nice to have" servers can be replicated.

When considering the data to be replicated to the DR site, separate out that disk data which is "program", "system", or "execution" oriented data from the real "user" data. The "pipe" between your two sites more than likely will have limited bandwidth (often a T1 or DS3), so you'll only want to actively replicate the essential data.

Note there will be some essential servers on the floor that will not be candidates for replication, although you will certainly need their counterparts to offer those same services at the DR site. Examples include the servers providing your DNS, DHCP, or Active Directory services.

But even on your database and file servers, there will be disk-based data that does not need to be actively replicated. Consider a Windows 2003 based SQL server. Typically, you'll have a "C:" drive that serves as your Boot and System partition. In most cases, this drive will have the Windows system, standard Program Files as well as the SQL application and utilities.

By default, it will also contain the page file used as swap space for virtual memory. If the server is a memory hog, the page file usage may not be negligible. This means your "C:" drive can have a lot of transitory write activity. Unfortunately, a block level replication engine cannot distinguish between such temporary writes and the real data you need to replicate, so if you're replicating the "C:" drive, you will also be replicating any of the page file writes and temporary caches that are located on that volume.

For this reason (among many others, such as performance, stability, manageability, portability, etc.), it is always preferable to keep user data well separated from system and program data. In the case of your SQL database, you'll usually have at least one or two additional drives to hold logs and tables, perhaps drive letters "E:" for the logs and "F:" for the tables (let's not make the example too complicated...)

You'll only need to replicate the "E:" and "F:" drives which contain the real data. You will want to have a clone of the "C:" drive at the DR site. This may be a ghost image, or perhaps you'll use the replication engine to initially replicate the "C:" drive into a "quiesced" or "crash coherent" state and then break the replication relationship. This will be your baseline "C:" drive containing the functioning system and applications.

You will then need to determine a policy for applying any updates or patches to not only the local server (local "C:" drive) but also its dormant DR counterpart, the split mirror half serving as a baseline at the DR site. Presumably you can use something like Remote Desktop, VNC, Virtual Console or an ILO type package to access the DR servers to patch them as your policy dictates. You may determine that the dormant DR servers do not require the same rigor for patching as the live production servers.

In this discussion we've talked about one server, replicating its three drives to the DR site. More typically, you will have several servers you need to replicate: one or more file servers, database servers, etc., so your bandwidth will be shared by all.

As such, you may need to look at creative techniques for reducing the quantity of data you will replicate. For instance, some database vendors can offer their professional services to help you implement the replication of a database wherein you only actively replicate the redo logs. You make a baseline copy of the tables at the DR site, and periodically apply the replicated redo logs to the baseline. In our example above, that could potentially eliminate the need to actively replicate the "F:" drive. Much has been written on this subject by the DB vendors, so we'll leave it outside the scope of our discussion.

Replication with Virtual Machines a la VMWare

Server Virtualization packages such as VMWare or Virtual Iron facilitate the implementation of a DR site. Even if you aren't currently using server virtualization for your production systems, being able to consolidate several of the replicated servers as guests in a virtualized environment can save your organization a lot in terms of capital assets, facilities and management.

Virtual servers also make it easier to bring up an isolated environment in which you run your replicated servers and their associated data to test for coherency.

Several tools exist from companies such as VisionCore, PlateSpin and VMWare to provide Physical to Virtual services to help you redeploy physical production servers in a virtual DR environment.

Most shops implementing virtual servers with asynchronous replication simply replicate the LUNs containing their existing virtual server file systems (a.k.a. VMFS in VMWare). While this is a viable solution, the additional layer of virtualization implied by the .vmdk files means some important guest file system information may not be committed to disk as frequently as required by your RPO. Ultimately, what this means is that although a replicated VMFS may appear coherent and healthy, one or more of its .vmdk files may not be.

If possible, consider using RDMs (Raw Device Mappings) in ESX to hold the data. RDMs make it easier to control which volumes you replicate and to take crash coherent snapshots of the volumes. In VI3, RDMs work with VMotion in Physical Compatibility Mode, and so present an alternative to having the data prisoner in a proprietary vmdk.

If you are only planning on using server virtualization at the DR site, RDM's can facilitate the virtualization, allowing you to leave the replicated volumes as they are, presenting them to the virtual machines in their native, non-virtualized state.

However, using RDMs has its own set of issues. Most of the ESX admins I've spoken with don't like the idea of having to manage the multiple RDMs associated with the "server sprawl" that is typical of virtual server environments. If you have many volumes, you will need to consider not only that management aspect, but also the disk management overhead on both the hosts and the storage server.

If you must use VMFS volumes, at least separate the guest VMs' system and program file data from the user data you need to replicate by placing their corresponding vmdk's on different VMFS volumes. You will then actively replicate a guest VM's user data and use a complementary mechanism (storage server-based snapshots, etc.) to replicate the system volumes.

VDI adds yet another perspective to splitting out data from systems. If you were to replicate a wholesale VMFS containing many virtual desktops and their associated volumes, you would also be replicating the page files, web browser caches, etc.

Finally, if you are using or plan to use VCB (VMWare Consolidated Backup), keep in mind that the current implementation does not support having a VM guest's .vmdk files on different VMFS's, so this could present a challenge.

Establishing The Asynchronous Mirror

When establishing a relationship between a replication source volume and its partner destination volume at the DR-site, a baseline mirror will need to be created. This is to say, before the destination volume can be used, all the data on the source will need to be initially copied to the destination.

If the volume is large and the intersite bandwidth limited (which is often the case), some means will need to be employed to initialize the mirror "out of band". Most vendors' solutions provide a mechanism (or white paper) describing how to physically transport a tape or disk containing the mirror to the remote site. The idea is that FedEx or DHL will be faster than your T1.

Depending on the SAN or shared storage array technology you are using, this may be easy or hard. In the case of the DataCore solution, it is a relatively easy procedure, and can be accomplished using simple USB-2 or Firewire disks.

The table below is a rough estimate of how long it would take to initialize a volume given different network technologies. Keep in mind that these measurements are based on initializing the volumes one at a time, and presuming that each has dedicated use of the bandwidth. Real life is, obviously, quite a bit different, but this table should help you get some idea of whether you will be able to establish the mirrors via the network or whether you'll have to ship disks.

                  Est. Hours To Replicate Capacity in GB
Technology        20 GB    80 GB    120 GB   200 GB   300 GB   730 GB
T1                42.33    169.31   253.97   423.28   634.92   1544.97
10Base-T LAN      6.50     26.01    39.01    65.02    97.52    237.31
DS3 / T3          1.50     6.02     9.03     15.05    22.57    54.93
100Base-T LAN     0.65     2.60     3.90     6.50     9.75     23.73
OC3               0.42     1.68     2.52     4.19     6.29     15.31
OC12              0.10     0.42     0.63     1.05     1.57     3.82

Again, the numbers above are just a rough guide based on getting about 70% of the advertised bandwidth to move your data. There can be a lot of other factors that can seriously skew those numbers: network congestion, disk and server performance, etc. The main point to take home is that if it's going to take more than a week (> 168 hours) to get everything synched up, you may want to consider shipping disks.
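The estimates above follow directly from the expected-throughput figures. A sketch of the calculation, using the same assumed 70% efficiency:

```python
def sync_hours(capacity_gb, link_mbps, efficiency=0.7):
    """Rough initial-synchronization time in hours for a volume of
    `capacity_gb`, over a dedicated link of `link_mbps` (Mb/s),
    derated by an assumed usable fraction `efficiency`."""
    usable_gb_per_hour = link_mbps / 8 * 3600 / 1024 * efficiency
    return capacity_gb / usable_gb_per_hour

# 200 GB over a DS3 (43.2 Mb/s), matching the table's 15.05 hours:
print(round(sync_hours(200, 43.2), 2))  # -> 15.05
```

Running the same function over your actual volume sizes and link speed will quickly show whether any of them blow past the one-week (168 hour) mark where shipping disks becomes the better option.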

If you have the luxury of establishing the asynchronous mirror relationship prior to formatting the source volume, you will not need to do this initial synchronization, as both source and destination volumes can be considered in synch inasmuch as they are both un-formatted. So if you are planning to stand up a new file server that you wish to replicate, create the asynch mirror relationship before formatting and creating the file shares. Also, if your OS supports the notion, do a "Quick Format" as opposed to a low-level format which would write every block on the disk, effectively resulting in a full synchronization.

Avoid Routines That Generate Unnecessary Write Traffic

The first and most obvious example that comes to mind is Disk Defragmentation. Presumably you will be implementing block-level replication in a SAN-based or shared storage environment where disk defragmentation has less value.

For example, if the volume you are defragmenting is a SAN-based volume on, say, a RAID-5 group on an EVA-6000, the defragmentation becomes meaningless as the EVA "virtualizes" that RAID-5 group and actually writes the volume across all the spindles in the storage array. Defragmenting will simply move blocks around on the EVA, potentially scattering them further, potentially increasing seek and rotational latencies, all the while potentially consuming all the bandwidth of your intersite link by redundantly replicating the defrag writes.

[Read my white paper on Defragmentation of SAN Volumes.]

Coherency of Replicated Volumes

[discussion of volume coherency, ability of servers and apps to recover from volumes marked inconsistent, etc.]

Replicated Volumes: A Moving Target

Once the mirror has been established, the destination volume will be in a constant state of flux as new replicated blocks are received and destaged to their corresponding logical locations on the volume. In this state, you will be unable to use the replicated volume. If you attempted to mount it on a DR site server, you would eventually get errors as the DR site server would not be aware the volume's contents were changing beneath it.

To that end, the replication solution should provide some means for you to access the destination volumes while they are actively receiving the replication. Typically, this is done with point-in-time "Snapshots".

Implementing Coherency Points

Snapshots are point-in-time copies or clones of volumes. They are usually implemented in the shared storage array, or in the case of a SAN or tiered storage environment, they are ideally implemented in a Storage Virtualization engine. While different Snapshot technologies exist, the common denominator is that they allow you to manipulate, mount, backup, etc. an independent copy or image of a production volume without disturbing the actual volume.

Snapshots are commonly used to create rollback points, eliminate backup windows, or generate copies of real data for dev/test, datamining, etc. They are also an ideal way of making crash-coherent rollback points of the replicated volumes.

Snapshots provide a means to implement secure off-site tape backup at your DR site.

The replication technology you choose should allow you to automate the snapshot process or at least to script the process. It should also allow you to create more than one snapshot relationship per replicated volume. For instance, you may want a script to take a snapshot every morning at 3am and initiate a backup of the snapshot to tape, whereas you may also have another snapshot on the volume that is updated every thirty minutes in accordance with your RPO.
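The scheduling logic for such a script is straightforward. The sketch below is illustrative only: the "vendor-cli" command lines are hypothetical placeholders, to be replaced with your replication product's actual snapshot commands.

```python
from datetime import datetime, timedelta

# Hypothetical vendor CLI invocations -- substitute the real snapshot
# commands of your replication product; these names are invented.
RPO_SNAP_UPDATE = ["vendor-cli", "snapshot", "update", "--name", "rpo-snap"]
BACKUP_SNAP = ["vendor-cli", "snapshot", "update", "--name", "backup-snap"]

def due_actions(now, last_rpo_update, rpo=timedelta(minutes=30)):
    """Return the snapshot commands due at `now`: the RPO snapshot is
    refreshed every `rpo` interval, the backup snapshot at 3am daily."""
    actions = []
    if now - last_rpo_update >= rpo:
        actions.append(RPO_SNAP_UPDATE)   # thirty-minute RPO snapshot
    if now.hour == 3 and now.minute == 0:
        actions.append(BACKUP_SNAP)       # 3am snapshot, then start tape backup
    return actions

# A scheduler loop would evaluate due_actions() each minute and hand the
# returned command lines to subprocess.run(). At 3:00am with a stale RPO
# snapshot, both actions come due:
print(len(due_actions(datetime(2007, 6, 1, 3, 0), datetime(2007, 6, 1, 2, 15))))  # -> 2
```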

Employing Source Snapshots To Control Replication

Depending on the vendor's implementation, you may also be able to use snapshots at the source site as a means to control or reduce replication traffic. The idea here is to replicate the snapshot volume — and periodic updates to the snapshot volume — instead of replicating the ever-changing production source volume. This is a means of de-duplicating the replication stream in that regardless of how many times a block has changed within a given snapshot interval, only its most recent content will be replicated.

As we have noted, there are a few methods to implement snapshot technology. For the sake of this discussion we will limit the example to "Copy On Write" type snapshots with the proviso that the implementation can create complete snapshot clones where all blocks are copied to the snapshot volume.

In "Copy On Write" snapshots, a relationship is created between a source volume and some reserve space or snapshot volume that presents an image of the source at some particular fixed point of time. When the snapshot is first "taken" or "enabled", we will call that time T0. Most Copy On Write snapshot engines allow you to periodically increment or update the snapshot to T1, T2, etc. With each update Tn, we are in effect copying the most recent changes to all changed blocks over the interval [Tn-1 .. Tn] from the source volume to the snapshot volume. If we are replicating the snapshot volume, those most recent changes will be replicated as they are written to the snapshot volume.

Consider a volume "G:" that you want to replicate to the DR site with an RPO of 30 minutes. At the production site, create a clone or complete image snapshot of the source volume "G:". Let's call the snapshot volume "S:". Replicate that snapshot volume "S:" to the DR site. Once initialized, we will have a baseline volume of time T0 at the DR site. Call that replicated snapshot "R:". Now schedule the snapshot "S:" to automatically increment or update every 30 minutes.

  • 8:00 - Snapshot updates to T1
  • 8:18 - You create a file on "G:" named "8h18"
  • 8:25 - You mount a snapshot of the replicated volume "R:" at the DR site and notice that file "8h18" does not yet exist.
  • 8:30 - Snapshot updates to T2
  • 8:37 - You create a file on "G:" named "8h37"
  • 8:45 - You mount a snapshot of the replicated volume "R:" at the DR site and see file "8h18", but not file "8h37" which has not yet been replicated.
  • 9:00 - Snapshot updates to T3
  • 9:05 - You mount a snapshot of the replicated volume "R:" at the DR site and observe that file "8h37" is now available.

The example above makes assumptions about the replication latency window, and obviously the times chosen were completely arbitrary. Clearly, all the factors we have discussed in previous sections still apply: you will need to assure that the snapshot updates that need to be replicated from "S:" to "R:" can take place within the defined replication latency window — in this case, our RPO of 30 minutes.

To recap, in some instances replicating snapshots of your production volumes can be useful for reducing replication traffic by only replicating the most recent changes of an incremental snapshot interval, effectively de-duplicating the changed blocks.
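The de-duplication effect can be shown with a toy model. The block names and timestamps below are invented for the example; the point is simply that multiple writes to the same block within one snapshot interval collapse into a single replicated write.

```python
def replicated_writes(write_log, snap_interval):
    """Toy model of snapshot-interval de-duplication. `write_log` is a
    list of (timestamp, block) writes to the source volume; within each
    snapshot interval only the last write to a given block is
    replicated. Returns the number of replicated writes."""
    latest = {}
    for t, block in write_log:
        latest[(t // snap_interval, block)] = t  # later writes supersede earlier ones
    return len(latest)

# Four writes, but block "A" changes twice inside the first interval:
writes = [(1, "A"), (5, "A"), (9, "B"), (31, "A")]
print(replicated_writes(writes, 30))  # -> 3 replicated writes instead of 4
```

The longer the snapshot interval, the more overwrites collapse together, at the cost of a coarser rollback point at the DR site.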

Testing The Solution — Is The Replicated Data Usable?

You will want to periodically test your DR environment to assure it is ready to deploy.

Snapshots can be used to mount copies of the replicated volumes on their corresponding DR servers.

Using server virtualization at your DR site will facilitate bringing up your replicated servers with their data in an isolated environment, enabling you to verify the readiness of the servers and coherency of data without actually "flipping the switch".

A Replication Solution Based on DataCore Products

Continue to the next page to read about how DataCore implements Async Replication.