Monitoring Data Replication - Overview
Replication is a continuous activity, and details of ongoing replication activity are shown in the Data Replication Monitor in the CommCell Console. See View Data Replication Monitor for step-by-step instructions.
From the Replication Monitor you can:
- View details of data replication activities. See View details of data replication activities for step-by-step instructions.
- View the failed files for a Replication Pair. See View the failed files for a Replication Pair for step-by-step instructions.
- Filter which clients' activities are displayed. See Filter which clients activities are displayed for step-by-step instructions.
- Send the log files of a Replication Pair. See Send Log Files for instructions.
- Start/Suspend/Resume/Abort data replication activity. See Start/Suspend/Resume/Abort Data Replication Activity for step-by-step instructions.
All other job-based activity, such as Recovery Point creation, is reflected in the Job Controller. See Controlling Jobs in Job Management for comprehensive information.
CDR utilizes phases to perform three types of operations - initial data transfer or baselining, smart synchronization, and continuous data replication. The sequence of these phases is listed below along with details of CDR activities during each phase, and the consequence of an interruption, such as a temporary loss of connectivity:
|Baseline Scan|| For Windows only, start NTFS journaling on the source to track any file operations that occur during the entire Baseline phase.
Scan source path to obtain the number of files and bytes to transfer.
Generate Collect File.
|The Replication Pair will show a Job State of Preparing for Replication in the Data Replication Monitor.
If this phase is interrupted:
A Full Re-Sync will start at this phase.
|Calculate checksums on the source and destination to identify files that will be sent to the destination.
Data is transferred from the Replication Pair source path to the destination path using the checksum.
|If this phase is interrupted, it can resume again at the same point.|
|SmartSync Scan||Create a non-persistent snapshot; for Windows, compare it to the change journal.
Scan snapshot and generate a new Collect File for any files or directories that were added or data that was modified since the beginning of the Baseline Scan phase.
| For Windows:
If this phase is interrupted:
A Smart Re-Sync will start at this phase.
|Processing Orphan Files||Compare the Collect File to the Destination to identify orphan files, and apply orphan file settings.||Any data that was deleted on the replication source during the Baselining phases is treated according to your settings for Orphan Files.
If this phase is interrupted:
| Checksum Calculation
(On Windows only)
|Calculate checksums on the source and destination to identify files that have changed since Baseline Scan.||If this phase is interrupted, it will resume again from the beginning of this phase; however, if the snapshot is no longer available, it will return to the SmartSync Scan phase.|
|SmartSync||Transfer all changed files to destination from the new Collect File.||If this phase is interrupted:
| Updating Smart Sync
(On Windows only)
|Compare time stamps on source and destination and update.
Temporary snapshot is deleted.
|If this phase is interrupted, it will resume again from the beginning of this phase.|
|Replication||Data is continuously replicated from the source to destination.||Log Transfer & Log Replay activity is on-going. For more information, refer to Replication Logs.
The Replication Pair will show a Job State of Replicating in the Data Replication Monitor.
If the Replication phase is interrupted, then on restart replication resumes, if possible, from the last log replayed on the destination. If this is not possible, the Replication Pair returns to the Baseline Scan phase (a Full Re-Sync) or to the SmartSync Scan phase (a Smart Re-Sync), depending on the nature and duration of the interruption. Note that if a user manually restarts replication by choosing Start Full Resync, the Replication Pair returns to the Baseline Scan phase.
During the SmartSync Scan, new files and directories are copied in their entirety, but modified files do not always need to be copied in full. Modified files below a certain size threshold are copied again as complete files, while files above that threshold are broken into blocks and only the changed blocks are copied to the destination computer.
By default, files smaller than 256 KB are copied in their entirety whether they match the destination or not. For files above 256 KB in size, only the changed blocks are transferred; the default block size for hashing is 64 KB. The minimum file size and the block size for hashing can be configured in Replication Set Properties. See Create a Replication Set for step-by-step instructions.
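The size threshold and block comparison described above can be illustrated with a short sketch. This is not CDR's implementation: the function names are hypothetical, the data is held in memory for simplicity, and SHA-256 is an assumed stand-in because the text does not name the checksum algorithm. Only the 256 KB threshold and 64 KB block size come from the defaults stated above.

```python
import hashlib

MIN_FILE_SIZE = 256 * 1024  # default threshold from the text: smaller files are copied whole
BLOCK_SIZE = 64 * 1024      # default block size for hashing, from the text


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of data (SHA-256 is an assumption)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(source: bytes, destination: bytes,
                   min_file_size: int = MIN_FILE_SIZE,
                   block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of source blocks that would need to be transferred."""
    total_blocks = -(-len(source) // block_size)  # ceiling division
    if len(source) < min_file_size:
        # Small file: copy everything, whether or not it matches the destination.
        return list(range(total_blocks))
    src = block_hashes(source, block_size)
    dst = block_hashes(destination, block_size)
    # Large file: send only blocks that are new or whose hash differs.
    return [i for i, h in enumerate(src) if i >= len(dst) or h != dst[i]]
```

For example, a 512 KB file that differs from the destination copy in a single byte of its fourth block yields `[3]`, while an unchanged 100 KB file still yields every block because it falls below the threshold.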
By default, CDR handles interruptions by seamlessly restarting replication, but if that is not possible, Smart Re-Sync will be started. However, some interruptions will require a Full Re-Sync. The following sections describe each phase and the restart behavior when the phase is interrupted:
Smart Re-Sync is the default behavior of CDR when activities are interrupted and cannot be seamlessly restarted at the same point again. In general, CDR endeavors to do the following in such cases, wherever possible:
- continue logging on the source
- continue replaying logs on the destination which were received before the interruption
- restart activities exactly where they were interrupted, or as close to that point as possible
For examples of common types of interruptions, and how Smart Re-Sync handles the recovery, refer to System Behavior when Replication is Interrupted.
For a detailed listing of each phase, and the specifics of the exact point at which Smart Re-Sync restarts activities, refer to Job Phases.
Full Re-Sync should be necessary only in cases such as the following:
- the data on the destination is altered by means outside of the replication process, e.g., manually deleted or modified
- an interruption is of long enough duration that the logs overflow on the source
In such a case, all existing content in the destination path is considered inconsistent, and Full Re-Sync is recommended to rebuild it from the current data in the specified source path. When you start replication from the Replication Set or Replication Pair level, you can specify Full Re-Sync, causing the Replication Pair to begin at the Baseline Scan phase.
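The choice between resuming, Smart Re-Sync, and Full Re-Sync can be summarized as a small decision function. This is a simplified reading of the guidance above, not the product's internal logic: the enum, the function name, and the boolean inputs are all hypothetical, and platform-specific exceptions described elsewhere in this document are not modeled.

```python
from enum import Enum


class Restart(Enum):
    RESUME = "resume from the last log replayed on the destination"
    SMART_RESYNC = "Smart Re-Sync (restart at the SmartSync Scan phase)"
    FULL_RESYNC = "Full Re-Sync (restart at the Baseline Scan phase)"


def choose_restart(can_resume_from_last_log: bool,
                   destination_altered: bool,
                   source_logs_overflowed: bool,
                   user_requested_full: bool = False) -> Restart:
    """Pick a restart mode from hypothetical flags describing the interruption."""
    # Full Re-Sync cases listed in the text, plus an explicit user request.
    if user_requested_full or destination_altered or source_logs_overflowed:
        return Restart.FULL_RESYNC
    # Seamless restart is preferred whenever it is possible.
    if can_resume_from_last_log:
        return Restart.RESUME
    # Default behavior when a seamless restart is not possible.
    return Restart.SMART_RESYNC
```

A brief network outage with intact logs would map to `Restart.RESUME`, whereas an outage long enough for the source logs to overflow would map to `Restart.FULL_RESYNC`.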
Data Replication will be interrupted if a hard disk used for either a source or destination is put into the 'standby' state through the power scheme configuration. After such an event, it will be necessary to abort activity for all affected Replication Sets and restart them using Start Full Resync.
Changes to the following configuration items will not be effective until data replication activity has been interrupted and restarted:
- Job Results Directory in Advanced Client Properties (Job Configuration) - any Replication Pairs in the Replicating state must be aborted and restarted.
- Impersonate User in Advanced Client Properties (Job Configuration) on a Destination computer - restart destination computer. (This applies on Windows only.)
- Automatically delete Orphan Files in Replication Set Properties (Orphan Files) - any Replication Pairs in the Replicating state must be aborted and restarted.
- Exclude these Files/Folders/Patterns for content in Replication Set Properties (Filters) - any Replication Pairs will be aborted and restarted by the system.
The following will require data replication to be interrupted and restarted:
- On Windows, if chkdsk is run on a hard disk used for either a source or destination, the affected Replication Pairs in the Replicating state must be aborted and restarted using Smart Re-Sync.
- By default, CDR will always replicate only the new or updated data in the source path. If data is deleted on the destination, since there has been no change on the source, that data will not be replicated again, unless you abort the Replication Pair and perform the following to recopy the data from the source to the destination again:
- On Windows, perform a Full Re-Sync.
- On UNIX, perform a Smart Re-Sync.
There are several ways in which data replication activity can be interrupted, and CDR recovers from each of them in a similar manner. The table below lists common causes of interruption, their effect on Baselining, SmartSync, and data replication, and how CDR recovers from them.
Effect Of Interruption & Smart Re-Sync
|Abort a Replication Pair during Baselining phases||Baselining activities stop on the source.
When the Replication Pair is restarted, Baselining activities will resume, restarting at the beginning of the phase if necessary, then SmartSync and data replication activities will begin automatically.
|Abort a Replication Pair during SmartSync phases||Logging stops on the source.
When the Replication Pair is restarted, SmartSync activities will resume, restarting at the beginning of a phase if necessary, and data replication activities will begin automatically.
|Abort a Replication Pair during Replication phase||Logging stops on the source.
When the Replication Pair is restarted, for NTFS or UNIX, Smart Re-Sync will continue the data replication activities automatically; for FAT file systems, Full Re-Sync will be necessary.
|Suspend a Replication Set||Baselining, SmartSync, and data replication activities stop for all Replication Pairs, but any logging activities will continue on the source.
When the Replication Set is resumed:
|Graceful or non-graceful shutdown of the source computer||The destination computer continues to replay the logs it has received.
When the source computer and software are running again, Replication Pair(s) will be in the System Aborted state for some time, then Smart Re-Sync will be performed.
|Graceful or non-graceful shutdown of the destination computer||Logging continues on the source.
When the destination computer and software are running again:
|CDR software shutdown on the source||All CDR-related activities stop.
When the software is restarted, CDR will start Smart Re-Sync.
|CDR software shutdown on the destination||Logging continues on the source.
|Replication Service is stopped on the source||Baselining, SmartSync, and data replication activities stop for all Replication Pairs, but logging continues on the source, and the destination computer continues to replay the logs it had received before the service was stopped.
When the Replication Service is started again:
|Replication Service is suspended on the destination||Baselining, SmartSync, and data replication activities stop for all Replication Pairs, and log replay stops on the destination, but logging continues on the source.
When the Replication Service is started again:
|Interruption of network connectivity (source and/or destination)||Baselining, SmartSync, and data replication activities stop for all Replication Pairs, but logging continues on the source, and the destination computer continues to replay the logs it had received before the network connectivity was interrupted.
When network connectivity is restored:
If the network interruption is for a significant amount of time, the following will occur:
| Source computer runs out of log space (Windows)
-- or --
Source computer tries to create new entries in a log before the old entries have been transferred to the destination (UNIX)
|Logging will stop, all logs will be deleted, all Replication Pairs will be System Aborted.
For instructions on restarting replication after it has been interrupted, see Start/Suspend/Resume/Abort Data Replication Activity.
The Data Replication Monitor shows the state of each Replication Pair. Each state is briefly described below:
|New Pair||The Replication Pair has been created, but no activity has taken place yet.|
|Preparing for Replication||CDR is scanning the source paths, preparing for initial transfer or Full Re-Sync.|
|Baseline||For detailed information, see Baseline.|
|Initial Sync||For detailed information, see Baseline Scan.|
|SmartSync Scan||For detailed information, see SmartSync Scan.|
|SmartSync||For detailed information, see SmartSync.|
|Processing||For detailed information, see Processing Orphan Files.|
|Replicating||Data is being continuously replicated.|
|Replicating (Not verifiable)||The most recent communication between the CommServe and CDR Client indicated the job was in the Replicating state, but this cannot be verified because communication has been interrupted.|
|Suspended||Replication activity has been temporarily halted, either by a user, or because communication between the source and destination has been interrupted. Logs continue to be written on the source.|
|Pending||There has been a temporary interruption and CDR is attempting to reconnect and resume operations.|
|Failed||The phase failed to complete, or log transfer has stopped, possibly because of connectivity issues; logs continue to be written on the source.|
|Paused||CDR is trying to resume replication activity.|
|Stopped|| Replication activity has been halted by one of the following:
|System Aborted||For CDR on Windows only, a Replication Pair will be in this state for 3 minutes if the source disk hosting replication logs runs out of space, after which the system will attempt to restart.|
To see more information about a particular Replication Pair, see View details of data replication activities.
You can change the state of a Replication Pair, or several at the same time. See Change the State of Replication Pair.
- The status of all Replication Pairs is not immediately updated when one Replication Pair is resumed. For instance, when all Replication Pairs have been placed in the Paused state and you Resume one of them, a prompt will ask if you want all Pairs to be resumed. If you choose to do so, all the Replication Pairs that were placed in the Paused state will Resume and return to the state they were in previously. However, the CommCell Console will not immediately reflect the status of the other Replication Pairs that were resumed, and they may still be shown in the Paused state. The CommCell Console will synchronize and display the correct state of the Replication Pairs within a few minutes.
- During SmartSync of application data, the Data Replication Monitor may display a file count higher than the number of files actually transferred.
The following information is available in the Data Replication Monitor:
|Active||A green symbol indicates recent activity for the Replication Pair; an orange symbol indicates no recent activity. An exclamation point preceding the symbol indicates that some files were not copied successfully to the destination computer during replication. To see the failed files for a Replication Pair, see View the failed files for a Replication Pair for step-by-step instructions.|
|Phase||The current phase of the job; for more detailed information see Job Phases.|
|Job ID||A unique number allocated by the Job Manager for the operation.|
|State||The current state of the Replication Pair; for more detailed information see Job States.|
|Last Update Time||The date and time of the CommServe when the Job Manager last updated the Data Replication Monitor.|
|Pair Abort Reason||For a Replication Pair that was aborted, the reason is listed.|
|Last Error||The most recent error message for this Replication Pair.|
Initial Sync Information
|Start Time||The date and time of the CommServe when data replication activity began for the Replication Pair.|
|Number of Files To Be Transferred||The files remaining to be transferred for the Replication Log file currently being replayed on the destination.|
|Number of Files Already Transferred||The files transferred for the Replication Log file currently being replayed on the destination.|
|Data To Be Transferred during Initial Sync On Source||The aggregate size of all files to be transferred between the source and destination for the Replication Pair. The actual data transferred may differ slightly from this number, based on whether a given file actually gets transferred in full or in part.|
|Data Transferred during Initial Sync On Destination||The sum of all data already transferred between the source and destination for the Replication Pair.|
|Throughput Unit||The rate of data transfer during the Baseline phase, in GB/hour.|
|Progress||The percentage of files transferred for the Replication Log file currently being replayed on the destination.|
Replicating State Information
|Last Log Played Time||The date and time of the CommServe when the most recent Replication Log was played on the destination computer.|
|Replicated Data||The sum of all data transferred between the source and destination machines since the Start Time.|
|Attempts||The number of attempts at replication the system has made for the Replication Pair.|
|Latest Source Log||The number of the most recent Replication Log that was created on the source computer.|
|Latest Destination Log||The number of the most recent Replication Log that was replayed on the destination computer. If this number is lower than the Latest Source Log number, it indicates that the destination computer has not yet replayed all of the Replication Logs that have been created on the source computer.|
|Pair ID||A unique number allocated by the Job Manager that identifies the Replication Pair.|
|Source Path||The path on the source computer for the Replication Pair.|
|Destination Path||The path on the destination computer for the Replication Pair.|
|Replication Set||The name of the Replication Set.|
|Replication Type||The type of replication configured for the Replication Set. (See Data Replication Type.)|
|Client||The CDR Client that is the source computer for the Replication Pair.|
|Destination Host||The CDR Client that is the destination computer for the Replication Pair.|
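The relationship between the Latest Source Log and Latest Destination Log fields described above can be expressed as a simple backlog calculation. This is a minimal sketch with hypothetical function names; it assumes that Replication Log numbers are consecutive integers, which the text does not state explicitly.

```python
def replay_backlog(latest_source_log: int, latest_destination_log: int) -> int:
    """Count the source logs not yet replayed on the destination (never negative).

    Assumes log numbers increase by one per Replication Log.
    """
    return max(0, latest_source_log - latest_destination_log)


def is_caught_up(latest_source_log: int, latest_destination_log: int) -> bool:
    """True when the destination has replayed every log created on the source."""
    return replay_backlog(latest_source_log, latest_destination_log) == 0
```

With a Latest Source Log of 120 and a Latest Destination Log of 117, `replay_backlog(120, 117)` returns 3, meaning three logs are still waiting to be replayed.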
The following information is available in the Attempts window:
|Phase||The phase that the Replication Pair was in at the time of the attempted activity.|
|State||Current state of the Replication Pair.|
|Start Time||The date and time of the CommServe when the attempted activity began for the Replication Pair.|
|End Time||The date and time of the CommServe when the attempted activity ended for the Replication Pair.|
|Elapsed Time||The amount of time that elapsed while the activity was being attempted for the Replication Pair.|
|Files to Transfer||Files to be transferred to the destination computer for the Replication Pair, based on the initial scan.|
|Files Transferred||Files already transferred to the destination computer for the Replication Pair.|
|Data Transferred||The sum of all data already transferred between the source and destination during the attempted activity.|
|Data to Transfer||The aggregate size of all files to be transferred between the source and destination for the Replication Pair. The actual data transferred may differ slightly from this number, based on whether a given file actually gets transferred in full or in part.|
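The fields in the Attempts window are enough to derive a progress percentage and a transfer rate for an attempt. The sketch below is illustrative only: the function names are hypothetical, Files to Transfer is treated as the total from the initial scan, sizes are assumed to be available in bytes, and a gigabyte is assumed to be 1024^3 bytes, since the text does not define the unit.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def progress_percent(files_transferred: int, files_to_transfer: int) -> float:
    """Percentage of files transferred, capped at 100; 0.0 if nothing was scheduled."""
    if files_to_transfer <= 0:
        return 0.0
    return min(100.0, 100.0 * files_transferred / files_to_transfer)


def throughput_gb_per_hour(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Average transfer rate in GB/hour over the elapsed time of an attempt."""
    if elapsed_seconds <= 0:
        return 0.0
    return (bytes_transferred / BYTES_PER_GB) / (elapsed_seconds / 3600.0)
```

For instance, an attempt that moved 2 GB in one hour gives `throughput_gb_per_hour(2 * 1024**3, 3600)` equal to 2.0, and 50 of 200 files transferred gives a progress of 25.0 percent.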