Job Management



Overview

Viewing Job Information

Controlling Jobs

Viewing Job Status

Job Filters

Important Considerations for Running Jobs

Preempting Jobs

Restarting Jobs

Retrying Jobs

Resuming Jobs

Other Considerations


Overview

The Job Controller allows you to manage and monitor the following types of jobs:

The Job Controller window displays all the current jobs in the CommCell. A status bar at the bottom of the Job Controller shows the total number of jobs; the number of jobs that are running, pending, waiting, queued and suspended; and the high and low watermarks. The watermarks indicate the minimum and maximum number of streams that the Job Manager can use simultaneously.

Viewing Job Information

Information about a job is continually updated and available in the Job Controller or Job History window. After a job finishes, it remains in the Job Controller for five minutes; further information about the finished job is available from the Job History.
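The five-minute retention described above can be illustrated with a minimal sketch. The function and constant names are hypothetical and are not part of the product; treating exactly five minutes as still visible is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Five-minute retention taken from the text above; the names are hypothetical.
CONTROLLER_RETENTION = timedelta(minutes=5)


def shown_in_job_controller(finished_at: Optional[datetime], now: datetime) -> bool:
    """Return True while a job would still be listed in the Job Controller."""
    if finished_at is None:
        # The job has not finished yet, so it is still a current job.
        return True
    # Assumption: the boundary (exactly five minutes) counts as still visible.
    return now - finished_at <= CONTROLLER_RETENTION
```

After the retention period has passed, the job would be looked up in the Job History instead.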

The following job information is displayed, depending on the selected Job History:

Job ID A unique number allocated by the Job Manager that identifies the data protection, data recovery, or administration operation.
Operation The type of data protection, data recovery, or administration operation being conducted.
Client/Client Computer For data protection operations, the client computer to which the backup set and subclient belong. For data recovery operations, the computer from which the data originated.
Destination Client The client on which the recovered data will be stored.
Agent Type The agent that is performing the operation (e.g., Windows 2000 File System).
Instance/Partition The instance/partition in the client computer that represents the database that was included in this operation.
Subclient The subclient that was protected during the operation. Note that a deleted subclient will have a Unix time stamp appended to its name in cases where another subclient is currently using the same name as the deleted subclient.
Job Type The type of operation that is being conducted on data.
Backup Type The type of backup that was conducted: Differential, Full, Incremental or Synthetic.
Failed Folders The number of folders that were not included in the operation.
Failed Files The number of files that were not included in the operation.
Storage Policy The storage policy to which the operation is being directed.
MediaAgent The MediaAgent to which the operation is being directed.
Status The status of the operation. For job status descriptions, see Job Status Levels.
Progress A progress bar indicating the progress of the operation. The progress bar is not visible for certain operations (e.g., data aging) or for the initial phases of some data protection operations.
Note: The Job Controller progress bar does not accurately display the progress of SAP for MAXDB backup and restore jobs, because Data Protection Suite cannot detect the data or objects transferred by SAP for MAXDB, owing to the way SAP for MAXDB transfers these items.
Errors Displays any errors that have occurred during the operation, such as a hardware problem or a job that has run outside of an operation window. (See Job Errors for more information.)
Backup Set The backup set that was protected/recovered during the operation and to which the subclient belongs.
Index Displays New Index to indicate a new index was created during the operation. If blank, a new index was not created.
Phase The current phase of the operation. The number of phases varies depending on the operation.
User Name The name of the user who initiated the operation.
Priority The priority that is assigned to the operation. (For more information, see Job Priorities and Priority Precedence).
Start/Start Time The date and time on the CommServe when the operation started.
End Time The date and time on the CommServe when the operation was completed.
Elapsed The duration of time consumed by the operation.
Libraries The libraries that are being used by the operation.
Drives/Mount Paths The drives/mount paths that are being used by the operation. For more information about media, see Media Operations.
Last Update Time The last time the Job Manager received job updates for the operation.
Transferred The amount of data that has been transferred for the operation at the present time.
Estimated Completion Time The time that the system estimates for this job to be completed.
Size on Media The amount of compressed data that was transferred to the media (excluding duplicated data).
Note:
  • The amount displayed is a compressed amount and includes valid and invalid attempts of the backup jobs.
  • Application data that is backed up may include sparse files. As a result, the displayed size of the data may be greater than expected.
  • When viewed from the storage policy copy level, the amount displayed may be smaller if the job is only partially copied.
Size of Application The amount of the application data that has been protected.
Note:
  • Application data that is backed up may include sparse files. As a result, the displayed size of the data may be greater than expected.
  • If the job completed after multiple attempts, the amount displayed may be larger.
Size of Backup The amount of compressed data that has been protected, which includes all application data and metadata.
Content Indexed Displays whether content indexing was used for the operation.
If viewing job history data from:
  • Versions 6.1.0 and prior: Yes or No
  • Versions 7.0.0 and later: Full, Partial, or No

Note that if a job is displayed as partially content indexed, not all of the data protected in the job was content indexed successfully. Rerun content indexing on this job so that the protected data is fully content indexed.

Delay Reason The description of the reason why the operation may be pending, waiting, or failing.
Alert The name of the job-based alert, if configured for the job.
Job Initiation The origin of the operation: the CommCell Console (Interactive), a schedule (Scheduled), or a third party interface (Third Party).
Maximum Number of Readers The maximum number of readers that can be used for the operation.
Automated Content Classification Policy Name of the Automated Content Classification Policy.
Legal Hold Name The Name specified for the Legal Hold data.
Legal Hold Retention Time The time frame for which the Legal Hold Data will be retained.
Number of Readers in Use The number of readers currently in use for the operation.
Number of Objects The total number of objects including successful, failed and skipped.
Note: For a Unix File System iDataAgent backup job that includes hard links and for which the HLINK registry key is set to Y and the appropriate hard link updates are applied, the value in this field will also account for the number of hard links and hard link groups that were backed up.

See the Service Pack documentation for more information on hard link updates.

Restart Interval The amount of time the Job Manager will wait before restarting a job that has gone into a pending state. This is set in the Job Management (Job Restarts) tab.
Max Restarts The maximum number of times the job will be restarted after a phase of the job has failed. This is set in the Job Management (Job Restarts) tab.
Error Code Error Code for job pending or job failure reason. (See Job Errors for more information.)
Retained By The type of retention rules defined for the job, basic or extended. For more information, see Data Aging.
Description A brief description of the running job.

The Pause and Play buttons allow you to control how the Job Controller displays real-time information from active jobs. The Pause button stops the Job Controller from displaying real-time information collected from jobs. The Play button allows the Job Controller to display real-time job updates again.

To see all the columns in the Job Controller window, use the scroll bar at the bottom of the window.

Job Errors

If a job has not completed successfully, the Error Code column will display a unique code linking to available troubleshooting and knowledgebase article(s) relevant to that error from the customer support website. These articles may include special considerations for the type(s) of job(s) you are running, suggested workarounds for issues, and common causes for that particular error.

If an error code pertains to more than one issue, the customer support website will display links to all articles for which the code is relevant. Conversely, if an error code does not have any articles associated with it, the customer support website will display a message indicating that no articles exist for that code.

Error codes may also be obtained from several other windows and dialog boxes, including:

Note the following when obtaining troubleshooting articles using error codes:

Note that jobs which fail Data Integrity Validation will be moved to pending status. Review the error code and description of the pending job from the job controller to identify the reason for failure. See Data Integrity Validation - Troubleshoot for troubleshooting Data Integrity validation errors.

For step-by-step instructions on viewing information about job errors, see View Troubleshooting Article(s) Available from the Customer Support Website.

Flags

The Job Controller window also provides a Flags column, which is located on the left-hand side of the Job Controller window. The Flags column displays an icon for any running jobs that encounter one of the following scenarios:

If neither of the above scenarios is present, the Flags column will remain empty.

Viewing Additional Job Details

To view additional details about a particular job, right click the job in the Job Controller window and select Detail.

The Job Controller also provides the facility to view job information using other CommCell Console features, including:


Controlling Jobs

You can select a job in the Job Controller and perform a control action on that job individually. You can also control multiple jobs simultaneously in two ways:

Either method allows you to perform actions on:

You can perform the following actions on jobs:

Suspend Temporarily stops a job. A suspended job is not terminated; it can be restarted at a later time. Only preemptible jobs can be suspended.
Commit Gracefully completes the current backup job, as of that point-in-time. Applicable only for Silo backup jobs. See Commit Silo Backup for details.
Resume Resumes a job and returns the status to Waiting, Pending, Queued, or Running depending on the availability of resources or the state of the operation windows and activity control settings.
Kill Terminates a job.
Change Priority Changes the priority of a job or a group of jobs that are currently active. Note that the lower the priority number, the higher the priority the Job Manager gives to the job when allocating resources.
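The priority rule above (a lower number means a higher priority) can be illustrated with a small sketch. The function name is hypothetical, and breaking ties by job ID is an assumption of this example rather than documented behavior.

```python
import heapq
from typing import List, Tuple


def allocation_order(jobs: List[Tuple[int, int]]) -> List[int]:
    """Order job IDs for resource allocation.

    Each entry is (priority, job_id). A lower priority number is served first.
    Ties are broken by job ID, which is an assumption of this sketch.
    """
    heap = list(jobs)      # copy so the caller's list is not modified
    heapq.heapify(heap)    # min-heap: smallest priority number on top
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

For example, `allocation_order([(300, 11), (100, 12), (200, 13)])` places job 12 first because it has the lowest priority number.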

Controlling the Number of Simultaneously Running Streams

The Job Controller window displays all the current jobs in the CommCell. A status bar at the bottom of the Job Controller shows the total number of jobs; the number of jobs that are running, pending, waiting, queued and suspended; and the high and low watermarks. The watermarks indicate the minimum and maximum number of streams that the Job Manager can use simultaneously.

The low watermark, which will only display in the status bar if defined, is used in conjunction with the high watermark. If the high watermark value is reached, and a low watermark is defined, the Job Manager will wait until the low watermark value is reached to start any new jobs. The low watermark value can be defined using the JMRunningJobsLowWaterMark registry key.

Note:
  • The high watermark has a default value of 10 for SRM Reports.
  • The high watermark has a default value of 100 for WorkStation backup jobs running to one destination. You can use the SetKeyIntoGlobalParamTbl.sql qscript with the JMReplicationJobActivityLevelHighWaterMark global parameter to change the default value. For more information, see Command Line Interface - QScripts.
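The interaction between the high and low watermarks described in this section can be sketched as a simple gate. This is an illustrative model only: the class and method names are hypothetical, and the real Job Manager logic is not shown in this document.

```python
from typing import Optional


class StreamGate:
    """Hypothetical model of high/low watermark gating for running streams."""

    def __init__(self, high: int, low: Optional[int] = None) -> None:
        if low is not None and low > high:
            raise ValueError("low watermark must not exceed high watermark")
        self.high = high
        self.low = low
        self.running = 0
        self._draining = False  # True once the high watermark has been reached

    def try_start(self) -> bool:
        """Start one stream if the watermarks allow it."""
        if self._draining:
            # A low watermark is defined: wait until usage falls to it.
            if self.running > self.low:
                return False
            self._draining = False
        if self.running >= self.high:
            return False
        self.running += 1
        if self.running >= self.high and self.low is not None:
            self._draining = True
        return True

    def finish(self) -> None:
        """Record that one running stream has ended."""
        if self.running > 0:
            self.running -= 1
```

With `StreamGate(high=3, low=1)`, a fourth start is refused once three streams are running, and new starts are accepted again only after usage has dropped to one stream. Without a low watermark, a start is accepted as soon as usage is below the high watermark.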

Viewing Job Status

The following table describes the status levels that may appear in the Job Controller window for a particular job:

Completed The job has completed successfully. Note that pop-up messages for reporting job completion can be enabled or disabled using the F12 key.
Completed With One or More Errors The job has completed with errors.

The following administration conditions will result in the Completed With One or More Errors status level.

  • Disaster Recovery Backup
    • During the operation, Phase 1 failed and Phase 2 completed, or Phase 1 completed and Phase 2 failed.
  • Data Aging
    • During the operation, one or more components failed, e.g., subclients failed to be aged or job history failed to be removed.
  • Install Updates
    • During the operation, one or more clients failed to be updated.
  • Offline Content Indexing
    • During the offline content indexing operation, some of the backup data failed to be content indexed.
  • Information Management
    • During an information management operation, the operation defined in the Automated Content Classification Policy was only partially successful.

The following iDataAgent-specific conditions will result in the Completed With One or More Errors status level.

  • Exchange Compliance Archiver
    • During a retrieve operation, one or more files failed to be retrieved.
  • Exchange Mailbox Archiver and Exchange Public Folder Archiver
    • During a recovery operation, one or more files failed to be recovered.
  • Microsoft Windows File System
    • During a system state backup operation, one or more non-critical components failed to be backed up.
    • During a file system restore operation, one or more files failed to restore or were locked.
    • During a system state restore operation, one or more non-critical components failed to be restored.
  • Microsoft Exchange Server
    • During a backup operation of a storage group assigned to a subclient, one or more databases failed to be backed up.
    • During a restore operation, one or more databases failed to be restored.
  • Informix
    • During a backup operation, one or more files failed to be backed up.
  • Oracle, Oracle RAC
    • During a backup operation, one or more files failed to be backed up.
  • SAP for Oracle, SAP for MAXDB
    • During a backup operation, one or more files failed to be backed up.
  • SharePoint Server iDataAgent
    • During a backup operation, one or more elements in the subclient content failed to be backed up.
    • During a restore operation, one or more elements in the subclient content failed to be restored.
  • SharePoint Archiver
    • During a migration archiving operation, one or more elements in the subclient content failed to be archived.
    • During a recovery operation, one or more elements in the subclient content failed to be recovered.
  • Sybase
    • During a backup operation, one or more files failed to be backed up.
  • UNIX File System
    • During a backup operation, one or more files failed to be backed up.
  • Online Content Indexing Agents
    • During an online content indexing operation, one or more files failed to be content indexed.
Dangling Cleanup A job phase has been terminated by the job manager, and the job manager is waiting for the completion of associated processes before killing the job phase.
Failed The job has failed due to errors or the job has been terminated by the job manager.
Interrupt Pending The job manager is waiting for the completion of associated processes before interrupting the job due to resource contention with jobs that have a higher priority, etc.
Kill Pending The job has been terminated by the user using the Kill option, and the job manager is waiting for the completion of associated processes before killing the job.
Killed The job is terminated by the user using the Kill option or by the Job Manager.*
Pending The Job Manager has suspended the job due to phase failure and will restart it without user intervention.
Queued
  • The job conflicted with other currently running jobs (such as multiple data protection operations for the same subclient), and the Queue jobs if other conflicting jobs are active option was enabled from the General tab of the Job Management dialog box. The Job Manager will automatically resume the job only if the condition that caused the job to queue has cleared.
  • The activity control for the job type is disabled, and the Queue jobs if activity is disabled option was enabled from the General tab of the Job Management dialog box. The Job Manager will automatically resume the job only if the condition that caused the job to queue has cleared.
  • The Queue Scheduled Jobs option was enabled from the General tab of the Job Management dialog box. Scheduled Jobs can be resumed manually using the Resume option or resumed automatically by disabling the Queue Scheduled Jobs option.
  • The job started within the operation window's start and end time.
  • The running job conflicted with the operation window and the Allow running jobs to complete past the operation window option was not enabled from the General tab of the Job Management dialog box. (This is only applicable for jobs that can be restarted. See Restarting Jobs for more information.)
Running The job is active and has access to the resources it needs.
Running (Cannot be verified) During a running operation, the Job Alive Check failed. See Job Alive Check Interval for more information.
Suspend Pending A job is suspended by a user using the Suspend option, and the Job Manager is waiting for the completion of associated processes before stopping the job.
Suspended
  • A running, waiting or pending job has been manually stopped by a user using the Suspend option. The job will not complete until it is restarted using the Resume option.
  • A job has been started in a suspended state using the Start Suspended or Startup in Suspended State options available from the dialog box of the job that was initiated. Restore jobs from Search Console can be started in the suspended state using the Start End User restores in suspended state and Start Compliance restores in suspended state options in the Browse/Recover Option Dialog box in the Control Panel.
System Kill Pending The job has been terminated by the Job Manager*, and the Job Manager is waiting for the completion of associated processes before killing the job.
Waiting The job is active, waiting for resources (e.g., media or drive) to become available or for internal processes to start.

*The Job Manager will terminate a job when:

Job Status Changes

The status of a job and the preemptibility of the phase of the job in the Job Controller determines the actions (Kill, Suspend, or Resume) that you can perform. The following table describes the status of a job after an action has been performed on it:

Original Status | Actions Available | New Status
Running | Suspend | Suspended
Running | Kill | Killed
Waiting | Suspend | Suspended
Waiting | Kill | Killed
Interrupt Pending | N/A | N/A
Pending | Suspend | Suspended
Pending | Resume | Returns to original state, resources and other conditions permitting
Pending | Kill | Killed
Suspend Pending | N/A | N/A
Queued | Suspend | Suspended
Queued | Resume (scheduled jobs only) | Changes into a state of an active job, resources and other conditions permitting
Queued | Kill | Killed
Suspended | Resume | Returns to original state, or changes into a state of an active job, resources and other conditions permitting
Suspended | Kill | Killed
Kill Pending | N/A | N/A
Dangling Cleanup | N/A | N/A
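The status table above can be read as a lookup from (original status, action) to new status. The following sketch transcribes it into a dictionary; the names are hypothetical, and the placeholder "Active" stands in for the rows whose outcome depends on resources and other conditions.

```python
from typing import Dict, Optional

# Transcribed from the table above. "Active" is a placeholder of this sketch
# for outcomes that depend on resources and other conditions.
# Resume on a Queued job applies to scheduled jobs only.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "Running": {"Suspend": "Suspended", "Kill": "Killed"},
    "Waiting": {"Suspend": "Suspended", "Kill": "Killed"},
    "Pending": {"Suspend": "Suspended", "Resume": "Active", "Kill": "Killed"},
    "Queued": {"Suspend": "Suspended", "Resume": "Active", "Kill": "Killed"},
    "Suspended": {"Resume": "Active", "Kill": "Killed"},
    # Transitional states: no actions are available.
    "Interrupt Pending": {},
    "Suspend Pending": {},
    "Kill Pending": {},
    "Dangling Cleanup": {},
}


def apply_action(status: str, action: str) -> Optional[str]:
    """Return the new status, or None if the action is not available."""
    return TRANSITIONS.get(status, {}).get(action)
```

Returning `None` for unavailable actions mirrors the N/A rows: a job in a transitional state such as Kill Pending accepts no further control actions.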

Job Filters

You can filter the jobs that are displayed in the Job Controller by creating a job filter from the Filter Definition dialog box. You can filter by Data Protection, Data Recovery, Data Collection (for SRM jobs), and Administration operations. The filter can also be based on an active job for a particular CommCell entity.

CommCell Administrators can utilize filters created by all users. All other users can only utilize the filters that they create. If a user account is deleted, their filters will automatically be deleted as well.
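The filter ownership rules above can be sketched as follows. The `JobFilter` type and both functions are hypothetical illustrations, not part of the product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobFilter:
    name: str
    owner: str  # user who created the filter


def usable_filters(filters: List[JobFilter], user: str, is_admin: bool) -> List[JobFilter]:
    """CommCell administrators see every filter; other users see only their own."""
    if is_admin:
        return list(filters)
    return [f for f in filters if f.owner == user]


def delete_user(filters: List[JobFilter], user: str) -> List[JobFilter]:
    """Deleting a user account also removes the filters that user created."""
    return [f for f in filters if f.owner != user]
```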

Important Considerations for Running Jobs


Job Preemption Control

The phases of a job or operation fall into two main categories:

Preemptible Phase In a preemptible phase, the job can be interrupted by the Job Manager or suspended by the user and then restarted without having to start the phase over again from the beginning. Preemption is defined by the Job Manager at each phase of a job. A File System backup phase is one example of a preemptible phase; the Job Manager can interrupt this phase when resource contention occurs with a higher priority job. You can also suspend this phase in progress and resume it later.
Non-preemptible Phase A non-preemptible phase is one that cannot be interrupted by the Job Manager or suspended by the user. It can only run to completion, be killed by administrative action, or be failed by the system. For example, the data recovery operations of database agents are non-preemptible.

Both preemptible and non-preemptible jobs can also be defined in terms of their restartability; preemptible jobs are always restartable. In addition, even jobs that are not preemptible might fail to start and be in a "waiting" state; these are restartable as well. For more specific information on this topic, see Job Restart.

Preemptible and Non-Preemptible Jobs

The following table lists the types of preemptible and non-preemptible jobs:

Preemptible and Restartable Non-preemptible and Non-Restartable Non-preemptible but Restartable
  • Data protection operations for most non-database agents.
  • DataArchiver archive jobs during the Archive Index and Archive Content Index phases of the job.
  • Data recovery operations for most File System-like (indexing-based) agents during the restore phase.
  • Data recovery operations from the Search Console.
  • Most administration jobs including Install Automatic Updates and Download Automatic Updates.
  • Jobs that are run using an alternate data path cannot preempt other jobs. Similarly, such jobs can also be preempted by other jobs which do not use an alternate data path.

  • Silo backup and restore operations.
  • Data recovery operations for database-like agents.
  • Media export, erase media, and inventory jobs.
  • SAN volume data protection jobs (non-preemptible in their scan phase).
  • All QR jobs on Unix platforms.
  • Data protection operations for database agents.
  • The system state phase of Windows File System data protection operations.
  • Offline and Online Content Indexing jobs.
  • Data Collection operations for SRM Agents.

For information on Agents that support Job Restarts, see the following:

Controlling Job Preemption for the CommCell

You can specify that certain operations will preempt other operations based on their job priority, in cases where multiple jobs are competing for media and drives.

If a running job is preemptible, the Job Manager can interrupt the running job and allocate the resources to a higher-priority job. (The interrupted job enters a waiting state and resumes when the resources it needs become available.)
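The preemption rule above can be sketched as a small decision function. The `Job` type and `try_preempt` are hypothetical, and the sketch skips the intermediate Interrupt Pending status, moving the interrupted job straight to Waiting.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    priority: int       # lower number = higher priority
    preemptible: bool   # whether the current phase can be interrupted
    status: str = "Running"


def try_preempt(running: Job, incoming: Job) -> bool:
    """Interrupt `running` in favor of `incoming` if the rules allow it."""
    if not running.preemptible:
        return False    # non-preemptible phases cannot be interrupted
    if incoming.priority >= running.priority:
        return False    # only a higher-priority job may preempt
    running.status = "Waiting"    # interrupted job waits for resources
    incoming.status = "Running"   # resources go to the higher-priority job
    return True
```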

You can:

See Set Job Preemption Control for the CommCell.

Configuring Preemptibility for Select Job Types

You can specify which of the following types of jobs are preemptible:

To configure preemptibility in the CommCell for specific job types, see Specify Preemptibility of Job Types.

What Happens When a Job is Preempted

The following table provides information on the Status of the job in the Job Controller window and the Reason for job delay displayed in the Job Details dialog box when a job is preempted. In addition, a brief explanation on what happens when a job is preempted is also provided.

Job Status in the Job Controller Reason for Job Delay Additional Information
Data Protection Operation Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available
Data Recovery Operations (for File System-like agents) Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available
Data Recovery Operation (for Database-like agents) Not Preemptible
Index Restore (Browse Backup Data) Not Preemptible
Auxiliary Copy Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available
Synthetic Full Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available  
Note: The higher priority job that is doing the preemption for resources will display the Reason for Job delay as follows:

Waiting for job[ ] to release the resources.

Important Considerations


Restarting Jobs

Restartable jobs can be restarted either by a user or automatically by the Job Manager. Job Restartability can be configured in the Job Management Control Panel; restartability can be turned on or off, the maximum number of restart attempts can be specified, and the time interval between each restart attempt can be configured. These settings are for the entire CommCell, so that all jobs in the CommCell of a selected type will behave according to the Job Restart settings you have specified.

Restartable and Non-Restartable Jobs

Both preemptible and non-preemptible jobs can be restartable. Preemptible jobs are always restartable after they are suspended. Jobs that are not preemptible might fail to start and remain in a "waiting" state; these can be restartable as well. For more about jobs that fail to start, see What Happens When There are no Resources for a Job.

The following types of operations can be restarted, if so configured:

The Job Restarts tab in the Job Management Control Panel lists all agents that can be configured for restartability for data protection, data collection and data recovery operations. For more information, see Specify Job Restartability for the CommCell.

For a specific job, you can override one of these settings, the maximum number of restart attempts, by specifying the Number of Retries in the Job Retry tab of the job initiation dialog box for that particular job. See How to Configure Job Restarts for more specific direction on this.

In all cases, whether the Max Restarts setting is used in the Job Management Control Panel, or the Number of Retries setting in the Job Retry tab, once the maximum number of retries has been reached, if the job has still not restarted successfully, the Job Manager will kill the job.

Note:
  1. The job-based setting will have no effect unless restartability has been turned on in the Job Management Control Panel.
  2. You cannot configure the interval between restart attempts for an individual job, only the number of attempted restarts.
  3. Data Aging restartability can only be set in the Job Management Control Panel; you cannot set it in the Job Retry tab of the job initiation dialog box for that particular job.
  4. The restartability of Unix raw partition backup jobs either manually or by the system is not supported. Therefore, you should run such jobs under high priority.
  5. Data Protection/Data Collection Jobs that enter a Running (Cannot be verified) job state during a temporary network or CommServe service outage will not be restarted. These jobs do not enter a pending state; they will continue, without interruption, when the network or CommServe services become available. For more information, see Fault Tolerance.
  6. Restarting an Oracle On Demand backup job for multiple scripts for the same instance will cause the instance, whose backup was interrupted, to be backed up again from the beginning of the script which was running. Because of this restart behavior, if the archive files for that instance were successfully backed up before the restart, they will be backed up again after the restart. As a result, Job Manager may count the data size of archive files twice for the instance that the Oracle On Demand backup job was restarted from. Therefore, the size of data reported as backed up for this job (in the Job Details and Backup Job History) will reflect the duplicate size of the archive files that were backed up twice for that instance. The scripts should be updated to prevent this behavior before resuming the job.

  7. If a data management job for the DB2 DPF iDataAgent goes to a pending state, and if the job has completed on some of the nodes, the restart option will start the job on all the nodes unless the sBKPRESTARTFAILEDNODESTimeOut registry key is set appropriately.
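The relationship between the CommCell-wide Max Restarts setting and the per-job Number of Retries described in this section can be sketched as follows. The function names are hypothetical, the restart interval is omitted for brevity, and the callable `attempt` stands in for one run of the job.

```python
from typing import Callable, Optional


def effective_max_restarts(restarts_enabled: bool,
                           commcell_max: int,
                           job_retries: Optional[int] = None) -> int:
    """Pick the restart limit that applies to one job."""
    if not restarts_enabled:
        # The job-based setting has no effect unless restartability is on.
        return 0
    # A per-job Number of Retries overrides the CommCell-wide Max Restarts.
    return commcell_max if job_retries is None else job_retries


def run_with_restarts(attempt: Callable[[], bool], max_restarts: int) -> str:
    """Run a job, restarting it after failures until the limit is reached."""
    restarts = 0
    while True:
        if attempt():
            return "Completed"
        if restarts >= max_restarts:
            return "Killed"   # limit reached without a successful run
        restarts += 1         # a real system would wait the restart interval here
```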

Configuring Job Restarts for the CommCell

  1. Job Restarts are configured for the entire CommCell using the Job Management control panel. For each job type, see Specify Job Restartability for the CommCell.
  2. For Agents that support the capability, to override the CommCell's Max Restart setting for a particular job, you can specify the Number of Retries in the Job Retry tab of the job configuration dialog box for the following types of jobs:

    Job Name | How To Configure Job Restarts | Notes
    Auxiliary Copy | In the Auxiliary Copy dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. | See Start an Auxiliary Copy or Schedule an Auxiliary Copy for step-by-step instructions.
    Data Protection | In the Backup Options or Archive Options dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. | Refer to information specific to your Agent, beginning with the Compliance Archiving, Backup Data, or Migration Archiving page.
    Data Recovery | In the Restore Options or Recover Options dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. | Refer to information specific to your Agent, beginning with the Retrieve Data - Exchange Compliance Archiver Agent, Restore Backup Data, or Recover Archived Data page.
    Data Collection | In the Schedule Data Collection Job dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. | See Data Collection and Run/Schedule a Data Collection Job for an SRM Instance, Agent or Subclient for detailed information.
    Disaster Recovery Backup | In the Disaster Recovery Backup Options dialog, select the Job Retry tab and specify Number of Retries. | See Starting a Disaster Recovery Backup or Scheduling a Disaster Recovery Backup for step-by-step instructions.
    Erase Stub jobs for Exchange Mailbox Archiver | In the Erase Stubs selected for deletion in Outlook dialog, select the Job Retry tab and specify Number of Retries. | See Erase Stubs for step-by-step instructions.
    Offline Content Indexing | In the Content Indexing dialog box, click Advanced, then select the Job Retry tab and specify Number of Retries. | See Start or Schedule Offline Content Indexing Operations for step-by-step instructions.
    Online Content Indexing | In the Backup Options dialog box, click Advanced, then select the Job Retry tab and specify Number of Retries. | See Start or Schedule Online Content Indexing Operations for step-by-step instructions.

QR Volume Creation Restartability

QR Volume Creation restartability is only supported on Windows platforms. See Create a QR Volume for more information.

Single Volume Subclient

The Quick Recovery Agent maintains a restart string during the Volume Creation (copying) phase of full and incremental copy jobs to keep track of the progress made on each volume being copied. This restart string is updated in the CommServe database each time another 1 GB of data is copied for a volume. If a job is resumed from a suspended or pending state, the restart string is retrieved and used to identify the location in the volume from which to resume copying. For example, if a job is suspended with 2.8 GB of data copied for a particular volume, the restart string was last updated when 2 GB had been copied, so the job resumes from the 2 GB point.
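The checkpoint arithmetic described above can be sketched as follows. This is only an illustration, not the agent's implementation: the function name `resume_offset` is hypothetical, and the sketch assumes that "1 GB" means 2**30 bytes and that the restart string records whole 1 GB multiples.

```python
GIB = 1024 ** 3  # assumption: "1 GB" is treated as 2**30 bytes


def resume_offset(bytes_copied: int, checkpoint_interval: int = GIB) -> int:
    """Return the byte offset of the last checkpoint at or before bytes_copied.

    Hypothetical helper: models a restart string that is updated only
    after each full checkpoint_interval of data has been copied.
    """
    if bytes_copied < 0 or checkpoint_interval <= 0:
        raise ValueError("bytes_copied must be >= 0 and checkpoint_interval > 0")
    # Progress past the last full interval was never recorded, so it is recopied.
    return (bytes_copied // checkpoint_interval) * checkpoint_interval


copied = int(2.8 * GIB)         # job suspended with 2.8 GB copied
offset = resume_offset(copied)  # last recorded checkpoint: 2 GB
recopied = copied - offset      # roughly 0.8 GB is copied again on resume
```

Under these assumptions, at most one checkpoint interval of data per volume is copied again after a resume.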

Multi-Volume Subclient

In the QR Volume Creation phase, volumes are copied sequentially (i.e., not in parallel). This affects job restartability behavior for a multi-volume subclient. When a QR Volume Creation job is interrupted (suspended or pending), some of the volumes in the subclient may be completely copied while others may not be copied yet at all. If the job is restarted (either manually or automatically), the behavior toward each volume in the subclient will depend on the condition of the volume at the time of job interruption. Refer to the following table for the expected behavior (for each volume) when resuming an interrupted QR Volume Creation job for a multi-volume subclient.

Volume Condition at the Time of Job Interruption | Behavior when Job Restarts
Volume was successfully copied | The Quick Recovery Agent copies any changes to the volume that occurred after the starting point of the original job up to the time of the restart. For example: A job was initiated at 2:00 P.M. At 2:30 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, after the volume was successfully copied. At 3:00 P.M. you restarted the job. On resume, the Quick Recovery Agent copied the changes made to the volume from 2:00 to 3:00 P.M.
Volume was partially copied | The Quick Recovery Agent runs the full or incremental copy, and then copies any changes to the volume that occurred after the starting point of the original job up to the time of the restart. For example: A job was initiated at 2:00 P.M. At 2:30 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, during the copying of the volume. At 3:00 P.M. you restarted the job. On resume, the Quick Recovery Agent ran the initial copy job and then copied the changes made to the volume from 2:00 to 3:00 P.M.
Volume was not yet copied (full copy) | The Quick Recovery Agent runs a normal full copy. For example: A job was initiated at 2:00 P.M. At 2:02 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, before it copied any part of the volume. At 3:00 P.M. you restarted the job. On resume, the Quick Recovery Agent ran a full copy job, copying all the data in the volume up to 3:00 P.M.
Volume was not yet copied (incremental copy) | The Quick Recovery Agent copies any changes that the original incremental copy would have copied, as well as any changes to the volume that occurred after the starting point of the original incremental copy job up to the time of the restart. For example: A job was initiated at 2:00 P.M. At 2:02 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, before it copied any part of the volume. At 3:00 P.M. you restarted the job. On resume, the Quick Recovery Agent copied the data that the original incremental copy would have copied, as well as the changes made to the volume from 2:00 to 3:00 P.M.
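The per-volume behavior in the table can be summarized in a short sketch. It is a simplified model, not the agent's logic: the names `VolumeState`, `classify_volumes`, and `restart_steps` are hypothetical, the step descriptions paraphrase the table, and the classification assumes strictly sequential copying of volumes with positive sizes.

```python
from enum import Enum


class VolumeState(Enum):
    COPIED = "successfully copied"
    PARTIAL = "partially copied"
    NOT_STARTED = "not yet copied"


def classify_volumes(volume_sizes, bytes_copied_total):
    """Classify each volume, assuming volumes are copied one after another."""
    states = []
    remaining = bytes_copied_total
    for size in volume_sizes:
        if remaining >= size:
            states.append(VolumeState.COPIED)
            remaining -= size
        elif remaining > 0:
            states.append(VolumeState.PARTIAL)
            remaining = 0
        else:
            states.append(VolumeState.NOT_STARTED)
    return states


def restart_steps(state, copy_type):
    """Return the restart steps from the table; copy_type is 'full' or 'incremental'."""
    if copy_type not in ("full", "incremental"):
        raise ValueError("copy_type must be 'full' or 'incremental'")
    catch_up = "copy changes made since the original job started"
    if state is VolumeState.COPIED:
        return [catch_up]
    if state is VolumeState.PARTIAL:
        return [f"run the {copy_type} copy", catch_up]
    if copy_type == "full":
        return ["run a normal full copy"]
    return ["copy what the original incremental would have copied", catch_up]


# Three 10-unit volumes, job interrupted after 15 units were copied in total.
states = classify_volumes([10, 10, 10], 15)
plan = [restart_steps(state, "incremental") for state in states]
```

In this example the first volume only needs the catch-up step, the second is copied again and then caught up, and the third is handled as an incremental copy that had not started.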


Retrying Jobs

The Job Initiation dialog box provides several configuration options for retrying jobs, including:


Resuming Jobs

Jobs in a waiting or pending state can be resumed by right-clicking the job in the Job Controller and selecting Resume Job.


Other Considerations

Several additional job management capabilities are available. These capabilities are described in the following sections.

Hardware Considerations for Data Recovery Operations

A hardware failure during a restore operation puts the job in a device wait state indefinitely. If this occurs, you need to kill the job and start it again later, when the hardware is available.

Job Alive Check Interval

The Job Alive Check Interval option on the General tab of the Job Management dialog box lets you specify the interval at which the Job Manager checks active jobs to determine whether they are still running.

Job Update Interval

The Job Update Interval allows you to view or modify how often information is updated in the Job Details for data protection and data recovery operations.

The Job Updates tab of the Job Management dialog box displays the:

It also includes:

Job Running Time

At the time of job initiation, you can specify the total amount of time a job can run before it is killed by the Job Manager. The configurable parameters for job running time allow you to control the following:

You can configure the Total Running Time and whether to Kill running jobs when total running time expires in the Job Retry tab of the job initiation dialog box for the following types of jobs:

Job Queuing

Setting jobs to be queued allows a job that would otherwise fail to remain in the Job Controller in a Queued state, i.e., waiting. Once the condition that caused the job to be queued clears, the Job Manager will automatically resume the job. Jobs can be queued if:

You can also set scheduled jobs to be queued. If jobs are scheduled and the Queue Scheduled Jobs option is enabled, these jobs will start in the Job Controller in a Queued state at their scheduled time. These jobs can be manually resumed or, if the Queue Scheduled Jobs option is disabled, these jobs will resume automatically. Selecting this option is especially useful during times of maintenance. Rather than suspend each job manually after it has started, you can enable the Queue Scheduled Jobs option, which will start all the scheduled jobs in the Job Controller in a Queued state. Once you have completed the maintenance, you can manually resume specific scheduled jobs, or simply deselect the Queue Scheduled Jobs option to automatically resume all the scheduled jobs.
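The effect of the Queue Scheduled Jobs option during a maintenance window can be modeled with a small sketch. The class `JobControllerSketch` and its methods are hypothetical names chosen for illustration; they are not part of the product, and the model tracks only the Queued and Running states described above.

```python
class JobControllerSketch:
    """Toy model of the Queue Scheduled Jobs behavior (hypothetical API)."""

    def __init__(self):
        self.queue_scheduled_jobs = False
        self.jobs = {}  # job_id -> "Queued" or "Running"

    def start_scheduled_job(self, job_id):
        # With the option enabled, a scheduled job starts in the Queued state.
        self.jobs[job_id] = "Queued" if self.queue_scheduled_jobs else "Running"

    def resume_job(self, job_id):
        # Manual resume of a single queued job.
        if self.jobs.get(job_id) == "Queued":
            self.jobs[job_id] = "Running"

    def set_queue_scheduled_jobs(self, enabled):
        self.queue_scheduled_jobs = enabled
        if not enabled:
            # Disabling the option resumes every queued scheduled job.
            for job_id in list(self.jobs):
                self.resume_job(job_id)


controller = JobControllerSketch()
controller.set_queue_scheduled_jobs(True)    # maintenance begins
for job_id in (101, 102, 103):
    controller.start_scheduled_job(job_id)   # all three start as Queued
controller.resume_job(101)                   # resume one job manually
during_maintenance = dict(controller.jobs)
controller.set_queue_scheduled_jobs(False)   # maintenance ends
```

After the option is cleared, the remaining queued jobs in this model move to Running without any per-job action.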

You can set the jobs to be queued from the General tab of the Job Management dialog box. The following types of jobs can be queued:

When a Non-Full Backup is Automatically Converted to a Full Backup

Under the following conditions, a non-full backup is automatically converted to a full backup:

Some agents have additional scenarios in which a non-full backup is also automatically converted to a full backup:

What Happens When There are no Resources for a Job

Each job requires certain resources to complete successfully. The absence of these resources affects different types of jobs differently. The following table lists the resources required by each job, the status of the job in the Job Controller window when the resources are unavailable, and corresponding examples of the Reason for job delay displayed in the Job Details dialog box. It also briefly explains what happens when a job does not have the required resources.

By default, every 1440 minutes (24 hours), the HDPS Media & Library Manager service on the CommServe cleans up any media and drive reservations held by jobs that were abruptly terminated and failed to release their resources. You can modify the frequency using the nRESOURCERELEASEINTERVALMIN registry key.

 

Job | Resources | Status in the Job Controller | Reason for Job Delay | Additional Information
Data Protection Operation | Streams, Active Media, Drive | Waiting | See Example 1. | Job checks for necessary resources.
Data Protection Operation | Streams, Active Media, Drive | Waiting | See Example 2. | If the resources are not available, the job retries to reserve the resources whenever they are freed. Does not hold on to any resource until all the necessary resources are available.
Data Recovery Operations (for File System-like agents) | Drive | Pending | The media is already reserved by some other job(s). | If the resources are not available, the job retries to reserve the resources whenever they are freed.
Data Recovery Operation (for Database-like agents) | Drive | Failed | See Example 1. | Job checks for necessary resources.
Data Recovery Operation (for Database-like agents) | Drive | Running | See Example 2. | If the resources are not available, it retries every 2 minutes to reserve the resources. Does not hold on to any resource until all the necessary resources are available.
Index Restore Operation (Browse Backup Data) | Drive | Failed | See Example 1. | Job checks for necessary resources.
Index Restore Operation (Browse Backup Data) | Drive | Running | See Example 2. | If the resources are not available, it retries every 2 minutes to reserve the resources. Does not hold on to any resource until all the necessary resources are available.
Auxiliary Copy | Destination Drives | Pending | See Example 1. | Job checks for necessary resources.
Auxiliary Copy | Destination Drives | Waiting | | Job reserves 2 drives for source and destination media.
Auxiliary Copy | Destination Drives | Waiting | See Example 2. | If the above resources are not available, it retries every 2 minutes to reserve these resources. Does not hold on to any resource until all the necessary resources are available.
Auxiliary Copy | Source Media | Running | | Once the 2 drives and destination media are obtained, the job reserves the source media.
Auxiliary Copy | Source Media | Pending | See Example 2. | If the job encounters resource contention while reserving the source media (Example 2), it retries every 20 minutes, up to a maximum of 144 times, to obtain the source media. Holds on to the 2 drives and destination media as long as it is not interrupted and as long as the source media is available.
Synthetic Full | Streams, Destination Drives, Destination Media | Waiting | See Example 1. | Job checks for necessary resources.
Synthetic Full | Streams, Destination Drives, Destination Media | Waiting | | Job reserves streams, marks active media full, reserves 2 drives and destination media.
Synthetic Full | Streams, Destination Drives, Destination Media | Waiting | See Example 2. | If the resources are not available, the job retries to reserve the resources whenever they are freed. Does not hold on to any resource until all the necessary resources are available.
Synthetic Full | Source Media | Running | | Once the 2 drives and destination media are obtained, the job reserves the source media.
Synthetic Full | Source Media | Pending | See Example 2. | If the job encounters resource contention while reserving the source media (Example 2), it retries every 20 minutes, up to a maximum of 144 times, to obtain the source media. Holds on to the 2 drives and destination media as long as it is not interrupted.
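As a rough worked example of the source media retry limit in the table, the sketch below converts "every 20 minutes and a maximum of 144 times" into a total time window. The helper `max_retry_window` is a hypothetical name, and the calculation assumes every retry is spaced by the full interval.

```python
from datetime import timedelta


def max_retry_window(interval_minutes: int, max_attempts: int) -> timedelta:
    """Upper bound on time spent retrying, assuming evenly spaced attempts."""
    return timedelta(minutes=interval_minutes * max_attempts)


# Source media contention: retry every 20 minutes, at most 144 times.
source_media_window = max_retry_window(20, 144)
hours = source_media_window.total_seconds() / 3600  # 48.0
```

Under that assumption, an Auxiliary Copy or Synthetic Full job keeps trying to obtain the source media for about 48 hours.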

 

