Job Management



Overview

Viewing Job Information

Controlling Jobs

Viewing Job Status

Customizing Completed with Errors Condition

Job Filters

Important Considerations for Running Jobs

Preempting Jobs

Restarting Jobs

Retrying Jobs

Resuming Jobs

Resubmitting Jobs

Job Results Directory

Other Considerations


Overview

The Job Controller allows you to manage and monitor the following types of jobs:

The Job Controller window displays all the current jobs in the CommCell. A status bar at the bottom of the Job Controller shows the total number of jobs; the number of jobs that are running, pending, waiting, queued, and suspended; and the high and low watermarks. The low and high watermarks indicate the minimum and maximum number of streams, respectively, that the Job Manager can use simultaneously.

Viewing Job Information

Information about a job is continually updated and available in the Job Controller or Job History window. A finished job stays in the Job Controller for five minutes; more information about a finished job is available from the Job History.

The following job information is displayed, depending on the selected Job History:

Job ID A unique number allocated by the Job Manager that identifies the data protection, data recovery, or administration operation.
Operation The type of data protection, data recovery, or administration operation being conducted.
Client/Client Computer For data protection operations, the client computer to which the backup set and subclient belong. For data recovery operations, the computer from which the data originated.
Destination Client The destination client to which the recovered data will be stored.
Agent Type The agent that is performing the operation (e.g., Windows 2000 File System).
Instance/Partition The instance/partition in the client computer that represents the database that was included in this operation.
Subclient The subclient that was protected during the operation. Note that a deleted subclient will have a Unix time stamp appended to its name in cases where another subclient is currently using the same name as the deleted subclient.
Job Type The type of operation that is being conducted on data.
Backup Type The type of backup that was conducted: Differential, Full, Incremental or Synthetic.
Failed Folders The number of folders that were not included in the operation.
Failed Files The number of files that were not included in the operation.
Storage Policy The storage policy to which the operation is being directed.
MediaAgent The MediaAgent to which the operation is being directed.
Status The status of the operation. For job status descriptions, see Job Status Levels.
Progress A progress bar indicating how far the operation has advanced. The progress bar is not visible for certain operations (e.g., data aging) or for the initial phases of some data protection operations.
Note: The Job Controller progress bar will not display the progress of SAP for MAXDB backup and restore jobs accurately, because Simpana cannot detect the data or objects that SAP for MAXDB transfers, due to the way SAP for MAXDB transfers these items.
Errors Displays any errors that have occurred during the operation, such as a hardware problem or a job that has run outside of an operation window. (See Job Errors for more information.)
Backup Set The backup set that was protected/recovered during the operation and to which the subclient belongs.
Index Displays New Index to indicate a new index was created during the operation. If blank, a new index was not created.
Phase The current phase of the operation. The number of phases varies depending on the operation.
User Name The name of the user who initiated the operation.
Priority The priority that is assigned to the operation. (For more information, see Job Priorities and Priority Precedence).
Start/Start Time The date and time on the CommServe when the operation started.
End Time The date and time on the CommServe when the operation was completed.
Elapsed The duration of time consumed by the operation.
Libraries The libraries that are being used by the operation.
Drives/Mount Paths The drives/mount paths that are being used by the operation. For more information about media, see Media Operations.
Last Update Time The last time the Job Manager received job updates for the operation.
Transferred The amount of data that has been transferred for the operation at the present time.
Estimated Completion Time The time that the Job Manager estimates for this job to be completed. The estimated time will be based on time zone of the CommServe computer.
Size on Media The total size of data that was transferred to the media (excluding duplicated data).
Note:
  • The amount displayed is the compressed amount (if compression is enabled) and includes valid and invalid attempts of the backup jobs.
  • Application data that is backed up may include sparse files, metadata, inode security data, etc. As a result, the displayed size of the data may be greater than expected.
  • If viewing from the storage policy copy level, the amount displayed may be less if the job is partially copied.
Size of Application The amount of the application data that has been protected.
Note:
  • Application data that is backed up may include sparse files, metadata, inode security data, etc. As a result, the displayed size of the data may be greater than expected.
  • If the job has completed with multiple attempts, the amount displayed may be larger.
Size of Backup The amount of compressed data that has been protected, which includes all application data and metadata.
Content Indexed Displays Full, Partial, or No to indicate whether content indexing was used for the operation. Operations performed with older releases of the software may display Yes or No.

Note that if a job is displayed as partially content indexed, not all of the data protected in the job was content indexed successfully. Rerun content indexing on this job so that the protected data is fully content indexed.

Delay Reason The description of the reason why the operation may be pending, waiting, or failing.
Alert The name of the job-based alert, if configured for the job.
Job Initiation The origin of the operation: the CommCell Console (Interactive), a schedule (Scheduled), or a third party interface (Third Party).
Maximum Number of Readers The maximum number of readers that can be used for the operation.
Automated Content Classification Policy Name of the Automated Content Classification Policy.
Legal Hold Name The Name specified for the Legal Hold data.
Legal Hold Retention Time The time frame for which the Legal Hold Data will be retained.
Number of Readers in Use The number of readers currently in use for the operation.
Number of Objects The total number of objects including successful, failed and skipped.
Note: For a Unix File System iDataAgent backup job that includes hard links and for which the HLINK registry key is set to Y and the appropriate hard link updates are applied, the value in this field will also account for the number of hard links and hard link groups that were backed up.

See the Service Pack documentation for more information on hard link updates.

Restart Interval The amount of time the Job Manager will wait before restarting a job that has gone into a pending state. This is set in the Job Management (Job Restarts) tab.
Max Restarts The maximum number of times the job will be restarted after a phase of the job has failed. This is set in the Job Management (Job Restarts) tab.
Error Code The error code for the reason the job is pending or has failed. (See Job Errors for more information.)
Retained By The type of retention rules defined for the job, basic or extended. For more information, see Data Aging.
Description A brief description of the running job.

The Pause and Play buttons allow you to control how the Job Controller displays real-time information from active jobs. The Pause button stops the Job Controller from displaying real-time information collected from jobs. The Play button allows the Job Controller to display real-time job updates again.

To see all the columns in the Job Controller window, use the scroll bar at the bottom of the window.

Job Errors

If a job has not completed successfully, the Error Code column will display a unique code linking to available troubleshooting and knowledgebase article(s) relevant to that error from the customer support website. These articles may include special considerations for the type(s) of job(s) you are running, suggested workarounds for issues, and common causes for that particular error.

If an error code pertains to more than one issue, the customer support website will display links to all articles for which the code is relevant. Conversely, if an error code does not have any articles associated with it, the customer support website will display a message indicating that no articles exist for that code.

Error codes may also be obtained from several other windows and dialog boxes, including:

Note the following when obtaining troubleshooting articles using error codes:

Note that jobs which fail Data Integrity Validation will be moved to pending status. Review the error code and description of the pending job from the job controller to identify the reason for failure. See Data Integrity Validation - Troubleshoot for troubleshooting Data Integrity validation errors.

For step-by-step instructions on viewing information about job errors, see View Troubleshooting Article(s) Available from the Customer Support Website.

Flags

The Job Controller window also provides a Flags column, which is located on the left-hand side of the Job Controller window. The Flags column displays an icon for any running jobs that encounter one of the following scenarios:

If neither of the above scenarios is present, the Flags column will remain empty.

Viewing Additional Job Details

To view additional details about a particular job, right click the job in the Job Controller window and select Detail.

The Job Controller also provides the facility to view job information using other CommCell Console features, including:

Viewing List of Backed Up Files in a Job

Use the ListFilesForJob utility to generate a list of the files that were backed up during a specific job. Follow the steps given below to create a file that contains the list:

  1. Open the Command Prompt and navigate to the following location:

    <Software_Installation_Directory>\simpana\Base\

  2. Enter the following command:

    ListFilesForJob.exe -job <JOBID> -ma <MAName> [-vm <Instance>] [-flag <ArchiveBitFlag>] [-tmpdir <TMPDIRPATH>] [-o <OUTFILENAME>]

    Where:

    JOBID The job ID of the job for which you are generating the list.
    MAName The name of the MediaAgent that was used to perform the backup job.
    Instance The name of the instance that you used to install the Windows File System iDataAgent.

    This is an optional argument. If you do not specify any value, the job in Instance001 will be used by default to generate the list of files.

    ArchiveBitFlag 1 to set the Archive Bit

    0 to reset the Archive Bit

    This is an optional argument. If you do not specify any value, the archive bit will not change and the file that contains the list of files cannot be deleted.

    TMPDIRPATH The directory in which you want to create the file.

    This argument is optional. If you do not specify any directory, the file will be created in the default temporary directory.

    The default temporary directory for the software is set using the dGALAXYTEMPDIR registry key. When you install Windows File System iDataAgent, the dGALAXYTEMPDIR registry key gets created at the following location:

    HKEY_LOCAL_MACHINE\SOFTWARE\CommVault Systems\Galaxy\Instance<xxx>\Base

    OUTFILENAME The name of the file in which you want to store the list
  3. Navigate to the directory specified in TMPDIRPATH and open the <OUTFILENAME> file to view the list of files.
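
The command in step 2 can also be assembled from a script. The following Python sketch is illustrative only: it builds the argument list from the options described above and adds an optional argument only when a value is supplied. The function names and the example installation path are hypothetical; only the executable name and the -job, -ma, -vm, -flag, -tmpdir, and -o options come from this page.

```python
import subprocess
from pathlib import PureWindowsPath


def build_list_files_command(base_dir, job_id, ma_name, instance=None,
                             archive_bit_flag=None, tmp_dir=None, out_file=None):
    """Return the ListFilesForJob argument list for one job.

    base_dir is the <Software_Installation_Directory>\\simpana\\Base folder.
    Optional arguments are added only when a value is supplied, so the
    utility falls back to its documented defaults otherwise.
    """
    exe = str(PureWindowsPath(base_dir) / "ListFilesForJob.exe")
    args = [exe, "-job", str(job_id), "-ma", ma_name]
    if instance is not None:
        args += ["-vm", instance]
    if archive_bit_flag is not None:
        if archive_bit_flag not in (0, 1):
            raise ValueError("archive_bit_flag must be 0 or 1")
        args += ["-flag", str(archive_bit_flag)]
    if tmp_dir is not None:
        args += ["-tmpdir", tmp_dir]
    if out_file is not None:
        args += ["-o", out_file]
    return args


def run_list_files(args):
    """Run the utility on the Windows client; raises if it exits with a non-zero status."""
    subprocess.run(args, check=True)
```

For example, build_list_files_command(r"C:\sw\simpana\Base", 1234, "ma01", out_file="files.txt") returns the argument list for job 1234 on a MediaAgent named ma01; both values and the path are placeholders.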

Controlling Jobs

You can select a job in the Job Controller and perform a control action on that job individually. You can also control multiple jobs simultaneously in two ways:

Either method allows you to perform actions on:

You can perform the following actions on jobs:

Suspend Temporarily stops a job. A suspended job is not terminated; it can be restarted at a later time. Only preemptible jobs can be suspended.
Commit Gracefully completes the current backup job, as of that point-in-time. Applicable only for Silo backup jobs. See Commit a Job for details.
Resume Resumes a job and returns the status to Waiting, Pending, Queued, or Running depending on the availability of resources or the state of the operation windows and activity control settings.
Kill Terminates a job.
Change Priority Changes the priority of a job or a group of jobs that are currently active. Note that the lower the priority number, the higher the priority the Job Manager gives to the job when allocating resources.

When you suspend or resume a job, a dialog box appears offering you the ability to provide a reason for suspending or resuming a job. This reason, if entered, will be included in the Description field of the Job Details dialog box.

Controlling the Number of Simultaneously Running Streams

The Job Controller window displays all the current jobs in the CommCell. A status bar at the bottom of the Job Controller shows the total number of jobs; the number of jobs that are running, pending, waiting, queued, and suspended; and the high and low watermarks. The low and high watermarks indicate the minimum and maximum number of streams, respectively, that the Job Manager can use simultaneously.
Note:
  • The high watermark has a default value of 10 for SRM Reports.
  • The high watermark has a default value of 100 for WorkStation backup jobs running to one destination. You can use the SetKeyIntoGlobalParamTbl.sql qscript with the JMReplicationJobActivityLevelHighWaterMark global parameter to change the default value. For more information, see Command Line Interface - QScripts.

Viewing Job Status

The following table describes the status levels that may appear in the Job Controller window for a particular job:

Completed The job has completed successfully. Note that pop-up messages for reporting job completion can be enabled or disabled using the F12 key.
Note: For a 1-Touch Recovery for Unix job, two jobs are listed as completed if the job is successful. The first job is the operating system recovery, and the second job is the data recovery. The 1-Touch Recovery job is completed once the data recovery is completed.
Completed With Warning The job has completed successfully but with a notification to the user.
Completed With One or More Errors The job has completed with errors.

The following administration conditions will result in the Completed With One or More Errors status level.

  • Disaster Recovery Backup
    • During the operation, Phase 1 failed and Phase 2 completed, or Phase 1 completed and Phase 2 failed.
  • Data Aging
    • During the operation, one or more components failed, e.g., subclients failed to be aged or job history failed to be removed.
  • Install Updates
    • During the operation, one or more clients failed to be updated.
  • Offline Content Indexing
    • During the offline content indexing operation, some of the backup data failed to be content indexed.
  • Information Management
    • During an information management operation, the operation defined in the Automated Content Classification Policy was only partially successful.

The following iDataAgent-specific conditions will result in the Completed With One or More Errors status level.

  • Exchange Compliance Archiver
    • During a retrieve operation, one or more files failed to be retrieved.
  • Exchange Mailbox Archiver and Exchange Public Folder Archiver
    • During a recovery operation, one or more files failed to be recovered.
  • Microsoft Windows File System
    • During a system state backup operation, one or more non-critical components failed to be backed up.
    • During a file system restore operation, one or more files failed to restore or were locked.
    • During a system state restore operation, one or more non-critical components failed to be restored.
  • Microsoft Exchange Server
    • During a backup operation of a storage group assigned to a subclient, one or more databases failed to be backed up.
    • During a restore operation, one or more databases failed to be restored.
  • Informix
    • During a backup operation, one or more files failed to be backed up.
  • Oracle, Oracle RAC
    • During a backup operation, one or more files failed to be backed up.
  • SAP for Oracle, SAP for MAXDB
    • During a backup operation, one or more files failed to be backed up.
  • SharePoint Server iDataAgent
    • During a backup operation, one or more elements in the subclient content failed to be backed up.
    • During a restore operation, one or more elements in the subclient content failed to be restored.
  • SharePoint Archiver
    • During a migration archiving operation, one or more elements in the subclient content failed to be archived.
    • During a recovery operation, one or more elements in the subclient content failed to be recovered.
  • Sybase
    • During a backup operation, one or more files failed to be backed up.
  • UNIX File System
    • During a backup operation, one or more files failed to be backed up.
Dangling Cleanup A job phase has been terminated by the job manager, and the job manager is waiting for the completion of associated processes before killing the job phase.
Failed The job has failed due to errors or the job has been terminated by the job manager.
Interrupt Pending The job manager is waiting for the completion of associated processes before interrupting the job due to resource contention with jobs that have a higher priority, etc.
Kill Pending The job has been terminated by the user using the Kill option, and the job manager is waiting for the completion of associated processes before killing the job.
Killed The job is terminated by the user using the Kill option or by the Job Manager.*
Pending The Job Manager has suspended the job due to phase failure and will restart it without user intervention.
Queued
  • The job conflicted with other currently running jobs (such as multiple data protection operations for the same subclient), and the Queue jobs if other conflicting jobs are active option was enabled from the General tab of the Job Management dialog box. The Job Manager will automatically resume the job only if the condition that caused the job to queue has cleared.
  • The activity control for the job type is disabled, and the Queue jobs if activity is disabled option was enabled from the General tab of the Job Management dialog box. The Job Manager will automatically resume the job only if the condition that caused the job to queue has cleared.
  • The Queue Scheduled Jobs option was enabled from the General tab of the Job Management dialog box. Scheduled Jobs can be resumed manually using the Resume option or resumed automatically by disabling the Queue Scheduled Jobs option.
  • The job started within the operation window's start and end time.
  • The running job conflicted with the operation window and the Allow running jobs to complete past the operation window option was not enabled from the General tab of the Job Management dialog box. (This is only applicable for jobs that can be restarted. See Restarting Jobs for more information.)
Running The job is active and has access to the resources it needs.
Running (Cannot be verified) During a running operation, the Job Alive Check failed. See Job Alive Check Interval for more information.
Suspend Pending A job is suspended by a user using the Suspend option, and the Job Manager is waiting for the completion of associated processes before stopping the job.
Suspended
  • A running, waiting or pending job has been manually stopped by a user using the Suspend option. The job will not complete until it is restarted using the Resume option.
  • A job has been started in a suspended state using the Start Suspended or Startup in Suspended State options available from the dialog box of the job that was initiated. Restore jobs from Search Console can be started in the suspended state using the Start End User restores in suspended state and Start Compliance restores in suspended state options in the Browse/Recover Option Dialog box in the Control Panel.
System Kill Pending The job has been terminated by the Job Manager*, and the Job Manager is waiting for the completion of associated processes before killing the job.
Waiting The job is active, waiting for resources (e.g., media or drive) to become available or for internal processes to start.
Destination Client The restore client machine name. This allows users to verify if the restore data is being written to the correct machine/target.

*The Job Manager will terminate a job when:

Job Status for SAP for Oracle iDataAgent

For the SAP for Oracle iDataAgent, the job status that is displayed depends on the BRTools error codes.
BRTools Error Code Message Job Status

Job Status Changes

The status of a job and the preemptibility of its current phase determine the actions (Kill, Suspend, or Resume) that you can perform on it in the Job Controller. The following table describes the status of a job after an action has been performed on it:

Original Status     Actions Available              New Status
Running             Suspend                        Suspended
                    Kill                           Killed
Waiting             Suspend                        Suspended
                    Kill                           Killed
Interrupt Pending   N/A                            N/A
Pending             Suspend                        Suspended
                    Resume                         Returns to original state, resources and other conditions permitting
                    Kill                           Killed
Suspend Pending     N/A                            N/A
Queued              Suspend                        Suspended
                    Resume (scheduled jobs only)   Changes into a state of an active job, resources and other conditions permitting
                    Kill                           Killed
Suspended           Resume                         • Returns to original state, resources and other conditions permitting
                                                   • Changes into a state of an active job, resources and other conditions permitting
                    Kill                           Killed
Kill Pending        N/A                            N/A
Dangling Cleanup    N/A                            N/A
Note: Jobs that are pending or have failed will be killed after being in that state for more than 24 hours.
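
The status changes in the table above can be read as a small lookup. The following Python sketch is an illustration only, not part of the software: it transcribes the table into a dictionary and rejects actions that the table does not list. The function names and the three placeholder strings for resumed jobs are assumptions made for this example, and the sketch does not model phase preemptibility, which also affects whether Suspend is available.

```python
# Placeholder descriptions for resumed jobs; the Job Manager decides the
# actual state based on resources and other conditions.
ORIGINAL = "Original state (conditions permitting)"
ACTIVE = "Active state (conditions permitting)"
ORIGINAL_OR_ACTIVE = "Original or active state (conditions permitting)"

# (original status, action) -> new status, transcribed from the table.
TRANSITIONS = {
    ("Running", "Suspend"): "Suspended",
    ("Running", "Kill"): "Killed",
    ("Waiting", "Suspend"): "Suspended",
    ("Waiting", "Kill"): "Killed",
    ("Pending", "Suspend"): "Suspended",
    ("Pending", "Resume"): ORIGINAL,
    ("Pending", "Kill"): "Killed",
    ("Queued", "Suspend"): "Suspended",
    ("Queued", "Resume"): ACTIVE,  # scheduled jobs only
    ("Queued", "Kill"): "Killed",
    ("Suspended", "Resume"): ORIGINAL_OR_ACTIVE,
    ("Suspended", "Kill"): "Killed",
}


def available_actions(status):
    """List the actions the table allows for a status (empty for N/A rows)."""
    return sorted(action for (state, action) in TRANSITIONS if state == status)


def apply_action(status, action, scheduled=False):
    """Return the new status, or raise ValueError if the action is not listed."""
    if (status, action) == ("Queued", "Resume") and not scheduled:
        raise ValueError("Resume is available for queued scheduled jobs only")
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not available for a {status} job") from None
```

Statuses such as Interrupt Pending, Suspend Pending, Kill Pending, and Dangling Cleanup have no entries, so available_actions returns an empty list for them.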

Customizing Completed with Errors Condition

You can control the overall status of a backup job by defining error decision rules. You can define multiple decision rules for an agent based on the following criteria:

The available job status you can select from the decision rule allows you to:

(the list below also reflects the job status priority used by the decision rules)

Once the rules are created, the agent applies them at the end of the backup operation. During this process, the agent traverses the failures.cvf file and matches each entry against the decision rules in order of rule priority. The failures.cvf file lists every file that failed to back up, along with its associated error code. When a file and its error code match a rule, the file is marked with the status defined by that rule, and the agent continues to traverse the failures.cvf file. However, if a file matches a rule that marks the job as failed, the backup job ends immediately with the failed status. See the graph on the right to understand the process.

Here are some examples that show when it is useful to define error decision rules:

Example 1

You create a decision rule to ignore any error found in temporary files.

File Pattern: C:\temp\*

System Error Code: All Error Codes

On error mark Job as: Complete

Example 2

You create a decision rule to mark the backup job as failed when an error is found in system data files.

File Pattern: /**/*.dat

System Error Code: 1 - 10

On error mark Job as: Failed
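
The matching process described above, applied to the two example rules, might be sketched as follows. This is a simplified illustration under stated assumptions, not the agent's implementation: the failures.cvf format is not modeled (failures are passed in as path and error code pairs), file patterns are matched with Python's fnmatch wildcards, which may differ from the agent's pattern syntax, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class DecisionRule:
    pattern: str             # file pattern (fnmatch wildcards, an assumption)
    codes: Optional[range]   # None stands for All Error Codes
    status: str              # status to mark when the rule matches

    def matches(self, path, code):
        code_ok = self.codes is None or code in self.codes
        return code_ok and fnmatchcase(path, self.pattern)


def apply_rules(rules, failures):
    """Apply rules (highest priority first) to (path, error_code) pairs.

    Returns (job_failed, marks). marks maps each matched path to the status
    of the first rule it matched. Traversal stops at the first Failed match.
    """
    marks = {}
    for path, code in failures:
        rule = next((r for r in rules if r.matches(path, code)), None)
        if rule is None:
            continue                  # no rule applies to this failure
        marks[path] = rule.status
        if rule.status == "Failed":
            return True, marks        # job ends immediately as failed
    return False, marks


RULES = [
    DecisionRule(r"C:\temp\*", None, "Complete"),       # Example 1
    DecisionRule("/**/*.dat", range(1, 11), "Failed"),  # Example 2, codes 1 - 10
]
```

With these rules, a failure on a file under C:\temp is marked Complete and traversal continues, while a failure with error code 5 on a .dat file in a subdirectory ends the traversal with the job failed.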

Supported Agents

This feature is supported by the following agents:

Creating Decision Rules

Follow the steps below to add a decision rule for any errors that may occur during a backup operation:

1.
  • From the toolbar in the CommCell Console, click Control Panel.
  • Double-click Job Management.
2.
  • Click the Job Status on Errors tab.
  • From the Group Category pane, select the agent to add the decision rules for.
  • Click Add.
3.
  • By default, in the Add Job Error Decision Rule dialog box, all file patterns are considered for the new decision rule.

    You can define a file pattern by clearing the All File Pattern checkbox and entering a specific file pattern in the User Defined Pattern field.

  • From the System Error Code area:
    • If you want to include all errors, click All Error Codes.
    • If you want to define a specific set of error codes, enter the error code range.
  • Select the job status from the On Error mark Job as drop-down list to update the job if the rule is matched.
    For a SnapProtect backup job, if a decision rule with the "Completed with Errors" status is matched, the job status will be marked as "Complete".
  • Click OK.
4. Click OK.

You can add more error decision rules for the selected agent, or choose a different agent to add new decision rules.

You can also set the priority for the decision rules you created by moving a rule up (higher priority) or down (lower priority) using the arrow buttons.

Common Error Codes

The following table lists examples of common error codes for Windows and Solaris computers:

Error Code Value System Error Message
1 Operation not permitted
2 No such file or directory
3 No such process
5 I/O Error
6 No such device or address
13 Permission denied
16 Mount device busy
32 Broken pipe

Please refer to the operating system vendor documentation for a comprehensive list of error codes.


Job Filters

You can filter the jobs that are displayed in the Job Controller by creating a job filter from the Filter Definition dialog box. You can filter by Data Protection, Data Recovery, Data Collection (for SRM jobs), and Administration operations. The filter can also be based on an active job for a particular CommCell entity.

CommCell Administrators can utilize filters created by all users. All other users can only utilize the filters that they create. If a user account is deleted, their filters will automatically be deleted as well.

Important Considerations for Running Jobs


Job Preemption Control

The phases of a job or operation fall into two main types:

Preemptible Phase In a preemptible phase, the job can be interrupted by the Job Manager or suspended by the user and then restarted without having to start the phase over again from the beginning. Preemption is defined by the Job Manager at each phase of a job. A File System backup phase is one example of a preemptible phase; the Job Manager can interrupt this phase when resource contention occurs with a higher priority job. You can also suspend this phase in progress and resume it later.
Non-preemptible Phase A non-preemptible phase is one that cannot be interrupted by the Job Manager or suspended by the user. It can only run to completion, be killed by administrative action, or be failed by the system. For example, the data recovery operations of database agents are non-preemptible.

Both preemptible and non-preemptible jobs can also be defined in terms of their restartability; preemptible jobs are always restartable. In addition, even jobs that are not preemptible might fail to start and be in a "waiting" state; these are restartable as well. For more specific information on this topic, see Job Restart.

Preemptible and Non-Preemptible Jobs

The following table lists the types of preemptible and non-preemptible jobs:

Preemptible and Restartable Non-preemptible and Non-Restartable Non-preemptible but Restartable
  • Data protection operations for most non-database agents.
  • DataArchiver archive jobs during the Archive Index and Archive Content Index phases of the job.
  • Data recovery operations for most File System-like (indexing-based) agents during the restore phase.
  • Data recovery operations from the Search Console.
  • Most administration jobs including Install Automatic Updates and Download Automatic Updates.
  • Silo backup and restore operations.
  • Media refresh operations.
  • Deduplication database reconstruction job.
  • Data recovery operations for database-like agents.
  • Media export, erase media, and inventory jobs.
  • SAN volume data protection jobs (non-preemptible in its scan phase).
  • All QR jobs on Unix platforms.
  • Disk volume reconciliation jobs.
  • Data protection operations for database agents.
  • The system state phase of Windows File System data protection operations.
  • Offline Content Indexing jobs.
  • Data Collection operations for SRM Agents.

For information on Agents that support Job Restarts, see the following:

Controlling Job Preemption for the CommCell

You can specify that certain operations will preempt other operations based on their job priority, in cases where multiple jobs are competing for media and drives.

If a running job is preemptible, the Job Manager can interrupt the running job and allocate the resources to a higher-priority job. (The interrupted job enters a waiting state and resumes when the resources it needs become available.)
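
As a rough illustration of this behavior, the following Python sketch picks a running job to interrupt on behalf of a higher-priority job, remembering that a lower priority number means a higher priority. It is a simplification: the Job class and function name are hypothetical, the choice of the lowest-priority preemptible job as the one to interrupt is an assumption, and the intermediate Interrupt Pending status is skipped.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    priority: int        # lower number = higher priority
    preemptible: bool
    status: str = "Running"


def preempt_for(incoming, running):
    """Interrupt one running job so that `incoming` can take its resources.

    Considers only preemptible running jobs that `incoming` outranks and
    picks the lowest-priority one (an assumption). Returns the interrupted
    job, or None when nothing can be preempted.
    """
    candidates = [
        job for job in running
        if job.preemptible and job.status == "Running"
        and job.priority > incoming.priority
    ]
    if not candidates:
        return None
    victim = max(candidates, key=lambda job: job.priority)
    victim.status = "Waiting"    # resumes when resources become available
    incoming.status = "Running"
    return victim
```

A non-preemptible job is never selected, regardless of its priority.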

You can:

See Set Job Preemption Control for the CommCell.

Configuring Preemptibility for Select Job Types

You can specify which of the following types of jobs are preemptible:

To configure preemptibility in the CommCell for specific job types, see Specify Preemptibility of Job Types.

What Happens When a Job is Preempted

The following table provides information on the Status of the job in the Job Controller window and the Reason for job delay displayed in the Job Details dialog box when a job is preempted. In addition, a brief explanation on what happens when a job is preempted is also provided.

Operation Job Status in the Job Controller Reason for Job Delay Additional Information
Data Protection Operation Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available
Data Recovery Operations (for File System-like agents) Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available
Data Recovery Operation (for Database-like agents) Not Preemptible
Index Restore (Browse Backup Data) Not Preemptible
Auxiliary Copy Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available
Synthetic Full Interrupt Pending No Job Delay Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Waiting No resources available  
Media Refresh Waiting No resources available Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources.
Note: The higher-priority job that is preempting for resources will display the following Reason for job delay:

Waiting for job[ ] to release the resources.

Important Considerations


Restarting Jobs

Restartable jobs can be restarted either by a user or automatically by the Job Manager. Job Restartability can be configured in the Job Management Control Panel; restartability can be turned on or off, the maximum number of restart attempts can be specified, and the time interval between each restart attempt can be configured. These settings are for the entire CommCell, so that all jobs in the CommCell of a selected type will behave according to the Job Restart settings you have specified.

Restartable and Non-Restartable Jobs

Both preemptible and non-preemptible jobs can be restartable. Preemptible jobs are always restartable after they are suspended. Jobs that are not preemptible might fail to start and remain in a "waiting" state; these jobs can be restartable as well. For more about jobs that fail to start, see What Happens When There are no Resources for a Job.

The following types of operations can be restarted, if so configured:

The Job Restarts tab in the Job Management Control Panel lists all agents that can be configured for restartability for data protection, data collection and data recovery operations. For more information, see Specify Job Restartability for the CommCell.

For a specific job, you can override one of these settings, the maximum number of restart attempts, by specifying the Number of Retries in the Job Retry tab of the job initiation dialog box for that particular job. See How to Configure Job Restarts for more specific direction on this.

In all cases, whether the Max Restarts setting is used in the Job Management Control Panel, or the Number of Retries setting in the Job Retry tab, once the maximum number of retries has been reached, if the job has still not restarted successfully, the Job Manager will kill the job.

Notes:
  1. The job-based setting will have no effect unless restartability has been turned on in the Job Management Control Panel.
  2. You cannot configure the interval between restart attempts for an individual job, only the number of attempted restarts.
  3. Data Aging restartability can only be set in the Job Management Control Panel; you cannot set it in the Job Retry tab of the job initiation dialog box for that particular job.
  4. Restarting Unix raw partition backup jobs, either manually or by the system, is not supported. Therefore, you should run such jobs under high priority.
  5. Data Protection/Data Collection Jobs that enter a Running (Cannot be verified) job state during a temporary network or CommServe service outage will not be restarted. These jobs do not enter a pending state; they will continue, without interruption, when the network or CommServe services become available. For more information, see Fault Tolerance.
  6. Restarting an Oracle On Demand backup job for multiple scripts for the same instance will cause the instance, whose backup was interrupted, to be backed up again from the beginning of the script which was running. Because of this restart behavior, if the archive files for that instance were successfully backed up before the restart, they will be backed up again after the restart. As a result, Job Manager may count the data size of archive files twice for the instance that the Oracle On Demand backup job was restarted from. Therefore, the size of data reported as backed up for this job (in the Job Details and Backup Job History) will reflect the duplicate size of the archive files that were backed up twice for that instance. The scripts should be updated to prevent this behavior before resuming the job.

  7. If a data management job for the DB2 DPF iDataAgent goes to a pending state, and if the job has completed on some of the nodes, the restart option will start the job on all the nodes unless the sBKPRESTARTFAILEDNODESTimeOut registry key is set appropriately.
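
The interaction between the CommCell-wide restart settings and the per-job Number of Retries described above can be sketched as follows. This is a minimal illustration, not the Job Manager's implementation: the class, function names, and return strings are hypothetical, and it assumes that the first run of a job does not count as a restart attempt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CommCellRestartSettings:
    """Hypothetical stand-in for the CommCell-wide Job Restarts settings."""
    restartable: bool          # restartability turned on or off
    max_restarts: int          # maximum number of restart attempts
    restart_interval_min: int  # minutes between restart attempts


def effective_max_restarts(settings: CommCellRestartSettings,
                           job_number_of_retries: Optional[int] = None) -> int:
    """Return the restart limit that applies to one job."""
    if not settings.restartable:
        # The job-based setting has no effect unless restartability is on.
        return 0
    if job_number_of_retries is not None:
        # The per-job Number of Retries overrides the CommCell maximum.
        return job_number_of_retries
    return settings.max_restarts


def run_with_restarts(attempt: Callable[[], bool],
                      settings: CommCellRestartSettings,
                      job_number_of_retries: Optional[int] = None,
                      wait: Callable[[int], None] = lambda minutes: None) -> str:
    """Run a job, restarting it until it succeeds or the limit is reached.

    `wait` receives the interval in minutes; the default is a no-op so the
    sketch runs instantly. The interval always comes from the CommCell
    settings because it cannot be configured per job.
    """
    limit = effective_max_restarts(settings, job_number_of_retries)
    if attempt():
        return "completed"
    if limit == 0:
        return "failed"  # assumption: no restart is attempted at all
    for _ in range(limit):
        wait(settings.restart_interval_min)
        if attempt():
            return "completed"
    return "killed"  # maximum number of restarts reached


settings = CommCellRestartSettings(restartable=True, max_restarts=10,
                                   restart_interval_min=20)
outcomes = iter([False, False, True])  # fails twice, then succeeds
print(run_with_restarts(lambda: next(outcomes), settings))  # completed
```

In this sketch, a job with a per-job Number of Retries of 1 that keeps failing is reported as killed after a single restart, regardless of the larger CommCell maximum.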

Configuring Job Restarts for the CommCell

  1. Job Restarts are configured for the entire CommCell using the Job Management control panel. For each job type, see Specify Job Restartability for the CommCell.
  2. For Agents that support the capability, to override the CommCell's Max Restart setting for a particular job, you can specify the Number of Retries in the Job Retry tab of the job configuration dialog box for the following types of jobs:

    Job Name

    How To Configure Job Restarts

    Notes

    Auxiliary Copy In the Auxiliary Copy dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. See Start an Auxiliary Copy or Schedule an Auxiliary Copy for step-by-step instructions.
    Data Protection In the Backup Options or Archive Options dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. Refer to information specific to your Agent, beginning with the Compliance Archiving, Backup Data, or Migration Archiving page.
    Data Recovery In the Restore Options or Recover Options dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. Refer to information specific to your Agent, beginning with the Retrieve Data - Exchange Compliance Archiver Agent, Restore Backup Data, or Recover Archived Data page.
    Data Collection In the Schedule Data Collection Job dialog, click Advanced, then select the Job Retry tab and specify Number of Retries. See Data Collection and Run/Schedule a Data Collection Job for an SRM Instance, Agent or Subclient for detailed information.
    Disaster Recovery Backup In the Disaster Recovery Backup Options dialog, select the Job Retry tab and specify Number of Retries. See Starting a Disaster Recovery Backup or Scheduling a Disaster Recovery Backup for step-by-step instructions.
    Erase Stub/Erase Data jobs In the Erase Stubs selected for deletion dialog, select the Job Retry tab and specify Number of Retries. See Erase Data from Outlook Add-In and Erase Data by Stubs for step-by-step instructions.
    Offline Content Indexing In the Content Indexing dialog box, click Advanced, then select the Job Retry tab and specify Number of Retries. See Start or Schedule Offline Content Indexing Operations for step-by-step instructions.
    Media Refresh In the Media Refresh Options dialog box, click Advanced, then select the Job Retry tab and specify the Number of Retries. See Media Refresh for step-by-step instructions.

QR Volume Creation Restartability

QR Volume Creation restartability is only supported on Windows platforms. See Create a QR Volume for more information.

Single Volume Subclient

The Quick Recovery Agent maintains a restart string during the Volume Creation (copying) phase of full and incremental copy jobs to keep track of the progress made on each volume being copied. This restart string is updated on the CommServe database every time 1 GB of data is copied per volume. If a job is resumed from a suspended or pending state, this restart string will be retrieved and used to identify the location in the volume from where to resume the copying. For example, a job was suspended with 2.8 GB of the data copied for a particular volume; since the restart string on the volume was last updated when 2 GB completed copying, the job resumed from that point.
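
The resume point in the example above is the last checkpoint boundary that was reached before the interruption. A minimal sketch of that arithmetic follows; the function name is hypothetical, and "1 GB" is assumed here to mean 2^30 bytes.

```python
GIB = 1024 ** 3  # assumption: the 1 GB checkpoint size is 2**30 bytes


def resume_offset(bytes_copied: int, checkpoint_bytes: int = GIB) -> int:
    """Return the offset recorded by the most recent restart-string update.

    The restart string is updated each time another full checkpoint's worth
    of data has been copied, so progress past the last boundary is repeated.
    """
    return (bytes_copied // checkpoint_bytes) * checkpoint_bytes


copied = int(2.8 * GIB)              # job suspended after 2.8 GB
print(resume_offset(copied) / GIB)   # 2.0
```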

Multi-Volume Subclient

In the QR Volume Creation phase, volumes are copied sequentially (i.e., not in parallel). This affects job restartability behavior for a multi-volume subclient. When a QR Volume Creation job is interrupted (suspended or pending), some of the volumes in the subclient may be completely copied while others may not be copied yet at all. If the job is restarted (either manually or automatically), the behavior toward each volume in the subclient will depend on the condition of the volume at the time of job interruption. Refer to the following table for the expected behavior (for each volume) when resuming an interrupted QR Volume Creation job for a multi-volume subclient.

Volume Condition at the Time of Job Interruption Behavior when Job Restarts
volume was successfully copied The Quick Recovery Agent copies any changes to the volume that occurred after the starting point of the original job up to the time of the restart.

For example: A job was initiated at 2:00 P.M. At 2:30 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, after the volume was successfully copied. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent copied the changes made to the volume from 2:00 to 3:00 P.M.

volume was partially copied The Quick Recovery Agent runs the full or incremental copy, and then copies any changes to the volume that occurred after the starting point of the original job up to the time of the restart.

For example: A job was initiated at 2:00 P.M. At 2:30 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, during the copying of the volume. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent ran the initial copy job and then copied the changes made to the volume from 2:00 to 3:00 P.M.

volume was not yet copied If it's a full copy, the Quick Recovery Agent runs a normal full copy.

For example: A job was initiated at 2:00 P.M. At 2:02 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, before it copied any parts of the volume. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent ran a full copy job, copying all the data in the volume up to 3:00 P.M.

If it's an incremental copy, the Quick Recovery Agent copies any changes that the original incremental would have copied, as well as any changes to the volume that occurred after the starting point of the original incremental copy job up to the time of the restart.

For example: A job was initiated at 2:00 P.M. At 2:02 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, before it copied any parts of the volume. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent copied the data that the original incremental copy would have copied, as well as the changes made to the volume from 2:00 to 3:00 P.M.


Retrying Jobs

The Job Initiation dialog box provides several configuration options for retrying jobs, including:


Resuming Jobs

Jobs that have been in a waiting or pending state can be resumed by right-clicking on the job itself in the Job Controller and selecting Resume Job.


Resubmitting Jobs

If necessary, you can resubmit a job directly from the job history windows. This is useful when a job has failed and you want to run it again, because it removes the need to reconfigure a job with the same options. When you resubmit the job, you can also edit the schedule pattern (e.g., daily, weekly, monthly) and the job options (e.g., for a backup job schedule, the type of backup: full, differential, etc.).

For step-by-step instructions, see:

Note: Jobs can be resubmitted only if they were run using the current release of this software.

Job Results Directory

The Job Results directory stores the job results files from backup and restore operations of a client. The following sections describe the steps to configure this directory.

Calculating the Space Required for the Job Results Directory

Use the steps below to calculate the required space for the Job Results directory:

Size Values

How to Calculate

Example

Step 1: For backup jobs, identify the value for the directory size of the subclients.

How to calculate: 6 * (number of files) * (subclient average filename size)

Example: Assume you have the following for all your subclients:
  • 2000 files like /dir1/dir2/filename1. The filename size of these files is 20.
  • 1000 files like /dir1/dir2/dir3/filename2. The filename size of these files is 25.

Using the values above, you can find that:

  • the number of files is 3000
  • the subclient average filename size is

    (20*2000 + 1000*25)/3000 = 21.67

So, you can conclude that the directory size of the subclients is:

6*3000*21.67 = 390060 bytes = 0.372 MB

Step 2: For restore jobs, identify the value for the average size of a restore job.

How to calculate: 1.5 * (number of restore files) * (average filename size)

Example: Assume you have the following:
  • 1500 restore files like /dir1/dir2/filename1. The filename size of these files is 20.
  • 1000 restore files like /dir1/dir2/dir3/filename2. The filename size of these files is 25.

Using the values above, you can find that:

  • the number of restore files is 2500
  • the average filename size is

    (20*1500 + 1000*25)/2500 = 22

So, you can conclude that the average size of a restore job is:

1.5*2500*22 = 82500 bytes = 0.079 MB

Step 3: For each subclient configured for SnapProtect operations, identify the value for the size of the snapshot copy job.

How to calculate: 7 * (number of files) * (subclient average filename size)

Example: Assume you have the following for all your SnapProtect subclients:
  • 1000 files like /dir1/filename1. The filename size of these files is 15.
  • 500 files like /dir1/dir2/filename2. The filename size of these files is 20.

Using the values above, you can find that:

  • the number of files is 1500
  • the subclient average filename size is

    (15*1000 + 500*20)/1500 = 16.67

So, you can conclude that the size of the snapshot copy job is:

7*1500*16.67 = 175035 bytes = 0.167 MB

Step 4: If using deduplication, identify the value for the size of the Source Side Database.

How to calculate: This value is controlled from the Client Side Deduplication tab of the client properties in the CommCell Console. The default size is 4 GB.

Example: Assuming the default size is being used, 4 GB is 4096 MB.
Step 5: Using the size values identified in the previous steps, obtain the required space for the Job Results Directory.

How to calculate: directory size of the subclients + (average size of a restore job * A) + (size of the snapshot copy job * B) + size of the Source Side Database

where A is the number of restore results to be stored for longer time and B is the number of snapshot copy jobs to be run.

Assume you plan to have:
  • 20 restore results to be stored for longer time.
  • 10 snapshot copy jobs to be run.

Using the size values from the examples above, the required space is:

0.372 + 0.079*20 + 0.167*10 + 4096 = 4099.62 MB

The minimum space required for the Job Results directory is the size value obtained from Step 1. If using SnapProtect backups, the minimum space required is the addition of Step 1 and Step 3.
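
The five steps above can be reproduced with a short script. This is only a sketch of the arithmetic using the example figures; the function and variable names are hypothetical. Because it does not round the average filename size or the per-step results, the total comes out at about 4099.61 MB rather than the 4099.62 MB obtained from the rounded intermediate values.

```python
BYTES_PER_MB = 1024 * 1024


def job_results_size_mb(factor, file_groups):
    """Return factor * (number of files) * (average filename size) in MB.

    file_groups is a list of (file_count, filename_size) pairs.
    """
    file_count = sum(count for count, _ in file_groups)
    average_name_size = (
        sum(count * size for count, size in file_groups) / file_count
    )
    return factor * file_count * average_name_size / BYTES_PER_MB


backup_mb = job_results_size_mb(6, [(2000, 20), (1000, 25)])     # Step 1
restore_mb = job_results_size_mb(1.5, [(1500, 20), (1000, 25)])  # Step 2
snapshot_mb = job_results_size_mb(7, [(1000, 15), (500, 20)])    # Step 3
source_side_db_mb = 4 * 1024                                     # Step 4 (default 4 GB)

restore_results_kept = 20  # A: restore results to be stored for longer time
snapshot_copy_jobs = 10    # B: snapshot copy jobs to be run

# Step 5: total required space for the Job Results directory
required_mb = (backup_mb
               + restore_mb * restore_results_kept
               + snapshot_mb * snapshot_copy_jobs
               + source_side_db_mb)

print(f"{required_mb:.2f} MB")
```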

Using UNC Paths for Job Results Directory

UNC paths are supported for the job results directory by the Exchange Database iDataAgent 2007 and above when configured in a Cluster Continuous Replication (CCR) environment. The Windows File System iDataAgent is also supported when configured in this environment.

When assigning UNC paths, the designated directory must be ONE level below the directory which is shared for this purpose. Examples:

\\machine1\<share_name>\job_results\ is shared. Then specify \\machine1\<share_name>\job_results\job_results_1 as the job results directory.

\\machine1\<share_name>\job_results\ is shared. Then specifying \\machine1\<share_name>\job_results as the job results directory is not supported.
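
The rule illustrated by these two examples can be checked with a small helper. This is a sketch only: the function name is hypothetical, `share1` stands in for the share name placeholder, and "ONE level below" is interpreted as exactly one directory level below the shared directory.

```python
from pathlib import PureWindowsPath


def is_valid_job_results_path(shared_dir: str, job_results_dir: str) -> bool:
    """Return True if job_results_dir sits exactly one level below shared_dir."""
    shared = PureWindowsPath(shared_dir)
    candidate = PureWindowsPath(job_results_dir)
    # The shared directory itself is not accepted; its direct child is.
    return candidate != shared and candidate.parent == shared


shared = r"\\machine1\share1\job_results"

print(is_valid_job_results_path(shared, r"\\machine1\share1\job_results\job_results_1"))  # True
print(is_valid_job_results_path(shared, r"\\machine1\share1\job_results"))                # False
```

`PureWindowsPath` only parses the strings, so the check runs on any platform and does not touch the network share.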

User Impersonation for Accessing the Job Results Directory

On a Windows client, you need to specify a Windows User Account with the appropriate privileges to access the job results directory.

User impersonation requires that the specified user have write permissions to the product installation folders; otherwise, the user impersonation account may not take effect. This is especially true if the associated computer is not part of a domain and if the user is not a domain user. Additionally, users will need full permissions (registry rights) to the following registry key: HKEY_LOCAL_MACHINE\SOFTWARE\CommVault Systems.

In addition, if UNC paths are used for job results and subclient contents are specified as UNC paths, the user impersonation account used for the job results directory must have access to both paths.

For the File System iDataAgent, the user impersonation occurs only once; therefore, the user impersonation account specified for the job results directory will take precedence and will be used to back up the contents of the UNC path included in the subclient content.

For the Virtual Server iDataAgent, the user impersonation account specified for the job results directory will take precedence and will be used to back up and restore data from a virtual machine. This may result in file access issues during the backup. Therefore, it is recommended to use a local folder on the client computer as the job results directory.

For the Exchange iDataAgent, the account must have the following:

Follow the steps below to change the user account for accessing the Job Results directory for the client:

  1. From the CommCell Browser, right-click the icon of the client computer whose job results path user account you want to change, and then click Properties.
  2. From the Job Configuration tab of the Client Computer Properties dialog box, click User Name/Password.
  3. In the Change User Account dialog box, enter the appropriate User Impersonation account information.
  4. Click OK to save your changes.

Changing the Job Results Path of a Client

  1. From the CommCell Browser, right-click the icon of the client computer whose job results path you want to change, and then click Properties.
  2. From the Job Configuration tab of the Client Computer Properties dialog box, if necessary or desired, click User Name/Password to establish or change the Impersonate User account to access the Job Results Directory. If you do this, click OK once you have administered the account.
  3. From the Job Configuration tab, type a new job results path in the Job results path field.

    You can also click Browse to browse to a new job results path from the Browse for Job Result Path dialog box. Click OK.

  4. Click OK to save your changes.

Changing the Retention of the Job Results of a Client

  1. From the CommCell Browser, right-click the icon of the client computer whose job results retention criteria you want to change, and then click Properties.
  2. From the Job Configuration tab of the Client Computer Properties dialog box, in the Prune job results after field, select the number of days after which job results should be pruned.
  3. In the Prune job results when disk capacity reaches field, select the disk capacity at which job results should be pruned.
  4. Click OK to save your changes.

Other Considerations

Several additional job management capabilities are available. These capabilities are described in the following sections.

Hardware Considerations for Data Recovery Operations

If a hardware failure occurs during a restore operation, the restore job goes into a device wait state indefinitely. You need to kill the job and start it again later, when the hardware is available.

Job Alive Check Interval

The Job Alive Check Interval option within the General tab of the Job Management dialog box allows you to specify the interval at which the Job Manager checks active jobs to determine if they are still running.

Job Update Interval

The Job Update Interval allows you to view or modify how often information is updated in the Job Details for data protection and data recovery operations.

The Job Updates tab of the Job Management dialog box displays the:

It also includes:

Job Running Time

At the time of job initiation, you can determine the total amount of time a job can run before it is killed by the Job Manager. The configurable parameters for Job running time allow you to control the following:

You can configure the Total Running Time and whether to Kill running jobs when total running time expires in the Job Retry tab of the job initiation dialog box for the following types of jobs:

Job Queuing

Setting jobs to be queued allows a job that would otherwise fail to remain in the Job Controller in a Queued state, i.e., waiting. Once the condition that caused the job to be queued clears, the Job Manager will automatically resume the job. Jobs can be queued if:

You can also set scheduled jobs to be queued. If jobs are scheduled and the Queue Scheduled Jobs option is enabled, these jobs will start in the Job Controller in a Queued state at their scheduled time. These jobs can be manually resumed or, if the Queue Scheduled Jobs option is disabled, these jobs will resume automatically. Selecting this option is especially useful during times of maintenance. Rather than suspend each job manually after it has started, you can enable the Queue Scheduled Jobs option, which will start all the scheduled jobs in the Job Controller in a Queued state. Once you have completed the maintenance, you can manually resume specific scheduled jobs, or simply deselect the Queue Scheduled Jobs option to automatically resume all the scheduled jobs.

You can set the jobs to be queued from the General tab of the Job Management dialog box. The following types of jobs can be queued:

When a Non-Full Backup is Automatically Converted to a Full Backup

Under the following conditions, a non-full backup is automatically converted to a full backup:

Some agents have additional scenarios in which a non-full backup is also automatically converted to a full backup:

What Happens When There are no Resources for a Job

Each job requires certain resources to complete successfully. The absence of these resources affects different types of jobs differently. The following table lists the resources required by each job, the status of the job in the Job Controller window when there are no resources, and corresponding examples of the Reason for job delay displayed in the Job Details dialog box. It also briefly explains what happens when a job does not have the required resources.

By default, every 1440 minutes, the CommVault Media & Library Manager service on the CommServe cleans up any media and drive reservation held by a job that was abruptly terminated and failed to release the resource. You can modify the frequency using the nRESOURCERELEASEINTERVALMIN registry key.

 

Job Resources Status in the Job Controller Reason for Job Delay Additional Information
Data Protection Operation

Streams, Active Media, Drive Waiting See Example 1. Job checks for necessary resources.
Waiting See Example 2. If the resources are not available, the job retries to reserve the resources whenever they are freed.
Does not hold on to any resource until all the necessary resources are available.
Data Recovery Operations (for File System-like agents) Drive Pending The media is already reserved by some other job(s). If the resources are not available, the job retries to reserve the resources whenever they are freed.
Data Recovery Operation (for Database-like agents) Drive Failed See Example 1. Job checks for necessary resources.
Running See Example 2. If the resources are not available it retries every 2 minutes to reserve the resources.
Does not hold on to any resource until all the necessary resources are available.
Index Restore Operation (Browse Backup Data) Drive Failed See Example 1. Job checks for necessary resources.
Running See Example 2. If the resources are not available it retries every 2 minutes to reserve the resources.
Does not hold on to any resource until all the necessary resources are available.
Auxiliary Copy Destination Drives Pending See Example 1. Job checks for necessary resources.
Waiting Job reserves 2 drives for source and destination media.
Waiting See Example 2. If the above resources are not available, it retries every 2 minutes to reserve these resources.
Does not hold on to any resource until all the necessary resources are available.
Source Media Running Once the 2 drives and destination media are obtained, the job reserves the source media.
Pending See Example 2. If the job encounters resource contention while reserving the source media (Example 2), it retries every 20 minutes, up to a maximum of 144 times, to obtain the source media.
Holds on to the 2 drives and destination media as long as it is not interrupted and as long as the source media is available.
Synthetic Full Streams, Destination Drives, Destination Media Waiting See Example 1. Job checks for necessary resources.
Waiting Job reserves streams, marks active media full, reserves 2 drives and destination media.
Waiting See Example 2. If the resources are not available the job retries to reserve the resources whenever they are freed.
Does not hold on to any resource until all the necessary resources are available.
Source Media Running Once the 2 drives and destination media are obtained, the job reserves the source media.
Pending See Example 2. If the job encounters resource contention while reserving the source media (Example 2), it retries every 20 minutes, up to a maximum of 144 times, to obtain the source media.
Holds on to the 2 drives and destination media as long as it is not interrupted.
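
The intervals quoted in this section translate into overall time windows as shown below. This is plain arithmetic on the documented defaults; the constant names are illustrative only.

```python
from datetime import timedelta

# Source media reservation for Auxiliary Copy and Synthetic Full jobs
SOURCE_MEDIA_RETRY_INTERVAL = timedelta(minutes=20)
SOURCE_MEDIA_MAX_RETRIES = 144

# Default cleanup interval for reservations left by terminated jobs
RESOURCE_RELEASE_INTERVAL = timedelta(minutes=1440)

# Longest span covered by the source media retries
max_wait = SOURCE_MEDIA_RETRY_INTERVAL * SOURCE_MEDIA_MAX_RETRIES

print(max_wait.total_seconds() / 3600)                   # 48.0 (hours)
print(RESOURCE_RELEASE_INTERVAL.total_seconds() / 3600)  # 24.0 (hours)
```

In other words, a job keeps retrying for the source media for up to 48 hours, and orphaned reservations are cleaned up once every 24 hours by default.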