Troubleshooting Backup - Oracle RAC iDataAgent

Completed with one or more errors

Backup jobs from the Oracle RAC iDataAgent are displayed as "Completed w/ one or more errors" in the Job History in the following cases:

  • When the RMAN script for the backup job completes with warnings.

  • When the job is killed after some data has been backed up.

  • During offline backups, when the database cannot be reopened after the backup.

Oracle Errors

If you receive an Oracle error during an Oracle backup operation, we recommend that you follow procedures published by Oracle Corporation on resolving the specific error. We also advise you to consult with your on-site Oracle database administrator, as needed.

RAC0001: Defining Oracle RAC client connections

Symptom

Oracle RAC Agent backups fail because of a client connection mismatch.

Resolution

When configuring the Oracle RAC Agent, make sure to use a dedicated TNS or SCAN connection for each node configured in the Agent.

For example, consider a 3-node RAC where the database is called RACDB and the instances on nodes 1 through 3 are RACDB1, RACDB2, and RACDB3, respectively. There should be a dedicated TNS service for each instance on each node, which guarantees a consistent connection to the instance on that node.

The following entries should be available in the tnsnames.ora configuration file on all the nodes.

RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-cluster)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )

RACDB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-1-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb1)
    )
  )

RACDB2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb2)
    )
  )

RACDB3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lx64rac-3-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb3)
    )
  )

In the above entries, the RAC database is called RACDB, and the instances are on Linux servers where node1 = lx64rac-1, node2 = lx64rac-2, and node3 = lx64rac-3. The RAC instances per node are RACDB1 through RACDB3 respectively.
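If you maintain tnsnames.ora for several clusters, the per-instance pattern above can be generated rather than typed by hand. The following Python sketch is illustrative only: RacNode and rac_tnsnames are hypothetical names, and the sketch assumes that each node's VIP host name is the node name plus "-vip", as in the lx64rac example. It emits one shared entry for the database service and one dedicated entry per instance, each pinned with INSTANCE_NAME.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RacNode:
    host: str      # node host name, e.g. "lx64rac-1"
    instance: str  # instance running on that node, e.g. "racdb1"


def tns_entry(alias: str, host: str, service: str,
              instance: str | None = None, port: int = 1521) -> str:
    """Render one tnsnames.ora entry; adds INSTANCE_NAME when an instance is given."""
    connect_data = [
        "      (SERVER = DEDICATED)",
        f"      (SERVICE_NAME = {service})",
    ]
    if instance is not None:
        # Pinning the instance is what makes the alias reach exactly one node.
        connect_data.append(f"      (INSTANCE_NAME = {instance})")
    return "\n".join([
        f"{alias} =",
        "  (DESCRIPTION =",
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))",
        "    (CONNECT_DATA =",
        *connect_data,
        "    )",
        "  )",
    ])


def rac_tnsnames(db: str, cluster_host: str, nodes: list[RacNode]) -> str:
    """Shared database entry plus one dedicated entry per instance."""
    service = db.lower()
    entries = [tns_entry(db.upper(), cluster_host, service)]
    for node in nodes:
        # Assumption: each node's VIP is named "<host>-vip", as in the example above.
        entries.append(tns_entry(node.instance.upper(), f"{node.host}-vip",
                                 service, node.instance))
    return "\n".join(entries)


nodes = [RacNode(f"lx64rac-{i}", f"racdb{i}") for i in (1, 2, 3)]
print(rac_tnsnames("RACDB", "lx64rac-cluster", nodes))
```

Review the generated text before copying it into tnsnames.ora on every node. The shared RACDB entry can stay for other clients, but the node connections in the Oracle RAC Agent should use the per-instance aliases.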

When configuring the nodes from the CommCell Console, based on the above tnsnames.ora file, the following connect strings would be used:

sys/oracle@racdb1 for connecting to the lx64rac-1 node
sys/oracle@racdb2 for connecting to the lx64rac-2 node
sys/oracle@racdb3 for connecting to the lx64rac-3 node

TNS Connect Syntax

For "sys/oracle@racdb2"

sys = the Oracle account

oracle = the password for the sys account

racdb2 = the TNS alias (net service name) that connects to instance racdb2 on node lx64rac-2

Oracle 11gR2 and later also provide SCAN (Single Client Access Name) connections, which the Oracle RAC iDataAgent supports as well. When SCAN connections are used, the connect strings appear as follows:

sys/oracle@lx64rac-1:1521/racdb for connecting to the lx64rac-1 node
sys/oracle@lx64rac-2:1521/racdb for connecting to the lx64rac-2 node
sys/oracle@lx64rac-3:1521/racdb for connecting to the lx64rac-3 node

SCAN Connect Syntax

For "sys/password@scan-hostname:port/servicename"

sys = The Oracle account

password = The password for the sys account

scan-hostname:port = The SCAN node name and the port of the listener on that node

servicename = The database service name (racdb in the example above)
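The two connect-string forms differ only in what follows the "@". As a rough illustration, this Python sketch (parse_connect and ConnectString are hypothetical helpers, not part of the Commvault software) splits a string into its parts and rejects a bare "/" connect, which cannot be used for RAC nodes. It assumes the host:port/servicename shape shown in the SCAN examples and treats anything else after "@" as a TNS alias.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# host:port/servicename, the shape used in the SCAN examples above
_HOST_PORT_SERVICE = re.compile(r"^(?P<host>[^:/]+):(?P<port>\d+)/(?P<service>\S+)$")


@dataclass
class ConnectString:
    user: str
    password: str
    alias: str | None = None    # TNS form: net service name defined in tnsnames.ora
    host: str | None = None     # SCAN form: host name
    port: int | None = None     # SCAN form: listener port
    service: str | None = None  # SCAN form: database service name


def parse_connect(s: str) -> ConnectString:
    """Split user/password@target into its parts; reject local-only connects."""
    s = s.strip()
    if s == "/" or "@" not in s:
        raise ValueError(f"{s!r} is not a network connect string; RAC nodes need one")
    # rpartition: the target never contains "@", so an "@" in the password is tolerated
    creds, _, target = s.rpartition("@")
    user, sep, password = creds.partition("/")
    if not sep or not user or not target:
        raise ValueError(f"expected user/password@target, got {s!r}")
    m = _HOST_PORT_SERVICE.match(target)
    if m:
        return ConnectString(user, password, host=m["host"],
                             port=int(m["port"]), service=m["service"])
    return ConnectString(user, password, alias=target)


print(parse_connect("sys/oracle@racdb2"))
print(parse_connect("sys/oracle@lx64rac-1:1521/racdb"))
```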

These connection settings are crucial for the backup to work correctly.

During the discovery phase of the backup, the Agent connects to each instance and runs a query to confirm that the connection reached the intended instance on the correct node.

If the shared racdb service name is used instead of a dedicated one, the listener may establish the connection to any instance or node in the RAC. This is a problem because the MediaAgent expects a pipeline connection from a predetermined node (say, node 1), but the listener, using round-robin selection, ends up connecting to a different node (node 2 or node 3).

When this listener-to-pipeline mismatch occurs, the data pipe fails to connect, and the backup fails with it.
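The Agent's own discovery query is not documented here, but you can run a similar check yourself when troubleshooting a mismatch. The sketch below assumes that reading INSTANCE_NAME and HOST_NAME from V$INSTANCE through a node's connect string is an adequate way to see where the session actually landed; connected_to_expected and StubCursor are hypothetical. With a real database you would pass a DB-API cursor, for example one from python-oracledb opened with that node's connect string, instead of the stand-in cursor.

```python
from __future__ import annotations

# Assumption: V$INSTANCE reports where the session landed; the Agent's own
# discovery query may differ.
QUERY = "SELECT instance_name, host_name FROM v$instance"


def connected_to_expected(cursor, expected_instance: str, expected_host: str) -> bool:
    """True if the session behind `cursor` is on the expected instance and node."""
    cursor.execute(QUERY)
    instance, host = cursor.fetchone()
    # HOST_NAME may be fully qualified (lx64rac-1.example.com); compare short names.
    short_host = host.split(".")[0]
    return (instance.lower() == expected_instance.lower()
            and short_host.lower() == expected_host.lower())


class StubCursor:
    """Hypothetical stand-in for a DB-API cursor that returns a fixed row."""

    def __init__(self, row: tuple[str, str]) -> None:
        self.row = row

    def execute(self, sql: str) -> None:
        self.sql = sql

    def fetchone(self) -> tuple[str, str]:
        return self.row


# The connection meant for node 1 was served by node 2: the mismatch described above.
print(connected_to_expected(StubCursor(("racdb2", "lx64rac-2")), "racdb1", "lx64rac-1"))
# False
```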

Here is an example of an RMAN session generated by an Oracle RAC iDataAgent backup (using 2 nodes):

Rman Script:
 [CONFIGURE CONTROLFILE AUTOBACKUP ON;
 run {
 allocate channel ch1 type 'sbt_tape' connect sys/******@lx64rac-1:1521/racdb
 PARMS="SBT_LIBRARY=/opt/Commvault/Base/libobk.so,BLKSIZE=1048576,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch1,ThreadCommandLine=BACKUP -jm 45 -a 2:299 -cl 61 -ins 78 -at 80 -j 21200 -jt 21200:4:1 -bal 2 -rcp 0 -ms 2 -data -ma 16 -chg 1:1 -rac 1 -cn lx64rac-1 -vm Instance001)"
 TRACE 0;
 allocate channel ch2 type 'sbt_tape' connect sys/******@lx64rac-2:1521/racdb
 PARMS="SBT_LIBRARY=/opt/Commvault/Base/libobk.so,BLKSIZE=1048576,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch2,ThreadCommandLine=BACKUP -jm 45 -a 2:299 -cl 61 -ins 78 -at 80 -j 21200 -jt 21200:4:1 -bal 2 -rcp 0 -ms 2 -data -ma 16 -hn client1 -chg 2:1 -rac 2 -cn lx64rac-2 -vm Instance001)"
 TRACE 0;
 setlimit channel ch1 maxopenfiles 8;
 setlimit channel ch2 maxopenfiles 8;
 backup
 incremental level = 1
 filesperset = 4
 database
 include current controlfile spfile ;
 }
 exit;
 ]

Each allocate channel command contains a connect clause that directs RMAN to connect to a specific node in the RAC.

The example above uses SCAN connect strings. The Oracle RAC iDataAgent starts an RMAN session on the first node listed on the Storage tab of the subclient and then sends it an RMAN script similar to the one shown above.

Also note that the parameters that the backup process passes to each allocate command name a different node (for example, -cn lx64rac-1 for ch1 and -cn lx64rac-2 for ch2). They tell the MediaAgent which node to expect the pipeline connection from. If the connect clause attaches to the wrong node, the SBT layer fails to connect to the data pipe, because the MediaAgent expects the connection from the node specified in the ENV (environment) parameter of the allocate command.

For the same reason, "/" cannot be used as the connect string when configuring nodes in the Oracle RAC Agent. The channels require a network-based connection, and a "/" connect is a local-only connection.
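To spot these problems in an RMAN script such as the one above, you can compare each channel's connect target with the node named by -cn in its ENV. The following Python sketch (check_channels is a hypothetical helper) assumes that -cn identifies the expected node, as it does in the example, and only compares host:port/service targets; a TNS alias would first need to be resolved through tnsnames.ora. It also flags a "/" connect.

```python
from __future__ import annotations

import re

# "allocate channel <name> ... connect <target>" within one RMAN statement.
_ALLOC = re.compile(r"allocate\s+channel\s+(?P<ch>\w+)\b.*?\bconnect\s+(?P<conn>\S+)",
                    re.IGNORECASE | re.DOTALL)
# Node named by "-cn <node>" in the channel's ENV (assumed to be the expected node).
_CN = re.compile(r'-cn\s+(?P<node>[^\s)"]+)')


def check_channels(script: str) -> list[str]:
    """List channels whose connect target cannot match the node named by -cn."""
    problems = []
    for stmt in script.split(";"):  # RMAN statements end with ";"
        m = _ALLOC.search(stmt)
        if m is None:
            continue
        ch, conn = m["ch"], m["conn"].strip("'\"")
        if "@" not in conn:
            problems.append(f"{ch}: {conn!r} is a local connect, not a network one")
            continue
        target = conn.rpartition("@")[2]
        cn = _CN.search(stmt)
        # Only host:port/service targets name a host; TNS aliases are not resolved here.
        if ":" in target and cn is not None:
            host = target.split(":", 1)[0]
            if host.lower() != cn["node"].lower():
                problems.append(f"{ch}: connects to {host} but -cn expects {cn['node']}")
    return problems
```

Pass the text of a generated script (for example, copied from the job logs) to check_channels; an empty list means every checked channel connects to the node its ENV names.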

Backup Failures

RAC0002: Job fails due to sbtio.log size

Issue

Jobs sometimes fail because the sbtio.log file in the $UDUMP directory grows too large.

Resolution

To resolve this, set a size limit for the sbtio.log file using the sMAXORASBTIOLOGFILESIZE registry key. Once the specified size limit is reached, the sbtio.log file is pruned automatically.

RAC0003: Command line backup fails

Issue

Command line backups fail.

Resolution

  • Make sure that the required media resources are available, and then run the backup again.

  • For on-demand backups, you can run more than one script for an instance. However, backup jobs fail if the argument file contains more than one instance.

  • For Oracle on Windows, do not leave a space after a comma in the argument file. The backup job fails if there is a space after a comma.

  • RMAN command line backup fails with the following error:

    "Unable to open lock file /opt/commvault/Base/Temp/locks/.dir_lock: Permission denied"

    This may occur if the umask parameter is set to 022 in the .profile file of the Oracle user. As a workaround, change the umask to 000 or 002 and run the backup again.
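The umask workaround follows from how UNIX computes the permissions of a new file: the requested mode with the umask bits cleared. The Python sketch below assumes the lock file is created with the common 0666 request and must later be written by another account in the same group; the helper names are hypothetical. With umask 022 the group loses write access, while 002 and 000 keep it.

```python
import os


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created file or directory ends up with."""
    return requested & ~umask & 0o777


def group_writable(mode: int) -> bool:
    """True if the group write bit (0o020) is set."""
    return bool(mode & 0o020)


def current_umask() -> int:
    """Read this process's umask (os.umask can only read it by setting it)."""
    old = os.umask(0)
    os.umask(old)
    return old


for mask in (0o022, 0o002, 0o000):
    mode = effective_mode(0o666, mask)
    print(f"umask {mask:03o}: new file mode {mode:03o}, group writable: {group_writable(mode)}")
# umask 022: new file mode 644, group writable: False
# umask 002: new file mode 664, group writable: True
# umask 000: new file mode 666, group writable: True
```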

RAC0004: Command line backup fails for large backups

Issue

Third-party command line jobs may hang during large backups and restores.

Resolution

This happens because ClDBControlAgent updates the job manager after every 100 MB of data transferred, which can cause a thread failure partway through a large backup or restore.

The following exception appears in the ClDBControlAgent.log file:

5710030 304 02/22 03:47:23 608119 OraAgentBase::NotifyCommServeJobContinue() - m_jobObject->setUnCompBytesToAdd(105119744) ...
5710030 304 02/22 03:47:24 608119 CvThread::start_func() - Unhandled exception.
5710030 405 02/22 03:47:37 608119 ClOraControlAgent::OnClientTimeout() - Got timed out while waiting for msg from client 0

To resolve this, set the sBYTESDIFFMBS registry key to a <value> in MB in OracleAgent/.properties.

The job manager is then updated every <value> MB, as specified in the key.
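To see why raising sBYTESDIFFMBS helps, it is enough to count progress updates. This Python arithmetic sketch assumes, from the description above, that one update is sent after each full interval; job_manager_updates is a hypothetical helper, and 1024 is only an example value for the key.

```python
def job_manager_updates(transfer_mb: float, interval_mb: float = 100) -> int:
    """Updates sent if one goes out after each full `interval_mb` transferred."""
    if interval_mb <= 0:
        raise ValueError("interval_mb must be positive")
    return int(transfer_mb // interval_mb)


two_tb_mb = 2 * 1024 * 1024  # 2 TB expressed in MB
print(job_manager_updates(two_tb_mb))        # 20971 with the default 100 MB interval
print(job_manager_updates(two_tb_mb, 1024))  # 2048 with sBYTESDIFFMBS set to 1024
```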

RAC0005: Offline backup with lights out script fails

Issue

Offline backup using lights out script fails with the following error:

RMAN error "ORA-12528: TNS:listener: all appropriate instances are blocking new connections"

Resolution

As a workaround, add a static entry (SID_DESC) for the database to the SID_LIST_LISTENER section of the listener.ora file, as shown in the example below:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = C:\oracle\product\10.2.0\db_1)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (SID_NAME = rman10g)
      (ORACLE_HOME = C:\oracle\product\10.2.0\db_1)
      (SID = rman10g)
    )
  )

An offline backup with the lights out option also fails when the subclient uses the default number of retry attempts. As a workaround, increase the retry attempts by setting the Tries value to 5 or greater. See Configuring an Offline Subclient for more details.

RAC0006: Backup timeout failure

Issue

The backup fails because of a timeout.

Resolution

By default, RMAN command line backups wait up to 86400 seconds (24 hours) for resources to allocate streams. If a backup fails because this timeout is reached, configure the sALLOCATESTREAMSECS registry key to increase the wait period.

RAC0007: Backup fails intermittently on Linux clients

Issue

On Linux clients, if the libobk.so library fails to load, the backups may fail.

Resolution

As a workaround, perform the following steps:

  1. Log in to the Oracle client computer as root.

  2. From the system prompt, enter the following command:

    ldconfig /<Base_directory_name>

    For example: # ldconfig <software installation path>/Base

This ensures that the libobk.so library can be found and loaded, so that Oracle backups on Linux run successfully.
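To confirm the fix, you can ask the dynamic linker to load the library by name, which is roughly what happens when Oracle opens the SBT library. This Python sketch uses ctypes.CDLL, which calls dlopen; with a bare name such as "libobk.so", dlopen searches LD_LIBRARY_PATH, the cache that ldconfig maintains, and the default library directories. can_load is a hypothetical helper. Run it as the Oracle user with a 64-bit Python if the library is 64-bit, and note that loading the library runs its initialization code in the Python process.

```python
import ctypes


def can_load(library: str) -> bool:
    """Try to dlopen `library` in this process and report the linker's error."""
    try:
        ctypes.CDLL(library)
    except OSError as exc:
        print(f"cannot load {library}: {exc}")
        return False
    return True
```

For example, can_load("libobk.so") checks the linker search path that ldconfig updates, while can_load("/opt/commvault/Base/libobk.so") checks a specific file.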

RAC0008: Backup fails on Windows clients

Issue

The backup fails on Windows clients.

Resolution

Make sure that the Oracle user is a member of the Administrators group. If the user is not a member of the Administrators group, assign permissions to the user as follows:

  1. From Windows Explorer, right-click the ContentStore folder and then select Properties.

  2. Click the Security tab.

  3. Select the user and click Edit.

  4. Select the Allow check box for the Full Control permission, and then click OK.

  5. From the Registry Editor, navigate to HKEY_LOCAL_MACHINE | SOFTWARE.

  6. Right-click CommVault Systems and select Permissions...

  7. Select the user and select the Allow check box for the Full Control permission.

RAC0009: Log backup fails

Issue

If the Oracle database saves archive logs in the flash recovery area, and an Oracle subclient has both Backup Recovery Area and Archive Delete enabled at the same time, the backup fails.

Resolution

To resolve this, use two separate subclients: one for Backup Recovery Area and the other for Archive Delete.

Log backup also fails if you select the default USE_DB_RECOVERY_FILE_DEST entry as the log destination for the backup.

To resolve this, make sure that the log destinations are included in the PFile (init<SID>.ora) or SPFile (spfile.ora). Also make sure that the correct log destination is selected for the backup.

RAC0010: Database block corruption

Issue

The backup fails with the following error:

LISTING 2: r_20030520213618.log
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on d1 channel at 05/20/2003 21:36:26
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/Oracle/oradata/MRP/sales_data_01.dbf

Resolution

Make sure that a maximum number of allowed database block corruptions is set for the backup. We recommend setting this value to the number of corrupt blocks that RMAN identified in the database file being backed up.
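In RMAN itself, a per-file limit is expressed with SET MAXCORRUPT FOR DATAFILE inside a RUN block. The Python sketch below (maxcorrupt_commands is a hypothetical helper) reads ORA-19566 messages from an RMAN log and builds such a line for each affected datafile, using block counts that you supply. It assumes those counts come from RMAN, for example from BACKUP VALIDATE followed by a query of V$DATABASE_BLOCK_CORRUPTION; it does not guess a value.

```python
from __future__ import annotations

import re

# The file name may be wrapped onto the next line, as in the listing above.
_ORA_19566 = re.compile(
    r"ORA-19566: exceeded limit of \d+ corrupt blocks for file\s+(?P<file>\S+)")


def maxcorrupt_commands(log_text: str, corrupt_blocks: dict[str, int]) -> list[str]:
    """Build RMAN SET MAXCORRUPT lines for datafiles that hit ORA-19566."""
    commands = []
    for m in _ORA_19566.finditer(log_text):
        datafile = m["file"]
        count = corrupt_blocks.get(datafile)
        if count is None:
            continue  # no count known for this file; leave the decision to the DBA
        commands.append(f"SET MAXCORRUPT FOR DATAFILE '{datafile}' TO {count};")
    return commands
```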

RAC0011: Backup fails because of $ORACLE_HOME/sqlplus/admin/glogin.sql

Issue

If the following line is present in the $ORACLE_HOME/sqlplus/admin/glogin.sql file, it may cause the SrvOraAgent server process on the CommServe to fail when browsing database contents or executing a backup.

set linesize 80

Resolution

To avoid such failures, comment out that line in the file and retry the browse or backup operation.
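If you manage many Oracle homes, the edit can be scripted. This Python sketch (comment_out_linesize and fix_glogin are hypothetical helpers) prefixes every set linesize line with REM, which SQL*Plus treats as a comment, and keeps a .bak copy of the original file. Matching any set linesize value, not only 80, is an assumption of the sketch.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path

# Matches "set linesize <n>" in any case, with flexible spacing.
_LINESIZE = re.compile(r"^\s*set\s+linesize\s+\d+", re.IGNORECASE)


def comment_out_linesize(text: str) -> tuple[str, int]:
    """Prefix matching lines with REM; return the new text and how many changed."""
    lines, changed = [], 0
    for line in text.splitlines(keepends=True):
        if _LINESIZE.match(line):
            lines.append("REM " + line)
            changed += 1
        else:
            lines.append(line)
    return "".join(lines), changed


def fix_glogin(path: Path) -> int:
    """Rewrite glogin.sql in place after saving a .bak copy next to it."""
    new_text, changed = comment_out_linesize(path.read_text())
    if changed:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(new_text)
    return changed
```

For example, call fix_glogin with the path $ORACLE_HOME/sqlplus/admin/glogin.sql on each client, then retry the browse or backup.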

  • Backup fails with the following error:

    Character conversion not supported

    By default, the iDataAgent sets the NLS_LANG environment variable to the American_America.US7ASCII character set. However, if the Oracle database on the client uses a different NLS character set (for example, WE8MSWIN1252), the iDataAgent’s backup operations may fail.

    In such cases, use the <oracle_SID>_NLS_LANG additional setting to set the NLS_LANG environment variable to American_America.<database_character_set> on the client computer.

RAC0012: Validation Error When Running RMAN Scripts

Issue

An error message containing "Provide Valid Token" is returned when an RMAN script runs.

Solution

A valid token file was not included in the request.

  1. Run the qlogin command with the token file option (-f) to obtain a token file.

  2. Use the CvQcmdTokenFile parameter with the token file that the qlogin command generates.

    For information on required and optional SBT parameters, see SBT Parameters.

RAC0013: User Error When Running RMAN Scripts

Issue

An error message containing "Provide competent user" is returned when an RMAN script runs.

Solution

The user does not have the correct permissions in the CommCell Console to run the backup job.

RAC0015: Some Channels Unexpectedly Terminate during a Multistream Oracle RAC 12c Backup

Issue

Some channels unexpectedly terminate during a multistream Oracle RAC 12c backup before the operation completes.

Solution

Check the value of the PGA_AGGREGATE_LIMIT database parameter and increase it. Oracle recommends a minimum value of 2 GB. For more information, see Oracle Doc ID 1520324.1.

RAC0016: RMAN third party Command Line backups are not running

Before you run backups from the RMAN command line for the Oracle RAC Agent, set the SBT_LIBRARY path and the CvClientName and CvInstanceName environment variables in the RMAN script. For example, on a Solaris client, provide the following parameters:

util_par_file = <ORACLE_HOME>/dbs/init@.utl 
rman_parms="BLKSIZE=1048576,SBT_LIBRARY=/opt/commvault/Base64/libobk.so,ENV=(CvClientName=sunsign,CvInstanceName=Instance001)" rman_channels=1

where CvClientName and CvInstanceName are the names of the client and instance (for example, Instance001) where the SAP for Oracle Agent is installed.

On a Windows client, edit the $ORACLE_HOME\database\init<SID>.sap file and provide the parameters as shown below:

util_par_file = <ORACLE_HOME>\database\init@.utl
RMAN_PARMS="SBT_LIBRARY=,BLKSIZE=1048576,ENV=(CvClientName=<client>,CvInstanceName=<instance>)"

where CvClientName and CvInstanceName are the names of the client and instance (for example, Instance001) where the Oracle RAC Agent is installed.

The SBT_LIBRARY values for the various platforms are listed below:

Platform                          SBT_LIBRARY
AIX with 64-bit Oracle            <Client Agent Install Path>/Base/libobk.a(shr.o)
HP-UX PA-RISC with 64-bit Oracle  <Client Agent Install Path>/Base64/libobk.sl
Solaris with 64-bit Oracle        <Client Agent Install Path>/Base64/libobk.so
All other UNIX platforms          <Client Agent Install Path>/Base/libobk.so

NOTE: The SBT_LIBRARY parameter is not applicable on Windows platforms.
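The table and the examples above combine into one PARMS value. As an illustration only, the Python sketch below builds that string from a client name, an instance name, and a platform; the platform keys and the rman_parms function are hypothetical, and install_path must be your actual client agent install path. Following the note above, it omits SBT_LIBRARY on Windows, whereas the Windows example leaves the value empty.

```python
from __future__ import annotations

# Library path relative to the client agent install path, per the table above.
# The platform keys are labels made up for this sketch.
SBT_LIBRARY_BY_PLATFORM = {
    "aix64": "Base/libobk.a(shr.o)",
    "hpux-parisc64": "Base64/libobk.sl",
    "solaris64": "Base64/libobk.so",
    "other-unix": "Base/libobk.so",
}


def rman_parms(client: str, instance: str, platform: str,
               install_path: str | None = None, blksize: int = 1048576) -> str:
    """Build the PARMS value for an SBT channel; Windows gets no SBT_LIBRARY."""
    parts = []
    if platform != "windows":
        if install_path is None:
            raise ValueError("install_path is required on UNIX platforms")
        rel = SBT_LIBRARY_BY_PLATFORM.get(platform, SBT_LIBRARY_BY_PLATFORM["other-unix"])
        parts.append(f"SBT_LIBRARY={install_path.rstrip('/')}/{rel}")
    parts.append(f"BLKSIZE={blksize}")
    parts.append(f"ENV=(CvClientName={client},CvInstanceName={instance})")
    return ",".join(parts)


print(rman_parms("sunsign", "Instance001", "solaris64", "/opt/commvault"))
```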

When you use the RMAN utility on a Solaris client, run the following command on the client computer to set the 64-bit runtime linker search path:

crle -64 -c /var/ld/64/ld.config -l/opt/commvault/Base64:/lib/64:/usr/lib/64

RAC0017: A Backup Might Fail If You Have a Custom glogin.sql File

Instead of adding customized queries to glogin.sql, we recommend that you create a login.sql file for the users and add the queries there. The software executes the customized queries.

Notes:

  • If you use SQL*Plus 12.1.0.2.0, set the ORACLE_PATH parameter to the location of your login.sql file, because the SQLPATH parameter is not supported until version 12.2.x (Windows configuration).

  • If you use SQL*Plus 12.2.x, apply patch 25804573 from the Oracle support site. This patch addresses the known issue that prevents SQL*Plus from reading the SQLPATH parameter.

For more information, go to the Oracle support site, Bug:25804573 SQL PLUS 12.2 NOT OBSERVING SQLPATH IN REGISTRY OR ENV VARIABLE FOR LOGIN.SQL.

RAC0018: Channel Allocation Fails

Issue

Channel allocation fails if the Commvault software is installed in a different path on one of the Oracle RAC nodes. The following error is displayed:

ORA-19554: error allocating device, device type: SBT_TAPE, device name:

Resolution

If you use SCAN connect to invoke RMAN, then the Commvault software must be installed on the same path on all Oracle RAC nodes.
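Before you use SCAN connect, you can compare the install paths collected from each node. This Python sketch (mismatched_install_paths is a hypothetical helper) assumes you have already gathered one path per node, for example by hand or from your inventory, and reports the nodes that differ from the most common path. It only normalizes trailing slashes, so it is meant for UNIX-style paths.

```python
from __future__ import annotations

from collections import Counter


def mismatched_install_paths(paths: dict[str, str]) -> dict[str, str]:
    """Return the nodes whose install path differs from the most common one."""
    normalized = {node: path.rstrip("/") for node, path in paths.items()}
    if not normalized:
        return {}
    majority, _ = Counter(normalized.values()).most_common(1)[0]
    return {node: path for node, path in normalized.items() if path != majority}


print(mismatched_install_paths({
    "lx64rac-1": "/opt/commvault",
    "lx64rac-2": "/opt/commvault/",
    "lx64rac-3": "/opt/cv",
}))
```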
