PSQL0001: The latest log file in the pg_log directory displays errors after a successful restore operation
Symptom
The following errors appear in the latest log file after a successful restore operation completes.
2014-03-04 18:13:47 IST DETAIL: The failed archive command was: test ! -f /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9 && cp pg_xlog/0000000700000001000000B9 /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9
2014-03-04 18:13:48 IST LOG: archive command failed with exit code 1
2014-03-04 18:13:48 IST DETAIL: The failed archive command was: test ! -f /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9 && cp pg_xlog/0000000700000001000000B9 /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9
2014-03-04 18:13:49 IST LOG: archive command failed with exit code 1
2014-03-04 18:13:49 IST DETAIL: The failed archive command was: test ! -f /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9 && cp pg_xlog/0000000700000001000000B9 /vkmgk/PostgreSQL/9.2/wal/0000000700000001000000B9
2014-03-04 18:13:49 IST WARNING: transaction log file "0000000700000001000000B9" could not be archived: too many failures
Cause
This error appears in the log file when the archive command is run with the test utility. Because the software backs up both the pg_xlog and WAL directories, the same copy of a transaction log file cannot exist in both directories.
Resolution
This error does not interfere with the backups and restores performed by the software, so you can safely ignore it.
However, restarting the PostgreSQL server will eliminate these warnings in the future.
To avoid the warnings, use the archive command without the test utility, as shown in the following example:
archive_command = 'cp %p /opt/PostgreSQL/9.1/archive_dir/%f'
For more information on running a correct archive command, see PSQL0005.
PSQL0002: The file system restore job may sometimes complete with errors
Symptom 1
Failed to start the server
Cause 1
This occurs due to an improper shutdown of the PostgreSQL server before a restore job.
Resolution 1
Follow these steps to resolve the issue:
-
Use the following command to manually start the PostgreSQL server:
bash-3.2$ ./pg_ctl -D /postgres/PostgreSQL/<version>/data start
-
The system tries to start the server and displays the following message:
pg_ctl: another server might be running; trying to start server anyway
2011-02-15 09:57:59 GMT FATAL: pre-existing shared memory block (key 5432001, ID 688130) is still in use
2011-02-15 09:57:59 GMT HINT: If you're sure there are no old server processes still running, remove the shared memory block or just delete the file "postmaster.pid".
pg_ctl: could not start server
Examine the log output.
-
Now type the following command:
[root@cherry ~]# ps -ef|grep data
-
The system displays the following message:
root 1789 1562 0 05:04 pts/6 00:00:00 grep data
-
If the server is already running, stop the server and resume the job. Type the following command:
[root@cherry ~]# ipcs -a
-
The system displays the following output:
------ Shared Memory Segments --------
key        shmid    owner     perms  bytes     nattch  status
0x00000000 65537    gdm       600    196608    2       dest
0x00000000 688130   postgres  600    37879808  1       dest

------ Semaphore Arrays --------
key        semid    owner     perms  nsems
0x0052e2c3 983042   postgres  600    17
0x0052e2c4 1015811  postgres  600    17
0x0052e2c5 1048580  postgres  600    17
0x0052e2c6 1081349  postgres  600    17
0x0052e2c7 1114118  postgres  600    17

------ Message Queues --------
key        msqid    owner     perms  used-bytes  messages
-
Remove all the semaphores and rename the postmaster.pid file. Use the following command to rename postmaster.pid:
bash-3.2$ mv postmaster.pid postmaster.pid-old
Then resume the pending job. The job will complete successfully, and the server will also start successfully.
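The semaphore cleanup in the steps above can be sketched as follows. This is a hypothetical sketch: it assumes the stale semaphore arrays are owned by the postgres account (as in the ipcs output above) and must be run as root on the affected host.

```shell
# Remove every semaphore array owned by the assumed service account.
# $3 is the owner column in `ipcs -s` output; adjust SEM_OWNER if your
# server runs under a different account.
SEM_OWNER=postgres
ipcs -s | awk -v owner="$SEM_OWNER" '$3 == owner { print $2 }' |
while read -r semid; do
    ipcrm -s "$semid"    # delete the stale semaphore array
done
# Then rename postmaster.pid in the data directory, as shown above:
#   mv postmaster.pid postmaster.pid-old
```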
If you are using PostgreSQL 12, follow the steps mentioned in Consideration for Restoring FSBased Backup Set on PostgreSQL 12.
Symptom 2
Failed to start the server
When you try to start the server manually, you see the following error:
$ ./pg_ctl -D /var/lib/postgresql/9.1/main start
server starting
$ postgres cannot access the server configuration file "/var/lib/postgresql/9.1/main/postgresql.conf": No such file or directory
Cause 2
This error occurs because the PostgreSQL configuration files are not present in the data directory.
Resolution 2
Create a hard link to the PostgreSQL configuration file under the data directory, as shown in the following example:
root@pgubun124:~# ln /etc/postgresql/9.1/main/postgresql.conf /var/lib/postgresql/9.1/main
Now, start the server.
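The hard-link approach can be illustrated with a self-contained sketch using throwaway paths (on a real server, the source is the configuration file under /etc/postgresql and the target is the data directory):

```shell
confdir=$(mktemp -d)    # stands in for /etc/postgresql/9.1/main
datadir=$(mktemp -d)    # stands in for /var/lib/postgresql/9.1/main
echo "port = 5432" > "$confdir/postgresql.conf"

# Create the hard link inside the data directory.
ln "$confdir/postgresql.conf" "$datadir/postgresql.conf"

# Both names now refer to the same inode, so the server finds the file in
# the data directory while edits made through the /etc path remain visible.
stat -c %i "$confdir/postgresql.conf" "$datadir/postgresql.conf"
```

A hard link (rather than a symbolic link) keeps both names valid as long as either path exists.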
If you are using PostgreSQL 12, follow the steps mentioned in Consideration for Restoring FSBased Backup Set on PostgreSQL 12.
Symptom 3
Failed to start the server
The data directory is restored with incorrect permissions. You can see the following error in the PostGresRestoreLog.log file:
23600 6c197720 08/17 14:53:43 2235831 PostGresLogApply command = su - pgtest -c '/opt/PostgreSQL/9.2/bin/pg_ctl -w -t 900 -D /opt/pgsql/data start -l /tmp/postgresLog5432' failed ret = 1 out = err = [pg_ctl: could not access directory "/opt/pgsql/data": Permission denied
Cause 3
This error occurs because the parent directory of the data and WAL directories was deleted before running the restore operation.
Resolution 3
Fix access and owner permissions, and then start the PostgreSQL server manually.
To avoid this issue in the future, verify the following before running a restore operation:
-
A parent directory with proper permissions exists.
-
Delete only the data and archive log directories. Do not delete their parent directory.
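The pre-restore check above can be sketched as follows. The chown step and the real parent path are illustrative; a throwaway directory is used here so the sketch runs anywhere.

```shell
parent=$(mktemp -d)/pgsql     # stands in for a real parent such as /opt/pgsql
install -d -m 700 "$parent"   # create the parent with restrictive permissions
# On the real host, also give it to the service account:
#   chown postgres:postgres "$parent"
stat -c %a "$parent"          # prints 700
```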
If you are using PostgreSQL 12, follow the steps mentioned in Consideration for Restoring FSBased Backup Set on PostgreSQL 12.
Symptom 4
Failed to start the server
Cause 4
During a cross-machine restore operation, the user ID and group ID of the source and destination computers do not match.
Resolution 4
Fix access and owner permissions, and then start the PostgreSQL server manually.
To avoid this issue in the future, before running a restore operation, verify that the user ID and group ID of the source and destination computers are the same.
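A quick way to compare the IDs is to run the same check on both computers. This hypothetical sketch assumes the service account is named postgres and falls back to a marker when the account is absent:

```shell
# Print the numeric user and group IDs of the assumed service account.
uid=$(id -u postgres 2>/dev/null || echo "missing")
gid=$(id -g postgres 2>/dev/null || echo "missing")
echo "postgres uid=$uid gid=$gid"
# Run this on the source and on the destination; the two outputs must match.
```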
If you are using PostgreSQL 12, follow the steps mentioned in Consideration for Restoring FSBased Backup Set on PostgreSQL 12.
Symptom 5
The restore job restored only the configuration files. Other folders under the data directory were not restored.
Cause 5
The PostgreSQL server was started with a symbolic link to the parent folder of the data directory.
Resolution 5
-
Perform an application-free restore of a PostgreSQL server. For more information, see Restoring and Recovering a Server (Application-Free).
-
After the restore is complete, start the server using a symbolic link to the data directory itself, instead of to the parent folder of the data directory, to prevent this issue in the future.
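The difference can be illustrated with throwaway paths; the point is that the link target is the data directory itself, not its parent folder:

```shell
base=$(mktemp -d)                     # stands in for the real install location
mkdir "$base/data"                    # stands in for the PostgreSQL data directory
ln -s "$base/data" "$base/data_link"  # link to the data directory, not to $base
readlink "$base/data_link"            # prints the data directory path
```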
Symptom 6
For backups taken from a standby node, the restore operation fails to start the server, with the following error in the PostgreSQL application log: FATAL: could not locate required checkpoint record.
Cause 6
This issue might occur if the restore operation is performed from a backup that was initiated immediately after a failover or restore job. Suppose that the restored logs belong to timeline 3, but the server start looks for logs in timeline 2. The PostgreSQL application logs appear as follows:
2024-04-06 17:07:31.088 IST [33715] LOG: starting point-in-time recovery to 2024-04-06 16:53:10+05:30
cp: cannot stat ‘/pgsql/wal_13/00000002.history’: No such file or directory
cp: cannot stat ‘/pgsql/wal_13/00000002000000000000001F’: No such file or directory
2024-04-06 17:07:31.098 IST [33715] LOG: invalid checkpoint record
2024-04-06 17:07:31.098 IST [33715] FATAL: could not locate required checkpoint record
Resolution 6
-
In the backup_label file under the data directory, check the START TIMELINE value.
-
In the postgresql.conf file, set recovery_target_timeline to the START TIMELINE value, and then try to restart the server.
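For example, if the backup_label file shows START TIMELINE: 3, the setting would look like the following; the value is illustrative and must match your own backup_label.

```
# postgresql.conf
recovery_target_timeline = '3'   # matches START TIMELINE in backup_label
```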
PSQL0003: Restore operation from a dump-based backup set may sometimes fail
Symptom 1
Database-level restore operation remains in pending state.
Cause 1
This issue may occur when you restore a large database and there is not sufficient space in the WAL directory during the restore.
Resolution 1
Make sure to provide enough space in the WAL directory when you perform dump-based restores.
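A simple pre-restore check is to confirm free space on the file system that holds the WAL directory. The path is illustrative; the sketch defaults to the current directory so it runs anywhere.

```shell
WAL_DIR="${WAL_DIR:-.}"   # set to your real WAL directory path
df -h "$WAL_DIR"          # check the Avail column before starting the restore
```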
Symptom 2
Table-level restores to an auxiliary database that does not exist on the server might fail.
Cause 2
This issue may occur when you use template1 as the maintenance database and try to restore tables to an auxiliary database that does not exist on the server. The system tries to create the auxiliary database using template1. Because template1 is already in use as the maintenance database, the restore fails with the following error message:
createdb: database creation failed: ERROR: source database "template1" is being accessed by other users
DETAIL: There are 1 other session(s) using the database.
Resolution 2
Avoid using template1 as the maintenance database when you restore a database to an auxiliary database that does not exist on the server.
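To see which sessions are holding template1 before the restore, you can query the standard pg_stat_activity catalog view from any other database:

```sql
-- Sessions currently connected to template1; any row here will block
-- creating a database from the template1 template.
SELECT pid, usename, application_name
FROM pg_stat_activity
WHERE datname = 'template1';
```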
Symptom 3
Table-level restores to an auxiliary database that does not exist on the server might fail.
Cause 3
This issue may occur if certain objects are not dropped before performing a restore operation.
Resolution 3
The following database objects are dropped automatically from the PostgreSQL server during a table restore:
-
Table
-
View
-
Domain
-
Sequence
All database objects other than the objects listed above must be dropped manually before performing a restore operation.
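For example, a leftover function (one of the object types that is not dropped automatically) would have to be removed by hand before the restore; the function name and signature below are hypothetical:

```sql
-- Drop a leftover object manually; only tables, views, domains, and
-- sequences are dropped automatically by the restore.
DROP FUNCTION IF EXISTS my_leftover_function(integer);
```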
Symptom 4
A cross-machine restore of a database fails because pg_restore fails.
You can see the following error in the PostGresRestore.log file on the client:
10948 4127e940 06/06 13:30:40 215559 PostgresRestore::applyPostGres() - [/opt/PostgreSQL/9.1/bin/pg_restore -C -v -U postgres --port=5432 -d postgres < /opt/Commvault/iDataAgent/jobResults/2/2432/215559//fifo] pump the data, rett = 1
You can also see the error below on the PostgreSQL Server log:
ERROR: role "testdb" does not exist
Cause 4
Not all of the roles present on the source server are present on the destination server.
Resolution 4
On the destination server, manually create the missing roles with the same privileges that they have on the source server. Then run the restore operation.
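For example, the role named in the error message above could be recreated on the destination before the restore; the attributes shown are hypothetical and must mirror the role's definition on the source server:

```sql
-- Recreate the missing role with the same privileges it has on the source.
CREATE ROLE testdb LOGIN;
```

Alternatively, running pg_dumpall --roles-only against the source server produces CREATE ROLE statements for all existing roles.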
PSQL0004: Restore operation on PostgreSQL V9.2 and later completes successfully leaving the server in recovery mode
Symptom
On PostgreSQL version 9.2 and later, the restore operation completes successfully leaving the PostgreSQL Server in read-only or recovery mode.
Running the following query returns a true value:
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
Cause
For PostgreSQL version 9.2 and later, if the wal_level parameter is set to hot_standby or replica, and the hot_standby parameter is set to on in the postgresql.conf file, the restore operation pauses after the transaction logs are applied up to the specified time. This leaves the PostgreSQL server in read-only (recovery) mode.
Because the data and WAL directories are restored successfully and the server is online, the software considers the restore job to be successfully completed.
Resolution
If there is no data loss during the restore operation, resume the restore operation by executing the pg_xlog_replay_resume() function for PostgreSQL version 9.x.
Executing this function returns the server to read-write mode immediately.
postgres=# select pg_xlog_replay_resume();
pg_xlog_replay_resume
-----------------------
(1 row)
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
Similarly, execute the pg_wal_replay_resume() function for PostgreSQL version 10.x and later.
Executing this function returns the server to read-write mode immediately.
postgres=# select pg_wal_replay_resume();
pg_wal_replay_resume
-----------------------
(1 row)
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
PSQL0005: Error Code: [94:19] PostgreSQL table-level restore operation fails
Symptom
PostgreSQL table-level restore operation to a staging location fails when the Skip Import check box is selected. You can see the following error in the Job Controller window:
Error Code: [94:19]
Description: PostgreSQL Database: [~win1320422065~] Restore Failed with PostgreSQL Error: [~Make sure all dependent tables are selected for table level restore. PG Error -- ~].
Cause
If the Collect Object List During Backup check box was not selected during the backup operation, the software creates the object list during the restore operation. This also requires the schema information for the tables selected for restore.
Resolution
Restore the tables from the schema level in the Browse window.
PSQL0006: The file system restore job may complete with errors after redirect restore or Live Sync replication
Symptom 1
Failed to start the server after a redirect restore or Live Sync replication
You can see the following error in the PostGresRestoreLog.log file:
18249 4749 01/21 14:59:59 100825311 PostGresLogApply command =su - root -c '/opt/PostgreSQL/9.5/bin/pg_ctl -w -t 900 -D /opt/PostgreSQL/9.5/hotstandby/d -o "-c config_file=/opt/PostgreSQL/9.5/hotstandby/d/postgresql.conf" start -l /opt/PostgreSQL/9.5/hotstandby/wal/postgresLog5222' failed ret = 1 out = err = [pg_ctl: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will own the server process.]
Cause 1
The destination PostgreSQL server was not online during instance creation from the CommCell Console. Therefore, the UNIX user is set to the owner of the <postgres_bin_directory>/pg_ctl file. This changed the ownership of the data and WAL directories to that UNIX user, and the server therefore attempted to start with that user's permissions.
Resolution 1
Fix the access and owner permissions of data and WAL directory, and then restart the PostgreSQL server manually.
To avoid this issue in the future, verify that the PostgreSQL server is online before creating an instance from the CommCell Console.
Symptom 2
Data or tablespace directories restored with a cross-machine redirect restore are not owned by the UNIX user that owns their parent directory.
Cause 2
The UNIX user name is set during instance creation to the user who owns the data directory. If the PostgreSQL server was not online during instance creation, the UNIX user is set to the owner of the <postgres_bin_directory>/pg_ctl file.
The restored directories are owned by this user. If the pg_ctl file was owned by root, the UNIX user of the source instance is used.
Resolution 2
Fix the access and owner permissions of the data, tablespace, and WAL directories, and start the PostgreSQL server manually. To avoid this issue for future restores, update the UNIX user name in the instance properties from the Command Center.
For more information, see Modifying UNIX User Name for a PostgreSQL Instance.
Browse Error Reporting
During a Browse operation, if one of the following error conditions occurs, an accurate problem description is reported in the Browse window. This information is useful for troubleshooting.
-
MediaAgent Offline
-
Index Cache Inaccessible
-
Offline Requested Data Not Found