

Achieving Parallel Data Protection Operations for Database using Data Streams

Achieving Parallel Data Protection Operations for File System iDataAgents

Achieving Parallel Data Protection Operations for NAS NDMP Data

Considerations for Multiple Streams


A data stream can be thought of as a data channel that connects the client file system or database to the storage media. Multiple streams provide for multiple channels through which data can flow. When used, multiple streams provide the means to parallelize an operation and thus improve the rate at which data can be written to or retrieved from the storage media.

The DB2, DB2 DPF, Informix, Microsoft SQL Server, Oracle, Oracle RAC, Sybase, and SAP iDataAgents support multiple streams per subclient or instance. (Note that the SAP for MAXDB iDataAgent supports multiple streams for database file backups only, not for log and control file backups.) In addition, the Automatic File System Multi-Streaming feature extends support for multiple streams per subclient to additional iDataAgents, and is configurable at the subclient level.

Agents that perform data protection operations using single streams can use any drive in the library provided it is not in use and not off-line. Note too that a given data stream always writes to or reads from the same media group. (This topic is discussed further in Removable Media Groups.)

The following illustration depicts how data streams are used in single and multi-stream data protection operations.

The top portion shows a data protection operation that uses a single data stream. When this operation begins, the subclient initiates one process to transfer the data. This data travels over the data stream to the storage media.

The bottom portion of the illustration shows a hypothetical multi-stream database data protection operation. In this example, the data protection operation is configured to use three data streams. When the operation begins, the subclient launches three processes, one for each data stream, and transfers different database objects through the streams to the media. Since the data transfer occurs over three data streams, each conveying different database objects, all of the database objects involved in the data protection operation are written to three distinct media groups.
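The three-stream example above can be modeled with a short Python sketch. This is only an illustration of the idea, not the product's implementation: the function names, the round-robin assignment of objects to streams, and the use of threads in place of the separate processes described above are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_round_robin(objects, stream_count):
    """Deal the objects out to stream_count batches in turn (assumed policy)."""
    return [objects[i::stream_count] for i in range(stream_count)]


def transfer(stream_id, objects):
    """Stand-in for one stream's worker: records what lands in its media group."""
    return {"media_group": f"media-group-{stream_id}", "objects": list(objects)}


def run_multistream(objects, stream_count):
    """Run one worker per stream; each worker handles a different batch."""
    batches = assign_round_robin(objects, stream_count)
    with ThreadPoolExecutor(max_workers=stream_count) as pool:
        return list(pool.map(transfer, range(1, stream_count + 1), batches))


# Hypothetical database objects, protected over three streams.
result = run_multistream(["db_a", "db_b", "db_c", "db_d", "db_e"], 3)
for stream in result:
    print(stream["media_group"], stream["objects"])
```

Each worker receives a different subset of the objects and writes to its own media group, which mirrors the statement that the objects end up on three distinct media groups.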

  • Data protection operations that cannot be parallelized using multiple data streams can be run in parallel using subclients. For further details on this topic, see Establishing Parallel Data Protection Operations via Subclients.
  • The number of streams indicated on an Informix iDataAgent CommCell Console screen (e.g., the Informix Restore Options screen) may be greater than the number of streams that was set in the BAR_MAX_BACKUP parameter within the ONCONFIG file. This occurs because of system operations.

Achieving Parallel Data Protection Operations for Database using Data Streams

This section describes how the Agents for some database applications exploit multiple data streams to parallelize their data protection operations. It is intended to give you a thorough understanding of how the software addresses different operating environments, so that you can make more informed decisions when you configure the system to support the various types of Agents.

Deploying multiple data streams in a data protection operation enables the subclient to distribute the database objects across all the streams and transmit those objects in parallel to the storage media. Hence a database, or portion thereof, that is protected using three data streams takes about one third of the time that the same set of database objects would require using a single stream.
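The "about one third" figure can be checked with a rough estimate. The sketch below is an assumption-laden model, not a description of how the software schedules objects: it assumes every stream sustains the same throughput, assigns each object (largest first) to the least-loaded stream, and treats the slowest stream as the finish time. The function name, sizes, and rate are hypothetical.

```python
def estimated_duration(object_sizes_gb, stream_count, gb_per_hour_per_stream):
    """Estimate hours needed when objects are spread over parallel streams."""
    loads = [0.0] * stream_count
    # Assumed policy: give each object, largest first, to the least-loaded stream.
    for size in sorted(object_sizes_gb, reverse=True):
        loads[loads.index(min(loads))] += size
    # The operation finishes when the most heavily loaded stream finishes.
    return max(loads) / gb_per_hour_per_stream


even = [100] * 6              # six equally sized objects (hypothetical)
uneven = [300, 100, 100, 100]  # one object dominates

print(estimated_duration(even, 1, 100))    # single stream
print(estimated_duration(even, 3, 100))    # three streams
print(estimated_duration(uneven, 3, 100))  # three streams, uneven objects
```

With evenly sized objects, three streams take one third of the single-stream time in this model. When one object dominates, the stream carrying it becomes the bottleneck and the gain is smaller, which is why the one-third figure is best read as an approximation.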

Data streams are a configured attribute of each storage policy. When configuring a storage policy, you specify the maximum number of data streams that you want the subordinate copies to support. This number may be subject to limitations, depending on the type of storage device that you are using. For information relating to specific types of storage hardware (e.g., tape or magnetic disk), see Hardware-Specific Resource Issues.

The following illustration shows the relationships between storage policies, copies and data streams. Note that this is only a relational diagram. In terms of the physical layer, a data stream extends between a client computer and the storage media.

In this illustration, two storage policies, complete with their own copies, are shown. Each storage policy contains two copies, each with the same number of streams. The copies labeled Copy 1 are the primary copies and carry all data for their respective storage policies. The copies labeled Copy 2 are secondary copies, which are used for auxiliary copy operations. Each data stream maps to a discrete set of archive files on the storage media.

It should be noted that, for clarity, the figure omits the data protection attributes such as compression mode, associated library, and retention periods. As explained in Storage Policy Copies, these attributes are established for each copy.

Allocating Data Streams

The maximum number of streams that can be created simultaneously must be the same for all copies within a given storage policy. The reason for this is that for some databases, the number of streams through which they are restored/recovered must equal the number of streams through which they were backed up. If different storage policy copies supported different numbers of streams, operations would fail if they tried to use one copy to restore/recover data that was backed up through a different copy with a greater number of streams.

Consequently, the maximum number of streams available to each copy of a given storage policy is limited by the smallest number of streams available to any copy within the storage policy. If the limiting factor severely hampers the efficiency of one of the copies (e.g., if a copy directed to magnetic disk media is limited by the restrictions placed on a copy directed to tape media), you may want to create separate storage policies for the different copies. For additional information, see Hardware-Specific Resource Issues.
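The rule above amounts to taking the minimum across the copies. The following sketch illustrates that arithmetic only; the function name, copy labels, and stream limits are hypothetical and do not correspond to any product API.

```python
def effective_max_streams(copy_stream_limits):
    """Return the stream count usable by every copy in one storage policy.

    copy_stream_limits maps a copy label to the number of streams that the
    copy's storage device could support on its own.
    """
    if not copy_stream_limits:
        raise ValueError("a storage policy needs at least one copy")
    # Every copy must use the same count, so the smallest limit wins.
    return min(copy_stream_limits.values())


# Hypothetical policy: a disk copy that could handle 8 streams is held
# back by a tape copy that can handle only 2.
limits = {"primary (magnetic disk)": 8, "secondary (tape)": 2}
print(effective_max_streams(limits))
```

In this example the disk copy is restricted to the tape copy's limit, which is the situation in which separate storage policies may be worth considering.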

Setting up Data Streams

You can increase or decrease the maximum number of data streams from the General tab of the Storage Policy Properties dialog box.

However, keep in mind that each stream requires the use of one media drive. Thus the maximum number of data streams can be as follows:

You can change the number of data streams if the storage policy does not have any data from data protection operations associated with it. However, it is recommended that you do not decrease the number of streams for a storage policy that contains data associated with a subclient that supports multiple streams. (For example, with the SQL Server iDataAgent, after you run a backup using a storage policy with three streams, it is recommended that you do not decrease the number of streams for that storage policy.)

Data Streams on the Primary Copy vs. Secondary Copy

The number of data streams on the primary copy must be the same as the number of streams defined for the storage policy. For a secondary copy that combines streams, however, the number of streams can be defined separately. For more information on combining streams, see Auxiliary Copy With Combined Streams.

Achieving Parallel Data Protection Operations for File System iDataAgents

Refer to Automatic File System Multi-Streaming for information on how streams can be used for non-database iDataAgents.


Achieving Parallel Data Protection Operations for NAS NDMP Data

Refer to Backup - NAS NDMP - Multiple Data Stream Backups for information on how streams can be used to back up data on a NAS NDMP file server.

Considerations for Multiple Streams

Before performing any procedures using multiple streams, review the following information:
