Evaluating a Disk for Hosting the Deduplication Database

You can use the -simulateddb option of the SIDB2 tool to evaluate the disk on which you plan to host the deduplication database (DDB). The simulation helps you determine the size of the application data and of the DDB that the disk can host.

You can also use the user-interface version of this tool. See Deduplication Database Simulator for more details and usage.

Procedure

  1. Log on to the MediaAgent computer on which you plan to host the DDB.
  2. At the command prompt, go to the software_installation_directory/Base folder, and then run the sidb2 command with the -simulateddb option and one or more of the following parameters.

    Options

    Descriptions

    -simulateddb

    The keyword that runs the DDB simulation to evaluate whether the disk is suitable for hosting the DDB.

    -p

    The path where the DDB files will be located during simulation.

    For example: D:\DDB01

    -e

    To use the existing DDB files for simulation.

    • If the -e option is used, the files from the existing DDB must be copied to a folder named DDBSimulation, and the path for the -p option must be the location of the Primary.dat or Primary.idx files.

      For example: D:\DDBSimulation

    • If the -e option is not used, the command creates a new SIDB_Folder_n folder under the provided path.

    -in

    The instance of the software using the tool.

    -datasize

    The application data size in GB.

    -threads

    Number of threads that are accessing the DDB.

    Default: 8

    Range: 1-8

    -dratio

    The expected deduplication ratio.

    Default: 5.

    -blocksize

    The deduplication data block size in KB.

    Default: 128 KB.

    -tlimit

    The query and insertion (Q&I) time limit in microseconds.

    Default: 1000.

    The -tlimit and -datasize options cannot be used together.

    -cleanddb

    Deletes the files that were created during the simulation after the simulation completes.

    -noprunesim

    Disables pruning simulation on the DDB.

    By default, pruning simulation runs with the DDB simulation only when the -tlimit parameter is specified.

    -outfile

    The location of the output file that stores the DDB simulation results.

    Syntax

    • Windows

      sidb2 -simulateddb {-p <DDBLocation> [-e]} -in <instance#> [-datasize <number>] [-threads <number>] [-dratio <number>] [-blocksize <number>] [-tlimit <number>] [-cleanddb] [-noprunesim] [-outfile <output file path>]

    • Linux

      ./sidb2 -simulateddb {-p <DDBLocation> [-e]} -in <instance#> [-datasize <number>] [-threads <number>] [-dratio <number>] [-blocksize <number>] [-tlimit <number>] [-cleanddb] [-noprunesim] [-outfile <output file path>]
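The syntax above can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the function name build_sidb2_command and its keyword arguments are hypothetical, and only the command-line options listed in the table above are used. It also enforces the rule that -tlimit and -datasize cannot be used together.

```python
def build_sidb2_command(path, instance, *, use_existing=False,
                        datasize_gb=None, threads=None, dratio=None,
                        blocksize_kb=None, tlimit_us=None,
                        cleanddb=False, noprunesim=False,
                        outfile=None, executable="sidb2"):
    """Return the sidb2 -simulateddb command as a list of arguments.

    Hypothetical helper: the argument names are illustrative only.
    The flags themselves come from the options table in this topic.
    """
    # The documentation states that these two options are mutually exclusive.
    if datasize_gb is not None and tlimit_us is not None:
        raise ValueError("-tlimit and -datasize cannot be used together")

    cmd = [executable, "-simulateddb", "-p", path]
    if use_existing:
        cmd.append("-e")
    cmd += ["-in", instance]

    # Options that take a numeric value; skipped when left unset.
    valued_options = [
        ("-datasize", datasize_gb),
        ("-threads", threads),
        ("-dratio", dratio),
        ("-blocksize", blocksize_kb),
        ("-tlimit", tlimit_us),
    ]
    for flag, value in valued_options:
        if value is not None:
            cmd += [flag, str(value)]

    # Switches without a value.
    if cleanddb:
        cmd.append("-cleanddb")
    if noprunesim:
        cmd.append("-noprunesim")
    if outfile is not None:
        cmd += ["-outfile", outfile]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_sidb2_command(r"d:\DDB", "instance001",
                                       datasize_gb=500,
                                       outfile=r"D:\simulationresults.txt")))
```

The returned list can be passed to subprocess.run with the working directory set to the Base folder. On Linux, pass executable="./sidb2" to match the Linux syntax.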

Examples

  • To get the projected average transaction time for an insert or query in the DDB, based on the size of the application data that is backed up, use the -simulateddb and -datasize options.

    sidb2 -simulateddb -in instance001 -p d:\DDB -datasize 500 -outfile D:\simulationresults.txt

  • To get a recommendation for the maximum application data size that can be backed up using the DDB, based on the average access time for each record, use the -simulateddb option.

    The simulation runs until it reaches the default threshold time limit of 1000 microseconds.

    sidb2 -simulateddb -in instance001 -p d:\DDB -outfile D:\simulationresults.txt

  • To run a DDB simulation using existing DDB files, use the -e option. In this example, the simulation runs until it reaches the Q&I time threshold of 150 microseconds.

    SIDB2.exe -simulateddb -p f:\DDBSimulation\CV_SIDB\2\n\Split00 -e -in instance002 -tlimit 150 -outfile D:\simulationresults.txt

Output

The details of the DDB simulation are stored in the output file specified in the -outfile parameter. The following information is a sample of the contents of an output file.

SIDB2.exe -simulateddb -p f:\simulateDDB -in instance002 -tlimit 150

Warning!!
SIDB tool will create a new DDB now. It may take long for the tool to get finished.
You can cancel the operation and use -e option to use existing DDB instead.

Creating new DDB files under: [f:\simulateDDB\SIDB_FOLDER_1]

Performing QueryInsert ... [Wed Jun 25 22:59:28 2014]

[Parameters Used]
Threshold Time Limit -> [150.0] microseconds
Dedupe Ratio -> [5]
Block Size -> [128] KB
No. of threads -> [10]
Simulate pruning -> [YES]
No. of records already present:
[0] Primary records, [0] Secondary records.

Iteration [36430000] [Thu Jun 26 00:15:37 2014]
Total Primary records - [70742821]
Total Secondary records - [353713929]
Total QueryInsert time - [3866.471682] secs
Total Commit time - [180.877113] secs
Average time for last [10000] operations:
QueryInsert - [1753.26] microseconds
Commit - [4.94] microseconds
QueryInsert + Commit - [1758.20] microseconds
Moving average for last [500000] operations:
Moving average - [161.55] microseconds
----
Pruning iteration [320126]
Total Primary records pruned - [5172325]
Total Secondary records pruned - [25861695]
Total (Pri + Sec) pruning time - [4355.65] seconds
Total ZeroRef records pruned - [5111740]
Total ZeroRef pruning time - [133.59] seconds
Total Archive Files pruned - [78]
Time for last iteration
(Pri + Sec) - [0.013] seconds
ZeroRef - [0.000] seconds
(Pri + Sec) + ZeroRef - [0.014] seconds
Moving Average for last [50] iterations
Moving Average - [0.017] seconds
Pruning thread exiting as QI threads have exited.
Simulation threshold [QI time] reached. Number of QI threads [10].
Threshold = [150.00] microseconds.
Current value = [161.55] microseconds.

No. of records at threshold limit:
[70756183] Primary records, [353780725] Secondary records.

QueryInsert time taken per connection = [4187.761398] secs
Max. QueryInsert time taken = [4226.064840] secs
Commit time taken per connection = [189.973720] secs
Max. Commit time taken = [193.188206] secs
QueryInsert + Commit Time per connection = [4377.735118] secs

Deduplication DB Simulation Completed [Thu Jun 26 00:15:38 2014]

The disk is capable of hosting a Deduplication DB for:
42.174 TB of Application Data size
8.435 TB of data on disk
5.623 TB of front end application size
115.3 microseconds average Query & Insert time per block
Throughput for DDB server 35514 GB per Hour
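The capacity figures in the sample output can be cross-checked against the record counts. The following Python sketch is an inference from the sample numbers, not a documented formula: it assumes that data on disk equals the number of primary records multiplied by the block size, that application data size equals data on disk multiplied by the deduplication ratio, and that 1 TB is 1024^3 KB. The function names are hypothetical.

```python
import re

# Excerpt of the sample output shown above.
SAMPLE = """\
[Parameters Used]
Threshold Time Limit -> [150.0] microseconds
Dedupe Ratio -> [5]
Block Size -> [128] KB
No. of records already present:
[0] Primary records, [0] Secondary records.

No. of records at threshold limit:
[70756183] Primary records, [353780725] Secondary records.
"""


def parse_simulation_output(text):
    """Return (block size in KB, dedupe ratio, primary records at threshold)."""
    block_kb = int(re.search(r"Block Size -> \[(\d+)\] KB", text).group(1))
    ratio = float(re.search(r"Dedupe Ratio -> \[([\d.]+)\]", text).group(1))
    primary = int(re.search(
        r"No\. of records at threshold limit:\s*\[(\d+)\] Primary records",
        text).group(1))
    return block_kb, ratio, primary


def estimate_capacity_tb(primary_records, block_kb, ratio):
    """Estimate (data on disk, application data) in TB.

    Assumption: one primary record corresponds to one unique block on disk,
    and 1 TB = 1024**3 KB. This matches the sample output but is not taken
    from the tool's documentation.
    """
    disk_tb = primary_records * block_kb / 1024**3
    app_tb = disk_tb * ratio
    return disk_tb, app_tb


if __name__ == "__main__":
    block_kb, ratio, primary = parse_simulation_output(SAMPLE)
    disk_tb, app_tb = estimate_capacity_tb(primary, block_kb, ratio)
    print(f"{disk_tb:.3f} TB of data on disk")
    print(f"{app_tb:.3f} TB of application data")
```

With the sample values (70756183 primary records, 128 KB blocks, ratio 5), these assumptions give 8.435 TB on disk and 42.174 TB of application data, which agrees with the summary lines in the sample output.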

Last modified: 3/1/2019 9:52:20 PM