Sensitive Data Governance: System Requirements and Hardware Specifications

Verify that your environment meets the system requirements for Sensitive Data Governance.

Commvault Packages

To use Risk Analysis for Sensitive Data Governance, install the following packages on one or more computers. In the Sensitive Data Governance guided setup, you use a computer that has these packages installed to create an Index Server.

  • Index Store

  • Content Analyzer

  • Web Server

Operating System

The computers on which you install the required Commvault packages must run one of the following operating systems.

Windows

  • Microsoft Windows Server 2022 Editions

  • Microsoft Windows Server 2019 Editions

  • Microsoft Windows Server 2016 x64 Editions

  • Microsoft Windows Server 2012 R2 x64 Editions

  • Microsoft Windows Server 2012 x64 Edition

Linux

  • Red Hat Enterprise Linux/CentOS 7.3 and above

Hardware Specifications for the Index Server

Use the following guidelines to select the appropriate hardware for your Index Server.

Considerations

  • Size up to the larger specification if either of the following cases applies to you:

    • Your environment is between two sizes

    • You are using one Index Server for both Exchange data sources and Exchange backups

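  • If you are using one Index Server for both files and email messages, use the size required by whichever source needs the larger specification. For example, if you have 40 TB of file source data and 15 TB of email source data, use the specification for a medium-sized server: 40 TB of file data fits within the small specification, but 15 TB of email data requires the medium specification.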

  • The following options affect the performance of the Index Server:

    • Custom entities that include multiple regular expressions

    • Data classification plans that include the optical character recognition (OCR) option or the pre-generated previews option
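The sizing rules above can be sketched as a small helper. This is a hypothetical illustration, not a Commvault tool: the function name shared_index_server_size is invented here, the tier limits are transcribed from the Specifications for File Data and Email Messages table, and it assumes that a source "fits" a size when it is at or below that size's per-node limit. Picking the smallest size that fits both sources applies the "between two sizes" and "larger source" rules at once.

```python
# Hypothetical sizing helper (not part of any Commvault product).
# Per-node limits transcribed from the "Specifications for File Data
# and Email Messages" table, ordered from smallest to largest.
SHARED_TIERS = [
    # (size name, max file source data in TB, max email source in TB)
    ("Small", 80, 5),
    ("Medium", 160, 15),
]


def shared_index_server_size(file_tb: float, email_tb: float) -> str:
    """Return the smallest shared Index Server size that covers both sources.

    An environment between two sizes is rounded up, and whichever source
    needs the larger size decides the result. Anything above the Medium
    limits returns "Dedicated" (separate file and email Index Servers).
    """
    for name, max_file_tb, max_email_tb in SHARED_TIERS:
        if file_tb <= max_file_tb and email_tb <= max_email_tb:
            return name
    return "Dedicated"


if __name__ == "__main__":
    # The example from the text: 40 TB of files and 15 TB of email.
    print(shared_index_server_size(40, 15))  # Medium
```

The helper does not model the rule for Exchange data sources combined with Exchange backups; in that case, size up manually.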

Specifications for File Data and Email Messages

Component                      | Medium      | Small
-------------------------------+-------------+-----------
File source data size per node | 160 TB      | 80 TB
File objects per node (1)      | 40 million  | 20 million
Email source application size  | 15 TB       | 5 TB
Email objects per node (2)     | 150 million | 50 million
CPU                            | 16 cores    | 8 cores
RAM                            | 32 GB       | 16 GB
Index disk space (3)           | 6 TB        | 3 TB

(1) An estimate based on an average file size of 2 MB, on the assumption that 50 percent of documents are eligible for content indexing and that there is more than one version of each file.
(2) An estimate based on an average message size of 100 KB. A message with attachments is counted as a single object.
(3) SSD-class disk recommended.

Note

If the size of your backed-up data exceeds these limits, you must configure dedicated Index Servers for file data and for email, following the specifications in the Specifications for Dedicated Servers for File Data and Specifications for Dedicated Servers for Email Messages sections.

Specifications for Dedicated Servers for File Data

Component                       | Large       | Medium     | Small
--------------------------------+-------------+------------+-----------
Source data size per node       | 320 TB      | 160 TB     | 80 TB
Objects per node (1)            | 80 million  | 40 million | 20 million
Objects per node, live scan (2) | 160 million | 80 million | 40 million
CPU or vCPU                     | 16 cores    | 16 cores   | 8 cores
RAM                             | 64 GB       | 32 GB      | 16 GB
Index disk space (3)            | 12 TB       | 6 TB       | 3 TB

(1) Estimated based on an average file size of 2 MB, on the assumption that 50 percent of documents are eligible for content indexing and that there is more than one version of each file.
(2) Estimated for live scan based on an average file size of 2 MB.
(3) SSD-class disk recommended.
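The object estimates in this table follow from the stated averages: source data divided by the 2 MB average file size gives the live scan figure, and taking 50 percent of that (the content indexing eligibility assumption) gives the other figure. The sketch below reproduces that arithmetic and picks the smallest dedicated file tier for a given source size. The function names are hypothetical, and it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), an assumption under which the published figures come out exactly.

```python
# Hypothetical helpers (not part of any Commvault product) that reproduce
# the object estimates in the file data tables and pick a dedicated tier.
TB = 10**12  # decimal units (assumption); these reproduce the table figures
MB = 10**6

# (size name, max source data per node in TB), smallest first.
DEDICATED_FILE_TIERS = [("Small", 80), ("Medium", 160), ("Large", 320)]


def estimated_file_objects(source_tb: float,
                           avg_file_mb: float = 2.0,
                           eligible_fraction: float = 0.5) -> int:
    """Estimate objects per node: total files x fraction that is counted.

    Use eligible_fraction=0.5 for the content indexing estimate and
    eligible_fraction=1.0 for the live scan estimate.
    """
    total_files = source_tb * TB / (avg_file_mb * MB)
    return round(total_files * eligible_fraction)


def dedicated_file_tier(source_tb: float):
    """Return the smallest dedicated file tier that fits, or None if the
    source exceeds the Large per-node limit in the table."""
    for name, max_tb in DEDICATED_FILE_TIERS:
        if source_tb <= max_tb:
            return name
    return None


if __name__ == "__main__":
    print(estimated_file_objects(160))                         # 40000000
    print(estimated_file_objects(320, eligible_fraction=1.0))  # 160000000
    print(dedicated_file_tier(200))                            # Large
```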

Specifications for Dedicated Servers for Email Messages

Component               | Large       | Medium      | Small
------------------------+-------------+-------------+-----------
Source application size | 25 TB       | 15 TB       | 5 TB
Objects per node (1)    | 250 million | 150 million | 50 million
Number of mailboxes (2) | 5,000       | 2,000       | 400
CPU                     | 16 cores    | 16 cores    | 8 cores
RAM                     | 64 GB       | 32 GB       | 16 GB
Index disk space (3)    | 10 TB       | 6 TB        | 2 TB

(1) An estimate based on an average message size of 100 KB. A message with attachments is counted as a single object.
(2) Based on an average mailbox size of 5 GB and an average of 50,000 messages per mailbox.
(3) SSD-class disk recommended.
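The email object estimates follow the same pattern: source size divided by the 100 KB average message size (the mailbox averages are consistent with this, since 5 GB / 100 KB = 50,000 messages). The sketch below computes that estimate and picks a dedicated email tier. The function names are hypothetical, decimal units are assumed, and requiring both the size limit and the mailbox limit to be met is an assumption about how to read the table.

```python
# Hypothetical helpers (not part of any Commvault product) based on the
# "Specifications for Dedicated Servers for Email Messages" table.
TB = 10**12  # decimal units (assumption); these reproduce the table figures
KB = 10**3

# (size name, max source size per node in TB, max mailboxes), smallest first.
DEDICATED_EMAIL_TIERS = [
    ("Small", 5, 400),
    ("Medium", 15, 2000),
    ("Large", 25, 5000),
]


def estimated_email_objects(source_tb: float, avg_message_kb: float = 100) -> int:
    """Estimate objects per node; a message and its attachments count as one."""
    return round(source_tb * TB / (avg_message_kb * KB))


def dedicated_email_tier(source_tb: float, mailboxes: int):
    """Return the smallest tier whose size and mailbox limits both cover the
    environment (assumption), or None if even Large is exceeded."""
    for name, max_tb, max_mailboxes in DEDICATED_EMAIL_TIERS:
        if source_tb <= max_tb and mailboxes <= max_mailboxes:
            return name
    return None


if __name__ == "__main__":
    print(estimated_email_objects(25))    # 250000000
    print(dedicated_email_tier(4, 1000))  # Medium
```

In the second call, 4 TB fits the small size limit, but 1,000 mailboxes exceed its 400-mailbox limit, so the helper moves up to Medium.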

Hardware Specifications for the Content Analyzer

Content analyzer computers detect personally identifiable information (PII) in the data. Some Risk Analysis environments require dedicated content analyzer computers.

Review the following use cases to determine if you need dedicated content analyzer computers:

  • You need to identify machine learning-based entities. For example, Address, Contextual Date, Location, Money, Organization, Person, and Time are entities that are identified by using machine learning. For a complete list of built-in entities, see Personally Identifiable Information.

  • You want to configure parallel processing. You can add multiple content analyzer computers, and then create a separate data classification plan for each content analyzer.

Before you install the Content Analyzer package, use the following guidelines to select the appropriate hardware for your content analyzer computers.

Use case                        | CPU      | RAM   | Disk space
--------------------------------+----------+-------+-----------
Machine learning-based entities | 32 cores | 64 GB | 2 TB
Parallel processing             | 16 cores | 32 GB | 1 TB
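Before installing the Content Analyzer package, you can compare a candidate computer against these minimums. The sketch below is a hypothetical pre-install check, not a Commvault tool; the function names are invented, and taking the per-resource maximum when both use cases apply to one computer is an assumption, because the table lists each use case separately.

```python
# Hypothetical pre-install check (not part of any Commvault product).
# Minimums transcribed from the content analyzer hardware table.
REQUIREMENTS = {
    "machine_learning": {"cpu_cores": 32, "ram_gb": 64, "disk_tb": 2},
    "parallel_processing": {"cpu_cores": 16, "ram_gb": 32, "disk_tb": 1},
}


def combined_requirements(use_cases):
    """Take the per-resource maximum when several use cases apply
    (assumption; the table lists each use case separately)."""
    combined = {"cpu_cores": 0, "ram_gb": 0, "disk_tb": 0}
    for case in use_cases:
        for resource, minimum in REQUIREMENTS[case].items():
            combined[resource] = max(combined[resource], minimum)
    return combined


def shortfalls(use_cases, cpu_cores, ram_gb, disk_tb):
    """List the resources on a candidate computer that fall below the minimums."""
    available = {"cpu_cores": cpu_cores, "ram_gb": ram_gb, "disk_tb": disk_tb}
    required = combined_requirements(use_cases)
    return [r for r, minimum in required.items() if available[r] < minimum]


if __name__ == "__main__":
    # A 16-core, 32 GB, 1 TB computer meets the parallel processing minimums
    # but not the machine learning-based entities minimums.
    print(shortfalls(["parallel_processing"], 16, 32, 1))  # []
    print(shortfalls(["machine_learning"], 16, 32, 1))     # ['cpu_cores', 'ram_gb', 'disk_tb']
```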

Tip

To distribute the processing load, you can install the Content Analyzer package on multiple computers.