V11 SP8

Connecting to a File System with Data Cube

You can use Data Cube to collect, organize, and mine the data residing in various file system repositories across your enterprise.

Entity Extraction

If configured by your administrator, you can enable entity extraction for your file system data sources. Entity extraction is a feature that can identify when data contains sensitive information, like credit card numbers and bank routing information.

If you enable entity extraction, entity data fields are appended to your data source schema. During crawling, these additional data fields are populated with any matching entities found in the data source. You can later search for these entities in Data Cube, and use the built-in PII Dashboard report to visualize how much sensitive information is present in your file system data source.
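
As a rough illustration of what identifying an entity in text can involve, the following Python sketch finds candidate credit card numbers and keeps only those that pass the Luhn checksum. It is not how the Content Analyzer works internally, which this page does not describe. The pattern and the function names (CARD_CANDIDATE, luhn_valid, find_card_numbers) are assumptions made for this example.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by single
# spaces or hyphens. Real entity extractors are likely more elaborate.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return the digit strings in text that look like valid card numbers."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

For example, calling find_card_numbers on a sentence that contains the well-known test number 4111 1111 1111 1111 returns that number with the spaces removed, while a sequence that fails the checksum is ignored.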

Before You Begin

  • By default, Data Cube attempts to access the file system using the credentials of the logged-in Web Console user. Therefore, you must either give read access on the file system to the Web Console user performing this procedure, or provide the credentials of a user who is authorized to access the file system.
  • To enable the entity extraction feature, your administrator must have configured a Content Analyzer cloud in the CommCell Console.

    For more information, see Configuring a Content Analyzer Cloud for File System Entity Extraction.


  1. Open the Data Cube dashboard.
  2. Click File System.
  3. On the Data Sources (File System) page, click Add File System.
  4. On the New Data Source (File System) page, configure the source as follows:
    1. Under Data Source Name:
      • Click the Analytics Engine list and select an Analytics Engine to store the analytics data.
      • In Data Source Name, enter a name for the data source. The name cannot contain spaces.
      • In Data Source Description, enter a description for the data source.
      • Click Next to proceed to the next section.
    2. Under Directory Details:
      • In Directory Paths, enter the paths to the file system directories or files that you want to crawl. You can add multiple paths on separate lines.
      • In User Name, enter a valid username with access to the directory paths.

        Note: If you do not provide a username, the data source will be accessed using the credentials you used to log in to the Web Console.

      • In Password, enter the password for the username used to access the directory paths.
      • To filter the files and folders that are included in the crawl, in the Include or Exclude fields, enter a pattern that includes wildcards to specify the files and folders that you want to include or exclude. Enter multiple filters as a comma-separated list.

        For example, to exclude some common multimedia files, in Exclude enter: *avi, *mpg, *mp3, *mp4, *mov. To exclude a folder named Sample Files, in Exclude enter */Sample Files.

      • To collect only the data that has changed since the previous crawl, enable Incremental Crawl.

        Note: The description below the Incremental Crawl button displays the type of crawl that will be performed for the data source.

      • Click Next to proceed to the next section.
    3. Under Advanced Options:
      • To filter the data included in the crawl by size, select a minimum and maximum value for the File Size option. Only files that are within the bounds of these values will be included in the data source.
      • To only index the metadata of the files, enable Index Only Metadata. When this option is selected, users can only search data based on metadata and not the contents of the data.

        Note: The description below the Index Only Metadata button displays the type of indexing that will be performed for the data source.

      • Click Next to proceed to the next section.
    4. Under Entity Extraction:
      • To enable entity extraction for the data source:
        1. Select Enable Entity Extraction.
        2. Click the Content Analyzer list and select the content analyzer cloud that you want to use.
        3. Click the Entities to Extract list and select the check boxes next to the types of entities that you want to include during crawls.

          For more information about the entity types, see Entity Extraction Types for Data Cube.

          Tip: Use the search bar to search for the entities that you want to select.

      • Click Next to proceed to the next section.
    5. Under Data Blending:
      • If you want to configure data blending, select Enable Data Blending and configure the data blending options.

        For more information, see Configuring Data Blending in Data Cube.

      • Select Start Crawling Now to start crawling the data source after the data source is saved.
  5. When finished, click Submit.
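
To reason about which files a given combination of Include, Exclude, and File Size settings would select, the following Python sketch may help. It is an approximation under stated assumptions, not Data Cube's actual matching logic: it assumes that patterns are shell-style wildcards compared against the full path, that an exclude match takes precedence over an include match, and that the size bounds are inclusive. The function names and parameters are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_filters(field: str) -> list[str]:
    """Split a comma-separated filter field into trimmed patterns."""
    return [p.strip() for p in field.split(",") if p.strip()]


def is_selected(path, size_bytes, include="", exclude="",
                min_size=0, max_size=None):
    """Return True if the path passes the filters and the size bounds.

    Assumptions (not confirmed by the product documentation):
    - patterns are matched against the whole path, case-sensitively
    - an exclude match overrides any include match
    - an empty include field means "include everything"
    """
    includes = parse_filters(include)
    excludes = parse_filters(exclude)
    if any(fnmatchcase(path, p) for p in excludes):
        return False
    if includes and not any(fnmatchcase(path, p) for p in includes):
        return False
    if size_bytes < min_size:
        return False
    if max_size is not None and size_bytes > max_size:
        return False
    return True


# The multimedia example from the Exclude field description.
MEDIA_EXCLUDE = "*avi, *mpg, *mp3, *mp4, *mov"
print(is_selected("/share/video/clip.avi", 1024, exclude=MEDIA_EXCLUDE))
print(is_selected("/share/docs/report.docx", 1024, exclude=MEDIA_EXCLUDE))
```

Under these assumptions, the first call reports that the .avi file is skipped and the second reports that the .docx file is kept. A folder path such as /share/Sample Files would be skipped by the */Sample Files exclude pattern.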