Configuring Entity Extraction in Data Cube

If your administrator has configured the entity extraction feature, you can enable entity extraction in a data source to identify special types of information, such as government-issued personal identification numbers, email addresses, phone numbers, and financial data. After you enable entity extraction for a data source, the selected entity types become additional data fields in the data source schema. When Data Cube crawls the data source, strings that match the selected entity types are copied to the corresponding entity extraction data fields. You can then use the entity extraction data in your own applications, or view the PII Dashboard report to visualize how much sensitive data exists in your data source.

For more information about entity extraction, see Entity Extraction.
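
The following Python sketch illustrates the general idea of copying matching strings into separate entity fields. It is not Data Cube's implementation: the regular expressions, the `extract_entities` function, and the `entity_` field prefix are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for illustration only. The real entity definitions
# used by the Content Analyzer are not described in this document.
ENTITY_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def extract_entities(document, source_fields, entity_types):
    """Return a copy of the document with one extra field per entity type.

    document      -- dict mapping field names to string values
    source_fields -- fields whose text is scanned for entities
    entity_types  -- keys of ENTITY_PATTERNS to look for
    """
    enriched = dict(document)
    for entity in entity_types:
        matches = []
        for field in source_fields:
            text = document.get(field, "")
            matches.extend(ENTITY_PATTERNS[entity].findall(text))
        # The "entity_" prefix is an assumed naming scheme, not Data Cube's.
        enriched["entity_" + entity] = matches
    return enriched


doc = {"content": "Contact jane@example.com or 555-123-4567."}
result = extract_entities(doc, ["content"], ["email", "phone"])
print(result["entity_email"])
print(result["entity_phone"])
```

In this sketch the original `content` field is left unchanged and the matches are stored alongside it, which mirrors the description above of entity types becoming additional fields in the schema.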

Before You Begin

  • To enable entity extraction, your administrator must configure a Content Analyzer Cloud in the CommCell Console. For more information, see Configuring a Content Analyzer Cloud for Data Cube Entity Extraction.

    Note

    Ensure that the data on which you want to perform entity extraction is configured to be crawled in the data source. For example, if the data that you want to use is in the content of a file system source, then ensure that the Content Search option is selected so that the file content is crawled. For database sources, ensure that the data field that you want to use is included in the SQL query that you specify in the data source configuration.

  • You must have at least one data source. See Connect to Data Sources.

  • You must crawl the data source. See Crawling a Data Source in Data Cube.

Procedure

  1. In a Web browser, log in to the Web Console and then click Analytics.

  2. In the left pane, click the connector group to view the data sources for that connector.

  3. On the Data Sources (connector) page, in the box of the data source that you want to use, click the name of the data source.

    The data source page appears.

  4. In the upper right of the page, click Entity Extraction.

  5. Under Entity Extraction, select Enable entity extraction.

    The entity extraction configuration options appear.

  6. Enter the following information:

    1. In the Content Analyzer list, click the Content Analyzer Cloud to use to extract the entities from the data source.

      The Content Analyzer list is populated by the Content Analyzer Clouds configured in your CommCell Console.

    2. In the Entities to Extract list, click the entity types that you want to identify in the data source.

      During a crawl, the Content Analyzer identifies these entities in the data and stores them in a separate field in the Data Cube data source. For more information about each entity type, see Entity Extraction Types for Data Cube.

    3. In the Fields to Extract Entities From list, enter the data fields in the data source schema to include for entity extraction as follows:

      • If the data source was not crawled at least once, type the names of the data fields that you want to include for entity extraction.

        The field names are case sensitive. You can type multiple fields as a comma-separated list.

      • If the data source was crawled at least once, click the list and select the data field names that you want to include for entity extraction.
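
Because typed field names are case sensitive, it can help to check a comma-separated list before entering it. The sketch below is a hypothetical helper, not part of Data Cube: the function name `parse_field_list` and the trimming of spaces around each name are assumptions, and the document does not state how Data Cube itself treats surrounding whitespace.

```python
def parse_field_list(raw, schema_fields=None):
    """Split a comma-separated field list into individual names.

    raw           -- the text to check, for example "content,Subject"
    schema_fields -- optional set of known field names; when given, names
                     are compared with exact case, as the procedure requires
    """
    # Trimming whitespace is an assumption made for this sketch.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    if schema_fields is not None:
        unknown = [name for name in names if name not in schema_fields]
        if unknown:
            raise ValueError("Unknown field names (check case): " + ", ".join(unknown))
    return names


fields = parse_field_list("content, Subject ,author")
print(fields)
```

Passing a set of known names as `schema_fields` makes a case mismatch such as `subject` versus `Subject` fail early with a `ValueError` instead of silently matching nothing.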
