You can configure Entity Extraction to recognize custom entity types in your data based on matching regular expressions.
About Python Regular Expressions for Custom Entities
When you create the regular expression for your custom entity, use Python regular expression syntax. The following sections describe the requirements that apply to regular expressions for custom entities.
Note
Regular expressions for custom entities are case insensitive.
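The effect of case insensitivity can be checked locally before you create an entity. The following is a minimal sketch that assumes the matching behaves like Python's re.IGNORECASE flag; that mapping is an assumption and is not confirmed by this page. The pattern is the Vehicle_IdNum example from the procedure.

```python
import re

# Assumption: case-insensitive matching is approximated with re.IGNORECASE.
pattern = re.compile(r"[A-Z]{1}[0-9]{6}", re.IGNORECASE)

upper_hit = pattern.fullmatch("A543256")
lower_hit = pattern.fullmatch("a543256")

# Both spellings match, so [A-Z] does not restrict matches to uppercase letters.
print(bool(upper_hit), bool(lower_hit))
```

If an entity must distinguish uppercase from lowercase values, a character class alone cannot express that under this rule.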
Escape Backslash Characters
If your Python regular expression contains a backslash character, then you must replace each backslash character with two backslashes. For example, given a custom entity books\books, you would create the Python regular expression books[\\]books.
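The books example can be tried in a local Python session. This is a sketch only: it assumes the expression is evaluated the way Python's re module evaluates it, and the variable names are illustrative. It shows that the doubled backslash inside the character class matches a single literal backslash in the data.

```python
import re

# Pattern text exactly as given in the example: books[\\]books
# A raw string keeps both backslashes, so the class holds one literal backslash.
pattern_text = r"books[\\]books"

# The value to find is books\books (11 characters, one backslash).
sample_value = "books\\books"

match = re.search(pattern_text, sample_value, re.IGNORECASE)
print(len(sample_value), match is not None)
```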
Non-Capturing Groups
To group part of your regular expression with parentheses (), use the non-capturing form (?:). Capturing the contents of a group is not supported. Review the following examples:
Correct - non-capturing version includes the question mark and colon (?:).
\s(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?!\.)\b
Incorrect - original version does not include the question mark and colon (?:).
\s((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?!\.)\b
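One way to catch a capturing group before you create the entity is to compile the expression locally and inspect the number of groups. The sketch below uses the two expressions above verbatim; the helper name capturing_group_count is hypothetical and is not part of the product.

```python
import re

correct = r"\s(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?!\.)\b"
incorrect = r"\s((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?!\.)\b"


def capturing_group_count(pattern_text: str) -> int:
    """Return how many capturing groups the compiled pattern defines (hypothetical helper)."""
    return re.compile(pattern_text).groups


correct_groups = capturing_group_count(correct)
incorrect_groups = capturing_group_count(incorrect)

# Both forms match the same text; only the group bookkeeping differs.
sample = "host 192.168.1.10 is up"
correct_match = re.search(correct, sample)
incorrect_match = re.search(incorrect, sample)

print(correct_groups, incorrect_groups, correct_match.group().strip())
```

A count of zero means that every parenthesis in the expression is non-capturing, or is a lookahead such as (?!\.), which does not capture.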
Before You Begin
-
You must have deployed Content Analyzer and Entity Extraction in your CommCell Environment. For more information, see Content Analyzer and Entity Extraction.
-
Ensure that the regular expressions you use to create the custom entity types are written in Python format. For more information, see the regular expression documentation for Python.
-
You must be able to log in and run QScripts on the CommServe computer. For more information, see Using QScripts from the Command Line.
Procedure
-
Log in to the CommServe computer, and then create a text file that contains the following JSON object:
{"entity_key": "data_field_name", "entity_regex": "custom_regular_expression"}
where:
-
data_field_name is the name of the data field that stores the entities on the Index Server database. You use the data field name when you query a data source for a particular entity, as in the entity match pattern.
-
custom_regular_expression is the regular expression used to identify data that is included in the custom entity. The regular expression must be written in the Python format.
For example, to create a custom entity named Vehicle_IdNum that extracts entities matching the pattern of one alphabetical character followed by six numerical characters (such as A543256), you would create the following JSON object:
{"entity_key": "Vehicle_IdNum", "entity_regex": "[A-Z]{1}[0-9]{6}"}
-
Save the file with the .txt extension, and note the file location.
Note
Do not save the file with the .json extension.
-
Open a Command Prompt window as an administrator.
-
To add the custom entity, go to software_installation_path/Base, and then run the following QScript:
qoperation execscript -sn QS_CreateCustomEntity.sql -si 'EntityName' -si '{ADD|REMOVE|UPDATE}' -si 'PathToTextFile'
where:
-
EntityName is the name of the entity type that you want to display in the Web Console and CommCell Console.
-
Select one of the following operations:
-
ADD, to add a new custom entity.
-
REMOVE, to remove a custom entity.
-
UPDATE, to update an existing custom entity with a new data field name or regular expression text file. You cannot change the entity name.
-
PathToTextFile is the local path to the .txt file you created on the CommServe computer.
For example, to add a custom entity using a text file located in C:\custom_entities\vehicle_idnum.txt and name the custom entity VehicleIdentificationNumbers, run the following QScript:
qoperation execscript -sn QS_CreateCustomEntity.sql -si 'VehicleIdentificationNumbers' -si 'Add' -si 'C:\custom_entities\vehicle_idnum.txt'
Afterwards, you can select the VehicleIdentificationNumbers entity from the entity extraction settings in the Web Console and CommCell Console.
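The file preparation and command line from the procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: the helper names write_entity_definition and build_qscript_command are hypothetical, the script only prints the command rather than running it, and it writes to a temporary directory instead of the CommServe computer. The example expression contains no backslashes, so the sketch takes no position on backslash handling.

```python
import json
import re
import tempfile
from pathlib import Path


def write_entity_definition(path: Path, entity_key: str, entity_regex: str) -> str:
    """Check the regex locally, then write the one-line JSON object (hypothetical helper)."""
    if re.compile(entity_regex).groups:
        raise ValueError("capturing groups are not supported; use (?:...)")
    if path.suffix.lower() != ".txt":
        raise ValueError("save the definition with the .txt extension")
    # json.dumps would escape backslashes; this example's regex has none.
    text = json.dumps({"entity_key": entity_key, "entity_regex": entity_regex})
    path.write_text(text, encoding="utf-8")
    return text


def build_qscript_command(entity_name: str, operation: str, text_file: str) -> str:
    """Assemble the command in the shape shown in the procedure; it is not executed."""
    if operation.upper() not in {"ADD", "REMOVE", "UPDATE"}:
        raise ValueError(f"unsupported operation: {operation}")
    return (
        "qoperation execscript -sn QS_CreateCustomEntity.sql "
        f"-si '{entity_name}' -si '{operation}' -si '{text_file}'"
    )


with tempfile.TemporaryDirectory() as tmp:
    definition_path = Path(tmp) / "vehicle_idnum.txt"
    definition_text = write_entity_definition(
        definition_path, "Vehicle_IdNum", "[A-Z]{1}[0-9]{6}"
    )

command = build_qscript_command(
    "VehicleIdentificationNumbers", "ADD", r"C:\custom_entities\vehicle_idnum.txt"
)

print(definition_text)
print(command)
```

Checking the expression before writing the file surfaces a capturing group or a wrong file extension locally, before the QScript is run.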