Adding a CSV Data Source to Data Cube

You can add comma-separated values (CSV) files as data sources in Data Cube. After the data source is added, you can restructure the data. For example, you can combine data from two or more data sources into a single data source. For information about the available data restructuring options, see Restructuring Data in Data Cube.

Important

You can upload multiple CSV files to the same data source, either by selecting multiple files or by selecting a ZIP folder that contains the files. If you upload multiple CSV files, then the files must have the same data structure. If the structure of the CSV files is different, then you cannot combine them in a single data source.
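
Before you upload several files, it can help to confirm that they share the same structure. The following is a minimal sketch, not part of Data Cube. It assumes that "same data structure" can be approximated by comparing the first row (the column names) of each file, and the function names `header_of` and `mismatched_files` are hypothetical.

```python
import csv

def header_of(lines, delimiter=","):
    """Return the first CSV row, assumed here to hold the column names."""
    return next(csv.reader(lines, delimiter=delimiter), [])

def mismatched_files(named_lines, delimiter=","):
    """Return the names of files whose header differs from the first file's.

    named_lines maps a file name to an iterable of text lines,
    for example an open file object.
    """
    reference = None
    mismatches = []
    for name, lines in named_lines.items():
        header = header_of(lines, delimiter)
        if reference is None:
            # The first file sets the structure that the others must match.
            reference = header
        elif header != reference:
            mismatches.append(name)
    return mismatches
```

With real files, open each one with `open(path, newline="")` and pass the file objects in the dictionary. An empty result suggests that the files can go into the same data source.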

Before You Begin

If you are using a Linux computer for the Index Server and you want to crawl an NFS share path, mount the NFS share path on the Index Server computer, and then use the mounted path when you configure the CSV data source.
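
After you mount the share, you might want to verify from the Index Server computer that the mount is in place and readable. The following is a small sketch that uses only the Python standard library. The function name `share_is_ready` is hypothetical, and the check applies to the mount point itself, not to subfolders under it.

```python
import os

def share_is_ready(mount_path):
    """Return True if mount_path is a mount point that the current user can read and list."""
    # os.path.ismount is False for missing paths and for ordinary folders.
    return os.path.ismount(mount_path) and os.access(mount_path, os.R_OK | os.X_OK)
```

Run the check as the same user that accesses the share, because the permission test reflects the permissions of the user running the script.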

Procedure

  1. From the Data Cube dashboard, next to CSV, click Add New.

    Tip

    Alternatively, click CSV to open the Data Sources (CSV) page, and then click Add CSV in the upper-right corner.

    The New Data Source (CSV) page appears.

  2. In the Data Source Name section, enter the following information:

    Field Name

    Description

    Steps

    Index Server

    The Index Server entity that you want to use for the data source.

    The Index Server list is populated by the Index Servers configured for Data Cube in your CommCell Console.

    Click the Index Server list and select an Index Server.

    Data Source Name

    The name of the data source as it will appear in the Data Cube dashboard.

    • Enter a name for the data source.

      Note: Only alphanumeric, dash, and underscore characters are supported.

    Data Source Description

    A short description of the data source that is visible to any user who can view the data source in Data Cube.

    • Optional: Enter a description for the data source.
  4. Click Next.

  5. In the CSV File Details section, enter the following information:

    Field Name

    Description

    Steps

    CSV File(s)

    Use this option to upload files directly to the data source, or specify a location and credentials to access the file on a file system.

    If you select multiple CSV files or upload a ZIP folder that contains multiple CSV files, the structure of the data in the files must be identical.

    Tip

    If the data in the CSV file changes and you want to re-crawl the data source, specify the location of the file in the file system.

    Do one of the following:

    • To upload files directly to Data Cube, perform the following steps:

      1. Select Upload CSV File(s), and then click Upload.

      2. Browse your file system, select one or more files, and then click Open.

        A progress bar appears. When the file is uploaded successfully, the file name appears in the CSV File(s) box.

      3. Repeat this procedure to add more files as necessary.

    • To specify the location of the file, perform the following steps:

      1. Select Specify CSV File(s) Location.

      2. In Folder Path, enter the local or shared path to the CSV file. The path must be relative to the Index Server you selected for this data source.

      3. If the Folder Path location requires different credentials to access the data, enter the credentials in User Name and Password.

    Column Separator

    The delimiter used in the file to separate values.

    • Select the delimiter used in the files.

    Header Information

    Use this option to configure the first row of the file as the column names in the data source.

    • If the first row of data in the file contains the names of the data fields or columns, select First row has column name.

    Columns

    Use this option to specify the names of the data fields or columns that you want to appear in the data source.

    Tip

    You can overwrite the column names that appear in the source files by selecting First row has column name and entering new column names in the Columns box.

    • Enter the column names as a comma-separated list.
  7. Click Next.

  8. In the CSV File Preview section, enter the following information:

    Field Name

    Description

    Steps

    Detect Data Type

    Use this option to allow Data Cube to automatically determine the type of data stored in each column of the data source. You can also use this option to manually configure the data type for each column.

    Tip: Data Cube supports several common data types, such as Boolean, date, integer, string, and UNIX timestamp.

    1. Optional: Click the slider to enable Detect Data Type.

      A data type list appears under each column in the data preview table.

    2. To change the data type, click the list under the appropriate column and select a different data type.

    Incremental Crawl

    Select this option to crawl only updated values in the source files.

    • Optional: Click the slider to enable incremental crawling.

    Primary Key

    The column that serves as the primary key to uniquely identify rows in the source files. This setting applies only when incremental crawl is enabled.

    • If incremental crawling is enabled, enter a column name that is a unique primary key for the data.
  9. Click Submit.

    The configuration page for the new data source appears.
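
Several values in this procedure can be checked before you open the wizard: the data source name, the column separator, the data types, and the primary key. The following is a minimal sketch under stated assumptions, not Data Cube's own logic. It assumes that the name rule covers ASCII letters and digits only, that the first row holds the column names, and that a coarse guess at Boolean, integer, or string is enough for a preview. All function names are hypothetical.

```python
import csv
import re
from collections import Counter

# Assumption: "alphanumeric, dash, and underscore" means these ASCII characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_name(name):
    """Return True if the name uses only the supported characters."""
    return NAME_PATTERN.fullmatch(name) is not None

def guess_type(values):
    """Guess a coarse type for one column: boolean, integer, or string."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "string"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    if all(re.fullmatch(r"[+-]?\d+", v) for v in non_empty):
        return "integer"
    return "string"

def preflight(lines, key_column, delimiter=","):
    """Return (types per column, duplicated key values) for CSV lines with a header row."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    rows = list(reader)
    columns = reader.fieldnames or []
    if key_column not in columns:
        raise ValueError(f"column {key_column!r} not found in {columns}")
    # Short rows yield None for missing cells, so treat None as an empty string.
    types = {c: guess_type([r[c] or "" for r in rows]) for c in columns}
    counts = Counter(r[key_column] or "" for r in rows)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return types, duplicates
```

For example, `preflight(open("data.csv", newline=""), "id", delimiter=";")` returns the guessed types and any repeated `id` values, where `data.csv` and `id` are placeholder names. A non-empty duplicates list suggests that the column is not a suitable primary key for incremental crawling.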

What to Do Next

Crawling a Data Source in Data Cube
