Adding a Website Data Source to Data Cube

You can use Data Cube to connect to websites as data sources. After you add a website data source, you can restructure its data. For example, you can combine data from two or more data sources into a single data source. For information about the available data restructuring options, see Restructuring Data in Data Cube.

Before You Begin

If you are connecting to a secure website, obtain the credentials (domain, user name, and password) that are required to log on to the website.

Procedure

  1. From the Data Cube dashboard, next to Web Site, click Add New.

    Tip

    Alternatively, click Web Site to open the Data Sources (Web Site) page, and then click Add Web Site in the upper-right corner.

    The New Data Source (Web Site) page appears.

  2. In the Data Source Name section, enter the following information:

    • Index Server: Click the Index Server list and select the Index Server that you want to use for the data source. The list is populated by the Index Servers configured for Data Cube in your CommCell Console.

    • Data Source Name: Enter the name of the data source as it will appear in the Data Cube dashboard.

      Note: Only alphanumeric, dash, and underscore characters are supported.

    • Data Source Description: Optional. Enter a short description of the data source. The description is visible to any user who can view the data source in Data Cube.

  3. Click Next.

  4. In the Web Site URL(S) section, enter the following information:

    • URL: Enter the full URL of each website that you want to crawl in the data source.

      Tip

      You can enter multiple URLs on separate lines.

    • Domain Name: For a secure website, enter the domain of the user name that is used to access the website.

    • Username: For a secure website, enter the user name that is used to access the website.

    • Password: For a secure website, enter the password for that user name.

  5. Click Next.

  6. In the Advanced Options section, enter the following information:

    • Include Paths: Optional. Enter the URL of each web page that you want to include when crawling the website. If you enter URLs in this field, then no other URLs are crawled.

      Tip

      You can enter multiple URLs on separate lines.

    • Exclude Paths: Optional. Enter the URL of each web page that you want to exclude when crawling the website. If you enter URLs in this field, then the following occurs:

        • The excluded web pages are never crawled, even if you also enter them in the Include Paths field.

        • Pages that are linked from an excluded web page are not crawled, unless they are also linked from another page that is included in the data source.

      Tip

      You can enter multiple URLs on separate lines.

    • Crawl Depth: Select the number of successive links, starting from the first URL, that you want to include in the data source. To crawl more pages, increase the crawl depth.

  7. Click Submit.

    The Configuration page for the new data source appears.
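The interaction between the URL, Include Paths, Exclude Paths, and Crawl Depth settings can be easier to reason about as a traversal. The following Python sketch models the rules described in the Advanced Options section as a breadth-first walk over an in-memory link map. It is an illustrative model only, not Data Cube's crawler, and it makes no network requests. The function plan_crawl, the SITE map and its example.com URLs, exact-URL matching for Include Paths and Exclude Paths, treating a starting URL that is missing from a non-empty Include Paths list as out of scope, and counting a starting URL as depth 0 are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical site map standing in for real web pages: each URL maps to the
# URLs that the page links to. No network access is performed.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/private": ["https://example.com/secret", "https://example.com/b"],
    "https://example.com/b": ["https://example.com/c"],
}


def plan_crawl(seeds, links, include=(), exclude=(), max_depth=1):
    """Return the pages that would be crawled, in breadth-first order.

    Illustrative model only (not Data Cube's implementation).
    seeds:     starting URLs (the URL field)
    links:     map of URL to the list of URLs linked from that page
    include:   Include Paths; if non-empty, only these URLs are crawled
    exclude:   Exclude Paths; never crawled, even if also in include
    max_depth: Crawl Depth; successive links followed from a starting URL
    """
    include, exclude = set(include), set(exclude)

    def in_scope(url):
        # Assumption: paths are matched as exact URLs, not as prefixes.
        if url in exclude:  # exclusions take precedence over inclusions
            return False
        return not include or url in include  # empty include = no restriction

    crawled, seen = [], set()
    queue = deque((url, 0) for url in seeds)  # assumption: starting URLs are depth 0
    while queue:
        url, depth = queue.popleft()
        if url in seen or not in_scope(url):
            continue
        seen.add(url)
        crawled.append(url)
        # Only crawled pages contribute links, so a page linked solely from
        # an excluded page is never reached.
        if depth < max_depth:
            for target in links.get(url, []):
                queue.append((target, depth + 1))
    return crawled


if __name__ == "__main__":
    print(plan_crawl(["https://example.com/"], SITE,
                     exclude=["https://example.com/private"], max_depth=2))
```

With the sample map, excluding /private leaves /secret uncrawled because only the excluded page links to it, while /b is still crawled because /a also links to it. Breadth-first order is used so that each page is reached at its smallest link distance from a starting URL, which is what a depth limit needs.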

What to Do Next

Crawl the data source. For instructions, see Crawling a Data Source in Data Cube.
