You can connect to website sources using Data Cube. After the data source is added, you can restructure the data. For example, you can combine data from two or more data sources into a single data source. For information about the available data restructuring options, see Restructuring Data in Data Cube.
Before You Begin
If you are using Data Cube to connect to a secure website, then you must obtain credentials for logging on to the website.
Procedure
- From the Data Cube dashboard, next to Web Site, click Add New.
Tip
Alternatively, click Web Site to open the Data Sources (Web Site) page, and then click Add Web Site in the upper-right corner.
The New Data Source (Web Site) page appears.
- In the Data Source Name section, enter the following information:
Field Name
Description
Steps
Index Server
The Index Server entity that you want to use for the data source.
The Index Server list is populated by the Index Servers configured for Data Cube in your CommCell Console.
Click the Index Server list and select an Index Server.
Data Source Name
The name of the data source as it will appear in the Data Cube dashboard.
- Enter a name for the data source.
Note: Only alphanumeric characters, dashes, and underscores are supported.
Data Source Description
A short description of the data source that is visible to any user who can view the data source in Data Cube.
- Optional: Enter a description for the data source.
- Click Next.
- In the Web Site URL(S) section, enter the following information:
Field Name
Description
Steps
URL
The full URLs of the websites that you want to crawl in the data source.
- Enter the full URL of each website that you want to crawl.
Tip
You can enter multiple URLs on separate lines.
Domain Name
For secure websites, the domain of the user name that can access the website.
- Enter the domain name of the user name that is used to access the secure website.
Username
For secure websites, the user name that can access the website.
- Enter the user name that is used to access the secure website.
Password
For secure websites, the password for the user name that can access the website.
- Enter the password for the user name that is used to access the secure website.
- Click Next.
- In the Advanced Options section, enter the following information:
Field Name
Description
Steps
Include Paths
Use this option to include one or more specific web pages when crawling the website.
If you enter URLs in the Include Paths field, then no other URLs are crawled.
- Optional: Enter the URL of the web page that you want to include in the data source.
Tip
You can enter multiple URLs on separate lines.
Exclude Paths
Use this option to exclude one or more specific web pages when crawling the website.
If you enter URLs in the Exclude Paths field, then the following occurs:
- The excluded web pages are never crawled, even if you enter them in the Include Paths field.
- Links that appear within the excluded web pages are also not crawled, unless those links also appear on a different page that is included in the data source.
- Optional: Enter the URL of the web page that you want to exclude from the data source.
Tip
You can enter multiple URLs on separate lines.
Crawl Depth
The number of successive links to follow from the first URL when crawling pages for the data source. To crawl more pages, increase the crawl depth.
- Select the crawl depth value that you want to configure.
- Click Submit.
The Configuration page for the data source appears. The page title includes the data source name.
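Example
The rules described in the procedure (the supported characters for a data source name, Include Paths, Exclude Paths, and Crawl Depth) can be illustrated with a small Python sketch. This is not Data Cube code and does not call any Data Cube API. The function names (is_valid_name, is_allowed, plan_crawl) and the example.com link map are hypothetical. The sketch also assumes that paths match by URL prefix, that "alphanumeric" means ASCII letters and digits, and that a crawl depth of 0 covers only the starting URLs. The actual behavior of the product might differ on each of these points.

```python
import re
from collections import deque

# Assumption: "alphanumeric" means ASCII letters and digits.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_name(name):
    """Return True if name uses only letters, digits, dashes, and underscores."""
    return _NAME_RE.fullmatch(name) is not None


def is_allowed(url, include_paths, exclude_paths):
    """Apply the Include Paths and Exclude Paths rules to a single URL.

    Assumption: a path matches a URL when the URL starts with that path.
    """
    # Excluded pages are never crawled, even if they are also included.
    if any(url.startswith(path) for path in exclude_paths):
        return False
    # When include paths are given, no other URLs are crawled.
    if include_paths:
        return any(url.startswith(path) for path in include_paths)
    return True


def plan_crawl(start_urls, links, include_paths=(), exclude_paths=(), crawl_depth=1):
    """Return the URLs that would be crawled, in breadth-first order.

    links is a hypothetical map from a page URL to the URLs it links to.
    Assumption: depth 0 is the start URL and each followed link adds 1.
    """
    seen = set()
    order = []
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, depth = queue.popleft()
        if url in seen or not is_allowed(url, include_paths, exclude_paths):
            # Skipped pages are not read, so their links are not followed.
            continue
        seen.add(url)
        order.append(url)
        if depth < crawl_depth:
            for target in links.get(url, []):
                queue.append((target, depth + 1))
    return order


# Hypothetical site used only for illustration.
SITE = {
    "https://example.com/": ["https://example.com/docs", "https://example.com/private"],
    "https://example.com/docs": ["https://example.com/docs/a", "https://example.com/shared"],
    "https://example.com/private": ["https://example.com/shared", "https://example.com/secret-only"],
    "https://example.com/docs/a": ["https://example.com/deep"],
}

crawled = plan_crawl(
    ["https://example.com/"],
    SITE,
    exclude_paths=["https://example.com/private"],
    crawl_depth=2,
)
print(crawled)
```

In this sketch, the page at /shared is still crawled because /docs, which is not excluded, links to it. The page at /secret-only is skipped because only the excluded page links to it. The page at /deep is skipped because it lies three links from the first URL, which is beyond the crawl depth of 2.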