Incremental Crawling of Database Sources in Data Cube

When connecting to a database with Data Cube, you can select Incremental Crawling to perform incremental crawling of the data source. When you select Incremental Crawling, only the values in the database that changed after the previous crawl are updated in the data source.

If you select Incremental Crawling, then you must provide the following additional information.

Option

Description

Incremental Crawling

Select this option to crawl only updated values in the database.

If you select this option, the first crawl of the database uses the query in SQL Query. Subsequent crawls use the queries in Delta SQL Query and Delta Import SQL Query.

If you don't select this option, then all crawls use the query in SQL Query.

Primary Key

The column that is the primary key in the database.

If you select Incremental Crawling, then you must enter this value.

Delta SQL Query

A SQL query that defines the criteria for determining the fields that were updated after the previous crawl.

If you select Incremental Crawling, then you must enter this value.

Delta Import SQL Query

A SQL query that selects the data to import from the database for the rows that match the criteria in Delta SQL Query.

If you select Incremental Crawling, then you must enter this value.

Example Queries

The following examples are queries for incremental crawling of database sources in Data Cube.

For information about connecting to a database, see Connecting to a Database with Data Cube.

Example 1: Crawling Order Details That Changed After the Previous Crawl

Note: The following example uses a data field named last_index_time, which is the time when the previous crawl completed. This field has a strongly typed datetime data type in the local time zone of the Index Server, formatted as YYYY-MM-DD HH:MM:SS (for example, 2016-09-07 08:11:42).

The following example contains configuration information to update the data for orders that changed after the previous crawl of a database table named ORDERITEM.

Field

Query

SQL Query

SELECT OrderItemID, ProductID, OrderRevievedTime, Quantity, Price, ShippingPrice
FROM ORDERITEM

Primary Key

OrderItemID

Delta SQL Query

SELECT OrderItemID FROM ORDERITEM WHERE CONVERT(VARCHAR, CAST(OrderRevievedTime AS datetime), 20) > '${dih.last_index_time}'

Delta Import Query

SELECT OrderItemID, ProductID, OrderRevievedTime, Quantity, Price, ShippingPrice FROM ORDERITEM WHERE OrderItemID = '${dih.delta.OrderItemID}'

Example 2: Crawling Order Details from the Previous 24 Hours

The following example contains configuration information to update the data for orders that occurred in the previous 24 hours for a database table named ORDERITEM.

Field

Query

SQL Query

SELECT OrderItemID, ProductID, OrderRevievedTime, Quantity, Price, ShippingPrice
FROM ORDERITEM

Primary Key

OrderItemID

Delta SQL Query

SELECT OrderItemID
FROM ORDERITEM
WHERE (OrderRevievedTime > (DATEADD (DAY, -1, GETDATE ())))

Delta Import Query

SELECT OrderItemID, ProductID, OrderRevievedTime, Quantity, Price, ShippingPrice
FROM ORDERITEM
WHERE OrderItemID = '${dih.delta.OrderItemID}'

Loading...