Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Incremental Loading of data

3 posters

Go down

Incremental Loading of data  Empty Incremental Loading of data

Post  dbadwh Fri Jan 20, 2012 1:01 pm

Hi,
In our GL_account Dimension table, the data from source data base comes from 6 tables. We need to load the data incrementally into data mart using SSIS. Means, we want to load only the changed records using the SQL Server CDC feature. We have the server enabled with CDC. There is a account id which links the 4 tables. But the two table needs to be connected using the client id. These tables undergoes frequent changes in the source and needs to be updated into the data mart. Now the challenge here is, we need to load these two table which does not contain the account id. As per the incremental load using cdc, only these two tables needs to be loaded and not the entire set of tables and I don't want to do a full load.. Can any one tell me how to overcome this situation? Is there any strategy available for this?

dbadwh

Posts : 31
Join date : 2011-09-30

Back to top Go down

Incremental Loading of data  Empty Re: Incremental Loading of data

Post  PeteGrace Sun Jan 22, 2012 5:49 pm

A strategy that's worked for me is to load each of the required source tables incrementally, but into a staging DB / layer where I can query the full current picture of each table. We need to check for changes in any of the tables that will affect the row in the dimension, so our first stage in processing the dimension would be something like this:

SELECT {required columns}
FROM Table1
LEFT JOIN Table2 ON Table1.account_id = Table2.account_id
LEFT JOIN Table3 ON Table1.account_id = Table3.account_id
LEFT JOIN Table4 ON Table1.account_id = Table4.account_id
LEFT JOIN Table5 ON Table1.client_id = Table5.client_id
LEFT JOIN Table6 ON Table1.client_id = Table6.client_id
-- Change detection
WHERE Table1.changed_date = @yesterday_date
OR Table2.changed_date = @yesterday_date
OR Table3.changed_date = @yesterday_date
OR Table4.changed_date = @yesterday_date
OR Table5.changed_date = @yesterday_date
OR Table6.changed_date = @yesterday_date

The tables we're querying would be replicas of the source tables that we'd bulk load initially and then update via the CDC, setting a changed_date so that downstream processing can work with the replicas incrementally as above. The query above won't be the most efficient ever as the OR's in the WHERE clause mean it can't use an index on changed_date in any of the tables, but it's a significant improvement on bulk loading, bearing in mind that there might be a few transformations we need to do in between selecting the data and updating the dimension.

PeteGrace

Posts : 7
Join date : 2011-09-01

Back to top Go down

Incremental Loading of data  Empty Re: Incremental Loading of data

Post  dbadwh Mon Jan 23, 2012 5:57 am

Will it not create heavy impact on performance?

dbadwh

Posts : 31
Join date : 2011-09-30

Back to top Go down

Incremental Loading of data  Empty Re: Incremental Loading of data

Post  PeteGrace Mon Jan 23, 2012 7:21 am

Depends how big your tables are, but as I say, compared with the alternative of doing a bulk load it should represent a significant improvement.

If you're worried that your warehouse won't be able to cope with that sort of join then the only other option I can think of would be to update the dimension rows in multiple passes.

e.g.
UPDATE dimension
SET dimension.columnA = Table1.columnA etc.
FROM dimension
INNER JOIN Table1 ON account_id
WHERE Table1.changed_date = @yesterday_date

...

UPDATE dimension
SET dimension.columnB = Table5.columnB etc.
FROM dimension
INNER JOIN Table5 ON client_id
WHERE Table5.changed_date = @yesterday_date

and so on...

PeteGrace

Posts : 7
Join date : 2011-09-01

Back to top Go down

Incremental Loading of data  Empty Re: Incremental Loading of data

Post  Mike Honey Mon Jan 30, 2012 10:52 pm

Hi dbadwh,

I liked PeteGrace's first suggestion personally. I think you might be able to invoke the date indexes by moving the WHERE clause statements inside each LEFT JOIN, e.g.

LEFT JOIN Table2 ON Table1.account_id = Table2.account_id
AND Table2.changed_date = @yesterday_date

Good luck!
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

Incremental Loading of data  Empty Re: Incremental Loading of data

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum