Extracting data when there is no timestamp in the source system
+2
beyeguru
DWH
6 posters
Page 1 of 1
Extracting data when there is no timestamp in the source system
Guys, what are your strategies to extract data when the source system does not have any timestamps. Is there any way to still get deltas ?
Thanks
Thanks
DWH- Posts : 2
Join date : 2010-02-17
Re: Extracting data when there is no timestamp in the source system
Your question is too general. Are you trying to extract transaction or dimension records?
beyeguru- Posts : 5
Join date : 2009-08-03
Re: Extracting data when there is no timestamp in the source system
You are right beyeguru, it is very general. But I wanted to see how people have solved this problem. It can be for transactions or dimensions. We can always use CRC to do this but than we would have to load all the data in the staging everyday and then compare CRC. Has anyone used other ways ?
Thanks
Thanks
DWH- Posts : 2
Join date : 2010-02-17
Re: Extracting data when there is no timestamp in the source system
I use a checksum if I cannot get a delta feed.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Extracting data when there is no timestamp in the source system
So long as the table has a primary key you can left join the destination table to get only the new records.
Example:
Select s.Attribute1, s.Attribute2, s.Attribute3......
From SourceTable s
Left JOIN TargetTable t
ON s.PrimaryKey = t.PrimaryKey
Where t.PrimaryKey IS NULL
this will retrieve the records not already in the target.
Example:
Select s.Attribute1, s.Attribute2, s.Attribute3......
From SourceTable s
Left JOIN TargetTable t
ON s.PrimaryKey = t.PrimaryKey
Where t.PrimaryKey IS NULL
this will retrieve the records not already in the target.
Re: Extracting data when there is no timestamp in the source system
I don't know if other database systems have something similar, but for Oracle, I used the ora_rowscn psuedo column because I didnt trust the application controlled modify time field. Its a psuedo column that stores the system change number that gets generated for each transaction. Unfortunantly it applies the SCN to all the rows in the block unless you use rowdependencies. There is an article at OraFaq about it. I keep track of the last ora_rowscn transaction number in my warehouse, and then pull anything greater than that. That will pull all the modifications as well as any other rows that were in that database block. From that I find out the delta to only get the actual modified columns. Oracle provides a better way with Change Data Capture, but due to reasons outside my control, I'm unable to implement.
mugen_kanosei- Posts : 13
Join date : 2009-02-03
Age : 43
Location : Japan
Re: Extracting data when there is no timestamp in the source system
Interesting... I wasn't aware of ora_rowscn. It is certainly useful when there is resistance to modify the OLTP system to support DW extracts... which is usually the case 99% of the time.
Unfortunately, you probably can't index it... so you wind up doing table scans. But, for lack of anything better, it will significantly reduce the number of rows you need to evaluate in the ETL process.
Unfortunately, you probably can't index it... so you wind up doing table scans. But, for lack of anything better, it will significantly reduce the number of rows you need to evaluate in the ETL process.
Similar topics
» Extracting - separate source tables vs SQL query
» Extracting Master/Detail data into Staging Tables
» Poorly sturctured data at source system
» Business Logic: DWH vs. Source system
» Data Architecture for Single Source System (Normalised).
» Extracting Master/Detail data into Staging Tables
» Poorly sturctured data at source system
» Business Logic: DWH vs. Source system
» Data Architecture for Single Source System (Normalised).
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum