Precision for the ETL Load/Audit related Timestamp field.
Hi all,
For every target table, we always append ETL load-related Insert/Update/Delete timestamps, both for auditing and for time-based CDC used by downstream extractions.
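For illustration, a minimal sketch of what those audit columns might look like (table and column names here are made up, not a prescribed standard):

    -- Sketch only: audit columns appended to a target table for load
    -- auditing and time-based CDC (names and types are illustrative).
    CREATE TABLE sales_fact (
        sale_key        INTEGER        NOT NULL,
        sale_amount     DECIMAL(12,2),
        etl_insert_ts   TIMESTAMP      NOT NULL,  -- when the row was first loaded
        etl_update_ts   TIMESTAMP,                -- when the row was last updated
        etl_delete_ts   TIMESTAMP                 -- when the row was soft-deleted
    );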
Questions:
1. Is it best practice to use a timestamp datatype that supports sub-second precision? And why?
2. What are the cons of a timestamp datatype that only goes down to whole seconds? Granted, we lose sub-second precision, but for practical purposes, is this really a problem?
It seems like the key here is consistency. As long as the timestamp datatype is the same at every stage of the ETL stream (stage, dim, fact, etc.), we should be fine.
Comments?
Thanks!
Re: Precision for the ETL Load/Audit related Timestamp field.
Considering these are batch processes, I don't think it is really anything to be concerned about; whole seconds should be fine. The only caveat is that when pulling from the source system, you should use the clock on that system to log the time you initiated the extract, whenever possible.
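For example (a sketch; exact syntax varies by DBMS), the benchmark timestamp would be captured over the connection to the source system, so it reflects the source's clock rather than the ETL server's:

    -- Run against the SOURCE connection at the start of the extract, so
    -- the benchmark reflects system A's clock, not the ETL server's.
    -- (On Oracle you would add FROM DUAL.)
    SELECT CURRENT_TIMESTAMP AS extract_benchmark_ts;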
Re: Precision for the ETL Load/Audit related Timestamp field.
Thanks!
So are there instances where sub-second precision would be appropriate in an ODS or data warehouse environment? Maybe a real-time ODS or data warehouse? Or is it more suitable for OLTP systems?
Re: Precision for the ETL Load/Audit related Timestamp field.
Fractional seconds don't matter in any situation, nor does a precision difference between systems.
If you are pulling data from system A, you must use the clock on system A as the point of reference when comparing timestamps to determine whether something is 'new'. If you do not use system A's clock, you have problems well beyond anything fractional-second precision would cause.
You can even reduce the precision of your benchmark timestamp and it still doesn't change anything. Say you call system A to get the time and it returns milliseconds; round that to the second, or even the minute, and use it to drive the extract. The extract SQL pulls data that arrived between the time of the last pull and the current time, so no matter how often you run it, each extract will always pull only the new data, provided there are no long-running active transactions on the source system.
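A sketch of what such an extract might look like (table and column names are hypothetical); note the half-open window, which is what makes the boundary precision irrelevant:

    -- :last_extract_ts    = benchmark saved from the previous run
    -- :current_extract_ts = benchmark taken now, from the source clock
    -- Half-open window: each row falls in exactly one extract, whether
    -- the boundaries are in milliseconds, seconds, or minutes.
    SELECT *
    FROM   source_table
    WHERE  row_update_ts >= :last_extract_ts
    AND    row_update_ts <  :current_extract_ts;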
In fact, it is common to back the current time up to avoid issues with active transactions. It is not unusual to get the current time, subtract a few minutes (or whatever is appropriate), and use that as the upper limit.
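Continuing the sketch above, that just means computing the upper bound with a safety margin (the five-minute figure is only an example, and interval syntax varies by DBMS):

    -- Back the upper bound up a few minutes so rows stamped by still-open,
    -- long-running transactions are picked up on the next run rather
    -- than being missed.
    SELECT CURRENT_TIMESTAMP - INTERVAL '5' MINUTE AS extract_benchmark_ts;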