Full history staging tables
3 posters
Page 1 of 1
Full history staging tables
In my ETL process, I typically move data from my source system(s), to a staging table of some sort and then from the staging tables into my dimensional model. My routines run nightly and because of that, my highest level of resolution on data is 1 day. I detect daily changes only.
The source systems tend to not have very good history tracking / date tracking of modifications, so I've relied on simply bringing source-system data in and comparing it to my staged data to see if there are any updates I need to bring into the staging DB. FYI - I'm referring to data such as "Vehicle", "Employee", etc. Things that would typically end up being modeled as an actual dimension.
In the past I've been all over the board with what would constitute an update (or even *how* to persist the updated data, i.e. overwrite, new record, etc) because it's been case by case depending upon the table in question, the fields in question, etc.
I recently had a thought: Why don't I just bring in every field for a given table, and if there is a change to any of them, stamp a new row in my staging DB with row_effective_date and row_expiration_date fields. This will allow me to have a full history of every record which creates limitless possibilities with my data warehouse and modeling new data from the staging tables.
This feels like it's using a sledgehammer to solve a problem, but really, it's the most straight forward to implement, I can treat all tables the same vs. having vastly different rules per each staging table and I can have an enormous swath of options when it comes to actually creating a dimensional model from the staged data.
Thoughts? Has anyone done this before?
The source systems tend to not have very good history tracking / date tracking of modifications, so I've relied on simply bringing source-system data in and comparing it to my staged data to see if there are any updates I need to bring into the staging DB. FYI - I'm referring to data such as "Vehicle", "Employee", etc. Things that would typically end up being modeled as an actual dimension.
In the past I've been all over the board with what would constitute an update (or even *how* to persist the updated data, i.e. overwrite, new record, etc) because it's been case by case depending upon the table in question, the fields in question, etc.
I recently had a thought: Why don't I just bring in every field for a given table, and if there is a change to any of them, stamp a new row in my staging DB with row_effective_date and row_expiration_date fields. This will allow me to have a full history of every record which creates limitless possibilities with my data warehouse and modeling new data from the staging tables.
This feels like it's using a sledgehammer to solve a problem, but really, it's the most straight forward to implement, I can treat all tables the same vs. having vastly different rules per each staging table and I can have an enormous swath of options when it comes to actually creating a dimensional model from the staged data.
Thoughts? Has anyone done this before?
ryno1234- Posts : 33
Join date : 2015-01-07
Re: Full history staging tables
I think maintaining a full history in your staging tables is a sound approach.
I am currently involved in a DW build where the following is used:
- Tables with a (reliable) Create/UpdateTime field in the source system have a new record written to the staging table when the UpdateTime is above the high-water mark. No checks are made to see if there are any changes to the underlying data.
- Other tables are extracted in full each day and compared to the staging table using a MD5 hash. If there are any changes, a new record is written.
I am currently involved in a DW build where the following is used:
- Tables with a (reliable) Create/UpdateTime field in the source system have a new record written to the staging table when the UpdateTime is above the high-water mark. No checks are made to see if there are any changes to the underlying data.
- Other tables are extracted in full each day and compared to the staging table using a MD5 hash. If there are any changes, a new record is written.
seemiyah- Posts : 2
Join date : 2015-08-18
Re: Full history staging tables
I guess I don't understand how the 'staging area' would be any different than the data warehouse itself. After all, isn't the goal to retain all data and a historical record? If one has the capacity and the business need, why not just make all the dimensions Type 2?
The reality is a business simply does not need a full historical record of all data. A lot of data has no historical business value. That's why we make decisions in the design.
But, if you go this route, the work to maintain a Type 2 dimension is pretty much the same as maintaing the staging area. So why create another repository?
The reality is a business simply does not need a full historical record of all data. A lot of data has no historical business value. That's why we make decisions in the design.
But, if you go this route, the work to maintain a Type 2 dimension is pretty much the same as maintaing the staging area. So why create another repository?
Similar topics
» Where do you put your staging Tables? DWH DB or as a separate DB
» FK btw tables at staging level
» Design of Staging Tables
» Staging Area, copy tables
» Handling History tables in Dimensional Model
» FK btw tables at staging level
» Design of Staging Tables
» Staging Area, copy tables
» Handling History tables in Dimensional Model
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum