Handling "Unknown" Dimension atrributes that are Type 2 changes
2 posters
Page 1 of 1
Handling "Unknown" Dimension atrributes that are Type 2 changes
In the data warehouse we're building, we've got an Employee dimension that has a self referencing attribute called ReportsTo which stores the surrogate key of their line manager who is also in the DW (in most cases). This attribute is a type 2 change as we want to capture historical heirachical changes in the company, however some people aren't always assigned a valid manager (its a really crappy system that i wont bore you with) but suffice to say they can store invalid managers rather than not speicfying one (such as someone at the top of the tree)
Obvioulsy as we process thir dimension record their ReportsTo would come up as an "unknown"/-1 and so assuming we put this into the dimension table any facts that occur involving this employee would be stored against this employees surrogate key. Now assuming we get them to fix the ReportsTo in the source system this would come through as type 2 change and so a new dimensional record would be created and any new events with this employee would be against that. Now the problem as i see it would lie in the cube as events would be rolled up against an "Unknown" manager.
Would it be better not to load these dimensions until they have fixed the "unknown" reports to as other wise it would give inaccurate historical reporting. ie treat it almost like a fact record as in "its not going in the DW unless the values are correct and valid".
hopefully that kind of makes sense
cheers for any thoughts
Obvioulsy as we process thir dimension record their ReportsTo would come up as an "unknown"/-1 and so assuming we put this into the dimension table any facts that occur involving this employee would be stored against this employees surrogate key. Now assuming we get them to fix the ReportsTo in the source system this would come through as type 2 change and so a new dimensional record would be created and any new events with this employee would be against that. Now the problem as i see it would lie in the cube as events would be rolled up against an "Unknown" manager.
Would it be better not to load these dimensions until they have fixed the "unknown" reports to as other wise it would give inaccurate historical reporting. ie treat it almost like a fact record as in "its not going in the DW unless the values are correct and valid".
hopefully that kind of makes sense
cheers for any thoughts
meb97me- Posts : 34
Join date : 2010-07-28
Re: Handling "Unknown" Dimension atrributes that are Type 2 changes
Even though it is a type 2, it doesn't mean you MUST create a new row when something changes. The situation you describe is one where it makes sense to simply update in place.
What is not clear is what happens if other attributes, of which you are also tracking history, change between the time the employee is first loaded and the management hierarchy is cleaned up. You will need to work out how you want to update the dimension in such instances. The ultimate goal is to try to capture as accurate and useful history as possible and your update process should do whatever it needs to do to achieve that.
What is not clear is what happens if other attributes, of which you are also tracking history, change between the time the employee is first loaded and the management hierarchy is cleaned up. You will need to work out how you want to update the dimension in such instances. The ultimate goal is to try to capture as accurate and useful history as possible and your update process should do whatever it needs to do to achieve that.
Re: Handling "Unknown" Dimension atrributes that are Type 2 changes
thanks again for a great response ngalemmo we'll adapt our ETL process as necessary
meb97me- Posts : 34
Join date : 2010-07-28

» Date dimension, "unknown" entry
» rationale behind dimension with Type 0 and missing Type 5
» Type 2 dimension or type 2 column?
» Joining Type II Dimension Hierarchy directly to Fact versus through a Dimension table
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
» rationale behind dimension with Type 0 and missing Type 5
» Type 2 dimension or type 2 column?
» Joining Type II Dimension Hierarchy directly to Fact versus through a Dimension table
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|