Organizational hierarchy with employees and SCD
4 posters
Page 1 of 1
Organizational hierarchy with employees and SCD
Hi,
I have a question on modelling SCDs with my tblDimPosition dimension as depicted in my star schema here:
http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/modelling-free-text-comments-t784.htm
My tblDimPosition basically contains employee data along with each employee's organizational hierarchy information as well (Business Unit, SegmentName, RegionName and BranchName).
My organizational hierarchy information comes from a source data which lists the info as:
BusinessUnit Code, BusinessUnit Name, SegmentCode, SegmentName, RegionCode, RegionName, BranchCode(BSB), CostCenter, BranchName
My employee information comes from a separate source data and contains:
EmployeeID, FirstName, LastName, BSB, CostCenter.
To populate the tblDimPosition dimension, I do a join from the employee source data with the organizational source data via the BSB and CostCenter. This dimension gets populated on a monthly basis.
Now, suppose that in a subsequent month, the name of the RegionName has changed. So the organizational hierarchy source data for that month will have the new RegionName. The RegionCode stays the same because the business has simply decided to call the Region something else. It has not changed the RegionCode. I have been told that when this happens, I should simply overwrite the old RegionName with the new RegionName. The business doesn't want to keep historical information of old names and simply want to report the current name. So this means that I should not expire the existing records in the tblDimPosition dimension, rather I should just overwrite each record that has the old RegionName with the new RegionName. This implies a SCD Type 1 change.
My question is:
A) How can I elegantly go about implementing this logic in the ETL? I go through my source data row by row so whenever I encounter a new RegionName with the same RegionCode in the tblDimPosition dimension, do I just do a mass update of tblDimPosition (ie: update set RegionName= 'NewRegionName' where RegionCode = 'XXXX')? I feel this may not be an elegant solution.
B) I still need to handle situations where say a Region is now considered to be under a different Segment. In this case I would need to treat it as a SCD Type 2 change - so I would need to expire the existing record in tblDimPosition and insert a new record with the new Segment code and name. Is mixing a SCD Type 1 together with a SCD Type 2 change on a single dimension a valid way of doing things?
I hope I have explained my problem clearly enough. Please let me know if I haven't, I will attempt to explain it better.
Cheers.
I have a question on modelling SCDs with my tblDimPosition dimension as depicted in my star schema here:
http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/modelling-free-text-comments-t784.htm
My tblDimPosition basically contains employee data along with each employee's organizational hierarchy information as well (Business Unit, SegmentName, RegionName and BranchName).
My organizational hierarchy information comes from a source data which lists the info as:
BusinessUnit Code, BusinessUnit Name, SegmentCode, SegmentName, RegionCode, RegionName, BranchCode(BSB), CostCenter, BranchName
My employee information comes from a separate source data and contains:
EmployeeID, FirstName, LastName, BSB, CostCenter.
To populate the tblDimPosition dimension, I do a join from the employee source data with the organizational source data via the BSB and CostCenter. This dimension gets populated on a monthly basis.
Now, suppose that in a subsequent month, the name of the RegionName has changed. So the organizational hierarchy source data for that month will have the new RegionName. The RegionCode stays the same because the business has simply decided to call the Region something else. It has not changed the RegionCode. I have been told that when this happens, I should simply overwrite the old RegionName with the new RegionName. The business doesn't want to keep historical information of old names and simply want to report the current name. So this means that I should not expire the existing records in the tblDimPosition dimension, rather I should just overwrite each record that has the old RegionName with the new RegionName. This implies a SCD Type 1 change.
My question is:
A) How can I elegantly go about implementing this logic in the ETL? I go through my source data row by row so whenever I encounter a new RegionName with the same RegionCode in the tblDimPosition dimension, do I just do a mass update of tblDimPosition (ie: update set RegionName= 'NewRegionName' where RegionCode = 'XXXX')? I feel this may not be an elegant solution.
B) I still need to handle situations where say a Region is now considered to be under a different Segment. In this case I would need to treat it as a SCD Type 2 change - so I would need to expire the existing record in tblDimPosition and insert a new record with the new Segment code and name. Is mixing a SCD Type 1 together with a SCD Type 2 change on a single dimension a valid way of doing things?
I hope I have explained my problem clearly enough. Please let me know if I haven't, I will attempt to explain it better.
Cheers.
memphis- Posts : 19
Join date : 2010-10-21
Re: Organizational hierarchy with employees and SCD
Does anyone have any suggestions?
Thanks.
Thanks.
memphis- Posts : 19
Join date : 2010-10-21
Re: Organizational hierarchy with employees and SCD
You're not mixing. It's a type 2. It's just that you are not tracking history for all columns. Some information you just don't care about. Most folks just capture history on all columns since it is easier than doing the analysis to track history on specific columns. What's the concern with updating? Just treat the name change as a type 1 change.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Organizational hierarchy with employees and SCD
Thanks BoxesAndLines for replying.
I've had another think about my issue, and it basically just boils down to what to do when a description changes vs when a code changes.
So with updating, I'm planning to use the MERGE statement in sql, so basically I have:
When matched, based on the codes, then update that row without expiring the record. (Because the codes are the same, I don't care about the description, so update anyway, even if the description is the same as before)
When matched, but codes are different then expire the record and insert a new row. (This means the rollup in the hierarchy has changed so will need to reflect and keep track of this change)
When not matched, insert a new record.
Does this logic sound ok?
I've had another think about my issue, and it basically just boils down to what to do when a description changes vs when a code changes.
So with updating, I'm planning to use the MERGE statement in sql, so basically I have:
When matched, based on the codes, then update that row without expiring the record. (Because the codes are the same, I don't care about the description, so update anyway, even if the description is the same as before)
When matched, but codes are different then expire the record and insert a new row. (This means the rollup in the hierarchy has changed so will need to reflect and keep track of this change)
When not matched, insert a new record.
Does this logic sound ok?
memphis- Posts : 19
Join date : 2010-10-21
Re: Organizational hierarchy with employees and SCD
I only update if I need to update.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Organizational hierarchy with employees and SCD
Thanks for posting your question and your solution - it is the same as something I am dealing with. I think your solution is fine, but it does make your "region description" column neither a Type 1 or a Type 2. The reason it is not a typical type 1 is that you are going back into this history to update the Region names, even if the records are not current. The column is not type 2 either because you only do an insert if your "region code" column changes, not the description.
Here is another possible solution: put your region descriptions and codes into a separate dimension. If you have a "Region" dimension table that has just the code for the region and the description, then you can make the description a type 1 column and when the region gets renamed you update your region table and even the old facts point to the correct description.
You may be wondering what happens when a customer changes region. Do I have to link my customer dim to the region dim? Isn't that snowflaking? No you don't have to link the customer dim to the region dim - just link them both to the fact. Then you can pull all the metrics for customers in a certain region.
I would greatly appreciate feedback from the Kimball community on this topic as it seems it would come up quite often. The question is around when to place related attributes in the same dimension and when to split them.
Another example is the Title of an employee. If I put that in the employee dimension and call it type 2, things work well until someone's title gets renamed. In that situation the employee has not really changed position, but their stats will be split up over two position titles.
Here is another possible solution: put your region descriptions and codes into a separate dimension. If you have a "Region" dimension table that has just the code for the region and the description, then you can make the description a type 1 column and when the region gets renamed you update your region table and even the old facts point to the correct description.
You may be wondering what happens when a customer changes region. Do I have to link my customer dim to the region dim? Isn't that snowflaking? No you don't have to link the customer dim to the region dim - just link them both to the fact. Then you can pull all the metrics for customers in a certain region.
I would greatly appreciate feedback from the Kimball community on this topic as it seems it would come up quite often. The question is around when to place related attributes in the same dimension and when to split them.
Another example is the Title of an employee. If I put that in the employee dimension and call it type 2, things work well until someone's title gets renamed. In that situation the employee has not really changed position, but their stats will be split up over two position titles.
Re: Organizational hierarchy with employees and SCD
Technically it should be type 1 by default, while only updating current record may be simpler but rarely good enough.ryan.shirley wrote:The reason it is not a typical type 1 is that you are going back into this history to update the Region names, even if the records are not current
I can see the separation in the following scenarios:
1. Multi-valued attributes.
2. Low cardinality and highly repeated attributes in a Monster dimension.
3. Attributes in a very wide dimension (>100) that somehow can be grouped into sub-dimensions
4. Attributes from different sources that are loaded in drasticly different ETL process.
5. Fast changing attributes that could be FK'd directly in fact or factless tables.
In my understanding, it comes down to the difference between name change and relationship change, while the former could be type 1 and the latter is likely a type 2 SCD. If the Org structure is purely hierarchical in relation to employee (1-m) and the dimension is not terribly big, combining them into one dimension simplifies the hierarchy by just rolling up on attributes. In a single dimension, the relationship (hierarchical) change becomes SCD 2 attributes change.
However if the relationship between Org and employee is more complex than just hierarchical (eg. m-m), then it fits into scenario 1, so you might have to separate them out anyway. If keeping two sets of SCD2 changes in synch between Org and employee is too much involving, you can simply maintain the current relationship and let FKs in fact table reflect the historical correlations.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Enormous data size
» Help in design about employees cost
» Supplier Dimension to include Customers & Employees
» Design to calculate CHURN (movement of employees between business units)
» One Hierarchy in two dimensions
» Help in design about employees cost
» Supplier Dimension to include Customers & Employees
» Design to calculate CHURN (movement of employees between business units)
» One Hierarchy in two dimensions
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum