Architecture questions : implement historical data view without SCD?
3 posters
Page 1 of 1
Architecture questions : implement historical data view without SCD?
Hello !
I need some fresh points of view of an architecture I’ve just encountered.
This solution was designed by a consulting firm specialized in the HR IT field.
The DWH in this case is supposed to handle 100’000 employees for an organization of 10 layers and 3’000 entities. It is needed to address the reporting for every usual HR process (count of employees, remuneration, formation, leaves…), with drillthrough and historical view of the data.
To implement the historical view of the data, they haven’t used SCDs. For an historical view on an attribute of a dimension, they have created an additional dimension table containing the historical attribute and added a foreign key to this dimension in the fact table.
The actual view can be seen through the original dimension, the historical view through the additional dimension. For example:
* Dimension Employee (PK_E ; Employee Code ; Employee Name ; Employee Address)
* Dimension Employee Historical Address (PK_EHA ; Employee Historical Address)
* Fact Table (FK_E ; FK_EHA ; Data Date, Salary)
A variation of this method was used to address the need when it’s not attributes but layers in a hierarchy that must be tracked:
* Dimension Organization Level 1 (PK_O1 ; Code Level 1, …)
* Dimension Organization Level 2 (PK_O2 ; Code Level 2, …)
* Dimension Organization Level 3 (PK_O3 ; Code Level 3, …)
* Fact Table (FK_O1 ; FK_O2 ; FK_O3 ; Data Date ; Salary)
* Historical Fact Table (FK_O1 ; FK_O2 ; FK_O3 ; Organization Date ; Salary)
The link between the two fact tables is the FK OrgLevel3 since it’s the final granularity.
To see the actual data you have to request the Fact Table and join it to the dimensions.
To see historical data you have to request the Fact Table at the data date you want, link it to the historical fact table with the FK on Level3, filter the organization date you want and then join the dimensions.
Using this method, requesting the DWH was slow, so they created one aggregate table for each layer of each organization hierarchy (7+9) and each functional subject (5).
Now the DWH have over 100 dimension tables, 140 aggregate tables, 140 stored procedures to feed them and we are only at the part 2 of 7 of the project.
Why not use Analysis Services since the solution is full Microsoft 2005, I don't know.
And I wonder about the DWH growth in term of objects in base, but also in term of lines in the tables. The historical fact tables have to store 100% of the lines that are presented each month.
With the data profile here, a SCD would only require 5 updates et 5 inserts every 100 lines.
Am I missing something? Does this implementation have some crucial advantage that I overlooked?
Thx for your answers!
I need some fresh points of view of an architecture I’ve just encountered.
This solution was designed by a consulting firm specialized in the HR IT field.
The DWH in this case is supposed to handle 100’000 employees for an organization of 10 layers and 3’000 entities. It is needed to address the reporting for every usual HR process (count of employees, remuneration, formation, leaves…), with drillthrough and historical view of the data.
To implement the historical view of the data, they haven’t used SCDs. For an historical view on an attribute of a dimension, they have created an additional dimension table containing the historical attribute and added a foreign key to this dimension in the fact table.
The actual view can be seen through the original dimension, the historical view through the additional dimension. For example:
* Dimension Employee (PK_E ; Employee Code ; Employee Name ; Employee Address)
* Dimension Employee Historical Address (PK_EHA ; Employee Historical Address)
* Fact Table (FK_E ; FK_EHA ; Data Date, Salary)
A variation of this method was used to address the need when it’s not attributes but layers in a hierarchy that must be tracked:
* Dimension Organization Level 1 (PK_O1 ; Code Level 1, …)
* Dimension Organization Level 2 (PK_O2 ; Code Level 2, …)
* Dimension Organization Level 3 (PK_O3 ; Code Level 3, …)
* Fact Table (FK_O1 ; FK_O2 ; FK_O3 ; Data Date ; Salary)
* Historical Fact Table (FK_O1 ; FK_O2 ; FK_O3 ; Organization Date ; Salary)
The link between the two fact tables is the FK OrgLevel3 since it’s the final granularity.
To see the actual data you have to request the Fact Table and join it to the dimensions.
To see historical data you have to request the Fact Table at the data date you want, link it to the historical fact table with the FK on Level3, filter the organization date you want and then join the dimensions.
Using this method, requesting the DWH was slow, so they created one aggregate table for each layer of each organization hierarchy (7+9) and each functional subject (5).
Now the DWH have over 100 dimension tables, 140 aggregate tables, 140 stored procedures to feed them and we are only at the part 2 of 7 of the project.
Why not use Analysis Services since the solution is full Microsoft 2005, I don't know.
And I wonder about the DWH growth in term of objects in base, but also in term of lines in the tables. The historical fact tables have to store 100% of the lines that are presented each month.
With the data profile here, a SCD would only require 5 updates et 5 inserts every 100 lines.
Am I missing something? Does this implementation have some crucial advantage that I overlooked?
Thx for your answers!
Viim- Posts : 2
Join date : 2009-07-29
Re: Architecture questions : implement historical data view without SCD?
What they are doing with employee makes sense. Junking attributes (that is, creating junk dimensions) that you need to keep historical perspective on (such as address) simplifies a lot of things. But I don't understand where they are going with the hierarchy.
Certainly, the HR actions fact table should give you a point in time picture of what organization an employee belongs to. A much simpler hierarchy bridge with effective dates should allow you to report that organization against both current and past organizational structures. In a HR model I implemented to support workforce planning and analysis, the organization had a very ragged organizational hierarchy running as many as 12 levels deep. Certain organizations were designated as "business units", "departments" and "offices", however, they could be at any level of the hierarchy. The solution was to simply carry an attribute on the organizational unit as well as in the hierarchy bridge that identified one of these cases. There was also a flattened hierarchy table (with the lowest level org unit as the key, plus effective dates) that carried these three desginated units. The company only has around 860 org units, so it is not expected that these tables will ever get very big.
Certainly, the HR actions fact table should give you a point in time picture of what organization an employee belongs to. A much simpler hierarchy bridge with effective dates should allow you to report that organization against both current and past organizational structures. In a HR model I implemented to support workforce planning and analysis, the organization had a very ragged organizational hierarchy running as many as 12 levels deep. Certain organizations were designated as "business units", "departments" and "offices", however, they could be at any level of the hierarchy. The solution was to simply carry an attribute on the organizational unit as well as in the hierarchy bridge that identified one of these cases. There was also a flattened hierarchy table (with the lowest level org unit as the key, plus effective dates) that carried these three desginated units. The company only has around 860 org units, so it is not expected that these tables will ever get very big.
Re: Architecture questions : implement historical data view without SCD?
Thanks for your answer, but I think that my rusted English hasn’t made me clear enough.
Attributes
They are not junking attributes: they are maintaining attributes at the same time in the original dimension and in attribute dimensions. Because this is how they implement the historization.
And attributes we are talking about are not flags or attributes with a low number of values, it’s things like addresses having an 1 : 1 cardinality with the entity they refer to.
Hierarchies
The hierarchies of the organizations of this case have nothing special. They are regular, well structured and employees can only be linked to the lower level of them.
I understand that you had to find a specific solution for a specific situation, but in this case I just can’t understand why they didn’t use a classical SCD2 to do that.
A SCD2 on the organization dimension with a normal fact table just do the same thing as what has been implemented, but with a lower growth rate, easier querying and greater performance.
I just don’t get it...
Attributes
They are not junking attributes: they are maintaining attributes at the same time in the original dimension and in attribute dimensions. Because this is how they implement the historization.
And attributes we are talking about are not flags or attributes with a low number of values, it’s things like addresses having an 1 : 1 cardinality with the entity they refer to.
Hierarchies
The hierarchies of the organizations of this case have nothing special. They are regular, well structured and employees can only be linked to the lower level of them.
I understand that you had to find a specific solution for a specific situation, but in this case I just can’t understand why they didn’t use a classical SCD2 to do that.
A SCD2 on the organization dimension with a normal fact table just do the same thing as what has been implemented, but with a lower growth rate, easier querying and greater performance.
I just don’t get it...
Viim- Posts : 2
Join date : 2009-07-29
Re: Architecture questions : implement historical data view without SCD?
I don't understand it either. If I want to maintain a current view as well as a historic view for a given dimension, I create two FK's on the fact table. The first FK points to the historical dimension row. The second FK is a static column in the dimension that stays the same for the life of the dimension (i.e. the natural key of the dimension row). What you have implemented almost sounds like an outrigger implementation for highly volatile dimension values. Instead of just capturing history for rapidly changing values, you capture all history regardless of volatility. If that is in fact your situation, you have reasons to be concerned.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Data Warehouse Questions
» Tracking of historical data using SCD2 in a non-dimensional data model
» ODS Data Architecture
» Is An Historical Database Always a Data Warehouse?
» data model architecture for economic forecast data
» Tracking of historical data using SCD2 in a non-dimensional data model
» ODS Data Architecture
» Is An Historical Database Always a Data Warehouse?
» data model architecture for economic forecast data
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum