How use a star schema correctly in a BI system
3 posters
Page 1 of 1
How use a star schema correctly in a BI system
Hello,
I'm currently thinking about using slowly changing dimensions in my organisations. I think the concept of type 2 is clear to me. My question therefore does not directly concern the creation of type 2 scd but the use in the reporting (BI) system.
Explanation:
I have the usual scenarion which is that I have a dimension table with 3 identical natural key records which have 3 different surrogate keys, one of these records is currently active. In the fact table there are several records, a few use the surrogate key one, a few surrogate key two and a few the one that is currently active.
My question now is how I should deploy my dimension. Should I include the entire dimension table (including historic surrogate key records) or only the active dimension members. If I deploy only the active ones, then my report users (of cognos, pentaho, jaspersoft or whatever) can only view data that is referenced to these active dimension members. That must be wrong because it will only show current dimension member attributes and if for instance the customer parent changed or a customer name changed users will always only see the current state of the dimension which also prevents them from seeing historic data (in case the dimension member changed)
If I deploy the dimension with all dimension records, historical and active dimension records, then it is possible for users to retrieve all facts, those that reference to historic dimension members and those that retrieve active ones. The issue here is that this will work fine if the change concerns a change in the structure of a member (for instance one product moves from product group x to product group y) but if the change concerns a change of an attribute (e.g. A product name changes) then there would be 2 members with the same natural key in the same hierarchy. Users would be confused if they see that in they BI tool.
I think. I understood 80% of the basic dwh concept a few important things are still missing. It would be great if you could give me some advice.
Summary of my question
I just don't know how I should deploy my dwh to my users so that they can access the historic facts using the historic dimension members and the current facts using the current dimension members.
Any help very much appreciated.
Thanks!
I'm currently thinking about using slowly changing dimensions in my organisations. I think the concept of type 2 is clear to me. My question therefore does not directly concern the creation of type 2 scd but the use in the reporting (BI) system.
Explanation:
I have the usual scenarion which is that I have a dimension table with 3 identical natural key records which have 3 different surrogate keys, one of these records is currently active. In the fact table there are several records, a few use the surrogate key one, a few surrogate key two and a few the one that is currently active.
My question now is how I should deploy my dimension. Should I include the entire dimension table (including historic surrogate key records) or only the active dimension members. If I deploy only the active ones, then my report users (of cognos, pentaho, jaspersoft or whatever) can only view data that is referenced to these active dimension members. That must be wrong because it will only show current dimension member attributes and if for instance the customer parent changed or a customer name changed users will always only see the current state of the dimension which also prevents them from seeing historic data (in case the dimension member changed)
If I deploy the dimension with all dimension records, historical and active dimension records, then it is possible for users to retrieve all facts, those that reference to historic dimension members and those that retrieve active ones. The issue here is that this will work fine if the change concerns a change in the structure of a member (for instance one product moves from product group x to product group y) but if the change concerns a change of an attribute (e.g. A product name changes) then there would be 2 members with the same natural key in the same hierarchy. Users would be confused if they see that in they BI tool.
I think. I understood 80% of the basic dwh concept a few important things are still missing. It would be great if you could give me some advice.
Summary of my question
I just don't know how I should deploy my dwh to my users so that they can access the historic facts using the historic dimension members and the current facts using the current dimension members.
Any help very much appreciated.
Thanks!
Andi75- Posts : 1
Join date : 2014-04-06
Re: How use a star schema correctly in a BI system
Hi - hopefully this will help a bit:
A record in a fact table records an event - it consists of measures and references to attributes (held in dimensions) that apply to that event. The purpose of a Type 2 SCD is to be able to record the state of those attributes at the moment of the event and for them not to subsequently change over time.
Once you have built these structures then from a reporting perspective that fact that a dimension may, or may not, be an SCD is irrelevant - you are joining between a fact table and dimensions based on the surrogate key values and the issue of whether or not the Dim record(s) being referenced are the current version of each Dim record is of no concern.
Therefore I'm not clear what you mean by "deploy my dimension" - a dimension is a table in your database: what do you mean by "deploy" and how can you do anything other than make the whole table available? In any case, you wouldn't normally expose the underlying database structure to end-users but would use a presentation layer to expose it to them in a meaningful layer (whether that presentation layer is made up of DB Views or a Reporting tool's metadata layer)
A record in a fact table records an event - it consists of measures and references to attributes (held in dimensions) that apply to that event. The purpose of a Type 2 SCD is to be able to record the state of those attributes at the moment of the event and for them not to subsequently change over time.
Once you have built these structures then from a reporting perspective that fact that a dimension may, or may not, be an SCD is irrelevant - you are joining between a fact table and dimensions based on the surrogate key values and the issue of whether or not the Dim record(s) being referenced are the current version of each Dim record is of no concern.
Therefore I'm not clear what you mean by "deploy my dimension" - a dimension is a table in your database: what do you mean by "deploy" and how can you do anything other than make the whole table available? In any case, you wouldn't normally expose the underlying database structure to end-users but would use a presentation layer to expose it to them in a meaningful layer (whether that presentation layer is made up of DB Views or a Reporting tool's metadata layer)
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: How use a star schema correctly in a BI system
You have to use the whole table. Dimensions are your entry point into facts. You can't ever report on historical fact data if your dimension only has current rows.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Star Schema for Surgeries
» Star Schema vs All in one table
» Snowflake or Star Schema?
» Star Schema put to test!
» Star Schema for MPP databases
» Star Schema vs All in one table
» Snowflake or Star Schema?
» Star Schema put to test!
» Star Schema for MPP databases
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|