Relationship between scd hierarchies and then between fact tables
5 posters
Page 1 of 1
Relationship between scd hierarchies and then between fact tables
I have some SCD Type 2 dimension to be related hierarchically, as follows:
DIM_CATEGORY
DIM_SUBCATEGORY
DIM_PRODUCT
DIM_TIME
Also I have two fact tables related to Product dimension.
I need to relate the dimensions so that:
1) To see all subcategories by category
2) To see all products by subcategory and category
3) To see the lastest data of each hierchachy related to data in fact tables
4) To see data from fact tables, associated wiht 1, 2 and 3.
4) To show by month, the respective data from dimensions and facts according the "time". For example, for November 2011, if I want to see information (from fact tables) associated with Prod_Code=P1, I need to show the data product wich corresponds to Prod_Id = 1, category data which corresponds to Categ_Id =3 and the subcategory data which corresponds to Sub_Id=2.
I have found information about how to design these associations, but I always find examples where there is a single scd hierarchy level for dimensions, related with fact tables.
I would appreciate any idea how to solve this situation.
DIM_CATEGORY
Categ_ID | Categ_Code | C_Description | C_start_Date | C_end_date | C_active | C_Other_data |
1 | C1 | CAT1.0 | 01/01/2011 | 05/03/2011 | 0 | ZZZ |
2 | C1 | CAT1.1 | 05/03/2011 | 08/20/2011 | 0 | ZXY |
3 | C1 | CAT1.2 | 08/20/2011 | 1 | PPP | |
4 | C2 | CAT2.0 | 04/15/2011 | 1 | LLL |
DIM_SUBCATEGORY
Sub_ID | Sub_code | S_Description | S_start_date | S_end_date | S_active | S_Other_data | Categ_Code |
1 | S1 | SUB1.0 | 05/01/2011 | 10/25/2011 | 0 | Dsds | C1 |
2 | S1 | SUB1.1 | 10/25/2011 | 1 | Sass | C1 | |
3 | S2 | SUB2.0 | 01/24/2011 | 02/25/2011 | 0 | Qidiu | C1 |
4 | S2 | SUB2.1 | 02/25/2011 | 06/13/2011 | 0 | Pret | C1 |
5 | S2 | SUB2.2 | 06/13/2011 | 1 | Lkdj | C1 | |
6 | S3 | SUB3.1 | 04/21/2011 | 1 | Wwl | C2 |
DIM_PRODUCT
Prod_ID | Prod_code | P_Description | P_start_date | P_end_date | P_active | P_Other_data | Sub_Code |
1 | P1 | PROD1.0 | 06/05/2011 | 1 | Rrrs | S1 | |
2 | P2 | PROD2.0 | 07/13/2010 | 09/10/2011 | 0 | Gfsr | S1 |
3 | P2 | PROD2.1 | 09/10/2011 | 1 | Pkjkj | S1 | |
4 | P3 | PROD3.0 | 05/10/2011 | 12/06/2011 | 1 | Plll | S2 |
5 | P4 | PROD4.0 | 01/12/2011 | 1 | Tsfs | S3 |
DIM_TIME
Time_ID | Month | Semester | Year |
1 | 01 | 1 | 2011 |
2 | 02 | 1 | 2011 |
3 | 03 | 1 | 2011 |
4 | 04 | 1 | 2011 |
5 | 05 | 1 | 2011 |
6 | 06 | 1 | 2011 |
7 | 07 | 2 | 2011 |
8 | 08 | 2 | 2011 |
9 | 09 | 2 | 2011 |
10 | 10 | 2 | 2011 |
11 | 11 | 2 | 2011 |
12 | 12 | 2 | 2011 |
13 | 01 | 1 | 2012 |
Also I have two fact tables related to Product dimension.
I need to relate the dimensions so that:
1) To see all subcategories by category
2) To see all products by subcategory and category
3) To see the lastest data of each hierchachy related to data in fact tables
4) To see data from fact tables, associated wiht 1, 2 and 3.
4) To show by month, the respective data from dimensions and facts according the "time". For example, for November 2011, if I want to see information (from fact tables) associated with Prod_Code=P1, I need to show the data product wich corresponds to Prod_Id = 1, category data which corresponds to Categ_Id =3 and the subcategory data which corresponds to Sub_Id=2.
I have found information about how to design these associations, but I always find examples where there is a single scd hierarchy level for dimensions, related with fact tables.
I would appreciate any idea how to solve this situation.
Last edited by Claudia_CR on Tue Jan 10, 2012 6:25 pm; edited 1 time in total (Reason for editing : missing prod_code)
Claudia_CR- Posts : 4
Join date : 2012-01-09
Re: Relationship between scd hierarchies and then between fact tables
Hello,
Interesting problem...
I am curious why you have split out your hierarchies. (As opposed to including category and subcategory attribute values in the product dimension) Have you reasons for not consolidating the values into a single dimension?
Interesting problem...
I am curious why you have split out your hierarchies. (As opposed to including category and subcategory attribute values in the product dimension) Have you reasons for not consolidating the values into a single dimension?
pcs- Posts : 20
Join date : 2009-02-03
Re: Re: Relationship between scd hierarchies and then between fact tables
I have split out my hierarchies because each one has its own history, ie, as I show in the example, the data of the categories may change without the data from the subcategories or products change, and vice versa. Also I can not associate a top-level hierarchy to a hierarchy of lower level, because the ranges of validity of each time may not coincide. For example, suppose that the data of the any category change and effective ranges are as follows:
01/01/2011 - 06/15/2011 --> Category Data 1
06/05/2011 - to present --> Category Data 1.1
And then suppose the data of the associated subcategory is:
03/05/2011 - to present --> Subcategory Data 3
To represent changes in the category for all subcategories and associated products, I would have to create "n" records, and this will not show the data corresponding to every month.
My big problem is, how I can relate the changes between the hierarchies, how to represent this over time (to show the respective data by month) and also associate them with the facts?
01/01/2011 - 06/15/2011 --> Category Data 1
06/05/2011 - to present --> Category Data 1.1
And then suppose the data of the associated subcategory is:
03/05/2011 - to present --> Subcategory Data 3
To represent changes in the category for all subcategories and associated products, I would have to create "n" records, and this will not show the data corresponding to every month.
My big problem is, how I can relate the changes between the hierarchies, how to represent this over time (to show the respective data by month) and also associate them with the facts?
Claudia_CR- Posts : 4
Join date : 2012-01-09
Re: Relationship between scd hierarchies and then between fact tables
Based on Kimball's dimensional modeling, category and subcategory should be denormlised into product dimension, unless it's not hierarchical. The exact business case has be elaborated in the first 2 chapters in his Toolkit 2nd edition.
To sum up, category and subcategory are just two sets of attributes in product dimension which should be type 2 SCD if the hierarchical relationship change needs to be tracked over the time.
To sum up, category and subcategory are just two sets of attributes in product dimension which should be type 2 SCD if the hierarchical relationship change needs to be tracked over the time.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Relationship between scd hierarchies and then between fact tables
This is a snow flake structure where you are trying to normalize the product dimension for a set of reasons.
what i would suggest is populate CAT_ID instead of CAT_CODE in DIM_Subcategory and in this way you are always referring to the proper version of the Category when you are adding\revising the SubCategory table..
and is the same case with PROUDCT, populate SUB_ID instead of SUB_CODE..
its enough for your fact to get associated to a product and time dimensions.... and using PRODUCT_ID instead of PRODUCT_CODE you can tie back proper revisions
what i would suggest is populate CAT_ID instead of CAT_CODE in DIM_Subcategory and in this way you are always referring to the proper version of the Category when you are adding\revising the SubCategory table..
and is the same case with PROUDCT, populate SUB_ID instead of SUB_CODE..
its enough for your fact to get associated to a product and time dimensions.... and using PRODUCT_ID instead of PRODUCT_CODE you can tie back proper revisions
sgudavalli- Posts : 29
Join date : 2010-06-10
Age : 40
Location : Pune, India
Re: Relationship between scd hierarchies and then between fact tables
OK - I think I follow - It might be easier to manage if you combine them all in one table, however given your constraints let me know if something like this might work:
requirements #1 & #2 & #3 can be handled with a multipass queries or perhaps leveraging a durable surrogate key to each dim
additionally -any facts recorded at a transaction level will capture the relationships at a given time (or use a daily factless table for this) for the first #4 req
how about a monthly accumulating snapshot like this (captured the last day of each month) for the second #4 requirement...
requirements #1 & #2 & #3 can be handled with a multipass queries or perhaps leveraging a durable surrogate key to each dim
additionally -any facts recorded at a transaction level will capture the relationships at a given time (or use a daily factless table for this) for the first #4 req
how about a monthly accumulating snapshot like this (captured the last day of each month) for the second #4 requirement...
DATE | TIME_ID | PROD_ID | SUB_ID | CATEG_ID | MEASURES |
1/31/2011 | 1 | 5 | -1 | -1 | N |
2/28/2011 | 2 | 5 | -1 | -1 | N |
3/31/2011 | 3 | 5 | -1 | -1 | N |
4/30/2011 | 4 | 5 | 6 | 4 | N |
5/31/2011 | 5 | 5 | 6 | 4 | N |
5/31/2011 | 5 | 4 | 4 | 1 | N |
6/30/2012 | 6 | 4 | 5 | 2 | N |
6/30/2011 | 6 | 1 | 1 | 2 | N |
6/30/2011 | 6 | 5 | 6 | 4 | N |
7/31/2011 | 7 | 4 | 5 | 2 | N |
7/31/2011 | 7 | 1 | 1 | 2 | N |
7/31/2011 | 7 | 2 | 1 | 2 | N |
7/31/2011 | 7 | 5 | 6 | 4 | N |
8/31/2011 | 8 | 4 | 5 | 3 | N |
8/31/2011 | 8 | 1 | 1 | 3 | N |
8/31/2011 | 8 | 2 | 1 | 3 | N |
8/31/2011 | 8 | 5 | 6 | 4 | N |
9/30/2011 | 9 | 1 | 1 | 3 | N |
9/30/2011 | 9 | 2 | 1 | 3 | N |
9/30/2011 | 9 | 3 | 1 | 3 | N |
9/30/2011 | 9 | 5 | 6 | 4 | N |
10/31/2011 | 10 | 4 | 5 | 3 | N |
10/31/2011 | 10 | 1 | 2 | 3 | N |
10/31/2011 | 10 | 3 | 2 | 3 | N |
10/31/2011 | 10 | 5 | 6 | 4 | N |
11/30/2011 | 11 | 1 | 2 | 3 | N |
11/30/2011 | 11 | 3 | 2 | 3 | N |
11/30/2011 | 11 | 4 | 5 | 3 | N |
11/30/2011 | 11 | 5 | 6 | 4 | N |
12/31/2011 | 12 | 1 | 2 | 3 | N |
12/31/2011 | 12 | 3 | 2 | 3 | N |
12/31/2011 | 12 | 5 | 6 | 4 | N |
pcs- Posts : 20
Join date : 2009-02-03
Re: Relationship between scd hierarchies and then between fact tables
True, you also denormalise the hierarchy into the product dimension for a couple of good reasons, minimising the number of joins when you need to access the hierarchy and simplifying the ETL process dealing with three SCD2 dimensions and keeping them in synch. Could you elaborate for what set of reasons you should normalise the product dimension, or a dimension in general.sgudavalli wrote:This is a snow flake structure where you are trying to normalize the product dimension for a set of reasons.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Relationship between scd hierarchies and then between fact tables
Thank you all for your responses.
According to the recommendation of pcs:
"...any facts recorded at a transaction level will capture the relationships at a given time (or use a daily factless table for this) for the first #4 req
how about a monthly accumulating snapshot like this (captured the last day of each month) for the second #4 requirement..."
Sounds good, but this means that I have to include in all fact tables the relationship between category, subcategory and product at a given time, through references to the id's of each one?
About the other recommendation:
"requirements #1 & #2 & #3 can be handled with a multipass queries or perhaps leveraging a durable surrogate key to each dim"
I had initially thought of designing this way the relations:
If you observe I had added a "durable surrogate key" for each dimension, but I have the following doubts(considering the data from my first example):
1) Dim_Product uses the ID_Subcategory to specify the associated subcategory, and this does not represents all information about the data changes of any subcategory. For example:
- Sub_Id's 1 and 2 in Dim_Subcategory belong to Sub_cod = S1
- Prod_Id = 1 in Dim_Product belongs to Prod_cod = P1, which in turn belongs to the subcategory S1, but what Sub_id, 1 or 2? From 06/05/2011 until 10/25/2011 the Sub_id that corresponds is 1, and from 10/25/2011 until now the Sub_id that corresponds is 2, but in Dim_Product I have only one record, so I'll probably lose the relationship between Prod_Id = 1 and Sub_Id = 1.
2) Considering the above, for each new record in Dim_Subcategory I need to update the latest records in Dim_Product?
3) I have the same situation (1 and 2) with the changes in Dim_Category and as reflected in the subcategories.
According to the recommendation of pcs:
"...any facts recorded at a transaction level will capture the relationships at a given time (or use a daily factless table for this) for the first #4 req
how about a monthly accumulating snapshot like this (captured the last day of each month) for the second #4 requirement..."
Sounds good, but this means that I have to include in all fact tables the relationship between category, subcategory and product at a given time, through references to the id's of each one?
About the other recommendation:
"requirements #1 & #2 & #3 can be handled with a multipass queries or perhaps leveraging a durable surrogate key to each dim"
I had initially thought of designing this way the relations:
If you observe I had added a "durable surrogate key" for each dimension, but I have the following doubts(considering the data from my first example):
1) Dim_Product uses the ID_Subcategory to specify the associated subcategory, and this does not represents all information about the data changes of any subcategory. For example:
- Sub_Id's 1 and 2 in Dim_Subcategory belong to Sub_cod = S1
- Prod_Id = 1 in Dim_Product belongs to Prod_cod = P1, which in turn belongs to the subcategory S1, but what Sub_id, 1 or 2? From 06/05/2011 until 10/25/2011 the Sub_id that corresponds is 1, and from 10/25/2011 until now the Sub_id that corresponds is 2, but in Dim_Product I have only one record, so I'll probably lose the relationship between Prod_Id = 1 and Sub_Id = 1.
2) Considering the above, for each new record in Dim_Subcategory I need to update the latest records in Dim_Product?
3) I have the same situation (1 and 2) with the changes in Dim_Category and as reflected in the subcategories.
Claudia_CR- Posts : 4
Join date : 2012-01-09
Re: Relationship between scd hierarchies and then between fact tables
Claudia, I think you have gone too far down the relational track. Could you tell us why you don't denormalise category and subcategory into a single product dimension and handle the hierarchical changes by SCD2. In dimensional modeling, you should think dimensionally, not relationally. That is, denormalise dimension tables and normalise fact tables.
There are some cases you should normalise dimension tables, but only if denormalisation dose not work. So the guideline is, try to denormalise the dimension, and if it works, don't even think about normalising it. I highly suggest to you to read about chapter 1, 2 in Kimball's dimensional modeling book, unless you want to use alternative methodologies for your DW project.
There are some cases you should normalise dimension tables, but only if denormalisation dose not work. So the guideline is, try to denormalise the dimension, and if it works, don't even think about normalising it. I highly suggest to you to read about chapter 1, 2 in Kimball's dimensional modeling book, unless you want to use alternative methodologies for your DW project.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Relationship between scd hierarchies and then between fact tables
I have to agree with Hang on this - make sure denormalization cannot work for you before pursuing other avenues...
pcs- Posts : 20
Join date : 2009-02-03
Re: Relationship between scd hierarchies and then between fact tables
Thanks for your answers.
I understand that you say about denormalise category and subcategory into a single dimension and handle the hierarchical changes by SCD2.
But I am worried because every change in any data in my category, subcategory or product needs to be used in reports, these changes may occur at any time and independently, i.e., category data can be changed without the data from the subcategory or product change, or vice versa. If I design only a dimension that includes the tree entities I will need to create for every data category change "n" * "m" rows (n -> subcategories associated and m-> products associated to every subcategory) and for every data subcategory change "m" rows, so it could have a lot of rows and I dont know if this would be a good practice; besides my dimension table would have at least 60 attributes (columns) that would be the union of the attributes of the three entities.
Maybe my concerns are not unfounded, so I would appreciate if you can help in this regard.
On the other hand, about create a monthly accumulating snapshot like you proposed (captured the last day of each month), how it would relate this table to the dimension tables and fact tables?
Thanks again.
I understand that you say about denormalise category and subcategory into a single dimension and handle the hierarchical changes by SCD2.
But I am worried because every change in any data in my category, subcategory or product needs to be used in reports, these changes may occur at any time and independently, i.e., category data can be changed without the data from the subcategory or product change, or vice versa. If I design only a dimension that includes the tree entities I will need to create for every data category change "n" * "m" rows (n -> subcategories associated and m-> products associated to every subcategory) and for every data subcategory change "m" rows, so it could have a lot of rows and I dont know if this would be a good practice; besides my dimension table would have at least 60 attributes (columns) that would be the union of the attributes of the three entities.
Maybe my concerns are not unfounded, so I would appreciate if you can help in this regard.
On the other hand, about create a monthly accumulating snapshot like you proposed (captured the last day of each month), how it would relate this table to the dimension tables and fact tables?
Thanks again.
Claudia_CR- Posts : 4
Join date : 2012-01-09
Re: Relationship between scd hierarchies and then between fact tables
A typical retail business would stock tens of thousand products which should not be of any concern of explosive size because of SCD2 changes, even for the dimension with hunderes of thousand records. You only need to consider normalisation on a dimension when its potential size will go beyond millions which we call monster dimension.
Think of product dimension this way. Category and subcategory are just classifications of products. So it's logical to treat them as attributes of products. Any change in the category or sub category is the change of product attributes. So in a classic SCD 2 dimension, you add another record to your product dimension for the change, no matter if it is on a normal attribute or an attribute in a hierarchy. It's true that there would be a lot of data redundancy. But comparing with the complexity the single SCD dimension has resolved and the space it has saved in the fact table, it's a good tradeoff.
By the way, even if you normalise the relationship as suggested by sgudavalli, the total number of recrods incured by SCD 2 in product dimension will not reduce. The only data redundancy you would save is the attributes specific to category/subcategory. Again compared to 60+ attributes in product, that saving is ignorable.
Think of product dimension this way. Category and subcategory are just classifications of products. So it's logical to treat them as attributes of products. Any change in the category or sub category is the change of product attributes. So in a classic SCD 2 dimension, you add another record to your product dimension for the change, no matter if it is on a normal attribute or an attribute in a hierarchy. It's true that there would be a lot of data redundancy. But comparing with the complexity the single SCD dimension has resolved and the space it has saved in the fact table, it's a good tradeoff.
By the way, even if you normalise the relationship as suggested by sgudavalli, the total number of recrods incured by SCD 2 in product dimension will not reduce. The only data redundancy you would save is the attributes specific to category/subcategory. Again compared to 60+ attributes in product, that saving is ignorable.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Relationship between scd hierarchies and then between fact tables
If there are too many attributes then you might want to create a coverage factless fact table, having time,product,cat and subcat details.
Vishy- Posts : 75
Join date : 2012-02-28
Location : India
Similar topics
» Hierarchies in fact tables.
» Modelling parent-child relationship source tables to Fact with correct grain
» map M-M relationship between two fact tables
» Link two fact tables with many to many relationship
» Can a Bridge Table also be used for mapping?
» Modelling parent-child relationship source tables to Fact with correct grain
» map M-M relationship between two fact tables
» Link two fact tables with many to many relationship
» Can a Bridge Table also be used for mapping?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum