Loading Fact Table with Type 2 Slowly Changing Dimension
5 posters
Page 1 of 1
Loading Fact Table with Type 2 Slowly Changing Dimension
I'm new to data warehousing .. sorry if this is a basic question.
I have a type 2 slowly changing dimension already loaded and now I want to load a fact table. At least, I think it should be a fact table. A lot of the entities in the dimension table have a schedule associated with them. I thought it would be best to implement the schedule as a fact table. The problem I'm having is that the schedule should apply to all instances of an entity in the dimension table. Is the best solution to create a bridge dimension table that contains only unique entities from my dimension table?
Here's a made up example scenario:
I have students in a student dimension. I track changes (type 2) on the students' address so each student can have multiple records in the dimension table. I have another table (fact table) with each student's class schedule. I want to relate the class schedule to all instances of the student. Should I create another dimension table that only contains 1 row per student and relate the fact table to this "bridged" table?
Thanks!
I have a type 2 slowly changing dimension already loaded and now I want to load a fact table. At least, I think it should be a fact table. A lot of the entities in the dimension table have a schedule associated with them. I thought it would be best to implement the schedule as a fact table. The problem I'm having is that the schedule should apply to all instances of an entity in the dimension table. Is the best solution to create a bridge dimension table that contains only unique entities from my dimension table?
Here's a made up example scenario:
I have students in a student dimension. I track changes (type 2) on the students' address so each student can have multiple records in the dimension table. I have another table (fact table) with each student's class schedule. I want to relate the class schedule to all instances of the student. Should I create another dimension table that only contains 1 row per student and relate the fact table to this "bridged" table?
Thanks!
jgaull- Posts : 5
Join date : 2010-06-01
Re: Loading Fact Table with Type 2 Slowly Changing Dimension
jgaull wrote: I want to relate the class schedule to all instances of the student.
Why?
The purpose of a type 2 dimension is to provide the state of a dimension at the time the fact occured. A type 2 can also provide the current state of the dimension with a self-join on the dimension table. Other than the point-in-time state and/or the current state of the dimension, what would be the reason to see any other instance of the dimension?
To get the current state from a type 2 dimension, you join the fact table to the dimension table. You then join the dimension table to itself using the natural key and filtering on current flag = true. If you had effective and expiration dates in the dimension table, you could get something other than the current row by self-joining on the natural key and filtering on some date between the effective and expiration dates, but I would be hard pressed to come up with a reason why someone would want to do that.
Re: Loading Fact Table with Type 2 Slowly Changing Dimension
Thanks for your reply!
Are you saying that you do recommend that I keep the schedule in a fact table and that I should join the fact table row to the dimension row that is active at the time that I receive the schedule data? Would I ever need to update the foreign key in the fact table as the data in the dimension table changes?
So for the student example ...
If I have the following dimension table (Sorry for the formatting)
StudentKey LastName FirstName City RowIsCurrent RowStartDate RowEndDate
1 Smith John Chicago 0 5/1/2008 3/20/2009
2 Smith John Austin 0 3/21/2009 5/15/2010
3 Smith Jonn San Francisco 1 5/16/2010 12/31/9999
Here would be a made up fact table for data I received on 3/22/2009.
ScheduleKey DateKey StudentKey Class
1 20100510 2 Science
2 20100610 2 Math
3 20100525 2 History
The business case calls for the schedule to apply to all instances of John Smith regardless of where he's living but I only have the fact table joining to one instance of the student. Would it be better to have 3 versions of the fact table data (1 for each instance of the student)?
Thanks!!!!!
Are you saying that you do recommend that I keep the schedule in a fact table and that I should join the fact table row to the dimension row that is active at the time that I receive the schedule data? Would I ever need to update the foreign key in the fact table as the data in the dimension table changes?
So for the student example ...
If I have the following dimension table (Sorry for the formatting)
StudentKey LastName FirstName City RowIsCurrent RowStartDate RowEndDate
1 Smith John Chicago 0 5/1/2008 3/20/2009
2 Smith John Austin 0 3/21/2009 5/15/2010
3 Smith Jonn San Francisco 1 5/16/2010 12/31/9999
Here would be a made up fact table for data I received on 3/22/2009.
ScheduleKey DateKey StudentKey Class
1 20100510 2 Science
2 20100610 2 Math
3 20100525 2 History
The business case calls for the schedule to apply to all instances of John Smith regardless of where he's living but I only have the fact table joining to one instance of the student. Would it be better to have 3 versions of the fact table data (1 for each instance of the student)?
Thanks!!!!!
jgaull- Posts : 5
Join date : 2010-06-01
Re: Loading Fact Table with Type 2 Slowly Changing Dimension
You most definiately do not want 3 versions of the fact data for each version of the student.
A critical piece missing in the student dimension is a natural key, that is, the business key that uniquely identifies the student. This would be the same value on each student row. That value ties the three rows together, allowing you to self-join on the dimension to locate whichever version of student you want to use for your report.
There are other techniques as well, covered in other discussions: http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/scd-type-1-and-2-t433.htm
A critical piece missing in the student dimension is a natural key, that is, the business key that uniquely identifies the student. This would be the same value on each student row. That value ties the three rows together, allowing you to self-join on the dimension to locate whichever version of student you want to use for your report.
There are other techniques as well, covered in other discussions: http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/scd-type-1-and-2-t433.htm
Thanks!
Thank you for taking the time and helping me out.
I definitely do have a natural key in my data warehouse and I should have made one up for my example. I am able to move forward now based on your recommendations. I will have only one version of the fact data and I can use the natural key in the dimension table to find what instance of the dimension entity I wish to view. This is great advice and I really appreciate your help!
I definitely do have a natural key in my data warehouse and I should have made one up for my example. I am able to move forward now based on your recommendations. I will have only one version of the fact data and I can use the natural key in the dimension table to find what instance of the dimension entity I wish to view. This is great advice and I really appreciate your help!
jgaull- Posts : 5
Join date : 2010-06-01
type 2 slowly changing dimension already loaded
Hi. I just registered into group.
Your business requirement is clear.
The student schedule relates to a student not the location that he or she resides on.
You need to normalize your student dimension by separating the location information into its own dimension and create location key in the fact table that points to this (location) dimension.
Based on what you have stated it is clear that you need to tack changings on location dimension only.
This simple change will meet your requirements. (I think so anyway).
Your business requirement is clear.
The student schedule relates to a student not the location that he or she resides on.
You need to normalize your student dimension by separating the location information into its own dimension and create location key in the fact table that points to this (location) dimension.
Based on what you have stated it is clear that you need to tack changings on location dimension only.
This simple change will meet your requirements. (I think so anyway).
jmamedov- Posts : 1
Join date : 2011-03-06
Re: Loading Fact Table with Type 2 Slowly Changing Dimension
I wonder if you really meant StudentKey=3 in the fact table, as the SK-3 is the latest version of John Smith. There would be an extra work, but still achievable, to retrieve the second last version of SCD2 dimension.jgaull wrote:Here would be a made up fact table for data I received on 3/22/2009.
ScheduleKey DateKey StudentKey Class
1 20100510 2 Science
2 20100610 2 Math
3 20100525 2 History
Another point is, do you also tend store the snapshot datekey=20090322 in the fact table.
In the sample case, I don’t see any need for a separate location dimension, unless the student dimension is a monster dimension and there is another hierarchy tying up schools, regions and states together in a multi-school scenario, and a student could be enrolled in different schools at the same time. Otherwise, keep it simple by denormalising the city name in your student dimension and set it to SCD1 or 2 attribute based on your business requirement.jmamedov wrote:
You need to normalize your student dimension by separating the location information into its own dimension and create location key in the fact table that points to this (location) dimension.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Loading Fact Table with Type 2 Slowly Changing Dimension
I fully understand that this technique can be used to display the dimension instance we want while displaying the schedule.
However, I am still confused about answering questions like:
What is the scheduled class for John Smith on 06/10/2010?
Assuming that I am running this query on 05/17/2010 by which time the latest instance for John Smith is having studentkey of 3.
How do I know which instance of John smith has to be joined with fact table to get the schedule? It could be 1,2 or 3 in real scenario.
Should I perform a right outer join between the student dimension (filtered for instances of John Smith) and schedule fact table?
However, I am still confused about answering questions like:
What is the scheduled class for John Smith on 06/10/2010?
Assuming that I am running this query on 05/17/2010 by which time the latest instance for John Smith is having studentkey of 3.
How do I know which instance of John smith has to be joined with fact table to get the schedule? It could be 1,2 or 3 in real scenario.
Should I perform a right outer join between the student dimension (filtered for instances of John Smith) and schedule fact table?
bidw_kk- Posts : 5
Join date : 2011-07-16
Similar topics
» Type 3 Slowly Changing Dimension
» What if Natural Key changes in a Slowly Changing Dimension Type 2?
» How to load a Slowly Changing Dimension Type 2 with one SQL Merge statement in Oracle
» Slowly Changing Dimension table
» slowly changing fact table (millions a night)
» What if Natural Key changes in a Slowly Changing Dimension Type 2?
» How to load a Slowly Changing Dimension Type 2 with one SQL Merge statement in Oracle
» Slowly Changing Dimension table
» slowly changing fact table (millions a night)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum