Do I need multiple fact tables or dimensions
2 posters
Page 1 of 1
Do I need multiple fact tables or dimensions
I am designing an Admissions data mart for a college. Each application has a stage which needs to be tracked (App Received, Interview, Offer Made, Withdrawn etc.). I have designed a fact to be able to do point in time analysis
FactApp
Course (FK)
Student (FK)
Status/Stage (FK)
AppNumber (Degenerate)
Notes (FK)
EffectiveDate (DateTime)
App (always 1)
ActiveApp (1 or 0)
An applicant will also have an interview, possibly more than one and this also has a status (Invited, Attended, Did not Attend, rearranged etc.) that needs to be tracked. Do I need a fact for this too? The same goes for Application Offer, it will have a status (Offer made, Offer Accepted)
I have thought having these 3 transactional fact and then a 4th fact which has one row per application and pivots the various stages. The difficulty is that there can be more than one interview and even more than one offer and I understand it is not wise to join fact tables together?
This is my first post, so apologies if its not well written. Any help would be much appreciated. Thanks
FactApp
Course (FK)
Student (FK)
Status/Stage (FK)
AppNumber (Degenerate)
Notes (FK)
EffectiveDate (DateTime)
App (always 1)
ActiveApp (1 or 0)
An applicant will also have an interview, possibly more than one and this also has a status (Invited, Attended, Did not Attend, rearranged etc.) that needs to be tracked. Do I need a fact for this too? The same goes for Application Offer, it will have a status (Offer made, Offer Accepted)
I have thought having these 3 transactional fact and then a 4th fact which has one row per application and pivots the various stages. The difficulty is that there can be more than one interview and even more than one offer and I understand it is not wise to join fact tables together?
This is my first post, so apologies if its not well written. Any help would be much appreciated. Thanks
Scott- Posts : 17
Join date : 2016-03-07
Re: Do I need multiple fact tables or dimensions
You need 1 transaction fact table. Stage and Status, and student… etc. are your dims. Could you explain what 3 fact tables you are talking about?
You have a EffectiveDate in the Fact table, so having more than 1 interview with same student should not be an issue. Based on that date, you can select most current interview for a student and show that data in your report.
You have a EffectiveDate in the Fact table, so having more than 1 interview with same student should not be an issue. Based on that date, you can select most current interview for a student and show that data in your report.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Do I need multiple fact tables or dimensions
Thank you for your reply.
Interviews are created in a separate section and will need to record different information such as interview date, interview time, interview staff, interview location, student, course, interview status.
Offers are created in a different table and needs to records offer date, student, course, offer status.
The EffectiveDate is the source data updated date, so that I can do point in time analysis of the state at that time.
So my idea was
FactApp
Course (FK)
Student (FK)
Status/Stage (FK)
AppNumber (Degenerate)
Notes (FK)
EffectiveDate (DateTime)
App (always 1)
ActiveApp (1 or 0)
FactOffer
EffectiveDate (DateTime)
OfferDate (FK)
Student (FK)
Course (FK)
Status (FK)
OfferNote (FK)
AppNumber (Degenerate)
FactInterview
EffectiveDate (DateTime)
Date (FK)
Time (FK)
Student (FK)
Course (FK)
Staff (FK)
Location (FK)
AppNumber (Degenerate)
Status (FK)
How could this fit in 1 transactional fact table or would 3 tables be the right approach? Thank you
Interviews are created in a separate section and will need to record different information such as interview date, interview time, interview staff, interview location, student, course, interview status.
Offers are created in a different table and needs to records offer date, student, course, offer status.
The EffectiveDate is the source data updated date, so that I can do point in time analysis of the state at that time.
So my idea was
FactApp
Course (FK)
Student (FK)
Status/Stage (FK)
AppNumber (Degenerate)
Notes (FK)
EffectiveDate (DateTime)
App (always 1)
ActiveApp (1 or 0)
FactOffer
EffectiveDate (DateTime)
OfferDate (FK)
Student (FK)
Course (FK)
Status (FK)
OfferNote (FK)
AppNumber (Degenerate)
FactInterview
EffectiveDate (DateTime)
Date (FK)
Time (FK)
Student (FK)
Course (FK)
Staff (FK)
Location (FK)
AppNumber (Degenerate)
Status (FK)
How could this fit in 1 transactional fact table or would 3 tables be the right approach? Thank you
Scott- Posts : 17
Join date : 2016-03-07
Re: Do I need multiple fact tables or dimensions
What is the reason to keep them separate? This should be designed as a pipeline of a process.... starting point is an interview and end point is making an offer. I could not find a pipeline approach design example in Kimball's design tips, but he describe it in detail in his book. Based on the information you provided, you need a accumulating snapshot fact table to capture this start and end process. Please read about accumulating snapshot fact table in Kimball's design tips at:
http://decisionworks.com/2010/12/design-tip-130-accumulating-snapshots-for-complex-workflows/
http://decisionworks.com/2010/12/design-tip-130-accumulating-snapshots-for-complex-workflows/
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Do I need multiple fact tables or dimensions
Thanks for the link.
I was thinking of deriving a pipeline accumulating snapshot from these facts with one row per application process for a current view of the application. I wasn't sure how to deal with storing multiple Interview slots if its one line per Application?
The reason I was going to have the transactional facts is to be able to do point-in-time comparison. For example - number of apps this year vs. this time last year (including what status they were in at the time).
I was keeping the Overall Application, Interviews & Offer separate because I thought they were recording different facts with different information. The overall application has details like the course, learner and overall status, status date. The interview slots have specific information such as interview date,time, location, staff and status. I forgot to mention the Interview section holds Interviews, Assessments & Trials. Unfortunately these are differentiated by the status with a prefix of the type.
The application has an overall status (stage) - 'App Received', 'Interview', 'Offer', 'Accepted', 'Suspended' or 'Withdrawn'.
The interviews will have a status of 'Interview Invite', 'DNA', 'Re-arranged' (duplicated with prefix of Int, Assess or Trial eg. 'Assess Invite', 'Assess DNA' etc.)
The offer will have a status of 'Offer Made', 'Accepted', 'Rejected'
Apologies if this is not very clear, for all the reading I have done, I just can't quite get it in my head exactly how this needs to be modelled.
I was thinking of deriving a pipeline accumulating snapshot from these facts with one row per application process for a current view of the application. I wasn't sure how to deal with storing multiple Interview slots if its one line per Application?
The reason I was going to have the transactional facts is to be able to do point-in-time comparison. For example - number of apps this year vs. this time last year (including what status they were in at the time).
I was keeping the Overall Application, Interviews & Offer separate because I thought they were recording different facts with different information. The overall application has details like the course, learner and overall status, status date. The interview slots have specific information such as interview date,time, location, staff and status. I forgot to mention the Interview section holds Interviews, Assessments & Trials. Unfortunately these are differentiated by the status with a prefix of the type.
The application has an overall status (stage) - 'App Received', 'Interview', 'Offer', 'Accepted', 'Suspended' or 'Withdrawn'.
The interviews will have a status of 'Interview Invite', 'DNA', 'Re-arranged' (duplicated with prefix of Int, Assess or Trial eg. 'Assess Invite', 'Assess DNA' etc.)
The offer will have a status of 'Offer Made', 'Accepted', 'Rejected'
Apologies if this is not very clear, for all the reading I have done, I just can't quite get it in my head exactly how this needs to be modelled.
Scott- Posts : 17
Join date : 2016-03-07
Re: Do I need multiple fact tables or dimensions
You can have multiple interviews for an application in the fact table. If multiple interviews for an application are apart by a day then a specific date keep them apart and unique. If multiple interviews for an application is happening in a single day then add date/time to the fact table and it will keep them apart and unique. Kimball has a whole chapter on education in his book so it should provide more detail on this subject.
If multiple interviews for an application happen by mistake or by an error, then how do they get corrected on the source system ( i-e by deleting them or by flagging them as invalid). You have to involve your business user to correct such errors.
If multiple interviews for an application happen by mistake or by an error, then how do they get corrected on the source system ( i-e by deleting them or by flagging them as invalid). You have to involve your business user to correct such errors.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Do I need multiple fact tables or dimensions
My understanding is that the accumulating pipeline fact table would have one row per application. That's why I wasn't sure how to represent multiple interview slots in that single row.?
Am I right in thinking I need a transactional fact to be able to say this time last year there were x amount of applications at Interview stage compared with x amount this year?
If the Overall Application has its own Stage/Status and specific application information, and each interview has its own status and interview specific details, and the offer has its own status and offer specific details, do I need 3 facts to record these and a history of their status change?
I have read the Education section for admissions tracking in the data warehouse toolkit which is very useful. Kimball uses a Pipeline Accumulating Fact. This seems really useful for calculating lag between stages and having one row per application, however I can't see how this enable you to do point-in-time analysis. Also the pipeline accumulating fact only has a column for one interview, whereas I need to see a history or slots and status changes.
Am I right in thinking I need a transactional fact to be able to say this time last year there were x amount of applications at Interview stage compared with x amount this year?
If the Overall Application has its own Stage/Status and specific application information, and each interview has its own status and interview specific details, and the offer has its own status and offer specific details, do I need 3 facts to record these and a history of their status change?
I have read the Education section for admissions tracking in the data warehouse toolkit which is very useful. Kimball uses a Pipeline Accumulating Fact. This seems really useful for calculating lag between stages and having one row per application, however I can't see how this enable you to do point-in-time analysis. Also the pipeline accumulating fact only has a column for one interview, whereas I need to see a history or slots and status changes.
Scott- Posts : 17
Join date : 2016-03-07
Re: Do I need multiple fact tables or dimensions
To do a point-in-time analysis, you add start date and end date into the fact table which works as SCD type 2. Here is more information to read about it:
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
You are confusing yourself with fact grain or level of detail it can have. In my previous post I explained to you that your level of detail for a fact is not just the application but a date too. You do need 1 transaction fact table and then 1 accumulating Snapshot Fact table. If source system has normalized data (kept interview, offer etc. separate) then you de-normalize that data in your Trans fact table. Once that is done , then create your accumulating Snapshot Fact table.
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
You are confusing yourself with fact grain or level of detail it can have. In my previous post I explained to you that your level of detail for a fact is not just the application but a date too. You do need 1 transaction fact table and then 1 accumulating Snapshot Fact table. If source system has normalized data (kept interview, offer etc. separate) then you de-normalize that data in your Trans fact table. Once that is done , then create your accumulating Snapshot Fact table.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Do I need multiple fact tables or dimensions
That's really useful information thanks. I wasn't aware of using a SCD type 2 on a fact table.
How would I store it in one transaction fact table? I think I'm struggling with the de-normalizing into a single Trans Fact here. Do I enter a new row for when an application is created, and then a new row when an interview or offer is created? Then subsequently when the status changes on either of these statuses? I'm struggling to understand how I store them all in one fact when they use different fields.
Would it be like..
Timestamp
Course (FK)
Student (FK)
AppNumber (Degenerate)
ApplicationStage (FK)
InterviewDate (FK)
InterviewTime (FK)
InterviewStaff (FK)
InterviewLocation (FK)
InterviewStatus (FK)
OfferDate (FK)
OfferStatus (FK)
If so, when the application is in the initial 'received' stage, before an interview or offer, would the interview and offer fields just be empty?
How would I store it in one transaction fact table? I think I'm struggling with the de-normalizing into a single Trans Fact here. Do I enter a new row for when an application is created, and then a new row when an interview or offer is created? Then subsequently when the status changes on either of these statuses? I'm struggling to understand how I store them all in one fact when they use different fields.
Would it be like..
Timestamp
Course (FK)
Student (FK)
AppNumber (Degenerate)
ApplicationStage (FK)
InterviewDate (FK)
InterviewTime (FK)
InterviewStaff (FK)
InterviewLocation (FK)
InterviewStatus (FK)
OfferDate (FK)
OfferStatus (FK)
If so, when the application is in the initial 'received' stage, before an interview or offer, would the interview and offer fields just be empty?
Scott- Posts : 17
Join date : 2016-03-07
Re: Do I need multiple fact tables or dimensions
Creating a de-normalized fact table is going to be a challenge for you so you can keep them separate. Here I explain how would your accumulating Snapshot Fact table work. Please pay attention to date columns in the example.
--- Day 1 Joe apply
Timestamp change it to create_date = 3/9/2016
Update_date = 3/9/2016
Course (FK) = null
Student (FK) = Joe
AppNumber (Degenerate) = 1
ApplicationStage (FK) = 'App Received'
InterviewDate (FK) = null
InterviewTime (FK) = null
InterviewStaff (FK) = null
InterviewLocation (FK) = null
InterviewStatus (FK) = null
OfferDate (FK) = null
OfferStatus (FK) = null
Start_date = 3/9/2016
End_date = 1/1/3500 ( good practice is to use date in the future to represent a current row so use 1/1/3500)
Current_flag = Y
--- Next day Joe got interview so you close that above row in the table.
create_date = 3/9/2016
Update_date = 4/1/2016
Course (FK) = null
Student (FK) = Joe
AppNumber (Degenerate) = 1
ApplicationStage (FK) = 'App Received'
InterviewDate (FK) = null
InterviewTime (FK) = null
InterviewStaff (FK) = null
InterviewLocation (FK) = null
InterviewStatus (FK) = null
OfferDate (FK) = null
OfferStatus (FK) = null
Start_date = 3/9/2016
End_date = 4/1/2016
Current_flag = N
--- Now Joe got interview on 4/1/2016. You insert a new row with interview info:
create_date = 4/1/2016
Update_date = 4/1/2016
Course (FK) = null
Student (FK) = Joe
AppNumber (Degenerate) = 1
ApplicationStage (FK) = 'Interview
InterviewDate (FK) = 4/1/2016
InterviewTime (FK) = 9:30
InterviewStaff (FK) = Sam
InterviewLocation (FK) = Room 1
InterviewStatus (FK) = 'Interview Invite'
OfferDate (FK) = null
OfferStatus (FK) = null
Start_date = 4/1/2016
End_date = 1/1/3500
Current_flag = Y
--- Day 1 Joe apply
Timestamp change it to create_date = 3/9/2016
Update_date = 3/9/2016
Course (FK) = null
Student (FK) = Joe
AppNumber (Degenerate) = 1
ApplicationStage (FK) = 'App Received'
InterviewDate (FK) = null
InterviewTime (FK) = null
InterviewStaff (FK) = null
InterviewLocation (FK) = null
InterviewStatus (FK) = null
OfferDate (FK) = null
OfferStatus (FK) = null
Start_date = 3/9/2016
End_date = 1/1/3500 ( good practice is to use date in the future to represent a current row so use 1/1/3500)
Current_flag = Y
--- Next day Joe got interview so you close that above row in the table.
create_date = 3/9/2016
Update_date = 4/1/2016
Course (FK) = null
Student (FK) = Joe
AppNumber (Degenerate) = 1
ApplicationStage (FK) = 'App Received'
InterviewDate (FK) = null
InterviewTime (FK) = null
InterviewStaff (FK) = null
InterviewLocation (FK) = null
InterviewStatus (FK) = null
OfferDate (FK) = null
OfferStatus (FK) = null
Start_date = 3/9/2016
End_date = 4/1/2016
Current_flag = N
--- Now Joe got interview on 4/1/2016. You insert a new row with interview info:
create_date = 4/1/2016
Update_date = 4/1/2016
Course (FK) = null
Student (FK) = Joe
AppNumber (Degenerate) = 1
ApplicationStage (FK) = 'Interview
InterviewDate (FK) = 4/1/2016
InterviewTime (FK) = 9:30
InterviewStaff (FK) = Sam
InterviewLocation (FK) = Room 1
InterviewStatus (FK) = 'Interview Invite'
OfferDate (FK) = null
OfferStatus (FK) = null
Start_date = 4/1/2016
End_date = 1/1/3500
Current_flag = Y
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Do I need multiple fact tables or dimensions
Thank you for this explanation. Sorry if you are having to repeat yourself, but just so that im clear in my head -
In terms of my transaction fact tables, I would have one for the overall application with a new row when the stage changes. A fact for the interviews, with a new row when the interview status or details changes or when a new interview slot is created, and a fact for the offers with a new row for each status or details change?
Then derive from these 3 facts an accumulating snapshot fact as per your explanation by acting like a SCD type 2 using start and end dates.
In terms of my transaction fact tables, I would have one for the overall application with a new row when the stage changes. A fact for the interviews, with a new row when the interview status or details changes or when a new interview slot is created, and a fact for the offers with a new row for each status or details change?
Then derive from these 3 facts an accumulating snapshot fact as per your explanation by acting like a SCD type 2 using start and end dates.
Scott- Posts : 17
Join date : 2016-03-07
Re: Do I need multiple fact tables or dimensions
Yes, you got it right.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Do I need multiple fact tables or dimensions
Thank you for your help
Scott- Posts : 17
Join date : 2016-03-07
Similar topics
» Multiple different grain fact tables with lot of common dimensions.
» Composite Dimension Natural Keys - Attribute Used In Multiple Dimensions as well as Fact Tables
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» modeling multiple fact tables
» Single fact table vs multiple fact tables - what is the right way in a dimensional model?
» Composite Dimension Natural Keys - Attribute Used In Multiple Dimensions as well as Fact Tables
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» modeling multiple fact tables
» Single fact table vs multiple fact tables - what is the right way in a dimensional model?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum