Duplicating a field in more than one fact table.
5 posters
Page 1 of 1
Duplicating a field in more than one fact table.
Question about dimensional modeling.
Scenario:
I have a dimension called:
1. Dim_Application
And two fact tables:
1. Fact_ApplicationStatus
2. Fact_ApplicationConsolidation
Now the Dim_Application consists all the applications in our system.
Fact_ApplicationStatus contains all the statusses that each application found itself in at some point in time.
Fact_ApplicationStatus
ApplicationId StatusId StartDate EndDate Latest
1 1 2010/01/01 2010/01/15 0
1 2 2010/01/16 2010/02/05 0
1 3 2010/02/06 9999/12/31 1
2 1 2010/02/01 2010/02/18 0
2 2 2010/02/19 9999/12/31 1
Fact_ApplicationConsolidation contains the consolidation amounts for each application.
Fact_ApplicationConsolidation
ApplicationId StatusId Amount
1 3 5,000
2 2 8,000
The Question:
Is it incorrect to have StatusId duplicated in these tables?
In the first fact table all statusses are stored including the latest.
In the second table only the latest statusId is stored.
I have found it easier to do analysis in this way, since the business user does not have to also retrieve the latest status from the fact status table when analyzing consolidation information but can now group consolidation using the latest status.
any ideas? comments?
Scenario:
I have a dimension called:
1. Dim_Application
And two fact tables:
1. Fact_ApplicationStatus
2. Fact_ApplicationConsolidation
Now the Dim_Application consists all the applications in our system.
Fact_ApplicationStatus contains all the statusses that each application found itself in at some point in time.
Fact_ApplicationStatus
ApplicationId StatusId StartDate EndDate Latest
1 1 2010/01/01 2010/01/15 0
1 2 2010/01/16 2010/02/05 0
1 3 2010/02/06 9999/12/31 1
2 1 2010/02/01 2010/02/18 0
2 2 2010/02/19 9999/12/31 1
Fact_ApplicationConsolidation contains the consolidation amounts for each application.
Fact_ApplicationConsolidation
ApplicationId StatusId Amount
1 3 5,000
2 2 8,000
The Question:
Is it incorrect to have StatusId duplicated in these tables?
In the first fact table all statusses are stored including the latest.
In the second table only the latest statusId is stored.
I have found it easier to do analysis in this way, since the business user does not have to also retrieve the latest status from the fact status table when analyzing consolidation information but can now group consolidation using the latest status.
any ideas? comments?
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
My Summarized Question on the original topic
How / When does one make a call on whether to include a field in another fact table - when it is possible to derive it from another fact table - even if the user has to pull more dimensions and perform additional filtering?
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: Duplicating a field in more than one fact table.
Is there a status dimension? Otherwise, what does status ID refer to?
There is nothing wrong with using a dimension more than once in different fact tables.
There is nothing wrong with using a dimension more than once in different fact tables.
Re: Duplicating a field in more than one fact table.
First questions that come to mind are fact table grain and type. What is the grain of Fact_ApplicationStatus? One row per application, status, and date?
Also, in your first fact table, is "Latest" exactly the same as thing "StatusID", only just the latest status? If so, is it the latest status of all time? Of a particular date range?
The reason I ask is that you said you were storing the latest StatusID in the your second table (Fact_ApplicationConsolidation). If "Latest" is what I think it is, then you've got mixed grain in the Fact_ApplicationStatus table (which I think is the point of your question). Typically I'd store this as you have in the second fact table, with the latest status per application, and then I'd run a multipass SQL to get both counts. The first pass would get all statuses by application, the second pass would get the latest by application (which will be duplicated across any rows in the first query where you have more than one application and status).
So effectively then your first table is a periodic snapshot (with date) and your second table is an accumulating snapshot (without date). Mixing the grain is typically not advisable, but I've seen it done before, particularly for tables built specifically for canned reporting. It can, as you mentioned, make reporting easier. You just can't sum data of a higher grain (at least not without a known divisor) when it duplicates across data at a more detailed grain. Some reporting tools are smart enough to know this (Excel not being one of them).
We actually do this where I work now in a report-specific table. We store daily activity that happened that day alongside a count of all our active customers, up through and including that day. The activity is a daily snapshot, whereas the customers are an accumulating snapshot. It breaks some grain rules but the users know how the two work together. It also poses some problems with reporting tool semantic layers (like BO) that have to be addressed and overcome.
Speaking of periodic snapshot, it seems that your first fact table has variable date ranges. Typically with a periodic snapshot we'd see some sort of regularity, as in all the records belong to a particular "activity date" or something similar in nature. Maybe I don't understand your data well enough though. What does "StartDate" and "EndDate" represent in this table? I'm just curious how you go about updating existing fact records with the latest StatusID; I would assume that has to be done in a secondary process.
Also, in your first fact table, is "Latest" exactly the same as thing "StatusID", only just the latest status? If so, is it the latest status of all time? Of a particular date range?
The reason I ask is that you said you were storing the latest StatusID in the your second table (Fact_ApplicationConsolidation). If "Latest" is what I think it is, then you've got mixed grain in the Fact_ApplicationStatus table (which I think is the point of your question). Typically I'd store this as you have in the second fact table, with the latest status per application, and then I'd run a multipass SQL to get both counts. The first pass would get all statuses by application, the second pass would get the latest by application (which will be duplicated across any rows in the first query where you have more than one application and status).
So effectively then your first table is a periodic snapshot (with date) and your second table is an accumulating snapshot (without date). Mixing the grain is typically not advisable, but I've seen it done before, particularly for tables built specifically for canned reporting. It can, as you mentioned, make reporting easier. You just can't sum data of a higher grain (at least not without a known divisor) when it duplicates across data at a more detailed grain. Some reporting tools are smart enough to know this (Excel not being one of them).
We actually do this where I work now in a report-specific table. We store daily activity that happened that day alongside a count of all our active customers, up through and including that day. The activity is a daily snapshot, whereas the customers are an accumulating snapshot. It breaks some grain rules but the users know how the two work together. It also poses some problems with reporting tool semantic layers (like BO) that have to be addressed and overcome.
Speaking of periodic snapshot, it seems that your first fact table has variable date ranges. Typically with a periodic snapshot we'd see some sort of regularity, as in all the records belong to a particular "activity date" or something similar in nature. Maybe I don't understand your data well enough though. What does "StartDate" and "EndDate" represent in this table? I'm just curious how you go about updating existing fact records with the latest StatusID; I would assume that has to be done in a secondary process.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Duplicating a field in more than one fact table.
Hi,
The Status Fact contains also the status duration of the previous status.
So it could look like this:
Fact_ApplicationStatus
Where the Duration indicates for how long the application has been in the status.
and current shows you which status is current.
Bye,
Ian
The Status Fact contains also the status duration of the previous status.
So it could look like this:
Fact_ApplicationStatus
[Id] | [ApplicationId] | [StatusId] | [Duration] | [Current] |
1 | 111 | 1 | 5 | 0 |
2 | 111 | 2 | 4 | 0 |
3 | 111 | 3 | 5 | 1 |
4 | 222 | 1 | 2 | 1 |
Where the Duration indicates for how long the application has been in the status.
and current shows you which status is current.
Bye,
Ian
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: Duplicating a field in more than one fact table.
@ian
i see lot of benefit if you can keep latest status in the dim acct table instead of fact App Consolidation table.
it will help you to perform analysis both on application status and also on App Consolidation facts.
Regards
Shiv
i see lot of benefit if you can keep latest status in the dim acct table instead of fact App Consolidation table.
it will help you to perform analysis both on application status and also on App Consolidation facts.
Regards
Shiv
sgudavalli- Posts : 29
Join date : 2010-06-10
Age : 40
Location : Pune, India
Re: Duplicating a field in more than one fact table.
I think I more clearly see your question. I probably complicated things a bit.
Your original question was if there was anything wrong with using the same field in two different fact tables (StatusID). As ngalemmo stated, there's nothing wrong with using a dimension in multiple fact tables. That's the whole point of the bus architecture and conformed dimensions.
StatusID appears to me to be a degenerate dimension (dimension with no attributes). There's no problem storing this degenerate dimension across multiple fact tables. Functionally it's no different than storing the surrogate for a dimension table across multiple fact tables; you just eliminate the join to a dimension table. You'll then change your SQL to pull the right StatusID field depending on which fact table you're querying (or you'll have the semantic layer of your reporting tool do it for you).
Your original question was if there was anything wrong with using the same field in two different fact tables (StatusID). As ngalemmo stated, there's nothing wrong with using a dimension in multiple fact tables. That's the whole point of the bus architecture and conformed dimensions.
StatusID appears to me to be a degenerate dimension (dimension with no attributes). There's no problem storing this degenerate dimension across multiple fact tables. Functionally it's no different than storing the surrogate for a dimension table across multiple fact tables; you just eliminate the join to a dimension table. You'll then change your SQL to pull the right StatusID field depending on which fact table you're querying (or you'll have the semantic layer of your reporting tool do it for you).
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Re: Duplicating a field in more than one fact table.
It looks to me your Fact_ApplicationStatus is an accumulating snapshot. Kimball has a relevant discussion in his book "The Complete Guide to Dimensional Modeling", Chapter 12, Education. The fact table may be called Application Pipeline Accumulating Fact with role playing date keys for all the statuses. The grain of the fact is one row per application. Therefore you would have ApplicationKey, Status1DateKey, Status2DateKey ... etc. in your fact table without introducing a dimension for status.
It's important to use surrogate date keys pointing to a role playing date dimension as you would have many undefined date keys when an application enters the pipeline. A date dimension would also give a rich set of calendar information for your analysis. You would periodically update the fact table as the application progresses fulfilling those undefined dates.The duration can be a derived measure and dollar amount should be just a normal measure in the same fact table. Hope this will help
It's important to use surrogate date keys pointing to a role playing date dimension as you would have many undefined date keys when an application enters the pipeline. A date dimension would also give a rich set of calendar information for your analysis. You would periodically update the fact table as the application progresses fulfilling those undefined dates.The duration can be a derived measure and dollar amount should be just a normal measure in the same fact table. Hope this will help
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» How to handle date field with a null value in the fact table ?
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» How to track the change of a specific field in a dimension table?
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Integrating new fact table which has one to many relationship with the main fact table in existing star schema
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» How to track the change of a specific field in a dimension table?
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Integrating new fact table which has one to many relationship with the main fact table in existing star schema
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum