To store dates or a reference to the date dimension?
4 posters
Page 1 of 1
To store dates or a reference to the date dimension?
I'm currently designing a data warehouse for an insurance company. As you can imagine, this involves storing a lot of dates. I don't know whether to store them in a fact table (using a reference to the date dimension) or storing the date in the dimension itself.
For example:
Fact table Benefit (information about how much is paid to a victim after a claim)
- amount paid per month
- reserve (amount reserved for future payments)
Dimension: Claim
-...
-Declaration date
-Recognition date
-Begin date
-Statut date
Dimension: Benefit
-...
I can't really put those dates in the benefit fact table as they're not related (nor do i have any facts related to Claims) but i will have to make analyses using those dates. (For example how many new claims in the last month etc). How do I solve this? Should i attempt to move those dates to a fact table (create a new fact table then solely for dates?) or just leave the dates just as datatype "date" in the claim dimension?
For example:
Fact table Benefit (information about how much is paid to a victim after a claim)
- amount paid per month
- reserve (amount reserved for future payments)
Dimension: Claim
-...
-Declaration date
-Recognition date
-Begin date
-Statut date
Dimension: Benefit
-...
I can't really put those dates in the benefit fact table as they're not related (nor do i have any facts related to Claims) but i will have to make analyses using those dates. (For example how many new claims in the last month etc). How do I solve this? Should i attempt to move those dates to a fact table (create a new fact table then solely for dates?) or just leave the dates just as datatype "date" in the claim dimension?
Andy Nieubourg- Posts : 6
Join date : 2011-09-06
Re: To store dates or a reference to the date dimension?
nor do i have any facts related to Claims
Are you sure about that? Fact tables represent business events or states. Placing a claim is certainly a business event, and there should be a fact table to represent it and the process around handling the claim.
There will be dates that need a dimension, such as the date of the event, while possibly others that sit on a dimension that are just dates... ones that don't need dimensional attributes, such as birth date, effective date and so on.
I would rethink your model.
Conformity
I agree with the previous comment but would like to add my 2 cents (warning – that may be all it’s worth).
It sounds like there are several granularities you’re dealing with (granularity = event, transaction, measurement, etc.). For a standard, non-analytically tuned model you’ll want to explore the idea of creating a fact table around each of those granularities.
From the perspective of an analyst or ad-hoc report developer it’s sure handy to have dates stored in the date dimension with the attributes specific to that organization. For instance, I’ve build date dimensions including everything from lunar cycles to differing definitions of season and fiscal year. Having all this date related material in one dimension is almost a must to support conformity among multiple and possibly unrelated subject areas.
Which brings my bit full circle; you’ve described items are not related. One of the primary purposes of a dimension is to create conformity among the fact tables. Seemingly unrelated items can be linked via conformed dimensions such as date. By selecting a single date from the date dimension a user should be able to tabulate (from different fact tables) how many accidents occurred, claims submitted, persons injured, payments made, premiums billed, premiums collected, etc… anything having that date. A single swipe at a well build EDW should produce this type of analysis almost instantly.
I would imagine the model should look like this:
Fact table would include
-amount paid per month
-amount paid year to date (rolled up in ETL)
-amount paid per year (rolled up in ETL)
-any other imaginable aggregation of amount...
-reserve (amount reserved for future payments)
-Declaration date (surrogate key)
-Recognition date (surrogate key)
-Begin date (surrogate key)
-Status date (surrogate key)
-Benefit (surrogate key)
-Customer(surrogate key)
-location(surrogate key)
-Other claim attributes(surrogate key)
Dimensions would include:
-Date Attributes
-Benefit Attributes
-Customer Attributes
-location Attributes
-Other claim Attributes (junk dimension)
-others dimensions as required
Hope this helps… if not, well… it was free advice and, as modelers often say…, “at least we have one more wrong answer out of the way.”
It sounds like there are several granularities you’re dealing with (granularity = event, transaction, measurement, etc.). For a standard, non-analytically tuned model you’ll want to explore the idea of creating a fact table around each of those granularities.
From the perspective of an analyst or ad-hoc report developer it’s sure handy to have dates stored in the date dimension with the attributes specific to that organization. For instance, I’ve build date dimensions including everything from lunar cycles to differing definitions of season and fiscal year. Having all this date related material in one dimension is almost a must to support conformity among multiple and possibly unrelated subject areas.
Which brings my bit full circle; you’ve described items are not related. One of the primary purposes of a dimension is to create conformity among the fact tables. Seemingly unrelated items can be linked via conformed dimensions such as date. By selecting a single date from the date dimension a user should be able to tabulate (from different fact tables) how many accidents occurred, claims submitted, persons injured, payments made, premiums billed, premiums collected, etc… anything having that date. A single swipe at a well build EDW should produce this type of analysis almost instantly.
I would imagine the model should look like this:
Fact table would include
-amount paid per month
-amount paid year to date (rolled up in ETL)
-amount paid per year (rolled up in ETL)
-any other imaginable aggregation of amount...
-reserve (amount reserved for future payments)
-Declaration date (surrogate key)
-Recognition date (surrogate key)
-Begin date (surrogate key)
-Status date (surrogate key)
-Benefit (surrogate key)
-Customer(surrogate key)
-location(surrogate key)
-Other claim attributes(surrogate key)
Dimensions would include:
-Date Attributes
-Benefit Attributes
-Customer Attributes
-location Attributes
-Other claim Attributes (junk dimension)
-others dimensions as required
Hope this helps… if not, well… it was free advice and, as modelers often say…, “at least we have one more wrong answer out of the way.”
KS_EDW- Posts : 20
Join date : 2011-09-07
Age : 49
Location : Kansas
Re: To store dates or a reference to the date dimension?
Thank you both. I added several Factless fact tables which signify a business event, i don't have any real facts but I will add the dates that way.
Andy Nieubourg- Posts : 6
Join date : 2011-09-06
Re: To store dates or a reference to the date dimension?
Clearly you do have facts in this business process, but they are presented at highly aggregated levels. I think you should engage in the business and find out all facts at lowest possible granular levels. For instance, on which claim and what date the amount has been paid and reserved. You then go to the OLTP (data entry) system to find out all the relevant detailed data related under the business context.Andy Nieubourg wrote:Fact table Benefit (information about how much is paid to a victim after a claim)
- amount paid per month
- reserve (amount reserved for future payments)
Then you can determine how you should have the fact tables to meet the business requirements. You may have a transaction grain fact table with relevant dimension keys on periodic (eg. daily) basis, meaning you only load the fact record that has the amount paid or reserved on the day. Now you have the base table to aggregate up to your high level facts, say the monthly snapshot fact table in your initial post. Bear in mind that many fact tables may only have the count as the fact, which is equally, if not more, valuable than explicit numeric measures. Hopefully this will give you some starting point.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Date Dimension: Representing partial dates/Imputing date values
» Date dimension multiple dates
» Fact dates before begin date of Dimension
» Too many dates on fact - Is there such a thing as Junk Date dimension
» Fact with eff / exp dates referencing dimension with eff / exp dates
» Date dimension multiple dates
» Fact dates before begin date of Dimension
» Too many dates on fact - Is there such a thing as Junk Date dimension
» Fact with eff / exp dates referencing dimension with eff / exp dates
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum