Dates in many dimensions
4 posters
Page 1 of 1
Dates in many dimensions
Hello,
I am new to DW. Currently, I am working on a Human Resource database. There are many tables in the source systems which stores dates e.g. dates about contract start, finish, Vacations of an employee, temporary service suspension and many more.
In this situation is it better to use a Date dimension and reference it in the dimensions which stores the dates or I should simply include the dates in the dimensions?
thanks,
rf
I am new to DW. Currently, I am working on a Human Resource database. There are many tables in the source systems which stores dates e.g. dates about contract start, finish, Vacations of an employee, temporary service suspension and many more.
In this situation is it better to use a Date dimension and reference it in the dimensions which stores the dates or I should simply include the dates in the dimensions?
thanks,
rf
rf001- Posts : 23
Join date : 2010-12-16
Re: Dates in many dimensions
I prefer to use a snowflake design when it comes to dates on other dimensions.
I figure there are 3 possible options. 1) Snowflake to a date dimension, 2) Put the date on the dimension table, 3) Mixed, depending upon how it will be used.
In my opinion, consistency very important so option 3 ranks at the bottom. There are times when I need the extended information in the date dimension to be avaliable for the date in the non-date dimension, which would exclude option 2. That leaves option 1.
In my database, a calendar date is only found in the Date Dimension.
I figure there are 3 possible options. 1) Snowflake to a date dimension, 2) Put the date on the dimension table, 3) Mixed, depending upon how it will be used.
In my opinion, consistency very important so option 3 ranks at the bottom. There are times when I need the extended information in the date dimension to be avaliable for the date in the non-date dimension, which would exclude option 2. That leaves option 1.
In my database, a calendar date is only found in the Date Dimension.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dates in many dimensions
If there is no time (hour, minute et.) involved, I agree with Jeff, have FK for most date attributes in dimensions. Not quite sure about things like birthdate, I would leave it as calendar date.
If the time of the day is significant, use Datetime field directly instead of trying to snowflake the dimension for the time field in any form. Of course you would still snowflake other date fields in the same dimension. This arrangement even applies to fact tables unless the time unit is well defined. The worst thing is to extend a date dimension into a date-time dimension only because some attributes involve time parts. Create a separate time unit dimension if the granularity does not go below a second and you need some descriptions for different time scales, otherwise just leave it as Datetime type.
If the time of the day is significant, use Datetime field directly instead of trying to snowflake the dimension for the time field in any form. Of course you would still snowflake other date fields in the same dimension. This arrangement even applies to fact tables unless the time unit is well defined. The worst thing is to extend a date dimension into a date-time dimension only because some attributes involve time parts. Create a separate time unit dimension if the granularity does not go below a second and you need some descriptions for different time scales, otherwise just leave it as Datetime type.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Dates in many dimensions
Thanks a lot for replies. It was very helpful.
rf001- Posts : 23
Join date : 2010-12-16
Re: Dates in many dimensions
There are dates that are just simply dates and dates that you need date dimension attributes for analysis. Birthdate, usually falls into the former, so would something like hire date.
In the latter, if these dates are significant, they more than likely belong in the fact table relating to the event. For example, in an employee action fact table, hiring would be one such action, so the hire date would be the effective date of that action (hire). But this does not mean it is the sole place for that value. Obviously, hire date (and rehire date) should also be attributes in the dimension for reference purposes.
If you were to do a report of how many people were hired during a period, you would look for hire actions in the action fact table and report on effective date (which references the date dimension), not by running a report off the employee dimension.
In the latter, if these dates are significant, they more than likely belong in the fact table relating to the event. For example, in an employee action fact table, hiring would be one such action, so the hire date would be the effective date of that action (hire). But this does not mean it is the sole place for that value. Obviously, hire date (and rehire date) should also be attributes in the dimension for reference purposes.
If you were to do a report of how many people were hired during a period, you would look for hire actions in the action fact table and report on effective date (which references the date dimension), not by running a report off the employee dimension.
Re: Dates in many dimensions
I just came across a design that has a fact table with 13 date keys and 30 other dimension keys. The date dimension contains more than 200,000 rows dating from 18500101 to 24991231. I think someone was so confident that the system could be still useful in next a few centuries.
Now there is a reporting requirement that many calendar dates are needed in the fact, so that I have to join the fact with date dimension a few times. As the date dimension in this case is not a relatively small dimension, each join will have performance penalty. I could also avoid the annoying joins by self deriving the date from the smart surrogate key, but not sure if the conversion would outperform the joins.
When I saw this fact table, my initial impression is Centipede fact, as the number of dimensions far exceeds Kimball’s rule of thumb figure 25. Redesigning the fact structure and have many dates wrapped up in some combined dimensions will reduce the number of joins significantly.
However it raises a question about the guideline on what data type we should use for date fields in dimension and fact tables. Generally we should use date keys in the fact to cater for non-existence and other unusual date meanings. Whereas in dimensions, we have options of date key or calendar date because there would not be any referential issue as in fact table. Sometimes snowflaking all the date attributes in a dimension may make queries unnecessarily complicated by too many joins with date dimension and hence defeat one of the important purposes of dimensional modelling, ease of use.
I am kind of turning against my previous post, and possibly Jeff’s as well, advocating using date FKs in dimensions. It would simplify a lot of queries if we just use straight calendar date as date attributes in dimensions which would allow NULL value as well, and if necessary, connect it to date dimension by the calendar date instead of date key.
Now there is a reporting requirement that many calendar dates are needed in the fact, so that I have to join the fact with date dimension a few times. As the date dimension in this case is not a relatively small dimension, each join will have performance penalty. I could also avoid the annoying joins by self deriving the date from the smart surrogate key, but not sure if the conversion would outperform the joins.
When I saw this fact table, my initial impression is Centipede fact, as the number of dimensions far exceeds Kimball’s rule of thumb figure 25. Redesigning the fact structure and have many dates wrapped up in some combined dimensions will reduce the number of joins significantly.
However it raises a question about the guideline on what data type we should use for date fields in dimension and fact tables. Generally we should use date keys in the fact to cater for non-existence and other unusual date meanings. Whereas in dimensions, we have options of date key or calendar date because there would not be any referential issue as in fact table. Sometimes snowflaking all the date attributes in a dimension may make queries unnecessarily complicated by too many joins with date dimension and hence defeat one of the important purposes of dimensional modelling, ease of use.
I am kind of turning against my previous post, and possibly Jeff’s as well, advocating using date FKs in dimensions. It would simplify a lot of queries if we just use straight calendar date as date attributes in dimensions which would allow NULL value as well, and if necessary, connect it to date dimension by the calendar date instead of date key.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Are dates degenerated dimensions?
» Dealing with empty/missing dates in dimensions
» Start and Finish dates and role-playing dimensions
» Dimensions with effective and end dates - best design and best way to join to fact tables?
» Fact with eff / exp dates referencing dimension with eff / exp dates
» Dealing with empty/missing dates in dimensions
» Start and Finish dates and role-playing dimensions
» Dimensions with effective and end dates - best design and best way to join to fact tables?
» Fact with eff / exp dates referencing dimension with eff / exp dates
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum