SQL Server 2008 Date data type as dimension key
+4
Jeff Smith
Alan Musnikow
DataWhisperer
VHF
8 posters
Page 1 of 1
SQL Server 2008 Date data type as dimension key
We are getting ready to migrate from Microsoft SQL Server 2005 to 2008. Currently we use an integer (in YYYYMMDD format) as the SK on our date dimension table. I am very tempted to start using the new 3-byte Date data type (which stores only the date with no time information) as the SK for the date dimension.
Anyone have first-hand experience with the new SQL 2008 Date data type? Have you bumped into any tools that don't support it properly?
Anyone have first-hand experience with the new SQL 2008 Date data type? Have you bumped into any tools that don't support it properly?
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
SQL Server 2008 Date data type as dimension key
I think that this is a good idea because in 90% of our queries this would eliminate the need to join to the date dimension. Sometimes a date is just a date. We are modeling processes which are date intensive and so many of our fact tables require many (6 to 10) joins to the date dimension. Too many joins can slow down a query no matter how efficient they are. Besides I think that using the date datatype as the SK would offer the best of both worlds. You could join to the date dimension if you need to, but only when you need to. We could even use our existing date dimension as is because we already have a unique constraint on the DateValue field and it is already the date data type.
To me the real issue would be how to deal with pseudo nulls. With ints we can use -1 as unknown and 0 as not applicable (or whatever), but with the date data type I'm not sure what we would use. We could use real nulls but I hate the idea. We could use pseudos such as 1/1/1900 but those too often end up creating classic date duration miscalculations such as this: "Hey why is that order 110 years overdue?". I think I'm liking the real nulls better.
I'd really like to know what folks think of this.
To me the real issue would be how to deal with pseudo nulls. With ints we can use -1 as unknown and 0 as not applicable (or whatever), but with the date data type I'm not sure what we would use. We could use real nulls but I hate the idea. We could use pseudos such as 1/1/1900 but those too often end up creating classic date duration miscalculations such as this: "Hey why is that order 110 years overdue?". I think I'm liking the real nulls better.
I'd really like to know what folks think of this.
DataWhisperer- Posts : 1
Join date : 2010-03-25
"surrogate key is an artificial or synthetic key"
The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence states on page 285:
If "SK" means "surrogate key", a Date data type cannot be an SK of a date dimension since a date is a date and not a surrogate for a date. Nevertheless, a Date data type could be a primary key for a date dimension table and a foreign key to that table from fact tables.A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key.
Performance on Joins using the 3 Byte Date Data Type
I would take into consideration the performance of joining on the 3 byte date field. Integers tend to join quickly and a Date Dimension tends to be short so joins to a date dimension should be really fast. And if you had to select for all rows for a particular year, would it be faster to put a where statement on the Year field in the Date Dimension and then do the join or is it quicker for SQL Server 2008 to convert the 3 Byte Date field into a Year value and then apply the join? If you had an index on a Date Field in the fact table and you wanted all of the rows for 2008, would it know to grab the rows from 1/1/2008 to 12/31/2008 or would it have to convert all of the dates to the Year and then look at every row, effectively ignoring the indexing?
Try it. Do a select with the filter on YEAR(Datefield) = 2008 vs where Datefield between 1/1/2008 and 12/31/2009. Also try it with the date field joining to the date dimension, ect.
Just for giggles, if you are experimenting, try making the date dimension an indexed view and join on the fact table with an integer. I would be interested to see if there is any performance improvement, which I doubt because the date dimension is so small but it would still be iunterested.
By the way, if you're using SQL 2008, shouldn't all of your queries be going against the Cubes?
Try it. Do a select with the filter on YEAR(Datefield) = 2008 vs where Datefield between 1/1/2008 and 12/31/2009. Also try it with the date field joining to the date dimension, ect.
Just for giggles, if you are experimenting, try making the date dimension an indexed view and join on the fact table with an integer. I would be interested to see if there is any performance improvement, which I doubt because the date dimension is so small but it would still be iunterested.
By the way, if you're using SQL 2008, shouldn't all of your queries be going against the Cubes?
Jeff Smith- Posts : 471
Join date : 2009-02-03
DateKey in DimDate
Hi
Today my partner asked me why we need DateKey in DimDate and I thought because integer is smaller than datetime but now we have date which is smaller than integer, and I was given link to this topic which seems to be very relevant.
One of the comments is "A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key."
In my opinion Date shouldn't follow this rule and my reasoning is that although natural keys are substituted with surrogate keys because they may be occasionally 'unstable' and several other reason Date is an expeption because it is and always will be fixed. 1st of Jan 2010 will always be 1st of Jan 2010.
I think Date as PK might also simplify certain processes but also it is important to have backward compatibility (old approach) so I doubt this approach will suddenly flood all systems.
Regards
Emil
Today my partner asked me why we need DateKey in DimDate and I thought because integer is smaller than datetime but now we have date which is smaller than integer, and I was given link to this topic which seems to be very relevant.
One of the comments is "A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key."
In my opinion Date shouldn't follow this rule and my reasoning is that although natural keys are substituted with surrogate keys because they may be occasionally 'unstable' and several other reason Date is an expeption because it is and always will be fixed. 1st of Jan 2010 will always be 1st of Jan 2010.
I think Date as PK might also simplify certain processes but also it is important to have backward compatibility (old approach) so I doubt this approach will suddenly flood all systems.
Regards
Emil
itcouple- Posts : 45
Join date : 2010-10-13
Re: SQL Server 2008 Date data type as dimension key
What value would you use in your fact/date dimension table to represent a null date?
I think date dimensions are unlike any other type of dimension. I think dates are frequently used to create partitions so using a pure surrogate key in the date dimension can make it harder to define the partitions.
My date dimension uses a pseudo surrogate. It's an integer with the earliest date in the dimension set to 1, the next oldest set to 2, etc. Some prefer using an integer representation of the date, for example 20101013. I think my approach allows me to substract date dimension values from each other to calculate the number of days. But that's just me.
I think date dimensions are unlike any other type of dimension. I think dates are frequently used to create partitions so using a pure surrogate key in the date dimension can make it harder to define the partitions.
My date dimension uses a pseudo surrogate. It's an integer with the earliest date in the dimension set to 1, the next oldest set to 2, etc. Some prefer using an integer representation of the date, for example 20101013. I think my approach allows me to substract date dimension values from each other to calculate the number of days. But that's just me.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: SQL Server 2008 Date data type as dimension key
The main reason for using a surrogate key is to allow for invalid or unexpected values. Using a date data type as a key does not provide a means to reference invalid dates. Physical size of the data field itself (date versus number) is insignificant and is not a reasonable argument to endorse one approach over the other. Neither is 'eliminating joins', in that, if the database properly implements a star join strategy (many of them do, including SS 2008) the greater number of dimensions help, rather than hinder, performance.
Re: SQL Server 2008 Date data type as dimension key
I understand what the idea is behind using SK's for dimension tables. However, normally a date dimension has a SK, a column of type date/datetime and a character column for the name. One thing I have a hard time understanding is in the case of a date dimension is that even when using a SK you still need to populate the column of type date with a fake date. For example if you have SK of -1 for Unknown, you are still forced to have say 1/1/1900 for the column of type date in the date dimension table. In other words, even when using a SK, you are still forced to store fake dates for invalid dates. The only differece is that instead of putting that fake date in your fact table you are putting it in your dimension table unless you don't even bother with a date/datetime column in the date dimension and only have a character field for the name.
Am I losing my mind of I have a point here
Am I losing my mind of I have a point here
arowshan- Posts : 23
Join date : 2011-10-18
Location : Vancouver, Canada
Re: SQL Server 2008 Date data type as dimension key
my DateDim table has date_key (int)(not null), date_type (varchar(10))(not null), full_date (datetime)(null), YYYYMMDD (char(8))(null), and a bunch of other date breakdowns. I've pre-populated my date dim with everything within our companies valid date range as well as 100 years into the future. Since full_date is nullable, when I come across date not found, I populate a new record with the next negative SK, and fill in what I can. My blank date (-1, 'INVALID', null, ' ',.....). Hope this makes sense.
TheNJDevil- Posts : 68
Join date : 2011-03-01
Similar topics
» ETL from Oracle to SQL Server 2008 Data Warehouse
» scd2 effective date, end date data type
» Configuring SQL Server 2008 R2 Change Data Capture with SSAS
» Type-2 Dates as Date Data Type ?
» where to keep sql server data type text
» scd2 effective date, end date data type
» Configuring SQL Server 2008 R2 Change Data Capture with SSAS
» Type-2 Dates as Date Data Type ?
» where to keep sql server data type text
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum