Modelling and design work for a Warehouse
2 posters
Page 1 of 1
Modelling and design work for a Warehouse
Hi All,
I am relatively new to the forum and would like to ask some questions here. So please excuse some of my basic questions.
I have a SQL server system which is more like an ODS in terms of how the reporting database is currently used today, now the idea is to have the ODS stage data temporarily for a week, then within that time transform the data to a DDS schema. Starting with the design, I have taken the following steps.
Created a split of the dimension tables and isolate them away from the fact tables. Populated the dimension table columns by having a group by from the ODS and referenced them with the necessary FK’s. Is this the correct way ? Some dimensions only have 1 column, in this case not worth having SCD 2/SCD 3 as any new updates would mean a new row in the dimension table.
Replaced all the date columns with DateID’s instead and used a role playing day date/day dimension table instead. The problem that I am facing here is that whilst I thought that all the dates in some columns don’t have any time element to it, I was actually wrong when I realised that about 1% of some of the date columns have a time element to it, and when the Date lookup runs it doesn’t match and then Nulls it. What is the best way to deal with situations like this, have a default value of -1 for unknown dates ? Ultimately, I guess one can argue that the data can be cleaned.
In the fact table, I have a lot of junk dimensions what I have done is that in some cases I created dimensions that can be grouped together. An example is this. On a fact table I have IndexSiteLocation, IndexCityLocation, IndexLocation, IndexJurisdiction all these columns are related and what has been done is the creation of a dimension table with 3 columns called IndexLocation
IndexLocationID int identity(1,1),
SourceLocation nvarchar(100)
LocationType (This will be ndexSiteLocation, IndexCityLocation,, IndexLocation and IndexJurisdiction) This is to identity each location type.
Lastly, for date columns with time, is it worth just leaving them as they are or perhaps create a separate time dimension. Will that also mean that to get the full date, one will have to join to the date table to get the day, then the time dimension to get the time.
I am relatively new to the forum and would like to ask some questions here. So please excuse some of my basic questions.
I have a SQL server system which is more like an ODS in terms of how the reporting database is currently used today, now the idea is to have the ODS stage data temporarily for a week, then within that time transform the data to a DDS schema. Starting with the design, I have taken the following steps.
Created a split of the dimension tables and isolate them away from the fact tables. Populated the dimension table columns by having a group by from the ODS and referenced them with the necessary FK’s. Is this the correct way ? Some dimensions only have 1 column, in this case not worth having SCD 2/SCD 3 as any new updates would mean a new row in the dimension table.
Replaced all the date columns with DateID’s instead and used a role playing day date/day dimension table instead. The problem that I am facing here is that whilst I thought that all the dates in some columns don’t have any time element to it, I was actually wrong when I realised that about 1% of some of the date columns have a time element to it, and when the Date lookup runs it doesn’t match and then Nulls it. What is the best way to deal with situations like this, have a default value of -1 for unknown dates ? Ultimately, I guess one can argue that the data can be cleaned.
In the fact table, I have a lot of junk dimensions what I have done is that in some cases I created dimensions that can be grouped together. An example is this. On a fact table I have IndexSiteLocation, IndexCityLocation, IndexLocation, IndexJurisdiction all these columns are related and what has been done is the creation of a dimension table with 3 columns called IndexLocation
IndexLocationID int identity(1,1),
SourceLocation nvarchar(100)
LocationType (This will be ndexSiteLocation, IndexCityLocation,, IndexLocation and IndexJurisdiction) This is to identity each location type.
Lastly, for date columns with time, is it worth just leaving them as they are or perhaps create a separate time dimension. Will that also mean that to get the full date, one will have to join to the date table to get the day, then the time dimension to get the time.
platforminc- Posts : 7
Join date : 2012-05-25
Re: Modelling and design work for a Warehouse
Your example is a little vague/confusing - but here are some thoughts.
- It sounds like you are tackling the modelling as a technical exercise ..... try and model from a business context, define the facts and dimensions, then determine the source-target mappings.
- regarding datetime, typical practice is to create separate dimensions DIM_DATE, DIM_TIME. If your fact requires TIME for analysis, then include the time dimension, otherwise the date dimension should suffice. If your data source is datetime, then your ETL will need to separate the date and time before the surrogate lookup. Every valid date should successfully lookup the correct surrogate key. Invalid and/or missing dates need to be accounted for in your design. (either reject the record, or map it to an "Unknown/Invalid" date dimension record).
LAndrews- Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada
Re: Modelling and design work for a Warehouse
Thanks for your reply.
When I look at the actual fact tables that we have put together, we see that the fact tables contain a lot of junk dimensions i.e flags, or dimensions where they only have one column describing them. Thats the reason why I ask.
When I look at the actual fact tables that we have put together, we see that the fact tables contain a lot of junk dimensions i.e flags, or dimensions where they only have one column describing them. Thats the reason why I ask.
platforminc- Posts : 7
Join date : 2012-05-25
Similar topics
» Time dimension design for cross time zone and custom calendar warehouse, 1min granularity
» Modelling a new warehouse
» Dimensional Modelling Design
» Work Order / Customer Order Design - Dimension or Fact
» Multi-tenant design for Data Warehouse
» Modelling a new warehouse
» Dimensional Modelling Design
» Work Order / Customer Order Design - Dimension or Fact
» Multi-tenant design for Data Warehouse
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum