Dummy dimension values in the fact table
+2
elmorejr
noodleeater
6 posters
Page 1 of 1
Dummy dimension values in the fact table
Hi
I had a question around dummy values in the fact table.
Lets say:
Fact Sales
Dim Date
Dim Product
Dim Salesman
Also, lets say, my transaction system for sales does not have a unique transaction ID but instead has date, product and salesman that uniquely define a row.
If one of the dimension values (say salesman) in the source is unknown, it set the surrogate key in the fact table to -1
But what happens when this record is updated in the source? i.e. the salesman ID is now populated in the transaction system for the record?
I would have no option but to truncate and load my fact table to be able to reflect that update, right?
I wouldnt think there is any other way to remove the -1 surrogate key and update it with the correct one as in the source.
This is quite cumbersome if your transaction table has 20 million rows!
Can someone think of any other way?
I had a question around dummy values in the fact table.
Lets say:
Fact Sales
Dim Date
Dim Product
Dim Salesman
Also, lets say, my transaction system for sales does not have a unique transaction ID but instead has date, product and salesman that uniquely define a row.
If one of the dimension values (say salesman) in the source is unknown, it set the surrogate key in the fact table to -1
But what happens when this record is updated in the source? i.e. the salesman ID is now populated in the transaction system for the record?
I would have no option but to truncate and load my fact table to be able to reflect that update, right?
I wouldnt think there is any other way to remove the -1 surrogate key and update it with the correct one as in the source.
This is quite cumbersome if your transaction table has 20 million rows!
Can someone think of any other way?
noodleeater- Posts : 1
Join date : 2011-12-12
Regular reload
One option is a regular reload. For example, if you know that the Salesperson ID is usually "filled in" within 30 days of the transaction, you can implement a rolling 30-day reload. On each load, remove the most recent 30 days and then reload data for those 30 days.
This will give the salesperson assignment time to catch up.
If you have appropriate partitioning, removing a range of data should not be an issue.
This will give the salesperson assignment time to catch up.
If you have appropriate partitioning, removing a range of data should not be an issue.
elmorejr- Posts : 25
Join date : 2011-10-20
Location : Columbus, Ohio, United States of America, Earth, Sol
Re: Dummy dimension values in the fact table
May be rejecting that record untill salesman id is populated would be the better way.
If there is an update date column in source system, you may use transaction date + update date to get delta data from source. You get the record when created, if it is rejected you give another chance when updated.
If there is an update date column in source system, you may use transaction date + update date to get delta data from source. You get the record when created, if it is rejected you give another chance when updated.
umutiscan- Posts : 51
Join date : 2010-11-26
Age : 44
Location : Istanbul, Turkey
Re: Dummy dimension values in the fact table
This is a business rules issue. If there should always be a salesman, why would salesman be unspecified? Is it really 'officially' an order or just an indication from the customer? Umutiscan's comment that maybe you shouldn't load it yet may be the correct approach. When you are dealing with a sales force, typically an order isn't an order until it has been accepted. The other stuff is just "pipeline".
I also find it difficult to understand why there isn't an identifier for an order. The source system must have something that it uses to keep things straight. Is this a hypothetical question?
The idea of generating net change rows is probably the best option as it avoids having to update the fact table. It gives you the ability to state and order at any point in time, the downside is there are more rows in the table. How much of an impact the extra rows may have on query performance depends on how often the order changes.
I also find it difficult to understand why there isn't an identifier for an order. The source system must have something that it uses to keep things straight. Is this a hypothetical question?
The idea of generating net change rows is probably the best option as it avoids having to update the fact table. It gives you the ability to state and order at any point in time, the downside is there are more rows in the table. How much of an impact the extra rows may have on query performance depends on how often the order changes.
Dummy dimension values in the fact table
We run into a similar situation often in my current role. We often get upc codes from our point of sale system, but don't know what the other attributes are for these items (no description, size, name, etc.). In this case, we create an item record with a new surrogate key, where the business key (upc) is set, but most of the other item attributes are "unknown".
Usually, we will eventually get the descriptive attributes for the item. When we do, we track these attribute changes as normal scd attribute changes.
Usually, we will eventually get the descriptive attributes for the item. When we do, we track these attribute changes as normal scd attribute changes.
pcs- Posts : 20
Join date : 2009-02-03
Re: Dummy dimension values in the fact table
pcs wrote:We run into a similar situation often in my current role. We often get upc codes from our point of sale system, but don't know what the other attributes are for these items (no description, size, name, etc.). In this case, we create an item record with a new surrogate key, where the business key (upc) is set, but most of the other item attributes are "unknown".
Usually, we will eventually get the descriptive attributes for the item. When we do, we track these attribute changes as normal scd attribute changes.
It is a good way to do it if you are getting a business key (in this case UPC). My impression from the original post was that no business key was assigned when the information was extracted for loading into the DW. In such cases, you can't create a dimension row because you have nothing to identify it other than assigning it to a default 'unknown' row.
Re: Dummy dimension values in the fact table
The source system has to be able to track the transaction. How else can it append the missing info later in the process?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Similar topics
» How to Handle a value in a fact table that can have multiple dimension values
» Source Values and Conformed Values in the Dimension table
» Lab Result values in Fact Table has int and non int values
» Adding dummy Fact records to a Fact_SurveyAnswer table
» NULL Values in Fact Table
» Source Values and Conformed Values in the Dimension table
» Lab Result values in Fact Table has int and non int values
» Adding dummy Fact records to a Fact_SurveyAnswer table
» NULL Values in Fact Table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum