Modelling Question

Steven Thu Feb 05, 2009 1:17 am

Hi. I would welcome any input anyone has on the following modelling question. Apologies in advance for the length of this post, I have tried to keep it short, but it is difficult while being accurate.

Scenario

We have a snapshot fact table which is loaded once per quarter from the ‘real’ transaction fact table and supplemented with a quarter key. This is done for compliance reasons. The dimensions which connect to this snapshot fact are all type 2, are loaded daily, and are also used elsewhere in the warehouse.

We have been presented with a new requirement which is as follows. Users want to be able to report on dimension values as they are at the end of the quarter as though that’s how they were for the whole quarter.

For example, suppose ‘product’ is a dimension. If a product name changes in Q1 and then once more in Q2, users want all Q1 facts to be reported using the 2nd version of the product description and all Q2 facts reported using the 3rd version.

We have come up with the following proposed solutions:

Proposed Solution 1:

Add new surrogate keys to the snapshot fact table for each dimension. Rather than use the transaction date when looking up the dimension to obtain the surrogate key, use the last date of the quarter to obtain the version applicable as of that date.

Proposed Solution 2:

Create a new version of each dimension. This is loaded with the surrogate key values for all versions of each entity (e.g. a product), but the attributes are only the most recent version from each quarter. This dimension now has a composite key of the quarter key as well as the surrogate key (e.g. product key). This is loaded after the end of each quarter.

Does anyone have any thoughts on these proposals or other methods?

Type2 Thu Feb 05, 2009 12:34 pm

All you need to do is look up the Dim values using your snapshot date rather than transaction date when building your quarter snapshot...you don't need new surrogate keys added to your table (unless I'm misunderstanding your design or question).

Joy Thu Feb 05, 2009 1:01 pm

If all users of the snapshot table want to see only the attributes as of quarter-end, then Type2 (amusing name by the way) is correct: simply use that as the surrogate key. Likely, some people also want a more "normal" view of the dimension attributes. In this case your Proposed Solution 1 (two sets of dimension keys) is the way to go.

I don't like Proposed Solution 2 (quarterly images of each dimension table). Business users will inevitably forget to constrain on the quarter. In addition, it's inelegant.

Type2 Thu Feb 05, 2009 3:01 pm

I guess my take on having two different "views" of the same quarter-end data might be confusing to end users and may both end up on a manager's desk as two versions of the truth. I've always made sure date rollups reflect the end state of data as of that snapshot...if people want to see trending or changes between two snapshots, that's what the more transactional stores are for.

But as always, if your end users want that flexibility and can be properly trained to apply the proper attributes/metrics, then you do what you have to...ETL team be damned.

Steven Thu Feb 05, 2009 5:59 pm

Thanks for your feedback. It was most helpful. I was leaning towards retaining both sets of surrogate keys without really considering the confusion this might cause. I will liaise further with the users. Great forum by the way. A nice addition to the site.

BoxesAndLines Tue Feb 10, 2009 10:40 am

Solution 1. This is a common method for providing a current and historical view. Why do you load dimensions every day if you only load the fact table once a quarter? None of the fact rows will ever reference any of the new dimension keys until the next fact load.

Steven Tue Feb 10, 2009 6:44 pm

Hi BoxesandLines.
Thanks for the input. The primary reason the dimensions are loaded daily is that they are used elsewhere in the warehouse (i.e. by other facts which are also loaded daily).

Purushothaman.VS Wed Feb 11, 2009 2:33 am

Hi

If the problem is of presenting the data to the user as it was at that point of time, that is the name of a particular product to be what it was in Q1 with the data of the Q1 and what it was in Q2 with the data of the Q2, maintaining the Product dimension in Type 2 should solve the problem easily.
I am not sure whether the solution im suggesting is similar to the Proposed Solution 1, but i have just tried to illustrate the solution using an example.
The table below is the Product Dimension maintained in Type 2

ProductID	ProductCode	ProductName	ActivationDate	ExpiryDate
1	P01	ProductA	Jan-07	Dec-2007
2	P01	ProductA_Q1	Jan-08	Mar-2008
3	P01	ProductA_Q2	Apr-08	Dec-2050

The table below is a dummy fact table, i have added the columns like Product Name, Quarter and Year for the ease of understanding.

ProductID	Product Name	Snapshot Date	Measure 1	Quarter	Year
1	ProductA	31-May-07	59	Q2	2007
1	ProductA	6-Oct-07	24	Q4	2007
2	ProductA_Q1	1-Jan-08	75	Q1	2008
3	ProductA_Q2	1-Apr-08	35	Q2	2008

If you notice, for the snapshot of First Quarter of 2008, the product name would be as it was during the quarter 1 and similarly for the Second Quarter.
Hence, the idea is to add new rows to the Product Dimension table with new Surrogate keys, which would automatically be taken forward to the fact when the snapshots get loaded.

Hope it is of some help

slaborda Wed Feb 18, 2009 11:45 am

Hi,

I want to propouse you some alternatives because i don't like add new FK in order to have a concret temporal view, maybe is the solution but we cas discuss some alternatives if you want.
My question is, if you want view 10 differents time versions for your fact-table, the solution is 10 versions of FK's ??????

Alternative 1)The dimesion product "joins or links" with the alternative time version
Product description (actual)
temporal_view (Contais the rowid of the temporal point of view of product dimension)

This alternative is elegant in design but not very efficient because can imply 2 jumps of join.

Alternative 2) Increase the information contained in Dimensions, add the temporal point of view in the dimension.
In your exemplification, the dimension product will be:

Product description
Produc description (next quarter)
Product description (End year)
Etc...

This alternative is not very elegant but increase the performance of the model.

What do you think?

Apollogies for my English,

Sergi

dwbi_rb Thu Feb 19, 2009 10:34 pm

Joy wrote:If all users of the snapshot table want to see only the attributes as of quarter-end, then Type2 (amusing name by the way) is correct: simply use that as the surrogate key. Likely, some people also want a more "normal" view of the dimension attributes. In this case your Proposed Solution 1 (two sets of dimension keys) is the way to go.

Hello Joy,
With regards to this, wouldn't you however agree that it would be better to have a separate surrogate key rather than using only the snapshot date on the dimension table?
Just trying to get a clearer picture of your solution here.

Joy Tue Feb 24, 2009 2:30 pm

Yes, I agree a separate surrogate key is best. I also agree that this is an edge case -- most often the business users just want a single view, tracking history on the attributes they've identified as Type2, and restating history on the attributes we're managing as Type1.

The subject area where I've seen the greatest demand for both Type1 and Type2 attributes is in Human Resources schemas. There always seems to be a compelling business need both to track history and to see the current state of affairs. In HR schemas, you're more likely to implement the double-dimension approach (CurrentEmployee, HistoricalEmployee) -- and yes, this is two surrogate keys in the fact table.

Sponsored content

Modelling Question

Modelling Question

Re: Modelling Question

Re: Modelling Question

Re: Modelling Question

Re: Modelling Question

Re: Modelling Question

Re: Modelling Question

Re: Modelling Question

Alternatives

Re: Modelling Question

Re: Modelling Question

Re: Modelling Question