Conformance of Dimension for facts aggregated at different levels

raikarleena Mon May 25, 2009 6:10 am

I am a newbie to Data Modeling. I am modeling a warehouse for Global reporting solution for a business operating in 3 countries. 2 business processes involved are "Sales" and "shipment Expenses"

The input is sumarized excel files generated by each country. My input is sumarized and not at lowest possible grain as per best practices.

Sales data is sumarized by calender month, Region
Shipment Expense data is sumarized by calender month, store

Where one Regions has multiple stores and one store is in one and only 1 region.

Where store dimension attributes are:
store key
store name
Region
country (one country has many regions)
territory (one territory has many countries)

I see that i should be able to compare both the numbers (Sales amount and Shipment expense) at Region, Country, Territory. Store is a conformed dimension but Sales Fact cannot have a store key as sales Fact is a summarized table.

What would be the best way to model it?

1. Have additional record in store dim at region level for each distinct region and have a store key for that region (This would have no specific store data)? So I could compare the 2 nos at Region or higher level

2. A new Dimension Region with 1 row for each distinct Region in store DIm and a new region key.

3. Associate a store key to Sales Fact (May be the minimum one for the region data complies to in store Dim). But This would mean I am manupulating as data is at region and cannot have store key. But i could still compare the 2 nos at Region or higher level (country, territory)

Please suggest me the best approach to model this. I am looking for conformance of my data at any cost.

Thanks in Advance!!

raikarleena Tue May 26, 2009 4:55 am

Read about aggregate dimensions and thought I would have 2 dimensions:

Store_dim and Region_dim (store will have the entire hirarchy and Region_Dim will be a perfect subset of Store Dim with one record for distinct Regions[and above] appearing in store).

But as I do not get Master Data and I am populating dimensions from my Transactional Summary, I might have Regions for which there is no entry in Store Dim (say a store with no sales). This contradicts the fundamentals of aggregate Dimensions

Any help is appreciated.

ngalemmo Tue May 26, 2009 11:53 am

Frankly, I would prefer the database was built with atomic, transactional data so you can support better analysis. But, you have what you have.

The model would have three dimensions: month, region (with territory and country) and store. When loading shipment expense, derive the region dimension key based on the store, this would make it easier to combine the two facts.

raikarleena Wed May 27, 2009 12:05 am

Thanks for the reply,

So the Sales Fact has 2 keys - Month_Key and Region_key and
Shipment has 3 Keys - Month_Key, Store_key and Shipment_key.

Where Store Dim has only Store level attributes (no region and above).

Though separating Store and region solves my problem of combining the 2 facts, It makes the drill down in cube a peoblem as I am not able to drill down from region to store (to view shipemnt amounts) as they are now different dimensions and connected only through facts. Any suggestions?

Another approach I could think was "Aggregate dimension" where store dim is entire store attributes + hirarchy attributes (store, region, territory) and Region Dim with region and above (As suggested by Kimball in "Datamodeling Toolkit" under confirmed Dimensions when data is captured at different level for different business processes (sales captured for product and budged is for Category) then having 2 dims Product (product+category+brand) and brand dim (brand and above).
This looks like a better option in my case only problem is: As i do not build my dimensions from Master Data (I extract Dimensions from Transactions), I would have a region (extracted from Shipmet transactions) for which there is no corresponding store (Say that store has not done any sale, so not captured in store dim) which contradics the fundamental requirement "Aggregate dim to be identical or subset of Granular Dim"

Please suggest me the best approach in this scenario.

Thanks,
Lee

ngalemmo Wed May 27, 2009 11:26 am

I would include region key as a dimension to the shipment fact. Derive the key based on the store. This gives you the conforming dimension key you need to integrate sales and shipments.

Jeff Smith Wed May 27, 2009 1:43 pm

There are different ways to skin this cat. One thing to consider is how to deal with changes. If there is the potential for the Store/Region to be modified and if you want to restate history to reflect the change, then Region should be in the Store dimension. For example, in 2009, Store A is currently in the Northeast region. in 2010, they decide to modify the regions and Store A is now assigned to the the New England Region. If you only have the Region dimension on the Fact Table, you will not be able to restate 2009 activity to reflect the 2010 Region assignment. To do this, you need the Region on the Store dimension.

There are 2 possible solutions: 1) Add the region to the store dimension as originally designed. Add in a new surrogate key for the Region so that the store dimension has 2 surrogate keys, 1 for store and 1 for region. Create a view of the store dimension that includes just those columns about the region of describe the region, selecting distinct rows. This will enable you to easily roll up store data to the region, allows you to use Region as a Dimension, and doesn't break any rules about re-using columns in dimension tables. 2) the second solution is to create 2 dimensions - one for store and one for region, with the store dimension contain the region surrogate key. Now you can create a snowflake join between the store dimension and the region dimension.

I prefer the first solution. Stores naturally roll up to regions, just as a store manager would rull up to a regional, manager or a stores cost center rolled up to a regional cost center. The only time I would put regions in a different dimension is if regions were defined geographically and I had a geographic dimension, say at the zip code level or at the census tract level. But even then, I would still create a roll up point in the geographic dimension for the region.

Sponsored content

Conformance of Dimension for facts aggregated at different levels

Conformance of Dimension for facts aggregated at different levels

Re: Conformance of Dimension for facts aggregated at different levels

Re: Conformance of Dimension for facts aggregated at different levels

Re: Conformance of Dimension for facts aggregated at different levels

Re: Conformance of Dimension for facts aggregated at different levels

Alternatives

Re: Conformance of Dimension for facts aggregated at different levels