Sparse Product Dim vs Heterogeneous Product Dim

DataDude Wed Jan 15, 2014 5:08 pm

What is a best practice or rule of thumb for deciding when to use a single product dim vs using the Heterogeneous Product design where some attributes will be mutually exclusive to certain product types.

For our dimensional model, we maintain type 7 dimensions (type 2 dim table with a surrogate dim key as well as a separate type 1 view with a durable dim key). If we used the Heterogeneous method, we would maintain a single durable dim key across all of the subtypes, but would probably maintain separate type2 hist keys for each subtype dim (instead of trying to keep them in sync) and thus have separate keys on the fact tables. So, the fact table may look like this....

Fact Table
prod_durable_key (scd1 FK)
prod_subtype1_key (scd2 FK)
prod_subtype2_key (scd2 FK)
prod_subtype3_key (scd2 FK)
prod_subtype4_key (scd2 FK)

What is the justifications for choosing one method over the other?
sparsity? what if 90% overlap, what if 50% overlap, what if 25% overlap
ease of use? - knowing which attributes apply to which type or
other?

ngalemmo Wed Jan 15, 2014 5:51 pm

Isn't an item of only one subtype? I don't understand the purpose of four different subtype keys.

When you update an item, would you not update its particular subtype and use that type 2 key in the fact? One would assume any subtype table would only contain members of that subtype and when used you would not include other subtypes in such a query.

If, on the other hand, your intent is to allow reporting any subtype's attributes at any time (i.e. against the entire population) what would be the point of having subtypes in the first place? Just have a single wide sparse dimension.

BoxesAndLines Thu Jan 16, 2014 9:13 am

How varied (heterogeneous) are your products? Do you have specific requirements for global views as well as line of business views of products? Do you have special line of business metrics not applicable to all products? That is the thought process I work through to determine whether I need to implement a heterogeneous product solution.

DataDude Thu Jan 16, 2014 4:38 pm

BoxesAndLines,
I am working in a more generic data service environment where we are developing common models and some tools that will then be deployed to multiple clients. Thus, I don't have specific answers to questions like "How varied are the products", etc. I was looking more for general rules of thumb that says if there are less than x% overlap, I would lean towards heterogeneous. There would be some line of business processes (facts) as well as some more generic processes that are common to all lines of business.

ngalemmo,
Yes, an item would only belong to one subtype. We are modeling our dims as TYPE 7, that is to say we have both a type II as well as a type I. The product_durable_key represented the reused key for all tables for the type I. For the type II dims, in order to reuse the same key in the fact table to join to both the common dim as well as any of the subtype dims, then the ETL would have to manage creating a new subtype record (even if nothing changed) if something in the common dim changed, or vice versa. Instead of that maintenance, I was thinking I would need a separate key for each subtype type II table.

ngalemmo Thu Jan 16, 2014 5:08 pm

If you think of the subtype row as an extension of the supertype row, a type 2 change in the supertype should also be reflected in the subtype even if none of the subtype attributes changed. This allows both supertype and subtype to carry the same type 2 key, making queries far simpler.

The problem with the multiple key solution, aside from overcomplicating queries, is where does it end? If you are trying to establish a generic environment, you have no control over how many of these subtypes may exist. You wind up with a sparse fact table rather than a sparse dimension table, which is a much worse situation.

Sponsored content

Sparse Product Dim vs Heterogeneous Product Dim

Sparse Product Dim vs Heterogeneous Product Dim

Re: Sparse Product Dim vs Heterogeneous Product Dim

Re: Sparse Product Dim vs Heterogeneous Product Dim

Re: Sparse Product Dim vs Heterogeneous Product Dim

Re: Sparse Product Dim vs Heterogeneous Product Dim

Re: Sparse Product Dim vs Heterogeneous Product Dim