Joining facts through conforming dimensions

Zen Thu Aug 26, 2010 10:33 am

I've seen some overlap in some posts but I don't want to hijack someone elses thread to attempt to get clarification for my scenario. I also didn't want to simply make the title "Joining Fact Tables" because I know the immediate response would be "You don't join fact tables". Now that we have that out of the way I have some questions.

We have 1 fact table that holds transactions (fact_trans_dtls). We have a 2nd fact that holds pricing components (fact_ar_dtls). The pricing components fact was created because there can be a variable number of pricing components for each transaction (a flattened version would always be adding columns and be very sparse). The tables were originally joined in a normalized form where the components had the transaction key (trans_dtls_key) in it's table. I have since been taught that this is incorrent in a dimensional model and have added the keys of the conforming dimensions to the components fact to join through.

This is fine when pulling aggregates through a cube built on top of this model but what about transaction level reporting from the DW? I think my main problem is that the key combination of the conforming dimensions does not represent a unique value for a transaction.

so the query with the old structure:

Code:: select measure1, measure2, measure3, measure4 from fact_trans_dtls A JOIN (select trans_dtls_key, sum(measure3) AS measure3, sum(measure4) AS measure4 from fact_ar_dtls where blah group by trans_dtls_key) B ON A.trans_dtls_key = B.trans_dtls_key where blah

does not produce the same result when modified to work with the new structure:

Code:: select measure1, measure2, measure3, measure4 from (select key1, key2, key3, key4, measure1, measure2 from fact_trans_dtls where blah) A JOIN (select key1, key2, key3, key4, sum(measure3), sum(measure4) from fact_ar_dtls where blah group by key1, key2, key3, key4) B ON A.key1 = B.key1 AND A.key2 = B.key2 AND A.key3 = B.key3 AND A.key4 = B.key4

What am I missing here?

On a positive note, I just recieved my Data Warehouse Toolkit, LifeCycle Toolkit and Group Reader so I'll hopefully be bugging you guys less in the near future.

ngalemmo Thu Aug 26, 2010 4:29 pm

Your second example is the correct way to go about it, except that the first sub-query

(select key1, key2, key3, key4, measure1, measure2
from fact_trans_dtls
where blah) A

should always be an aggregate query regardless of how many keys are involved. Always assume, regardless of the keys, that any fact-to-fact relationship is many-to-many. Both sub-queries must aggregate on the common keys before the join, and both sub-queries may only contain common keys.

Zen Fri Aug 27, 2010 6:48 am

I appreciate the tip but that does not address my main concern. I'm simply trying to recreate the legacy output within the new design. Because the combination of keys that make up the conforming dimensions do not constitute a unique key of a transaction, the data comes out incorrectly.

I was thinking about pulling in the transaction_id from the source system as a degenerate dimension. If I did that, would I still join on ALL conforming dimensions or would just joining on the transaction_id (tran_key?) be ok? If only the one key, how is that different than the normalized version that was originally done?

ngalemmo Fri Aug 27, 2010 12:29 pm

Zen wrote:
This is fine when pulling aggregates through a cube built on top of this model but what about transaction level reporting from the DW? I think my main problem is that the key combination of the conforming dimensions does not represent a unique value for a transaction.

I overlooked this part... Your analysis is correct. If you are trying to do a transaction level report, you cannot rely on the dimensions to identify a transaction unless one of those dimensions is the transaction ID. Add transaction ID as a degenerate dimension.

If the transaction ID is unique in the trans detail table, you could simply join on that. However, the safe thing to do is to treat it as any other dimension and do a standard aggregate and join between the two facts. This covers you should the design change later to include transactional history, at which point the transaction ID is no longer unique. You want to avoid getting into situations where a change in the design of a fact table requires extensive impact analysis and potential changes to existing queries. If you always combine facts consistantly in the prescribed manner, all pre-existing queries would continue to function correctly (unless, of course, you drop a column).

As a side note, I don't understand why you need to combine transactions and pricing components. When I dealt with this in the past, the pricing components fact had all the dimensions of the transaction (order line) fact, plus additional dimensions dealing with the components. The pricing component values always summed to the line value in the transaction fact. The sales users always looked at the transaction facts and the accounting folk pretty much always used the component facts. There was no need to combine them because, from a transaction level perspective, they contained the same information.

Zen Fri Aug 27, 2010 1:44 pm

Thank you for the confirmation. I think your side note gets to the real heart of the problem.

I think what's going on is that they need certain facts at the transaction level based on combinations of components that are variable between Merchants. I use the term Merchants loosely because that is not the only party within the scope of the reports to be generated. Flattening out these variable components at the transaction fact level would result in an ever growing sparse table because the components are technically bottomless.

This situation may be a result of an undefined scope, grain and audience like the requirements were "house all of the transactions with history and generate reports from that for everyone". It feels more like a top-down design but I can't say for certain. I'm new here and not directly in the role to make the final decisions but I do have a voice in the discussion so I'm trying to ask the right questions (obviously, with your expert opinions).

ngalemmo Fri Aug 27, 2010 2:13 pm

I faced the same problem at a large CPG manufacturer. The components varied widely from one sale to the next, even with the same product to the same customer.

If the component fact includes the same dimensions as the transaction fact (including the transaction id) you should be able to do any kind of component analysis without the need to use the transaction table. The only issue would be quantity measures, which are fully additive in the transaction fact but semi-additive in the component fact. But, if you are reporting at the transaction line level, displaying the average quantity should work.

Sponsored content

Joining facts through conforming dimensions

Joining facts through conforming dimensions

Re: Joining facts through conforming dimensions

Re: Joining facts through conforming dimensions

Re: Joining facts through conforming dimensions

Re: Joining facts through conforming dimensions

Re: Joining facts through conforming dimensions

Re: Joining facts through conforming dimensions