Joining facts through conforming dimensions
2 posters
Page 1 of 1
Joining facts through conforming dimensions
I've seen some overlap in some posts but I don't want to hijack someone elses thread to attempt to get clarification for my scenario. I also didn't want to simply make the title "Joining Fact Tables" because I know the immediate response would be "You don't join fact tables". Now that we have that out of the way I have some questions.
We have 1 fact table that holds transactions (fact_trans_dtls). We have a 2nd fact that holds pricing components (fact_ar_dtls). The pricing components fact was created because there can be a variable number of pricing components for each transaction (a flattened version would always be adding columns and be very sparse). The tables were originally joined in a normalized form where the components had the transaction key (trans_dtls_key) in it's table. I have since been taught that this is incorrent in a dimensional model and have added the keys of the conforming dimensions to the components fact to join through.
This is fine when pulling aggregates through a cube built on top of this model but what about transaction level reporting from the DW? I think my main problem is that the key combination of the conforming dimensions does not represent a unique value for a transaction.
so the query with the old structure:
does not produce the same result when modified to work with the new structure:
What am I missing here?
On a positive note, I just recieved my Data Warehouse Toolkit, LifeCycle Toolkit and Group Reader so I'll hopefully be bugging you guys less in the near future.
We have 1 fact table that holds transactions (fact_trans_dtls). We have a 2nd fact that holds pricing components (fact_ar_dtls). The pricing components fact was created because there can be a variable number of pricing components for each transaction (a flattened version would always be adding columns and be very sparse). The tables were originally joined in a normalized form where the components had the transaction key (trans_dtls_key) in it's table. I have since been taught that this is incorrent in a dimensional model and have added the keys of the conforming dimensions to the components fact to join through.
This is fine when pulling aggregates through a cube built on top of this model but what about transaction level reporting from the DW? I think my main problem is that the key combination of the conforming dimensions does not represent a unique value for a transaction.
so the query with the old structure:
- Code:
select measure1, measure2, measure3, measure4
from fact_trans_dtls A
JOIN
(select trans_dtls_key, sum(measure3) AS measure3, sum(measure4) AS measure4
from fact_ar_dtls
where blah
group by trans_dtls_key) B
ON A.trans_dtls_key = B.trans_dtls_key
where blah
does not produce the same result when modified to work with the new structure:
- Code:
select measure1, measure2, measure3, measure4
from
(select key1, key2, key3, key4, measure1, measure2
from fact_trans_dtls
where blah) A
JOIN
(select key1, key2, key3, key4, sum(measure3), sum(measure4)
from fact_ar_dtls
where blah
group by key1, key2, key3, key4) B
ON A.key1 = B.key1
AND A.key2 = B.key2
AND A.key3 = B.key3
AND A.key4 = B.key4
What am I missing here?
On a positive note, I just recieved my Data Warehouse Toolkit, LifeCycle Toolkit and Group Reader so I'll hopefully be bugging you guys less in the near future.
Zen- Posts : 10
Join date : 2010-08-18
Re: Joining facts through conforming dimensions
Your second example is the correct way to go about it, except that the first sub-query
(select key1, key2, key3, key4, measure1, measure2
from fact_trans_dtls
where blah) A
should always be an aggregate query regardless of how many keys are involved. Always assume, regardless of the keys, that any fact-to-fact relationship is many-to-many. Both sub-queries must aggregate on the common keys before the join, and both sub-queries may only contain common keys.
(select key1, key2, key3, key4, measure1, measure2
from fact_trans_dtls
where blah) A
should always be an aggregate query regardless of how many keys are involved. Always assume, regardless of the keys, that any fact-to-fact relationship is many-to-many. Both sub-queries must aggregate on the common keys before the join, and both sub-queries may only contain common keys.
Re: Joining facts through conforming dimensions
I appreciate the tip but that does not address my main concern. I'm simply trying to recreate the legacy output within the new design. Because the combination of keys that make up the conforming dimensions do not constitute a unique key of a transaction, the data comes out incorrectly.
I was thinking about pulling in the transaction_id from the source system as a degenerate dimension. If I did that, would I still join on ALL conforming dimensions or would just joining on the transaction_id (tran_key?) be ok? If only the one key, how is that different than the normalized version that was originally done?
I was thinking about pulling in the transaction_id from the source system as a degenerate dimension. If I did that, would I still join on ALL conforming dimensions or would just joining on the transaction_id (tran_key?) be ok? If only the one key, how is that different than the normalized version that was originally done?
Zen- Posts : 10
Join date : 2010-08-18
Re: Joining facts through conforming dimensions
Zen wrote:
This is fine when pulling aggregates through a cube built on top of this model but what about transaction level reporting from the DW? I think my main problem is that the key combination of the conforming dimensions does not represent a unique value for a transaction.
I overlooked this part... Your analysis is correct. If you are trying to do a transaction level report, you cannot rely on the dimensions to identify a transaction unless one of those dimensions is the transaction ID. Add transaction ID as a degenerate dimension.
If the transaction ID is unique in the trans detail table, you could simply join on that. However, the safe thing to do is to treat it as any other dimension and do a standard aggregate and join between the two facts. This covers you should the design change later to include transactional history, at which point the transaction ID is no longer unique. You want to avoid getting into situations where a change in the design of a fact table requires extensive impact analysis and potential changes to existing queries. If you always combine facts consistantly in the prescribed manner, all pre-existing queries would continue to function correctly (unless, of course, you drop a column).
As a side note, I don't understand why you need to combine transactions and pricing components. When I dealt with this in the past, the pricing components fact had all the dimensions of the transaction (order line) fact, plus additional dimensions dealing with the components. The pricing component values always summed to the line value in the transaction fact. The sales users always looked at the transaction facts and the accounting folk pretty much always used the component facts. There was no need to combine them because, from a transaction level perspective, they contained the same information.
Re: Joining facts through conforming dimensions
Thank you for the confirmation. I think your side note gets to the real heart of the problem.
I think what's going on is that they need certain facts at the transaction level based on combinations of components that are variable between Merchants. I use the term Merchants loosely because that is not the only party within the scope of the reports to be generated. Flattening out these variable components at the transaction fact level would result in an ever growing sparse table because the components are technically bottomless.
This situation may be a result of an undefined scope, grain and audience like the requirements were "house all of the transactions with history and generate reports from that for everyone". It feels more like a top-down design but I can't say for certain. I'm new here and not directly in the role to make the final decisions but I do have a voice in the discussion so I'm trying to ask the right questions (obviously, with your expert opinions).
I think what's going on is that they need certain facts at the transaction level based on combinations of components that are variable between Merchants. I use the term Merchants loosely because that is not the only party within the scope of the reports to be generated. Flattening out these variable components at the transaction fact level would result in an ever growing sparse table because the components are technically bottomless.
This situation may be a result of an undefined scope, grain and audience like the requirements were "house all of the transactions with history and generate reports from that for everyone". It feels more like a top-down design but I can't say for certain. I'm new here and not directly in the role to make the final decisions but I do have a voice in the discussion so I'm trying to ask the right questions (obviously, with your expert opinions).
Zen- Posts : 10
Join date : 2010-08-18
Re: Joining facts through conforming dimensions
I faced the same problem at a large CPG manufacturer. The components varied widely from one sale to the next, even with the same product to the same customer.
If the component fact includes the same dimensions as the transaction fact (including the transaction id) you should be able to do any kind of component analysis without the need to use the transaction table. The only issue would be quantity measures, which are fully additive in the transaction fact but semi-additive in the component fact. But, if you are reporting at the transaction line level, displaying the average quantity should work.
If the component fact includes the same dimensions as the transaction fact (including the transaction id) you should be able to do any kind of component analysis without the need to use the transaction table. The only issue would be quantity measures, which are fully additive in the transaction fact but semi-additive in the component fact. But, if you are reporting at the transaction line level, displaying the average quantity should work.
Similar topics
» Arriving at Facts and Dimensions
» TWO FACTS JOINing?
» Fact Tables Joining to Factless Facts
» Question on joining dimensions to avoid possible snowflake
» No of Dimensions and Facts
» TWO FACTS JOINing?
» Fact Tables Joining to Factless Facts
» Question on joining dimensions to avoid possible snowflake
» No of Dimensions and Facts
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum