How to model relations between dimensions?

zakbor0 Tue Jul 03, 2012 1:30 pm

Dear all,

I am currently in a DWH project for banking industry. We have just started the modelling in multidimensional model. However, just at the begining we are started to argue on basic concepts:

There is a fact table FCT_Balances and two dimenstions:
- DIM_Banking_Product (ProductLine, CustomerTargetSegment, ProductCode, ProductName, Description, SellingStartDate, SellingEndDate...) and
- DIM_Contracts (ContractNumber, ProductCode, MainOwner...)
Of corse, each contract is an occurence of a certain DIM_Banking_Product (there is a 1:n relation be between the Product and Contract).

My question: how to model the relation? We have the following possibilities:

Include a foreign key from DIM_Banking_Product in DIM_Contract and not put any product attributes (like ProductCode) in DIM_Contract
Include a foreign key from DIM_Banking_Product in DIM_Contract and put some attributes (like ProductCode or ProductName) in DIM_Contract (redundancy for ease of usage of DIM_Contract)
Do not link DIM_Banking_Product and DIM_Contract anyhow but by fact table

Additionally, if we decide to connect the dimenstions by putting Product foreign key to Dim_Contract (option 1 or 2 from above) should the DIM_Product be the dimensiton of FCT_Balances or not?

What are your opinions?
Regards

jchernev Tue Jul 03, 2012 5:22 pm

What's the actual process you are trying to describe? Can you please give us a little information about the process itself, any granularity of your fact table and dimensions you'd like to slice and dice by.

zakbor0 Wed Jul 04, 2012 4:28 am

jchernev wrote:What's the actual process you are trying to describe? Can you please give us a little information about the process itself, any granularity of your fact table and dimensions you'd like to slice and dice by.

This is a very basic process of handling banking lending products (credit cards, invesmtment loans, mortgage loans, consumer loans, credit lines...). Each day there are payments made by the clients on one hand and on the other hand installments are becoming matured and penalty interestes are being calculated and added to total debt.
I would like to model the daily snapshots of each account (contract) showing its balance (amount of capital due, unpaid interestes, penalties etc) at the end of each reporting day. The grain will be one row of table per contract per day.

I would like to be able to build reports showind aggregated debt per Product lines, per Product (meaing product type: eg. "visa card for students"), per Owner, on a specific date.
However, I would also like to be able to drill down to see a specific contract data (DIM_Contract) with the extendad information about the product of this contract (DIM_Product).

What I am trying to understand is the generic and most common way of handling situations when two demensions of a fact table are linked together (ie. in ERD they would be related 1:1 (like above: each contract from DIM_Contract has one product whose specific data is in DIM_Product) or 1:n (eg. each loan contract has many installments generated and waiting for repayment).

Should the dimenstions by linked togeteher by key? If so, surogate key (key for dimension) or natural (eg. product code from source system)?
Or maybe it is sufficient to get the estended data of product by using fact table as a bridge? But what if there is an entry in the dimension but no entry in the fact table?
In addition to the above 2 bullets: Should we still put some most important and basic product data in the contract dimension (eg. product name, product code, product line...) as redundant data (because it can be obtained from DIM_Product) just for the ease of report building, allowing to avoid joins in most situations?

jchernev Wed Jul 04, 2012 7:49 am

I don't have answers for all of your questions but let me see what I can help with.

I would like to model the daily snapshots of each account (contract) showing its balance (amount of capital due, unpaid interestes, penalties etc) at the end of each reporting day. The grain will be one row of table per contract per day.

This is the beginning of a good model. However, since we're doing daily snapshots of static data like bank balances, your facts may end up being semi-additive. This limitation manifests itself in you being able to pull only stats like AVERAGE, COUNT, but not SUM. I hope this doesn't clash with your requirement to do aggregates:

I would like to be able to build reports showind aggregated debt per Product lines, per Product (meaing product type: eg. "visa card for students"), per Owner, on a specific date.

I'm not entirely sure here

What I am trying to understand is the generic and most common way of handling situations when two demensions of a fact table are linked together (ie. in ERD they would be related 1:1 (like above: each contract from DIM_Contract has one product whose specific data is in DIM_Product) or 1:n (eg. each loan contract has many installments generated and waiting for repayment).

It sounds like you have two separate business processes which would require fact tables on their own.

In dimensional modeling you can expect for your dimensions to be related many-to-many. This is why the fact table is the corner-stone for each dimensional design. In normal E-R design, the fact table is referred to as a bridge or a join table that allows you to describe many-to-many relationships between the co-joining tables.

You'd never directly link dimensions to each other as that would violate basic dimensional design principles. You'd have to go through the fact table to describe that relationship.

Surrogate keys are a must - they ensure stability and easy growth for your dimensional model over time.

Having an entry in the dimension table and none in the fact simply means that there are no measures generated by your process - e.g. nothing that we're looking for actually happened.

You'd never put reporting labels/dimensional attributes in multiple places "ease" that will avoid joins. Joins are good and the dimensional model relies heavily on them.

Hope this helps

zakbor0 Wed Jul 04, 2012 10:47 am

Jchernev - thank you for your answer. I found however some opinions in other threads that it would be ok to keep the FK when there is a relation 1:n between the dimensions (which is called snowflaking)....

Any other comments on these:
[list][*] Should the dimenstions by linked togeteher by key? If so, surogate key (key for dimension) or natural (eg. product code from source system)?
[*] Or maybe it is sufficient to get the estended data of product by using fact table as a bridge? But what if there is an entry in the dimension but no entry in the fact table?
[*] In addition to the above 2 bullets: Should we still put some most important and basic product data in the contract dimension (eg. product name, product code, product line...) as redundant data (because it can be obtained from DIM_Product) just for the ease of report building, allowing to avoid joins in most situations?

cmosquera Tue Nov 19, 2013 8:16 pm

Hello Zakbor, did you ever get an answer on this?

zakbor0 wrote:Jchernev - thank you for your answer. I found however some opinions in other threads that it would be ok to keep the FK when there is a relation 1:n between the dimensions (which is called snowflaking)....

Any other comments on these:
[list]

[*] Should the dimenstions by linked togeteher by key? If so, surogate key (key for dimension) or natural (eg. product code from source system)?

[*]Or maybe it is sufficient to get the estended data of product by using fact table as a bridge? But what if there is an entry in the dimension but no entry in the fact table?
[*]In addition to the above 2 bullets: Should we still put some most important and basic product data in the contract dimension (eg. product name, product code, product line...) as redundant data (because it can be obtained from DIM_Product) just for the ease of report building, allowing to avoid joins in most situations?

ngalemmo Wed Nov 20, 2013 3:17 pm

Relationships between dimensions is handled by fact tables and bridge tables.

A fact table that contains balances of contracts by product will define that relationship for contracts with balances. A factless fact table containing just the dimensional relationships could also be created to track the state of those relationships.

Snowflaking is not an appropriate solution in this case.

Sponsored content

How to model relations between dimensions?

How to model relations between dimensions?

Re: How to model relations between dimensions?

Re: How to model relations between dimensions?

Re: How to model relations between dimensions?

Re: How to model relations between dimensions?

Re: How to model relations between dimensions?

Re: How to model relations between dimensions?

Re: How to model relations between dimensions?