Modeling Dimensional Parent-Child Relations
3 posters
Page 1 of 1
Modeling Dimensional Parent-Child Relations
We have a Data Modeling Case where there’s a CUSTOMER Dimension and Account Dimension and both of these two dimensions are SCD Type2. Typically a customer may have multiple accounts but each account belongs ALWAYS to a specific customer
CUSTOMER and ACCOUNT Dimension are referenced by TRANSACTIONS Fact Table. The issue is that we can track the relation between a customer and his account only at the time that a transaction occurs, we need to track that relation regardless of the transactions
We’re thinking of 2 options:
1- Having a Bridge Table which is joined to CUSTOMER and ACCOUNT Dimensions using Natural Keys of both Dimensions (Customer Number, Account Number)
2- Embedding the Customer Natural Key (Customer Number) into Account Dimension as a Snowflake
To get the relation at a specific point of time, we need to filter both of CUSTOMER and ACCOUNT dimension using ACTIVE_FROM_TIMESTAMP and ACTIVE_TO_TIMESTAMP columns
Our question is what is the Pros and Cons of both options, and whether you have any other ideas
CUSTOMER and ACCOUNT Dimension are referenced by TRANSACTIONS Fact Table. The issue is that we can track the relation between a customer and his account only at the time that a transaction occurs, we need to track that relation regardless of the transactions
We’re thinking of 2 options:
1- Having a Bridge Table which is joined to CUSTOMER and ACCOUNT Dimensions using Natural Keys of both Dimensions (Customer Number, Account Number)
2- Embedding the Customer Natural Key (Customer Number) into Account Dimension as a Snowflake
To get the relation at a specific point of time, we need to filter both of CUSTOMER and ACCOUNT dimension using ACTIVE_FROM_TIMESTAMP and ACTIVE_TO_TIMESTAMP columns
Our question is what is the Pros and Cons of both options, and whether you have any other ideas
mostafa_mahrous75- Posts : 8
Join date : 2011-11-20
Re: Modeling Dimensional Parent-Child Relations
"The issue is that we can track the relation between a customer and his account only at the time that a transaction occurs, we need to track that relation regardless of the transactions"
That is exactly what type 2 dimensions are supposed to do.
Your question really has to do with the versions of each entity, not the relationship between entities.
When you design a type 2 dimension you include a 'durable' key, one that does not change across versions of the entity. This may be the natural key or a surrogate (the latter being more efficient as a key).
The basic way to use a type 2 in your situation would be to perform a self join on the dimension using the durable key to locate the particular version of the entity you wish to use. Typically this is the 'current' version, but it is also possible to choose a version at a particular point in time.
If you don't like self-joins, an alternate method is to store the durable key on the fact as well (always use a surrogate).
In a bridge you would use the durable keys, so the bridge does not change as entities change (again, always use a surrogate).
That is exactly what type 2 dimensions are supposed to do.
Your question really has to do with the versions of each entity, not the relationship between entities.
When you design a type 2 dimension you include a 'durable' key, one that does not change across versions of the entity. This may be the natural key or a surrogate (the latter being more efficient as a key).
The basic way to use a type 2 in your situation would be to perform a self join on the dimension using the durable key to locate the particular version of the entity you wish to use. Typically this is the 'current' version, but it is also possible to choose a version at a particular point in time.
If you don't like self-joins, an alternate method is to store the durable key on the fact as well (always use a surrogate).
In a bridge you would use the durable keys, so the bridge does not change as entities change (again, always use a surrogate).
Re: Modeling Dimensional Parent-Child Relations
In case of using the surrogate keys in the Bridge Table, then we’ll have to include ACTIVE_FROM_TIMESTAMP and ACTIVE_TO_TIMESTAMP column in the Bridge Table which will lead to that whenever CUSTOMER OR ACCOUNT dimensions is affected by Type 2 Change, the same effect should be propagated to the Fact record as well, so that ACTIVE_TO_TIMESTAMP column in the old fact record is updated to the change time and a new Fact record is created with ACTIVE_FROM_TIMESTAMP column is set to the change time and ACTIVE_TO_TIMESTAMP is set to infinity. This will surely add a burden to ETL
However, if use Natural Key instead of Surrogate keys, we’ll avoid reflecting Type 2 changes to the Fact Table (of course we must filter both of Account and Customer Dimensions to a specific point of time to get the corresponding versions together)
However, if use Natural Key instead of Surrogate keys, we’ll avoid reflecting Type 2 changes to the Fact Table (of course we must filter both of Account and Customer Dimensions to a specific point of time to get the corresponding versions together)
mostafa_mahrous75- Posts : 8
Join date : 2011-11-20
Re: Modeling Dimensional Parent-Child Relations
No. A durable key is the same as a type 1 key. It has nothing to do with changes to the entity. If you have an active date range it would indicate the time the customer-account relationship existed, not when something about the customer or account changed.
The dates that relate to when a particular version of a customer or account is in effect is stored in the respective dimension table. Not on bridges or facts.
Don't think that a natural key is any different from a surrogate key. You can have type 2 surrogate keys or type 1 surrogate keys. A type 1 key is 1:1 with the natural key. That is the durable key I am referring to. It is an alternate non-unique key to the dimension table. You use the effective timestamps, also carried in the dimension table, to locate a specific row. It is the exact same thing as a natural key, but with additional benefits (smaller, more efficient, stable). Storing the type 1 surrogate on the fact is optional. It is not necessary to handle your situation (i.e. you do a self join on the dimension).
The dates that relate to when a particular version of a customer or account is in effect is stored in the respective dimension table. Not on bridges or facts.
Don't think that a natural key is any different from a surrogate key. You can have type 2 surrogate keys or type 1 surrogate keys. A type 1 key is 1:1 with the natural key. That is the durable key I am referring to. It is an alternate non-unique key to the dimension table. You use the effective timestamps, also carried in the dimension table, to locate a specific row. It is the exact same thing as a natural key, but with additional benefits (smaller, more efficient, stable). Storing the type 1 surrogate on the fact is optional. It is not necessary to handle your situation (i.e. you do a self join on the dimension).
Re: Modeling Dimensional Parent-Child Relations
Durable key (DK) is more thoughtful as suggested by ngalemmo, however if your natural key (NK) never gets reused, you may just refer to natural key as durable key for simplicity.
Your option 1 is good for tracking the change of Customer-Account relationship. It would be very efficient to get the current version of both dimensions with current flag bitmap/columnar indexed, but needs to date-range both dimensions to get the historic version. However, surrogate key (SK) in bridge is mainly focused on historic version by straight joins. The approach could grow the bridge alarmingly if any relevant dimension is type 2 frequently changing dimension.
Option 2 only works if an account will never be shared by more than one customers. However you still have the dilemma of NK vs. SK. Since account dimension is likely big, you may just store customer NK in the account dimension, if current customer version is your main focus, for the same reason in bridge scenario. Alternatively, you may include both customer SK and NK in account dimension to cater for both purposes, but need to have type 2 response on account in case of type 2 customer change.
Your option 1 is good for tracking the change of Customer-Account relationship. It would be very efficient to get the current version of both dimensions with current flag bitmap/columnar indexed, but needs to date-range both dimensions to get the historic version. However, surrogate key (SK) in bridge is mainly focused on historic version by straight joins. The approach could grow the bridge alarmingly if any relevant dimension is type 2 frequently changing dimension.
Option 2 only works if an account will never be shared by more than one customers. However you still have the dilemma of NK vs. SK. Since account dimension is likely big, you may just store customer NK in the account dimension, if current customer version is your main focus, for the same reason in bridge scenario. Alternatively, you may include both customer SK and NK in account dimension to cater for both purposes, but need to have type 2 response on account in case of type 2 customer change.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Modeling Dimensional Parent-Child Relations
If you are doing a self-join to locate a version of an entity, the natural key is fine. However, if you plan to propagate it to bridges or facts it raises a number of issues. That is why I strongly recommend a surrogate durable key.
It's pretty easy to set up. You set it to the dimension row's primary key when you add the entity for the first time, then simply copy it for any new version of the entity you create.
It's pretty easy to set up. You set it to the dimension row's primary key when you add the entity for the first time, then simply copy it for any new version of the entity you create.
Similar topics
» Fact table Modeling (1:n relations)
» Dimensional modeling on HospitalStay
» Dimensional Modeling-Checklist
» Need for a dimensional modeling tool?
» Dimensional Modeling Certification
» Dimensional modeling on HospitalStay
» Dimensional Modeling-Checklist
» Need for a dimensional modeling tool?
» Dimensional Modeling Certification
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|