Flattening Dimension
2 posters
Page 1 of 1
Flattening Dimension
Hi
I have a dimension called PERSON which will have apart from the name (family name, other name, given name) attributes will also have the following:
Gender (Male, Female etc)
Relationship Status (Married etc)
Title (Mr, Mrs etc)
Is it beeter to have a snowflake where you have Gender, Relationship Status, Title dimensions and put surrogate keys in the person dimensiion
OR
Flatten the dimensions and put the codes and descriptions (Gender_Code, Gender_Desc etc) in the person dimension
Regards
Tim
I have a dimension called PERSON which will have apart from the name (family name, other name, given name) attributes will also have the following:
Gender (Male, Female etc)
Relationship Status (Married etc)
Title (Mr, Mrs etc)
Is it beeter to have a snowflake where you have Gender, Relationship Status, Title dimensions and put surrogate keys in the person dimensiion
OR
Flatten the dimensions and put the codes and descriptions (Gender_Code, Gender_Desc etc) in the person dimension
Regards
Tim
tim_goodsell- Posts : 49
Join date : 2010-09-21
Re: Flattening Dimension
But... sometimes you may do both. If demographic information is commonly used in analysis, such as in healthcare and marketing, you may want the smaller demographic dimension to improve performance, particularly if other information about the person is not needed.
In an atomic level fact you would have keys to both dimensions. If PERSON is type 1, then it would provide current demograhic information while the demographic dimension would provide attributes at the time the fact event occured.
In an atomic level fact you would have keys to both dimensions. If PERSON is type 1, then it would provide current demograhic information while the demographic dimension would provide attributes at the time the fact event occured.
Person Dimension Update
Thanks for that
I have expanded the model so that
1) Person dimension now contains the person contact details (voice email etc)
2) A person can have one or more memberships (but only one membership at one time), this is represented by having a Member Dimension containing person key, start date, end date and status (i.e current, exited)
3) A membership can be associated with one or more Accounts. This is represented by having Account Dimension and a Bridge table that contains the member key and the account key
My worry is that if the persons contact details changes then this cause:
1) The old person record to be end-dated and a new person dimension record will be created with a new start date and the changes
2) The old member record will be end-dated and a new member record will be created containing the new person key and a new start date
3) New rows will have to be inserted into the BRIDGE table representing the new member/account relationships
Is there a better way to do this ?
Regards
Tim
I have expanded the model so that
1) Person dimension now contains the person contact details (voice email etc)
2) A person can have one or more memberships (but only one membership at one time), this is represented by having a Member Dimension containing person key, start date, end date and status (i.e current, exited)
3) A membership can be associated with one or more Accounts. This is represented by having Account Dimension and a Bridge table that contains the member key and the account key
My worry is that if the persons contact details changes then this cause:
1) The old person record to be end-dated and a new person dimension record will be created with a new start date and the changes
2) The old member record will be end-dated and a new member record will be created containing the new person key and a new start date
3) New rows will have to be inserted into the BRIDGE table representing the new member/account relationships
Is there a better way to do this ?
Regards
Tim
tim_goodsell- Posts : 49
Join date : 2010-09-21
Re: Flattening Dimension
Person and Member are type 2 dimensions, correct? A change in person doesn't mean its cascades to member. The person hasn't changed, just the key. The key simply represents a snapshot of that person at a particular point in time.
The issue is not the keys, but the attributes. So, if an attribute such a 'person id' exists (or any other attribute for consolidation), and is consistent, then it is easy enough to consolidate the membership role. In the case of a type 2, you have a choice to look at either attributes at the time of the transaction (fact) or based on the current attributes.
Bridge tables are different. A bridge table exists to resolve M-M relationships between a fact table and a dimension table. As it works purely on keys, it is necessary to recognize and support all possible keys in a relationship, so a type 2 update to either the member dimension or the account dimension will require adding new keys. The key generation is not that bad since any update to one dimension only requires relating to the current versions of the other dimension's rows. The member/account relationship need only be expired when the actual relationship expires. This means, when new rows are inserted to either dimension, it does not need update the bridge, it simply inserts new rows, a much faster process.
The issue is not the keys, but the attributes. So, if an attribute such a 'person id' exists (or any other attribute for consolidation), and is consistent, then it is easy enough to consolidate the membership role. In the case of a type 2, you have a choice to look at either attributes at the time of the transaction (fact) or based on the current attributes.
Bridge tables are different. A bridge table exists to resolve M-M relationships between a fact table and a dimension table. As it works purely on keys, it is necessary to recognize and support all possible keys in a relationship, so a type 2 update to either the member dimension or the account dimension will require adding new keys. The key generation is not that bad since any update to one dimension only requires relating to the current versions of the other dimension's rows. The member/account relationship need only be expired when the actual relationship expires. This means, when new rows are inserted to either dimension, it does not need update the bridge, it simply inserts new rows, a much faster process.
Cascading
Thanks for that although you lost me slightly around the "Cascading" bit
You are right to say that both member and person are SCD Type 2
If a Person attributes changes (say contact details) a new Person record has to be created with a new start date (the old one is end dated), I assumed that also a new member record has also to be created (old one end dated) that contains the new Person Key
Regards
Tim
You are right to say that both member and person are SCD Type 2
If a Person attributes changes (say contact details) a new Person record has to be created with a new start date (the old one is end dated), I assumed that also a new member record has also to be created (old one end dated) that contains the new Person Key
Regards
Tim
tim_goodsell- Posts : 49
Join date : 2010-09-21
Re: Flattening Dimension
Im sorry. I messed up on member.
A 'membership' is the relationship between a person and their accounts, correct? So what you have is a person dimension, an account dimension, and a membership fact table? I imagine there could be more than one person involved. So the fact represents the relationship between the person and the account, which has not changed. There is no need to add to or change any existing relationship regardless of what changes on the dimensions. The issue becomes when the business relationship expires. You need to expire all versions of the relationship.
A 'membership' is the relationship between a person and their accounts, correct? So what you have is a person dimension, an account dimension, and a membership fact table? I imagine there could be more than one person involved. So the fact represents the relationship between the person and the account, which has not changed. There is no need to add to or change any existing relationship regardless of what changes on the dimensions. The issue becomes when the business relationship expires. You need to expire all versions of the relationship.
Re: Flattening Dimension
Hi
I appreciate your help although I don't think I have explained it properly , sorry about that
A person can be both a Member and a Beneficiary (and other types) at the same time etc, but we will stick to Member for now
Firstly I have a Person Dimension which contains Person Key, Person Natural ID, Gender, Name , Title Contact Details etc
Next I have a Member Dimension which contains Member Key, Person Key, Join Date , Exit Date. I need a Member Dimension because I will eventually have to report on the Member (via a snapshot fact table) on a daily basis (e.g number of accounts, total balance for all accounts for a particular date etc). A person during their life can be a member many times differentiated by join date and exit date. Member is I suppose the same as Membership
Next I have an Account Dimension which contains Account Key, Account Type Key, Account Natural ID , Join Date, Exit Date etc, a Member is associated with one or more accounts, the Members Join Date is the earliest accounts join date and the Members Exit Date is the latest exit date when all the accounts have exited.
I hope this helps
Regards
Tim
I appreciate your help although I don't think I have explained it properly , sorry about that
A person can be both a Member and a Beneficiary (and other types) at the same time etc, but we will stick to Member for now
Firstly I have a Person Dimension which contains Person Key, Person Natural ID, Gender, Name , Title Contact Details etc
Next I have a Member Dimension which contains Member Key, Person Key, Join Date , Exit Date. I need a Member Dimension because I will eventually have to report on the Member (via a snapshot fact table) on a daily basis (e.g number of accounts, total balance for all accounts for a particular date etc). A person during their life can be a member many times differentiated by join date and exit date. Member is I suppose the same as Membership
Next I have an Account Dimension which contains Account Key, Account Type Key, Account Natural ID , Join Date, Exit Date etc, a Member is associated with one or more accounts, the Members Join Date is the earliest accounts join date and the Members Exit Date is the latest exit date when all the accounts have exited.
I hope this helps
Regards
Tim
tim_goodsell- Posts : 49
Join date : 2010-09-21
Re: Flattening Dimension
But a member is a person, 'member', 'beneficiary' or whatever, are simply roles a person is playing in a business transaction (i.e. enrolling, disenrolling, etc..). The fact table represents the transaction. So the accounts a member has is really a relationship between person and account where, for the sake of the transaction, the person mentioned happens to be a member. If you happen to be processing a claim, the 'beneficiary' happens to be a person who plays that role for the claim (business transaction). You don't need a member dimension.
Re: Flattening Dimension
Ok I think I am starting to get it, therefore I think I need
1) A Person Dimension to store the person details (SCD 2)
2) An Account Dimension to store the account details (also SCD 2)
3) A Person_Membership Fact table to store the person membership details, this table contains
Person ID (PK, DD)
Membership Start Date Key (PK)
Membership End Date Key
Its the next bit I'm confused with, do I need a BRIDGE table to link the fact table with the accounts dimension as one or more accounts can be associated with a membership
Regards
Tim
1) A Person Dimension to store the person details (SCD 2)
2) An Account Dimension to store the account details (also SCD 2)
3) A Person_Membership Fact table to store the person membership details, this table contains
Person ID (PK, DD)
Membership Start Date Key (PK)
Membership End Date Key
Its the next bit I'm confused with, do I need a BRIDGE table to link the fact table with the accounts dimension as one or more accounts can be associated with a membership
Regards
Tim
tim_goodsell- Posts : 49
Join date : 2010-09-21
Re: Flattening Dimension
No bridge. The fact (#3 on your list) would contain:
person key (FK)
account key (FK)
start date key (FK)
end date key (FK)
There would be one row for each member/account combination, with effective dates. Updates to the dimensions have no effect on the fact table, there is no need to change keys in the fact. You do not store the person ID (DD) in the fact table because you have a dimension table for person.
person key (FK)
account key (FK)
start date key (FK)
end date key (FK)
There would be one row for each member/account combination, with effective dates. Updates to the dimensions have no effect on the fact table, there is no need to change keys in the fact. You do not store the person ID (DD) in the fact table because you have a dimension table for person.
Re: Flattening Dimension
Thanks for that, so in the fact table (#3),
1) a record just displays just the the relationship between a person and a single account
2) The start /end dates in the fact relate to the person/account relationship start date/end dates which is account create and exit dates).
3) if a person has more than one account there will be two records in the fact table each having maybe different start/end dates depending on when the account was created and exited.
4) Updates to the Person dimension will cause a new reord to be put in the dimension table but will not trigger updates/additions to the fact table
5) Exiting the account will only cause the End Date in the fact table to be updated
Which really gets me back to how does one represent "Total Continous Membership" then. Total Continous Membership is the time span when the first account is created and the last account is exited. A person can have more than one Total Memberships, each one has a different time span, a person can start a membership, finish it and then start again at a later date. I presume Total Continous Membership would be represented as a Daily Snapshot Fact Table as we need measures like "Total Number of Active Accounts", "Total Balance for All Accounts") on a daily basis.
Regards
Tim
1) a record just displays just the the relationship between a person and a single account
2) The start /end dates in the fact relate to the person/account relationship start date/end dates which is account create and exit dates).
3) if a person has more than one account there will be two records in the fact table each having maybe different start/end dates depending on when the account was created and exited.
4) Updates to the Person dimension will cause a new reord to be put in the dimension table but will not trigger updates/additions to the fact table
5) Exiting the account will only cause the End Date in the fact table to be updated
Which really gets me back to how does one represent "Total Continous Membership" then. Total Continous Membership is the time span when the first account is created and the last account is exited. A person can have more than one Total Memberships, each one has a different time span, a person can start a membership, finish it and then start again at a later date. I presume Total Continous Membership would be represented as a Daily Snapshot Fact Table as we need measures like "Total Number of Active Accounts", "Total Balance for All Accounts") on a daily basis.
Regards
Tim
tim_goodsell- Posts : 49
Join date : 2010-09-21
Re: Flattening Dimension
Total continuous membership is an aggregate of membership. (You need to create a new fact table) In the of type 2 dimensions you need to decide what point in time do you want the dimension values. Do you want the value at the time of the transaction or do you want the as of another point in time (such as current). If you want the point in time of the transaction, just use the type 2 key as is. If you want attributes at some other point in time, you must do a self-join of the dimension onto itself using the natural key to locate the appropriate row based on the effective period of that row. Usually you have a current row flag to make the SQL a little simpler and the query a little quicker when all you want is the current row. You use the key of which ever dimension row suits the requirement and aggregate on those keys.
As far as generating the aggregate, a lot of new SQL versions support SQL analytic functions which include LEAD and LAG to look at the next or previous row in groups of rows in a set sorted in some sequence. It may allow you to apply your business rules for gaps when trying to determine the membership period. It may be possible to do this in one query, which means you may be able to define a view rather than build a table, provided performance isn't terrible. I did this 15 years ago (using FOCUS no less) at an HMO and it was a bit messy. You probably need to use a stored procedure.
As far as generating the aggregate, a lot of new SQL versions support SQL analytic functions which include LEAD and LAG to look at the next or previous row in groups of rows in a set sorted in some sequence. It may allow you to apply your business rules for gaps when trying to determine the membership period. It may be possible to do this in one query, which means you may be able to define a view rather than build a table, provided performance isn't terrible. I did this 15 years ago (using FOCUS no less) at an HMO and it was a bit messy. You probably need to use a stored procedure.
Similar topics
» FACT Design
» Wide fact tables
» bridge table and junk dimension on customer dimension (bank/credit union)
» Modeling an Employee Dimension to a Fact which has two columns relating to the Dimension
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
» Wide fact tables
» bridge table and junk dimension on customer dimension (bank/credit union)
» Modeling an Employee Dimension to a Fact which has two columns relating to the Dimension
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum