Dimensional model - denormalized source
2 posters
Page 1 of 1
Dimensional model - denormalized source
I have a dimensional model challenge.
The project is a single-source data warehouse.
The source system has a very wide table called 'case' and is a logical collection of various classes arranged horizontally (one row per case)
Account (customer)
Project
Property
Loan
Borrower (1-3)
County
The case itself doesn't have too many attributes....there are some status codes and similar workflow-related columns, but 90% of the columns in this case table are related to the property and loan associated with the case. In the source system, the case table is associated with many tables related to workflow, documentation, billing etc. Moreover, most of those classes (property, borrower, Loan, etc) don't have reliable unique natural ids.
Issue #1
I have not had a chance to work with a source system this denormalized in the past. I own a copy of Data warehouse toolkit (1st and 2nd versions) and the Data warehouse Life Cycle toolkit, but I can't find any discussion on dealing with how to model dimensions from source data when the natural unique id is not well defined.
Issue#2
All the reports I've analyzed use Account and project in the filter (the relation ship is 1:M Account ----< Project) in the filter. In UML world I'd say that relationship between Project and Account is a "composition", in other words, Project does not exist out side of Account, you would never see a ProjectID in a table without a AccountID. However, Account does exist from time to time without a project.
I am tempted to combine Account and Project in a single account dimension (much like the customer dimension in the Telco Database in DWLTK). If I do this, I'm worried about how join the Account dimension to a fact table when project is not applicable. Thoughts?
The project is a single-source data warehouse.
The source system has a very wide table called 'case' and is a logical collection of various classes arranged horizontally (one row per case)
Account (customer)
Project
Property
Loan
Borrower (1-3)
County
The case itself doesn't have too many attributes....there are some status codes and similar workflow-related columns, but 90% of the columns in this case table are related to the property and loan associated with the case. In the source system, the case table is associated with many tables related to workflow, documentation, billing etc. Moreover, most of those classes (property, borrower, Loan, etc) don't have reliable unique natural ids.
Issue #1
I have not had a chance to work with a source system this denormalized in the past. I own a copy of Data warehouse toolkit (1st and 2nd versions) and the Data warehouse Life Cycle toolkit, but I can't find any discussion on dealing with how to model dimensions from source data when the natural unique id is not well defined.
Issue#2
All the reports I've analyzed use Account and project in the filter (the relation ship is 1:M Account ----< Project) in the filter. In UML world I'd say that relationship between Project and Account is a "composition", in other words, Project does not exist out side of Account, you would never see a ProjectID in a table without a AccountID. However, Account does exist from time to time without a project.
I am tempted to combine Account and Project in a single account dimension (much like the customer dimension in the Telco Database in DWLTK). If I do this, I'm worried about how join the Account dimension to a fact table when project is not applicable. Thoughts?
joegenshlea- Posts : 4
Join date : 2012-11-13
Re: Dimensional model - denormalized source
#1. Worry about natural keys after you identified your dimensions. The choice of dimensions is driven by the nature of the business.
#2 Why would you want to combine project and account in the same dimension? Are projects specific to an account or is the project:account relationship M:M?
#2 Why would you want to combine project and account in the same dimension? Are projects specific to an account or is the project:account relationship M:M?
Re: Dimensional model - denormalized source
ngalemmo wrote:
#2 Why would you want to combine project and account in the same dimension? Are projects specific to an account or is the project:account relationship M:M?
Well, yes. Accounts have many projects. and the relationship is O:M (more on this below). A project always has an account, but accounts do not always have projects. But There is a wrinkle: the PK on the source project table is [Account, ProjectCd]....so this must mean M:M? Well, no not really. The application will allow users to enter any ProjectCd they want, even if it has been used previously for another Account. When there are two rows in the Project table with the same ProjectCD, this is purely coincidence, as there is no shared commonality between the duplicate values (making my life difficult in the dimensional model). Because of this, SQL WHERE clauses at the source require an AccountID when a projectID is specified. A ProjectCD is meaningless with out an Account qualifier.
Business point of view:
Account -+------O< project
Database implementation:
Account >+------O< project
I considered two dimensions, dimProject and dimAccount. The problem is that every time the dimProject is used in a query, dimAccount must be used unless I put accountID in the dimProject dimension. But isn't it a bad practice to put attributes in multiple dimensions?
Maybe an outrigger might be the solution?
joegenshlea- Posts : 4
Join date : 2012-11-13
Re: Dimensional model - denormalized source
Ok. So the natural key to a project must include the account number. It still doesn't mean they need to be the same dimension. Project grain fact tables would reference the account and project dimensions, while an account grain fact table would only reference the account dimension. This gives you conformance and points of integration between the two facts.
Re: Dimensional model - denormalized source
I decided to propose two dimensions to the other members of the design team: Account and Project. The project dimension will have at least one account attribute (the account number), otherwise the project is not uniquely identifiable.
Thank you - I appreciate your time.
joegenshlea- Posts : 4
Join date : 2012-11-13
Similar topics
» Dimensional Model from a Hierarchical Data Source
» Rule based algorithm to convert an ER model to a dimensional model
» Complexities of Relational Model and Simplicities of Dimensional Model
» Meta-model of Kimball dimensional model
» Why we use Dimensional Model over De-normalized relational Model ?
» Rule based algorithm to convert an ER model to a dimensional model
» Complexities of Relational Model and Simplicities of Dimensional Model
» Meta-model of Kimball dimensional model
» Why we use Dimensional Model over De-normalized relational Model ?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum