One Dimension or Multiple Dimensions
4 posters
Page 1 of 1
One Dimension or Multiple Dimensions
I'm working on a DW model that handles a loan system. There is a loan application source system and a loan servicing source system. The system was built in stages and they started with the application system:
A subset of tables are:
FACT_LOAN (Accumulating snapshot from Loan Application System)
DIM_LOAN (Non Additive loan data, 1:1 with FACT_LOAN, SCD Type 1)
DIM_BORROWER
DIM_PRODUCT
DIM_PROGRAM
DIM_DATE
...
Then the loan servicing system was added, and they created new dimensions from the servicing system.
FACT_LOAN_SERVICING (Periodic Snapshot from Servicing system)
FACT_LOAN_TRANSACTION (Transaction table detailing advances, payments, etc)
DIM_LOAN_DATES (SCD Type 2 - Event dates for servicing system, ie. Past Due Date, Last Payment Date, Last Statement Date, etc)
DIM_LOAN_SERVICING (SCD Type 2 - Non additive loan data, generally 1:1 with FACT_LOAN_SERVICING)
About 99% of the loans in the servicing system came from the application system. There are a few loans that were purchased and don't exist in the application system. Not all loans get approved, so not all loans in the application system make it to the servicing system. The FACT_LOAN_SERVICING tables has keys for DIM_LOAN, DIM_BORROWER, DIM_PROGRAM, etc. FACT_LOAN however doesn't have keys into DIM_LOAN_DATES or DIM_LOAN_SERVICING.
The thing I've been wondering is if DIM_LOAN_DATE and DIM_LOAN_SERVICING are really needed, or should those have been conformed into DIM_LOANS? Even if they shouldn't have been conformed, do they really need to be separate tables, or should they have been merged together as one dimension? Or should the dates all have been added to the fact table and linked to DIM_DATE?
I guess I'm just having trouble determining when data from 2 source should be merged into a single dimension or split into separate dimensions. Also, when should dates exist in a dimension and when should they exist in the fact?
Is this design as messed up as I'm feeling it is, or am I wrong and this is all a perfectly valid way of handling it? Any insight would be much appreciated.
A subset of tables are:
FACT_LOAN (Accumulating snapshot from Loan Application System)
DIM_LOAN (Non Additive loan data, 1:1 with FACT_LOAN, SCD Type 1)
DIM_BORROWER
DIM_PRODUCT
DIM_PROGRAM
DIM_DATE
...
Then the loan servicing system was added, and they created new dimensions from the servicing system.
FACT_LOAN_SERVICING (Periodic Snapshot from Servicing system)
FACT_LOAN_TRANSACTION (Transaction table detailing advances, payments, etc)
DIM_LOAN_DATES (SCD Type 2 - Event dates for servicing system, ie. Past Due Date, Last Payment Date, Last Statement Date, etc)
DIM_LOAN_SERVICING (SCD Type 2 - Non additive loan data, generally 1:1 with FACT_LOAN_SERVICING)
About 99% of the loans in the servicing system came from the application system. There are a few loans that were purchased and don't exist in the application system. Not all loans get approved, so not all loans in the application system make it to the servicing system. The FACT_LOAN_SERVICING tables has keys for DIM_LOAN, DIM_BORROWER, DIM_PROGRAM, etc. FACT_LOAN however doesn't have keys into DIM_LOAN_DATES or DIM_LOAN_SERVICING.
The thing I've been wondering is if DIM_LOAN_DATE and DIM_LOAN_SERVICING are really needed, or should those have been conformed into DIM_LOANS? Even if they shouldn't have been conformed, do they really need to be separate tables, or should they have been merged together as one dimension? Or should the dates all have been added to the fact table and linked to DIM_DATE?
I guess I'm just having trouble determining when data from 2 source should be merged into a single dimension or split into separate dimensions. Also, when should dates exist in a dimension and when should they exist in the fact?
Is this design as messed up as I'm feeling it is, or am I wrong and this is all a perfectly valid way of handling it? Any insight would be much appreciated.
jjhartma- Posts : 4
Join date : 2010-05-20
Re: One Dimension or Multiple Dimensions
Given that the loan dimension is type 1 and load servicing is type 2, they cannot be combined. If all you want to do is add some servicing information to the existing loan dimension, there is nothing to stop you from doing so, provided the business key (i.e. loan ID) from the servicing system can tie to the business key in the loan dimension.
The new dimensions on the servicing fact are not a problem. It is the normal course of things. Also, the population of loans in servicing is usually different (often significantly different) than the population in origination, so I would not be surprised if the servicing fact does not reference the loan dimension.
If you need to combine measures from loan and servicing facts, keep in mind that conformance is through common attributes with common values, not necessarily through common dimension tables.
The new dimensions on the servicing fact are not a problem. It is the normal course of things. Also, the population of loans in servicing is usually different (often significantly different) than the population in origination, so I would not be surprised if the servicing fact does not reference the loan dimension.
If you need to combine measures from loan and servicing facts, keep in mind that conformance is through common attributes with common values, not necessarily through common dimension tables.
Re: One Dimension or Multiple Dimensions
Dim_Loan_Date is not needed. You should build relationships off of the fact table to the date dimension. If the date is seldomly used for analytic purposes, combine with other related attributes into a junk dimension. Dim_Loan_Servicing is not needed either. Any time you have a one-to-one relationship to the fact table a red flag should go up. In this case, you can break this dimension up into smaller dimensions that reduce the amount of rows in the parent dimension. This also applies to Dim_Loan as well. You are spot on worrying on how to integrate these two facts. Any common dimension between the origination and servicing systems allow for drill across capabilities. If they loan number carries through, you have a big bonus in ease of integration. You're not far off from a solid, extensible model.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: One Dimension or Multiple Dimensions
Why is Dim_Loan_Servicing a 1:1 to Fact_Loan_Servicing if the fact table is a periodic snapshot? If the periods of the snapshot are monthly and a Loan has been serviced for 12 months, shouldn't it be 1:12?
It's not uncommon for a dimension table to be larger than a single snapshot. My Member Dimension Table is much larger than the number of rows in my Member Fact Monthly table for any given month. But for the periodic snapshot table as a whole, the fact table dwarfs the dimension table.
Just curious but if the Application Number and the Loan Number were linked somewhere, would it be best to put Loan number on the Application data or the Application on the Loan? I'm thinking the Application identifier on the Loan. Would it be best as a degenerate dimension on the Fact table or a snow flake? If I assume the DIM_Loan has the Application Number and the Dim_Loan_Servicing had the Loan Number, you could put the DIM_Loan Key on the DIM_Loan_Servicing. This would work if the link between the application number and loan number was delayed. You figure a record gets generated in the Fact_Loan Table, followed by a record getting created in the Fact_Loan_Servicing. Sometimes there can be a delay in establishing the link Loan Number and the Application Number. If ther is no delay, then you could put the DIM_Loan Key on the Fact_Loan_Servicing. But if there is a delay and you don't want to put updates on the FACT table, particularly if the fact apperas multiple times in the fact table, then it might make more sense to put the DIM_Loan key on the DIM_Loan_Ser table. The frequency of use should also be a consideration.
It's not uncommon for a dimension table to be larger than a single snapshot. My Member Dimension Table is much larger than the number of rows in my Member Fact Monthly table for any given month. But for the periodic snapshot table as a whole, the fact table dwarfs the dimension table.
Just curious but if the Application Number and the Loan Number were linked somewhere, would it be best to put Loan number on the Application data or the Application on the Loan? I'm thinking the Application identifier on the Loan. Would it be best as a degenerate dimension on the Fact table or a snow flake? If I assume the DIM_Loan has the Application Number and the Dim_Loan_Servicing had the Loan Number, you could put the DIM_Loan Key on the DIM_Loan_Servicing. This would work if the link between the application number and loan number was delayed. You figure a record gets generated in the Fact_Loan Table, followed by a record getting created in the Fact_Loan_Servicing. Sometimes there can be a delay in establishing the link Loan Number and the Application Number. If ther is no delay, then you could put the DIM_Loan Key on the Fact_Loan_Servicing. But if there is a delay and you don't want to put updates on the FACT table, particularly if the fact apperas multiple times in the fact table, then it might make more sense to put the DIM_Loan key on the DIM_Loan_Ser table. The frequency of use should also be a consideration.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: One Dimension or Multiple Dimensions
Thanks for all the information! I'm now getting back to this after the holidays and another project.
I did leave some things out in my first message. The FACT_LOAN_SERVICING table is actually FACT_LOAN_SERVICING_CURRENT, _DAILY, and _MONTHLY. Also DIM_LOAN_SERVICING is SCD2 and so I was wrong in saying that it was 1:1.
The Application Number and the Account numbers between origination and servicing are the same. We would have some records in Origination that never got booked and therefore wouldn't be in servicing. Likewise, we have some loans in the servicing system that were purchased and never went through origination. Those records have unique Account/Application numbers though, so we currently tie the two systems together by Application and Account number safely.
Most of the time the application exists in DIM_LOAN/FACT_LOAN first. Then the servicing records is added later when the loan books. FACT_LOAN_SERVICING_* does now have a key to DIM_LOAN, so we can drill across to the origination data.
I've also found out that the original requirements for the DW only required reporting on the current state of the origination data, but they needed to be able to recreate a previously run report from servicing with point-in-time data. So Accumulating Snapshot and SCD1 tables for Origination, and Periodic Snapshot and SCD2 tables for servicing.
I guess the general problem I have trouble wrapping my head around is determining what data should go in what dimensions. How do you determine what should be in a junk dimension? When should one dimension be split into multiple dimensions? When do you conform and when do you not?
As an example DIM_LOAN is 159 columns wide, which seems a little to wide to me. There are columns on DIM_LOAN like Program Name. There is also a DIM_PROGRAM table that contains that same data. The columns in DIM_PROGRAM contain the current name, and the column in DIM_LOAN contains the name as it was at origination. I feel like the correct way would be if the old name was important, then DIM_PROGRAM should have been SCD2 and program name never should have been in the DIM_LOAN table. If the old name isn't important, then it shouldn't even be in the data.
These types of things are really beginning to be an issue for me as we're currently moving from one servicing system to another and so far there have been a good 1000-2000 new columns we will have access to in the new system, most of which are at the grain of a single loan, and over half are probably dimension columns and not fact columns. I'm unsure of how to split them into dimensions, or if I just stick them into DIM_LOAN_SERVICING. I don't really want one gigantic dimension, however I thought too many dimensions was frowned upon as well. Mapping this new servicing system into the existing model is already a nightmare, and adding all this new data is making it even worse.
I did leave some things out in my first message. The FACT_LOAN_SERVICING table is actually FACT_LOAN_SERVICING_CURRENT, _DAILY, and _MONTHLY. Also DIM_LOAN_SERVICING is SCD2 and so I was wrong in saying that it was 1:1.
The Application Number and the Account numbers between origination and servicing are the same. We would have some records in Origination that never got booked and therefore wouldn't be in servicing. Likewise, we have some loans in the servicing system that were purchased and never went through origination. Those records have unique Account/Application numbers though, so we currently tie the two systems together by Application and Account number safely.
Most of the time the application exists in DIM_LOAN/FACT_LOAN first. Then the servicing records is added later when the loan books. FACT_LOAN_SERVICING_* does now have a key to DIM_LOAN, so we can drill across to the origination data.
I've also found out that the original requirements for the DW only required reporting on the current state of the origination data, but they needed to be able to recreate a previously run report from servicing with point-in-time data. So Accumulating Snapshot and SCD1 tables for Origination, and Periodic Snapshot and SCD2 tables for servicing.
I guess the general problem I have trouble wrapping my head around is determining what data should go in what dimensions. How do you determine what should be in a junk dimension? When should one dimension be split into multiple dimensions? When do you conform and when do you not?
As an example DIM_LOAN is 159 columns wide, which seems a little to wide to me. There are columns on DIM_LOAN like Program Name. There is also a DIM_PROGRAM table that contains that same data. The columns in DIM_PROGRAM contain the current name, and the column in DIM_LOAN contains the name as it was at origination. I feel like the correct way would be if the old name was important, then DIM_PROGRAM should have been SCD2 and program name never should have been in the DIM_LOAN table. If the old name isn't important, then it shouldn't even be in the data.
These types of things are really beginning to be an issue for me as we're currently moving from one servicing system to another and so far there have been a good 1000-2000 new columns we will have access to in the new system, most of which are at the grain of a single loan, and over half are probably dimension columns and not fact columns. I'm unsure of how to split them into dimensions, or if I just stick them into DIM_LOAN_SERVICING. I don't really want one gigantic dimension, however I thought too many dimensions was frowned upon as well. Mapping this new servicing system into the existing model is already a nightmare, and adding all this new data is making it even worse.
jjhartma- Posts : 4
Join date : 2010-05-20
Re: One Dimension or Multiple Dimensions
What is your servicing system? I may have already done it.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: One Dimension or Multiple Dimensions
Sometimes it helps to talk some of this out with the business. Have them explain to you what they hope to do with the data. It may give you some ideas as to how the data could be best organized and how they see their business.
As far as designing junk dimensions, the choices are sometimes more mechanical. The content is often driven by cardinality and correlation. You want to avoid combinations of attributes that are so diverse the table grows to an enormous size. But a problem with junk is you lose type 2 functionality if that is what you need. Junk will always record the state of the attributes at the time of the fact. It is very difficult to restate the facts to the current state of those attributes (as it relates to the loan). At best you can restate in terms of the last known state, but it is a very time consuming query on most platforms.
As far as designing junk dimensions, the choices are sometimes more mechanical. The content is often driven by cardinality and correlation. You want to avoid combinations of attributes that are so diverse the table grows to an enormous size. But a problem with junk is you lose type 2 functionality if that is what you need. Junk will always record the state of the attributes at the time of the fact. It is very difficult to restate the facts to the current state of those attributes (as it relates to the loan). At best you can restate in terms of the last known state, but it is a very time consuming query on most platforms.
Re: One Dimension or Multiple Dimensions
The old servicing system is Fiserv SourceOne. The new system is Fiserv Lending Solutions/MortgageServ/LoanServ. The systems are a way more different than we originally thought.
The original warehouse was built by a vendor, and has since been handed off to us internal IT people. Business engagement was very low during the initial phases, but since being handed to us, we've seen that was a mistake and have been pushing hard for more discussion with the business. It's been rocky, but we're starting to make progress.
I hadn't really thought that hard about the Type 2 issue with the junk dimensions. They will still may be useful for the origination system, and in the transactions records for the servicing system.
The original warehouse was built by a vendor, and has since been handed off to us internal IT people. Business engagement was very low during the initial phases, but since being handed to us, we've seen that was a mistake and have been pushing hard for more discussion with the business. It's been rocky, but we're starting to make progress.
I hadn't really thought that hard about the Type 2 issue with the junk dimensions. They will still may be useful for the origination system, and in the transactions records for the servicing system.
jjhartma- Posts : 4
Join date : 2010-05-20
Re: One Dimension or Multiple Dimensions
Yep. I've actually mapped both Fiserv and MSP to a fully dimensional model supporting standard reporting, loss mit, fc/bk, etc. There are a few junk dims required to eliminate "Loan Dim". It can be done though.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Multiple dimensions Vs. Single dimension and hierarchy
» Same attribute in multiple dimensions or Create new dimension?
» Multivalued Dimension or Multiple facts w/conformed dimensions
» Attributes from a dimension appearing in other dimensions; multiple-inheritance in hierarchies.
» Composite Dimension Natural Keys - Attribute Used In Multiple Dimensions as well as Fact Tables
» Same attribute in multiple dimensions or Create new dimension?
» Multivalued Dimension or Multiple facts w/conformed dimensions
» Attributes from a dimension appearing in other dimensions; multiple-inheritance in hierarchies.
» Composite Dimension Natural Keys - Attribute Used In Multiple Dimensions as well as Fact Tables
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum