Dimensions - To Create Based on Type or Role ?
4 posters
Page 1 of 1
Dimensions - To Create Based on Type or Role ?
I'm trying to lay down some guidelines for creating new dimensions in an existing shop.
I have a mix of dimensions that are type-based and role-based. Type-based are those created because an entity (think Party) has many types (individual, organization, virtual). These types share some common attributes of Party, but each carries it's own unique set of attributes. A Party-Individual may have first name, last name, birthdate, whereas a Party-Organization may have legal name, date of incorporate, D&B Nbr.
Role-based dimensions are those created based on the many roles an Entity may play. Since a Party can play many roles, a very busy Party may be found in multiple dimensions (an Employee-Dim, a Broker-Dim, and an Insured-Dim, etc). Obviously this is an insurance application, but could be applied to toher industries as well. If roles are numerous, then dimensions will in turn be numerous.
I prefer Type-based dimensions (Party-Individual, Party-Organization) over Role-based dimensions, the Role-based seem unsustainable to me. (You've got a new role? Then create a new Role-based dimension, even though most of that data is found elsewhere already.)
We have situations, however, where Role-based dimensions reduce complexity- as in cases where a role can be played by different Party types. (i.e., a Policyholder can either be an Individual or an Organization. A Broker may be an Individual or an Organization). In these 2 situations, for a Policy fact table, Type-based dimensions result in needing 2 surrogate keys (one for the Individual Policyholder, the other for an Organization Policyholder), whereas Role-based dimensions would need only 1 surrogate key (to a role dimension of PolicyHolder-Dim) - but there's a price to pay - most of that data is already found in other Role-based dimensions (Claimant-Dim, Insured-Dim, etc).
I see merits in both but am leaning on Type-based dimensions. Could someone weigh in on what they've seen that works - I want to quantify guidelines for which to use, and why.
Thanks.
I have a mix of dimensions that are type-based and role-based. Type-based are those created because an entity (think Party) has many types (individual, organization, virtual). These types share some common attributes of Party, but each carries it's own unique set of attributes. A Party-Individual may have first name, last name, birthdate, whereas a Party-Organization may have legal name, date of incorporate, D&B Nbr.
Role-based dimensions are those created based on the many roles an Entity may play. Since a Party can play many roles, a very busy Party may be found in multiple dimensions (an Employee-Dim, a Broker-Dim, and an Insured-Dim, etc). Obviously this is an insurance application, but could be applied to toher industries as well. If roles are numerous, then dimensions will in turn be numerous.
I prefer Type-based dimensions (Party-Individual, Party-Organization) over Role-based dimensions, the Role-based seem unsustainable to me. (You've got a new role? Then create a new Role-based dimension, even though most of that data is found elsewhere already.)
We have situations, however, where Role-based dimensions reduce complexity- as in cases where a role can be played by different Party types. (i.e., a Policyholder can either be an Individual or an Organization. A Broker may be an Individual or an Organization). In these 2 situations, for a Policy fact table, Type-based dimensions result in needing 2 surrogate keys (one for the Individual Policyholder, the other for an Organization Policyholder), whereas Role-based dimensions would need only 1 surrogate key (to a role dimension of PolicyHolder-Dim) - but there's a price to pay - most of that data is already found in other Role-based dimensions (Claimant-Dim, Insured-Dim, etc).
I see merits in both but am leaning on Type-based dimensions. Could someone weigh in on what they've seen that works - I want to quantify guidelines for which to use, and why.
Thanks.
seadog2010- Posts : 23
Join date : 2010-03-04
Nick Galemmos -
Nick;
I'm unable to get the URL to work for your article on Unity Dimensions - it may be what I need for my prev question on creating Type-based or Role-based dimensions. Can you point me in the right direction to your article(s) on Variant Dimensions and Unity Dimensions?
Much appreciated.
> Seadog
I'm unable to get the URL to work for your article on Unity Dimensions - it may be what I need for my prev question on creating Type-based or Role-based dimensions. Can you point me in the right direction to your article(s) on Variant Dimensions and Unity Dimensions?
Much appreciated.
> Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Dimensions - To Create Based on Type or Role ?
As far as roles go, usually you have one dimension (say, date) and many roles (order date, ship date, promise date, etc...). Each role is reflected by the foreign key column name in the fact table. All reference the same dimension.
When you have a common dimension with many different types, such as an Account dimension in a bank, you create one common dimension with the PK, NK and common attributes, as well as subtype dimension tables with the same PK and a collection of attributes for that type.
It is not clear if Party requires subtypes as it does not appear the differences in attributes are that great. For example, it would be usualy that there is a 'Name' attribute that would always contain the name of the party regardless of type. In the case of a person, it would be either Last, First or First Last depending on what the business wants.
The Unity dimension concept is sort of a reverse of sub types. It is the creation of a common dimension when disparate dimensions exist. The genesis was to cover a situation I had when implementing a logistics subject area. The warehouse already had the typical customer, location, and vendor dimensions but, for logistics, you simply have senders and receivers, which could be a company location, a customer or a vendor. So, the Unity dimension was a new dimension that subordinated the location, customer and vendor. It contained common attributes from each. sufficient to support logistics reporting, and, because the same sequence was being used for generating PKs for those dimensions, PKs were mutually exclusive, so I did not need a FK from the unity dimension to the subordinated dimensions... I just used the same PK value from the subordinated row. If that was not the case, I would need to keep FK references between the two so that attributes can be maintained properly.
The common PK had the advantage of autofiltering queries, for example, if I wanted to see shipments to customers, I would join the shipment information to the customer dimension, rather than the common dimension. This would implicity limit output to that relating to customers.
When you have a common dimension with many different types, such as an Account dimension in a bank, you create one common dimension with the PK, NK and common attributes, as well as subtype dimension tables with the same PK and a collection of attributes for that type.
It is not clear if Party requires subtypes as it does not appear the differences in attributes are that great. For example, it would be usualy that there is a 'Name' attribute that would always contain the name of the party regardless of type. In the case of a person, it would be either Last, First or First Last depending on what the business wants.
The Unity dimension concept is sort of a reverse of sub types. It is the creation of a common dimension when disparate dimensions exist. The genesis was to cover a situation I had when implementing a logistics subject area. The warehouse already had the typical customer, location, and vendor dimensions but, for logistics, you simply have senders and receivers, which could be a company location, a customer or a vendor. So, the Unity dimension was a new dimension that subordinated the location, customer and vendor. It contained common attributes from each. sufficient to support logistics reporting, and, because the same sequence was being used for generating PKs for those dimensions, PKs were mutually exclusive, so I did not need a FK from the unity dimension to the subordinated dimensions... I just used the same PK value from the subordinated row. If that was not the case, I would need to keep FK references between the two so that attributes can be maintained properly.
The common PK had the advantage of autofiltering queries, for example, if I wanted to see shipments to customers, I would join the shipment information to the customer dimension, rather than the common dimension. This would implicity limit output to that relating to customers.
Unity Dimension
Thanks for the response.
If I'm understanding correctly - please verify - the new Unity dimension would essentially contain the exact same natural key (not the SK) of the candidate dimensions, along with whatever might be common among them (i.e., name, state code, etc). This new dimension would not be a collection of possible combinations (of location, vendor & customer) but rather the collection of all locations, all vendors, all customers, one row for each. Were there to be 50 location records, 25 vendor records and 1,000 customer records in their respective dimensions (LOC_DIM, VNDR_DIM, CUST_DIM) then I would expect to see 50+25+1,000 = 1,075 records in the Unity Dim, albeit with a minimal nbr of (slightly redundant) attributes.
Loading a new Vendor record (#26) would therefore mean the NK for the new Vendor (#26) would also be added to the Unity Dim, which would now total 1075 +1 records, to keep all tbls in synch, regardless if this new Vendor (#26) would ever be used. The ETL load process would query the Unity Dim, by NK, to find the corresponding SK for that record.
A Party in my example could be either an individual, an organization, or a virtual Party (i.e., web site). Org Parties is further broken down into Business Units (company departments), Field Offices, Companies, Associations, Jurisdictions (for licensing) and Insurance Carriers (a specialized subset of a Company with few roles). The implementation of your Unity concept would allow, if I'm reading this right, multiple Party types (an Association, a Company, a State or City, an Individual, etc) the flexibility to play any role (such as Claimaint) while preserving the underlying data found within a minimal nbr of Party dimensions.
I need to research whether Party type breakout, in my shop, is necessary. As you say, based on the list of attributes. We've seem to have thrown in other cols into our dimensions that are non-attributes, so further analysis is needed.
I hope I've interpreted your response correctly.
Nick, thanks for all your help. The world of policyholders is forever in your debt.
-Seadog
If I'm understanding correctly - please verify - the new Unity dimension would essentially contain the exact same natural key (not the SK) of the candidate dimensions, along with whatever might be common among them (i.e., name, state code, etc). This new dimension would not be a collection of possible combinations (of location, vendor & customer) but rather the collection of all locations, all vendors, all customers, one row for each. Were there to be 50 location records, 25 vendor records and 1,000 customer records in their respective dimensions (LOC_DIM, VNDR_DIM, CUST_DIM) then I would expect to see 50+25+1,000 = 1,075 records in the Unity Dim, albeit with a minimal nbr of (slightly redundant) attributes.
Loading a new Vendor record (#26) would therefore mean the NK for the new Vendor (#26) would also be added to the Unity Dim, which would now total 1075 +1 records, to keep all tbls in synch, regardless if this new Vendor (#26) would ever be used. The ETL load process would query the Unity Dim, by NK, to find the corresponding SK for that record.
A Party in my example could be either an individual, an organization, or a virtual Party (i.e., web site). Org Parties is further broken down into Business Units (company departments), Field Offices, Companies, Associations, Jurisdictions (for licensing) and Insurance Carriers (a specialized subset of a Company with few roles). The implementation of your Unity concept would allow, if I'm reading this right, multiple Party types (an Association, a Company, a State or City, an Individual, etc) the flexibility to play any role (such as Claimaint) while preserving the underlying data found within a minimal nbr of Party dimensions.
I need to research whether Party type breakout, in my shop, is necessary. As you say, based on the list of attributes. We've seem to have thrown in other cols into our dimensions that are non-attributes, so further analysis is needed.
I hope I've interpreted your response correctly.
Nick, thanks for all your help. The world of policyholders is forever in your debt.
-Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Dimensions - To Create Based on Type or Role ?
ngalemmo wrote:The Unity dimension concept is sort of a reverse of sub types. ...
Why do call it a reverse of sub types? It sounds just like sub types.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Dimensions - To Create Based on Type or Role ?
I'm thinking that Variant dimensions divide (sub-type), whereas Unity dimensions group (combine types).
seadog2010- Posts : 23
Join date : 2010-03-04
Unity Dimension Natural Keys
A question about Natural Keys -
Suppose the natural key of Vendor Dim = 'vendor id', and the natural key of Loc Dim = 'location id', and the Unity dimension of these 2 looked like:
Unity Dim
*********
SK
------------
vendor id
location id
Would the natural key then of the Unity Dim be the combination of (vendor id + location id), where in all cases one or the other would be null, and the other populated? This is not the same as saying the natural key of the Unity Dim is the same as the natural key of each of the related dims (vendor and location).
If, on the other hand, you generalize the natural key of the Unity Dim, it might look like:
Unity Dim
*********
SK
------------
dimension id
in which case you wouldn't use the same (name and value) but the same value for the natural key of 'dimension id' and apply that towards either 'vendor id' or 'location id'.
It gets more complicated when there are multi-part natural keys, but for now, can you point to the best approach to take?
-Seadog
Suppose the natural key of Vendor Dim = 'vendor id', and the natural key of Loc Dim = 'location id', and the Unity dimension of these 2 looked like:
Unity Dim
*********
SK
------------
vendor id
location id
Would the natural key then of the Unity Dim be the combination of (vendor id + location id), where in all cases one or the other would be null, and the other populated? This is not the same as saying the natural key of the Unity Dim is the same as the natural key of each of the related dims (vendor and location).
If, on the other hand, you generalize the natural key of the Unity Dim, it might look like:
Unity Dim
*********
SK
------------
dimension id
in which case you wouldn't use the same (name and value) but the same value for the natural key of 'dimension id' and apply that towards either 'vendor id' or 'location id'.
It gets more complicated when there are multi-part natural keys, but for now, can you point to the best approach to take?
-Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Dimensions - To Create Based on Type or Role ?
seadog2010 wrote:I'm thinking that Variant dimensions divide (sub-type), whereas Unity dimensions group (combine types).
Yes, basically. If you start with the intent of creating subtypes you have the luxury of ensuring the same primary keys in the subtype tables. The unity idea is to create a common type after you have already created various dimensions, so those dimensions become subtypes of the unity dimension.
Ideally it would be best to start out with building subtypes. But sometimes that need isn't recognized until it is too late.
Re: Dimensions - To Create Based on Type or Role ?
Would the natural key then of the Unity Dim be the combination of (vendor id + location id), where in all cases one or the other would be null, and the other populated? This is not the same as saying the natural key of the Unity Dim is the same as the natural key of each of the related dims (vendor and location).
No. If you are concerned with duplicate natural keys (and you should be), such as a customer and a vendor having the same id, you should prefix natural keys in the unity dimension with a type code of some sort to ensure uniqueness.
In general, when I define dimension tables I always define an NK string column that is independent of any attributes and load it with an abstracted value, usually a source code concatenated with whatever columns from the source that make up the source's natural key. This ensures a stable, unique identifier regardless of the number of sources and changes to the attribute values. Values in the string should be delimited so that variable length values don't inadvertently combine to create a duplicate of some other natural key.
It also makes ETL easier as the SK lookup code always uses a single NK string regardless of how may columns actually make up the key in the source. It is very useful if you have multiple sources with varying key columns for the same dimension. You resolve such issues when staging the data, creating the NK string as part of the staging process.
Re: Dimensions - To Create Based on Type or Role ?
I am not too concerned with duplicate natural keys in our dimes since in this shop we always use a mulit-part NK (source system code col + identifier). We assume that within each source system their ID's are unique.
I was questioning the makeup of the Unity Dimenison natural key,and wondering if were a stacking of the other natural keys (same names), or if it was a generic 'dimension natural key' whose values, not the col name, are what form a link to the other dimensions.
I like your notion of a natural key string, but as such we have not implemented that here so it's too late to redo everything.
- Seadog
I was questioning the makeup of the Unity Dimenison natural key,and wondering if were a stacking of the other natural keys (same names), or if it was a generic 'dimension natural key' whose values, not the col name, are what form a link to the other dimensions.
I like your notion of a natural key string, but as such we have not implemented that here so it's too late to redo everything.
- Seadog
seadog2010- Posts : 23
Join date : 2010-03-04
Re: Dimensions - To Create Based on Type or Role ?
What you need to do with the NK in the unity dimension will depend on what you get from the source system. If you are getting the same values as you would normally get, then there is nothing special about the NK in the unity dimension.
Re: Dimensions - To Create Based on Type or Role ?
To paraphrase from a previous note:
Would the natural key of the Unity Dim be the combination of (vendor id + location id), where in all cases only one is populated (and the other is null):
Unity Dim
*********
SK
------------
vendor NK
location NK
or would you use a generalized natural key:
Unity Dim
*********
SK
------------
unity dimension NK (where the value = the value found in vendor NK, or the value found in location NK, etc)
Would the natural key of the Unity Dim be the combination of (vendor id + location id), where in all cases only one is populated (and the other is null):
Unity Dim
*********
SK
------------
vendor NK
location NK
or would you use a generalized natural key:
Unity Dim
*********
SK
------------
unity dimension NK (where the value = the value found in vendor NK, or the value found in location NK, etc)
seadog2010- Posts : 23
Join date : 2010-03-04
Unity Dimension
I am just starting to look into how to hand this same situation. We currently have a very flat Policy dimension that includes PolicyHolder data, but it is not working for us.
I would love to see how you have modeled the Unity Dimension. I can't seem to get a handle on the design by just reading these posts.
Thanks in advance for any help you can provide.
I would love to see how you have modeled the Unity Dimension. I can't seem to get a handle on the design by just reading these posts.
Thanks in advance for any help you can provide.
C John Palmer- Posts : 1
Join date : 2014-03-12
Location : Harrisburg, PA
Similar topics
» load a table without unique indentifier
» Should I create one or two dimensions?
» Modeling of dimensions based on location
» Role-playing dimensions
» Role Playing dimensions
» Should I create one or two dimensions?
» Modeling of dimensions based on location
» Role-playing dimensions
» Role Playing dimensions
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum