Finding the grain with One-To-Many fact tables.
5 posters
Page 1 of 1
Finding the grain with One-To-Many fact tables.
I'm trying to get my head round the Kimball DW model described in The Data Warehouse Toolkit. I can't seem to apply the examples I've read to this scenario I have.
A shipping manifest contains many waybills, a waybill contains many charges.
I can cleanly remove the manifest itself by just leaving the surrogate key in with the waybill for grouping purposes. The waybill also has a lot of attributes describing hight, length, width, volume, weight, the units of said values, consignee details, link to Customer and Carrier and Carrier Service ect.
Waybill to me is the thing that is being captured however I also need to capture the prices. So a waybill has charges and it can be one, or many. There's obviously the base charge and then depending on the package and carrier there could be fuel surcharge, collect postage charge, insurance charge ect. I have 25 different types of charges in a charge type table that charges links to. I thought about making the charge types attributes on the waybill but 25 is way too much, they might changed at some point (rarely) and then most will be nulls for a given instance and i read that nulls in a fact table is bad news.
I'd like to be able to slide/dice and roll up waybills, including the costings once this DW is imported into SQL Analysis Services. I'm not sure I can do this if it's two fact tables joined.
Any advice would be hugely appreciated.
A shipping manifest contains many waybills, a waybill contains many charges.
I can cleanly remove the manifest itself by just leaving the surrogate key in with the waybill for grouping purposes. The waybill also has a lot of attributes describing hight, length, width, volume, weight, the units of said values, consignee details, link to Customer and Carrier and Carrier Service ect.
Waybill to me is the thing that is being captured however I also need to capture the prices. So a waybill has charges and it can be one, or many. There's obviously the base charge and then depending on the package and carrier there could be fuel surcharge, collect postage charge, insurance charge ect. I have 25 different types of charges in a charge type table that charges links to. I thought about making the charge types attributes on the waybill but 25 is way too much, they might changed at some point (rarely) and then most will be nulls for a given instance and i read that nulls in a fact table is bad news.
I'd like to be able to slide/dice and roll up waybills, including the costings once this DW is imported into SQL Analysis Services. I'm not sure I can do this if it's two fact tables joined.
Any advice would be hugely appreciated.
vivaladan- Posts : 4
Join date : 2011-11-08
Re: Finding the grain with One-To-Many fact tables.
You need at least a fact table at the charge grain.
The easiest way to get a handle on what you need to do is to design the charge grain fact first. Figure out all your measures and dimensions. If you need to, then design a waybill grain fact next. It should basically be the same table with one or more dimensions (such as charge type) removed. There should not be a need to join the two tables.
The easiest way to get a handle on what you need to do is to design the charge grain fact first. Figure out all your measures and dimensions. If you need to, then design a waybill grain fact next. It should basically be the same table with one or more dimensions (such as charge type) removed. There should not be a need to join the two tables.
Re: Finding the grain with One-To-Many fact tables.
Thanks for your reply, but I'm still struggling to get my head around this. Perhaps I might be confusing DW with OLAP too much.
Ultimately I might want to ask: show me the cost of insurance (charge type) as a percentage of total spend for all way bills that month, perhaps also against country.
I can see why this might want to be two different grains, one at waybill (with a carrier, destination and date dimension) and the other at charge (with charge type dimension). The total spend measure is an aggregation of all the charges for a given waybill and then i guess all the waybills for a given dimension.
The problem I see with two grains, is that the charge is actually really simple, it's just a key to the charge type dimension and a monetary value. The waybill on the other hand has keys to quite a few dimensions along with a good 15 or so attributes for the various dimensions and other bits and pieces. If I were to push this lot down to the charge grain then something as simple as wanting to include a value for insurance or Saturday delivery will require all of that waybill data to be reproduced for the various charges.
Am I wrong in thinking that the dimensions need to be all pointing to the same fact table in order to ask a question like about?
Thanks again.
Ultimately I might want to ask: show me the cost of insurance (charge type) as a percentage of total spend for all way bills that month, perhaps also against country.
I can see why this might want to be two different grains, one at waybill (with a carrier, destination and date dimension) and the other at charge (with charge type dimension). The total spend measure is an aggregation of all the charges for a given waybill and then i guess all the waybills for a given dimension.
The problem I see with two grains, is that the charge is actually really simple, it's just a key to the charge type dimension and a monetary value. The waybill on the other hand has keys to quite a few dimensions along with a good 15 or so attributes for the various dimensions and other bits and pieces. If I were to push this lot down to the charge grain then something as simple as wanting to include a value for insurance or Saturday delivery will require all of that waybill data to be reproduced for the various charges.
Am I wrong in thinking that the dimensions need to be all pointing to the same fact table in order to ask a question like about?
Thanks again.
vivaladan- Posts : 4
Join date : 2011-11-08
Re: Finding the grain with One-To-Many fact tables.
The charge level fact table should carry all the dimensions of the waybill level fact table (if you have one) PLUS dimensions relating to the charge information. This would allow you to do charge analysis in context of waybill attributes. The idea of 'joining' the facts to get attributes relating to the waybill is relational thinking, not dimensional thinking. As far as the charge level fact goes, you are not 'reproducing' the waybill data, you are simply including the same foreign keys to the same dimension tables in the fact.
RE:Finding the grain with One-To-Many fact tables.
A bridge table will help you sort out this mess. Here is something you need:
FACT Table
WayBill_Dim_ID (FK)
Manifest_ID
WAYBill_DIM
WayBill_Dim_ID
Charge_DIM
Charge_Dim_ID
Charge Type
Charge amount
BILL_Charge_Bridge
WayBill_Dim_ID(FK)
Charge_Dim_ID (FK)
If a waybill_dim_id say 100 has 20 different charges, then in the bridge table waybill_dim_id 100 will have 20 rows with different charge_dim_ids. Read more about a bridge table in Kimball book or in this forum.
FACT Table
WayBill_Dim_ID (FK)
Manifest_ID
WAYBill_DIM
WayBill_Dim_ID
Charge_DIM
Charge_Dim_ID
Charge Type
Charge amount
BILL_Charge_Bridge
WayBill_Dim_ID(FK)
Charge_Dim_ID (FK)
If a waybill_dim_id say 100 has 20 different charges, then in the bridge table waybill_dim_id 100 will have 20 rows with different charge_dim_ids. Read more about a bridge table in Kimball book or in this forum.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Finding the grain with One-To-Many fact tables.
Do what Neil says. It's in the Kimball book for a reason.
Re: Finding the grain with One-To-Many fact tables.
Thanks for the great responses.
I've gone for the single table at charge grain as it makes the most sense. I was a little resistant at first because I was a little caught up on duplicating data across rows. So it's a fact hierarchy that's flattened down to a single table. These are shipment -> package -> charge where the links are one to many.
The issue I'm onto now which I also can't really seem to find the solution too more lies in the OLAP side unless I'm still getting confused with the dimensional aspect of things. Price can correctly be rolled up along all the dimensions using a sum as the aggregation function. Weight on the other hand cannot. Weight is duplicated across all charge rows for a given package, but OLAP, or rather SSAS in this case has no concept that charges can be grouped by a package and that weight really represents that of the package. So a sum aggregation will show the actual weight * number of charges for each package. I'd like to show, total weight and total cost side by side for some set of rolled up dimensions.
I've gone for the single table at charge grain as it makes the most sense. I was a little resistant at first because I was a little caught up on duplicating data across rows. So it's a fact hierarchy that's flattened down to a single table. These are shipment -> package -> charge where the links are one to many.
The issue I'm onto now which I also can't really seem to find the solution too more lies in the OLAP side unless I'm still getting confused with the dimensional aspect of things. Price can correctly be rolled up along all the dimensions using a sum as the aggregation function. Weight on the other hand cannot. Weight is duplicated across all charge rows for a given package, but OLAP, or rather SSAS in this case has no concept that charges can be grouped by a package and that weight really represents that of the package. So a sum aggregation will show the actual weight * number of charges for each package. I'd like to show, total weight and total cost side by side for some set of rolled up dimensions.
vivaladan- Posts : 4
Join date : 2011-11-08
Re: Finding the grain with One-To-Many fact tables.
Why not divide the weight by the count in your fact table (or probably in your DSV) and then sum in the cube.
Re: Finding the grain with One-To-Many fact tables.
John Simon wrote:Why not divide the weight by the count in your fact table (or probably in your DSV) and then sum in the cube.
So I would do that in during the ETL wouldn't I. I didn't consider it but was unsure whether it was a dirty hack and might cause other problems. I'll give it a go though and see how well I get on. Cheers.
vivaladan- Posts : 4
Join date : 2011-11-08
One Fact table with two level of grain
I have seen a production data warehouse design like the case you described.
The design is to have the lowest grain in one fact table, which is CHARGES in your case. In this fact table, some columns were added like waybill_sk, waybill_count and waybill_amount. waybill_count is given to the value = 1 ONLY for the first row of charges, otherwise null value. And waybill_amount records the measure at waybill level where waybill_count = 1. An index was created on waybill_count. If waybill_count = 1, the fact table will function at waybill grain level by using waybill_amount measure.
From my knowledge, I didn't notice a supporting source for this desgin. However, this design was in production. Maybe this is a good time to have a discussion if it is a good practice or not.
The design is to have the lowest grain in one fact table, which is CHARGES in your case. In this fact table, some columns were added like waybill_sk, waybill_count and waybill_amount. waybill_count is given to the value = 1 ONLY for the first row of charges, otherwise null value. And waybill_amount records the measure at waybill level where waybill_count = 1. An index was created on waybill_count. If waybill_count = 1, the fact table will function at waybill grain level by using waybill_amount measure.
From my knowledge, I didn't notice a supporting source for this desgin. However, this design was in production. Maybe this is a good time to have a discussion if it is a good practice or not.
tlx814- Posts : 3
Join date : 2011-11-02
Re: Finding the grain with One-To-Many fact tables.
The problem with having a count column as you descibe is that it doesn't work. For example, if you wanted to count the number of waybills carrying a specific charge, you cannot get a correct answer by summing the count column. I don't like using such columns as they are basically a cheat to avoid work in the BI tool, or to work around shortcomings in the BI tool.
Allocation doesn't really work well either, but can serve to resolve a mixed grain issue provided the users understand that the allocated value will be misleading at the detail level. The other issue is, once you mention that, it opens a new issue regarding how the allocation should be done. In rare instances, the business will quickly reach consensus and you will have clear guidance as to how to allocate the value. More likely, you insite a holy war amongst user communities with different priorities and needs for use of this data.
Allocation doesn't really work well either, but can serve to resolve a mixed grain issue provided the users understand that the allocated value will be misleading at the detail level. The other issue is, once you mention that, it opens a new issue regarding how the allocation should be done. In rare instances, the business will quickly reach consensus and you will have clear guidance as to how to allocate the value. More likely, you insite a holy war amongst user communities with different priorities and needs for use of this data.
Re: One fact table with count columns
ngalemmo wrote:For example, if you wanted to count the number of waybills carrying a specific charge, you cannot get a correct answer by summing the count column.
Thanks for the post.
I think this example can be answered by counting distinct waybill_sk where charge is a specific type. I don't have clear picture of the capabilities of BI tools. Maybe this query format is not suitable for BI tools or I ignore some other considerations to avoid this design.
tlx814- Posts : 3
Join date : 2011-11-02
Re: Finding the grain with One-To-Many fact tables.
Right. The correct way to do it is use COUNT(). Most all BI tools these days allow you to use column expressions (as metadata), so using COUNT() instead of an actual column is easy enough to do. But such is not the case with earlier versions of such tools. Old habits and design rules are hard to break.
Similar topics
» calendar grain on both dimension and fact tables
» conformed dimension for two fact tables which are at different grain
» Linking two Fact tables with different grain through a hierarchy dimension
» Design Question - Multiple Fact Tables at the same Grain
» Using a dimension in multiple fact tables with different grain and support SCD
» conformed dimension for two fact tables which are at different grain
» Linking two Fact tables with different grain through a hierarchy dimension
» Design Question - Multiple Fact Tables at the same Grain
» Using a dimension in multiple fact tables with different grain and support SCD
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum