Regarding Factless Fact Tables
+6
sarah_id1
hang
John Simon
BoxesAndLines
ngalemmo
PaulM
10 posters
Page 1 of 1
Regarding Factless Fact Tables
Some time ago I was asked to build the BI platform for a road safety organization. The warehouse was built of three primary fact tables, each with a collection of associated dimensions. I am now trying to branch out into ETL and Warehouse design, and from what I've been reading I believe there could have been a few improvements to the design. Specifically I've been reading that fact tables should never be joined directly - instead bridge tables should be used.
The three tables are:
Accidents
---------
AccidentKey (PK)
Involved
---------
AccidentKey (FK)
InvolvedKey (PK)
Vehicles
--------
AccidentKey (FK)
InvolvedKey (FK)
VehicleKey (PK)
Each accident can have 1 or more Involved. Each Involved has 1 Accident. Each Involved has 0 or 1 Vehicles and each Vehicle can have 1 or more Involved.
The actual measures are either distinct counts (sum of 1 for involved) of each table, or found in the dimensions (population of accident city, road capacity, and so on). The reports based on either the cube or the relational are well within acceptable run-times.
Now for my questions.
The three tables are:
Accidents
---------
AccidentKey (PK)
Involved
---------
AccidentKey (FK)
InvolvedKey (PK)
Vehicles
--------
AccidentKey (FK)
InvolvedKey (FK)
VehicleKey (PK)
Each accident can have 1 or more Involved. Each Involved has 1 Accident. Each Involved has 0 or 1 Vehicles and each Vehicle can have 1 or more Involved.
The actual measures are either distinct counts (sum of 1 for involved) of each table, or found in the dimensions (population of accident city, road capacity, and so on). The reports based on either the cube or the relational are well within acceptable run-times.
Now for my questions.
- Do bridge tables have any use here?
- My gut instinct is to place a dummy row in the Vehicles table for each time an involved does not have a vehicle: VehicleType_cd = -1, No Vehicle. This would change it to an inner join, but how would I count vehicles if that was implemented? A separate field for VehicleKey that would be null for No Vehicle?
- Is there anything else of which I should be aware?
PaulM- Posts : 1
Join date : 2011-01-02
Re: Regarding Factless Fact Tables
You do not use bridge tables to join between fact tables. Bridge tables are used between a fact and a dimension... either to handle hierarchies or a multivalued dimension.
Facts are combined by summarizing individual fact tables on common dimensions and then joining/merging on those dimensions.
Facts are combined by summarizing individual fact tables on common dimensions and then joining/merging on those dimensions.
Re: Regarding Factless Fact Tables
In addition to what Nick said, it seems like you could have done this with just the last fact table.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Regarding Factless Fact Tables
I've done something similar with marine incidents. An incident can have zero or more persons, and/or can have zero or more vessels involved. For certain queries, I just joined straight from the incident table to the incident vessel table on the incident key, exactly as you have described in your post with the accidents.
While it's not "best practice", I think it makes the most sense for certain queries since you're joining on the "degenerate" dimension value.
Regarding your questions:
1) No
2) Just keep it to a left join between accident and vehicles - I don't see any benefit to having a dummy value for no vehicle if none was involved.
3) I think you're on the right track - don't make it more complicated than it needs to be.
While it's not "best practice", I think it makes the most sense for certain queries since you're joining on the "degenerate" dimension value.
Regarding your questions:
1) No
2) Just keep it to a left join between accident and vehicles - I don't see any benefit to having a dummy value for no vehicle if none was involved.
3) I think you're on the right track - don't make it more complicated than it needs to be.
Re: Regarding Factless Fact Tables
It seems to me that your three tables are structured in a typical relational modelling style. So the primary task would be identifying facts and dimensions based on existing relationship.
Obviously accident events should belong to faceless fact table and the grain is one accident per row. The involved party and vehicle are dimensions but neither can be directly in the accident fact table as they may take multiple values in a single accident. I guess you need a third dimension that can be an FK in the accident fact, grouping involved parties and vehicles together by a single surrogate key. Let's call this third dimension accident group.
Now I think you do need a bridge table that contains FKs from accident group, involved party and vehicle dimension. In the bridge table, the accident group FK stays the same for each accident that may involve multiple parties and vehicles.
The schema involves quite a bit of snowflaking, however that's the dimensional approach I can see to connect multivalued dimensions to the fact.
Obviously accident events should belong to faceless fact table and the grain is one accident per row. The involved party and vehicle are dimensions but neither can be directly in the accident fact table as they may take multiple values in a single accident. I guess you need a third dimension that can be an FK in the accident fact, grouping involved parties and vehicles together by a single surrogate key. Let's call this third dimension accident group.
Now I think you do need a bridge table that contains FKs from accident group, involved party and vehicle dimension. In the bridge table, the accident group FK stays the same for each accident that may involve multiple parties and vehicles.
The schema involves quite a bit of snowflaking, however that's the dimensional approach I can see to connect multivalued dimensions to the fact.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Regarding Factless Fact Tables
I think this could have better designed with 1 fact table having foreign key for each dimension i.e. Vehicle, Involved and Accident.
sarah_id1- Posts : 9
Join date : 2010-11-18
Re: Regarding Factless Fact Tables
Is accident a fact? If not and suppose it's a dimension, than what is your fact.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Regarding Factless Fact Tables
An accident is a fact, because it is an event that occurs at a point in time and has dimension attributes such as location, accident type, response etc
An accident can involve zero or many persons and potentially zero or many vehicles.
There's no way to model this without having three separate fact tables.
An accident can involve zero or many persons and potentially zero or many vehicles.
There's no way to model this without having three separate fact tables.
Re: Regarding Factless Fact Tables
For me the core of the problem is to define the granularity right and then implement a single fact table.
Use a single fact table is a big improvement for usability.
Given the specification seems that the granularity is the right of the table "involved".
In fact, every entity involved is a single incident and also 0 or 1 vehicle.
If the vehicles could be more than one for each entity involved granulat the right should be "vehicles involved"
I hope to be helpful
Hello
Alessio
Use a single fact table is a big improvement for usability.
Given the specification seems that the granularity is the right of the table "involved".
In fact, every entity involved is a single incident and also 0 or 1 vehicle.
If the vehicles could be more than one for each entity involved granulat the right should be "vehicles involved"
I hope to be helpful
Hello
Alessio
Alessio- Posts : 1
Join date : 2011-01-08
Age : 60
Location : Italy
Re: Regarding Factless Fact Tables
I don't see how you could possibly have one fact table.
If a vehicle or a person has a role in an accident, then how do you record this in one fact table without having many records in the one fact table without having many records for the one accident? How is that necessarily better?
If a vehicle or a person has a role in an accident, then how do you record this in one fact table without having many records in the one fact table without having many records for the one accident? How is that necessarily better?
Re: Regarding Factless Fact Tables
The solution presented in my previous post is based on Kimball's approach resolving the problem connecting multivalued dimensions to the fact table. There is only one accident fact and all others are dimension/bridge tables. Please refer to 'The Complete Guide to Dimensional Modeling - 2nd Edition', Chapter 13, p264.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Regarding Factless Fact Tables
Hang,
I don’t think that solution you pointed out is correct for this instance - although it may well be.
Using a bridge table doesn’t account for the following:
A person may be involved in an accident. That person may have different roles: Driver or Passenger. That person may also have a valid driver’s licence at that time, or not. These are not attributes of the person dimension. There may well be other attributes that are captured about the person at the time of the accident that do not make sense to be a part of the person dimension.
A vehicle may be involved in an accident. That vehicle may have a number of persons on board. That vehicle may sustain variable amounts of damage. That vehicle may be the offending vehicle or not. These attributes are not attributes of the vehicle dimension, but of the accident and hence are dimensions.
Using a bridge table will not solve the above issues. You could add those attributes to the bridge table as dimensions, but essentially it’s just a fact table and calling it a bridge table is a matter of semantics.
I don’t think that solution you pointed out is correct for this instance - although it may well be.
Using a bridge table doesn’t account for the following:
A person may be involved in an accident. That person may have different roles: Driver or Passenger. That person may also have a valid driver’s licence at that time, or not. These are not attributes of the person dimension. There may well be other attributes that are captured about the person at the time of the accident that do not make sense to be a part of the person dimension.
A vehicle may be involved in an accident. That vehicle may have a number of persons on board. That vehicle may sustain variable amounts of damage. That vehicle may be the offending vehicle or not. These attributes are not attributes of the vehicle dimension, but of the accident and hence are dimensions.
Using a bridge table will not solve the above issues. You could add those attributes to the bridge table as dimensions, but essentially it’s just a fact table and calling it a bridge table is a matter of semantics.
Re: Regarding Factless Fact Tables
John,
I see your point, your approach could work, but it does not seem to be a dimensional approach. A dimensional modeling is about finding the facts and providing the dimensional context for the indentified facts. Obviously in this case, the fact is an accident event and its context is involved parties, persons and vehicles. The multivalued nature of dimensions does not turn them into facts. In dimensional modeling, the common mistake is to treat dimensions as facts and relate them to a real fact. The connection between facts is through dimension conformance not the referential connection.
Now if a person has different roles in different accident at different point of time, then the role should be a slowly changing attribute of the person, hence you can track down the changing roles for each person. Similar to vehicle, if it is offending in one accident and not in another, then the flag is a slowly changing attribute of vehicle. I call them slowly changing because you don’t normally have many accidents for the same person and vehicle in a short period of time.
Of course the dimension attribute change is triggered by the fact, a accident event, like many other SCD attributes that are also triggered by some kind of transaction. But it does not mean the changing attributes are facts per se. In order to associate the changing attributes to the fact, if it’s a single valued dimension, you have an FK in the fact table, and most likely a date key would tell you which version of the dimension the fact is connected to.
However in case of multivalued dimensions, you need to resolve m-m relationship among dimensions, and the relationship should be carried into the fact table by a single FK, in this case it’s an accident group key.
I see your point, your approach could work, but it does not seem to be a dimensional approach. A dimensional modeling is about finding the facts and providing the dimensional context for the indentified facts. Obviously in this case, the fact is an accident event and its context is involved parties, persons and vehicles. The multivalued nature of dimensions does not turn them into facts. In dimensional modeling, the common mistake is to treat dimensions as facts and relate them to a real fact. The connection between facts is through dimension conformance not the referential connection.
Now if a person has different roles in different accident at different point of time, then the role should be a slowly changing attribute of the person, hence you can track down the changing roles for each person. Similar to vehicle, if it is offending in one accident and not in another, then the flag is a slowly changing attribute of vehicle. I call them slowly changing because you don’t normally have many accidents for the same person and vehicle in a short period of time.
Of course the dimension attribute change is triggered by the fact, a accident event, like many other SCD attributes that are also triggered by some kind of transaction. But it does not mean the changing attributes are facts per se. In order to associate the changing attributes to the fact, if it’s a single valued dimension, you have an FK in the fact table, and most likely a date key would tell you which version of the dimension the fact is connected to.
However in case of multivalued dimensions, you need to resolve m-m relationship among dimensions, and the relationship should be carried into the fact table by a single FK, in this case it’s an accident group key.
Last edited by hang on Wed Jan 12, 2011 9:40 pm; edited 2 times in total
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Regarding Factless Fact Tables
Interesting. I see what you're saying. I'm going to have to cogitate on this for a while.
Thanks.
Thanks.
Re: Regarding Factless Fact Tables
Hang
I am quite interested in this discussion as I had encountered a similar problem recently. I believe your suggestion (accident group + bridge table) looks the best to me too. I have couple of questions though.
1. When you have a accident group dimension and link this to the fact, what would be the attributes of this accident group table? Or Is it only to serve the fact table with one row to join with. What I want to understand is can we simply have the bridge table link to the fact? I know this will not be a pure dimension to fact link but I cannot think of any analysis that might just include the accident group table without using the bridge table attributes.
2. Also when it comes to the role of a person or vehicle, I am not able to understand how that would be classified as attributes of the dimension itself ( SCD as you say). How do we model/capture the scenario where the same person is a victim and an offender in two different accidents at the same time? The role of vehicles and persons do look like the attributes of the accident to me but I am not sure what would be a proper model for that one too.
I am quite interested in this discussion as I had encountered a similar problem recently. I believe your suggestion (accident group + bridge table) looks the best to me too. I have couple of questions though.
1. When you have a accident group dimension and link this to the fact, what would be the attributes of this accident group table? Or Is it only to serve the fact table with one row to join with. What I want to understand is can we simply have the bridge table link to the fact? I know this will not be a pure dimension to fact link but I cannot think of any analysis that might just include the accident group table without using the bridge table attributes.
2. Also when it comes to the role of a person or vehicle, I am not able to understand how that would be classified as attributes of the dimension itself ( SCD as you say). How do we model/capture the scenario where the same person is a victim and an offender in two different accidents at the same time? The role of vehicles and persons do look like the attributes of the accident to me but I am not sure what would be a proper model for that one too.
bagavath- Posts : 3
Join date : 2011-01-13
Re: Regarding Factless Fact Tables
The use of bridges is something I have struggled with too - probably too much relational heritage!
As was asserted earlier in the thread - being clear about the grain of the fact seems critical - as determined by what needs to be reported for the business. However it seems that there will inevitably be a bridge needed to resolve the 1:n situation.
My specific thought is whether the 'role' can be an attribute of the bridge? Similar to having a weighting factor for attributing the fact measures, could a 'role' attribute in the bridge be used to attribute the role played in that fact?
As was asserted earlier in the thread - being clear about the grain of the fact seems critical - as determined by what needs to be reported for the business. However it seems that there will inevitably be a bridge needed to resolve the 1:n situation.
My specific thought is whether the 'role' can be an attribute of the bridge? Similar to having a weighting factor for attributing the fact measures, could a 'role' attribute in the bridge be used to attribute the role played in that fact?
D_Pons- Posts : 16
Join date : 2009-02-10
Location : UK
Re: Regarding Factless Fact Tables
then the role should be a slowly changing attribute of the person
Disagree. Role is tied to the relationship between the person and the incident, not of the person themselves. Ponzie's suggestion of maintaing the role (as a code or flag) in the bridge table between the incident and persons is appropriate.
Re: Regarding Factless Fact Tables
"I know this will not be a pure dimension to fact link but I cannot think of any analysis that might just include the accident group table without using the bridge table attributes."
Then why use a bridge table? Why not put whatever you are bridging to directly into the fact table?
Then why use a bridge table? Why not put whatever you are bridging to directly into the fact table?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Regarding Factless Fact Tables
Jeff,
An accident may contain many vehicles and many person. How can you put all of that in the one fact table?
The other consideration that was brought up, is that a person may have a licence, but it may not be valid for the vehicle they are driving i.e. a car licence but no motor cycle licence and they were driving a motor bike at the time of the accident. That and the role argument (i.e. driver, passenger) that was brought up are a relationship between the accident and the person, and not attributes of the person themselves.
So again it comes back to putting these attributes in a bridge table, or just have another fact table since they are essentially the same thing.
An accident may contain many vehicles and many person. How can you put all of that in the one fact table?
The other consideration that was brought up, is that a person may have a licence, but it may not be valid for the vehicle they are driving i.e. a car licence but no motor cycle licence and they were driving a motor bike at the time of the accident. That and the role argument (i.e. driver, passenger) that was brought up are a relationship between the accident and the person, and not attributes of the person themselves.
So again it comes back to putting these attributes in a bridge table, or just have another fact table since they are essentially the same thing.
Re: Regarding Factless Fact Tables
As I said earlier, none of the dimension key can go directly into fact table. All the dimensions are multivalued dimensions, meaning one accident fact record associates with many dimension keys.Jeff Smith wrote:Then why use a bridge table? Why not put whatever you are bridging to directly into the fact table?
Think of accident group as an aggregate dimension of person and vehicle. So you could have attributes like the number of involved persons and vehicles which I think could be useful for drilling down purpose.bagavath wrote:1. what would be the attributes of this accident group table?
What I want to understand is can we simply have the bridge table link to the fact?
You can’t link the bridge table to the accident fact table directly as one factual row is associated with multiple rows in the bridge. To the fact, both person and accident are multivalued dimensions. However the bridge is a junction table between two dimensions and it is a typical structure to resolve m-m issues in both relational and dimensional world. The introduction of accident group is to dimensionalize the m-m relationship.
Well, having thought about comments by ponzie and ngalemmo, it makes more sense that the role attributes go into bridge. I take back the idea of treating roles as SCD attributes for dimensions. I would imagin the table structure should be something as follows:bagavath wrote:2. Also when it comes to the role of a person or vehicle, I am not able to understand how that would be classified as attributes of the dimension itself ( SCD as you say)
AccidentFact
AccidentGroupKey (FK)
DateKey (FK)
AccidentCount (default to 1)
AccidentGroup
AccidentGroupKey (PK)
NoOfPersons
NoOfVehicles
PersonVehicleBridge
AccidentGroupKey (FK)
PersonKey (FK
VehicleKey (FK)
PersonRole
VehicleRole
... (other attributes)
PersonDim
PersonKey (PK)
PersonNK
....
VehicleDim
VehicleKey (PK)
VehicleNK
...
Last edited by hang on Fri Jan 14, 2011 5:04 pm; edited 1 time in total
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Regarding Factless Fact Tables
Thanks everyone, I think this model outlined by Hang should be able to capture all information ( at least all I could think of).
John,
Can you explain a little further on your idea of having multiple facts here? I have not come across such a design before or may be I have but didn't realize it as one !
John Simon wrote:Jeff,
An accident may contain many vehicles and many person. How can you put all of that in the one fact table?
The other consideration that was brought up, is that a person may have a licence, but it may not be valid for the vehicle they are driving i.e. a car licence but no motor cycle licence and they were driving a motor bike at the time of the accident. That and the role argument (i.e. driver, passenger) that was brought up are a relationship between the accident and the person, and not attributes of the person themselves.
So again it comes back to putting these attributes in a bridge table, or just have another fact table since they are essentially the same thing.
John,
Can you explain a little further on your idea of having multiple facts here? I have not come across such a design before or may be I have but didn't realize it as one !
bagavath- Posts : 3
Join date : 2011-01-13
Re: Regarding Factless Fact Tables
I don't know...
It would seem to me the accident fact table would have an accident ID as a degenerate dimension and contain one row per party involved. Party may be a multivalued dimension representing a collection of person wither in a vehicle or not. Then a bunch of other dimensions for responsibility, location, status, etc...
It is possible you may have another fact at the individual person level for other types of analysis.
It simpler to deal with and isn't any different than the way most other things are modeled. I mean, when you do a fact table for medical claims, its usually at the procedure level, or a fact table for orders, its at the line level. There usually isn't a reason to model something where a single fact row must represent the entire event.
It would seem to me the accident fact table would have an accident ID as a degenerate dimension and contain one row per party involved. Party may be a multivalued dimension representing a collection of person wither in a vehicle or not. Then a bunch of other dimensions for responsibility, location, status, etc...
It is possible you may have another fact at the individual person level for other types of analysis.
It simpler to deal with and isn't any different than the way most other things are modeled. I mean, when you do a fact table for medical claims, its usually at the procedure level, or a fact table for orders, its at the line level. There usually isn't a reason to model something where a single fact row must represent the entire event.
Re: Regarding Factless Fact Tables
I've written a blog about this to make it easier to post diagrams at http://jsimonbi.wordpress.com
Re: Regarding Factless Fact Tables
John Simon wrote:I've written a blog about this to make it easier to post diagrams at http://jsimonbi.wordpress.com
Thanks for the link John. Excellent post.
bagavath- Posts : 3
Join date : 2011-01-13
Re: Regarding Factless Fact Tables
For anyone interested, I've been working on this and created another post on my blog here: https://jsimonbi.wordpress.com/2011/03/17/data-modeling-conundrum-follow-up/
Essentially you cannot use bridge tables to solve this particular many-to-many situation. If you use a bridge table, you need to include the attributes such as personrole, vehicle role etc as part of the Bridge key. Since it is highly unlikely that the same people will be involved in an accident with the exact same roles, and the same goes for vehicles, you gain nothing by using a bridge table.
So it comes down to including all rows in the one fact table, or have multiple fact tables for Accident, AccidentPerson, AccidentVehicle and AccidentVehiclePerson.
Essentially you cannot use bridge tables to solve this particular many-to-many situation. If you use a bridge table, you need to include the attributes such as personrole, vehicle role etc as part of the Bridge key. Since it is highly unlikely that the same people will be involved in an accident with the exact same roles, and the same goes for vehicles, you gain nothing by using a bridge table.
So it comes down to including all rows in the one fact table, or have multiple fact tables for Accident, AccidentPerson, AccidentVehicle and AccidentVehiclePerson.
Similar topics
» Joining Factless Fact tables
» Factless fact tables and SCD2
» Fact Tables Joining to Factless Facts
» Looking for advise on loading keys into factless fact tables
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» Factless fact tables and SCD2
» Fact Tables Joining to Factless Facts
» Looking for advise on loading keys into factless fact tables
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum