Regarding Factless Fact Tables

PaulM Sun Jan 02, 2011 5:30 am

Some time ago I was asked to build the BI platform for a road safety organization. The warehouse was built of three primary fact tables, each with a collection of associated dimensions. I am now trying to branch out into ETL and Warehouse design, and from what I've been reading I believe there could have been a few improvements to the design. Specifically I've been reading that fact tables should never be joined directly - instead bridge tables should be used.

The three tables are:

Accidents
---------
AccidentKey (PK)

Involved
---------
AccidentKey (FK)
InvolvedKey (PK)

Vehicles
--------
AccidentKey (FK)
InvolvedKey (FK)
VehicleKey (PK)

Each accident can have 1 or more Involved. Each Involved has 1 Accident. Each Involved has 0 or 1 Vehicles and each Vehicle can have 1 or more Involved.

The actual measures are either distinct counts (sum of 1 for involved) of each table, or found in the dimensions (population of accident city, road capacity, and so on). The reports based on either the cube or the relational are well within acceptable run-times.

Now for my questions.

Do bridge tables have any use here?
My gut instinct is to place a dummy row in the Vehicles table for each time an involved does not have a vehicle: VehicleType_cd = -1, No Vehicle. This would change it to an inner join, but how would I count vehicles if that was implemented? A separate field for VehicleKey that would be null for No Vehicle?
Is there anything else of which I should be aware?

ngalemmo Mon Jan 03, 2011 12:15 am

You do not use bridge tables to join between fact tables. Bridge tables are used between a fact and a dimension... either to handle hierarchies or a multivalued dimension.

Facts are combined by summarizing individual fact tables on common dimensions and then joining/merging on those dimensions.

BoxesAndLines Mon Jan 03, 2011 9:46 am

In addition to what Nick said, it seems like you could have done this with just the last fact table.

John Simon Mon Jan 03, 2011 7:40 pm

I've done something similar with marine incidents. An incident can have zero or more persons, and/or can have zero or more vessels involved. For certain queries, I just joined straight from the incident table to the incident vessel table on the incident key, exactly as you have described in your post with the accidents.
While it's not "best practice", I think it makes the most sense for certain queries since you're joining on the "degenerate" dimension value.

Regarding your questions:

1) No
2) Just keep it to a left join between accident and vehicles - I don't see any benefit to having a dummy value for no vehicle if none was involved.
3) I think you're on the right track - don't make it more complicated than it needs to be.

hang Tue Jan 04, 2011 8:35 am

It seems to me that your three tables are structured in a typical relational modelling style. So the primary task would be identifying facts and dimensions based on existing relationship.

Obviously accident events should belong to faceless fact table and the grain is one accident per row. The involved party and vehicle are dimensions but neither can be directly in the accident fact table as they may take multiple values in a single accident. I guess you need a third dimension that can be an FK in the accident fact, grouping involved parties and vehicles together by a single surrogate key. Let's call this third dimension accident group.

Now I think you do need a bridge table that contains FKs from accident group, involved party and vehicle dimension. In the bridge table, the accident group FK stays the same for each accident that may involve multiple parties and vehicles.

The schema involves quite a bit of snowflaking, however that's the dimensional approach I can see to connect multivalued dimensions to the fact.

sarah_id1 Thu Jan 06, 2011 7:30 am

I think this could have better designed with 1 fact table having foreign key for each dimension i.e. Vehicle, Involved and Accident.

hang Thu Jan 06, 2011 7:56 am

Is accident a fact? If not and suppose it's a dimension, than what is your fact.

John Simon Sat Jan 08, 2011 2:26 am

An accident is a fact, because it is an event that occurs at a point in time and has dimension attributes such as location, accident type, response etc

An accident can involve zero or many persons and potentially zero or many vehicles.
There's no way to model this without having three separate fact tables.

Alessio Sat Jan 08, 2011 12:59 pm

For me the core of the problem is to define the granularity right and then implement a single fact table.
Use a single fact table is a big improvement for usability.
Given the specification seems that the granularity is the right of the table "involved".
In fact, every entity involved is a single incident and also 0 or 1 vehicle.
If the vehicles could be more than one for each entity involved granulat the right should be "vehicles involved"

I hope to be helpful

Hello

Alessio

John Simon Sun Jan 09, 2011 2:11 am

I don't see how you could possibly have one fact table.
If a vehicle or a person has a role in an accident, then how do you record this in one fact table without having many records in the one fact table without having many records for the one accident? How is that necessarily better?

hang Sun Jan 09, 2011 2:30 am

The solution presented in my previous post is based on Kimball's approach resolving the problem connecting multivalued dimensions to the fact table. There is only one accident fact and all others are dimension/bridge tables. Please refer to 'The Complete Guide to Dimensional Modeling - 2nd Edition', Chapter 13, p264.

John Simon Mon Jan 10, 2011 6:42 pm

Hang,
I don’t think that solution you pointed out is correct for this instance - although it may well be.
Using a bridge table doesn’t account for the following:

A person may be involved in an accident. That person may have different roles: Driver or Passenger. That person may also have a valid driver’s licence at that time, or not. These are not attributes of the person dimension. There may well be other attributes that are captured about the person at the time of the accident that do not make sense to be a part of the person dimension.

A vehicle may be involved in an accident. That vehicle may have a number of persons on board. That vehicle may sustain variable amounts of damage. That vehicle may be the offending vehicle or not. These attributes are not attributes of the vehicle dimension, but of the accident and hence are dimensions.

Using a bridge table will not solve the above issues. You could add those attributes to the bridge table as dimensions, but essentially it’s just a fact table and calling it a bridge table is a matter of semantics.

hang Wed Jan 12, 2011 8:16 pm

John,

I see your point, your approach could work, but it does not seem to be a dimensional approach. A dimensional modeling is about finding the facts and providing the dimensional context for the indentified facts. Obviously in this case, the fact is an accident event and its context is involved parties, persons and vehicles. The multivalued nature of dimensions does not turn them into facts. In dimensional modeling, the common mistake is to treat dimensions as facts and relate them to a real fact. The connection between facts is through dimension conformance not the referential connection.

Now if a person has different roles in different accident at different point of time, then the role should be a slowly changing attribute of the person, hence you can track down the changing roles for each person. Similar to vehicle, if it is offending in one accident and not in another, then the flag is a slowly changing attribute of vehicle. I call them slowly changing because you don’t normally have many accidents for the same person and vehicle in a short period of time.

Of course the dimension attribute change is triggered by the fact, a accident event, like many other SCD attributes that are also triggered by some kind of transaction. But it does not mean the changing attributes are facts per se. In order to associate the changing attributes to the fact, if it’s a single valued dimension, you have an FK in the fact table, and most likely a date key would tell you which version of the dimension the fact is connected to.

However in case of multivalued dimensions, you need to resolve m-m relationship among dimensions, and the relationship should be carried into the fact table by a single FK, in this case it’s an accident group key.

John Simon Wed Jan 12, 2011 9:21 pm

Interesting. I see what you're saying. I'm going to have to cogitate on this for a while.
Thanks.

bagavath Thu Jan 13, 2011 5:57 am

Hang

I am quite interested in this discussion as I had encountered a similar problem recently. I believe your suggestion (accident group + bridge table) looks the best to me too. I have couple of questions though.

1. When you have a accident group dimension and link this to the fact, what would be the attributes of this accident group table? Or Is it only to serve the fact table with one row to join with. What I want to understand is can we simply have the bridge table link to the fact? I know this will not be a pure dimension to fact link but I cannot think of any analysis that might just include the accident group table without using the bridge table attributes.

2. Also when it comes to the role of a person or vehicle, I am not able to understand how that would be classified as attributes of the dimension itself ( SCD as you say). How do we model/capture the scenario where the same person is a victim and an offender in two different accidents at the same time? The role of vehicles and persons do look like the attributes of the accident to me but I am not sure what would be a proper model for that one too.

D_Pons Thu Jan 13, 2011 6:47 am

The use of bridges is something I have struggled with too - probably too much relational heritage!

As was asserted earlier in the thread - being clear about the grain of the fact seems critical - as determined by what needs to be reported for the business. However it seems that there will inevitably be a bridge needed to resolve the 1:n situation.

My specific thought is whether the 'role' can be an attribute of the bridge? Similar to having a weighting factor for attributing the fact measures, could a 'role' attribute in the bridge be used to attribute the role played in that fact?

ngalemmo Thu Jan 13, 2011 11:59 am

then the role should be a slowly changing attribute of the person

Disagree. Role is tied to the relationship between the person and the incident, not of the person themselves. Ponzie's suggestion of maintaing the role (as a code or flag) in the bridge table between the incident and persons is appropriate.

Jeff Smith Thu Jan 13, 2011 1:58 pm

"I know this will not be a pure dimension to fact link but I cannot think of any analysis that might just include the accident group table without using the bridge table attributes."

Then why use a bridge table? Why not put whatever you are bridging to directly into the fact table?

John Simon Thu Jan 13, 2011 7:21 pm

Jeff,
An accident may contain many vehicles and many person. How can you put all of that in the one fact table?

The other consideration that was brought up, is that a person may have a licence, but it may not be valid for the vehicle they are driving i.e. a car licence but no motor cycle licence and they were driving a motor bike at the time of the accident. That and the role argument (i.e. driver, passenger) that was brought up are a relationship between the accident and the person, and not attributes of the person themselves.

So again it comes back to putting these attributes in a bridge table, or just have another fact table since they are essentially the same thing.

hang Thu Jan 13, 2011 7:47 pm

Jeff Smith wrote:Then why use a bridge table? Why not put whatever you are bridging to directly into the fact table?

As I said earlier, none of the dimension key can go directly into fact table. All the dimensions are multivalued dimensions, meaning one accident fact record associates with many dimension keys.

bagavath wrote:1. what would be the attributes of this accident group table?
What I want to understand is can we simply have the bridge table link to the fact?

Think of accident group as an aggregate dimension of person and vehicle. So you could have attributes like the number of involved persons and vehicles which I think could be useful for drilling down purpose.

You can’t link the bridge table to the accident fact table directly as one factual row is associated with multiple rows in the bridge. To the fact, both person and accident are multivalued dimensions. However the bridge is a junction table between two dimensions and it is a typical structure to resolve m-m issues in both relational and dimensional world. The introduction of accident group is to dimensionalize the m-m relationship.

bagavath wrote:2. Also when it comes to the role of a person or vehicle, I am not able to understand how that would be classified as attributes of the dimension itself ( SCD as you say)

Well, having thought about comments by ponzie and ngalemmo, it makes more sense that the role attributes go into bridge. I take back the idea of treating roles as SCD attributes for dimensions. I would imagin the table structure should be something as follows:

AccidentFact
AccidentGroupKey (FK)
DateKey (FK)
AccidentCount (default to 1)

AccidentGroup
AccidentGroupKey (PK)
NoOfPersons
NoOfVehicles

PersonVehicleBridge
AccidentGroupKey (FK)
PersonKey (FK
VehicleKey (FK)
PersonRole
VehicleRole
... (other attributes)

PersonDim
PersonKey (PK)
PersonNK
....

VehicleDim
VehicleKey (PK)
VehicleNK
...

bagavath Fri Jan 14, 2011 12:36 am

Thanks everyone, I think this model outlined by Hang should be able to capture all information ( at least all I could think of).

John Simon wrote:Jeff,
An accident may contain many vehicles and many person. How can you put all of that in the one fact table?

The other consideration that was brought up, is that a person may have a licence, but it may not be valid for the vehicle they are driving i.e. a car licence but no motor cycle licence and they were driving a motor bike at the time of the accident. That and the role argument (i.e. driver, passenger) that was brought up are a relationship between the accident and the person, and not attributes of the person themselves.

So again it comes back to putting these attributes in a bridge table, or just have another fact table since they are essentially the same thing.

John,
Can you explain a little further on your idea of having multiple facts here? I have not come across such a design before or may be I have but didn't realize it as one !

ngalemmo Sun Jan 16, 2011 2:45 am

I don't know...

It would seem to me the accident fact table would have an accident ID as a degenerate dimension and contain one row per party involved. Party may be a multivalued dimension representing a collection of person wither in a vehicle or not. Then a bunch of other dimensions for responsibility, location, status, etc...

It is possible you may have another fact at the individual person level for other types of analysis.

It simpler to deal with and isn't any different than the way most other things are modeled. I mean, when you do a fact table for medical claims, its usually at the procedure level, or a fact table for orders, its at the line level. There usually isn't a reason to model something where a single fact row must represent the entire event.

John Simon Sun Jan 16, 2011 7:14 am

I've written a blog about this to make it easier to post diagrams at http://jsimonbi.wordpress.com

bagavath Mon Jan 17, 2011 12:49 am

John Simon wrote:I've written a blog about this to make it easier to post diagrams at http://jsimonbi.wordpress.com

Thanks for the link John. Excellent post.

John Simon Wed Mar 16, 2011 10:26 pm

For anyone interested, I've been working on this and created another post on my blog here: https://jsimonbi.wordpress.com/2011/03/17/data-modeling-conundrum-follow-up/

Essentially you cannot use bridge tables to solve this particular many-to-many situation. If you use a bridge table, you need to include the attributes such as personrole, vehicle role etc as part of the Bridge key. Since it is highly unlikely that the same people will be involved in an accident with the exact same roles, and the same goes for vehicles, you gain nothing by using a bridge table.

So it comes down to including all rows in the one fact table, or have multiple fact tables for Accident, AccidentPerson, AccidentVehicle and AccidentVehiclePerson.

Sponsored content

Regarding Factless Fact Tables

Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables

Re: Regarding Factless Fact Tables