Status attributes on main dimension or as separate dimension
4 posters
Page 1 of 1
Status attributes on main dimension or as separate dimension
I have a dimension "dim_vehicle" which can go through various statuses ("In Transit", "Parked", "In Use", "In Repair", etc.).
Based on these statuses, I can determine many other attributes about the vehicle. Just as one example, if the vehicle is "In Transit", "In Repair", or "In Use" I can infer a whole new attribute "Is Available" with a value of "No" where as if the vehicle was "Parked", I can determine that "Is Available" is "Yes".
I would typically store a status on the "dim_vehicle" dimension itself as an SCD Type 2 attribute, but I've been questioning if this is the best idea or if the status (and its additional inferred attributes like "Is Available") should be stored in their own dim_vehicle_status dimension.
This would work, but I loose the ability to query historical vehicle-related facts based on the current status of a vehicle.
Ex: fact_vehicle_mileage_checkin
* id
* checkin_date_id -> (dim_date)
* vehicle_id -> (dim_vehicle)
* vehicle_status_id -> (dim_vehicle_status)
* mileage
In this case, I can't look at vehicles with a current status of "In Repair" and see what their mileage was 1 year ago. Is it best to hold statuses on the dimension itself like I have been doing?
Based on these statuses, I can determine many other attributes about the vehicle. Just as one example, if the vehicle is "In Transit", "In Repair", or "In Use" I can infer a whole new attribute "Is Available" with a value of "No" where as if the vehicle was "Parked", I can determine that "Is Available" is "Yes".
I would typically store a status on the "dim_vehicle" dimension itself as an SCD Type 2 attribute, but I've been questioning if this is the best idea or if the status (and its additional inferred attributes like "Is Available") should be stored in their own dim_vehicle_status dimension.
This would work, but I loose the ability to query historical vehicle-related facts based on the current status of a vehicle.
Ex: fact_vehicle_mileage_checkin
* id
* checkin_date_id -> (dim_date)
* vehicle_id -> (dim_vehicle)
* vehicle_status_id -> (dim_vehicle_status)
* mileage
In this case, I can't look at vehicles with a current status of "In Repair" and see what their mileage was 1 year ago. Is it best to hold statuses on the dimension itself like I have been doing?
ryno1234- Posts : 33
Join date : 2015-01-07
Re:Status attributes on main dimension or as separate dimension
Hi,
it depends upon what you want to do with the data, if the user want to some analysis on the vehicle status then it is better to have it separately as a vehicle status.
thanks
it depends upon what you want to do with the data, if the user want to some analysis on the vehicle status then it is better to have it separately as a vehicle status.
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Re: Status attributes on main dimension or as separate dimension
Thank you for your reply, unfortunately it doesn't really address the concerns that I had called out, particularly the concern regarding historical information.
In ALL of my vehicle-related queries, status will need to be constrained, i.e. Show me all active vehicles as of Dec 31st 2014.
In a sense, I'm trying to play Devil's advocate and understand how putting the status in its own dimension as referencing it from within my facts would work, vs. putting it in the dim_vehicle dimension as an SCD Type 2 attribute.
In ALL of my vehicle-related queries, status will need to be constrained, i.e. Show me all active vehicles as of Dec 31st 2014.
In a sense, I'm trying to play Devil's advocate and understand how putting the status in its own dimension as referencing it from within my facts would work, vs. putting it in the dim_vehicle dimension as an SCD Type 2 attribute.
ryno1234- Posts : 33
Join date : 2015-01-07
Re: Status attributes on main dimension or as separate dimension
Hi,
if the users are interested in getting information about vehicle if they are in In Transit or in repair and from which date to which date and other type of information then it is better to put it in a fact.
If they are not interested in doing much analysis on the status of the vehicle then you can have it as SCD2 dimension
thanks
if the users are interested in getting information about vehicle if they are in In Transit or in repair and from which date to which date and other type of information then it is better to put it in a fact.
If they are not interested in doing much analysis on the status of the vehicle then you can have it as SCD2 dimension
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Re: Status attributes on main dimension or as separate dimension
I think there are a lot of things going on here. There is the life of the vehicle and the various states it goes through, and there is the current status of the vehicle (static or over time). One is a fact, the other a dimensional attribute.
As a fact you are recording what happens to the vehicle over time. As a dimension it is either the current state or the state at the time some other action took place.
And, if you use the dimension in the fact that is recording the action (implemented as a Type 2), you could set the type 2 key value to reference the dimension prior to the action. This would let you know the prior state of the vehicle.
Another thing to consider, if you need to record the state at the time of the transaction, but most queries only care about the current status, you could avoid the self-join by storing both versions of the status in a type 2 table. The dimension load process needs to do a little extra work, but it speeds up queries. There are a lot of other techniques, such as storing a durable alternate key in both the fact and dimension. This allows you to locate the current row using a single join and a filter.
As a fact you are recording what happens to the vehicle over time. As a dimension it is either the current state or the state at the time some other action took place.
And, if you use the dimension in the fact that is recording the action (implemented as a Type 2), you could set the type 2 key value to reference the dimension prior to the action. This would let you know the prior state of the vehicle.
Another thing to consider, if you need to record the state at the time of the transaction, but most queries only care about the current status, you could avoid the self-join by storing both versions of the status in a type 2 table. The dimension load process needs to do a little extra work, but it speeds up queries. There are a lot of other techniques, such as storing a durable alternate key in both the fact and dimension. This allows you to locate the current row using a single join and a filter.
Re: Status attributes on main dimension or as separate dimension
The main benefits I see are:
1. The status dimension is now much smaller
2. I can use a conformed status dimension in other fact tables without bringing in the large vehicle dimension
3. I can filter on status types without having to do a select distinct on a big dimension
4. I can create nice hierarchies on the status types
1. The status dimension is now much smaller
2. I can use a conformed status dimension in other fact tables without bringing in the large vehicle dimension
3. I can filter on status types without having to do a select distinct on a big dimension
4. I can create nice hierarchies on the status types
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Status attributes on main dimension or as separate dimension
Thank you all for your feedback, very valuable.
I was reading "Star Schema - The Complete Reference", thumbing through the section on Dimension Browseability, and came upon a clue to help me understand better how to make this design choice. If status is its own dimension, I do not have a way to see what status a vehicle is in without the occurrence of an event (fact table). But in reality, there isn't an event required for a vehicle to have an initial status, just the vehicle's shear existence allows it to have a status. Given this, my understanding is that the status would lie on the vehicle dimension.
All that being said, there is no reason I cannot ALSO have a fact_vehicle_status_change table which effective / expiration dates and a foreign key to the dim_vehicle table to analyze the change in status over time.
If anyone feels like I'm completely missing the bus here, please let me know.
I was reading "Star Schema - The Complete Reference", thumbing through the section on Dimension Browseability, and came upon a clue to help me understand better how to make this design choice. If status is its own dimension, I do not have a way to see what status a vehicle is in without the occurrence of an event (fact table). But in reality, there isn't an event required for a vehicle to have an initial status, just the vehicle's shear existence allows it to have a status. Given this, my understanding is that the status would lie on the vehicle dimension.
All that being said, there is no reason I cannot ALSO have a fact_vehicle_status_change table which effective / expiration dates and a foreign key to the dim_vehicle table to analyze the change in status over time.
If anyone feels like I'm completely missing the bus here, please let me know.
ryno1234- Posts : 33
Join date : 2015-01-07
Similar topics
» Account attributes in separate dimensions
» Model with Attributes Based on Two Separate Dimensions
» Main dimension and profile dimension
» dimension table design question for around 100 attributes and higher level calculated attributes
» Question on breaking out Degenerate Dimension to separate dimension
» Model with Attributes Based on Two Separate Dimensions
» Main dimension and profile dimension
» dimension table design question for around 100 attributes and higher level calculated attributes
» Question on breaking out Degenerate Dimension to separate dimension
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum