Model Design best practice - add columns or pivot data for multiple rows ?
2 posters
Page 1 of 1
Model Design best practice - add columns or pivot data for multiple rows ?
Hi,
We have an instance where we currently have a Fact table that contains the majority of the data in the DW and is the main fact table.
There is a requirement to add further data where for each current fact record there may be none or multiple values e.g. the fact table contains the rental records related to car hire. Associated with each rental record are certain fees such as delivery fee, fuel fee etc.
The requirement is to add the fee data which currently doesn't exist. We could add each different fee as an existing column which does not affect the grain of the data and so could exist within the current fact rental table. However, by this method if further fees were required or present fees removed this would require changes to the ETL and metadata of the design which makes further future changes more complex from a development and deployment point of view IMO.
The other alternative is to pivot the data so that we only have two columns related to fees, one for the fee type and the other the fee amount. However, each rental record would now have several rows associated to it as it is possible to have multiple fees for each record. This has the advantage in that any additional/removal of fee types requires no changes in metadata (still only 2 columns in the fact table) but the disadvantage is that this changes the grain of the rental to lower than that which currently exists in the current fact rental table and so would require a new fact table.
Apart from the above does anyone recognise any further advantages/disadvantages of each design e.g. reporting, analytics etc. ?
My thought is to have the latter option but I am concerned about whether have lots of fact tables would not be a good design for the DW (there are other data requirements which follow a similar pattern and so would require separate fact tables if the latter option was chosen). Is this a valid concern ?
We have an instance where we currently have a Fact table that contains the majority of the data in the DW and is the main fact table.
There is a requirement to add further data where for each current fact record there may be none or multiple values e.g. the fact table contains the rental records related to car hire. Associated with each rental record are certain fees such as delivery fee, fuel fee etc.
The requirement is to add the fee data which currently doesn't exist. We could add each different fee as an existing column which does not affect the grain of the data and so could exist within the current fact rental table. However, by this method if further fees were required or present fees removed this would require changes to the ETL and metadata of the design which makes further future changes more complex from a development and deployment point of view IMO.
The other alternative is to pivot the data so that we only have two columns related to fees, one for the fee type and the other the fee amount. However, each rental record would now have several rows associated to it as it is possible to have multiple fees for each record. This has the advantage in that any additional/removal of fee types requires no changes in metadata (still only 2 columns in the fact table) but the disadvantage is that this changes the grain of the rental to lower than that which currently exists in the current fact rental table and so would require a new fact table.
Apart from the above does anyone recognise any further advantages/disadvantages of each design e.g. reporting, analytics etc. ?
My thought is to have the latter option but I am concerned about whether have lots of fact tables would not be a good design for the DW (there are other data requirements which follow a similar pattern and so would require separate fact tables if the latter option was chosen). Is this a valid concern ?
Guest- Guest
Re: Model Design best practice - add columns or pivot data for multiple rows ?
a Fact table that contains the majority of the data in the DW
Is your DW a single subject area? Or do I see a red flag?
Anyway, often the correct solution is to do both (sort of). A flat representation, not too wide , at a general classification level (i.e. rental, maybe 2-3 fee categories(insurance, site fees, add ons), tax) so it is easy to get totals and trends, as well as a thin, tall, very detailed component level for accounting and revenue analysis.
Re: Model Design best practice - add columns or pivot data for multiple rows ?
You could also stick the fees in a bridge table which is kind of a pivot or lower grain than your fact to be more precise.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Model Design best practice - add columns or pivot data for multiple rows ?
The DW currently consists of three main subject areas.
If my understanding of the bridging solution is correct, would this mean a GroupFeeKey in the FactRental table linked to a bridging table which contains the GroupFeeKey, the FeeKey and FeeAmount, which is then linked to the FeeDimension containing a FeeKey and FeeTypeDescription.
If this is correct what would be the advantages/disadvantages over this type of solution compared to have a fact table of lower grain containing the fees ?
If my understanding of the bridging solution is correct, would this mean a GroupFeeKey in the FactRental table linked to a bridging table which contains the GroupFeeKey, the FeeKey and FeeAmount, which is then linked to the FeeDimension containing a FeeKey and FeeTypeDescription.
If this is correct what would be the advantages/disadvantages over this type of solution compared to have a fact table of lower grain containing the fees ?
Guest- Guest
Re: Model Design best practice - add columns or pivot data for multiple rows ?
You don't have to create another fact table with all the dimensions just to track the fees. Otherwise, the solutions are pretty similar. You have metrics at two different grains. You can either create two fact tables, or build a bridge table to capture the lower grained metrics. I usually pick the bridge table if all metrics are at the same level except for one.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Model Design best practice - add columns or pivot data for multiple rows ?
The metrics are not at different grains. They are renting cars. You have a contract with a bunch of charges for different things, some are fees, some are taxes, and some are for the car and any extra stuff.
The original question was, do you go wide, with a column for each type of fee or charge on a single row, or do you go deep, one row per charge.
In the classic sales structure, you go deep. Each line is an item sold with appropriate dimensions to identify what it is, who gets the revenue, etc...
However, if there is a large segment of the user community that just want to see 'fees' and doesn't care much about what they are, it may also be useful to build an aggregate that has one line per contract with various items broken out to a small number of predefined categories.
I would not, however, go wide and try to capture every individual charge. It would drive you mad...
The original question was, do you go wide, with a column for each type of fee or charge on a single row, or do you go deep, one row per charge.
In the classic sales structure, you go deep. Each line is an item sold with appropriate dimensions to identify what it is, who gets the revenue, etc...
However, if there is a large segment of the user community that just want to see 'fees' and doesn't care much about what they are, it may also be useful to build an aggregate that has one line per contract with various items broken out to a small number of predefined categories.
I would not, however, go wide and try to capture every individual charge. It would drive you mad...
Similar topics
» Fact table design: Sales Transaction with multiple Discount rows
» Dimensional data modeling CASE tool and DW design practice
» Attribute data held as rows in source - how to dimensionally model???
» Overall design and data model and Informatica CDC
» Dimension Design with Multiple Data Sources (ORACLE and SQL Server)
» Dimensional data modeling CASE tool and DW design practice
» Attribute data held as rows in source - how to dimensionally model???
» Overall design and data model and Informatica CDC
» Dimension Design with Multiple Data Sources (ORACLE and SQL Server)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum