Periodic snapshot fact tables with sparse data
3 posters
Page 1 of 1
Periodic snapshot fact tables with sparse data
Description of the business and data model: my client runs a typical service oriented business; employees provide services to several clients, who pay according on the number of hours they use.
Description of the problem: One of the tables that I have designed is a monthly periodic snapshot fact table which records the number of hours provided and the associated cost (amount), with several dimensions attached like client, contract, office, type of service and a few others. Usually periodic snapshot fact tables are dense (there are actual transaction facts to load for any combination of the dimensions), but in this case this is not true: for example, a typical client only buys 2-3 services, while the services dimension contains more than 200 different kinds of services.
Question: should I load the periodic snapshot with a record for every combination of dimension values, even when there are not actual data? Of course this means loading a large number of records, 90% of which would contain all zeroes on the measures.
Considerations: since the snapshot is mainly used to build cross-tab reports for a given year with the 12 months on the columns, some combinations of dimensions must always be filled, otherwise the report might come out with some missing months on the column headers (this might depend on the tool used, we use Pentaho and we have observed this behavior). As a minimum, it is necessary to create at least one record per month per client, even if the client didn’t buy any services on some months.
I wanted to know if any of you have used a similar technique when dealing with periodic snapshot tables with sparse data and if there are some risks that I haven’t considered in doing so.
Description of the problem: One of the tables that I have designed is a monthly periodic snapshot fact table which records the number of hours provided and the associated cost (amount), with several dimensions attached like client, contract, office, type of service and a few others. Usually periodic snapshot fact tables are dense (there are actual transaction facts to load for any combination of the dimensions), but in this case this is not true: for example, a typical client only buys 2-3 services, while the services dimension contains more than 200 different kinds of services.
Question: should I load the periodic snapshot with a record for every combination of dimension values, even when there are not actual data? Of course this means loading a large number of records, 90% of which would contain all zeroes on the measures.
Considerations: since the snapshot is mainly used to build cross-tab reports for a given year with the 12 months on the columns, some combinations of dimensions must always be filled, otherwise the report might come out with some missing months on the column headers (this might depend on the tool used, we use Pentaho and we have observed this behavior). As a minimum, it is necessary to create at least one record per month per client, even if the client didn’t buy any services on some months.
I wanted to know if any of you have used a similar technique when dealing with periodic snapshot tables with sparse data and if there are some risks that I haven’t considered in doing so.
Re: Periodic snapshot fact tables with sparse data
I have a "sparse" daily snapshot for inventory data. Probably not as sparse as your scenario, but records with a 0 quantity on hand are not stored. Because not every item is stored at every storage location, not storing a record for each item for each storage location eliminates a large number of rows from the daily snapshot.
However, I have a business user who wants to see "zero" records. He does a cross-tab report of items by storage locations, and he wants every possible combination to have a value. Rather than store the zero records in the snapshot, I gave him a view that synthesizes them. In essence the view does a cross-tab of items and storage locations and then left joins the daily snapshot to get the non-zero values when they exist. From the user's perspective it is as if there is a record for every combination stored in the daily snapshot fact table.
You might be able to do something similar with clients, services, and months.
However, I have a business user who wants to see "zero" records. He does a cross-tab report of items by storage locations, and he wants every possible combination to have a value. Rather than store the zero records in the snapshot, I gave him a view that synthesizes them. In essence the view does a cross-tab of items and storage locations and then left joins the daily snapshot to get the non-zero values when they exist. From the user's perspective it is as if there is a record for every combination stored in the daily snapshot fact table.
You might be able to do something similar with clients, services, and months.
Last edited by VHF on Wed Oct 26, 2011 9:01 am; edited 2 times in total (Reason for editing : typo; spelling)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Periodic snapshot fact tables with sparse data
Good idea. Depending on your database, you may also want to consider doing a UNION ALL instead of a join.
Similar topics
» Sparse data in fact table
» Storing data that changes periodically - should I use a periodic snapshot fact?
» Variable period data in a single periodic snapshot fact table
» Aggregates in Periodic Snapshot Fact Table
» Monthly Periodic Snapshot Table using SCD Type1 Dimension tables - Design Question.
» Storing data that changes periodically - should I use a periodic snapshot fact?
» Variable period data in a single periodic snapshot fact table
» Aggregates in Periodic Snapshot Fact Table
» Monthly Periodic Snapshot Table using SCD Type1 Dimension tables - Design Question.
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum