Periodic snapshot fact tables with sparse data

Andrea Vincenzi Mon Oct 24, 2011 2:17 pm

Description of the business and data model: my client runs a typical service oriented business; employees provide services to several clients, who pay according on the number of hours they use.

Description of the problem: One of the tables that I have designed is a monthly periodic snapshot fact table which records the number of hours provided and the associated cost (amount), with several dimensions attached like client, contract, office, type of service and a few others. Usually periodic snapshot fact tables are dense (there are actual transaction facts to load for any combination of the dimensions), but in this case this is not true: for example, a typical client only buys 2-3 services, while the services dimension contains more than 200 different kinds of services.

Question: should I load the periodic snapshot with a record for every combination of dimension values, even when there are not actual data? Of course this means loading a large number of records, 90% of which would contain all zeroes on the measures.

Considerations: since the snapshot is mainly used to build cross-tab reports for a given year with the 12 months on the columns, some combinations of dimensions must always be filled, otherwise the report might come out with some missing months on the column headers (this might depend on the tool used, we use Pentaho and we have observed this behavior). As a minimum, it is necessary to create at least one record per month per client, even if the client didn’t buy any services on some months.

I wanted to know if any of you have used a similar technique when dealing with periodic snapshot tables with sparse data and if there are some risks that I haven’t considered in doing so.

VHF Tue Oct 25, 2011 3:32 pm

I have a "sparse" daily snapshot for inventory data. Probably not as sparse as your scenario, but records with a 0 quantity on hand are not stored. Because not every item is stored at every storage location, not storing a record for each item for each storage location eliminates a large number of rows from the daily snapshot.

However, I have a business user who wants to see "zero" records. He does a cross-tab report of items by storage locations, and he wants every possible combination to have a value. Rather than store the zero records in the snapshot, I gave him a view that synthesizes them. In essence the view does a cross-tab of items and storage locations and then left joins the daily snapshot to get the non-zero values when they exist. From the user's perspective it is as if there is a record for every combination stored in the daily snapshot fact table.

You might be able to do something similar with clients, services, and months.

ngalemmo Tue Oct 25, 2011 9:35 pm

Good idea. Depending on your database, you may also want to consider doing a UNION ALL instead of a join.

Sponsored content

Periodic snapshot fact tables with sparse data

Periodic snapshot fact tables with sparse data

Re: Periodic snapshot fact tables with sparse data

Re: Periodic snapshot fact tables with sparse data

Re: Periodic snapshot fact tables with sparse data