Designing huge incremental factless fact
3 posters
Page 1 of 1
Designing huge incremental factless fact
Hi
I am working on a Retail BI solution and stuck at a design problem.
Situation : Consider a product dimension with 200k records and an outlet dimension with around 300 records. Most certainly we have a fact containing cost and sales with other dimensions, but let us just consider these two (along with time) for now.
Requirement : To have a daily report which will find out product sales performance. In other words, it will have two columns - one showing the number of products in a store, and second showing number of products sold in a store for a given time period. For example, a store can have 100 products but only 80 were sold. Data for second column comes from daily transactions and is straight forward.
Problem : I am thinking what is the best way to store data for the first column. I thought of creating a factless fact containing product and store ids. Then I can do normal star joins and count the rows. What I am doubtful of this approach is the data volume. Ideally a store won't have all the products, so lets be generous and say on average we have 100k products in each store which means 100,000 x 300 = 30 million records in factless fact. As it is a daily report, I would need to have 30 million records added to the fact every day.
Is there a better way to design this problem?
Thanks
Vij
I am working on a Retail BI solution and stuck at a design problem.
Situation : Consider a product dimension with 200k records and an outlet dimension with around 300 records. Most certainly we have a fact containing cost and sales with other dimensions, but let us just consider these two (along with time) for now.
Requirement : To have a daily report which will find out product sales performance. In other words, it will have two columns - one showing the number of products in a store, and second showing number of products sold in a store for a given time period. For example, a store can have 100 products but only 80 were sold. Data for second column comes from daily transactions and is straight forward.
Problem : I am thinking what is the best way to store data for the first column. I thought of creating a factless fact containing product and store ids. Then I can do normal star joins and count the rows. What I am doubtful of this approach is the data volume. Ideally a store won't have all the products, so lets be generous and say on average we have 100k products in each store which means 100,000 x 300 = 30 million records in factless fact. As it is a daily report, I would need to have 30 million records added to the fact every day.
Is there a better way to design this problem?
Thanks
Vij
vvij- Posts : 7
Join date : 2015-07-13
Re: Designing huge incremental factless fact
If all you need is a count of products, why not just store the count by day rather than individual products?
Re: Designing huge incremental factless fact
No, I don't just need count of products. The count measure is required at product and store grain, and it may roll up to other attributes in product and store dimensions.
For example, a product rolls up to department. Therefore, can show store wise x department wise product count. This count needs to be compared with actual sales count in a daily report.
For example, a product rolls up to department. Therefore, can show store wise x department wise product count. This count needs to be compared with actual sales count in a daily report.
vvij- Posts : 7
Join date : 2015-07-13
Designing huge incremental factless fact
You need different data snapshot Fact tables. 1 fact table for daily snapshot to hold only 1 day of data. Depending on business report need, you can create weekly, monthly and yearly snapshot fact tables. You use this approach because of the high data volume you have and to avoid SQL slow performance.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Designing huge incremental factless fact
Zoom, thanks for the reply.
I understand what you said. This where business needs to re-work on the requirements. If they say we want a dynamic report where we can chose any date range, then it becomes a design problem to deliver such reports with good performance! If reports are static then we can control which snapshot fact tables to use.
I understand what you said. This where business needs to re-work on the requirements. If they say we want a dynamic report where we can chose any date range, then it becomes a design problem to deliver such reports with good performance! If reports are static then we can control which snapshot fact tables to use.
vvij- Posts : 7
Join date : 2015-07-13
Re: Designing huge incremental factless fact
You could store first seen date and last seen date on a product. This will allow you to count by any date range and not have to store one row per day. Such a table would be orders of magnitude smaller.
If you need to account for gaps in inventory, you could add an 'out of stock' flag. If you see the product again and existing rows are 'out of stock' for that product, you create a new row with a new first seen date. If you find a row with the out of stock flag set to false, just update the last seen date.
If you need to account for gaps in inventory, you could add an 'out of stock' flag. If you see the product again and existing rows are 'out of stock' for that product, you create a new row with a new first seen date. If you find a row with the out of stock flag set to false, just update the last seen date.
Re: Designing huge incremental factless fact
What you said looks similar to a SCD 2 implementation. I thought of it too but this table is supposed to act like a fact and will join to product, outlet and time dimension. The only measure is the count which is required to be calculated daily. So if we model it like the way you suggested, can we still use it like a fact? If yes, how will we join it to time dimension?
vvij- Posts : 7
Join date : 2015-07-13
Re: Designing huge incremental factless fact
Given all you care about is the date, degenerate dimensions should be fine. It makes the query much simpler. But if you do need dimensional attributes, there is no reason could could not also include FKs to the date dimension.
Re: Designing huge incremental factless fact
Expanding on that a bit, it is important to understand that integration occurs through conforming dimensional attributes, not matching keys. If it was solely on matching keys, Type 2 dimensions would not work.
While other facts may have date keys, there is no reason data cannot be combined on the date values.
If you need to provide counts by date for a range of dates, you can join the count fact to the date dimension for dates within the effective period (using the date values, not keys) and filter for dates in your range. The query would probably perform best if you put the range filter on the date dimension.
For example:
SELECT d.date_value, count(*)
FROM count_fact f join date_dim d on d.date_value between f.first_seen_date and f.last_seen_date
WHERE d.date_value between low_date and high_date;
While other facts may have date keys, there is no reason data cannot be combined on the date values.
If you need to provide counts by date for a range of dates, you can join the count fact to the date dimension for dates within the effective period (using the date values, not keys) and filter for dates in your range. The query would probably perform best if you put the range filter on the date dimension.
For example:
SELECT d.date_value, count(*)
FROM count_fact f join date_dim d on d.date_value between f.first_seen_date and f.last_seen_date
WHERE d.date_value between low_date and high_date;
Similar topics
» Fact Table with huge number of Blank (or Empty) foreign keys
» Fact, factless fact, and current view dimensions
» Designing a Fact for Sales DW
» Need help in Fact table designing
» Problem while Designing Fact table
» Fact, factless fact, and current view dimensions
» Designing a Fact for Sales DW
» Need help in Fact table designing
» Problem while Designing Fact table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|