Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Designing huge incremental factless fact

3 posters

Go down

Designing huge incremental factless fact Empty Designing huge incremental factless fact

Post  vvij Mon Jul 13, 2015 9:52 am

Hi

I am working on a Retail BI solution and stuck at a design problem.
Situation : Consider a product dimension with 200k records and an outlet dimension with around 300 records. Most certainly we have a fact containing cost and sales with other dimensions, but let us just consider these two (along with time) for now.

Requirement : To have a daily report which will find out product sales performance. In other words, it will have two columns - one showing the number of products in a store, and second showing number of products sold in a store for a given time period. For example, a store can have 100 products but only 80 were sold. Data for second column comes from daily transactions and is straight forward.

Problem : I am thinking what is the best way to store data for the first column. I thought of creating a factless fact containing product and store ids. Then I can do normal star joins and count the rows. What I am doubtful of this approach is the data volume. Ideally a store won't have all the products, so lets be generous and say on average we have 100k products in each store which means 100,000 x 300 = 30 million records in factless fact. As it is a daily report, I would need to have 30 million records added to the fact every day.

Is there a better way to design this problem?

Thanks
Vij

vvij

Posts : 7
Join date : 2015-07-13

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  ngalemmo Mon Jul 13, 2015 10:04 am

If all you need is a count of products, why not just store the count by day rather than individual products?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  vvij Mon Jul 13, 2015 10:11 am

No, I don't just need count of products. The count measure is required at product and store grain, and it may roll up to other attributes in product and store dimensions.
For example, a product rolls up to department. Therefore, can show store wise x department wise product count. This count needs to be compared with actual sales count in a daily report.

vvij

Posts : 7
Join date : 2015-07-13

Back to top Go down

Designing huge incremental factless fact Empty Designing huge incremental factless fact

Post  zoom Tue Jul 14, 2015 8:20 am

You need different data snapshot Fact tables. 1 fact table for daily snapshot to hold only 1 day of data. Depending on business report need, you can create weekly, monthly and yearly snapshot fact tables. You use this approach because of the high data volume you have and to avoid SQL slow performance.

zoom

Posts : 97
Join date : 2010-08-23
Location : Chicago

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  vvij Tue Jul 14, 2015 10:38 pm

Zoom, thanks for the reply.

I understand what you said. This where business needs to re-work on the requirements. If they say we want a dynamic report where we can chose any date range, then it becomes a design problem to deliver such reports with good performance! If reports are static then we can control which snapshot fact tables to use.

vvij

Posts : 7
Join date : 2015-07-13

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  ngalemmo Wed Jul 15, 2015 12:07 am

You could store first seen date and last seen date on a product.  This will allow you to count by any date range and not have to store one row per day. Such a table would be orders of magnitude smaller.

If you need to account for gaps in inventory, you could add an 'out of stock' flag.  If you see the product again and existing rows are 'out of stock' for that product, you create a new row with a new first seen date.  If you find a row with the out of stock flag set to false, just update the last seen date.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  vvij Fri Jul 17, 2015 6:52 am

What you said looks similar to a SCD 2 implementation. I thought of it too but this table is supposed to act like a fact and will join to product, outlet and time dimension. The only measure is the count which is required to be calculated daily. So if we model it like the way you suggested, can we still use it like a fact? If yes, how will we join it to time dimension?

vvij

Posts : 7
Join date : 2015-07-13

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  ngalemmo Fri Jul 17, 2015 5:05 pm

Given all you care about is the date, degenerate dimensions should be fine. It makes the query much simpler. But if you do need dimensional attributes, there is no reason could could not also include FKs to the date dimension.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  ngalemmo Sat Jul 18, 2015 3:12 pm

Expanding on that a bit, it is important to understand that integration occurs through conforming dimensional attributes, not matching keys.  If it was solely on matching keys, Type 2 dimensions would not work.

While other facts may have date keys, there is no reason data cannot be combined on the date values.

If you need to provide counts by date for a range of dates, you can join the count fact to the date dimension for dates within the effective period (using the date values, not keys) and filter for dates in your range.  The query would probably perform best if you put the range filter on the date dimension.

For example:
SELECT d.date_value, count(*)
FROM count_fact f join date_dim d on d.date_value between f.first_seen_date and f.last_seen_date
WHERE d.date_value between low_date and high_date;
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Designing huge incremental factless fact Empty Re: Designing huge incremental factless fact

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum