Daily Summaries in Data Warehouse
4 posters
Page 1 of 1
Daily Summaries in Data Warehouse
Hi,
I am maintaining daily aggregate and monthly aggregates for my Data warehouse.
The only problem is that I have late arriving Facts. i.e. a fact record can come as late as 2 months but 90% of facts will arrive on first 7 days.
So if I am doing daily aggregates how can I do aggregation of late arriving Facts. What is the right approach.
Also what is the right partitioning strategy for Daily aggregates : Should it be ETL process date or business date.
Note: Daily records can be around 150 million
Regards,
Pankaj
I am maintaining daily aggregate and monthly aggregates for my Data warehouse.
The only problem is that I have late arriving Facts. i.e. a fact record can come as late as 2 months but 90% of facts will arrive on first 7 days.
So if I am doing daily aggregates how can I do aggregation of late arriving Facts. What is the right approach.
Also what is the right partitioning strategy for Daily aggregates : Should it be ETL process date or business date.
Note: Daily records can be around 150 million
Regards,
Pankaj
pankaj.tiwari- Posts : 2
Join date : 2013-12-10
Re:Daily Summaries in Data Warehouse
Hi ,
generally it depends upon the data on which you are doing summary and generating report on that, if they are for goverment/authority reporting then we dont update the data for that day, but create a new entry when the updated fact information was received.
When you are generating report based on business date then the you should use the business date as the partitioning column.
thanks
generally it depends upon the data on which you are doing summary and generating report on that, if they are for goverment/authority reporting then we dont update the data for that day, but create a new entry when the updated fact information was received.
When you are generating report based on business date then the you should use the business date as the partitioning column.
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Re: Daily Summaries in Data Warehouse
Hi,
I suggest to have 2 various day keys not only in your elementary fact table but in your aggregate table as well:
Business day key and ETL day key.
In your aggregate table you will group fact data according these 2 day keys ( and other dimension keys of course). That means for a given business day there might be more rows with various ETL day and you will not update aggregate data but insert new rows.
So you can follow in your database the arrival of your data for a specific business day, even you can have report about loading.
I suggest to have 2 various day keys not only in your elementary fact table but in your aggregate table as well:
Business day key and ETL day key.
In your aggregate table you will group fact data according these 2 day keys ( and other dimension keys of course). That means for a given business day there might be more rows with various ETL day and you will not update aggregate data but insert new rows.
So you can follow in your database the arrival of your data for a specific business day, even you can have report about loading.
gvarga- Posts : 43
Join date : 2010-12-15
Daily Summaries in Data Warehouse
My concern....
Say I have 100 CDR's for 1st of Dec out of which 90 CDRs are received by warehouse and 10 CDRs been late arriving will come on 2nd of Dec.
I have partition my Transaction Fact(TF) on Business date and Summary Fact (SF) also on Business date.
On 1st TF will have 90 records on 1 Dec partition and the aggregated records would also be on 1st Dec partition in SF.
On 2nd Dec I got another 100 CDR (90 CDR for 2nd and 10 CDR of 1st).
Since my TF is partitioned on Business Date the 10 CDR will go to 1st partition and 90 CDR's will go into 2nd Partition, which is perfect
But if I do summary aggregates for 2nd I will only select records from 2nd partition and aggregate those 90 CDR's.
How can I aggregate those remaining 10 CDR's which should as per rule should be in 1st Dec partition.
Any workaround solution would be much appreciated.
Say I have 100 CDR's for 1st of Dec out of which 90 CDRs are received by warehouse and 10 CDRs been late arriving will come on 2nd of Dec.
I have partition my Transaction Fact(TF) on Business date and Summary Fact (SF) also on Business date.
On 1st TF will have 90 records on 1 Dec partition and the aggregated records would also be on 1st Dec partition in SF.
On 2nd Dec I got another 100 CDR (90 CDR for 2nd and 10 CDR of 1st).
Since my TF is partitioned on Business Date the 10 CDR will go to 1st partition and 90 CDR's will go into 2nd Partition, which is perfect
But if I do summary aggregates for 2nd I will only select records from 2nd partition and aggregate those 90 CDR's.
How can I aggregate those remaining 10 CDR's which should as per rule should be in 1st Dec partition.
Any workaround solution would be much appreciated.
pankaj.tiwari- Posts : 2
Join date : 2013-12-10
Re: Daily Summaries in Data Warehouse
If your aggregates are views, you don't have to do anything. If the aggregates are tables, simply rebuild any aggregate fact that had an underlying partition change.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» difference between data mart and data warehouse at logical/physical level
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» Modelling current data and daily observations.
» Daily/weekly/monthly data mart views
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» Modelling current data and daily observations.
» Daily/weekly/monthly data mart views
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum