Rolling Month Measures Issue
3 posters
Page 1 of 1
Rolling Month Measures Issue
Hi:
I met a difficult requirement about rolling month measures. It's different from regular rolling time reports, hope you can read through this post.
Dimensions and Facts have many fields, to simplify the problem, I use the following example. Suppose there are two tables, EMP and FACT. where EMP has three fields, (id, month, status); and FACT also has three fields, (id, month, amount).
Generally speaking, for dimensional table, we should use surragate key. However, the status for EMPs changes so fast that we think better to use monthly snapshot. So the EMP table may looks like
id, month, status
1, 201201, A
2, 201201, B
1, 201202, B
2, 201202, C
1, 201203, C
2, 201203, D
And in FACT table, we recorded EMP's sales amount every month, such as:
id, month, amount
1, 201201, 10
2, 201201, 20
1, 201202, 30
2, 201202, 40
1, 201203, 50
2, 201203, 60
The requirement is, user specify a month and a rolling number, n. then the report should output n months' data, but the STATUS field for EMP table, should use the LATEST month. For example, base on the sample data, if user specify '201203' for month, and '3' for rolling month. then data for the report should looks like
id, status, month, amount
1, C, 201203, 10
2, D, 201203, 20
1, C, 201203, 30
2, D, 201203, 40
1, C, 201203, 50
2, D, 201203, 60
And the SQL we use is:
select a.id, a.status, a.month, b.amount
from EMP a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203 -- 3 months
and a.month = 201203 -- the latest month
No, so fa so good. However, the most difficult part is performance issue, EMP table has 1 million rows per month, while Fact table has 10 million rows per month, it took me half hour to run such sql if I join these two table.
Now, I wonder if there is a better way to model this requirement? Such as put all information in Fact table and query without join? then I can speed up the query time.
Thanks
Larry
I met a difficult requirement about rolling month measures. It's different from regular rolling time reports, hope you can read through this post.
Dimensions and Facts have many fields, to simplify the problem, I use the following example. Suppose there are two tables, EMP and FACT. where EMP has three fields, (id, month, status); and FACT also has three fields, (id, month, amount).
Generally speaking, for dimensional table, we should use surragate key. However, the status for EMPs changes so fast that we think better to use monthly snapshot. So the EMP table may looks like
id, month, status
1, 201201, A
2, 201201, B
1, 201202, B
2, 201202, C
1, 201203, C
2, 201203, D
And in FACT table, we recorded EMP's sales amount every month, such as:
id, month, amount
1, 201201, 10
2, 201201, 20
1, 201202, 30
2, 201202, 40
1, 201203, 50
2, 201203, 60
The requirement is, user specify a month and a rolling number, n. then the report should output n months' data, but the STATUS field for EMP table, should use the LATEST month. For example, base on the sample data, if user specify '201203' for month, and '3' for rolling month. then data for the report should looks like
id, status, month, amount
1, C, 201203, 10
2, D, 201203, 20
1, C, 201203, 30
2, D, 201203, 40
1, C, 201203, 50
2, D, 201203, 60
And the SQL we use is:
select a.id, a.status, a.month, b.amount
from EMP a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203 -- 3 months
and a.month = 201203 -- the latest month
No, so fa so good. However, the most difficult part is performance issue, EMP table has 1 million rows per month, while Fact table has 10 million rows per month, it took me half hour to run such sql if I join these two table.
Now, I wonder if there is a better way to model this requirement? Such as put all information in Fact table and query without join? then I can speed up the query time.
Thanks
Larry
larry_lan- Posts : 5
Join date : 2011-11-07
Re: Rolling Month Measures Issue
Firstly you query would return far more records than you expected. I guess you may want to query as follows:
select a.id, a.status, a.month, sum(b.amount)
from EMP a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203 -- 3 months
and a.month = 201203 -- the latest month
group by a.id, a.status, a.month
If month is the period grain for your report, you may pre-aggregate your monthly snapshot by id and month without status, using similar query.
select a.id, a.status, a.month, sum(b.amount)
from EMP a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203 -- 3 months
and a.month = 201203 -- the latest month
group by a.id, a.status, a.month
If month is the period grain for your report, you may pre-aggregate your monthly snapshot by id and month without status, using similar query.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Rolling Month Measures Issue
Hi Hang:
Thanks for your suggestion. We have many properties in dimensions, and some of then need sum, while some of them is count distinct. There are millions of combination. We tried pre-aggregate and it took days to calculate.
Thanks
Larry
Thanks for your suggestion. We have many properties in dimensions, and some of then need sum, while some of them is count distinct. There are millions of combination. We tried pre-aggregate and it took days to calculate.
Thanks
Larry
larry_lan- Posts : 5
Join date : 2011-11-07
Re: Rolling Month Measures Issue
Sounds like a good case of leveraging OLAP cube, SSAS for instance.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Rolling Month Measures Issue
Try:
select a.id, a.status, a.month, b.amount
from (SELECT id, max(status) as status from EMP where month = 201203 group by id) a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203
select a.id, a.status, a.month, b.amount
from (SELECT id, max(status) as status from EMP where month = 201203 group by id) a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203
Re: Rolling Month Measures Issue
Thanks ngalemmo:
The subquery do reduce the joining records and query becomes faster. Thanks a lot. I also wonder if there is any better way to model such requirement in star schema? Since we will put this model in report tools like BO or BIEE, and with these tools, queries sometimes are not so flexible.
Thanks
Larry
The subquery do reduce the joining records and query becomes faster. Thanks a lot. I also wonder if there is any better way to model such requirement in star schema? Since we will put this model in report tools like BO or BIEE, and with these tools, queries sometimes are not so flexible.
Thanks
Larry
larry_lan- Posts : 5
Join date : 2011-11-07
Similar topics
» when to use a seperate month dimension?
» Employee Hierarchical - Rolling down reporting
» Problems with design to allow Rolling up of Hierarchical Data
» Which date to be used for rolling up the data into monthly aggregate fact ?
» Statement Cycle versus MTD, Financial/Banking Industry
» Employee Hierarchical - Rolling down reporting
» Problems with design to allow Rolling up of Hierarchical Data
» Which date to be used for rolling up the data into monthly aggregate fact ?
» Statement Cycle versus MTD, Financial/Banking Industry
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum