how to "reduce the duplicated rows"
2 posters
Page 1 of 1
how to "reduce the duplicated rows"
i have adopted a database with a table, my_daily_snap_tbl, that keep a complete archive of daily transactional detail. For example, today, there are 100 records in the OLTP_table then these 100 rows are copied over and insert to my_daily_snap_tbl. suppose on next day, there are total of 105 rows, then, all 105 are copied over. if I do the following select:
select snap_date, count(snap_date)
from my_daily_snap_tbl
group by count(snap_date)
it will show:
2010-12-07 100
2010-12-08 105
if I do
select count(*)
from my_daily_snap_tbl
I will get 205 rows. (In reality, I've got 7 months worth of data in this my_daily_snap_tbl, which snapped every day, and there are 400 million rows.)
When I asked around to find out why my_daily_snap_tbl is capturing data from transaction liked that, one answer I've got is: the analyst wanted to have the ability to goto a specific day and see all the rows on any given day.
I did a quick comparison of rows, there are over 80% records haven't changed at all. Most of them are duplicated rows.
is there a way to model this table so that i only keep track of the changed, deleted, and new rows? if so, how to resemble the records on a given date? Feel free to elaborate how to handle it, with samples and code. I am a newbie.
Thanks.
select snap_date, count(snap_date)
from my_daily_snap_tbl
group by count(snap_date)
it will show:
2010-12-07 100
2010-12-08 105
if I do
select count(*)
from my_daily_snap_tbl
I will get 205 rows. (In reality, I've got 7 months worth of data in this my_daily_snap_tbl, which snapped every day, and there are 400 million rows.)
When I asked around to find out why my_daily_snap_tbl is capturing data from transaction liked that, one answer I've got is: the analyst wanted to have the ability to goto a specific day and see all the rows on any given day.
I did a quick comparison of rows, there are over 80% records haven't changed at all. Most of them are duplicated rows.
is there a way to model this table so that i only keep track of the changed, deleted, and new rows? if so, how to resemble the records on a given date? Feel free to elaborate how to handle it, with samples and code. I am a newbie.
Thanks.
jon- Posts : 11
Join date : 2010-05-10
Re: how to "reduce the duplicated rows"
There are two basic forms of fact tables that can be used to maintain a historical record of measures over time: a delta (transactional) fact table and an accumulating snapshot fact table.
A delta fact table stores activity as a series of transactions where each row is the net change. In other words, if a fact is updated, you store the difference between the old version and new version. The table would include the date of the change. By summing facts bound by that date, you can recreate the state of a fact at any point in time.
An accumulating snapshot stores a new image of the row with the addition of effective and expiration dates. When a fact is updated, the old version is expired and a new version is added with an appropriate effective date. Selecting rows bound by these dates would allow you to state the measures as of a particular date. While similar to your current table, this appoach stores far fewer rows as only new or updated rows are added to the table. Unchanged rows simply remain effective.
A delta fact table stores activity as a series of transactions where each row is the net change. In other words, if a fact is updated, you store the difference between the old version and new version. The table would include the date of the change. By summing facts bound by that date, you can recreate the state of a fact at any point in time.
An accumulating snapshot stores a new image of the row with the addition of effective and expiration dates. When a fact is updated, the old version is expired and a new version is added with an appropriate effective date. Selecting rows bound by these dates would allow you to state the measures as of a particular date. While similar to your current table, this appoach stores far fewer rows as only new or updated rows are added to the table. Unchanged rows simply remain effective.
thank you.
thanks for your explanation.
I've got only 4 words for them... it is amazingly good.
Thanks for your patient and detailed reply.
You guys are the best.
I've got only 4 words for them... it is amazingly good.
Thanks for your patient and detailed reply.
You guys are the best.
jon- Posts : 11
Join date : 2010-05-10
Similar topics
» Reduce Rows in Inventory Snapshot Fact Table
» duplicated facts against multpile dimensions in MS OLAP control (excel/SSMS)
» How to reduce size of a customer like dimension
» Daily snapshot fact table-any chance to reduce data volume?
» Fact table granulartity to small or just built wrong
» duplicated facts against multpile dimensions in MS OLAP control (excel/SSMS)
» How to reduce size of a customer like dimension
» Daily snapshot fact table-any chance to reduce data volume?
» Fact table granulartity to small or just built wrong
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum