Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

how to "reduce the duplicated rows"

2 posters

Go down

how to "reduce the duplicated rows" Empty how to "reduce the duplicated rows"

Post  jon Wed Dec 08, 2010 1:36 am

i have adopted a database with a table, my_daily_snap_tbl, that keep a complete archive of daily transactional detail. For example, today, there are 100 records in the OLTP_table then these 100 rows are copied over and insert to my_daily_snap_tbl. suppose on next day, there are total of 105 rows, then, all 105 are copied over. if I do the following select:
select snap_date, count(snap_date)
from my_daily_snap_tbl
group by count(snap_date)

it will show:
2010-12-07 100
2010-12-08 105

if I do
select count(*)
from my_daily_snap_tbl

I will get 205 rows. (In reality, I've got 7 months worth of data in this my_daily_snap_tbl, which snapped every day, and there are 400 million rows.)

When I asked around to find out why my_daily_snap_tbl is capturing data from transaction liked that, one answer I've got is: the analyst wanted to have the ability to goto a specific day and see all the rows on any given day.

I did a quick comparison of rows, there are over 80% records haven't changed at all. Most of them are duplicated rows.

is there a way to model this table so that i only keep track of the changed, deleted, and new rows? if so, how to resemble the records on a given date? Feel free to elaborate how to handle it, with samples and code. I am a newbie.



Posts : 11
Join date : 2010-05-10

Back to top Go down

how to "reduce the duplicated rows" Empty Re: how to "reduce the duplicated rows"

Post  ngalemmo Wed Dec 08, 2010 2:31 am

There are two basic forms of fact tables that can be used to maintain a historical record of measures over time: a delta (transactional) fact table and an accumulating snapshot fact table.

A delta fact table stores activity as a series of transactions where each row is the net change. In other words, if a fact is updated, you store the difference between the old version and new version. The table would include the date of the change. By summing facts bound by that date, you can recreate the state of a fact at any point in time.

An accumulating snapshot stores a new image of the row with the addition of effective and expiration dates. When a fact is updated, the old version is expired and a new version is added with an appropriate effective date. Selecting rows bound by these dates would allow you to state the measures as of a particular date. While similar to your current table, this appoach stores far fewer rows as only new or updated rows are added to the table. Unchanged rows simply remain effective.

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

Back to top Go down

how to "reduce the duplicated rows" Empty thank you.

Post  jon Wed Dec 08, 2010 11:38 pm

thanks for your explanation.

I've got only 4 words for them... it is amazingly good.

Thanks for your patient and detailed reply.

You guys are the best.


Posts : 11
Join date : 2010-05-10

Back to top Go down

how to "reduce the duplicated rows" Empty Re: how to "reduce the duplicated rows"

Post  Sponsored content

Sponsored content

Back to top Go down

Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum