Slow changing fact data with an effective date
Page 1 of 1
Slow changing fact data with an effective date
I have run into a run into a modeling issue and I am unsure of what the best approach is to address it.
I am modeling a portfolio management system. In my transactional system the most grainular piece of data is a taxlot. Each tax lot can have multiple closing dates when less then all of the shares are sold. So it is a traditional 1 to many relationship of taxlots to closed taxlots.
To model it dimensional I flattened out the relationship and use a -1 for the date_closed_id for taxlots that are still open. The example below shows a single tax lot of 60 shares opened on date 5480 and 4 shares were closed on 6207:
The following day an additional 4 shares were sold and the rows look like the following:
Now in the future there is a 2:1 split so the records look as follow:
My problem is the open taxlot is a slow changing fact if there is a split and if there is no split it will never change, but for every day I am creating a new record in my fact_taxlot table. I am storing a lot of redundent data by flattening the taxlots and closed taxlots. If the average portfolio has 15 securities and there are ~250 trading days a year and there are 5000 portfolios (and forecast to increase at 10% per year), that is 18,750,000 records per year in this one table. I have only included a subset of the columns in the table to demonstrate the problem, The fact table includes about a dozen aggregate calculations as well as additional foreign keys.
The system displays portfolio values as of any historical date, if there were no splits and the open taxlot information could not change I would make 2 tables the static open taxlots as a dimension and closed taxlots as the fact, but since splits can occur I am not sure how to model this parent child relationship so that accurate values can be displayed on an as of date basis, but also not store so much redundent data.
My test warehouse database is at roughly 100gb already running data for 8 1/2 years for 210 accounts with 13 securities in each account. I need a smaller foot print.
Any suggestions?
Thanks in advance,
-Jay
[code]
I am modeling a portfolio management system. In my transactional system the most grainular piece of data is a taxlot. Each tax lot can have multiple closing dates when less then all of the shares are sold. So it is a traditional 1 to many relationship of taxlots to closed taxlots.
To model it dimensional I flattened out the relationship and use a -1 for the date_closed_id for taxlots that are still open. The example below shows a single tax lot of 60 shares opened on date 5480 and 4 shares were closed on 6207:
as_of_date_id | security_id | date_opened_id | date_closed_id | current_price_id | closed_taxlot_number | open | shares |
6250 | 4 | 5480 | -1 | 33888 | NULL | 1 | 56 |
6250 | 4 | 5480 | 6207 | 33888 | 1 | 0 | 4 |
The following day an additional 4 shares were sold and the rows look like the following:
as_of_date_id | security_id | date_opened_id | date_closed_id | current_price_id | closed_taxlot_number | open | shares |
6252 | 4 | 5480 | -1 | 33889 | NULL | 1 | 56 |
6252 | 4 | 5480 | 6207 | 33889 | 1 | 0 | 4 |
6252 | 4 | 5480 | 6251 | 33889 | 1 | 0 | 4 |
Now in the future there is a 2:1 split so the records look as follow:
as_of_date_id | security_id | date_opened_id | date_closed_id | current_price_id | closed_taxlot_number | open | shares |
6300 | 4 | 5480 | -1 | 43001 | NULL | 1 | 112 |
6300 | 4 | 5480 | 6207 | 43001 | 1 | 0 | 4 |
6300 | 8 | 5480 | 6251 | 43001 | 1 | 0 | 8 |
My problem is the open taxlot is a slow changing fact if there is a split and if there is no split it will never change, but for every day I am creating a new record in my fact_taxlot table. I am storing a lot of redundent data by flattening the taxlots and closed taxlots. If the average portfolio has 15 securities and there are ~250 trading days a year and there are 5000 portfolios (and forecast to increase at 10% per year), that is 18,750,000 records per year in this one table. I have only included a subset of the columns in the table to demonstrate the problem, The fact table includes about a dozen aggregate calculations as well as additional foreign keys.
The system displays portfolio values as of any historical date, if there were no splits and the open taxlot information could not change I would make 2 tables the static open taxlots as a dimension and closed taxlots as the fact, but since splits can occur I am not sure how to model this parent child relationship so that accurate values can be displayed on an as of date basis, but also not store so much redundent data.
My test warehouse database is at roughly 100gb already running data for 8 1/2 years for 210 accounts with 13 securities in each account. I need a smaller foot print.
Any suggestions?
Thanks in advance,
-Jay
[code]
jfallik- Posts : 1
Join date : 2012-05-23
Similar topics
» Fact in Slow changing Dim
» scd2 effective date, end date data type
» Initial date of effective date column for SCD 2 implementation
» Loading dimension when source already has effective to and from dates
» Which date to be used for rolling up the data into monthly aggregate fact ?
» scd2 effective date, end date data type
» Initial date of effective date column for SCD 2 implementation
» Loading dimension when source already has effective to and from dates
» Which date to be used for rolling up the data into monthly aggregate fact ?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum