Data Warehouse Purge Strategy - HELP
2 posters
Page 1 of 1
Data Warehouse Purge Strategy - HELP
I'm looking at the best way to implement a purge strategy in a dimensional model. The reason to purge is that no personal data can be kept after 4 years, but the users would still want to see counts/facts by no identifying data. For example i couldn't keep a name, but i could keep the gender or employer.
Is it best to do this by depersonalising data e.g. setting certain column to NULL after a number of years, or adopting an aggregate strategy. The data volumes are not huge, but if i go the aggregate route i'm going to be left with quite a lot of duplication, as opposed to just keeping all the detail data and NULL'ing certain columns, after a set time. Doing this aslo avoids any issues with lack of aggregate aware tools.
TIA
Is it best to do this by depersonalising data e.g. setting certain column to NULL after a number of years, or adopting an aggregate strategy. The data volumes are not huge, but if i go the aggregate route i'm going to be left with quite a lot of duplication, as opposed to just keeping all the detail data and NULL'ing certain columns, after a set time. Doing this aslo avoids any issues with lack of aggregate aware tools.
TIA
Re: Data Warehouse Purge Strategy - HELP
Keep it simple, null the columns (or replace with some string, such as 'N/A' or 'Obsolete') you need to drop.
Re: Data Warehouse Purge Strategy - HELP
Thanks, it does seem to make a lot of sense to keep it simple, so i will be looking at purging specific data from my selected columns after a certain period of time.
I'll only look at an Aggregate strategy when storage/space issue becomes important.
I think what i will also do is to add a flag to each record where a purge can occur to add clarity that a column value has been purged rather than missing data.
I'll only look at an Aggregate strategy when storage/space issue becomes important.
I think what i will also do is to add a flag to each record where a purge can occur to add clarity that a column value has been purged rather than missing data.
Re: Data Warehouse Purge Strategy - HELP
Yes, adding such a column is a good idea. You may also want to include a timestamp as well.
Similar topics
» data warehouse and data warehouse system
» Some very basic question regarding overall ETL and Modeling strategy/best practices for Data warehouse project
» difference between data mart and data warehouse at logical/physical level
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» Some very basic question regarding overall ETL and Modeling strategy/best practices for Data warehouse project
» difference between data mart and data warehouse at logical/physical level
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum