Large number of late arriving facts

Jeff Smith Tue Feb 08, 2011 10:47 am

I am loading a Star Schema from a data base that acts as the staging area. We are converting our source systems to a new transaction system. We are converting data to the new transaction systems in large chunks with conversion dates spread over 3 years. I am only loading data from the staging area only after it has been converted to the new transaction system. When converted, transaction system will contain 3 years of history. As a result, after each conversion, I get 3 years of data.

On the fact table, i want to create a clustered index on Paid Date (of the insurance claim). I am afraid that after a 3 year chunk of data gets converted, the load process will be brought to it's knees. I'd drop the index, load the 3 years worth of history and then rebuild the index. I'm afraid that rebuilding the index will bring the load process to it's knees.

I'm trying to figure out ways to facilitate the historical data. I was thinking about partitioning the table based on the Paid Date, which would limit the amount of data that had to be resorted at one time when the clustered index was rebuilt. I was also considering partitioning the table based on 2 factors - paid date on clients - the conversion schedule is based on clients and client data is a typical filter when querying the fact table.

Thoughts? How would I partition a fact table on 2 unrelated attributes?

BoxesAndLines Tue Feb 08, 2011 2:30 pm

I wouldn't focus on historical conversions. I would focus how the table is loaded and queried normally. Historical conversions are always painful which is why they are often done over the weekend.

Large number of late arriving facts

Large number of late arriving facts

Re: Large number of late arriving facts