data model architecture for economic forecast data

anakella Sun Dec 26, 2010 6:58 am

Hi All,

I have to design data model for forecast data and I am a bit confused with how to go about it. The following are the two approaches I have taken for my task and would appreciate any help regarding this topic. There is one source file for each country and we receive files every quarterly, half, annually depending on the country. The frequency for new files may change in future.

1) Star Schema with the following FACTS and dimensions

1) Transactional Fact(Metric DESC, Metric Value, Date Key, Country Key)
Dimensions: COUNTRY ( Country KEy, Country Code, Country DESC)
DATE(Date Key, Year, Quarter)

Since there would be files to be updated at irregular intervals to this schema(quarterly, half, annually depending on the country- at best i have to update with new data every quarter), I have come to a decision to use a Snapshot Fact for every quarter and update my transaction table with new values. This would preserve the history of my fact table and also would help to maintain the current hot rolling quarter data for reporting purposes. IS this approach correct.

2) DATE/TIME as fact table and metrics as one of my dimension and country as other dimension

This would enable to help track history of the metrics using SCD2 but how far is this approach correct.

Any advise would be a great help.

Thanks,

Nag

hang Sun Dec 26, 2010 5:07 pm

anakella wrote:2) DATE/TIME as fact table and metrics as one of my dimension and country as other dimension

Why is DATE/TIME a fact table. I would think you have a date dimension with daily grain. Year, Quarter and Month etc. are just attributes at aggregate levels in you date dimension. Time is a confusing term and should only be used when hour, minute or second are required. I don't see any such requirement in your case.

anakella wrote:This would enable to help track history of the metrics using SCD2 but how far is this approach correct

SCD2 is a technique used to track change history for dimension not for fact, as your snapshot will capture the picture of the fact as the transactions become available at whatever point of time you would like to load them. Don't be concerned about repeating your fact records for those unchanged facts at each snapshot load. With the Periodic Snapshot Fact, you should only insert your factual records into the fact table and not update the table. The metrics in such a fact table is not additive along the date dimension.

anakella Mon Dec 27, 2010 12:30 pm

thanks hang, that was helpful.
One more question is the forecast is a time series i mean a set of predicted values for future at a given date (currently 20 values-one value for each quarter for next 5 years) and for every new file for each country there are new set of values for the same metrics and depending on whether its 3rd or 4th quarter file there would be values added for a new forecast year(ex 3rd quarter 2010 file- there are forecasts added for 2015 all four quarters metrics), similarly if its 2010 first or 2nd quarter then the snapshots shows 2009 YOY values(YOY is independent metric calculated by business and cannot be aggregated from individual time series quarter values) and the forecast for rest of 2010, till 2014.
Any help regarding how to implement this in a star schema would be really helpful.

hang Mon Dec 27, 2010 6:07 pm

I guess you don't want any SCD2 dimensions, meaning only current version of dimension is useful.

It looks like you need to deal with two sets of time series, time associated with your forecast and time when the forecast becomes available (load time). Is your forecast file getting bigger and bigger, meaning the latest file contains all the historical forecast plus new forecast. Is there any change to the past forecast in the new source file, do you want to keep the changes or you only want to apply the latest values.

Assuming you want to keep all the historical forecasts in your fact table, you need to have two time series entries for date dimension (role playing dimension), one for your forecast quarters, one for the snapshot date (i.e. load date). The first one gives you the time series (quarters) context for the forecast, whereas the snapshot date will partition your forecast fact into a number of versions of such fact-context combinations. You may not need an aggregate dimension at quarter level if you nominate a date to represent the quarter, say first or last day of the quarter.

BTW, you should have a metric dimension and only allow MetricKey instead of MetricDesc in your forecast fact table.

Jeff Smith Tue Dec 28, 2010 10:44 am

Why can't you drop and reload the Fact Table?

Forecasts regularly change. Unless there is a need to keep the changed forecasts, drop and reload the table. If you want to keep forecasts that have past, then create 2 fact tables - current and history - and a union view. A forecast is loaded from the Current Table into the Histroy once the forecast period has passed. The Current table is truncated and reloaded with the new forecast.

Sponsored content

data model architecture for economic forecast data

data model architecture for economic forecast data

Re: data model architecture for economic forecast data

Re: data model architecture for economic forecast data

Re: data model architecture for economic forecast data

Re: data model architecture for economic forecast data

Re: data model architecture for economic forecast data