Anti-aliasing time series data in a data warehouse?
3 posters
Page 1 of 1
Anti-aliasing time series data in a data warehouse?
I'm not sure that "anti-aliasing of time series data" is the correct terminology to use, so let me explain:
I have some sources of data that are aligned quarterly, mostly to do with quarterly running costs. I have some other sources of data, some of which are aligned weekly, and some monthly, and these are mostly related to transactions that took place. By the term 'anti-aliasing', I am referring to the problem of how to represent this data on a common time granularity, so that it can all be integrated together into a data warehouse.
How do I get this data which does not align through time, into a data warehouse? The alignment problem mostly stems from the fact that weeks do not exactly fit into months or quarters. The months against quarters issue is less of a problem, since 3 months make a quarter.
For example, if I know the quarterly running cost, should I divide that up, and attribute an equal share of it to each day in the quarter. Then if I know the number of transactions completed on a weekly basis, should I divide them up and attribute them equally to each day of the week. That would artificially break down the grain of the data into days, and allow me to roll it up by week, month or quarter.
(Is there a more correct terminology in data-warehousing for this problem?)
I have some sources of data that are aligned quarterly, mostly to do with quarterly running costs. I have some other sources of data, some of which are aligned weekly, and some monthly, and these are mostly related to transactions that took place. By the term 'anti-aliasing', I am referring to the problem of how to represent this data on a common time granularity, so that it can all be integrated together into a data warehouse.
How do I get this data which does not align through time, into a data warehouse? The alignment problem mostly stems from the fact that weeks do not exactly fit into months or quarters. The months against quarters issue is less of a problem, since 3 months make a quarter.
For example, if I know the quarterly running cost, should I divide that up, and attribute an equal share of it to each day in the quarter. Then if I know the number of transactions completed on a weekly basis, should I divide them up and attribute them equally to each day of the week. That would artificially break down the grain of the data into days, and allow me to roll it up by week, month or quarter.
(Is there a more correct terminology in data-warehousing for this problem?)
rupertlssmith- Posts : 4
Join date : 2015-08-14
Re: Anti-aliasing time series data in a data warehouse?
The by-the-book approach is to load data at the transactional level, which mitigates most of the issues.
Since you have aggregate data, you need dimensions to represent the different time periods(week, month, etc.) and hierarchy bridges to handle rollups.
As far as weeks into months goes, it requires business rules. You may need to allocate or pick a day of the week that identifies the month. Talk to the business. They are the only ones who can answer your questions.
Since you have aggregate data, you need dimensions to represent the different time periods(week, month, etc.) and hierarchy bridges to handle rollups.
As far as weeks into months goes, it requires business rules. You may need to allocate or pick a day of the week that identifies the month. Talk to the business. They are the only ones who can answer your questions.
Re: Anti-aliasing time series data in a data warehouse?
Thanks. I had also read this article, which explains that the problem fundamentally comes down to not capturing the data at the individual transaction grain:
http://www.kimballgroup.com/2007/07/keep-to-the-grain-in-dimensional-modeling/
I hope that we will eventually be able to capture data at the transactional grain, but this data is being gathered over multiple organisations with varying levels of technical know-how, so it seems very likely that we will have to cope with awkward data. I think when we do get to that point, we'll have a table for the individual transactions, and roll some of that up into the summary table that provides stats on a monthly or quarterly basis.
I'm pushing this back to the business - it is also possible that different organisations have different accounting periods that do not align (but I hope not...).
Just one question: Is my proposed scheme of attributing the cost-per-quarter, to individual days within the quarter, by dividing by the number of days a common thing to do when re-aligning data? Or would I be doing something very wrong by taking that approach?
To my thinking, this would be the most accurate way of re-aligning weekly/monthly/quarterly data, as a quarter end that breaks a week would be partially attributed to the week, weighted by the number of days of the week that are overlapped by. These values are going to be used to calculate some KPIs as ratios, such as cost-per-transaction; the KPIs are artificial performance measures, so an artificial re-arrangement of data to produce them doesn't seem wrong to me.
http://www.kimballgroup.com/2007/07/keep-to-the-grain-in-dimensional-modeling/
I hope that we will eventually be able to capture data at the transactional grain, but this data is being gathered over multiple organisations with varying levels of technical know-how, so it seems very likely that we will have to cope with awkward data. I think when we do get to that point, we'll have a table for the individual transactions, and roll some of that up into the summary table that provides stats on a monthly or quarterly basis.
I'm pushing this back to the business - it is also possible that different organisations have different accounting periods that do not align (but I hope not...).
Just one question: Is my proposed scheme of attributing the cost-per-quarter, to individual days within the quarter, by dividing by the number of days a common thing to do when re-aligning data? Or would I be doing something very wrong by taking that approach?
To my thinking, this would be the most accurate way of re-aligning weekly/monthly/quarterly data, as a quarter end that breaks a week would be partially attributed to the week, weighted by the number of days of the week that are overlapped by. These values are going to be used to calculate some KPIs as ratios, such as cost-per-transaction; the KPIs are artificial performance measures, so an artificial re-arrangement of data to produce them doesn't seem wrong to me.
Last edited by rupertlssmith on Thu Sep 10, 2015 4:15 am; edited 1 time in total (Reason for editing : typo)
rupertlssmith- Posts : 4
Join date : 2015-08-14
Anti-aliasing time series data in a data warehouse?
You should work with the business user to get them on the same page to do this artificial re-arrangement of data. You approach sounds good but business users should be on board to sign this off.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Similar topics
» Real time datawarehousing
» Model available time per day in timesheet data warehouse: fact or dimension or neither?
» Seeking Director of Data Warehouse Development for NYC Institution (Full-Time)
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» Time dimension design for cross time zone and custom calendar warehouse, 1min granularity
» Model available time per day in timesheet data warehouse: fact or dimension or neither?
» Seeking Director of Data Warehouse Development for NYC Institution (Full-Time)
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» Time dimension design for cross time zone and custom calendar warehouse, 1min granularity
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum