Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
4 posters
Page 1 of 1
Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Hi folks,
I have maybe a basic question for some of you.
Background: We have a highly normalized Enterprise data warehouse with fact tables that use row versioning (compliance-enabled), that means, the fact tables have validfrom and validto timestamps like a type 2 SCD.
I want to design a fact table in the DM to aggregate a denormalized some data, in order to simplify the cube design and report creation but I'm little bit lost about how to aggregate these facts with the valid periods.
Let's say I have orders with different status (open, close, delayed, canceled, etc.) and the users want to determine how many articles are in orders with a given status at a given date or the sum of the net sales of canceled order in a given date.
I know I can leave the orders table like in the Enterprise data warehouse and then try to fulfill the analysis requirement with MDX but I also know that is better to have the proper design at the data base level.
Any comment would be appreciated,
Kind Regards,
Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
I have maybe a basic question for some of you.
Background: We have a highly normalized Enterprise data warehouse with fact tables that use row versioning (compliance-enabled), that means, the fact tables have validfrom and validto timestamps like a type 2 SCD.
I want to design a fact table in the DM to aggregate a denormalized some data, in order to simplify the cube design and report creation but I'm little bit lost about how to aggregate these facts with the valid periods.
Let's say I have orders with different status (open, close, delayed, canceled, etc.) and the users want to determine how many articles are in orders with a given status at a given date or the sum of the net sales of canceled order in a given date.
I know I can leave the orders table like in the Enterprise data warehouse and then try to fulfill the analysis requirement with MDX but I also know that is better to have the proper design at the data base level.
Any comment would be appreciated,
Kind Regards,
Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Hi,
do you have any field which idnetifis what is the currect stage of the order , like the order may be open yesterday but today it is closed which means when you print the status of the order today you will pick up the Closed record as it is the current status of the order.
thanks
Himanshu
do you have any field which idnetifis what is the currect stage of the order , like the order may be open yesterday but today it is closed which means when you print the status of the order today you will pick up the Closed record as it is the current status of the order.
thanks
Himanshu
hkandpal- Posts : 113
Join date : 2010-08-16
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Hi, thanks for your answer.
Yes, there is a status column to identify the status between the validfrom and validto dates. What I would like to do is preaggregate the data in order to improve the performance in the cubes or to have a table structure that better represents the event "order status change". Like there are more than one status I am thinking in something like timestamp the changes and backout the measures for the previous status, but I have not found the satisfactory solution yet.
Kind Regards,
Paul
Yes, there is a status column to identify the status between the validfrom and validto dates. What I would like to do is preaggregate the data in order to improve the performance in the cubes or to have a table structure that better represents the event "order status change". Like there are more than one status I am thinking in something like timestamp the changes and backout the measures for the previous status, but I have not found the satisfactory solution yet.
Kind Regards,
Paul
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
If you want to track the event "order status change" then I would build a fact table that does just this i.e. when you load your order dim you work out if the status has changed and if it has then create a record in the fact table. The fact table would have 2 keys to a Status Dim for the old an new values (plus all your other relevant keys, obviously).
If you want an aggregate table that would tell you what the statuses were on any particular date you would have to create aggregate records for every date and status combination - regardless of whether there had been any status changes on a particular date.
This assumes that a status can only change once in a day - if it can change more frequently then you either can't build this aggregate (because your 'date' grain would need to go down to the minute, or something similar, which would not be sensible to implement) or you would need a business rule to define what an order's day's status actually means
If you want an aggregate table that would tell you what the statuses were on any particular date you would have to create aggregate records for every date and status combination - regardless of whether there had been any status changes on a particular date.
This assumes that a status can only change once in a day - if it can change more frequently then you either can't build this aggregate (because your 'date' grain would need to go down to the minute, or something similar, which would not be sensible to implement) or you would need a business rule to define what an order's day's status actually means
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
I would build an accumulating snapshot fact table to pivot the statuses. Include a date dimension FK for each status as well.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Hi,
from what you said it looks like you want the sum of net sales or quantity of orders by stages (this is on the assumption that the order can only be in one stage at a particular time).
If you have an active flag and a date range in the table then it would help you to get a status of the order at a praticular time.
How will you handle split orders or back orders ? will yousplit the main order ?
thanks
Himanshu
from what you said it looks like you want the sum of net sales or quantity of orders by stages (this is on the assumption that the order can only be in one stage at a particular time).
If you have an active flag and a date range in the table then it would help you to get a status of the order at a praticular time.
How will you handle split orders or back orders ? will yousplit the main order ?
thanks
Himanshu
hkandpal- Posts : 113
Join date : 2010-08-16
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Hi everyone,
thanks do much for your answers.
One of the major difficulties is several corrections are perform in the central data warehouse and is really a haedache to transfer these corrections to the data mart, for this reasons we are trying to find a solution in which we can keep the structure of the fact table in the data warehouse. That is a row for each order number with a validto and validfrom timestamp plus a current field. For aggregations, like the sum of the net sales this is not a problem, you just have to filter the valid records with the current flag. For semi additiv measures, like the sum of the net sales in during Januar 2014, I have to determine the valid records for this specific period.
I created 3 time dimensions: Validfrom, Validto and Time, then I created the following calculated measure using the idea that I found here:
MDX Between Start Date and End Date
this is my mdx:
Maybe a good idea is to truncate the table in the datamart and fill it again instead of trying to correct the values and have a better structure.
Kind Regards,
Paul
thanks do much for your answers.
One of the major difficulties is several corrections are perform in the central data warehouse and is really a haedache to transfer these corrections to the data mart, for this reasons we are trying to find a solution in which we can keep the structure of the fact table in the data warehouse. That is a row for each order number with a validto and validfrom timestamp plus a current field. For aggregations, like the sum of the net sales this is not a problem, you just have to filter the valid records with the current flag. For semi additiv measures, like the sum of the net sales in during Januar 2014, I have to determine the valid records for this specific period.
I created 3 time dimensions: Validfrom, Validto and Time, then I created the following calculated measure using the idea that I found here:
MDX Between Start Date and End Date
this is my mdx:
- Code:
CREATE MEMBER CURRENTCUBE.[Measures].[Snapshot]
AS AGGREGATE( {NULL:LINKMEMBER([Time].[Date Id].CURRENTMEMBER,[Valid From].[Date Id])} * {NULL:LINKMEMBER([Time].[Date Id].CURRENTMEMBER,[Valid To].[Date Id])}, [Measures].[OOH Units]),
FORMAT_STRING = "Currency",
NON_EMPTY_BEHAVIOR = { [OOH Units] }
Maybe a good idea is to truncate the table in the datamart and fill it again instead of trying to correct the values and have a better structure.
Kind Regards,
Paul
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
I'm unclear what your ValidFrom and ValidTo columns are being used for as fact table rows record an event at a point in time (ignoring accumulating snapshots). So is your ValidFrom date = transaction date and ValidTo date = the transaction date for the next fact record for the same order?
Also, what is the ValidTo date for a Closed record - as presumably this status is valid until the end of time?
When you say you want to report on "the sum of the net sales in during Januar 2014" is a sale only 'registered' when the order has a status of 'Closed' (or whatever status value you are using)? Presumably you have an equivalent of transaction date in your model so the aggregate would just be:
SELECT * FROM SALES WHERE STATUS = 'CLOSED' AND TRANSACTION_DATE BETWEEN '01-JAN-2014' AND '31-JAN-2014'?
If, instead, you are trying to pick up any record that existed in Jan 2014, and use the most recent record for each order in that month, then you'll probably have to use some form of subquery, a bit like this (though this is psuedo-code so don't expect to run it):
SELECT * FROM SALES WHERE SALES.VALIDFROM =
(SELECT MAX(S1.VALIDFROM) FROM SALES S1
WHERE SALES.ORDER_ID = S1.ORDER_ID
AND
(
S1.VALIDFROM <= '01-JAN-2014' AND S1.VALIDTO >= '01-JAN-2014' //Row started before Jan and ended after
OR
S1.VALIDFROM BETWEEN '01-JAN-2014' AND '31-JAN-2014' //Row started in Jan
OR
S1.VALIDTO BETWEEN '01-JAN-2014' AND '31-JAN-2014' //Row ended in Jan
)
GROUP BY S1.ORDER_ID
)
Also, what is the ValidTo date for a Closed record - as presumably this status is valid until the end of time?
When you say you want to report on "the sum of the net sales in during Januar 2014" is a sale only 'registered' when the order has a status of 'Closed' (or whatever status value you are using)? Presumably you have an equivalent of transaction date in your model so the aggregate would just be:
SELECT * FROM SALES WHERE STATUS = 'CLOSED' AND TRANSACTION_DATE BETWEEN '01-JAN-2014' AND '31-JAN-2014'?
If, instead, you are trying to pick up any record that existed in Jan 2014, and use the most recent record for each order in that month, then you'll probably have to use some form of subquery, a bit like this (though this is psuedo-code so don't expect to run it):
SELECT * FROM SALES WHERE SALES.VALIDFROM =
(SELECT MAX(S1.VALIDFROM) FROM SALES S1
WHERE SALES.ORDER_ID = S1.ORDER_ID
AND
(
S1.VALIDFROM <= '01-JAN-2014' AND S1.VALIDTO >= '01-JAN-2014' //Row started before Jan and ended after
OR
S1.VALIDFROM BETWEEN '01-JAN-2014' AND '31-JAN-2014' //Row started in Jan
OR
S1.VALIDTO BETWEEN '01-JAN-2014' AND '31-JAN-2014' //Row ended in Jan
)
GROUP BY S1.ORDER_ID
)
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Hi Nick,
thanks for your dedicated answer.
What we have is a fact table like a type II SCD, when a record arrives for the first time, let´s say a line of an order, the validFrom date is the datetime of the current load and the validTo is assigned to a dummy datetime (9999-31-12). When new information for this line of this order arrives, then the validTo is marked with the current timestamp and another record is inserted:
What I would like to do, is find a way to calculate the sum of the Net and Gross Sales value at any given point in time using MDX or to store the data in a different structure that better represent what I want to measure
thanks for your dedicated answer.
What we have is a fact table like a type II SCD, when a record arrives for the first time, let´s say a line of an order, the validFrom date is the datetime of the current load and the validTo is assigned to a dummy datetime (9999-31-12). When new information for this line of this order arrives, then the validTo is marked with the current timestamp and another record is inserted:
What I would like to do, is find a way to calculate the sum of the Net and Gross Sales value at any given point in time using MDX or to store the data in a different structure that better represent what I want to measure
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
OK - if you need to be able to answer the question "... at any given point in time..." then I don't think it is possible to improve on the design of the fact table as you currently have it, assuming by "point in time" you mean date/day? You would have to create a summary/aggregate record for every single day and while not knowing your data volumes and environment I doubt this would be a sensible way forward.
Assuming this fact table is indexed appropriately then queries like this should perform adequately:
SELECT SUM(GROSSSALESVALUE), SUM(NETSALESVALUE)
FROM FACT_TABLE
WHERE 'given_date' BETWEEN VALIDFROM AND VALIDTO
As you can only have one record per order line valid on any one date, you can't double count lines, as long as you are selecting information on a single 'given date'.
If, instead, you need to be able to query for a given time period that is greater than a day (e.g. show me the figures for January) then you will need to define with your business people what is meant by an "order line" in a month (i.e. what are all the scenarios that can affect an order line and for each of them how does that affect how the line's attributes for that month are defined). You can then use these rules to collapse each order line into a single record and then aggregate these single records into a monthly summary table. Bear in mind that implementing any process like this is likely to result in an aggregate table that answers a very specific question and won't be a generalised solution to many user requirements - though of course you could create multiple aggregates tables to answer multiple questions if necessary.
Assuming this fact table is indexed appropriately then queries like this should perform adequately:
SELECT SUM(GROSSSALESVALUE), SUM(NETSALESVALUE)
FROM FACT_TABLE
WHERE 'given_date' BETWEEN VALIDFROM AND VALIDTO
As you can only have one record per order line valid on any one date, you can't double count lines, as long as you are selecting information on a single 'given date'.
If, instead, you need to be able to query for a given time period that is greater than a day (e.g. show me the figures for January) then you will need to define with your business people what is meant by an "order line" in a month (i.e. what are all the scenarios that can affect an order line and for each of them how does that affect how the line's attributes for that month are defined). You can then use these rules to collapse each order line into a single record and then aggregate these single records into a monthly summary table. Bear in mind that implementing any process like this is likely to result in an aggregate table that answers a very specific question and won't be a generalised solution to many user requirements - though of course you could create multiple aggregates tables to answer multiple questions if necessary.
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Hi Nick,
thanks for your answer,
what I'm trying to do is actually translate this query to MDX,
At the moment my equivalent MDX query is as follows:
The Time dimension is a dummy dimension to let the user select a specific date.
The problem is due to the crossjoin the query takes about 2 minutes and that is not acceptable.
Kind Regards,
Paul
thanks for your answer,
what I'm trying to do is actually translate this query to MDX,
nick_white wrote:
Assuming this fact table is indexed appropriately then queries like this should perform adequately:
SELECT SUM(GROSSSALESVALUE), SUM(NETSALESVALUE)
FROM FACT_TABLE
WHERE 'given_date' BETWEEN VALIDFROM AND VALIDTO
At the moment my equivalent MDX query is as follows:
- Code:
with member [Measures].[OOH] as aggregate(
{null:linkmember([Time].[Date Id].CurrentMember,[Valid From].[Date Id])}
*
{linkmember([Time].[Date Id].CurrentMember,[Valid To].[Date Id]):NULL}
,[Measures].[OOH Units]
)
select [Measures].[OOH] on 0,
[Time].[Date Id].&[2579]: [Time].[Date Id].&[2581] on 1
The Time dimension is a dummy dimension to let the user select a specific date.
The problem is due to the crossjoin the query takes about 2 minutes and that is not acceptable.
Kind Regards,
Paul
Re: Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Sorry, I'm afraid I can't help you with this. I'm from an Oracle background and have never used MDX
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Similar topics
» Insurance - Policies and Riders Periodic Snapshot Fact Table Design Suggestion
» How to I recognize a fact table in a data warehouse
» FACT Design Question - How to design FACT Table when computation is required on the degenerate dimension?
» Variable period data in a single periodic snapshot fact table
» Accumulating Snapshot Fact Table Data Model (Order Management)
» How to I recognize a fact table in a data warehouse
» FACT Design Question - How to design FACT Table when computation is required on the degenerate dimension?
» Variable period data in a single periodic snapshot fact table
» Accumulating Snapshot Fact Table Data Model (Order Management)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum