Deriving Dimension attributes from Fact table
3 posters
Page 1 of 1
Deriving Dimension attributes from Fact table
Hi,
I have a Customer_Activity fact table which stores the the Activity Start & End Date values. I need to derive the last time the customer had an activity before the current date (based on the Activity End Date - MAX(Activity End Date) WHERE Activity End Date < SYSDATE) and the next planned activity for the customer (based on the Activity Start Date - MIN(Activity Start Date) WHERE Activity Start Date >= SYSDATE) - and the corresponding Activity Types for those dates. Since these dates are stored in the fact table itself, I would have to do an in-line query to look up the same fact table to derive the Last/next Activity Dates grouped on the Customer ID on the Fact table.
If we do it at the reporting end, then it will be an expensive query and will take more time to bring in the resultsets; so the idea was to derive it at the ETL layer.
The above derivation would fetch the last/next Activity Dates and their tyopes along with the Customer ID. We could either store in a new table and join that table at the data mart layer to the main Customer dimension or actually add a further derivation to join it at the DW-to-Data Mart transformation layer and store these attributes in the main customer dimension itself.
Is it correct to derive these Last/Next Customer Activity DAtes and Types from the fact table and store THEM in the Customer dimension during the warehouse to data mart transformation stage?
ANy alternate solution for this?
Thanks & Regards
I have a Customer_Activity fact table which stores the the Activity Start & End Date values. I need to derive the last time the customer had an activity before the current date (based on the Activity End Date - MAX(Activity End Date) WHERE Activity End Date < SYSDATE) and the next planned activity for the customer (based on the Activity Start Date - MIN(Activity Start Date) WHERE Activity Start Date >= SYSDATE) - and the corresponding Activity Types for those dates. Since these dates are stored in the fact table itself, I would have to do an in-line query to look up the same fact table to derive the Last/next Activity Dates grouped on the Customer ID on the Fact table.
If we do it at the reporting end, then it will be an expensive query and will take more time to bring in the resultsets; so the idea was to derive it at the ETL layer.
The above derivation would fetch the last/next Activity Dates and their tyopes along with the Customer ID. We could either store in a new table and join that table at the data mart layer to the main Customer dimension or actually add a further derivation to join it at the DW-to-Data Mart transformation layer and store these attributes in the main customer dimension itself.
Is it correct to derive these Last/Next Customer Activity DAtes and Types from the fact table and store THEM in the Customer dimension during the warehouse to data mart transformation stage?
ANy alternate solution for this?
Thanks & Regards
dwbi_rb- Posts : 17
Join date : 2009-02-19
Re: Deriving Dimension attributes from Fact table
I think it's OK to store that information on the dimension table.
I recall one place I worked where the definition of an "active" account was based on the number of days since it was last used along with the balance on the account. We were loading 150 million transactions a month. It was easiest to assign the "Last Activity Date" on the account dimension.
I recall one place I worked where the definition of an "active" account was based on the number of days since it was last used along with the balance on the account. We were loading 150 million transactions a month. It was easiest to assign the "Last Activity Date" on the account dimension.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Deriving Dimension attributes from Fact table
Its fine as long as the customer dimension is not type 2.... or if it is, you treat last activity date as a type 1 attribute in a type 2 dimension (i.e. update all versions of customer).
Deriving Dimension attributes from Fact table
Thanks a lot for confirming on this!
The in-line query will be an expensive one as it would join the fact table itself. More so, since we would have to find the "Activity Type" based on the MAX/MIN date (logic as explained in my previous post) from the same fact table.
A discussion within the team was that the time taken to derive this would increase the ETL refresh time and whether this can be derived at the reporting layer. I feel that though at the reporting layer, it may return the intial partial results in a lesser time, it would actually be a very expensive query and would involve more response time for the end user. Based on my experience, I strongly believe such derivations should be done at the ETL layer so that the reporting response time is the least for the end user. It would be great if I could get your comments on this as well!
Secondly, the query to derive this is an expensive one - The structure of the query that would be required to derive this logic is given below. Please suggest if there is a better alternative. By concatenating the date and the activity type, I thouth of reducing the further joins to the fact table that would have been required if we wanted to derive the 4 values individually.
SELECT fact.customer_id
, (SELECT t1.activity_end_dt || '*' || t1.activity_type
FROM fact_table t1
WHERE t1.customer_id = fact.customer_id
AND t1. activity_end_dt = ( SELECT MAX(v1.activity_end_dt)
FROM fact_table v1
WHERE v1.activity_end_dt < sysdate
AND v1.customer_id = t1.customer_id
)
) last_con_dt_typ
, (SELECT t1.activity_start_dt || '*' || t1.activity_type
FROM fact_table t1
WHERE t1.customer_id = fact.customer_id
AND t1. activity_start_dt = ( SELECT MIN(v1.activity_start_dt)
FROM fact_table v1
WHERE v1.activity_end_dt >= sysdate
AND v1.customer_id = t1.customer_id
)
) next_con_dt_typ
from fact_table fact ;
The in-line query will be an expensive one as it would join the fact table itself. More so, since we would have to find the "Activity Type" based on the MAX/MIN date (logic as explained in my previous post) from the same fact table.
A discussion within the team was that the time taken to derive this would increase the ETL refresh time and whether this can be derived at the reporting layer. I feel that though at the reporting layer, it may return the intial partial results in a lesser time, it would actually be a very expensive query and would involve more response time for the end user. Based on my experience, I strongly believe such derivations should be done at the ETL layer so that the reporting response time is the least for the end user. It would be great if I could get your comments on this as well!
Secondly, the query to derive this is an expensive one - The structure of the query that would be required to derive this logic is given below. Please suggest if there is a better alternative. By concatenating the date and the activity type, I thouth of reducing the further joins to the fact table that would have been required if we wanted to derive the 4 values individually.
SELECT fact.customer_id
, (SELECT t1.activity_end_dt || '*' || t1.activity_type
FROM fact_table t1
WHERE t1.customer_id = fact.customer_id
AND t1. activity_end_dt = ( SELECT MAX(v1.activity_end_dt)
FROM fact_table v1
WHERE v1.activity_end_dt < sysdate
AND v1.customer_id = t1.customer_id
)
) last_con_dt_typ
, (SELECT t1.activity_start_dt || '*' || t1.activity_type
FROM fact_table t1
WHERE t1.customer_id = fact.customer_id
AND t1. activity_start_dt = ( SELECT MIN(v1.activity_start_dt)
FROM fact_table v1
WHERE v1.activity_end_dt >= sysdate
AND v1.customer_id = t1.customer_id
)
) next_con_dt_typ
from fact_table fact ;
Last edited by dwbi_rb on Thu Oct 14, 2010 9:17 am; edited 1 time in total (Reason for editing : trying to make the SQL query more readable with further indentations - but I doubt if it would work as the tabs did not either.)
dwbi_rb- Posts : 17
Join date : 2009-02-19
Re: Deriving Dimension attributes from Fact table
If its a choice between a little more complex ETL process versus a more complex and slower query, always choose changing the ETL. Its a one-time hit when nothing of importance is going on versus a degraded user experience.
Similar topics
» How to Model Fact table having 1:1 relationship with key Dimension attributes
» dimension table design question for around 100 attributes and higher level calculated attributes
» Dimension Attributes and Fact attributes storing same data in multiple data marts??
» YTD attributes in fact table
» dimension table design question for around 100 attributes and higher level calculated attributes
» Dimension Attributes and Fact attributes storing same data in multiple data marts??
» YTD attributes in fact table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum