Adding a new Dimension to the Star-Schema
4 posters
Page 1 of 1
Adding a new Dimension to the Star-Schema
Dear experts,
I’m currently handling with a for me not-solvable problem.
I want to add/implement a new Dimension to my star-schema.
The fact table is already connected throw surrogate keys to several dimensions. A few of the dimension tables are implemented as scd-2 dimensions.
To add a new dimension means for me to change the DDL-structure of the fact table (a new FK-Column) and to connect the new dimension table to the fact table and load the fact table in full-mode (Initial-mode). By this, I would lost the history of the scd-2 dimensions.
Is there any ETL-solution for implementing changes of star-schema in a best way?
Any advice is welcome.
Thanks and best regards,
Abgoosht
I’m currently handling with a for me not-solvable problem.
I want to add/implement a new Dimension to my star-schema.
The fact table is already connected throw surrogate keys to several dimensions. A few of the dimension tables are implemented as scd-2 dimensions.
To add a new dimension means for me to change the DDL-structure of the fact table (a new FK-Column) and to connect the new dimension table to the fact table and load the fact table in full-mode (Initial-mode). By this, I would lost the history of the scd-2 dimensions.
Is there any ETL-solution for implementing changes of star-schema in a best way?
Any advice is welcome.
Thanks and best regards,
Abgoosht
abgoosht- Posts : 5
Join date : 2012-01-24
Re: Adding a new Dimension to the Star-Schema
Adding dimensions to an existing fact can be a difficult undertaking depending on the reasons the dimension is added.
The simplest is you are introducing a new source that has additional attributes not available from other sources. It's easy because there is no expectation or ablilty to historically assign that dimension to the old facts... they just wind up pointing to a default "not applicable" dimension row.
If it was a dimension that should have been part of the model in the first place, but omitted in the original design, then the usual expectation is to assign the dimension to past facts, which can be very difficult to do. Re-loading the fact table is a last resort, and filled with potential problems, particularly when you have type 2 dimensions. You should attempt to populate the fact FK's using a non-disruptive update process as far back as you can. Rows you cannot update should reference an "unknown" row in the dimension table. Not a simple task.
The simplest is you are introducing a new source that has additional attributes not available from other sources. It's easy because there is no expectation or ablilty to historically assign that dimension to the old facts... they just wind up pointing to a default "not applicable" dimension row.
If it was a dimension that should have been part of the model in the first place, but omitted in the original design, then the usual expectation is to assign the dimension to past facts, which can be very difficult to do. Re-loading the fact table is a last resort, and filled with potential problems, particularly when you have type 2 dimensions. You should attempt to populate the fact FK's using a non-disruptive update process as far back as you can. Rows you cannot update should reference an "unknown" row in the dimension table. Not a simple task.
Re: Adding a new Dimension to the Star-Schema
Think about how you'll be working out the FK to the dim during your normal incremental loading, then try to build a SQL update query to achieve the same thing as a one-off backfill exercise when you release the new dimension.
You'll probably have something in your fact table or one of the existing dimensions to which it is linked that will enable to join to the source table where the business key for the dimension lies, and once you know that you can look up the FK.
e.g. If you were adding salesperson to an orders fact, you'd hopefully have an existing degenerate dimension for some sort of order_reference that would exist in the source table. You could do something like the following to get to the correct FK (I'm making an assumption that you'd have a zero key to represent 'Unknown'):
UPDATE fact_orders SET dim_salesperson_key = coalesce(dim_salesperson.dim_salesperson_key, 0)
FROM fact_orders
LEFT JOIN orders_source_table ON fact_orders.order_reference = orders_source.order_reference
LEFT JOIN dim_salesperson ON orders_source_table.salesperson_business_key = dim_salesperson.salesperson_business_key
It might not be quite that simple and it might be you have to join through a few tables before you can get to the business key of the dimension, but hopefully you get the gist.
Depending on the size of your fact table you might need to update the FK's in batches to avoid excessive log growth etc. and you might also want to consider dropping / rebuilding indexes on the fact either side of the update.
You'll probably have something in your fact table or one of the existing dimensions to which it is linked that will enable to join to the source table where the business key for the dimension lies, and once you know that you can look up the FK.
e.g. If you were adding salesperson to an orders fact, you'd hopefully have an existing degenerate dimension for some sort of order_reference that would exist in the source table. You could do something like the following to get to the correct FK (I'm making an assumption that you'd have a zero key to represent 'Unknown'):
UPDATE fact_orders SET dim_salesperson_key = coalesce(dim_salesperson.dim_salesperson_key, 0)
FROM fact_orders
LEFT JOIN orders_source_table ON fact_orders.order_reference = orders_source.order_reference
LEFT JOIN dim_salesperson ON orders_source_table.salesperson_business_key = dim_salesperson.salesperson_business_key
It might not be quite that simple and it might be you have to join through a few tables before you can get to the business key of the dimension, but hopefully you get the gist.
Depending on the size of your fact table you might need to update the FK's in batches to avoid excessive log growth etc. and you might also want to consider dropping / rebuilding indexes on the fact either side of the update.
PeteGrace- Posts : 7
Join date : 2011-09-01
Re: Adding a new Dimension to the Star-Schema
Thank You very much for your advices. I see, structural changes are not trivial to handle and there is no "best practice" existing for this kind of expandabilities. So i have to find a solution for my case.
Thanks and best regards,
Abgoosht
Thanks and best regards,
Abgoosht
abgoosht- Posts : 5
Join date : 2012-01-24
Re: Adding a new Dimension to the Star-Schema
Best practice is try to avoid the situation by including all the dimensions when you build the fact table initially. Unfortunately most project managers don't understand that.
Re: Adding a new Dimension to the Star-Schema
It is trivial if you don't need to capture history for the new dimension. Add the FK, update ETL, you're done. I don't understand your point on losing history on my type 2's when adding a new dimension. You're doing something wrong if you do.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Fact 1:N Dimension with Star Schema
» How to Handle Updates to Dimension Attributes (SCD 1) that have a Cardinality > 1 (Context is Data Warehouse Star Schema)
» Snowflake or Star Schema?
» Star Schema vs All in one table
» star schema designing
» How to Handle Updates to Dimension Attributes (SCD 1) that have a Cardinality > 1 (Context is Data Warehouse Star Schema)
» Snowflake or Star Schema?
» Star Schema vs All in one table
» star schema designing
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|