Rebuilding Dimension, effect on fact table
4 posters
Page 1 of 1
Rebuilding Dimension, effect on fact table
Hi,
Firstly please excuse me being new to Data Warehousing, my terminologies are maybe not 100%!
I have built a data warehouse mostly from self taught forums, questions, books, etc. I started with my data modelling. I built a reasonable environment (in my opinion) but after it was put live, i started to receive many request for additional attributes within dimensions. I am struggling to know how to handle this in my live environment. What I currently do is add the atributes (or additional snowflaked dimensions) to my model, then i rebuild the dimension, after which my primary keys (which i auto generate) change. This forces me to perform a full drop and reload of all my facts that are linked to these affected dimensions. The reason i have to do this is becuase when i perform my lookups for keys on the dimensions, i drop the business key from my flow and don't store it in my fact table.
One idea i had was to keep the business keys in the fact tables next to the joined keys but suppress them in a view in my source DB. This way i could run a series of update scripts to refresh the keys. Is this approach common? Is there a better way? Am I looking at this totally the wrong way?
I am not sure if this is the right topic to post into, so please move it if necessary.
Thanks in advance,
Dom
Firstly please excuse me being new to Data Warehousing, my terminologies are maybe not 100%!
I have built a data warehouse mostly from self taught forums, questions, books, etc. I started with my data modelling. I built a reasonable environment (in my opinion) but after it was put live, i started to receive many request for additional attributes within dimensions. I am struggling to know how to handle this in my live environment. What I currently do is add the atributes (or additional snowflaked dimensions) to my model, then i rebuild the dimension, after which my primary keys (which i auto generate) change. This forces me to perform a full drop and reload of all my facts that are linked to these affected dimensions. The reason i have to do this is becuase when i perform my lookups for keys on the dimensions, i drop the business key from my flow and don't store it in my fact table.
One idea i had was to keep the business keys in the fact tables next to the joined keys but suppress them in a view in my source DB. This way i could run a series of update scripts to refresh the keys. Is this approach common? Is there a better way? Am I looking at this totally the wrong way?
I am not sure if this is the right topic to post into, so please move it if necessary.
Thanks in advance,
Dom
DomCotton- Posts : 5
Join date : 2013-10-01
Re: Rebuilding Dimension, effect on fact table
Dimension surrogate primary keys should never change. A process that requires fact table keys to be reassigned is not sustainable.
You need to rethink how you maintain dimension tables.
You need to rethink how you maintain dimension tables.
Re: Rebuilding Dimension, effect on fact table
Hi,
Thanks for responding. Is anybody able to direct me to a "best practice" or some other examples of how to achieve what i need to?
My understanding is that as my business key is alphanumeric i generate a surrogate using an SQL Server identity column - this is my problem as i cannot guarantee the sequence that is generated each time i have to drop and reload.
So... is it the dropping and reloading that is causing me the problem? or should i generate a surrogate key a different way ensuring it gets the same key after any necessary rebuild?
Hoping somebody can direct me
Thanks,
Dom
Thanks for responding. Is anybody able to direct me to a "best practice" or some other examples of how to achieve what i need to?
My understanding is that as my business key is alphanumeric i generate a surrogate using an SQL Server identity column - this is my problem as i cannot guarantee the sequence that is generated each time i have to drop and reload.
So... is it the dropping and reloading that is causing me the problem? or should i generate a surrogate key a different way ensuring it gets the same key after any necessary rebuild?
Hoping somebody can direct me
Thanks,
Dom
DomCotton- Posts : 5
Join date : 2013-10-01
Re: Rebuilding Dimension, effect on fact table
ETL Toolkit. Read the whole book. It's filled with answers to these types of questions.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Rebuilding Dimension, effect on fact table
And yes, the dropping and rebuilding is causing the problem.
Re: Rebuilding Dimension, effect on fact table
Thanks, i will check out that book.
I have been thinking about this and had a question, maybe its dumb but i will ask it anyway
Would it work do you think to have an intermediate dimension table that links to my fact that has a surrogate key, business key, and then a dimension key - so something like this:
Fact______Intermediate Dimension______Real Dimension
.
pkDim---->pkDim
. businessKey
. pkRealDim------------------>pkRealDim
. businessKey
. Attribute 1
. Attribute 2
. etc
In my mind this means that I can drop and reload the "real" dimension when i need to add new attributes, etc but it leaves the pkDim link from fact to Intermediate Dimension intact?
Thanks,
Dom
I have been thinking about this and had a question, maybe its dumb but i will ask it anyway
Would it work do you think to have an intermediate dimension table that links to my fact that has a surrogate key, business key, and then a dimension key - so something like this:
Fact______Intermediate Dimension______Real Dimension
.
pkDim---->pkDim
. businessKey
. pkRealDim------------------>pkRealDim
. businessKey
. Attribute 1
. Attribute 2
. etc
In my mind this means that I can drop and reload the "real" dimension when i need to add new attributes, etc but it leaves the pkDim link from fact to Intermediate Dimension intact?
Thanks,
Dom
DomCotton- Posts : 5
Join date : 2013-10-01
Re: Rebuilding Dimension, effect on fact table
No. Your primary objective should be to allow queries to be as simple as possible. The approach goes in the wrong direction. Why not simply do a proper update process for your dimensions?
Re: Rebuilding Dimension, effect on fact table
Hi,
I thought that in my example the intelligence would be hidden within the DSV and the Dimension would present the same attributes as it does now, with the only difference being that the fact table would always hold the same primary key? Anyway was just a thought...
Regarding the proper update process for my dimensions... sort of what i'm looking for as I don't know how to achieve that.
I guess I will have to buy that book and get reading!
Thanks for everybody's time.
I thought that in my example the intelligence would be hidden within the DSV and the Dimension would present the same attributes as it does now, with the only difference being that the fact table would always hold the same primary key? Anyway was just a thought...
Regarding the proper update process for my dimensions... sort of what i'm looking for as I don't know how to achieve that.
I guess I will have to buy that book and get reading!
Thanks for everybody's time.
DomCotton- Posts : 5
Join date : 2013-10-01
Re: Rebuilding Dimension, effect on fact table
A dimension table has two keys: the surrogate primary key and the business key. The latter serves as an alternate key and is used when applying updates. So... with the business key from your source, update columns in the dimension table using that key. If a row with that key does not exist, create a new one.
Use ALTER TABLE to add new columns, and modify your update process to accommodate them. As you are using SQL Server, I believe SSIS has wizards to implement dimension table update processes.
Use ALTER TABLE to add new columns, and modify your update process to accommodate them. As you are using SQL Server, I believe SSIS has wizards to implement dimension table update processes.
Re: Rebuilding Dimension, effect on fact table
Hi,
So one last question, i have a couple of larger dimensions where i am using CDC (my source has a date/time last updated). So in my dimension build i only take the newest rows to update each time its run. So if i want to add new fields, i have to fully read the source table again. This is sort of where i am getting confused.
Would it be best to write a specific update process script when adding new fields/attributes, i.e. alter table, read everything from source system (regardless of last updated date), then update rows with new information. This would preserve the existing key on the fact table?
Many thanks again,
Dom
So one last question, i have a couple of larger dimensions where i am using CDC (my source has a date/time last updated). So in my dimension build i only take the newest rows to update each time its run. So if i want to add new fields, i have to fully read the source table again. This is sort of where i am getting confused.
Would it be best to write a specific update process script when adding new fields/attributes, i.e. alter table, read everything from source system (regardless of last updated date), then update rows with new information. This would preserve the existing key on the fact table?
Many thanks again,
Dom
DomCotton- Posts : 5
Join date : 2013-10-01
Re: Rebuilding Dimension, effect on fact table
You shouldn't add columns in a piecemeal fashion. It's really inefficient. Get everything that makes sense the first time through.
If you have an existing, proper, update process, when you add a new column you modify that code to maintain those columns. There is no need for 'special' code. Just extract everything and run it through... its a one-time hit.
If you have an existing, proper, update process, when you add a new column you modify that code to maintain those columns. There is no need for 'special' code. Just extract everything and run it through... its a one-time hit.
Re: Rebuilding Dimension, effect on fact table
Your primary objective should be to allow queries to be as simple as possible. The approach goes in the wrong direction. Why not simply do a proper update process for your dimensions???
_____________
Get free demos for Testking istqb certification exam and mcts exam nclex questions with guaranteed success. Our best quality prepares you well before appearing in the final exams of rasmussen &mcts www.loyola.edu
_____________
Get free demos for Testking istqb certification exam and mcts exam nclex questions with guaranteed success. Our best quality prepares you well before appearing in the final exams of rasmussen &mcts www.loyola.edu
skinpenthar- Posts : 1
Join date : 2015-02-26
Similar topics
» joining dimension table to dimension and again fact table
» Joining Type II Dimension Hierarchy directly to Fact versus through a Dimension table
» Large Dimension table compared to fact table?
» Conformed Dimension for Transaction Fact and Accumulating Snapshot Fact Table
» Joining Type II Dimension Hierarchy directly to Fact versus through a Dimension table
» Large Dimension table compared to fact table?
» Conformed Dimension for Transaction Fact and Accumulating Snapshot Fact Table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum