Loading Fact Table
3 posters
Page 1 of 1
Loading Fact Table
In reading a previous thread, I understand that when a Dimension is changed by a SCD type 2, a new record is created for the new key and no update is needed on the previously existing facts.
However, what happens in the loading of the fact table that lets the process know that it has to load a fact again?
Does this happen since the combination of keys has changed and it is recognized as a new row?
Should a fact table load check for existing rows, if so, how?
-- RPC
However, what happens in the loading of the fact table that lets the process know that it has to load a fact again?
Does this happen since the combination of keys has changed and it is recognized as a new row?
Should a fact table load check for existing rows, if so, how?
-- RPC
Re: Loading Fact Table
rpcasey001 wrote:However, what happens in the loading of the fact table that lets the process know that it has to load a fact again?
-- RPC
I don't understand the question. Usually, new facts are assigned to the new version of the type 2 dimension row. Are you referring to a retroactive dimension change?
Re: Loading Fact Table
Yes, a retroactive change.
Suppose there is a change to the dimension. When the change occurs, the old record is marked not current, but there are still facts associated with the old key. The new key for the dimension is created with the new record that has the updated information. However, there are facts that exist with the previous key. In another thread, it was explained that we do not go back and update the records in the fact table with the new dimension key since a new record will be created in the fact with the new key.
My question is in regards to the new fact records. What in the load process indicates that the fact needs to be loaded again with the new key? Is this because the fact load process needs to check what records have already have been loaded?
--- RPC
Suppose there is a change to the dimension. When the change occurs, the old record is marked not current, but there are still facts associated with the old key. The new key for the dimension is created with the new record that has the updated information. However, there are facts that exist with the previous key. In another thread, it was explained that we do not go back and update the records in the fact table with the new dimension key since a new record will be created in the fact with the new key.
My question is in regards to the new fact records. What in the load process indicates that the fact needs to be loaded again with the new key? Is this because the fact load process needs to check what records have already have been loaded?
--- RPC
Re: Loading Fact Table
Nothing during the load process indicates that the fact table needs to do anything related to a dimension row change. When the new fact comes in, it is associated with the new dimension row. That is the beauty of the dimensional model. It captures history exactly as it happened at that point in time. If a dimension changes subsequently it has absolutely no impact on any events (i.e. facts) that previously occurred.rpcasey001 wrote:Yes, a retroactive change.
...
My question is in regards to the new fact records. What in the load process indicates that the fact needs to be loaded again with the new key? Is this because the fact load process needs to check what records have already have been loaded?
--- RPC
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Loading Fact Table
As B&L pointed out, the purpose of a type 2 dimension is to capture the dimension state at a point in time. That row is associated with fact that occured while the dimension entity was in that state. As the entity changes, the new row is assigned to subsequent facts...old fact retain their reference to old versions of the entity.
With that understood... reporting facts in relation to the current state of the entity is a real pain if all you have is a type 2 dimension. Even when the business demands retention of such dimension history, in practice, most reporting (I have seen) is based on current state. What I usually do is, if a type 2 is called for, implement both a type 1 and a type 2 version of the same dimension and carry both foreign keys on all facts that use it.
With that understood... reporting facts in relation to the current state of the entity is a real pain if all you have is a type 2 dimension. Even when the business demands retention of such dimension history, in practice, most reporting (I have seen) is based on current state. What I usually do is, if a type 2 is called for, implement both a type 1 and a type 2 version of the same dimension and carry both foreign keys on all facts that use it.
Loading Fact Table
Let me explain the business case more...
I am loading the fact table with both commercial and residential mortage loan data. In many cases, when the fact is loaded, only part of the information for the dimension is available since acquisition data from the acquisition department often arrives after the accounting data is available. The fact is generated from the accounting data, and dimension is created from the accounting data without the additional data that is provided from the acquisition data. When the acqusition data arrives, the dimension is updated as a slowly changing dimension type II. So, in this case, the fact that was loaded the first time, is still valid as it was originally loaded, but it now points to a key that is marked False for RowIsCurrent. As you can see in this business case, there is no new fact, just a dimension that was slowly arriving. What is the best method to keep the fact updated with the newest key?
Regards,
RPC
I am loading the fact table with both commercial and residential mortage loan data. In many cases, when the fact is loaded, only part of the information for the dimension is available since acquisition data from the acquisition department often arrives after the accounting data is available. The fact is generated from the accounting data, and dimension is created from the accounting data without the additional data that is provided from the acquisition data. When the acqusition data arrives, the dimension is updated as a slowly changing dimension type II. So, in this case, the fact that was loaded the first time, is still valid as it was originally loaded, but it now points to a key that is marked False for RowIsCurrent. As you can see in this business case, there is no new fact, just a dimension that was slowly arriving. What is the best method to keep the fact updated with the newest key?
Regards,
RPC
Re: Loading Fact Table
If it is always the case that the fact should reflect the most current data, use a type 1 dimension. If you must also capture dimensional history, use both type 1 and type 2.
Another approach is to control the conditions under which a new type 2 row is created. Type 2 doesn't mean you should always create a new row, how and when new rows are created can be subject to business rules. If a dimension is fed from different sources with different data, you could put flags (and/or CRC values for change detection) for each source. If data is received from a source that has not yet been encountered for the row, you can implement logic that updates in place rather than creating a new row.
It is also conceivable that history is only needed on certain fields. The dimension table could be a combination of type 1 and type 2 columns. A new row is created only when the type 2 columns change. Type 1 columns are updated in all rows with a matching natural key (current or not).
Finally, and worse case, you go back and update foreign keys on the fact table. This can be real messy and difficult to indentify when it should occur.
Another approach is to control the conditions under which a new type 2 row is created. Type 2 doesn't mean you should always create a new row, how and when new rows are created can be subject to business rules. If a dimension is fed from different sources with different data, you could put flags (and/or CRC values for change detection) for each source. If data is received from a source that has not yet been encountered for the row, you can implement logic that updates in place rather than creating a new row.
It is also conceivable that history is only needed on certain fields. The dimension table could be a combination of type 1 and type 2 columns. A new row is created only when the type 2 columns change. Type 1 columns are updated in all rows with a matching natural key (current or not).
Finally, and worse case, you go back and update foreign keys on the fact table. This can be real messy and difficult to indentify when it should occur.
Similar topics
» Loading Fact table
» Loading Data Aggregated to Date into Fact Table
» Update and deletes in incremental loading of the fact table
» Loading data into fact table
» ETL Question for Loading a Fact table
» Loading Data Aggregated to Date into Fact Table
» Update and deletes in incremental loading of the fact table
» Loading data into fact table
» ETL Question for Loading a Fact table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum