Surrogate Key in Stage table
4 posters
Page 1 of 1
Surrogate Key in Stage table
Is it good idea to have Surrogate Key in stage table ? if yes then obviously why?
dwh.arvind- Posts : 4
Join date : 2012-10-25
Re: Surrogate Key in Stage table
I don't think the question is "why would you add surrogates to the staging tables". I think the more appropriate question is why wouldn't you?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Surrogate Key in Stage table
Do you mean to assign surrogate key values to stating table rows, or do you mean should natural keys be converted to surrogates as part of the staging process?
Generally, no to the first and yes to the latter.
Generally, no to the first and yes to the latter.
Re: Surrogate Key in Stage table
The surrogate key is a meaningless key but still it is good to have with the many reason; as may times the data coming from a source has no unique identifier or sometimes the unique identifier is a composite key; in such cases when data issue is found with any of the row it is very difficult to identify the particular row or even mention it. When a unique row number is assigned to each row in the staging table it becomes really easy to reference it.
dwh.arvind- Posts : 4
Join date : 2012-10-25
Re: Surrogate Key in Stage table
Surrogate keys are necessary if you are wanting to use delta loading (where only updated records are loaded into the warehouse when the load is performed) and/or slowly changing dimensions.
Pretend that in our HR system we have an Employee table, and in our data warehouse staging database we have another table that matches it.
HR.dbo.Employee (EmployeeId INT PK, Name NVARCHAR(128), UpdatedDate DATETIME)
DWStaging.HR.Employee (EmployeeId INT PK, Name NVARCHAR(128), UpdatedDate DATETIME)
The first time we load data from HR.dbo.Employee into DWStaging.HR.Employee, all records should come through. The second time we load data, only records in HR.dbo.Employee that have been updated since the last load should come through (because we're doing delta loading). The problem is that you could potentially updated the same record in HR.dbo.Employee every day, which would result in the new version of the record being loaded into DWStaging.HR.Employee (because we don't delete data from DWStaging.HR.Employee). This is a problem if we want to make EmployeeId the primary key in DWStaging.HR.Employee because it could potentially not be unique (because we have multiple versions of the same record because the record is updated in HR.dbo.Employee).
What you can do is update the staging table to have a surrogate key:
DWStaging.HR.Employee (ImportKey INT PK, EmployeeId INT, Name NVARCHAR(128), UpdatedDate DATETIME)
ImportKey is now our auto-incrementing primary key in DWStaging,HR.Employee. We can use the UpdatedDate field to check which version of the record was the one that was 'active' at a particular point in time.
Hope this helps.
Pretend that in our HR system we have an Employee table, and in our data warehouse staging database we have another table that matches it.
HR.dbo.Employee (EmployeeId INT PK, Name NVARCHAR(128), UpdatedDate DATETIME)
DWStaging.HR.Employee (EmployeeId INT PK, Name NVARCHAR(128), UpdatedDate DATETIME)
The first time we load data from HR.dbo.Employee into DWStaging.HR.Employee, all records should come through. The second time we load data, only records in HR.dbo.Employee that have been updated since the last load should come through (because we're doing delta loading). The problem is that you could potentially updated the same record in HR.dbo.Employee every day, which would result in the new version of the record being loaded into DWStaging.HR.Employee (because we don't delete data from DWStaging.HR.Employee). This is a problem if we want to make EmployeeId the primary key in DWStaging.HR.Employee because it could potentially not be unique (because we have multiple versions of the same record because the record is updated in HR.dbo.Employee).
What you can do is update the staging table to have a surrogate key:
DWStaging.HR.Employee (ImportKey INT PK, EmployeeId INT, Name NVARCHAR(128), UpdatedDate DATETIME)
ImportKey is now our auto-incrementing primary key in DWStaging,HR.Employee. We can use the UpdatedDate field to check which version of the record was the one that was 'active' at a particular point in time.
Hope this helps.
min.emerg- Posts : 39
Join date : 2011-02-25
Similar topics
» Fact surrogate key as foreign key in another fact table
» Stage vs PSA Naming Conventions
» Surrogate keys in dimension and fact table
» Type 2 Fact table - Durable surrogate key?
» Group & Bridge tables ... and a Surrogate Key on the Fact table?
» Stage vs PSA Naming Conventions
» Surrogate keys in dimension and fact table
» Type 2 Fact table - Durable surrogate key?
» Group & Bridge tables ... and a Surrogate Key on the Fact table?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum