Extract to staging: use constraints or not?
2 posters
Page 1 of 1
Extract to staging: use constraints or not?
Hi -- I'm looking for best practices for this problem.
I'm building new ETL packages (MS SQL Server). The tables in the source system may have constraints (e.g. keys, indexes, "not null", etc.) Is the best practice to replicate those constraints in the staging area (the "E" of the ETL) or to keep things as generic as possible (e.g. no keys, indices or constraints like "not null")?
The dimensions I create will of course have keys and indexes. What about other constraints there? Keep 'em or dump 'em (that is, do the data quality checks in the ETL packages)?
I'm building new ETL packages (MS SQL Server). The tables in the source system may have constraints (e.g. keys, indexes, "not null", etc.) Is the best practice to replicate those constraints in the staging area (the "E" of the ETL) or to keep things as generic as possible (e.g. no keys, indices or constraints like "not null")?
The dimensions I create will of course have keys and indexes. What about other constraints there? Keep 'em or dump 'em (that is, do the data quality checks in the ETL packages)?
gbritton- Posts : 6
Join date : 2013-11-18
Re: Extract to staging: use constraints or not?
As far as staging is concerned its really up to you and the particular situation you are dealing with. Most of the time its just a location where tables are placed for (usually) sequential processing. Most of the time it doesn't make much sense to go overboard with constraints and indexes.
In the data warehouse itself it is common not to enforce primary and foreign key constraints because of the process of assigning surrogate keys, which ensures you have proper keys. Any database constraints are redundant and can significantly slow down load processes. Many databases will allow you to declare the constraints (which are often used by the optimizer and external BI tools) but deactivate them so they are not enforced.
In the data warehouse itself it is common not to enforce primary and foreign key constraints because of the process of assigning surrogate keys, which ensures you have proper keys. Any database constraints are redundant and can significantly slow down load processes. Many databases will allow you to declare the constraints (which are often used by the optimizer and external BI tools) but deactivate them so they are not enforced.
Similar topics
» Why extract?
» Extract Archive Process
» Question asked about ETL (little explanation needed)
» Foreign Key Constraints
» Constraints on Dimensions
» Extract Archive Process
» Question asked about ETL (little explanation needed)
» Foreign Key Constraints
» Constraints on Dimensions
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum