Foreign Key Constraints in Data Mart Design
+2
BoxesAndLines
ljsong
6 posters
Page 1 of 1
Foreign Key Constraints in Data Mart Design
What's the best practice on foreign key constraints in your company's Data Mart design. For example do you build any foreign key from dimension table to fact table? do you usually build them but not enforce (enable) them in the database? Do you have different practices in development environment, test environment, and production environment?
ljsong- Posts : 1
Join date : 2011-06-15
Re: Foreign Key Constraints in Data Mart Design
I build them and enforce them in development environments. For test and production environments, I do not enforce. If the database supports the definition only foreign (e.g. SQL Server) and the modeling supports DDL generation (e.g. not ER Studio) then I generate the appropriate DDL. I have spent countless hours for clients building data models that are reverse engineered from the database. This is made more tedious since the FK constraints are not generated (I'm not recommending you do enable). When you reverse engineer these databases you end up with 100 boxes and no lines. And if you're going to build a data model, you need boxes and lines.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Foreign Key Constraints in Data Mart Design
If you are using surrogate primary keys, the process used to assign those keys ensures relational integrity. Enforcing FK constraints in the database is redundant and not necessary.
Enforce and capture exceptions
A data mart is a subset of a data store, therefore, a data mart is a subset of a fact table (or whatever design your data warehouse has) for a particular purpose. However, in your question you
define a data mart as a fact table with dimensions, which means it's not a subset but an entire fact table with dimensions. Ok, enough about what we define as a data mart.
Ask yourself the following question: do you want to be 100% sure your facts refer to a record in a dimension? If the answer is "yes", then you should enforce foreign keys. But your work doesn't stop there. On the contrary, it starts when a fact is referring to a non existent record in a dimension. You should think about a procedure to capture these "exceptions" and deal with them.
The advantage of enforcing foreign keys is that you can be absolutely sure you fact and dimensions have proper references. For reporting purposes this is important. Think about what would happen to facts that are not linked, do you see them in your report? Do you want to include them using outer joins in your queries? If so, realize it will be a burden for the performance of these reports.
Take into consideration that sometimes scripts are updating the facts and dimension for whatever purposes. Are you still sure that every fact is referring to a dimension?
I would recommend you enforce the foreign keys. But, you should develop a mechanism that keeps track of the facts that violate these constraints.
define a data mart as a fact table with dimensions, which means it's not a subset but an entire fact table with dimensions. Ok, enough about what we define as a data mart.
Ask yourself the following question: do you want to be 100% sure your facts refer to a record in a dimension? If the answer is "yes", then you should enforce foreign keys. But your work doesn't stop there. On the contrary, it starts when a fact is referring to a non existent record in a dimension. You should think about a procedure to capture these "exceptions" and deal with them.
The advantage of enforcing foreign keys is that you can be absolutely sure you fact and dimensions have proper references. For reporting purposes this is important. Think about what would happen to facts that are not linked, do you see them in your report? Do you want to include them using outer joins in your queries? If so, realize it will be a burden for the performance of these reports.
Take into consideration that sometimes scripts are updating the facts and dimension for whatever purposes. Are you still sure that every fact is referring to a dimension?
I would recommend you enforce the foreign keys. But, you should develop a mechanism that keeps track of the facts that violate these constraints.
Re: Foreign Key Constraints in Data Mart Design
AndreLinssen wrote:A data mart is a subset of a data store, therefore, a data mart is a subset of a fact table (or whatever design your data warehouse has) for a particular purpose...
I'm with you on the first part, but you lost me on the therefore part. A data mart would never be a subset of a fact table. It could be more than one fact and related dimensions based on a business process.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Foreign Key Constraints in Data Mart Design
It's a novel definition... personally I've given up on using 'data mart', it seems everyone has their own definition. Kimball originally used it to refer to a single star schema, Inmon used it to refer the result of a publication from the DW. It has gone downhill from there...
As far as enforcing RI with FK contraints (when performing surrogate key assignment), its a lot like wearing a belt and suspenders. If you are really that worried, maybe you shouldn't go outside.
As far as enforcing RI with FK contraints (when performing surrogate key assignment), its a lot like wearing a belt and suspenders. If you are really that worried, maybe you shouldn't go outside.
Re: Foreign Key Constraints in Data Mart Design
I like the analogy! Don’t overkill your data warehouse, as it will defeat its two major purposes, performance and ease of use.ngalemmo wrote:its a lot like wearing a belt and suspenders. If you are really that worried, maybe you shouldn't go outside.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Foreign Key Constraints in Data Mart Design
Having foreign key constraints can help with Star Query optimizers (at least in SQL Server). It can also help with documentation and with reporting tools that use the constraints to understand relationships.
Do not enforce them though - your surrogate key generation ensures that enforcing the constraint is redundant. It will also slow down your ETL load for no benefit.
Andre has a very weird definition of a star schema. That said, I attended a Bill Inmon seminar last year and he said that the definitions for most of the DW world have been taken over by vendors to suit their own purposes, hence his decision to copyright the term DW 2.0
Do not enforce them though - your surrogate key generation ensures that enforcing the constraint is redundant. It will also slow down your ETL load for no benefit.
Andre has a very weird definition of a star schema. That said, I attended a Bill Inmon seminar last year and he said that the definitions for most of the DW world have been taken over by vendors to suit their own purposes, hence his decision to copyright the term DW 2.0
Re: Foreign Key Constraints in Data Mart Design
John Simon wrote:...That said, I attended a Bill Inmon seminar last year and he said that the definitions for most of the DW world have been taken over by vendors to suit their own purposes, hence his decision to copyright the term DW 2.0
Right. Isn't Bill a vendor too?
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Foreign Key Constraints in Data Mart Design
He is, and regardless, he does have a point.
I mean, what the hell does ODS stand for these days? I get asked as a consultant to build an ODS when what they really mean is a 3NF database (i.e. an Inmon-style DataWarehouse). I asked Inmon about an ODS and he said it was for intra-day reporting and stored maybe 30 days worth of data and sits on a separate database for performance reasons, which makes sense to me. It's crazy how when a client gives me a specification I need to question them on any industry-specific terms, because those terms could mean anything these days.
I mean, what the hell does ODS stand for these days? I get asked as a consultant to build an ODS when what they really mean is a 3NF database (i.e. an Inmon-style DataWarehouse). I asked Inmon about an ODS and he said it was for intra-day reporting and stored maybe 30 days worth of data and sits on a separate database for performance reasons, which makes sense to me. It's crazy how when a client gives me a specification I need to question them on any industry-specific terms, because those terms could mean anything these days.
Similar topics
» Data mart design
» Data mart Design Question
» [b]Need Help on Employee Data Mart Design[/b]
» Data Mart Philosophical Design Differences
» Human Resources Data Mart Design Guidelines
» Data mart Design Question
» [b]Need Help on Employee Data Mart Design[/b]
» Data Mart Philosophical Design Differences
» Human Resources Data Mart Design Guidelines
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|