Normalization in DWH environment
4 posters
Page 1 of 1
Normalization in DWH environment
Hi All,
Recently I came across a question,would like to post it here for the views of the members.
What normalization form are the Fact and dimension tables and why (with examples aligning to the Normalization definition)
Thanks,
Dheeraj
Recently I came across a question,would like to post it here for the views of the members.
What normalization form are the Fact and dimension tables and why (with examples aligning to the Normalization definition)
Thanks,
Dheeraj
kapoor_dh- Posts : 24
Join date : 2009-12-08
Re: Normalization in DWH environment
A fact table is 3nf while a dimension table is not (usually 2nf, sometimes 1nf, but rarely 3nf).
Other than an academic discussion, it really doesn't matter.
Other than an academic discussion, it really doesn't matter.
Re: Normalization in DWH environment
Can you provide any example with defnitions applied to them...or provide me the link which talks about the fact and dimensions and the mormalization
kapoor_dh- Posts : 24
Join date : 2009-12-08
Re: Normalization in DWH environment
Codd & Date covered the definitions of normal forms a long time ago. I'm sure the same is available on Wiki...
Re: Normalization in DWH environment
Most of them are on the typical transaction tables...none of them so far I have seen on the Fact/Dimensions (typical star schema),do let me know if you find else leave that to me ...;)
kapoor_dh- Posts : 24
Join date : 2009-12-08
-normalizaion in dw
hi
basically with respect to data organized in Datawarehousing
its not fully normalized boss...i think so,
if you look at the database design in OLTP application , its fully normalized
basically in the datawarehouse we bother about how much data we are storing,
we store the same data some times in morethan one place as it is against in OLTP design
becase we need it for analization,
the primary goal we need it for analisations,we dont bother how redundantly store the data.
it doesn't mean that its completely denoramized, it is normalised but not fully.ok
may be up to 3rd NF
better you go through with database designing techniques by CODD
but to me its not possible to fully normalize the data in DWH
thanx
jagadish
basically with respect to data organized in Datawarehousing
its not fully normalized boss...i think so,
if you look at the database design in OLTP application , its fully normalized
basically in the datawarehouse we bother about how much data we are storing,
we store the same data some times in morethan one place as it is against in OLTP design
becase we need it for analization,
the primary goal we need it for analisations,we dont bother how redundantly store the data.
it doesn't mean that its completely denoramized, it is normalised but not fully.ok
may be up to 3rd NF
better you go through with database designing techniques by CODD
but to me its not possible to fully normalize the data in DWH
thanx
jagadish
jagadish- Posts : 2
Join date : 2010-01-15
Re: Normalization in DWH environment
kapoor_dh wrote:Most of them are on the typical transaction tables...none of them so far I have seen on the Fact/Dimensions (typical star schema),do let me know if you find else leave that to me ...;)
Which is why it is pointless to discuss normalization in the context of dimensional design. A normalized schema is one designed using normalization rules. There is a very rigid, mechanical process by which you achieve various levels of normalization. Dimensional design, on the other hand, does not use normalization to achieve a data model. Creating a dimensional model is more art than science. While such a model, when implemented in a relational database, can be categorized to be in a particular normal form (based on the relationship between the row's contents and its key), such categorization is meaningless since normalization (i.e. ER modeling) was not used to develop the model.
Re: Normalization in DWH environment
I think you discount the value of understanding normalization rules in building a dimensional model. Many of the questions posted here have roots in normalization or traditional data modeling. People new to modeling dimensional or normalized models always struggle with where do I put this attribute, should this column be null, should I collapse this hierarchy, etc. Experienced modelers fly through the modeling process in their heads without even giving a second thought to what they are doing. This is why I believe people often struggle with proper dimension design. It's irrelevant if you are a Kimball or an Inmon kool aid drinker, at some point you need, at a minimum, a normalized model to understand the data relationships and dependencies. It is an understanding of these dependencies that drive a quality dimensional model. It would be interesting to see which of Ralph's books have sold the most copies. My guess is the the Complete Guide to Dimensional Modeling.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Normalization in DWH environment
Yes I do. Normalization is used as a crutch in Dimensional design. Normalization is a specific methodology to achive a data model for a specific purpose, that being efficient and effective OLTP. Dimensional design, as described in DW Toolkit is a completely different methodology to achive a different data model for a different purpose (i.e. data warehousing).BoxesAndLines wrote:I think you discount the value of understanding normalization rules in building a dimensional model.
To get a good dimensional design, you need to think differently than you would trying to design an OLTP database. I've seen, on numerous occasions, dimensional models getting balled up because of modelers falling back on normalization principles and second guessing themselves.
Read page 11 of Ralph's book.
And, as far as DW books go, DW Toolkit is by far the largest seller. When I asked him about it around 6 years ago, it was around 70,000 copies, which is a huge number for this type of book (10,000 is considered a best seller). It's always been at the top of the Amazon list.
Re: Normalization in DWH environment
"Experienced modelers fly through the modeling process in their heads without even giving a second thought to what they are doing"
Dimensional Modeling if not considered for Normalization,which is true in most of the cases is less of Modelling but more of the success comes from the fact how good are the requirements gathered and then it goes to building aggregate creation for performance tuning and more of a straight forward thing as you know what kind of stcructure you are going to come up with (Star Schema).While I am myself into Datawarehousing I see the more efforts are involved in ETL (maintaing SCDs,Error tracking etc) and Performance tuning either thru reporting tools or SQL tuning.While a lot is desired out of Modeling in OLTP and normalization scenarios,and there is another group of Modeling that lies in between 2 of them(which corporates call for real time reporting),Foundation Layer or ODS Layer.This is the kind of scenarios where you need to decide between Normalization or De-normalization or hybrid.
Dimensional Modeling if not considered for Normalization,which is true in most of the cases is less of Modelling but more of the success comes from the fact how good are the requirements gathered and then it goes to building aggregate creation for performance tuning and more of a straight forward thing as you know what kind of stcructure you are going to come up with (Star Schema).While I am myself into Datawarehousing I see the more efforts are involved in ETL (maintaing SCDs,Error tracking etc) and Performance tuning either thru reporting tools or SQL tuning.While a lot is desired out of Modeling in OLTP and normalization scenarios,and there is another group of Modeling that lies in between 2 of them(which corporates call for real time reporting),Foundation Layer or ODS Layer.This is the kind of scenarios where you need to decide between Normalization or De-normalization or hybrid.
kapoor_dh- Posts : 24
Join date : 2009-12-08
Re: Normalization in DWH environment
Excuse me while I break into a rant... I really do not like the term 'hybrid' as if somehow the designer has discovered some unique balance between Kimball and Inmon that addresses all issues and handles everything in some magical solution. What it usually means, particulalry if they have somehow managed to put together a mix of ER and Dimensional concepts into the same model, is that they have basically produced a mess that requires a savant to understand.kapoor_dh wrote:This is the kind of scenarios where you need to decide between Normalization or De-normalization or hybrid.
There is no such thing as a 'hybrid' approach between Kimball and Inmon. The Inmon approach IS a hybrid approach. Under Inmon you build a somewhat 3NF foundation layer from which you publish dimensional data marts for end user consumption. Under Kimball, the foundation layer is dimensional, which can be queried directly by end users. So, you do one or the other... there is no 'in-between'.
There is no argument that different approaches apply to different applications (althought I would argue that a normalized model is necessary for real time DW applications), but the choice is not between a normalized model or a de-normalized model... because those are the same thing. They simply defer in a matter of degree (a model in 3NF is a de-normalized model when compared to a model in 4NF). It's all about the methodology you use to achieve the model... you are either developing an ER model or a Dimensional model. Both require different thought processes to accomplish.
To give you an example... a General Ledger Chart of Accounts is an example of a dimensional model (Note: the chart itself, I'm not referring to anything dealing with computers or databases). The concept of categorizing a value based on time (accounting period), business unit, cost center, natural account, project and so on, is an example of dimensional thinking. You cannot design a chart of accounts using ER modeling techniques and come up with a solution as simple and elegant as a COA can be.
Re: Normalization in DWH environment
Here's the rub for Kimball EDW's. The dimensional model is well proven for high performance and understandability. What it doesn't do well is reflect the underlying data relationships and dependencies. If you do not want that information then you are golden. If you do want that information, you will need a normalized data model.
Examples:
Dimension Column optionality - I need to model all of my dimension columns as NULL since I normally include a default row for not applicable.
Dimension Optionality - A fact table will have mandatory relationships to all dimensions. I can not tell whether a given dimension is mandatory or optional for a given metric.
Dimension Hierarchies - These are constructed by collapsing or denormalizing the OLTP data models. I can no longer see the hierarchy structure via the dimension data model.
I personally am interested in these things so I encourage my clients to build a normalized data model to minimally help the DW team build the dimensional models. Whether you expose this metadata to business community is personal preference. So to clarify my original response, normalization skills beyond 1NF are not necessary for building a dimensional model, however, I believe the dimensional data warehouse needs to be supplemented with a normalized model to address the previous examples.
Examples:
Dimension Column optionality - I need to model all of my dimension columns as NULL since I normally include a default row for not applicable.
Dimension Optionality - A fact table will have mandatory relationships to all dimensions. I can not tell whether a given dimension is mandatory or optional for a given metric.
Dimension Hierarchies - These are constructed by collapsing or denormalizing the OLTP data models. I can no longer see the hierarchy structure via the dimension data model.
I personally am interested in these things so I encourage my clients to build a normalized data model to minimally help the DW team build the dimensional models. Whether you expose this metadata to business community is personal preference. So to clarify my original response, normalization skills beyond 1NF are not necessary for building a dimensional model, however, I believe the dimensional data warehouse needs to be supplemented with a normalized model to address the previous examples.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Normalization in DWH environment
I disagree. There is no such thing as 'dimensional optionality' for a measure in a correct dimensional model. A fact table should contain measures appropriate for the grain of the fact table. Otherwise, you are mixing grains, which is not proper modeling.
How do you model hierarchies in an ER model that are any clearer than what happens in a dimensional model? Most hierarchies in a normalized model are represented by recursive structures. In a dimensional model you could retain that (if your query environment supports it) or more often, by an explosion of the recursive structure or a flattening of the hiearachy. The latter form pretty much spells out a hierarhcy as clearly as anyone could make it.
As for dimensional column optionality... so what? That is a data entry issue, which is not what dimensional models are for. When somebody does a query, the responsibility of the data warehouse is to ensure that the data being provided is accurate.
How do you model hierarchies in an ER model that are any clearer than what happens in a dimensional model? Most hierarchies in a normalized model are represented by recursive structures. In a dimensional model you could retain that (if your query environment supports it) or more often, by an explosion of the recursive structure or a flattening of the hiearachy. The latter form pretty much spells out a hierarhcy as clearly as anyone could make it.
As for dimensional column optionality... so what? That is a data entry issue, which is not what dimensional models are for. When somebody does a query, the responsibility of the data warehouse is to ensure that the data being provided is accurate.
Re: Normalization in DWH environment
ngalemmo wrote:I disagree. There is no such thing as 'dimensional optionality' for a measure in a correct dimensional model. A fact table should contain measures appropriate for the grain of the fact table. Otherwise, you are mixing grains, which is not proper modeling.
Take the ubiquitous Promotion dimension. It certainly won't apply to all facts. What gets stored in the fact is either the N/A row or the 'No promotion' row. Sometimes you know the difference, sometimes you don't. The best case scenario is the 'No Promotion' row is stored. The only way you know that Promotions are optional is by examining the data within the dimension. The dimensional data model provides no insight here.
So what!? (LOL) Maybe the business doesn't care, but the developers, modelers, and testers, all care. A data model is a great place to capture this metadata. No can do for dimensional models.
Great discussion, BTW.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Normalization in DWH environment
But the need to know this in a systematic manner (i.e. through database constraints) is a transactional application requirement. From a dimensional design standpoint the dimensional reference is always required (the FK cannot be null)... it is just that in some instances it is referencing a row for the not applicable case. If it needs to be documented in the metadata, adding a comment will do the trick.

» Integrating Hadoop environment with a DW environment
» normalization in the dimensional model
» Surrogate/Business Key in ODS Environment
» Customer addresses in a high volume retail environment
» Estimating ROI / Predicting Benefits of a DW Implementation In a Utility Environment
» normalization in the dimensional model
» Surrogate/Business Key in ODS Environment
» Customer addresses in a high volume retail environment
» Estimating ROI / Predicting Benefits of a DW Implementation In a Utility Environment
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|