All dimensions embedded in fact tables
5 posters
Page 1 of 1
All dimensions embedded in fact tables
Normally, when we design data warehouse we will have fact tables and dimension tables.
However, it does make sense to embed dimension in fact table. Especially for simple dimensions who has no other attributes and rarely change its value.
Having dimensions in fact table will makes query run very fast and no need to maintain dimension table separately, no need to look up dimension table when doing ETL.
What are considerations to keep dimensions in separate table from facts?
Thanks!
However, it does make sense to embed dimension in fact table. Especially for simple dimensions who has no other attributes and rarely change its value.
Having dimensions in fact table will makes query run very fast and no need to maintain dimension table separately, no need to look up dimension table when doing ETL.
What are considerations to keep dimensions in separate table from facts?
Thanks!
rendybjunior- Posts : 7
Join date : 2014-09-30
Re: All dimensions embedded in fact tables
What do you do when a dimension column changes value? Update a billion row table?
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: All dimensions embedded in fact tables
Hi
What database are you using, if you are using an RDBMS then the are created to join, you should not have any problem when joining tables.
What is the dimesnion which you want to store, if it is like an order number, invoice number then it is fine to store in fact.
thanks
Himanshu
What database are you using, if you are using an RDBMS then the are created to join, you should not have any problem when joining tables.
What is the dimesnion which you want to store, if it is like an order number, invoice number then it is fine to store in fact.
thanks
Himanshu
hkandpal- Posts : 113
Join date : 2010-08-16
Re: All dimensions embedded in fact tables
If you only have a single attribute for your dimension you do embed it in your Fact - it's called a Degenerate Dimension
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: All dimensions embedded in fact tables
As B&L pointed out, its a real challenge to update something like that.
But even if you never update it, there are more practical reasons for a proper model. If you are using a typical row/column based DBMS, the table becomes huge. It is not clear that it would be any faster to query due to the row width. DBMS needs more physical reads because fewer rows fit in a block. Taken to extreme, a flattened table could be 15x or more larger than a proper fact table, consuming vast amounts of space and taking much longer to query.
In the case of vector (columnar) databases space isn't an issue and the performance impact is less, but you still have the update issue to contend with.
But even if you never update it, there are more practical reasons for a proper model. If you are using a typical row/column based DBMS, the table becomes huge. It is not clear that it would be any faster to query due to the row width. DBMS needs more physical reads because fewer rows fit in a block. Taken to extreme, a flattened table could be 15x or more larger than a proper fact table, consuming vast amounts of space and taking much longer to query.
In the case of vector (columnar) databases space isn't an issue and the performance impact is less, but you still have the update issue to contend with.
Re: All dimensions embedded in fact tables
Thanks for all your responses, happy to join this forum and finds you guys so helpful.
I'm adding more context. I have not decided DBMS to be used, hence I am open to use columnar, document oriented, or RDBMS.
I'll try to summarize my response:
1. Update billion rows when dimension change
@ngalemmo & @BoxesAndLine
What if I still embed the dimension to fact and:
- decided not to update old rows
- have a separate table that store information of previous dimensions so query can refer to that information if someone want to use historical and current value viewed as something equal
Will still be there any problem I will faced in the future?
2. Row column DBMS very wide row which affect performance
@ngalemmo
Assume I decided not to use row column DBMS, then this point will be invalid. I have not decided DBMS to be used, hence I am open to use columnar, document oriented, or RDBMS. Of course if I am using row column DBMS it will be a deal breaker. To continue the discussion forward, let's say I am not using row column DBMS.
3. Join or not join fact and dimension
@hkandapal
Yet if it is even designed to join, not joining any tables will improve performance. CMIIW.
4. Degenerate dimension
@nick_white
It is not something like invoice number, it is about the whole dimensions to be embedded.
In addition, I am trying to find another reason:
5. Let say it is important for user to know possible values of dimension. Separate dimension will give easiness to show all possible values for certain dimension.
However (with some effort) it can be created incrementally when doing ETL. CMIIW.
6. Let say will be helpful when other facts having well defined exactly same dimension to be drilled across.
However, if we maintain the definition somewhere else, it still can be accessed and drilled across.
I totally agree the concept of fact and dimension, no argue on that.
However I am still wondering whether I have to separate them or have them in the same table all at once...
I'm adding more context. I have not decided DBMS to be used, hence I am open to use columnar, document oriented, or RDBMS.
I'll try to summarize my response:
1. Update billion rows when dimension change
@ngalemmo & @BoxesAndLine
What if I still embed the dimension to fact and:
- decided not to update old rows
- have a separate table that store information of previous dimensions so query can refer to that information if someone want to use historical and current value viewed as something equal
Will still be there any problem I will faced in the future?
2. Row column DBMS very wide row which affect performance
@ngalemmo
Assume I decided not to use row column DBMS, then this point will be invalid. I have not decided DBMS to be used, hence I am open to use columnar, document oriented, or RDBMS. Of course if I am using row column DBMS it will be a deal breaker. To continue the discussion forward, let's say I am not using row column DBMS.
3. Join or not join fact and dimension
@hkandapal
Yet if it is even designed to join, not joining any tables will improve performance. CMIIW.
4. Degenerate dimension
@nick_white
It is not something like invoice number, it is about the whole dimensions to be embedded.
In addition, I am trying to find another reason:
5. Let say it is important for user to know possible values of dimension. Separate dimension will give easiness to show all possible values for certain dimension.
However (with some effort) it can be created incrementally when doing ETL. CMIIW.
6. Let say will be helpful when other facts having well defined exactly same dimension to be drilled across.
However, if we maintain the definition somewhere else, it still can be accessed and drilled across.
I totally agree the concept of fact and dimension, no argue on that.
However I am still wondering whether I have to separate them or have them in the same table all at once...
Last edited by rendybjunior on Thu Oct 23, 2014 8:58 pm; edited 1 time in total (Reason for editing : typos)
rendybjunior- Posts : 7
Join date : 2014-09-30
Re: All dimensions embedded in fact tables
Frankly, it isn't worth discussing. It's a very old idea that is impractical in the long run. If you want to try it, go ahead.
Re: All dimensions embedded in fact tables
Hi ,
you are correct not joining and storing in a single file will give you a better performance then any RDBMS, but file systesm has its own drawback like security, disaster recovery, scalability, multi user capability...
thanks
you are correct not joining and storing in a single file will give you a better performance then any RDBMS, but file systesm has its own drawback like security, disaster recovery, scalability, multi user capability...
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Re: All dimensions embedded in fact tables
@ngalemmo
Do you have reference to any discussions for the same subject in the past? Thanks
Do you have reference to any discussions for the same subject in the past? Thanks
rendybjunior- Posts : 7
Join date : 2014-09-30
Re: All dimensions embedded in fact tables
Hi,
Dimensional modelling is the industry standard for data warehouse design and has been developed over the last few decades by some of the top minds in the business - most of whom have experience implementing this methodologies 100s of times across all industries.
If you think you can come up with something better then good luck to you but you're unlikely to get much help designing a new methodology from a Kimball design forum - as, almost by definition, people on this forum are followers of Kimball's methodology.
Regards,
Dimensional modelling is the industry standard for data warehouse design and has been developed over the last few decades by some of the top minds in the business - most of whom have experience implementing this methodologies 100s of times across all industries.
If you think you can come up with something better then good luck to you but you're unlikely to get much help designing a new methodology from a Kimball design forum - as, almost by definition, people on this forum are followers of Kimball's methodology.
Regards,
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: All dimensions embedded in fact tables
Sure, it's called the Data warehouse Toolkit. I'd recommend buying a copy.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Two fact tables connected through dimensions
» Do I need multiple fact tables or dimensions
» Two fact tables sharing the same dimensions
» Type II dimensions and budget fact tables
» Multiple different grain fact tables with lot of common dimensions.
» Do I need multiple fact tables or dimensions
» Two fact tables sharing the same dimensions
» Type II dimensions and budget fact tables
» Multiple different grain fact tables with lot of common dimensions.
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum