How much is too much really ?
+3
ngalemmo
Jeff Smith
karan_das
7 posters
Page 1 of 1
How much is too much really ?
Hi ,
I was wondering that for a FACT, how many columns is regarded as too much ?
Is there a best practice to keep the number of columns in a FACT to be limited to a specific range / number ?
The reason I ask is because I have seen FACT tables having > 150 columns. I have also come across designs where the number of columns were much , much greater than 100. Does having large number of columns (beyond a point) have adverse performance impact ?
Coming from a background of data modeling focussing on normalization, wanted your opinions on this to clear my knowledge of denormalized analytical models.
Thanks.
Karan
I was wondering that for a FACT, how many columns is regarded as too much ?
Is there a best practice to keep the number of columns in a FACT to be limited to a specific range / number ?
The reason I ask is because I have seen FACT tables having > 150 columns. I have also come across designs where the number of columns were much , much greater than 100. Does having large number of columns (beyond a point) have adverse performance impact ?
Coming from a background of data modeling focussing on normalization, wanted your opinions on this to clear my knowledge of denormalized analytical models.
Thanks.
Karan
karan_das- Posts : 10
Join date : 2013-02-26
Age : 36
Re: How much is too much really ?
100 is way too many. Ideally, the number of dimensions is around 15 but they can be more. But I think you'll find general agreement that 100 is way too many.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: How much is too much really ?
There is no limit, but if you find it getting over 20 you may need to consider rethinking a design. But I wouldn't immediately dismiss a fact with a large number of columns. However, such wide tables will usually hinder performance.
Re: How much is too much really ?
Hi,
Thanks for the replies.
Actually we are planning to build dimensional tables over which Views will be designed.
Only these views will be queried by the reporting tool.
The views that we build will be taking only a combination of few columns from the underlying fact tables (having 100 columns).
Is this a better approach ? Or is performance still an issue since the underlying FACT tables have large number of columns ?
Thanks,
Karan
Thanks for the replies.
Actually we are planning to build dimensional tables over which Views will be designed.
Only these views will be queried by the reporting tool.
The views that we build will be taking only a combination of few columns from the underlying fact tables (having 100 columns).
Is this a better approach ? Or is performance still an issue since the underlying FACT tables have large number of columns ?
Thanks,
Karan
karan_das- Posts : 10
Join date : 2013-02-26
Age : 36
Re: How much is too much really ?
HI,
a view may or may not make your query run faster (as by defination a view is a compiled version of a SQL stored in the database).
You may need to look into the query and see the performance.
thanks
a view may or may not make your query run faster (as by defination a view is a compiled version of a SQL stored in the database).
You may need to look into the query and see the performance.
thanks
hkandpal- Posts : 113
Join date : 2010-08-16
Re: How much is too much really ?
The performance issue is primarily a fact table issue. Very wide fact tables take up more space, the system retrieves fewer rows when it accesses a block of disk, requiring more disk access. Disk is the slowest thing in the system, so it can have a significant effect. Views don't help as it is a physical problem.
Re: How much is too much really ?
Hi,
We have created fact tables with around 120 columns. We have created this design for an ecommerce firm and our single fact contains shipping /sales info clubbed together. The huge number of columns is because of varied details that we have pulled together in one table. As the shipment passes through different phases we have different columns to capture each phase of shipping. These column amount to 25 itself. The only problem that we are facing is to have a solid update strategy. We want to bring our DWH near real time. Updating theses 120 columns is painful
We have created fact tables with around 120 columns. We have created this design for an ecommerce firm and our single fact contains shipping /sales info clubbed together. The huge number of columns is because of varied details that we have pulled together in one table. As the shipment passes through different phases we have different columns to capture each phase of shipping. These column amount to 25 itself. The only problem that we are facing is to have a solid update strategy. We want to bring our DWH near real time. Updating theses 120 columns is painful
rishbeck1988- Posts : 2
Join date : 2013-04-06
Age : 35
Location : India
Re: How much is too much really ?
You may need to report shipping and sales info together, but you don't have to put all details in one table.
umutiscan- Posts : 51
Join date : 2010-11-26
Age : 42
Location : Istanbul, Turkey
Re: How much is too much really ?
Hi,
Definitely we can have seperate tables for this purpose but the problem is that most of the end users are data analysts who fetch data from fact tables directly. Around 100 users use fact dimension tables directly to run queries to drive analytics/reports etc. That's why splitting of the sales and shipping information would cause huge performance load during data fetch.
Definitely we can have seperate tables for this purpose but the problem is that most of the end users are data analysts who fetch data from fact tables directly. Around 100 users use fact dimension tables directly to run queries to drive analytics/reports etc. That's why splitting of the sales and shipping information would cause huge performance load during data fetch.
rishbeck1988- Posts : 2
Join date : 2013-04-06
Age : 35
Location : India
Re: How much is too much really ?
Hi rishbeck1998,
What you describe does not sound like a DWH to me - it sounds more like the classic "data dump" with a single, very wide "fact dimension" table , in an effort "make it easier" for analysts. Perhaps the only benefit from this approach is that you have offloaded the data preparation from your core systems.
I suggest you review and consider the Kimbal Dimensional method, which will drive you to properly separate your data according to it's use and granularity. Once that design is in place you will find a huge performance improvement in data fetch, as that is the specific purpose that Diemnsional DW are designed for.
Good luck!
Mike
What you describe does not sound like a DWH to me - it sounds more like the classic "data dump" with a single, very wide "fact dimension" table , in an effort "make it easier" for analysts. Perhaps the only benefit from this approach is that you have offloaded the data preparation from your core systems.
I suggest you review and consider the Kimbal Dimensional method, which will drive you to properly separate your data according to it's use and granularity. Once that design is in place you will find a huge performance improvement in data fetch, as that is the specific purpose that Diemnsional DW are designed for.
Good luck!
Mike
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|