Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Null Valued Columns performance Impact

4 posters

Go down

Null Valued Columns performance Impact Empty Null Valued Columns performance Impact

Post  rbiradav Tue Sep 27, 2011 11:59 pm

We have a de-normalized table in our HR schema which contains pretty much all of the basic info of all employee. In this table, we have several columns like employee personal info, work info etc.
I got a request to add even further details of the employee mentors(say mentor names, employee status, etc). Most of they employees will have 1 or 2 mentors and very few have 4 mentors. These mentors info we get from the same employee table so we use self joins.

My concern for adding these fields to the same emp table is for most of the employee records these columns will contain null values. We may get further requests to add more fields which may contain null values for most of the employees. I know definitely having these fields will simply the reporting users task and may improve performance.
My question is, even though there is no limit (actually 1000 columns) on the number of columns, having more and more null valued columns would impact the overall performance or not? I thought having more columns might cause row chaining which will impact the performance. Rather than adding these columns to the same table, I suggested to create a new table just for this info and we can always join to the main table using the primary keys. As long as we have proper indexes on these columns, it would not have much impact for the end users.

What would be your suggestion in this case for better performance: Adding to the same denormalized table or create a new table with one row per employee and converting rows into columns ? Basically I would like to know if we have more null valued columns (even if they are varchar2) will have any impact on the over performance impact?

I really appreciate if you could give me some suggestions on this.

rbiradav

Posts : 2
Join date : 2011-09-27

Back to top Go down

Null Valued Columns performance Impact Empty Re: Null Valued Columns performance Impact

Post  John Simon Wed Sep 28, 2011 1:01 am

It is certainly not ideal. Even with a NULL value you require space. So the answer depends on your database platform.
SQL Server has the ability to configure certain columns and datatypes as SPARSE to save space. From the MS website:
http://msdn.microsoft.com/en-us/library/cc280604.aspx

"Sparse columns are ordinary columns that have an optimized storage for null values. Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve nonnull values. Consider using sparse columns when the space saved is at least 20 percent to 40 percent. "


I would imagine that most mature database technologies would also have this capability.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

http://jsimonbi.wordpress.com

Back to top Go down

Null Valued Columns performance Impact Empty Re: Null Valued Columns performance Impact

Post  ngalemmo Wed Sep 28, 2011 3:00 am

Nulls, or having extra columns that are null, would not impact performance. In almost all database platforms, having such columns contain null rather than actual values helps performance. Either way, there is no real cause for concern until the table grows very large, at which point you may want to consider a vertical paritioning of the table to suit most, but not all queries.

It is more of a matter of displaying or filtering on null values. Most of the impact has more to do with helping the users understand what null is and what it represents. Or, you can just say 'null' means we don't know.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Null Valued Columns performance Impact Empty Re: Null Valued Columns performance Impact

Post  rbiradav Wed Sep 28, 2011 9:07 am

Thank you all for the quick reply.

Just to make clear we are using ORACLE 10g and in the DW environment.

Just to make more clear on my concerns of adding the mentor columns to the employee table:

Actually we already have mentors 1 - 4 added as columns to the main employee table. Now the new request is to add about 8 – 12 fields(3 attributes for each mentors).
Possible to add to more in the future. My question is if we keep on adding these mentors info in the same EMPLOYEE table (most of them will have blanks) would there be any impact for the over all performance. My thinking the more columns we add to this table(which are used by only few repots) there will be less number of rows stored/retrieved in each block which impacts performance for most frequently used reports. So my suggestion to the users was to create a separate table as you mentioned below. Do you agree with my concern about less rows per block or it is not of a big difference compared to joining employee with mentor table?

Users are saying, we already have Mentors 1- 4 added to the employee table which is already denormalized table for reporting purpose what would be the impact if we add more columns.

Do you have any suggestion on the pros and cons of the two approaches?

rbiradav

Posts : 2
Join date : 2011-09-27

Back to top Go down

Null Valued Columns performance Impact Empty Re: Null Valued Columns performance Impact

Post  ngalemmo Wed Sep 28, 2011 1:07 pm

The wider a table (NOT by number of columns, but rather the amount of data on a row) the slower access will be. The question is, does representing a sizeable number of mentors as individual columns in a row the right way to go about it?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Null Valued Columns performance Impact Empty Re: Null Valued Columns performance Impact

Post  hang Wed Sep 28, 2011 8:20 pm

It's m-m relationship between employee and mentor although they refer to one dimension table. So it is a modeling issue instead of performance issue. Generally I would use bridge table to store the m-m relationship. If there is any attribute specific to mentor, you could have a sub-dimension holding mentor attributes.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Null Valued Columns performance Impact Empty Re: Null Valued Columns performance Impact

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum