Null values in facts, yes or no?
5 posters
Page 1 of 1
Null values in facts, yes or no?
Dear all,
I'm wondering if anyone could help me out with the following (simplified) case. This is for my Business Intelligence study on a Dutch university where we do a project to apply BI to the student information systems, especially their results and grades.
Case:
For a whole study, students have to follow courses (about 10 each year) for which they (mostly) get grades. In Holland, grades range from 1 to 10, where 10 is best and 1 is worse. You have to get a 5,5 to pass a course otherwise you have to retake the classes for that course. Up untill this point our DWH design seems kind of easy, some dimensions (course, student, time, [...]) and results as facts. Then we can easily get everything we need to know about these grades and which students passed all their courses.
The big difference here is that there are some courses for which no grades are assigned. You could just get PASSED or NOT PASSED. Now we're wondering how to process this change in our fact tables.
We are thinking of:
But in this case, how should we handle these courses without grades? It would be the easiest to pass a NULL value in this case for the grade, but most BI professionals do not advise this, because of both performance problems and unexpected results.
If we would aggregrate grades, it would be averages. By example top students are all those with all courses passed, an average grade of 7,5 and no grade lower than 7. For this grade measurement courses without grades are not taken in the account for the average. We would never total data. For this reason, we are unable to pass a '0' value.
Another solution would be to add a small 'grade'-dimension, with only grade and ispassed. But then calculations would be made from the dimensions and that shouldn't be ok, I guess.
So, I am very interested if anyone would know of any appropiate solution for this case. I appreciate your help very much.
With regards,
Jochem van Grondelle
student in Amsterdam, Netherlands
jochem@vangrondelle.net
I'm wondering if anyone could help me out with the following (simplified) case. This is for my Business Intelligence study on a Dutch university where we do a project to apply BI to the student information systems, especially their results and grades.
Case:
For a whole study, students have to follow courses (about 10 each year) for which they (mostly) get grades. In Holland, grades range from 1 to 10, where 10 is best and 1 is worse. You have to get a 5,5 to pass a course otherwise you have to retake the classes for that course. Up untill this point our DWH design seems kind of easy, some dimensions (course, student, time, [...]) and results as facts. Then we can easily get everything we need to know about these grades and which students passed all their courses.
The big difference here is that there are some courses for which no grades are assigned. You could just get PASSED or NOT PASSED. Now we're wondering how to process this change in our fact tables.
We are thinking of:
- Code:
RESULT FACTS
---------------------------
TimeDimension_key (int)
StudentDimension_key (int)
courseDimension_key (int)
Grade (float)
IsPassed (bit)
But in this case, how should we handle these courses without grades? It would be the easiest to pass a NULL value in this case for the grade, but most BI professionals do not advise this, because of both performance problems and unexpected results.
If we would aggregrate grades, it would be averages. By example top students are all those with all courses passed, an average grade of 7,5 and no grade lower than 7. For this grade measurement courses without grades are not taken in the account for the average. We would never total data. For this reason, we are unable to pass a '0' value.
Another solution would be to add a small 'grade'-dimension, with only grade and ispassed. But then calculations would be made from the dimensions and that shouldn't be ok, I guess.
So, I am very interested if anyone would know of any appropiate solution for this case. I appreciate your help very much.
With regards,
Jochem van Grondelle
student in Amsterdam, Netherlands
jochem@vangrondelle.net
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Null values in facts, yes or no?
"but most BI professionals do not advise this, because of both performance problems and unexpected results"...
This is an issue with foreign keys, not attributes or values. The thing is, if there is no grade, there is no grade... simple as that. There isn't any good reason to invent one. Keeping it null when there is no grade has advantages. If you are calculating an average grade for a student, null values will be ignored in the calculation using standard SQL, giving correct results. (the average of 5, null, 6, 7 is 6 not 4.5)
What you may want to consider is adding a grading method attribute to the course information (or if is it sometimes an elective, that is the student can choose to get a pass/fail grade, have a separate dimension indicating grading method) so that users can easily segregate what should and should not have a grade.
This is an issue with foreign keys, not attributes or values. The thing is, if there is no grade, there is no grade... simple as that. There isn't any good reason to invent one. Keeping it null when there is no grade has advantages. If you are calculating an average grade for a student, null values will be ignored in the calculation using standard SQL, giving correct results. (the average of 5, null, 6, 7 is 6 not 4.5)
What you may want to consider is adding a grading method attribute to the course information (or if is it sometimes an elective, that is the student can choose to get a pass/fail grade, have a separate dimension indicating grading method) so that users can easily segregate what should and should not have a grade.
Re: Null values in facts, yes or no?
One other thing, I assume you would always set the IsPassed value regardless of how the course was graded.
Re: Null values in facts, yes or no?
ngalemmo wrote:"...The thing is, if there is no grade, there is no grade... simple as that. There isn't any good reason to invent one. Keeping it null when there is no grade has advantages...."
Thanks for your quick answer. It is a very clear explanation. This is what I thought, although could you confirm that this does not result in a performance degradation?
Last edited by jochem_van_grondelle on Tue Sep 22, 2009 4:38 pm; edited 1 time in total
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Null values in facts, yes or no?
Definitely!ngalemmo wrote:One other thing, I assume you would always set the IsPassed value regardless of how the course was graded.
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Null values in facts, yes or no?
jochem_van_grondelle wrote:Thanks for your quick answer. It is a very clear explanation. This is what I thought, although could you confirm that this does not result in a performance degradation?
I have no explicit data one way or the other, but I have never noticed any impact based on weither an attribute was null or not. A null foreign key however would certainly impact results (but not performance) as all such rows would not be selected if an inner join is performed using that key.
By the way, Oracle has a non-standard definition of null. Any varchar field containing an empty string is considered 'null' even if the value of that field was explicitly set to an empty string. Null (i.e. blank) character fields are very common in Oracle DBs and it doesn't seem to affect how it performs.
Re: Null values in facts, yes or no?
I would probably implement this solution in two fields and both can be on the fact table. I would have a Grade Point field and a Pass/No Pass field. The Grade Point field for a couse that does not offer grades can be set to 0. I understand your explanation of why you do not want to set it to zero but you can filter out these records when you are computing a students average Grade Point very easily; just do not include courses where GP = 0 (assuming that this is the only way a zero is being used). One advantage of haviing the two fields is that a user can easily find out the courses a student has not passed by querying the Pass/No Pass field.
beyeguru- Posts : 5
Join date : 2009-08-03
Re: Null values in facts, yes or no?
After some research this week , it appears that NULL valued facts are not a problem (anymore) with the current generation of datawarehouse systems. If we would need a column without it, we could always create an extra view.
Could you please explain why you would not use NULL values?
Could you please explain why you would not use NULL values?
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Null values in facts, yes or no?
You can't join on a null foreign key column. This can cause all sorts of user errors when running reports. My FK's are always NOT NULL. I do occasionally have other data on the fact table that is nullable.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Null values in facts, yes or no?
Perhaps an attribute "IsGraded" (Yes/No) on the Course Dimension and a Null value for the Grade.
rajsundar- Posts : 4
Join date : 2009-08-06
Re: Null values in facts, yes or no?
Thanks for your reply. Maybe you understand my question wrong, but I'm talking about nullable facts, not about nullable foreign keys.BoxesAndLines wrote:You can't join on a null foreign key column. This can cause all sorts of user errors when running reports. My FK's are always NOT NULL. I do occasionally have other data on the fact table that is nullable.
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Similar topics
» Handling Null Values in Facts
» Multiple Facts or Single Facts and Status Table?
» NULL Values in Fact Table
» NULL DATES IN FACTS
» Source Values and Conformed Values in the Dimension table
» Multiple Facts or Single Facts and Status Table?
» NULL Values in Fact Table
» NULL DATES IN FACTS
» Source Values and Conformed Values in the Dimension table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum