Modelling Age / Date of Birth
5 posters
Page 1 of 1
Modelling Age / Date of Birth
Sorry if this is a simple or frequently repeated question. I'm still relatively inexperienced at dimensional data modelling...
We have a fact table, which for the sake of my question let's call it FACT_SERVICES. One of the fields in the fact table is the Age of the recipient of the services. This field is used for calculating things like the average age of our clients, max/min age and so forth. It is also used for reports, where it is frequently rolled up to groupings, eg 18-25, 26-35 and so forth.
Age is calculated from the date of services, less the client's DOB. Due to the usual data quality issues, this often produces incorrect results like clients with negative ages.
My initial reaction was to create an AGE surrogate key, and then put together a DIM_AGE table. Any age less than 18 and more than 120 would be given a surrogate key of 0, which in the DIM table would relate to 'UNKNOWN'. The advantage of the DIM table is that I could then also create attributes to handle commonly used grouping methodologies eg, "Broad" Age categories such as 18-30, 31-50, 51-65, 65+ or "Narrow" categories such as 18-25, 26-30, 31-35 on so forth. At the moment, grouping ages is handled differently in each report depending on the individual report developer (urgh!)
But I'm not sure how I then handle the question of average age.
In this instance, should age be treated as both a measure and a dimension?
We have a fact table, which for the sake of my question let's call it FACT_SERVICES. One of the fields in the fact table is the Age of the recipient of the services. This field is used for calculating things like the average age of our clients, max/min age and so forth. It is also used for reports, where it is frequently rolled up to groupings, eg 18-25, 26-35 and so forth.
Age is calculated from the date of services, less the client's DOB. Due to the usual data quality issues, this often produces incorrect results like clients with negative ages.
My initial reaction was to create an AGE surrogate key, and then put together a DIM_AGE table. Any age less than 18 and more than 120 would be given a surrogate key of 0, which in the DIM table would relate to 'UNKNOWN'. The advantage of the DIM table is that I could then also create attributes to handle commonly used grouping methodologies eg, "Broad" Age categories such as 18-30, 31-50, 51-65, 65+ or "Narrow" categories such as 18-25, 26-30, 31-35 on so forth. At the moment, grouping ages is handled differently in each report depending on the individual report developer (urgh!)
But I'm not sure how I then handle the question of average age.
In this instance, should age be treated as both a measure and a dimension?
Tyberious Funk- Posts : 3
Join date : 2013-02-21
RE: Modelling Age / Date of Birth
This is a thoughtful approach, but will need effort for implementation. You are treating age as a measure and thinking to have a dimension for it... does not sound smart. I do not intend to say that it is wrong. Option to have standard age group is a good case.
Store age as a calculated measure in the fact. You can handle exception during ETL, like replace age >120 and <18 as '0' or 18 if it is <18 and 100 if it is >120 and store these kind of exception separately with a unique identifier (Service ID in this case). Send the list back to the source team to fix these kind of data issues and demonstrate how a DW solution can help improve data quality.
Store age as a calculated measure in the fact. You can handle exception during ETL, like replace age >120 and <18 as '0' or 18 if it is <18 and 100 if it is >120 and store these kind of exception separately with a unique identifier (Service ID in this case). Send the list back to the source team to fix these kind of data issues and demonstrate how a DW solution can help improve data quality.
rathjeevesh- Posts : 15
Join date : 2013-02-16
Re: Modelling Age / Date of Birth
If you are calculating the age based on date of service, and you usually use other attributes from the client dimension in queries, I would tend to not bother storing age at all, and calculate it on the fly. This would allow the results to change as you get better birthdate information.
Cases where is it mandatory you store age in the fact usually involve health care and health insurance. In these cases, the age that goes into the fact (as a degenerate dimension) is the age recorded on the claim or admission form. It is not calculated.
Cases where is it mandatory you store age in the fact usually involve health care and health insurance. In these cases, the age that goes into the fact (as a degenerate dimension) is the age recorded on the claim or admission form. It is not calculated.
Re: Modelling Age / Date of Birth
Make sure your age banding dimension goes to the "age" level. This is analogous to the date dimension.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Modelling Age / Date of Birth
I believe AGE that TF is referring to is the age of client at the time he/she received service. Average, max, min age are some of the prospective Age is looked from. That is why I think age is a measure which has a fixed value at the time the service started unlike generic age which keeps on changing and should be calculated as TODAY-DOB. So storing the value in the fact table will keep it simple.
-JR
-JR
rathjeevesh- Posts : 15
Join date : 2013-02-16
Re: Modelling Age / Date of Birth
I like the idea of storing your clients date of birth (DOB) in a client dimension. I also suspect its best to "update' or "overwrite" this DOB if the operational system where it comes from gets a new value UNLESS you have a need to know the DOB (age) of the client as it was when the service transaction was generated in which case you might want a type 3 dimension with current DOB and a slowly changed DOB. This way you have the "latest" and greatest DOB to work with and the DOB at the time of the transaction. If you store these in the fact table you may have lots more rows to update given a "fix" to a date of birth in the operational system.
With this method of storing our DOB in the dimension, your age becomes a calculated field as already suggested above -> also you may have more than one calculation. You could have
1) age as of the date of service
2) cohort calculations (age groups) as of the date of service
3) current age of your clients
4) current age cohorts
Lastly, avg, min, max age can be treated the same way. They become a calculations based on age as of the date of service or current age -> depending on what question you are trying to answer.
With this method of storing our DOB in the dimension, your age becomes a calculated field as already suggested above -> also you may have more than one calculation. You could have
1) age as of the date of service
2) cohort calculations (age groups) as of the date of service
3) current age of your clients
4) current age cohorts
Lastly, avg, min, max age can be treated the same way. They become a calculations based on age as of the date of service or current age -> depending on what question you are trying to answer.
Tim Webber- Posts : 11
Join date : 2013-02-01
Re: Modelling Age / Date of Birth
rathjeevesh wrote:I believe AGE that TF is referring to is the age of client at the time he/she received service. Average, max, min age are some of the prospective Age is looked from. That is why I think age is a measure which has a fixed value at the time the service started unlike generic age which keeps on changing and should be calculated as TODAY-DOB. So storing the value in the fact table will keep it simple.
-JR
Yes it is. But, it is calculated based on the service date and the birthdate on record. It is not the client declaring their age and storing that information. The transaction will always have the service date, and the client will always have a birthdate, so the derived age can always be derived and does not need to be stored.
![-](https://2img.net/i/empty.gif)
» Adding a fact table with start date and end date to a (SSAS) multidimensional cube
» Date Dimension: Representing partial dates/Imputing date values
» Initial date of effective date column for SCD 2 implementation
» scd2 effective date, end date data type
» Dimensional Modelling compared to Relational (3NF +) Modelling for DSS
» Date Dimension: Representing partial dates/Imputing date values
» Initial date of effective date column for SCD 2 implementation
» scd2 effective date, end date data type
» Dimensional Modelling compared to Relational (3NF +) Modelling for DSS
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|