Distinct Count Measure on Varchar field
+2
Jeff Smith
scabral
6 posters
Page 1 of 1
Distinct Count Measure on Varchar field
I need to create a Distinct count measure on a varchar field that exists in my fact table. the field is actually a Degenerate Dimension called Claim_Num.
When i try to create the distinct count measure on this field, it gives me the following error:
"SSAS distinct count measures requires numeric data type"
Is there a workaround for this? Possibly using MDX calculation?
thanks
Scott
When i try to create the distinct count measure on this field, it gives me the following error:
"SSAS distinct count measures requires numeric data type"
Is there a workaround for this? Possibly using MDX calculation?
thanks
Scott
scabral- Posts : 58
Join date : 2012-05-02
Re: Distinct Count Measure on Varchar field
Move the Claim_Num to a dimension table and replace it in the fact table with a dimension key. Depending upon the number of distinct claim numbers, the size of the VARCHAR field, this could result in less space being used by the fact plus the new dimension table.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Distinct Count Measure on Varchar field
Jeff,
the issue with that is the dimension would not have any attributes. All of the attributes that are part of the Claim_Num are separate dimensions and are all FKs in the fact table. Claim_Num would be the only field in the dimension
the issue with that is the dimension would not have any attributes. All of the attributes that are part of the Claim_Num are separate dimensions and are all FKs in the fact table. Claim_Num would be the only field in the dimension
scabral- Posts : 58
Join date : 2012-05-02
Re: Distinct Count Measure on Varchar field
Nothing says you can't have a dimension table with 2 fields. Besides, a lot of times Claim Numbers have logic in them that can be broken down into more usable forms.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Distinct Count Measure on Varchar field
Here are a couple other ideas.
Add a numeric column whose value is mapped to the claim number.
Create a claim level fact table to track the claim metrics (e.g. number of claims). This will perform better than distinct counts at the claim line level.
Add a numeric column whose value is mapped to the claim number.
Create a claim level fact table to track the claim metrics (e.g. number of claims). This will perform better than distinct counts at the claim line level.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Distinct Count Measure on Varchar field
BoxesAndLines,
regarding your second comment, If I create a Claim Summary fact table and a Claim Line fact table, how would I join both tables so that I can see all fields?
here are the fields with regard to where they relate to either Claim Summary or Claim Line:
Claim Summary
Company_Name
Line of business
Risk_State
Agency_Name
Insured_Name
Occurrence_Status
Primary_Assignee_Name
Independent_Adjuster_Name
Loss_Date
Report_Date
Claim Line
Close_Date
FEATURE_TYPE_DESCRIPTION
Loss_Type
Claimant_Name
PERIL
Unit_Description
Loss Payment Amount
Loss Reserve Amount
Expense Payment Amount
Expense Reserve Amount
Thanks
scott
regarding your second comment, If I create a Claim Summary fact table and a Claim Line fact table, how would I join both tables so that I can see all fields?
here are the fields with regard to where they relate to either Claim Summary or Claim Line:
Claim Summary
Company_Name
Line of business
Risk_State
Agency_Name
Insured_Name
Occurrence_Status
Primary_Assignee_Name
Independent_Adjuster_Name
Loss_Date
Report_Date
Claim Line
Close_Date
FEATURE_TYPE_DESCRIPTION
Loss_Type
Claimant_Name
PERIL
Unit_Description
Loss Payment Amount
Loss Reserve Amount
Expense Payment Amount
Expense Reserve Amount
Thanks
scott
scabral- Posts : 58
Join date : 2012-05-02
Re: Distinct Count Measure on Varchar field
What Boxes and lines is suggesting is that you basically create a Claim Level Aggregate from the Claim Line Level table. You add a measure to the Claim level Aggregate called something like "Claim_Count" that has nothing but 1s in it.
The Claim level fact (or aggregate) would look a lot like your Claim Line Level Fact expect without the Line Level Dimensions.
This will work except when someone wants to know the number of claims with a particular piece of line level information. I work in dental insurance. If I someone wanted to know the number of claims for 2011, then Boxes' solution would work. But if someone wanted to know how many claims for 2011 with a filling or extraction, then it wouldn't work because the procedure is at the line level.
My Claim_Number is in it's own Dimension with a few other attributes. A Claim_Dimension could include such things as Original Open Date or Incident Date and Last Update Date, things that are at the Claim level that don't change much.
The Claim level fact (or aggregate) would look a lot like your Claim Line Level Fact expect without the Line Level Dimensions.
This will work except when someone wants to know the number of claims with a particular piece of line level information. I work in dental insurance. If I someone wanted to know the number of claims for 2011, then Boxes' solution would work. But if someone wanted to know how many claims for 2011 with a filling or extraction, then it wouldn't work because the procedure is at the line level.
My Claim_Number is in it's own Dimension with a few other attributes. A Claim_Dimension could include such things as Original Open Date or Incident Date and Last Update Date, things that are at the Claim level that don't change much.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Distinct Count Measure on Varchar field
Hi Jeff,
SSAS (2005-2012 multidimensional) does support DistinctCount measures based on non-numeric Source columns e.g. WChar. The DataType of the Measure itself must be numeric.
This usually works very elegantly for any convoluted query the users can come up with across any combination of dimensions or facts. Scalability can be a challenge - follow the recommendations to put Distinct Counts in a separate Measure Group and define Aggregations (start using the Wizard) on that Measure Group.
Good luck!
Mike
SSAS (2005-2012 multidimensional) does support DistinctCount measures based on non-numeric Source columns e.g. WChar. The DataType of the Measure itself must be numeric.
This usually works very elegantly for any convoluted query the users can come up with across any combination of dimensions or facts. Scalability can be a challenge - follow the recommendations to put Distinct Counts in a separate Measure Group and define Aggregations (start using the Wizard) on that Measure Group.
Good luck!
Mike
Re: Distinct Count Measure on Varchar field
Agreed with Mike. It's really about understanding how to use DISTINCT function in SSAS, and I don't think the problem has anything to do with your model. Claim can typically be modeled as an accumulating snapshot fact, in which case the Claim No is really nothing but a degenerate dimension in all your fact tables, as all the relevant attributes on claim have been grouped into respective dimensions around the Claim accumulating snapshot fact table, or any other type of fact table at claim level.
The point is, don't create a dimension table that is almost the same size as the fact table. If you have Claim line items, then it's similar to order-line situation where an elegant approach is to allocate order no as DD to the fact table at line level. The difference is you might also have an accumulating snapshot fact table at claim level, but it's focus is really on monitoring claim lifecycle with various milestone dates and measuring lags between them.
The point is, don't create a dimension table that is almost the same size as the fact table. If you have Claim line items, then it's similar to order-line situation where an elegant approach is to allocate order no as DD to the fact table at line level. The difference is you might also have an accumulating snapshot fact table at claim level, but it's focus is really on monitoring claim lifecycle with various milestone dates and measuring lags between them.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Distinct Count Measure on Varchar field
The Claim Number is different from Order Number. Claim Numbers typically have logic in them. For example, where I work, the claim number contains the Year and Julian Date that it was entered into the system, as well as other data, and it has a suffix to differentiate if it's a normal claim vs a cloned claim, which is used when there are COB issues. We recently converted transaction systems and 2 years of history was loaded into the new transaction system with a new claim number. My Claim Dimension contains both the new claim number and the original claim number.
The claims processing systems can have a level below the line - in our case the processing policy code. Each Claim goes through the entire adjudication process and may get denied for several reasons. This can result in a Claim Line Processing Policy Fact table with many Processing Policies for each denied Claim Line.
My claim number is 15 characters (by the way, if all claim numbers are the same length, they should be a CHAR field as a VARCHAR(15) uses more space than a CHAR(15)). If you have 150 million records in your claim_line fact table, and you use the Claim Number as a DD, the Claim Number will use up 2.25GB. If the Claim Number is it's own dimension, then it uses 0.6GB in the Fact Table. If on average the Claim has 4 Lines, then the Claim Dimension with just the Claim Number and dimension key uses just under 1 GB, and uses a total of 1.6GB for both the dimension table and fact table. If you have a fact table at the Claim/Line/processing poliy, then you save even more space. And if you aggregate data to the Claim, then even more space is saved.
If you need to join the Claim/Line fact to the Claim/Line/processing Policy Fact during the loading of the database (in case you want to assign the primary Processing Policy to the Claim/Line Fact), performance is better because you are joining on Intergers instead of Char or varchar.
Selecting on the Claim Number is queries is faster because it has fewer rows to navigate (75% fewer than with a DD). In a query, a join between the Claim Fact and Claim dimension is faster than with a DD because you are bringing over less data from the disk. Ad if you want to build an index on the Claim Number, the index will use less space if it's on a dimension table than if its a DD.
I personally don't see the upside of using a Claim Number as DD other than it's one fewer dimension table to build.
The claims processing systems can have a level below the line - in our case the processing policy code. Each Claim goes through the entire adjudication process and may get denied for several reasons. This can result in a Claim Line Processing Policy Fact table with many Processing Policies for each denied Claim Line.
My claim number is 15 characters (by the way, if all claim numbers are the same length, they should be a CHAR field as a VARCHAR(15) uses more space than a CHAR(15)). If you have 150 million records in your claim_line fact table, and you use the Claim Number as a DD, the Claim Number will use up 2.25GB. If the Claim Number is it's own dimension, then it uses 0.6GB in the Fact Table. If on average the Claim has 4 Lines, then the Claim Dimension with just the Claim Number and dimension key uses just under 1 GB, and uses a total of 1.6GB for both the dimension table and fact table. If you have a fact table at the Claim/Line/processing poliy, then you save even more space. And if you aggregate data to the Claim, then even more space is saved.
If you need to join the Claim/Line fact to the Claim/Line/processing Policy Fact during the loading of the database (in case you want to assign the primary Processing Policy to the Claim/Line Fact), performance is better because you are joining on Intergers instead of Char or varchar.
Selecting on the Claim Number is queries is faster because it has fewer rows to navigate (75% fewer than with a DD). In a query, a join between the Claim Fact and Claim dimension is faster than with a DD because you are bringing over less data from the disk. Ad if you want to build an index on the Claim Number, the index will use less space if it's on a dimension table than if its a DD.
I personally don't see the upside of using a Claim Number as DD other than it's one fewer dimension table to build.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Distinct Count Measure on Varchar field
My claim_num needs to be a varchar because it contains both numbers and letters. This is no logic behind the number, it's just the way the source system assigns them.
My claim fact table is a transaction level table at the claim_line level. Right now, i have the claim_num setup as a DD, but having issues with the distinct count because of varchar.
I see both points in either having the claim_num as a DD or a separate fact. The only thing that I can think of to add to the Claim dimension would possibly be a claim description. All other claim attributes have their own dimensions.
I guess i'm still up in the air about either keeping the Claim_Num as a DD or creating a separate Claim dimension table with the Claim_Num as a surrogate FK in the fact table.
My claim fact table is a transaction level table at the claim_line level. Right now, i have the claim_num setup as a DD, but having issues with the distinct count because of varchar.
I see both points in either having the claim_num as a DD or a separate fact. The only thing that I can think of to add to the Claim dimension would possibly be a claim description. All other claim attributes have their own dimensions.
I guess i'm still up in the air about either keeping the Claim_Num as a DD or creating a separate Claim dimension table with the Claim_Num as a surrogate FK in the fact table.
scabral- Posts : 58
Join date : 2012-05-02
Re: Distinct Count Measure on Varchar field
I would keep as a DD. It makes sense.
It's really a matter of getting SSAS to work correctly. I think Mike had some suggestions.
It's really a matter of getting SSAS to work correctly. I think Mike had some suggestions.
Similar topics
» Patient Re-admission Measure (count)
» SQL Server 2008 - CHAR and table compression vs VARCHAR in Dimension Tables
» Dimension design
» Distinct count at different level than the grain
» Finding the grain with One-To-Many fact tables.
» SQL Server 2008 - CHAR and table compression vs VARCHAR in Dimension Tables
» Dimension design
» Distinct count at different level than the grain
» Finding the grain with One-To-Many fact tables.
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum