Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Distinct Count Measure on Varchar field

+2
Jeff Smith
scabral
6 posters

Go down

Distinct Count Measure on Varchar field Empty Distinct Count Measure on Varchar field

Post  scabral Thu Dec 06, 2012 4:54 pm

I need to create a Distinct count measure on a varchar field that exists in my fact table. the field is actually a Degenerate Dimension called Claim_Num.

When i try to create the distinct count measure on this field, it gives me the following error:

"SSAS distinct count measures requires numeric data type"

Is there a workaround for this? Possibly using MDX calculation?

thanks
Scott


scabral

Posts : 58
Join date : 2012-05-02

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  Jeff Smith Thu Dec 06, 2012 7:25 pm

Move the Claim_Num to a dimension table and replace it in the fact table with a dimension key. Depending upon the number of distinct claim numbers, the size of the VARCHAR field, this could result in less space being used by the fact plus the new dimension table.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  scabral Thu Dec 06, 2012 9:48 pm

Jeff,

the issue with that is the dimension would not have any attributes. All of the attributes that are part of the Claim_Num are separate dimensions and are all FKs in the fact table. Claim_Num would be the only field in the dimension

scabral

Posts : 58
Join date : 2012-05-02

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  Jeff Smith Thu Dec 06, 2012 10:16 pm

Nothing says you can't have a dimension table with 2 fields. Besides, a lot of times Claim Numbers have logic in them that can be broken down into more usable forms.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  BoxesAndLines Fri Dec 07, 2012 10:34 am

Here are a couple other ideas.

Add a numeric column whose value is mapped to the claim number.

Create a claim level fact table to track the claim metrics (e.g. number of claims). This will perform better than distinct counts at the claim line level.

BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  scabral Fri Dec 07, 2012 3:19 pm

BoxesAndLines,

regarding your second comment, If I create a Claim Summary fact table and a Claim Line fact table, how would I join both tables so that I can see all fields?

here are the fields with regard to where they relate to either Claim Summary or Claim Line:

Claim Summary
Company_Name
Line of business
Risk_State
Agency_Name
Insured_Name
Occurrence_Status
Primary_Assignee_Name
Independent_Adjuster_Name
Loss_Date
Report_Date

Claim Line
Close_Date
FEATURE_TYPE_DESCRIPTION
Loss_Type
Claimant_Name
PERIL
Unit_Description
Loss Payment Amount
Loss Reserve Amount
Expense Payment Amount
Expense Reserve Amount

Thanks
scott

scabral

Posts : 58
Join date : 2012-05-02

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  Jeff Smith Fri Dec 07, 2012 4:47 pm

What Boxes and lines is suggesting is that you basically create a Claim Level Aggregate from the Claim Line Level table. You add a measure to the Claim level Aggregate called something like "Claim_Count" that has nothing but 1s in it.

The Claim level fact (or aggregate) would look a lot like your Claim Line Level Fact expect without the Line Level Dimensions.

This will work except when someone wants to know the number of claims with a particular piece of line level information. I work in dental insurance. If I someone wanted to know the number of claims for 2011, then Boxes' solution would work. But if someone wanted to know how many claims for 2011 with a filling or extraction, then it wouldn't work because the procedure is at the line level.

My Claim_Number is in it's own Dimension with a few other attributes. A Claim_Dimension could include such things as Original Open Date or Incident Date and Last Update Date, things that are at the Claim level that don't change much.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  Mike Honey Sun Dec 09, 2012 10:44 pm

Hi Jeff,

SSAS (2005-2012 multidimensional) does support DistinctCount measures based on non-numeric Source columns e.g. WChar. The DataType of the Measure itself must be numeric.

This usually works very elegantly for any convoluted query the users can come up with across any combination of dimensions or facts. Scalability can be a challenge - follow the recommendations to put Distinct Counts in a separate Measure Group and define Aggregations (start using the Wizard) on that Measure Group.

Good luck!
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  hang Mon Dec 10, 2012 5:21 pm

Agreed with Mike. It's really about understanding how to use DISTINCT function in SSAS, and I don't think the problem has anything to do with your model. Claim can typically be modeled as an accumulating snapshot fact, in which case the Claim No is really nothing but a degenerate dimension in all your fact tables, as all the relevant attributes on claim have been grouped into respective dimensions around the Claim accumulating snapshot fact table, or any other type of fact table at claim level.

The point is, don't create a dimension table that is almost the same size as the fact table. If you have Claim line items, then it's similar to order-line situation where an elegant approach is to allocate order no as DD to the fact table at line level. The difference is you might also have an accumulating snapshot fact table at claim level, but it's focus is really on monitoring claim lifecycle with various milestone dates and measuring lags between them.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  Jeff Smith Mon Dec 10, 2012 6:57 pm

The Claim Number is different from Order Number. Claim Numbers typically have logic in them. For example, where I work, the claim number contains the Year and Julian Date that it was entered into the system, as well as other data, and it has a suffix to differentiate if it's a normal claim vs a cloned claim, which is used when there are COB issues. We recently converted transaction systems and 2 years of history was loaded into the new transaction system with a new claim number. My Claim Dimension contains both the new claim number and the original claim number.

The claims processing systems can have a level below the line - in our case the processing policy code. Each Claim goes through the entire adjudication process and may get denied for several reasons. This can result in a Claim Line Processing Policy Fact table with many Processing Policies for each denied Claim Line.

My claim number is 15 characters (by the way, if all claim numbers are the same length, they should be a CHAR field as a VARCHAR(15) uses more space than a CHAR(15)). If you have 150 million records in your claim_line fact table, and you use the Claim Number as a DD, the Claim Number will use up 2.25GB. If the Claim Number is it's own dimension, then it uses 0.6GB in the Fact Table. If on average the Claim has 4 Lines, then the Claim Dimension with just the Claim Number and dimension key uses just under 1 GB, and uses a total of 1.6GB for both the dimension table and fact table. If you have a fact table at the Claim/Line/processing poliy, then you save even more space. And if you aggregate data to the Claim, then even more space is saved.

If you need to join the Claim/Line fact to the Claim/Line/processing Policy Fact during the loading of the database (in case you want to assign the primary Processing Policy to the Claim/Line Fact), performance is better because you are joining on Intergers instead of Char or varchar.

Selecting on the Claim Number is queries is faster because it has fewer rows to navigate (75% fewer than with a DD). In a query, a join between the Claim Fact and Claim dimension is faster than with a DD because you are bringing over less data from the disk. Ad if you want to build an index on the Claim Number, the index will use less space if it's on a dimension table than if its a DD.

I personally don't see the upside of using a Claim Number as DD other than it's one fewer dimension table to build.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  scabral Mon Dec 10, 2012 11:09 pm

My claim_num needs to be a varchar because it contains both numbers and letters. This is no logic behind the number, it's just the way the source system assigns them.

My claim fact table is a transaction level table at the claim_line level. Right now, i have the claim_num setup as a DD, but having issues with the distinct count because of varchar.

I see both points in either having the claim_num as a DD or a separate fact. The only thing that I can think of to add to the Claim dimension would possibly be a claim description. All other claim attributes have their own dimensions.

I guess i'm still up in the air about either keeping the Claim_Num as a DD or creating a separate Claim dimension table with the Claim_Num as a surrogate FK in the fact table.

scabral

Posts : 58
Join date : 2012-05-02

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  ngalemmo Tue Dec 11, 2012 3:41 pm

I would keep as a DD. It makes sense.

It's really a matter of getting SSAS to work correctly. I think Mike had some suggestions.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Distinct Count Measure on Varchar field Empty Re: Distinct Count Measure on Varchar field

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum