Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Large Degenerate Dimension Values

+2
BoxesAndLines
Dave Jermy
6 posters

Go down

Large Degenerate Dimension Values Empty Large Degenerate Dimension Values

Post  Dave Jermy Mon Oct 03, 2011 6:10 am

I am designing a new dmensional model for an insurance company and users want to analyse the documents (stored in sharepoint) that have been attached to various entities (policies, claims etc.)

I have designed a Document dimension to hold some of the document metadata (document type, file type, document source). This is a relatively small dimension with just a couple of thousand rows. The fact table (grain: 1 row per change to a document's metadata) has FKs to the Date (for created date), Employee (for creator), Policy and Document dimensions and a small number of measures (count, filesize etc.)

This leaves me with a couple of attributes to fit into the model: Document Title and Document URL. The second is unique across all documents and the first is almost so, so my first thought was to put them onto the fact table as degenerate dimensions.

However, they are very long varchar columns (255 and 2000 characters respectively). Looking at the existing documents, the average combined length of the values in these columns is 157 characters (min 99, max 287). Given that the fact table is quite slender (<30 bytes), adding them seems wrong and may impact on query performance. But, I don't want to add them into the Document dimension, either as it would then be almost 1 to 1 with the fact table.

Any suggestions on the best way to model this?

Dave Jermy

Posts : 33
Join date : 2011-03-24
Location : London, UK

Back to top Go down

Large Degenerate Dimension Values Empty Re: Large Degenerate Dimension Values

Post  BoxesAndLines Mon Oct 03, 2011 10:42 am

You've identfied the two methods. Do some volume testing to see which works best. If the performance is similar go with the dimension solution since it is cleaner.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Large Degenerate Dimension Values Empty Re: Large Degenerate Dimension Values

Post  VHF Mon Oct 03, 2011 12:06 pm

Because your fact table represents an instance of a document, it seems like that is where Document Title and Document URL should go as DD. I suspect in this case the majority of usage of this fact table will involve at least one of these fields, so there isn't a lot of performance savings to be gained from trying to move them elsewhere. If most of the useage of this fact table involved aggregating other measures, then it might be worthwhile trying to relocate the long DD columns to keep the fact table narrow.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

Back to top Go down

Large Degenerate Dimension Values Empty Re: Large Degenerate Dimension Values

Post  ngalemmo Mon Oct 03, 2011 2:39 pm

As VHF hinted, it becomes a matter of how often these values are appear in queries. Adding them as DD's, even if it is a document fact table, will severly impact performance. Adding 150+ bytes to a < 30 byte row will have a dramatic hit to I/O. If 80% of the queries need the data, DD's may be the correct option, otherwise a separate dimension, even if the cardinality is high, would be the better choice.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Large Degenerate Dimension Values Empty Re: Large Degenerate Dimension Values

Post  Dave Jermy Tue Oct 04, 2011 4:21 am

Thanks for the responses guys - I'll dig into the requirements a little deeper and do some performance testing.

Dave Jermy

Posts : 33
Join date : 2011-03-24
Location : London, UK

Back to top Go down

Large Degenerate Dimension Values Empty Re: Large Degenerate Dimension Values

Post  Mike Honey Tue Oct 04, 2011 6:58 pm

Hi Dave,

Based on your description of the Fact table grain, I'd definitely go for storing them on the Dimension. I imagine there will be several Fact records for each Dimension record so you would be duplicating data if you put those attributes on the Fact. It would also be much easier to manage changes to those attributes if they are on the Dimension.

Good luck!
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

Large Degenerate Dimension Values Empty Columnar-Oriented Database Eliminates Inefficiency of Columns Not Used by a Query

Post  Alan Musnikow Thu Oct 06, 2011 5:06 pm

This issue demonstrates a benefit of a columnar-oriented database. Adding the columns to the fact table in a columnar-oriented database, such as Infobright, would impose no performance penalty on queries that do not use the columns.

In addition, Infobright would very heavily compress these columns.

In other words: "Adding 150+ bytes to a < 30 byte row will have" no impact on I/O in a columnar-oriented database for queries that do not use the added columns.
Alan Musnikow
Alan Musnikow

Posts : 6
Join date : 2010-06-17
Location : Lexington, Massachusetts, U.S.

http://musnikow.com/

Back to top Go down

Large Degenerate Dimension Values Empty Re: Large Degenerate Dimension Values

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum