Calculating Fact table sizes and free form text
5 posters
Page 1 of 1
Calculating Fact table sizes and free form text
Iam looking at the fact table that i have designed and have a few questions.
1) Calculating size of fact table - i assume 4 bytes for integer, 2 for small int, 16 for timestamp, 8 for float measures and add all the sizes up, does that sound right?
2) There is a free form text field varchar(100) that somebody wants to include in the fact table. Argument is, if its not included in the fact table, anyway we need to join to a dimension which will be 1:1 to the fact table and this column will be used often. This is free form text, no way to dimensionalize it.
3) Kimball toolkit says include timestamps in fact table. For Accumulating snapshots with 9-10 date roles, this really increases fact table size.
Any comments?
Thanks,
Arun.
1) Calculating size of fact table - i assume 4 bytes for integer, 2 for small int, 16 for timestamp, 8 for float measures and add all the sizes up, does that sound right?
2) There is a free form text field varchar(100) that somebody wants to include in the fact table. Argument is, if its not included in the fact table, anyway we need to join to a dimension which will be 1:1 to the fact table and this column will be used often. This is free form text, no way to dimensionalize it.
3) Kimball toolkit says include timestamps in fact table. For Accumulating snapshots with 9-10 date roles, this really increases fact table size.
Any comments?
Thanks,
Arun.
arfoot653@gmail.com- Posts : 9
Join date : 2012-02-01
Re: Calculating Fact table sizes and free form text
It really depends. Consult the documentation that came with your DBMS for data type sizes. Also, many compress things and there is row overhead to consider. It is far easier to load the data then check the sizes. Also, indexes make up such a large portion of the total space used on most DBMS systems that raw data size is almost meaningless. At best you are going to get a ballpark number that you then multiply by 3 or 4 to make up for indexing, transaction logging and other overhead. So, precision isn't really an issue... just make sure the number is pessimistic if you are using it for disk procurement.
As far as free form text goes, it also depends. Some DBMS systems will allow you to place CLOB type data in a separate 'table' with a DB maintained link to the rest of the row. The problem with physically including it in the same row and table is it bloats the table and slows down queries for everybody, not just the ones needing the text. By separating it, only those using the text take the performance hit.
What Kimball means is that if time is important, it should not be part of the date dimension, but rather stored as is in the fact. You would still have FKs to the date dimension. The question is, is time really necessary for all the 9-10 date roles you mention? If there are attributes related to time, you usually have a separate time dimension (i.e. just 24 hours worth) at some level of precision (usually minutes is enough).
As far as free form text goes, it also depends. Some DBMS systems will allow you to place CLOB type data in a separate 'table' with a DB maintained link to the rest of the row. The problem with physically including it in the same row and table is it bloats the table and slows down queries for everybody, not just the ones needing the text. By separating it, only those using the text take the performance hit.
What Kimball means is that if time is important, it should not be part of the date dimension, but rather stored as is in the fact. You would still have FKs to the date dimension. The question is, is time really necessary for all the 9-10 date roles you mention? If there are attributes related to time, you usually have a separate time dimension (i.e. just 24 hours worth) at some level of precision (usually minutes is enough).
Re: Calculating Fact table sizes and free form text
If your requriement requires accumlated fact then you need to use accumlated fact,there is no hiding.
But you can do one thing is to put date_key there in fact instead of dates ( if you required dates and not very precise timestamps).
ngalemmo -
what do you mean by bloating??
But you can do one thing is to put date_key there in fact instead of dates ( if you required dates and not very precise timestamps).
ngalemmo -
The problem with physically including it in the same row and table is it bloats the table and slows down queries for everybody
what do you mean by bloating??
Vishy- Posts : 75
Join date : 2012-02-28
Location : India
Re: Calculating Fact table sizes and free form text
A bloated feeling is what you feel after you eat too much food. A bloated fact table has lots of textual columns which ultimately decrease the performance of the fact table since it takes more bytes to retrieve one row of data. The astute will note that DBMS' don't fetch a row of data, they fectch a block of data. The bigger the row, the less rows retrived per block.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Calculating Fact table sizes and free form text
1. Sounds like you are using SQL Server as those numbers sound right to me.arfoot653@gmail.com wrote:Iam looking at the fact table that i have designed and have a few questions.
1) Calculating size of fact table - i assume 4 bytes for integer, 2 for small int, 16 for timestamp, 8 for float measures and add all the sizes up, does that sound right?
2) There is a free form text field varchar(100) that somebody wants to include in the fact table. Argument is, if its not included in the fact table, anyway we need to join to a dimension which will be 1:1 to the fact table and this column will be used often. This is free form text, no way to dimensionalize it.
3) Kimball toolkit says include timestamps in fact table. For Accumulating snapshots with 9-10 date roles, this really increases fact table size.
Any comments?
Thanks,
Arun.
2. Don't include it in the fact tavle. It doesn't matter if it will be used often - everytime I've had this kind of freeform text column users want to look at a only a small set of data, generally a hundred records or less. Any performance impact here will be minimal. How often will you be doing sums or other aggregates? This is where you will see a performance hit. See this post on my blog to see the actual impacts on performance:
http://jsimonbi.wordpress.com/2011/05/16/dimensional-modeling-worst-practices/
3. If you already store the timestamp in a transactional fact table, I wouldn't bother putting it in an Accumulating Snapshot unless there was a clear business requirement for it.
Re: Calculating Fact table sizes and free form text
John,
I read your article and it is good, but sometime we need to have degenerate dimensions which are of char type, what do you think of that?? Is there any criteria like we should store char type degenerated dimension which are upto 2-3 chars length, anything more then this should take a proper dim shape.
I read your article and it is good, but sometime we need to have degenerate dimensions which are of char type, what do you think of that?? Is there any criteria like we should store char type degenerated dimension which are upto 2-3 chars length, anything more then this should take a proper dim shape.
Vishy- Posts : 75
Join date : 2012-02-28
Location : India
Re: Calculating Fact table sizes and free form text
Vishy wrote:John,
I read your article and it is good, but sometime we need to have degenerate dimensions which are of char type, what do you think of that?? Is there any criteria like we should store char type degenerated dimension which are upto 2-3 chars length, anything more then this should take a proper dim shape.
That's fine Vishy. If we need to have multiple degenerate dimension columns, then in the pastI have put them in a separate dimension. But if you only have one or two small ones, I don't think you'll notice much of a performance impact.
Similar topics
» Free Form Text Attribute in Fact Table
» Free form text in transaction table
» Handling multiple free form text comments
» Free text fields in Transaction table
» Descriptive Text in Fact table
» Free form text in transaction table
» Handling multiple free form text comments
» Free text fields in Transaction table
» Descriptive Text in Fact table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum