estimate of the approximate size of the data mart
4 posters
Page 1 of 1
estimate of the approximate size of the data mart
If there are 250,000 sales records in a data mart; how would you estimate of the approximate size of the data mart and each of its tables?
Regards,
Liamptk
Regards,
Liamptk
liamptk- Posts : 6
Join date : 2016-03-13
Re: estimate of the approximate size of the data mart
If the tables already exist, a dbms systems keeps statistics on the and you should be able to get average row width and table size information directly.
If the tables don't exist, there is an easy way and a hard way. The hard way is to research the size of each data type, the overhead for storing a row, understand index structures and so on. Frankly, it isn't worth the bother. The easy way is to roughly estimate row width (chars are 1 byte, nchar is 2 to 4 bytes per character, and your manual should tell you the width of numeric fields), multiply by the number of rows, then double or triple it to account for indexes. That should get you close.
Keep in mind, 250K rows is not a lot of data. Just swag it at 1GB and that should be plenty.
If the tables don't exist, there is an easy way and a hard way. The hard way is to research the size of each data type, the overhead for storing a row, understand index structures and so on. Frankly, it isn't worth the bother. The easy way is to roughly estimate row width (chars are 1 byte, nchar is 2 to 4 bytes per character, and your manual should tell you the width of numeric fields), multiply by the number of rows, then double or triple it to account for indexes. That should get you close.
Keep in mind, 250K rows is not a lot of data. Just swag it at 1GB and that should be plenty.
Re: estimate of the approximate size of the data mart
I appreciate the response, unfortunately this is an assignment question of mine so I'm just gauging how I begin to even approach this.
I assume working out the size of an average row, by determining the byte sizes of each value, then multiplying that by the 250,000 is the best approach...
I assume working out the size of an average row, by determining the byte sizes of each value, then multiplying that by the 250,000 is the best approach...
liamptk- Posts : 6
Join date : 2016-03-13
Re: estimate of the approximate size of the data mart
How accurate are you expected to be?
Work out the contents of your dimensions and facts, and the data types of each column.
Do you have staging tables in the warehouse? You'll need to account for their contents as well.
Most database documentation will provide the number of bytes of storage for each data type. That isn't enough, because you'll need to understand internal structure of the database to account for page and file structures.
You'll also need to account for indexes, if supported by the database.
Is compression involved? A column store? Either of those will significantly reduce the size.
I think ngalemmo's answer is the most reasonable
Work out the contents of your dimensions and facts, and the data types of each column.
Do you have staging tables in the warehouse? You'll need to account for their contents as well.
Most database documentation will provide the number of bytes of storage for each data type. That isn't enough, because you'll need to understand internal structure of the database to account for page and file structures.
You'll also need to account for indexes, if supported by the database.
Is compression involved? A column store? Either of those will significantly reduce the size.
I think ngalemmo's answer is the most reasonable
Re: estimate of the approximate size of the data mart
Ok. Since it is an academic exercise, you got to do it the hard way.
Re: estimate of the approximate size of the data mart
Could anyone clarify further what this means..."discuss the relative size of each of its tables". Would this be the staging area (fact and dimension tables)?
liamptk- Posts : 6
Join date : 2016-03-13
Re: estimate of the approximate size of the data mart
The staging area would be a separate set of tables (typically in a different database) used as an intermediate step in loading the data mart. Because your assignment specifically said for you to determine the size of the tables in the data mart, I would disregard the staging area for purposes of this exercise. Do you know what tables exist in the data mart besides the fact table? For example, is there a customer dimension table and a product dimension table?
You are on the right track that you need to calculate the row size for each table, then multiply by the number of records in the table.
You are on the right track that you need to calculate the row size for each table, then multiply by the number of records in the table.
Last edited by VHF on Mon Mar 14, 2016 2:17 pm; edited 2 times in total (Reason for editing : added clarification)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Similar topics
» raw data size versus bufferpool/cache size
» Enormous data size
» difference between data mart and data warehouse at logical/physical level
» Data Mart/Dimensional Data Store Definition.
» Data Mart Does Not Equal Data Warehouse
» Enormous data size
» difference between data mart and data warehouse at logical/physical level
» Data Mart/Dimensional Data Store Definition.
» Data Mart Does Not Equal Data Warehouse
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum