Columnar database and Star Schema
2 posters
Page 1 of 1
Columnar database and Star Schema
I guess the star schema was designed keeping raw based RDBMS in mind and it offers the following befits as against the normalized OLTP database. However the columnar database has become quite matured in recent past i.e Sybase IQ. the questions is does Star schema still a good data model to use in columnar database?
some of advantages for start schema as i understand are: (let me know if i am missing something)
- Data for DW are to be kept in denormalized form for better query performance but this results into data explosion issues, which is very well addressed in Star Schema.
- Help arrange pre aggregated data
- Makes the query simple for reporting purpose
- Easy to analyze data with different dimensions without actually using the multi-dimensional DBMS
- star schema model is easy understand for non technical users
However Sybase IQ offers high level of compresions 50-80% in some cases. TPCH benchmark shows it reduces the database size by 3 to 10 times as compare to other market leading RDBMS. (http://www.sybase.com/content/1025531/tpch.v2.pdf)
Also IQ compes with smart indexing techniques like HG, LG, Bitmap, Bitwise etc, which address the data explosion issues by storing the unique data in field only once - similar to what we do in Dimension table. The bitwise index also helps to process avarage and aggregate quite faster.
-Nilesh
some of advantages for start schema as i understand are: (let me know if i am missing something)
- Data for DW are to be kept in denormalized form for better query performance but this results into data explosion issues, which is very well addressed in Star Schema.
- Help arrange pre aggregated data
- Makes the query simple for reporting purpose
- Easy to analyze data with different dimensions without actually using the multi-dimensional DBMS
- star schema model is easy understand for non technical users
However Sybase IQ offers high level of compresions 50-80% in some cases. TPCH benchmark shows it reduces the database size by 3 to 10 times as compare to other market leading RDBMS. (http://www.sybase.com/content/1025531/tpch.v2.pdf)
Also IQ compes with smart indexing techniques like HG, LG, Bitmap, Bitwise etc, which address the data explosion issues by storing the unique data in field only once - similar to what we do in Dimension table. The bitwise index also helps to process avarage and aggregate quite faster.
-Nilesh
ngajjar1- Posts : 4
Join date : 2010-08-27
Re: Columnar database and Star Schema
Star schemas work extremely well in columnar databases. Better than 3NF designs.
But, because columnar databases have efficient compression and a lot of advantages for locating data in large tables, doesn't mean you throw data modeling best practices out the window. You do not gain anything by implementing 'big flat tables' other than, maybe, shaving a few milliseconds off the query time.
Data modelling is mainly about providing organization and understanding to the data. The advantages you stated hold true regardless of the database platform you use.
About the only design decision that changes when using a columnar database is what to do with row level free form comments that one occasionally runs into. In a row oriented database the best practice is to put them in a junk dimension so the physical size of the data doesn't impead performance of queries that don't need the data. That is not the case with a columnar arrangement. In such a database, all it means is more vectors which are ignored if queries don't need the data. And, in fact, the number and size of such vectors would be the same had you created a separate dimension anyway... so there is no point in breaking such data out.
Also, to call the effect of denormalization of dimensional data an 'explosion' is a bit of an overstatement. Dimensional data is a small fraction of the overall storage requirements of a dimensional data warehouse, yet provides significant (often orders of magnitude) improvement over the query performance of a normalized design in a traditional row oriented database.
But, because columnar databases have efficient compression and a lot of advantages for locating data in large tables, doesn't mean you throw data modeling best practices out the window. You do not gain anything by implementing 'big flat tables' other than, maybe, shaving a few milliseconds off the query time.
Data modelling is mainly about providing organization and understanding to the data. The advantages you stated hold true regardless of the database platform you use.
About the only design decision that changes when using a columnar database is what to do with row level free form comments that one occasionally runs into. In a row oriented database the best practice is to put them in a junk dimension so the physical size of the data doesn't impead performance of queries that don't need the data. That is not the case with a columnar arrangement. In such a database, all it means is more vectors which are ignored if queries don't need the data. And, in fact, the number and size of such vectors would be the same had you created a separate dimension anyway... so there is no point in breaking such data out.
Also, to call the effect of denormalization of dimensional data an 'explosion' is a bit of an overstatement. Dimensional data is a small fraction of the overall storage requirements of a dimensional data warehouse, yet provides significant (often orders of magnitude) improvement over the query performance of a normalized design in a traditional row oriented database.
Last edited by ngalemmo on Fri Aug 27, 2010 1:02 pm; edited 1 time in total
Re: Columnar database and Star Schema
Thanks ngalemmo - I didn't meant to throw the data modeling practice (like star schema) our of window, sorry if i was sounded like that. Rational for the post was to confirm my understanding and to know more from Guru like you.
what you are saying makes sense - about data organization and little cost of joins.
Thanks
-nilesh
what you are saying makes sense - about data organization and little cost of joins.
Thanks
-nilesh
ngajjar1- Posts : 4
Join date : 2010-08-27
Similar topics
» In-Memory Database - Does star schema still relevant ?
» Designing for columnar database
» Large Degenerate Dimension Values
» Start schema in relational database vs OLAP
» Best Practice Database Schema/user layout
» Designing for columnar database
» Large Degenerate Dimension Values
» Start schema in relational database vs OLAP
» Best Practice Database Schema/user layout
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum