Designing for columnar database
4 posters
Page 1 of 1
Designing for columnar database
We are implementing our EDW layer in a highly normalized manner on a columnar database. The current design approach for the transactional or business event data is to have single measure/value column table (plus the keys) to hold all the values and have a corresponding table with the column name. In contrast, the other approach would be to have each column defined in the table. The trade-off was centered on easily extending the EDW since adding a new column would not require structure change.
I'm trying to find some information on whether this is a good design approach when using a columnar database. My gut tells me no as the single column design would seem to work against the columnar database approach of storing data by column. Simple example to illustrate...1 million row table with 10 measure columns. The single column table means this becomes column table/index of 10 million rows versus 10 column tables/indexes with 1 million rows each. In addition to this, there is always a join to the other table to know which column you are accessing....so lots of potentially 10 million row joins to get back to this is the column in the selection query. I understand that compression could be pretty good since the set of value could have even more redundancy, but that could also mean distribution of the data is really messed up because all the various columns are intermixed.
Any thoughts or factual information on this?
Thank you, Andy
I'm trying to find some information on whether this is a good design approach when using a columnar database. My gut tells me no as the single column design would seem to work against the columnar database approach of storing data by column. Simple example to illustrate...1 million row table with 10 measure columns. The single column table means this becomes column table/index of 10 million rows versus 10 column tables/indexes with 1 million rows each. In addition to this, there is always a join to the other table to know which column you are accessing....so lots of potentially 10 million row joins to get back to this is the column in the selection query. I understand that compression could be pretty good since the set of value could have even more redundancy, but that could also mean distribution of the data is really messed up because all the various columns are intermixed.
Any thoughts or factual information on this?
Thank you, Andy
Aheldt- Posts : 1
Join date : 2011-12-08
Re: Designing for columnar database
I know this is late to answer, but are you still looking for an answer? I have one question before proposing an answer: is your columnar database an inverted column database such as IQ or Vertica, or a wide-column database like Google's BigTable or Cassandra? the answer to this will significantly affect my answer.
thomashaughey- Posts : 2
Join date : 2012-05-10
Re: Designing for columnar database
.
Hi
it is good to know your views on both inverted and wide options. .can you pl help ?
regards
Sri
ksramjee- Posts : 1
Join date : 2012-06-27
Re: Designing for columnar database
High levels of normalization work against you in a columnar database, such as Sybase IQ or Vertica. Using name/value pairs in such a database is probably the worst thing you can do.
Similar topics
» Columnar database and Star Schema
» Large Degenerate Dimension Values
» star schema designing
» Help designing star schema
» Designing a accumlating snapshot
» Large Degenerate Dimension Values
» star schema designing
» Help designing star schema
» Designing a accumlating snapshot
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum