Designing for columnar database

Aheldt Thu Dec 08, 2011 11:27 am

We are implementing our EDW layer in a highly normalized manner on a columnar database. The current design approach for the transactional or business event data is to have single measure/value column table (plus the keys) to hold all the values and have a corresponding table with the column name. In contrast, the other approach would be to have each column defined in the table. The trade-off was centered on easily extending the EDW since adding a new column would not require structure change.

I'm trying to find some information on whether this is a good design approach when using a columnar database. My gut tells me no as the single column design would seem to work against the columnar database approach of storing data by column. Simple example to illustrate...1 million row table with 10 measure columns. The single column table means this becomes column table/index of 10 million rows versus 10 column tables/indexes with 1 million rows each. In addition to this, there is always a join to the other table to know which column you are accessing....so lots of potentially 10 million row joins to get back to this is the column in the selection query. I understand that compression could be pretty good since the set of value could have even more redundancy, but that could also mean distribution of the data is really messed up because all the various columns are intermixed.

Any thoughts or factual information on this?

Thank you, Andy

thomashaughey Wed May 30, 2012 6:49 pm

I know this is late to answer, but are you still looking for an answer? I have one question before proposing an answer: is your columnar database an inverted column database such as IQ or Vertica, or a wide-column database like Google's BigTable or Cassandra? the answer to this will significantly affect my answer.

ksramjee Wed Jun 27, 2012 6:50 am

.
Hi
it is good to know your views on both inverted and wide options. .can you pl help ?

regards
Sri

ngalemmo Wed Jun 27, 2012 1:50 pm

High levels of normalization work against you in a columnar database, such as Sybase IQ or Vertica. Using name/value pairs in such a database is probably the worst thing you can do.

Sponsored content

Designing for columnar database

Designing for columnar database

Re: Designing for columnar database

Re: Designing for columnar database

Re: Designing for columnar database

Re: Designing for columnar database