Big data vs rdbms
Hi,
I am new to big data. Can anybody provide a comparison between RDBMS and big data? In what ways will big data change the DWH architecture and the way we model the database?
dbadwh- Posts : 31
Join date : 2011-09-30
Re: Big data vs rdbms
I guess it depends on your definition of 'big data'.
There is the size distinction, which is primarily a hardware and physical modeling issue rather than a logical modeling challenge. Then there is the structured versus unstructured data distinction.
When it comes to unstructured data, it's a matter of deciding whether you want to structure it or leave it unstructured. Frankly, for analysis, I tend to lean towards imposing structure on the data so you can deal with it using more traditional SQL-based tools. For very high volumes, Hadoop or some other highly parallel framework is useful for handling the parsing and structuring processes before loading into an MPP database platform.
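To make the "parse and structure before loading" step concrete, here is a minimal single-machine sketch in Python. The log layout, the `Hit` record and the function names are all made up for illustration; a real job would run the same kind of logic in parallel across many nodes and write the rows out for a bulk load.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Hypothetical raw log layout: "<ip> <timestamp> <method> <path> <status>"
LINE_RE = re.compile(
    r"^(?P<ip>\S+) (?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)


class Hit(NamedTuple):
    """One structured row, ready to load into a relational table."""
    ip: str
    ts: str
    method: str
    path: str
    status: int


def parse_line(line: str) -> Optional[Hit]:
    """Turn one raw line into a row, or None if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Hit(m["ip"], m["ts"], m["method"], m["path"], int(m["status"]))


def structure(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield only the lines that could be parsed; unparseable lines are skipped."""
    for line in lines:
        hit = parse_line(line)
        if hit is not None:
            yield hit


raw = [
    "10.0.0.1 2014-09-30T12:00:00 GET /index.html 200",
    "garbage that does not match",
    "10.0.0.2 2014-09-30T12:00:01 POST /login 302",
]
rows = list(structure(raw))
```

Once the data is in this shape, it is just rows and columns as far as the database and the SQL tools are concerned.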
The problem with leaving it unstructured and using Hadoop for analytics is the labor required to do it. Essentially, a Hadoop process requires writing a bunch of Java code, which usually means IT needs to be heavily involved in coding whatever needs to get done. If you structure it and put it in a relational database, end users can be much more self-sufficient.
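A rough illustration of the labor difference, written in Python rather than Java and using an in-memory SQLite database as a stand-in for the relational side. The `hits` table and the sample records are invented for the example. The same count per status code is computed twice: once with hand-written map and reduce steps, once with a single SQL statement.

```python
import sqlite3
from collections import defaultdict

# Made-up sample data: (path, HTTP status)
records = [
    ("/index.html", 200),
    ("/login", 302),
    ("/index.html", 200),
    ("/missing", 404),
]


# Option 1: hand-coded map and reduce steps, which someone has to write and maintain.
def map_phase(recs):
    for _path, status in recs:
        yield status, 1  # emit (key, value) pairs


def reduce_phase(pairs):
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value  # sum the values for each key
    return dict(totals)


hand_coded = reduce_phase(map_phase(records))

# Option 2: load the structured rows into a table and ask the question in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (path TEXT, status INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?)", records)
via_sql = dict(
    conn.execute("SELECT status, COUNT(*) FROM hits GROUP BY status").fetchall()
)
conn.close()
```

Both give the same answer, but only the second is something an end user with a query tool can change without asking a developer.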
When dealing with an MPP platform, there are physical modeling considerations (distribution, organization, indexing, etc.) to achieve optimal performance. What you do varies widely depending on the particular database system.
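As an example of why distribution matters, here is a toy Python sketch of hash distribution. It is not how any particular MPP product assigns rows; the node count, the CRC32 hash and the sample table are assumptions chosen only to show the effect of the distribution key.

```python
import zlib
from collections import Counter

NODES = 4  # hypothetical number of data nodes


def node_for(key: str, nodes: int = NODES) -> int:
    """Assign a row to a node by hashing its distribution key."""
    return zlib.crc32(key.encode("utf-8")) % nodes


def rows_per_node(rows, column):
    """Count how many rows land on each node when distributing on `column`."""
    return Counter(node_for(str(row[column])) for row in rows)


# Made-up table: 1000 orders, 900 from one country and 100 from another.
rows = [{"order_id": i, "country": "US" if i % 10 else "CA"} for i in range(1000)]

by_order = rows_per_node(rows, "order_id")   # high-cardinality key
by_country = rows_per_node(rows, "country")  # low-cardinality, skewed key
```

Distributing on `country` can use at most two nodes, and one of them holds at least 900 of the 1000 rows, so most of the hardware sits idle. A high-cardinality key such as `order_id` would normally spread the rows far more evenly.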
On the modeling side, choosing between a dimensional model and a 3NF model depends a lot on the peculiarities of the hardware you are using. For example, star schemas perform very well on a Netezza platform (aka IBM PureData) but not so well on a Teradata platform.
Re: Big data vs rdbms
Nowadays, more tools are coming into the picture to help end users use Hadoop without the extra effort of writing Java code, for example Hive, Presto, Tajo, etc. However, for low-latency queries, the performance is not as good as an RDBMS. If you have a huge amount of data and will do massive processing on it, Hadoop will be more appropriate.
rendybjunior- Posts : 7
Join date : 2014-09-30