Integrating Hadoop environment with a DW environment
I would like to know the different ways there are to integrate a DW environment with a Hadoop environment.
By DW I mean the typical environment: PowerCenter + Teradata + MicroStrategy.
My first option for integrating a Hadoop cluster with the DW is to use Hadoop as a first stage. I put all my source data (raw data) in the Hadoop cluster (structured, semi-structured and unstructured). The structured part of my source data I load via ETL into the Teradata RDBMS.
This structured data is the same data I was integrating into the DW before adding Hadoop to my environment.
The big data that I cannot handle in my DW stays in Hadoop, where I process and analyze it.
After processing, aggregating and consolidating, some of that big data will be transferred to the DW in order to enrich it.
My question is:
Is there another possible and useful architecture for integrating a typical DW and a Hadoop cluster? I suppose so; any advice will be greatly appreciated.
Thanks in advance,
Juan
juanvg1972- Posts : 25
Join date : 2015-05-05
Re: Integrating Hadoop environment with a DW environment
Well... yeah, there are other ways to do this.
I do not recommend throwing away existing processes for the sole purpose of using Hadoop, as in: "I put all my source data (rawdata) in the Hadoop cluster (structured, semi-structured and non structured)." If you have an existing DW and already have processes in place, why move them to Hadoop?
Each platform provides a significant advantage depending on the nature of the data. Structured data performs extremely well in a traditional SQL environment, while unstructured, code-driven processing performs very well in a Hadoop environment.
Both are important components and should be leveraged for their strengths.
For example, you may have Hadoop-based analytics that identify consumer preferences from tweets. This analysis produces structured information, which should then be integrated into the relational data warehouse for further use. The original data can be retained on Hadoop or discarded as desired. It would not go to the relational DW.
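To make the tweet example a bit more concrete, here is a rough sketch in plain Python of "unstructured text in, structured rows out". Everything in it is made up for illustration: the sample tweets, the `PRODUCTS` keyword list, and the column names are assumptions, and simple keyword matching only stands in for whatever real analytics you would run on Hadoop.

```python
import csv
import io
from collections import Counter

# Hypothetical raw input: in practice this would be tweet text stored on Hadoop.
RAW_TWEETS = [
    {"user": "u1", "text": "Loving my new espresso machine"},
    {"user": "u2", "text": "espresso again this morning, and some tea later"},
    {"user": "u1", "text": "Tea is fine but espresso wins"},
]

# Hypothetical keyword list standing in for real preference analytics.
PRODUCTS = ("espresso", "tea")


def extract_preferences(tweets):
    """Count product mentions per user: unstructured text in, structured rows out."""
    counts = Counter()
    for tweet in tweets:
        words = tweet["text"].lower().replace(",", " ").split()
        for product in PRODUCTS:
            if product in words:
                counts[(tweet["user"], product)] += 1
    # Sorted rows with a fixed column order, ready for a relational load.
    return [(user, product, n) for (user, product), n in sorted(counts.items())]


def to_csv(rows):
    """Serialize the rows as CSV, a format a typical ETL tool can pick up."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user_id", "product", "mention_count"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_preferences(RAW_TWEETS)))
```

Only the small aggregated result would be handed to the ETL process for the relational DW; the raw tweets stay behind on Hadoop.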
DW and Hadoop
Thanks Galemmo,
I don't mean throwing away existing processes. The ETL process of my DW remains the same, but the starting point for the raw data is the Hadoop cluster instead of a normal server. I don't change my DW process, only the starting point. This way, I can have all my raw data in Hadoop.
I understand your idea. One question: what is code-driven data?
Thanks in advance,
juanvg1972- Posts : 25
Join date : 2015-05-05
Re: Integrating Hadoop environment with a DW environment
Well, if you are using the map/reduce model, you are coding map and reduce classes in Java to do the work you need to do. The logic and complexity of that work are too much for a typical relational environment. The need to construct these classes is why I refer to it as 'code driven'. Hadoop is basically a framework that lets you execute objects of these classes in a massively parallel environment.
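If it helps to see the shape of the model, below is a minimal single-process sketch in plain Python. It is not the Hadoop API: on Hadoop the map and reduce steps would be Java classes and the framework would run them in parallel across the cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into zero or more (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key; on Hadoop the framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data warehouse"]))
```

The point is that the logic lives in code you write for the map and reduce steps rather than in SQL, which is what I mean by 'code driven'.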