Thoughts on the potential of Hadoop to replace the Architected Data Warehouse
We all know that a well-designed and well-architected data warehouse is easy for business users to navigate, contains integrated data from across the enterprise, holds clean and logically consistent data, and performs well for the exploratory data analysis business executives need in order to base their forward-looking strategies on a sound data foundation.
But creating this well-designed and well-architected data warehouse takes talent. There are a lot of charlatans in this space who claim data warehouse and dimensional design expertise but really don't have a clue. As a result, we still see many failed or less-than-successful DW/BI efforts. Has the emergence of Hadoop signaled a path to a less technically challenging, lower-operating-cost solution for integrating enterprise data, one that can ultimately replace the Architected Data Warehouse?
Will the data warehouse of the future have a totally different structure? Will we see data sources simply dumped into Hadoop clusters, with cross-reference bridge tables then populated to achieve cross-system integration? The massive parallelism of the Hadoop environment would then deliver acceptable performance for exploratory data analysis.
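The "dump the sources, then bridge them" idea can be made concrete with a minimal sketch. It assumes two hypothetical source extracts (a CRM system and a billing system) that identify the same customers with different keys and are loaded as-is, plus a cross-reference bridge table mapping each source key to one integrated key. Python's built-in sqlite3 stands in for a SQL-on-Hadoop engine purely for illustration; every table and column name here is hypothetical.

```python
import sqlite3

# Hypothetical illustration: sqlite3 stands in for a SQL engine sitting over
# raw, untransformed source extracts "dumped" into the cluster as-is.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw extract from a hypothetical CRM system, keyed by CRM customer id.
cur.execute("CREATE TABLE crm_customer (crm_id TEXT, name TEXT, region TEXT)")
cur.executemany(
    "INSERT INTO crm_customer VALUES (?, ?, ?)",
    [("C-001", "Acme Corp", "Midwest"), ("C-002", "Globex", "East")],
)

# Raw extract from a hypothetical billing system with its own customer numbers.
cur.execute("CREATE TABLE billing_invoice (bill_cust_no INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO billing_invoice VALUES (?, ?)",
    [(501, 1200.0), (501, 300.0), (777, 950.0)],
)

# Cross-reference bridge table: maps each source-system key to one
# integrated customer key. Someone still has to populate this correctly.
cur.execute(
    "CREATE TABLE customer_xref (integrated_key INTEGER, source TEXT, source_key TEXT)"
)
cur.executemany(
    "INSERT INTO customer_xref VALUES (?, ?, ?)",
    [(1, "crm", "C-001"), (1, "billing", "501"),
     (2, "crm", "C-002"), (2, "billing", "777")],
)

# A cross-system question answered at query time by routing through the bridge.
rows = cur.execute(
    """
    SELECT c.name, c.region, SUM(b.amount) AS total_billed
    FROM crm_customer AS c
    JOIN customer_xref AS xc
      ON xc.source = 'crm' AND xc.source_key = c.crm_id
    JOIN customer_xref AS xb
      ON xb.source = 'billing' AND xb.integrated_key = xc.integrated_key
    JOIN billing_invoice AS b
      ON CAST(b.bill_cust_no AS TEXT) = xb.source_key
    GROUP BY c.name, c.region
    ORDER BY c.name
    """
).fetchall()

for name, region, total in rows:
    print(name, region, total)
```

Even in this toy, the mapping in customer_xref has to be built and maintained by someone who understands both source systems.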
This seems to be too simplistic to me. But this is the message that I am getting from the Hadoop crowd.
Thoughts?
MilesWis- Posts : 3
Join date : 2015-03-30
Age : 76
Location : Milwaukee, Wisconsin, USA
Re: Thoughts on the potential of Hadoop to replace the Architected Data Warehouse
Can Hadoop replace a traditional dimensionally modeled DW? No way.
What Hadoop can offer is a place to fail cheaply and to prove cheaply what does and doesn't work with each data set or business process. Using Hadoop as a proving ground, and bringing only proven solutions into the data warehouse, will significantly decrease DW failure rates.
TheNJDevil- Posts : 68
Join date : 2011-03-01
But why not?
Is there something about the Hadoop environment or architecture that makes it inherently unsuitable for the type of exploratory data analysis that typically informs business strategy? Yes, it comes from an unstructured data background, but the advocates of Hadoop are asserting that it performs effectively for structured and metric data as well. If that is the case, can it lower the expertise level demanded of the technical design and development staff?
MilesWis- Posts : 3
Re: Thoughts on the potential of Hadoop to replace the Architected Data Warehouse
Hadoop is not a replacement for a traditional data warehouse.
1. Hadoop is very inefficient for processing structured data. It needs to repeatedly pack and unpack structured data before it can process it. It handles fault tolerance through redundant processing, effectively leaving only 25-33% of the available capacity to do actual work. It is not as cost-efficient as proponents would have you believe.
2. Hadoop is still pretty immature. While it has been around for a long time, there isn't much out there that frees you from having to write Java to do much of anything. The Hadoop environment is anything but 'ad-hoc'.
3. 90+% of all data warehouses simply don't have the kind of data volumes Hadoop was intended to handle. And, as storage and processor technologies continue to advance, there isn't any reason to expect that this ratio will change very much. There simply isn't any valid reason or advantage to switch.
4. There are many superior alternatives to Hadoop for storing and querying huge volumes of structured data. And the long-term cost advantage of Hadoop is very small, if it exists at all.
Re: Thoughts on the potential of Hadoop to replace the Architected Data Warehouse
I'm not as pessimistic as previous commenters.
(1) Every major vendor has a SQL + Hadoop play.
(2) Initiatives like Impala (Cloudera) and Stinger (Hortonworks) create a true SQL engine layer over the top of a Hadoop data store. Coming releases of these products should make SQL-based ETL processing possible.
(3) SQL on Hadoop is an interesting play for cloud-based data warehouses.
(4) Hadoop makes a great place for persisting load/stage data that doesn't make it through to the Star Schema.
(5) Like it or not, the coming wave of IoT data is likely to hit Hadoop before it gets anywhere near a data warehouse. Hadoop as a data source is already feasible and will become commonplace in a few short years.
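Point (4) can be illustrated with a minimal sketch of a hypothetical load step: incoming records whose product code resolves in the product dimension go to the fact table, and records that don't are kept, with a reason, in a persistent staging area instead of being dropped. Plain Python lists stand in for both the star schema fact table and the Hadoop-side stage store; all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SalesRecord:
    order_id: str
    product_code: str
    amount: float


# Hypothetical product dimension: natural key -> surrogate key.
product_dim = {"P100": 1, "P200": 2}


def load_batch(batch, fact_rows, persistent_stage):
    """Route each record into the star schema if its product resolves;
    otherwise retain it (with a reason) in the persistent stage store."""
    for rec in batch:
        product_key = product_dim.get(rec.product_code)
        if product_key is None:
            # Not discarded: kept in the stand-in for the Hadoop stage area.
            persistent_stage.append((rec, "unknown product_code"))
        else:
            fact_rows.append((rec.order_id, product_key, rec.amount))


fact_rows, persistent_stage = [], []
batch = [
    SalesRecord("O-1", "P100", 25.0),
    SalesRecord("O-2", "P999", 40.0),
    SalesRecord("O-3", "P200", 15.5),
]
load_batch(batch, fact_rows, persistent_stage)
print(fact_rows)
print(len(persistent_stage))
```

Keeping the rejected rows rather than discarding them is what lets analysts revisit them later, for example once the missing dimension member shows up.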
Ralph Kimball did an interesting video for Cloudera last year:
http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse-video.html
Great video by Ralph on Hadoop for the Data Warehouse
Thanks for the link. This is fabulous. It is exactly the sort of thing, from a true industry icon, that I was hoping to find.
MilesWis- Posts : 3
Re: Thoughts on the potential of Hadoop to replace the Architected Data Warehouse
I was in a class last month taught by Ralph Kimball, and he was very clear that anyone who has not included, or is not strongly considering including, Hadoop in their overall BI architecture is being foolish. The slide he showed placed Hadoop as a pre-warehouse work area that only specialized users could access. He also said that Hadoop will not replace the EDW; it will only provide a better sandbox that lets analysts (the fabled data scientists) create value quickly. That value is created first in the data scientists' project space and then moved into the EDW, where the rest of the company can use the results.
TheNJDevil- Posts : 68
Re: Thoughts on the potential of Hadoop to replace the Architected Data Warehouse
I would not say I was being pessimistic.
There is a tendency in our business: whenever anything new comes along, the hype cycle begins and the technology of the day is supposed to replace just about everything and anything. It simply doesn't happen.
Hadoop is and always will be a processing technology. Data storage was an afterthought, and none of it is particularly well suited to a data warehouse. The basic storage methods focus on the 'F' word… files. It is about as old school as you can get.
Hadoop has a place, but it is a complementary technology that can supplement a data warehouse, not replace it.
Re: Thoughts on the potential of Hadoop to replace the Architected Data Warehouse
When you start seeing Wall Street financial reports coming out of Hadoop clusters, you might start to worry about your relational-database-based EDW. Until then, the threat of jail time for CXOs over SOX compliance violations in financial reporting will keep the traditional warehouse in charge. Close just isn't good enough for publicly traded companies.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» difference between data mart and data warehouse at logical/physical level
» Data warehouse / data retention strategy - ERP upgrade and consolidation