Tuesday, October 6, 2015

Kudu - the end of MapReduce?


Kudu aims to fill the gaps in Hadoop's storage layer: to be almost as good as HDFS at what HDFS is good at (high-speed writes and scans) and, at the same time, almost as good as HBase at what HBase does best (random-access queries). Although it is a long way from being enterprise ready, it is clear that Kudu could remove the need for an architecture that we have implemented for several customers:
  • Persistence of data in HDFS, and
  • A subset of that data in HBase for real-time access, analytics, GIS services, etc.
Not only is this architecture expensive in hardware and professional services, but it also introduces a level of complexity that a lot of customers are uneasy about. So, when Kudu is ready, it will spell the end of HDFS and HBase for many customers.
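To make the complexity argument concrete, here is a toy sketch in plain Python. It does not use the HDFS, HBase or Kudu APIs at all; the class names `DualStorePipeline` and `SingleStorePipeline` and the `hot` flag are hypothetical, invented purely for illustration. It only models the shape of the problem: in the two-store design the application decides which records get copied to the serving store, and keyed lookups fail for anything that was not copied.

```python
from dataclasses import dataclass, field


@dataclass
class DualStorePipeline:
    """Toy model of the two-store design: everything in one store, a hot subset in another."""
    bulk_store: list = field(default_factory=list)     # stands in for files on HDFS
    serving_store: dict = field(default_factory=dict)  # stands in for an HBase table

    def ingest(self, key, record, hot):
        # Every record is persisted in the bulk store.
        self.bulk_store.append((key, record))
        # Only the "hot" subset is copied to the serving store,
        # so the application has to keep the two copies in sync.
        if hot:
            self.serving_store[key] = record

    def scan(self):
        # Full scans read the bulk store.
        return [record for _, record in self.bulk_store]

    def lookup(self, key):
        # Keyed lookups only see records that were copied over.
        return self.serving_store.get(key)


@dataclass
class SingleStorePipeline:
    """Toy model of a single store that serves both scans and keyed lookups."""
    table: dict = field(default_factory=dict)

    def ingest(self, key, record):
        self.table[key] = record

    def scan(self):
        return list(self.table.values())

    def lookup(self, key):
        return self.table.get(key)
```

With the two-store model, ingesting a record with `hot=False` means a later `lookup` on that key returns `None` even though the record is sitting in the bulk store. With the single-store model there is one write path and one copy of the data, which is the simplification the paragraph above is arguing for.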
So now it becomes clear that the architectural goal is to replace HDFS and HBase with Kudu and, as we have already heard, to replace MapReduce with Spark.
So then you have the target architecture 2-3 years out: Kudu and Spark replacing HDFS, MapReduce and HBase. As I and others have written in previous posts, this will then allow for full real-time streaming analytics and services on massively scalable clusters. It is this architecture that will lead to an explosion in IoT use cases.
Now, one last point: in all this change there are some dinosaurs out there. Yes, EMC, I am looking at you, and you too, HP, IBM, Teradata and even NetApp. These businesses are in flat or declining markets (take a look at this Forbes article for more detail: http://www.forbes.com/sites/greatspeculations/2015/01/02/how-emc-lines-up-against-netapp-hp-ibm-hitachi-in-storage-systems-market/ ). As Kudu gains traction, these older vendors with old-style technology will become less and less relevant. EMC has been reinventing itself for a while, but it will be interesting to see how the decline or even disappearance of enterprise storage will reshape the landscape.
Maria Deutscher also makes some great points about Spark and the Hadoop ecosystem over at SiliconANGLE: http://siliconangle.com/blog/2015/09/28/apache-kudu-how-cloudera-wants-to-save-hadoop-by-killing-it/

Have comments? Please post them here or send them to info@exceleratesystems.net