Hadoop and other animals

Published in: Technology

  1. Hadoop and other animals. Matthew Aslett, Research Director
  2. Copyright (C) 2016 451 Research LLC. 451 Research is a leading IT research & advisory company.
     • Founded in 2000
     • 250+ employees, including over 100 analysts
     • 1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers
     • 50,000+ IT professionals, business users and consumers in our research community
     • Over 52 million data points published each quarter and 4,500+ reports published each year
     • 2,000+ technology & service providers under coverage
     • 451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
     • Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
     • Research & Data, Advisory, Events, Go 2 Market
  3. A combination of research & data is delivered across fifteen channels aligned to the prevailing topics and technologies of digital infrastructure, from the datacenter core to the mobile edge.
  4. The future of Hadoop
     • How the meaning of ‘Hadoop’ has evolved over time and how it will continue to do so.
     • Comparison of Hadoop distributions.
     • Converging data platforms – the future evolution of ‘Hadoop’.
     • Beyond the zoo – focus on use-cases rather than projects.
  5. Hadoop (disambiguation): What do we mean by ‘Hadoop’?
     In the beginning, ‘Hadoop’ referred to the Hadoop Distributed File System, Hadoop MapReduce, and the Hadoop Common set of utilities. Since then, ‘Hadoop’ has evolved to become a catch-all brand for a wider distributed data-processing ecosystem that encompasses a mix of data processing and storage capabilities.
  6. Hadoop (disambiguation)
     “There’s two things that people mean by Hadoop.”
     • The Apache Hadoop project
     • The set of technologies built around it
     [Image: Doug Cutting (and Hadoop)]
  7. Hadoop and other animals
  8. The Hadoop ecosystem
  9. The Table of Hadoop elements: ASF projects in more than one Hadoop distribution
     Legend: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
     Elements (name, symbol, number): MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); HIVE (Hi, 8); HBASE (Hb, 3); PIG (P, 4); FLUME (F, 16); SQOOP (Sq, 17); SPARK (Sp, 30); ZOOKEEPER (Z, 5); MAHOUT (Ma, 6); KAFKA (K, 19); WHIRR (W, 11); AMBARI (Am, 22); OOZIE (O, 20); IMPALA (Im, 59); TEZ (Te, 28); KNOX (Kn, 27); SENTRY (Se, 32); STORM (St, 33); DATAFU (Da, 39); PARQUET (Pa, 40); SLIDER (Sl, 38); HUE (Hu, 13); SOLR (So, 0)
  10. The Table of Hadoop elements: other Hadoop-related ASF projects, and non-ASF Hadoop products/projects
      Legend: MANAGEMENT, CORE, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT, NON-ASF
      Other Hadoop-related ASF projects (name, symbol, number): SAMZA (Sa, 31); GIRAPH (G, 21); HAMA (Ha, 7); ACCUMULO (Ac, 23); FLINK (Fl, 37); TINKERPOP (Ti, 47); APEX (Ap, 53); S2GRAPH (Sg, 58); BEAM (Be, 63); CASSANDRA (C, 9); GEODE (Ge, 50); TRAFODION (Tr, 53); BIGTOP (B, 18); MRUNIT (Mu, 15); TWILL (Tw, 34); RANGER (R, 42); METRON (Me, 62); EAGLE (Ea, 57); AVRO (A, 10); CALCITE (Ca, 41); ATLAS (At, 51); RYA (Ry, 56); KUDU (Ku, 61); ARROW (Ar, 64); CRUNCH (Cr, 24); FALCON (Fa, 29); CHUKWA (Ch, 12); MYRIAD (My, 49); MADLIB (Md, 55); SYSTEMML (Sm, 59); HAWQ (Hq, 54); ZEPPELIN (Z, 46); KYLIN (K, 44); MRQL (Mq, 36); TAJO (T, 14); DRILL (D, 25); PHOENIX (Ph, 35); IGNITE (I, 43); ASTERIXDB (As, 48); NIFI (N, 45)
      Non-ASF Hadoop products/projects: CLOUDERA MANAGER; AMAZON S3; EMC ISILON; IBM BIG SQL; MAPR-FS
  11. Combining Hadoop elements
      [Figure: combinations of elements drawn from AMAZON S3, IMPALA (Im, 59), SOLR (So, 0), SPARK (Sp, 30), KAFKA (K, 19), STORM (St, 33), HDFS (H, 2), YARN (Y, 26), ZOOKEEPER (Z, 5), MAPREDUCE (M, 1), FLUME (F, 16), HIVE (Hi, 8), TEZ (Te, 28) and PIG (P, 4). Three combinations are labeled = “Hadoop” and one is labeled = “Hadoop”?]
      And if not Hadoop – then what?
  12. Hadoop (disambiguation)
      “If people stop using MapReduce and HDFS we’ll let them disappear, we’re not religious about that.”
      [Image: Doug Cutting (and Hadoop)]
  13. Hadoop (disambiguation)
      “As long as it’s open source we can bring it in to this platform.”
      [Image: Doug Cutting (and Hadoop)]
  14. Hadoop distributions
      [Timeline, 2009 to 2016, of distribution names: Hortonworks Data Platform (HDP); Cloudera’s Distribution including Apache Hadoop; Cloudera Distribution for Hadoop; Cloudera CDH; Yahoo! Distribution of Hadoop; IBM InfoSphere BigInsights Basic Edition; IBM Distribution of Apache Hadoop; IBM Open Platform with Apache Hadoop; Greenplum HD; Greenplum HD Community Edition; Pivotal HD; Greenplum MR; Greenplum HD Enterprise Edition; MapR Distribution including Hadoop; MapR Distribution for Apache Hadoop; Intel Distribution for Apache Hadoop; WANdisco Distro; Teradata Open Distribution for Hadoop; Apache Hadoop for MapR CDP; Pivotal HDP]
  15. Hadoop distributions
      [Timeline, 2009 to 2016, of distribution names: Hortonworks Data Platform (HDP); Cloudera’s Distribution including Apache Hadoop; Cloudera Distribution for Hadoop; Cloudera CDH; IBM InfoSphere BigInsights Basic Edition; IBM Distribution of Apache Hadoop; IBM Open Platform with Apache Hadoop]
  16. Comparing Hadoop distributions: Hortonworks Data Platform
      MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); HIVE (Hi, 8); HBASE (Hb, 3); PIG (P, 4); FLUME (F, 16); SQOOP (Sq, 17); SPARK (Sp, 30); ZOOKEEPER (Z, 5); KAFKA (K, 19); AMBARI (Am, 22); OOZIE (O, 20); TEZ (Te, 28); KNOX (Kn, 27); STORM (St, 33); SLIDER (Sl, 38); HUE (Hu, 13); CASCADING; ACCUMULO (Ac, 23); CALCITE (Ca, 41); ATLAS (At, 50); RANGER (R, 42); FALCON (Fa, 29); PHOENIX (Ph, 35); SOLR (So, 0); MAHOUT (Ma, 6); DATAFU (Da, 39); CLOUDBREAK
  17. Comparing Hadoop distributions: IBM Open Platform - 17 projects in common with HDP
      MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); HIVE (Hi, 8); HBASE (Hb, 3); PIG (P, 4); FLUME (F, 16); SQOOP (Sq, 17); SPARK (Sp, 30); ZOOKEEPER (Z, 5); KAFKA (K, 19); AMBARI (Am, 22); OOZIE (O, 20); KNOX (Kn, 27); SLIDER (Sl, 38); SOLR (So, 0); DATAFU (Da, 39); PARQUET (Pa, 40)
  18. Comparing Hadoop distributions: Cloudera CDH - 15 projects in common with HDP
      IMPALA (Im, 59); MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); HIVE (Hi, 8); HBASE (Hb, 3); PIG (P, 4); FLUME (F, 16); SQOOP (Sq, 17); SPARK (Sp, 30); ZOOKEEPER (Z, 5); OOZIE (O, 20); HUE (Hu, 13); SOLR (So, 0); MAHOUT (Ma, 6); DATAFU (Da, 39); PARQUET (Pa, 40); WHIRR (W, 11); SENTRY (Se, 32); AVRO (A, 10); CRUNCH (Cr, 24); CLOUDERA SEARCH; LLAMA; KITE
      Plus others supported on Cloudera Enterprise (e.g. Kafka)
  19. Hadoop bifurcation
      Hortonworks HDP - 11 ASF projects not in CDH: TEZ (Te, 28); STORM (St, 33); ACCUMULO (Ac, 23); AMBARI (Am, 22); FALCON (Fa, 29); KAFKA (K, 19); KNOX (Kn, 27); SLIDER (Sl, 38); CALCITE (Ca, 41); ATLAS (At, 50); RANGER (R, 42)
      Cloudera CDH – 6 ASF projects not in HDP: IMPALA (Im, 59); PARQUET (Pa, 40); WHIRR (W, 11); SENTRY (Se, 32); AVRO (A, 10); CRUNCH (Cr, 24)
      Also shown: CLOUDERA DIRECTOR; CLOUDERA NAVIGATOR; CLOUDERA MANAGER; CASCADING; CLOUDBREAK; CLOUDERA SEARCH; LLAMA; KITE
  20. Converging data platforms
      [Diagram labels: RELATIONAL OPERATIONAL DATABASE; NOSQL DATABASE; DISTRIBUTED GRID/CACHE; ANALYTIC DATABASE; STREAM PROCESSING; CONTAINERIZATION; HADOOP]
  21. Converging data platforms
      [Diagram labels: HADOOP; HDFS; ANALYTIC DBMS; MAPREDUCE; STREAMING (STORM/SPARK STREAMING/DATATORRENT); SQL-ON-HADOOP (IMPALA/SPARK SQL/HIVE/DRILL/PRESTO); SPARK; DOCUMENT DATABASE; HBASE; OPERATIONAL DBMS]
  22. [Diagram labels: HADOOP; HDFS; ANALYTIC DBMS; MAPREDUCE; DISTRIBUTED RELATIONAL DATABASE; SPLICE MACHINE; MEMSQL; SPARK; NUODB*; VOLTDB; OPERATIONAL DBMS; COCKROACHDB; CLUSTRIX; DOCUMENT DATABASE; DISTRIBUTED KEY VALUE STORE; GRAPH DATABASE/ENGINE; NEO4J; TITAN; APACHE GIRAPH; STARDOG; MONGODB; CLOUDANT LOCAL; COUCHDB; COUCHBASE; DATASTAX/CASSANDRA; REDIS; RIAK; AEROSPIKE; AWS DYNAMODB; OBJECTROCKET; INFINITEGRAPH; HBASE; STREAMING (STORM/SPARK STREAMING/DATATORRENT); SQL-ON-HADOOP (IMPALA/SPARK SQL/HIVE/DRILL/PRESTO)]
  23. [Diagram labels: HADOOP; HDFS; ANALYTIC DBMS; MAPREDUCE; DISTRIBUTED RELATIONAL DATABASE; SPLICE MACHINE; MEMSQL; SPARK; NUODB*; VOLTDB; COCKROACHDB; CLUSTRIX; DOCUMENT DATABASE; DISTRIBUTED KEY VALUE STORE; GRAPH DATABASE/ENGINE; NEO4J; TITAN; APACHE GIRAPH; STARDOG; MONGODB; CLOUDANT LOCAL; COUCHDB; COUCHBASE; DATASTAX/CASSANDRA; REDIS; RIAK; AEROSPIKE; AWS DYNAMODB; OBJECTROCKET; INFINITEGRAPH; HBASE; STREAMING (STORM/SPARK STREAMING/DATATORRENT); SQL-ON-HADOOP (IMPALA/SPARK SQL/HIVE/DRILL/PRESTO); OPERATIONAL DBMS]
  24. [Diagram labels: HADOOP; HDFS; ANALYTIC DBMS; MAPREDUCE; FEDERATED QUERY (PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE); DISTRIBUTED RELATIONAL DATABASE; SPLICE MACHINE; MEMSQL; NUODB*; VOLTDB; OPERATIONAL DBMS; COCKROACHDB; CLUSTRIX; DOCUMENT DATABASE; DISTRIBUTED KEY VALUE STORE; GRAPH DATABASE/ENGINE; NEO4J; TITAN; APACHE GIRAPH; STARDOG; MONGODB; CLOUDANT LOCAL; COUCHDB; COUCHBASE; DATASTAX/CASSANDRA; REDIS; RIAK; AEROSPIKE; AWS DYNAMODB; OBJECTROCKET; INFINITEGRAPH; STREAMING (STORM/SPARK STREAMING/DATATORRENT); SQL-ON-HADOOP (IMPALA/SPARK SQL/HIVE/DRILL/PRESTO); SPARK; HBASE]
  25. [Diagram labels: HADOOP; HDFS; ANALYTIC DBMS; MAPREDUCE; FEDERATED QUERY (PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE); DISTRIBUTED RELATIONAL DATABASE; SPLICE MACHINE; MEMSQL; DATASTAX/CASSANDRA*; MARKLOGIC; ARANGODB; ORIENTDB; MONGODB*; RIAK*; SQRRL DATA; OBJECTROCKET*; COUCHBASE*; ORCHESTRATE; AWS DYNAMODB; CLOUDANT LOCAL*; AEROSPIKE*; MULTI-MODEL DATABASE; NUODB*; VOLTDB; OPERATIONAL DBMS; COCKROACHDB; CLUSTRIX; STREAMING (STORM/SPARK STREAMING/DATATORRENT); SQL-ON-HADOOP (IMPALA/SPARK SQL/HIVE/DRILL/PRESTO); SPARK; HBASE]
      Multi-model databases support a combination of data models, including (potentially) key value, graph and document.
      *Anticipated functionality
  26. [Diagram labels: HADOOP; HDFS; ANALYTIC DBMS; MAPREDUCE; FEDERATED QUERY (PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE); DISTRIBUTED RELATIONAL DATABASE; SPLICE MACHINE; MEMSQL; DATASTAX/CASSANDRA*; MARKLOGIC; ARANGODB; ORIENTDB; MONGODB*; RIAK*; SQRRL DATA; OBJECTROCKET*; SPARK; COUCHBASE*; ORCHESTRATE; AWS DYNAMODB; CLOUDANT LOCAL*; AEROSPIKE*; NEO4J; REDIS; COUCHDB; MULTI-MODEL DATABASE; NUODB*; VOLTDB; OPERATIONAL DBMS; COCKROACHDB; CLUSTRIX; STREAMING (STORM/SPARK STREAMING/DATATORRENT); SQL-ON-HADOOP (IMPALA/SPARK SQL/HIVE/DRILL/PRESTO); HBASE]
  27. [Diagram labels: HADOOP; HDFS; ANALYTIC DBMS; MAPREDUCE; FEDERATED QUERY (PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE); DISTRIBUTED RELATIONAL DATABASE; SPLICE MACHINE; MEMSQL; DATASTAX/CASSANDRA*; MARKLOGIC; ARANGODB; ORIENTDB; MONGODB*; RIAK*; SQRRL DATA; OBJECTROCKET*; COUCHBASE*; ORCHESTRATE; AWS DYNAMODB; CLOUDANT LOCAL*; AEROSPIKE*; NEO4J; REDIS; COUCHDB; MULTI-MODEL DATABASE; NUODB*; VOLTDB; OPERATIONAL DBMS; MESOS/DOCKER/KUBERNETES/ATOMIC etc; COCKROACHDB; CLUSTRIX; STREAMING (STORM/SPARK STREAMING/DATATORRENT); SQL-ON-HADOOP (IMPALA/SPARK SQL/HIVE/DRILL/PRESTO); SPARK; HBASE]
  28. Converging data platforms
      [Diagram labels: C________ DATA PLATFORM; RELATIONAL OPERATIONAL DATABASE; NOSQL DATABASE; DISTRIBUTED GRID/CACHE; ANALYTIC DATABASE; STREAM PROCESSING; CONTAINERIZATION; HADOOP]
  29. Toward a c________ data platform
      • MapR – Converged Data Platform
      • Hortonworks – Connected Data Platforms
      • Cloudera Data Platform?
      • Chimeric Data Platform?
      Chimera (Greek mythology)
      • a multi-headed hybrid creature composed of the parts of more than one animal
      Image source: Wikimedia, https://commons.wikimedia.org/wiki/File:Chimera_di_Arezzo.jpg
  30. Toward a c________ data platform
      • MapR – Converged Data Platform
      • Hortonworks – Connected Data Platforms
      • Cloudera Data Platform?
      • Chimeric Data Platform?
      Chimera (Greek mythology)
      • a multi-headed hybrid creature composed of the parts of more than one animal
  31. Toward a c________ data platform
      • MapR – Converged Data Platform
      • Hortonworks – Connected Data Platforms
      • Cloudera Data Platform?
      • Chimeric Data Platform?
      Chimera (Merriam-Webster)
      • “something that exists only in the imagination and is not possible in reality”
  32. Toward a chimeric data platform
      [Diagram labels: RELATIONAL OPERATIONAL DATABASE; NOSQL DATABASE; DISTRIBUTED GRID/CACHE; ANALYTIC DATABASE; STREAM PROCESSING; CONTAINERIZATION; HADOOP]
  33. Rather than projects, focus on use-cases
      Legend: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
      MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); HIVE (Hi, 8); HBASE (Hb, 3); PIG (P, 4); FLUME (F, 16); SQOOP (Sq, 17); SPARK (Sp, 30); ZOOKEEPER (Z, 5); MAHOUT (Ma, 6); KAFKA (K, 19); WHIRR (W, 11); AMBARI (Am, 22); OOZIE (O, 20); IMPALA (Im, 59); TEZ (Te, 28); KNOX (Kn, 27); SENTRY (Se, 32); STORM (St, 33); DATAFU (Da, 39); PARQUET (Pa, 40); SLIDER (Sl, 38); HUE (Hu, 13); SOLR (So, 0)
  34. Rather than projects, focus on use-cases: Hadoop for data engineering
      MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); PIG (P, 4); FLUME (F, 16); SQOOP (Sq, 17); SPARK (Sp, 30); TEZ (Te, 28)
  35. Rather than projects, focus on use-cases: Hadoop for search
      MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); SOLR (So, 0); FLUME (F, 16); ZOOKEEPER (Z, 5)
  36. Rather than projects, focus on use-cases: Hadoop for operational applications
      MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); HBASE (Hb, 3); PHOENIX (Ph, 35)
  37. Rather than projects, focus on use-cases: Hadoop for stream processing
      MAPREDUCE (M, 1); HDFS (H, 2); YARN (Y, 26); SPARK (Sp, 30); KAFKA (K, 19); STORM (St, 33)
  38. Toward a chimeric data platform
      [Diagram labels: DATA SCIENCE; RELATIONAL OPERATIONAL DATABASE; NOSQL DATABASE; DISTRIBUTED GRID/CACHE; ANALYTIC DATABASE; STREAM PROCESSING; CONTAINERIZATION; HADOOP; ANALYTIC; OPERATIONAL; SEARCH; DATA ENGINEERING]
  39. Thank You! matthew.aslett@451research.com, @maslett, www.451research.com
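
The overlap counts quoted on the "Comparing Hadoop distributions" slides (17 and 15 projects in common with HDP) and the use-case groupings on the "Rather than projects, focus on use-cases" slides can be checked with plain set arithmetic. The sketch below is only an illustration: the variable names are hypothetical, the project lists are transcribed from the slides as they appear in this transcript, and only items the slides present as ASF elements are included (Cascading, Cloudbreak, Cloudera Search, Llama and Kite are left out).

```python
# Project lists transcribed from the "Comparing Hadoop distributions" slides.
# Assumption: only items shown as ASF elements on the slides are included.
HDP = {
    "MapReduce", "HDFS", "YARN", "Hive", "HBase", "Pig", "Flume", "Sqoop",
    "Spark", "ZooKeeper", "Kafka", "Ambari", "Oozie", "Tez", "Knox", "Storm",
    "Slider", "Hue", "Accumulo", "Calcite", "Atlas", "Ranger", "Falcon",
    "Phoenix", "Solr", "Mahout", "DataFu",
}
IBM_OPEN_PLATFORM = {
    "MapReduce", "HDFS", "YARN", "Hive", "HBase", "Pig", "Flume", "Sqoop",
    "Spark", "ZooKeeper", "Kafka", "Ambari", "Oozie", "Knox", "Slider",
    "Solr", "DataFu", "Parquet",
}
CDH = {
    "Impala", "MapReduce", "HDFS", "YARN", "Hive", "HBase", "Pig", "Flume",
    "Sqoop", "Spark", "ZooKeeper", "Oozie", "Hue", "Solr", "Mahout",
    "DataFu", "Parquet", "Whirr", "Sentry", "Avro", "Crunch",
}

# Set intersection gives the projects two distributions have in common;
# set difference gives the projects present in one but not the other.
ibm_shared_with_hdp = IBM_OPEN_PLATFORM & HDP
cdh_shared_with_hdp = CDH & HDP
cdh_not_in_hdp = CDH - HDP

# Use-case groupings transcribed from the use-case slides.
USE_CASES = {
    "data engineering": {"MapReduce", "HDFS", "YARN", "Pig", "Flume",
                         "Sqoop", "Spark", "Tez"},
    "search": {"MapReduce", "HDFS", "YARN", "Solr", "Flume", "ZooKeeper"},
    "operational applications": {"MapReduce", "HDFS", "YARN", "HBase",
                                 "Phoenix"},
    "stream processing": {"MapReduce", "HDFS", "YARN", "Spark", "Kafka",
                          "Storm"},
}

# Projects that appear in every use-case grouping.
shared_core = set.intersection(*USE_CASES.values())

print(len(ibm_shared_with_hdp))  # 17
print(len(cdh_shared_with_hdp))  # 15
print(sorted(cdh_not_in_hdp))    # ['Avro', 'Crunch', 'Impala', 'Parquet', 'Sentry', 'Whirr']
print(sorted(shared_core))       # ['HDFS', 'MapReduce', 'YARN']
```

With these lists, the only projects common to all four use-case slides are the three core elements (MapReduce, HDFS and YARN), which is consistent with the deck's point that everything beyond the core varies with the use case.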
