Hadoop Turns a Corner and Sees the Future

Published in: Technology

  1. Hadoop: Entering Phase Two? Merv Adrian, Research Vice President, Information Management. Twitter: @merv. Blog: Blogs.gartner.com/merv-adrian
     © 2013 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity."
  2. Nexus of Forces Drives Innovation: Extreme Networking; Pervasive Access; Global-Class Delivery; "Big," Rich Context
  3. Over 50% of Internet connections are things:
     • 2011: 15+ billion permanent, 50+ billion intermittent
     • 2020: 30+ billion permanent, >200 billion intermittent
     What this enables:
     • Cameras and microphones widely deployed
     • New routes to market via intelligent objects
     • Content and services via connected products
     • Everything has a URL
     • Remote sensing of objects and environment
     • Augmented reality
     • Situational decision support
     • Building and infrastructure management
     Device technologies shown: Audio, GPRS, Wi-Fi, NFC, Higher-resolution display, LTE, Flash
  4. Gartner Definition of Big Data: High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
     Gartner Research Circle 2013 Big Data Survey:
     • 687 respondents worldwide
     • $3.2B mean company size
     • 5,100 mean employees
     • 60% mainstream adopters
     • 18% focused on running/maintaining
  5. Are They Investing?
     • 30% Have
     • 31% No plans at this time
     • 19% Plan to within the next year
     • 15% Plan to within two years
     • 5% Don't know
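To give the percentages on this slide a sense of scale, they can be turned into rough head counts. This is only a sketch: it assumes that all 687 survey respondents answered this question, which the slides do not state, and the names `shares`, `RESPONDENTS` and `approximate_counts` are made up for illustration.

```python
# Shares reported on the "Are They Investing?" slide, in whole percent.
shares = {
    "Have invested": 30,
    "No plans at this time": 31,
    "Plan to within the next year": 19,
    "Plan to within two years": 15,
    "Don't know": 5,
}

# Assumption (not stated in the slides): every one of the 687
# respondents answered this question.
RESPONDENTS = 687


def approximate_counts(shares, respondents):
    """Convert percentage shares into rounded respondent counts."""
    return {label: round(respondents * pct / 100) for label, pct in shares.items()}


counts = approximate_counts(shares, RESPONDENTS)
for label, n in counts.items():
    print(f"{label}: about {n} respondents")
```

Because the published percentages are already rounded, the counts are estimates only.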
  6. How Does That Compare to Last Year? Note: survey base increased from 473 in 2012 to 687 in 2013.
     [Bar chart comparing 2012 and 2013 across: Have invested, Within next year, Within two years, No plans, Don't know. Data labels as extracted: 27, 15, 16, 11, 30, 19, 15, 31, 5. Axis: 0 to 40.]
  7. Things Are Done Differently in Silicon Valley …
     Traditional IM:
     • Requirements based
     • Top-down design
     • Integration and reuse
     • Technology consolidation
     • World of DW and ECM
     • Competence centers
     • Better decisions
     • Commercial software
     "Big Data" Style:
     • Opportunity oriented
     • Bottom-up experimentation
     • Immediate use
     • Tool proliferation
     • "World of Hadoop"
     • Hackathons
     • Better business
     • Open source
  8. Introducing: The Open-Source Car!
  9. • Apache Hadoop is a set of standard open-source software projects that provide a framework for using massive amounts of data across a distributed network.
     • The standards steward, the Apache Software Foundation, manages and distributes many typical components of the "Hadoop" platform.
     • Many distributions exist, built and/or marketed by pure-play specialists or major vendors. They include additional open-source and commercial components.
  11. Clients Ask: Which Projects Are "Hadoop"?
     • Minimum set (from Apache website):
       - Apache HDFS
       - Apache MapReduce
       - Apache Yarn
     • Other independent Apache projects: Ambari, Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, ZooKeeper
       - The virtuous circle of open-source community
     • Apache Hadoop is version 1.0. Version 2.0, including Yarn, is alpha.
  12. Rich, Complex Set of Functional Choices: Ingest/Propagate; Persist; Describe, Develop; Monitor, Administer; Analytics, Machine Learning; Compute, Search
  13. Ingest/Propagate: Apache Flume, Apache Kafka, Apache Sqoop, HDFS NFS, Informatica HParser, DBMS vendor utilities, Talend, WebHDFS
     Import data into HDFS (or alternatives):
     • Commercial DBMS, DI or OSS
     • "Big data" ≠ Hadoop: import is not always required
       - MapReduce inside DBMSs, HPCC, SAS, Splunk, others
     Export data into RDBMS (or alternatives):
     • NoSQL DBMSs are supported, or offer integration
     • On same cluster (HBase), even same nodes (Hadapt)
  14. Describe, Develop: Apache Crunch, Apache Hive, Apache Pig, Apache Tika, Cascading, Cloudera Hue, DataFu, Dataguise, IBM Jaql
     • Metadata (Hive, HCatalog) describes data for other stack components and for external ones, e.g., DI and BI tools
     • Develop refers to coding functions, as in Pig, for execution elsewhere, such as MapReduce
     • Also included here: "intercept-based" data remediation
  15. Compute, Search: Apache Blur, Apache Drill, Apache Giraph, Apache Hama, Apache Lucene, Apache MapReduce, Apache Solr, Cloudera Impala, HP HAVEn, IBM BigSQL, IBM InfoSphere Streams, HStreaming, Pivotal HAWQ, SQLstream, Storm, Teradata SQL-H
     • Runtime execution for programs created to run against HDFS or HBase data
     • MapReduce was first, but others have emerged as additions/alternatives/supplements
     • With Apache Hadoop 2.0, MapReduce will begin to lose its exclusivity in "the basic stack" with Yarn support
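For readers unfamiliar with the MapReduce model named on this slide, here is a minimal in-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It is an illustration of the programming model only, not the Hadoop API: it runs in a single Python process, and the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over the input records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for lines of a file in HDFS.
sample = ["Hadoop stores data", "MapReduce processes data"]
result = run_job(sample)
print(result)
```

In a real cluster the map and reduce functions run on many nodes and the shuffle moves data across the network. The sketch keeps only the data flow.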
  16. Persist
     • File system: Append only, access methods at OS level. Examples: Apache HDFS, IBM GPFS, Lustre, MapR Data Platform
     • Serialized: Format that can be stored in a database, eliminating byte ordering, adding metadata. Examples: Apache Avro, RCFile (and ORCFile), SequenceFile, Text, Trevni
     • Database: Collected and structured to facilitate storage, retrieval, modification, and deletion in online, not only batch, mode. Examples: Apache Accumulo, Apache Cassandra, Apache HBase, Google Dremel, Hadapt, HP Vertica, IBM DB2, Kognitio, Oracle, Oracle MySQL, RainStor, Teradata Aster, Teradata, others
  17. Monitor, Administer: Apache Ambari, Apache Chukwa, Apache Falcon, Apache Oozie, Apache Whirr, Apache ZooKeeper, Cloudera Manager, Ganglia, Nagios, Pivotal Serengeti
     • System health and administration
     • Cloud configuration and connection to resources
     • Virtualization and resource management
     • Job management and orchestration
  18. Analytics, Machine Learning: Apache Drill, Apache Hive, Apache Mahout, Datameer, IBM BigSheets, IBM BigSQL, Karmasphere, Microsoft Excel, Platfora, Revolution Analytics RHadoop, SAS, Skytree
     • This is where the future is: it's not just "a part of the stack" but why the stack exists
     • Machine learning, advanced statistical analysis, scenario modeling
     • "BI for Hadoop": Statistical libraries for use in programs, spreadsheets, reporting, visualization tools
  19. Go Ahead, Pick the Pieces You Need: Ingest/Propagate; Persist; Describe, Develop; Monitor, Administer; Analytics, Machine Learning; Compute, Search
  20. Distribution Vendors Sort It Out for You
     • Megavendors: Amazon, EMC Pivotal, IBM, Intel
     • Megapartners: Dell, HP, NetApp, Microsoft, Oracle, Teradata
     • Leading pure plays: Cloudera, Hortonworks, MapR
     • Others: DataStax, LucidWorks, RainStor, Sqrrl, WANdisco, Zettaset
  21. Hadoop's Great Leap Forward. Hadoop has moved to the next stage with Apache Hadoop 2.0.
     • Mainstream vendors are all interested, contributing and adding value
     • Skills development is ramping rapidly
     From → To:
     • Single-stack → Yarn-based multistyle environment, supporting multiple engines
     • Batch-only, file-based stack → Interactive capabilities with multiple optional databases
     • SQL translation with Hive → "SQL in front of Hadoop": Cloudera Impala, IBM Big SQL, Pivotal HAWQ, Platfora, others
     • Relatively unmanaged → Ambari-based beginnings of real management
  22. What's Next?
     • Search
     • Advanced prebuilt analytic functions
     • Cluster, appliance or cloud?
     • Virtualization
     • Graph processing
  23. What's Still Needed?
     • Security
     • Data Warehousing Tools
     • Governance
     • Distributed Optimization
     • Subproject Optimization
     • Skills
  24. By 2015, big data demand will reach 4.4 million jobs worldwide, but only one-third of those jobs will be filled.
     [Chart: jobs by region (Americas, EMEA, APJ) and by industry (Education; Wholesale Trade; Healthcare Providers; Transportation; Utilities; Retail; Insurance; Communications, Media & Services; Government; Banking & Securities; Manufacturing & Natural Resources). Axis: 0 to 2,500,000.]
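The size of the gap implied by this forecast can be worked out directly. The sketch below takes "one-third" as exactly 1/3 of the 4.4 million figure, which is an assumption about a rounded statement. The variable names are illustrative.

```python
from fractions import Fraction

# Figures from the slide: forecast demand and the share expected to be filled.
DEMAND = 4_400_000
FILLED_SHARE = Fraction(1, 3)  # assumption: "one-third" taken literally

# int() truncates toward zero, so this is the whole number of filled jobs.
filled = int(DEMAND * FILLED_SHARE)
unfilled = DEMAND - filled

print(f"Filled:   about {filled:,}")
print(f"Unfilled: about {unfilled:,}")
```

On these assumptions, roughly 1.5 million jobs would be filled and roughly 2.9 million would remain open.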
  25. Recommendations
     • Audit your data: find "dark data" and map it to business opportunities to identify pilot projects
     • Familiarize yourself with the capabilities of available Hadoop distributions
     • Build skills and recruit within the organization from early experimenters for a data science lab
     • Consider cloud pilots to minimize capital expenditure
  26. Thank you! http://www.flickr.com/photos/orinrobertjohn/3267286885/sizes/o/in/photostream/
