Hadoop Now, Next & Beyond

Published in: Technology


  1. Hadoop Now, Next & Beyond
     Eric Baldeschwieler, Hortonworks CTO
     June 13, 2012
  2. Hadoop Summit Is BIG!
     10x growth in 5 years!
     • 2008: First Summit, 200+ people
     • 2011 Summit: 1600+ people
     • 2012 Summit: 2200+ people
  3. Timeline: Apache Hadoop 1.0 & 2.0
     [Timeline figure, 2008 to 2012; releases are shown moving through DEV, QA, alpha, beta and GA stages]
     • Hadoop 1.0: The most stable release; 3 years of stabilization & key features
       – Releases shown: 0.20.1, 0.20.2, 0.20.1xx, 0.20.2xx, 1.0
       – Features labeled: Security, MR multi-tenancy, Append
     • Hadoop 2.0: Next-gen MapReduce & HDFS; exciting community innovations under development
       – Releases shown: 0.21, 0.22, 0.23, 2.0
       – Features labeled: New Append, Security, Federation, YARN, HA, Wire Compatibility
  4. Hadoop 1.0 Key Features
     • Flush / Sync for HBase
       – 1.0 is the first Apache Hadoop release to support HBase
       – This work began in 0.18 in 2008!
       – Benefit: Interactive apps – Web site personalization
     • Security
       – Strong authentication via Kerberos
       – Benefit: Audit compliance, multi-tenancy
     • MapReduce limits
       – Solve the whack-a-mole-like bad user job problem
       – Benefit: Reliability, multi-tenancy
  5. Hadoop 2.0 Innovations
     • Focus on Scale and Community Innovation
       – YARN and Federation designed to support 10,000+ computer clusters
     • YARN: Scalable, Pluggable Execution Frameworks
       – Improves MapReduce performance
       – Will support community development of new frameworks
       – Near real-time, Machine learning & Analytics use cases
     • Federation: Scalable, Pluggable Storage
       – Isolation via multiple volumes / Name Nodes
       – Shared block pool w/ pluggable volume managers
     • Always On: No Cluster Downtime
       – Wire compatible APIs (protobufs)
       – HDFS hot standby HA
       – Rolling upgrades
       – Log & checkpoint management
  6. Balancing Innovation & Stability
     [Graphic with axis labels: INNOVATION, STABILITY]
     Source: The above graphic based on concepts from Geoffrey Moore’s book – Crossing the Chasm
  7. Hortonworks Data Platform (HDP) 1.0
  8. HDP 1.0 Highlights
     1 Pure Apache Hadoop 1.0 code line, 100% open source
     2 Open source Management & Monitoring via Ambari
     3 Common Metadata Services via HCatalog
     4 Enterprise Data Integration with Talend Open Studio
     5 Multi-tenant Protections via Capacity Scheduler
     6 Full Stack HA via proven 3rd party products
  9. Management & Monitoring Services -> Ambari
     • Powerful monitoring and alerting dashboards
       – View topology, health & utilization of cluster
       – Detailed view of cluster operations, server & storage utilization, job status, and performance levels
       – Get alerts to critical events
     • Simple installation & provisioning
       – Easy configuration process
       – One-click deployment for clusters of all sizes
       – Analyzes/recommends optimal services configuration
       – Automatically configures mount points in the cluster
  10. Full Stack High Availability
      Proven HA solutions with proven Hadoop 1.0
      [Diagram label: HA Cluster]
      Failover and restart for
      • NameNode
      • JobTracker
      • Other services to come…
      Open API allows use of proven HA from multiple vendors
      Minimized changes to clients and configuration
      Auto-detects failures:
      • Services, OS & Hardware
      Complementary to 2.0 HA efforts
  11. The Road Ahead
      • Ambari
        – REST APIs & general hardening
        – Integrations w/ enterprise & cloud management solutions
      • HCatalog
        – ODBC / JDBC, security, relaxed schemas (AVRO, JSON…)
        – More REST APIs and Integrations with 3rd party data stores
      • Full Stack HA
        – Continued work with virtualization & operating system vendors
      • Native Windows support
        – Integrations with broader Windows ecosystem of systems/tools
  12. Welcome to the Hadoop Summit!
      Enjoy!
      Help grow the ecosystem!