Crossing the Chasm

Uploaded on

Sanjay Radia, Founder and Architect at Hortonworks, talks about Apache Hadoop and it's uptake in the industry.

Sanjay Radia, Founder and Architect at Hortonworks, talks about Apache Hadoop and it's uptake in the industry.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads


Total Views
On Slideshare
From Embeds
Number of Embeds



Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

    No notes for slide
  • Sanjay Radia –been on Hadoop project at Yahoo for last 4 yearsThese 4 years have been a blastVery proud to be a Y employee - Yahoo’s leadership has made Hadoop the success it is today(Forward thinking in Spinning out Hortonworks.)I am very exited to be part of the team at Hortonworks and continuing to grow Apache Hadoop as an open-source project
  • Early customers at YahooA key idea was to provide Hadoop as a service – otherwise, adoption would not have been so rapid.Customer starts focusing his application on hadoop rather than figure out how to get the capex and convince IT to deploy hadoop
  • Some early decisions, why and how well they workedChoices/tradeoffs based on immediate customer needs, time, resources, operational needsGoal:Gain acceptance, evolve and grow customer baseFundamental architectural improvements that cuts across various issuesConclude with the community effect
  • Stronger security driven by multiple tenants in shared clustersSecurity will be critical for new enterprise users especially those with financial data10 Person year effort by completely by Yahoo! and a major milestone for Apache Hadoop
  • Early customers happy with a new platform that let them solve problems they could not solve otherwiseBut many new customers …Unused capacity – esp effective if applications peaks at different timesMix production and non production jobsFairness not a goal –customers are guaranteed capacity they have paid forNN under federation also provides isolation
  • scaling is enough to meet needs of very largecustomers – like FB and Yother customers should be more than okay Before you need more namespace, it will be therea NN that stores only partial namespace in memory
  • Data – can I read what I wrote, is the service availableWhen I asked one of the original authors of of GFS if there were any decisions they would revist – random writersSimplicity is keyRaw disk – fs take time to stabilize – we can take advantage of ext4, xfs or zfs
  • Pipeline – useful for slow appender
  • Certification once ready for releaseHDFS (over 92% coverage)balancer, block replication, corruption, fsck, security,. Cmd lines , viewfs, faults and injection, checkpointing, Federation testing – ready to move to release testingMRDist cache, capacity scheduler, speculative exec, block listing, decommissioning, ui testingFailures and injectionsScale testing – 4 TT per node.Performance– sort, scan, compression, gridmixv3, slive, ..HIT integration whole stack (hdfs, MR, Oozie, Pig, Hive, HCat,) Sandbox – Customer application validation - release vote for 23.0Research:Larger variety of jobs and load than production clusters
  • So far I have been explaining how we have been addressing specific issuesBut we have made fundamental change to the architecture to both HDFS and MRFundamental changes that cuts across issues
  • HDFS – the change was fairly simpleMR – redesign – allocation of compute resources a general layer separate Scheduler from App/Job Manager
  • New Append, Security, Federation, – YahooRaid, High tide, FBHBase improvements – FB & Cloudera
  • The fundamental design improvements will accelerate the rate of improvementIf there is feature that customers need – it will be provided quickly


  • 1. Crossing the ChasmHadoop for the Enterprise
    Sanjay Radia – Hortonworks Founder & Architect
    Formerly Hadoop Architect @ Yahoo!
    4 Years @ Yahoo!
    @srr (@hortonworks)
    © Hortonworks Inc. 2011
    June 29, 2011
  • 2. Crossing the Chasm
    Geoffrey A. Moore
    Apache Hadoop grew rapidly charting new territories in features, abstractions, APIs, scale, fault tolerance, multi-tenancy, operations …
    Small number of early customers who needed a new platform
    Provide Hadoop as a service to make adoption easy
    Dramatic growth in adoption and customer base
    Growth of Hadoop stack and applications
    New requirements and expectations
  • 3. Crossing the Chasm: Overview
    How the Chasm is being crossed
    SLAs & Predictability
    Availability & Data Integrity
    Backward Compatibility
    Quality & Testing
    Fundamental architectural improvements
    Federation &
    Adapt to changing, sometime unforeseen, needs
    Fuel innovation and rapid development
    The Community Effect
  • 4. Security
    Early Gains
    No authorization or authentication requirements
    Added permissions and passed client-side userid to server (0.16)
    Addresses accidental deletes by another user
    Service Authorization (0.18, 0.20)
    Issues: Stronger Authorization required
    Shared clusters – multiple tenants
    Critical data
    New categories of users (financial)
    SOX compliance
    Our Response
    Authentication using Kerberos (0.20.203)
    10 Person-year effort by Yahoo!
  • 5. SLAs and Predictability
    Issue: Customers uncomfortable with shared clusters
    Customer traditionally plan for peaks with dedicated HW
    Dedicated clusters had poor utilization
    Response: Capacity scheduler (0.20)
    Guaranteed capacities in a multi-tenant shared cluster
    Almost like dedicated hardware
    Each organization given queue(s) with a guaranteed capacity
    controls who is allowed to submit jobs to their queues
    sets the priorities of jobs within their queue
    creates sub-queues (0.21) for finer grain control within their capacity
    Unused capacity given to tasks in other queues
    Better than private cluster –access to unused capacity when in crunch
    Resource limits for tasks – deals with misbehaved apps
    Response: FairShare Scheduler (0.20)
    Focus is fair share of resources, but does have pools
  • 6. Scalability
    Early Gains
    Simple design allowed rapid improvements
    Single master, namespace in RAM, simpler locking
    Cluster size improvements: 1K  2K  4K
    Vertical scaling: Tuned GC + Efficient memory usage
    Archive file system – reduce files and blocks (0.20)
    Current Issues
    Growth of files and storage limited by single NN (0.20)
    Only an issue for very very large clusters
    JobTracker does not scale to beyond 30K tasks – needs redesign
    Our Response
    RW locks in NN (0.22)– complete rewrite of MR servers (JT, TT) - 100K tasks (0.23)
    Federation: horizontal scaling of namespace – billion files (0.23)
    NN that keeps only part of Namespace in memory –trillion files (0.23.x)
  • 7. HDFS Availability & Data Integrity:Early Gains
    Simple design, Java, storage fault tolerance
    Java – saved from pointer errors that lead to data corruption
    Simplicity - subset of Posix – random writers not supported
    Storage: Rely in OS’s file system rather than use raw disk
    Storage Fault Tolerance: multiple replicas, active monitoring
    Single Namenode Master
    Persistent state: multiple copies + checkpoints
    Restart on failure
    How well did it work?
    Lost 650 blocks out of 329 M on 10 clusters with 20K nodes in 2009
    82% abandoned open file (append bug, fixed in 0.21)
    15% files created with single replica (data reliability not needed)
    3% due to roughly 7 bugs that were then fixed (0.21)
    Over the last 18 months 22 failures on 25 clusters
    Only 8 would have benefitted from HA failover!! (0.23 failures per cluster year)
    NN is very robust and can take a lot of abuse
    NN is resilient against overload caused by misbehaving apps
  • 8. HDFS Availability & Data Integrity:Response
    Data Integrity
    Append/flush/sync redesign (0.21)
    Pipeline recruits new replicas rather than just remove them on failures (0.23)
    Improving Availability of NN
    Faster HDFS restarts
    NN bounce in 20 minutes (0.23)
    Federation allows smaller NNs (0.23)
    Federation will significantly improve NN isolation hence availability (0.23)
    Why did we wait this long for HA NN?
    The failure rates did not demand making this a high priority
    Failover requires corner cases to be correctly addressed
    Correct fencing of shared state during failover is critical
    Can lead to corruption of data and reduceavailability!!
    Many factors impact availability, not just failover
  • 9. HDFS Availability & Data Integrity:Response: HA NN
    Active work has started on HA NN (Failover)
    HA NN – Detailed design (HDFS-1623)
    Community effort
    HDFS-1971, 1972, 1973, 1974,1975, 2005, 2064, 1073
    HA: Prototype work
    Backup NN (0.21)
    Avatar NN (Facebook)
    HA NN prototype using Linux HA (Yahoo!)
    HA NN prototype with Backup NN and block report replicator (EBay)
    HA the highest priority for 23.x
  • 10. MapReduce: Fault Tolerance and Availability
    Early Gains: Fault-tolerance of tasks and compute nodes
    Current Issues:Loss of job queue if Job tracker is restarted
    Our Response designed with fault tolerance and availability
    HA Resource Manager (0.23.x)
    Loss of Resource Manager – degraded mode - recover via restart or failover
    Apps continue with their current resources
    App Manager can reschedule with current resources
    New apps cannot submitted or launched,
    New resources cannot be allocated
    Loss of an App Manager - recovers
    App is restarted and state is recovered
    Loss of tasks and nodes - recovers
    Recovered as in old MapReduce
  • 11. Backwards Compatibility
    Early Gains
    Early success stemmed from a philosophy of ship early and often, resulting in changing APIs.
    Data and metadata compatibility always maintained
    The early customers paid the price
    current customers reap benefits of more mature interfaces
    Increased adoption leads to increased expectations of backwards compatibility
  • 12. Backward Compatibility:Response
    Interface classification - audience and stability tags (0.21)
    Patterned on enterprise-quality software process
    Evolve interfaces but maintain backward compatible
    Added newer forward looking interfaces - old interface maintained
    Test for compatibility
    Run old jars of automation tests, Real Yahoo applications
    Applications adopting higher abstractions (Pig, Hive)
    Insulates from lower primitive interfaces
    Wire compatibility (Hadoop-7347)
    Maintain compatibility with current protocol (java serialization)
    Adapters for addressing future discontinuity
    e.g. serializationor protocol change
    Moved to ProtocolBuf for data transfer protocol
  • 13. Testing & Quality
    Nightly Testing
    Against 1200 automated tests on 30 nodes
    Against live data and live applications
    QE Certification for Release
    Large variety and scale tests on 500 nodes
    Performance benchmarking
    QE HIT integration testing of whole stack
    Release Testing
    Sandbox cluster – 3 clusters each with 400 -1K nodes
    Major releases: 2 months testing on actual data - all production projects must sign off
    Research clusters – 6 Clusters (non-revenue production jobs) (4K Nodes)
    Major releases – minimum 2 months before moving to production
    .25Million to .5Million jobs per week
    if it clears research then mostly fine in fine in production
    Production clusters - 11 clusters (4.5K nodes)
    Revenue generating, stricter SLAs
  • 14. Fundamental Architecture Changes that cut across several issues
    Job Manager
    Resource Scheduler
    Storage Resources
    Compute Resources
    HDFS storage: mostly a separate layer – but one customer: one NN
    Federation generalizes the layer
    MapReduce – compute resource scheduling tightly coupled to MapReduce job management separates the layers
  • 15. HBase
    Fundamental Architecture Changes that cut across several issues
    Resource Scheduler
    Alternate NN Implementation
    MR App with Different version MR lib
    Storage Resources
    MR tmp
    MPI App
    MR App
    Compute Resources
    Scalability, Isolation, Availability
    Generic lower layer: first class support new applications on top
    MR tmp, HBase, MPI,
    Layering facilitates faster development of new work
    NN that caches Namespace – a few months of work
    New implementations of MR App manager
    Compatibility: Support multiple versions of MR
    Tenants upgrade at their own pace – crucial for shared clusters
  • 16. The Community Effect
    Some projects are done entirely by teams at Yahoo!, FB or Cloudera
    But several projects are joint work
    Yahoo & FB on NN scalability and concurrency esp in face of misbehaved apps
    Edits log v2 and refactoring edits log (Cloudera and Yahoo!/Hortonworks)
    HDFS-1073, 2003, 1557, 1926
    NN HA – Yahoo!/Hortonworks, Cloudera, FB, EBay
    HDFS-1623, 1971, 1972, 1973, 1974, ,1975, 2005
    Features to support HBase: FB, Cloudera, Yahoo, and the HBase community
    Expect to see rapid improvements in the very near future
    Further Scalability - NN that cache part of namespace
    Improved IO Performance - DN performance improvements
    Wire Compatibility - Wire protocols, operational improvements,
    New App Managers for
    Continued improvement of management and operability
  • 17. Hadoop is Successfully Crossing the Chasm
    Hadoop used in enterprises for revenue generating applications
    Apache Hadoop is improving at a rapid rate
    Addressing many issues including HA
    Fundamental design improvements to fuel innovation
    The might of a large growing developer community
    Battle tested on large clusters and variety of applications
    At Yahoo!, Facebook and the many other Hadoop customers.
    Data integrity has been a focus from the early days
    A level of testing that even the large commercial vendors cannot match!
    Can you trust your data to anything less?
  • 18. Q & A
    Hortonworks @ Hadoop Summit
    1:45pm: Next Generation Apache Hadoop MapReduce
    Community track by Arun Murthy
    2:15pm: Introducing HCatalog (Hadoop Table Manager)
    Community track by Alan Gates
    4:00pm: Large Scale Math with Hadoop MapReduce
    Applications and Research Track by Tsz-Wo Sze
    4:30pm: HDFS Federation and Other Features
    Community track by Suresh Srinivas and Sanjay Radia
  • 19. About Hortonworks
    Mission: Revolutionize and commoditize the storage and processing of big data via open source
    Vision: Half of the world’s data will be stored in Apache Hadoop within five years
    Strategy: Drive advancements that make Apache Hadoop projects more consumable for the community, enterprises and ecosystem
    Make Apache Hadoop easy to install, manage and use
    Improve Apache Hadoop performance and availability
    Make Apache Hadoop easy to integrate and extend
    © Hortonworks Inc. 2011
  • 20. Thank You.
    © Hortonworks Inc. 2011