Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
This is the presentation from the "Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS" webinar on May 28, 2014. Rohit Bakhshi, a senior product manager at Hortonworks, and Vinod Vavilapalli, a PMC member for Apache Hadoop, give an overview of YARN and HDFS and discuss new features in HDP 2.1. Those new features include HDFS extended ACLs, HTTPS wire encryption, HDFS DataNode caching, ResourceManager high availability, the Application Timeline Server, and Capacity Scheduler preemption.

Presentation Transcript

  • Page 1 © Hortonworks Inc. 2014 Discover HDP 2.1 Apache Hadoop 2.4.0, YARN & HDFS Hortonworks. We do Hadoop.
  • Page 2 Speakers: Justin Sears, Hortonworks Product Marketing Manager; Rohit Bakhshi, Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform; Vinod Vavilapalli, Foundational Hadoop Architect, Hortonworks Engineer, PMC for Apache Hadoop & leads YARN development at Hortonworks
  • Page 3 Agenda: Overview of YARN and HDFS; New YARN & HDFS Features in HDP 2.1; Q & A
  • Page 4 A Modern Data Architecture [diagram]. Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: repositories (RDBMS, EDW, MPP) and Enterprise Hadoop (Governance & Integration, Data Access, Data Management, Security, Operations). Sources: OLTP, ERP, CRM systems; documents, emails; web logs, click streams; social networks; machine generated; sensor data; geolocation data. Operations tools: provision, manage & monitor. Dev & data tools: build & test.
  • Page 5 HDP 2.1: Enterprise Hadoop [diagram of the HDP 2.1 Hortonworks Data Platform]. Governance & Integration (data workflow, lifecycle & governance): Falcon, Sqoop, Flume, NFS, WebHDFS. Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (in-memory analytics, ISV engines). Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System) across nodes 1 to N. Security (authentication, authorization, accounting, data protection): Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox. Operations: provision, manage & monitor (Ambari, ZooKeeper); scheduling (Oozie).
  • Page 6 HDP 2.1: Data Management [diagram: the Hortonworks Data Platform stack from Page 5, with the same Governance & Integration, Data Access, Security and Operations components]. Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System) across nodes 1 to N.
  • Page 7 Agenda: Overview, Features, Q & A
  • Page 8 Apache Hadoop YARN and HDFS: The Data Operating System for Hadoop 2.0. Flexible: enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming. Efficient: double processing IN Hadoop on the same hardware while providing predictable performance & quality of service. Shared: provides a stable, reliable, secure foundation and shared operational services across multiple workloads. Data processing engines run natively IN Hadoop: Batch (MapReduce), Interactive (Tez), Streaming (Storm), In-Memory (Spark), Graph (Giraph), SAS (LASR, HPA), Online (HBase, Accumulo), Others. HDFS: redundant, reliable storage. YARN: cluster resource management.
  • Page 9 Agenda: Overview, Features, Q & A
  • Page 10 HDP 2.1 HDFS: What’s New. HDFS Extended ACLs (theme: Security): provides granular access control to datasets in HDFS. HTTPS Wire Encryption (theme: Security): swebhdfs: HTTPS support for WebHDFS; HTTPS support for Hadoop WebUI. HDFS DataNode Caching (theme: Performance): enhanced read performance via in-memory caching of files.
  • Page 11 HDFS Coordinated DataNode Caching: in-memory cache for HDFS files, giving enhanced read performance; identify files to be cached through centralized management controls; manage caching through pools and directives
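As a rough illustration of "manage caching through pools and directives" (Page 11), the sketch below assembles `hdfs cacheadmin` command lines in Python without running them. The `-addPool` and `-addDirective` subcommands belong to the HDFS centralized cache management CLI; the helper names, the pool name `hot-data`, and the path `/data/lookup` are hypothetical and do not come from the slides.

```python
from typing import List, Optional


def add_pool_cmd(pool: str) -> List[str]:
    """Build the argv that creates a cache pool (hypothetical helper)."""
    return ["hdfs", "cacheadmin", "-addPool", pool]


def add_directive_cmd(path: str, pool: str,
                      replication: Optional[int] = None) -> List[str]:
    """Build the argv that asks for `path` to be cached under `pool`."""
    cmd = ["hdfs", "cacheadmin", "-addDirective", "-path", path, "-pool", pool]
    if replication is not None:
        # Optional number of cached replicas for the directive.
        cmd += ["-replication", str(replication)]
    return cmd


if __name__ == "__main__":
    # Pool name and path are placeholders, not values from the webinar.
    for argv in (add_pool_cmd("hot-data"),
                 add_directive_cmd("/data/lookup", "hot-data", replication=2)):
        print(" ".join(argv))
```

Building the argument lists separately from executing them keeps the sketch safe to run on a machine without Hadoop; on a real cluster the lists could be passed to `subprocess.run`.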
  • Page 12 HDP 2.1 YARN: What’s New. Resource Manager High Availability (theme: Reliability): no service disruption in YARN. Application Timeline Server (theme: Monitoring): operational monitoring across all YARN applications. Capacity Scheduler Pre-emption (theme: Scheduling): enforce SLAs across applications and organizations.
  • Page 13 YARN Resource Manager (RM) HA. Automated failover: HDP detects and reacts to Resource Manager host & process failures. Active/Standby: standby ResourceManager with access to shared state store. Fencing: protection against split brain. Full stack resiliency: entire HDP stack certified with ResourceManager HA; RM restart enables application recovery. Integrated into HDP stack: no external HA frameworks; no external storage needed.
  • Page 14 YARN RM HA: Architecture [diagram]. Components: Client, Active RM, Standby RM, ZooKeeper service cluster, NodeManagers. Active RM: monitor and maintain active lock. Standby RM: monitor and try to take active lock. ZooKeeper: store state.
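The active/standby behaviour on Pages 13 and 14 can be pictured with a toy simulation. This is only a sketch under stated assumptions: `LockService` is an in-memory stand-in for the ZooKeeper-held active lock, `ResourceManager.tick` is a hypothetical monitoring round, and none of the names correspond to real YARN or ZooKeeper APIs.

```python
from typing import Optional


class LockService:
    """In-memory stand-in for the active lock (hypothetical, not ZooKeeper)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def try_acquire(self, rm_id: str) -> bool:
        # Granted only if the lock is free or already held by the caller.
        if self.holder in (None, rm_id):
            self.holder = rm_id
            return True
        return False

    def release(self, rm_id: str) -> None:
        # Models the lock being freed once the failed holder is detected.
        if self.holder == rm_id:
            self.holder = None


class ResourceManager:
    def __init__(self, rm_id: str, lock: LockService) -> None:
        self.rm_id = rm_id
        self.lock = lock
        self.alive = True

    def tick(self) -> str:
        """One monitoring round: returns 'active', 'standby', or 'down'."""
        if not self.alive:
            self.lock.release(self.rm_id)
            return "down"
        return "active" if self.lock.try_acquire(self.rm_id) else "standby"


lock = LockService()
rm1, rm2 = ResourceManager("rm1", lock), ResourceManager("rm2", lock)
before = (rm1.tick(), rm2.tick())   # rm1 takes the lock, rm2 stands by
rm1.alive = False                   # simulate a host or process failure
after = (rm1.tick(), rm2.tick())    # rm2 takes over the lock
print(before, after)
```

The single lock holder is what the slide's "Fencing" point protects: at most one ResourceManager reports itself active in any round of this model.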
  • Page 15 Application Timeline Server. Entity and event collection: applications of all types can create entities and send events. Pluggable store: depending on site requirements. REST APIs: applications and user interfaces can access information via REST. Visualizations: users can build tools and visualizations using the APIs. Users and admins: applications as well as the system entities/events.
  • Page 16 Application Timeline Server [diagram labels: App Timeline Server, AMBARI, Custom App Monitoring Client]
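To make the "create entities and send events" and "REST APIs" points on Page 15 more concrete, here is a minimal Python sketch that only builds a request URL and JSON body and sends nothing. The `/ws/v1/timeline` path, port 8188, and the lowercase field names (`entitytype`, `entity`, `starttime`, `events`, `eventtype`, `eventinfo`) are assumptions that should be checked against the YARN Timeline Server documentation for your release; the host, entity type, and IDs are made up.

```python
import json
from typing import Any, Dict, List


def timeline_url(host: str, entity_type: str = "", port: int = 8188) -> str:
    """URL for posting entities (no type) or listing entities of one type."""
    base = f"http://{host}:{port}/ws/v1/timeline"   # assumed REST root
    return f"{base}/{entity_type}" if entity_type else base


def make_entity(entity_type: str, entity_id: str, start_ms: int,
                events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap one entity and its events in the assumed request envelope."""
    return {"entities": [{
        "entitytype": entity_type,
        "entity": entity_id,
        "starttime": start_ms,
        "events": events,
    }]}


# Hypothetical application-defined entity with a single event.
body = make_entity(
    "MY_APP_JOB", "job_0001", 1401292800000,
    [{"timestamp": 1401292800000, "eventtype": "JOB_STARTED", "eventinfo": {}}],
)
print(timeline_url("timeline.example.com"))
print(timeline_url("timeline.example.com", "MY_APP_JOB"))
print(json.dumps(body, indent=2))
```

A monitoring client such as the one sketched on Page 16 would issue a GET against the per-type URL and render the returned entities.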
  • Page 17 Capacity Scheduler Preemption: enforce SLAs; preempt across queues. Step 1, gather queue state: (1) current capacity, (2) guaranteed capacity, (3) pending requests. Step 2, identify set of preemptions: (1) figure out what is needed to achieve capacity balance, (2) select applications to preempt: over-capacity queues and FIFO order, (3) respect bounds on amount of preemption allowed for each round. Step 3, preempt application(s): (1) remove reservations from the most recently assigned app, (2) issue preemptions for containers of same app (reverse chronological order, last assigned container first), (3) App Master pre-emption is last resort. Step 4, kill containers: (1) track containers that have been issued but not yet executed preemption, (2) after a set of execution periods, forcibly kill these containers.
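Steps 1 to 3 on Page 17 can be sketched as a small planning function. This is a toy model, not the Capacity Scheduler's actual code: capacities are counted in whole containers, application-level selection and reservations are left out, step 4 (the grace period before a forced kill) is not modelled, and taking from the most over-capacity queue first is an assumption made for the example. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Queue:
    name: str
    guaranteed: int          # containers the queue is entitled to
    pending: int = 0         # outstanding container requests
    # Running containers in assignment order (oldest first).
    containers: List[str] = field(default_factory=list)

    @property
    def current(self) -> int:
        return len(self.containers)


def plan_preemptions(queues: List[Queue],
                     max_per_round: int) -> Dict[str, List[str]]:
    """Pick containers to preempt in one round (toy model, not YARN's code)."""
    # Steps 1-2: demand is what under-capacity queues still need to reach
    # their guarantee, limited by what they have actually requested.
    demand = sum(min(q.pending, q.guaranteed - q.current)
                 for q in queues if q.current < q.guaranteed)
    budget = min(demand, max_per_round)   # bound on preemption per round

    plan: Dict[str, List[str]] = {}
    # Step 3: take from queues over their guarantee, most over first
    # (an assumption), newest container first within each queue.
    over = sorted((q for q in queues if q.current > q.guaranteed),
                  key=lambda q: q.current - q.guaranteed, reverse=True)
    for q in over:
        if budget <= 0:
            break
        take = min(q.current - q.guaranteed, budget)
        plan[q.name] = q.containers[::-1][:take]
        budget -= take
    return plan


queues = [
    Queue("etl", guaranteed=4, containers=[f"etl-{i}" for i in range(8)]),
    Queue("adhoc", guaranteed=4, pending=3, containers=["adhoc-0"]),
]
print(plan_preemptions(queues, max_per_round=2))
```

With `max_per_round=2` the plan reclaims only the two newest `etl` containers even though `adhoc` needs three, which mirrors the slide's "respect bounds on amount of preemption allowed for each round".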
  • Page 18 Agenda: Overview, Features, Q & A
  • Page 19 Learn More About the Hadoop Operating System: Hortonworks.com/labs/yarn/. Register for the remaining 3 Discover HDP 2.1 Webinars: Hortonworks.com/webinars. Next Webinar: Apache Solr for Hadoop Search, Thursday, June 12, 10am Pacific