Copyright©2014 NTT corp. All Rights Reserved. 
Apache Hadoop - What’s next? - @ db tech showcase 2014
Tsuyoshi Ozawa 
ozawa.tsuyoshi@lab.ntt.co.jp

About me
• Tsuyoshi Ozawa
• Researcher & Engineer @ NTT
• Twitter: @oza_x86_64
• A Hadoop developer
  • Merged patches: 53 patches!
• Author of “Hadoop 徹底入門 2nd Edition”, Chapter 22 (YARN)
Quiz!!

Quiz
Does Hadoop have a SPoF?

Quiz
All master nodes in Hadoop can run in highly available mode.

Quiz
Is Hadoop only for MapReduce?

Quiz
Hadoop is not only for MapReduce but also for Spark, Tez, Storm, and so on.

Agenda
• Current status of Hadoop: new features since Hadoop 2
  • HDFS
    • No SPoF with NameNode HA + JournalNode
    • Scaling out the NameNode with NameNode Federation
  • YARN
    • Resource management with YARN
    • No SPoF with ResourceManager HA
  • MapReduce
    • No SPoF with ApplicationMaster restart
• What’s next? Coming features in the 2.6 release
  • HDFS
    • Heterogeneous storage
    • Memory as a storage tier
  • YARN
    • Label-based scheduling
    • RM HA Phase 2
HDFS IN HADOOP 2

NameNode with JournalNode
• Once upon a time, the NameNode was a SPoF
• In Hadoop 2, the NameNode has the Quorum Journal Manager
  • Replication is done by a Paxos-based protocol
• See also: http://blog.cloudera.com/blog/2012/10/quorum-based-journaling-in-cdh4-1/
[Figure: the NameNode's QuorumJournalManager writes to three JournalNodes, each backed by a local disk]
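The quorum idea behind the JournalNodes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual QuorumJournalManager protocol: the `JournalNode` class, its `up` flag, and `write_edit` are hypothetical names, and the real protocol also deals with epochs and recovery, which are left out. The point it shows is that an edit counts as durable once a majority of JournalNodes acknowledge it, so one of three nodes can be down.

```python
class JournalNode:
    """Hypothetical stand-in for a JournalNode that keeps edits on its local disk."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.edits = []

    def append(self, txid, edit):
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.edits.append((txid, edit))
        return True


def write_edit(journal_nodes, txid, edit):
    """Return True if a majority of JournalNodes acknowledged the edit."""
    acks = sum(1 for jn in journal_nodes if jn.append(txid, edit))
    return acks >= len(journal_nodes) // 2 + 1


# Three JournalNodes, one of them down: two acknowledgements still form a majority.
nodes = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3", up=False)]
committed = write_edit(nodes, 1, "mkdir /user/alice")
print(committed)
```

With three JournalNodes the write survives one failure; with two nodes down it would not be committed.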

NameNode Federation
• Once upon a time, the scalability of the NameNode was limited by its memory
• In Hadoop 2, the NameNode has the Federation feature
  • Metadata is distributed per namespace
Figures from: https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
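Distributing metadata per namespace can be pictured as a mount table that maps path prefixes to NameNodes. The sketch below is only an illustration of that idea: the host names, the `MOUNT_TABLE` contents, and `resolve_namenode` are hypothetical, and it does not reproduce how Hadoop itself configures or resolves federated namespaces.

```python
# Hypothetical mount table: path prefix -> NameNode that owns that namespace.
MOUNT_TABLE = {
    "/user": "nn1.example.com",
    "/logs": "nn2.example.com",
    "/tmp": "nn3.example.com",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Pick the NameNode whose prefix is the longest match for the path."""
    matches = [
        prefix for prefix in mount_table
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        raise KeyError(f"no namespace is mounted for {path}")
    return mount_table[max(matches, key=len)]


print(resolve_namenode("/user/alice/data.csv"))
print(resolve_namenode("/logs/2014/11/app.log"))
```

Each NameNode then only has to hold the metadata for its own part of the tree in memory.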
RESOURCE MANAGEMENT IN HADOOP 2

What’s YARN?
• Generic resource management framework
  • YARN = Yet Another Resource Negotiator
  • Proposed by Arun C Murthy in 2011
• Container-level resource management
  • A container is a more generic unit of resource than a slot
• Separates the JobTracker’s roles
  • Job scheduling / resource management / isolation
  • Task scheduling
[Figure: MRv1 architecture (JobTracker; TaskTracker with map slots and reduce slots) beside MRv2 and YARN architecture (YARN ResourceManager; Impala, Spark, and MRv2 masters; YARN NodeManager with containers)]

Why YARN? (Use case)
• Running various processing frameworks on the same cluster
  • Batch processing with MapReduce
  • Interactive queries with Impala
  • Interactive deep analytics (e.g. machine learning) with Spark
[Figure: MRv2/Tez, Impala, and Spark run on YARN on top of HDFS, serving periodic long batch queries, interactive aggregation queries, and interactive machine learning queries]

Why YARN? (Technical reason)
• More effective resource management for multiple processing frameworks
  • It is difficult to use the entire cluster’s resources without thrashing
  • *Real* big data cannot be moved from HDFS/S3
[Figure: a master for MapReduce and a master for Impala share the same slaves (HDFS slave, MapReduce slave with map and reduce slots, Impala slave). Each framework has its own scheduler, so Job1 and Job2 land on the same slave and cause thrashing]

MRv1 Architecture
• Resources are managed by the JobTracker
  • Job-level scheduling
  • Resource management
[Figure: the master for MapReduce manages three slaves, each a MapReduce slave with map slots and reduce slots; the master for Impala is separate. Schedulers only know their own resource usage]

YARN Architecture
• Idea
  • One global resource manager (ResourceManager)
  • A common resource pool for all frameworks (NodeManager and Container)
  • A scheduler for each framework (AppMaster)
[Figure: a ResourceManager and three slaves, each running a NodeManager with containers that host masters and their slaves. 1. The client submits jobs. 2. The master is launched. 3. The slaves are launched]
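The three steps in the figure can be sketched as a toy simulation. Everything here is a simplification under stated assumptions: the classes are hypothetical, capacity is counted in whole containers rather than memory and cores, and the placement rule (the node with the most free capacity) is invented for the example. In real YARN the master itself sends the container requests; the sketch folds that into one method to keep it short.

```python
class NodeManager:
    """Toy NodeManager that tracks how many containers it can still host."""

    def __init__(self, name, capacity):
        self.name = name
        self.free = capacity
        self.running = []

    def launch(self, what):
        self.free -= 1
        self.running.append(what)


class ResourceManager:
    """Toy ResourceManager: one global view of the common resource pool."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, what):
        # Hypothetical placement rule: choose the node with the most free capacity.
        node = max(self.node_managers, key=lambda nm: nm.free)
        if node.free == 0:
            raise RuntimeError("cluster is full")
        node.launch(what)
        return node.name

    def submit(self, job, num_tasks):
        # Step 1: the client submits a job.
        # Step 2: the master for the job is launched in a container.
        placements = [self.allocate(f"{job}-master")]
        # Step 3: containers for the job's slaves (tasks) are launched.
        placements += [self.allocate(f"{job}-task{i}") for i in range(num_tasks)]
        return placements


cluster = [NodeManager(f"nm{i}", capacity=3) for i in (1, 2, 3)]
rm = ResourceManager(cluster)
placements = rm.submit("job1", num_tasks=2)
print(placements)
```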

YARN and Mesos
YARN
• An AppMaster is launched for each job
  • More scalability
  • Higher latency
• One container per request
• One master per job
Mesos
• An AppMaster is launched for each app (framework)
  • Less scalability
  • Lower latency
• A bundle of containers per request
• One master per framework
The policy/philosophy is different.
[Figure: YARN (ResourceManager with three NMs) beside Mesos (ResourceMaster with three slaves), each showing Master1 and Master2]

YARN Eco-system
• MapReduce
  • Of course, it works
• DAG-style processing frameworks
  • Spark on YARN
  • Hive on Tez on YARN
• Interactive query
  • Impala on YARN (via Llama)
• Users
  • Yahoo!
  • Twitter
  • LinkedIn
  • Hadoop 2 @ Twitter: http://www.slideshare.net/Hadoop_Summit/t-235p210-cvijayarenuv2
YARN COMPONENTS

ResourceManager (RM)
• Master node of YARN
• Role
  • Accepting requests from
    1. ApplicationMasters, for allocating containers
    2. Clients, for submitting jobs
  • Managing cluster resources
    • Job-level scheduling
    • Container management
  • Launching application-level masters (e.g. for MapReduce)
[Figure: 1. The client submits jobs to the ResourceManager. 2. The ResourceManager launches the master of each job. 3. The master sends container allocation requests to the ResourceManager. 4. The ResourceManager sends container allocation requests to the NodeManager]

NodeManager (NM)
• Slave node of YARN
• Role
  • Accepting requests from the RM
  • Monitoring the local machine and reporting its status to the RM
    • Health check
  • Managing local resources
[Figure: clients or a master, the ResourceManager, and a NodeManager with containers. Labels: 1. Request containers. 2. Allocating containers. 3. Launching containers. 4. Container information (host, port, etc.). Periodic health check via heartbeat]

ApplicationMaster (AM)
• Master of an application (e.g. the master of MapReduce, Tez, Spark, etc.)
• Runs in a container
• Roles
  • Getting containers from the ResourceManager
  • Application-level scheduling
    • How many map tasks run, and where?
    • When will reduce tasks be launched?
[Figure: the master of MapReduce runs in a container on a NodeManager. 1. It requests containers from the ResourceManager. 2. The ResourceManager returns the list of allocated containers]
RESOURCE MANAGER HA

ResourceManager High Availability
• What happens when the ResourceManager fails?
  • New jobs cannot be submitted
• NOTE:
  • Launched apps continue to run
  • AppMaster recovery is done by each framework
    • MRv2
[Figure: the YARN architecture diagram with the ResourceManager down. The client cannot submit jobs, but the masters and slaves already running in containers continue to run each job]

ResourceManager High Availability
• Approach
  • Storing RM information in ZooKeeper
  • Automatic failover by the embedded elector
  • Manual failover by RMHAUtils
  • NodeManagers use a local RMProxy to access the RMs
[Figure: an active and a standby ResourceManager, each with an EmbeddedElector, and three ZooKeeper nodes holding RMState. 1. The active node stores all state into the RMStateStore. 2. The active node fails. 3. The EmbeddedElector detects the failure. 4. Failover: the standby node becomes active. 5. The new active node loads the state from the RMStateStore]
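The failover sequence in the figure can be sketched as follows. This is a toy model, not the real implementation: `StateStore` is a hypothetical in-memory stand-in for the ZooKeeper-backed RMStateStore, failure detection and leader election are reduced to a single `failover` call, and the class and method names are made up for the example.

```python
class StateStore:
    """Hypothetical in-memory stand-in for the ZooKeeper-backed RMStateStore."""

    def __init__(self):
        self.apps = {}


class ResourceManager:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.active = False
        self.apps = {}

    def submit(self, app_id, info):
        if not self.active:
            raise RuntimeError(f"{self.name} is standby and cannot accept jobs")
        self.apps[app_id] = info
        # Step 1: the active node stores its state into the shared store.
        self.store.apps[app_id] = info

    def become_active(self):
        # Step 5: load the state that the previous active node persisted.
        self.apps = dict(self.store.apps)
        self.active = True


def failover(failed, standby):
    """Steps 2 to 4, collapsed: the failed node steps down, the standby takes over."""
    failed.active = False
    standby.become_active()
    return standby


store = StateStore()
rm1 = ResourceManager("rm1", store)
rm2 = ResourceManager("rm2", store)
rm1.become_active()
rm1.submit("app_0001", {"user": "alice"})

active = failover(rm1, rm2)
print(active.name, sorted(active.apps))
```

Because the state lives outside either ResourceManager, the new active node knows about the application that was submitted before the failure.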

CAPACITY PLANNING ON YARN

Resource definition on NodeManager
• Define resources with XML (etc/hadoop/yarn-site.xml)
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
[Figure: a NodeManager with 8 CPU cores and 8 GB memory]

Container allocation on ResourceManager
• The RM accepts a container request and sends it to an NM, but the request can be rewritten
  • Small requests are rounded up to minimum-allocation-mb
  • Large requests are rounded down to maximum-allocation-mb
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
[Figure: a client or master sends a request for 512 MB to the ResourceManager, which passes it on to a NodeManager as a request for 1024 MB]
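The rewriting described above can be sketched as a small function. It uses the two values from the configuration on this slide; `normalize_memory_mb` is a hypothetical name, and rounding up to a multiple of the minimum (rather than only to the minimum itself) is an assumption about the scheduler. Some Hadoop versions may also reject a request above the maximum instead of capping it, so treat this as an illustration of the slide, not as the scheduler's exact behavior.

```python
import math


def normalize_memory_mb(requested_mb, minimum_mb=1024, maximum_mb=8192):
    """Round a request up to a multiple of the minimum, then cap it at the maximum."""
    # Assumption: allocations come in multiples of minimum-allocation-mb.
    rounded = max(minimum_mb, math.ceil(requested_mb / minimum_mb) * minimum_mb)
    return min(rounded, maximum_mb)


for request in (512, 1024, 1500, 16384):
    print(request, "->", normalize_memory_mb(request))
```

The first case is the one in the figure: a 512 MB request becomes a 1024 MB container.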

Container allocation at framework side
• Define how much resource map tasks and reduce tasks use
  • MapReduce: etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
[Figure: the master asks for containers for map tasks with 1024 MB memory and 1 CPU core; a NodeManager with 8 CPU cores and 8 GB memory provides containers with 1024 MB memory and 1 core each]
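Putting the node size and the task sizes together gives a quick capacity estimate. The sketch below is back-of-the-envelope arithmetic with the numbers from these slides; `containers_per_node` is a hypothetical helper, one core per reduce task is an assumption (the slide only states it for map tasks), and the memory needed by the master itself is ignored.

```python
def containers_per_node(node_mb, node_vcores, task_mb, task_vcores=1):
    """Containers of one size that fit on a node, limited by the scarcer resource."""
    return min(node_mb // task_mb, node_vcores // task_vcores)


NODE_MB, NODE_VCORES = 8192, 8  # values from yarn-site.xml on the earlier slide

map_containers = containers_per_node(NODE_MB, NODE_VCORES, task_mb=1024)
reduce_containers = containers_per_node(NODE_MB, NODE_VCORES, task_mb=4096)
print("map containers per node:", map_containers)
print("reduce containers per node:", reduce_containers)
```

With these settings a node is limited by memory for reduce tasks and is balanced between memory and cores for map tasks.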

WHAT’S NEXT? - HDFS -

Heterogeneous Storages for HDFS Phase 2
• HDFS-2832, HDFS-5682
• Handling various storage types in HDFS
  • SSD, memory, disk, and so on
• Setting quotas per storage type
  • Setting the SSD quota on /home/user1 to 10 TB
  • Setting the SSD quota on /home/user2 to 10 TB
  • Not configuring any SSD quota on the remaining user directories (i.e. leaving it to defaults)
<configuration>
...
<property>
<name>dfs.datanode.data.dir</name>
<value>[DISK]/mnt/sdc2/,[DISK]/mnt/sdd2,[SSD]/mnt/sde2</value>
</property>
...
</configuration>
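The tagged value above can be read as a list of (storage type, directory) pairs. The following sketch parses the string exactly as it appears on the slide; `parse_data_dirs` is a hypothetical helper, not part of Hadoop, and treating an untagged directory as DISK is an assumption made for the example.

```python
import re

# A directory entry optionally starts with a storage type in square brackets.
_TAGGED = re.compile(r"^\[(?P<type>[A-Za-z_]+)\](?P<path>.+)$")


def parse_data_dirs(value, default_type="DISK"):
    """Split a dfs.datanode.data.dir value into (storage_type, path) pairs."""
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _TAGGED.match(entry)
        if match:
            result.append((match.group("type").upper(), match.group("path")))
        else:
            # Assumption: a directory without a tag is treated as DISK.
            result.append((default_type, entry))
    return result


dirs = parse_data_dirs("[DISK]/mnt/sdc2/,[DISK]/mnt/sdd2,[SSD]/mnt/sde2")
print(dirs)
```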

Support memory as a storage medium
• HDFS-5851
• Introducing an explicit “cache” layer in HDFS
  • Discardable Distributed Memory (DDM)
• Applications can run faster by using memory
  • Discardable Memory and Materialized Queries is one example
• Difference between RDD and DDM
  • Multi-tenancy aware
  • Handling data in the processing layer or in the storage layer

And more!
• Archival storage
  • HDFS-6584
• Transparent encryption
  • HDFS-6134

WHAT’S NEXT? - YARN -

Support for rolling upgrades in YARN
• Non-stop YARN updating (YARN-666)
  • NodeManager, ResourceManager, applications
• Before 2.6.0
  • Restarting the RM -> the RM restarts all AMs -> all jobs restart
  • Restarting NMs -> the NMs are removed from the cluster -> containers are restarted!
• After 2.6.0
  • Restarting the RM -> AMs continue to run
  • Restarting an NM -> the NM restores its state from local data
[Figure: the YARN architecture diagram: a ResourceManager and three slaves, each running a NodeManager with containers that host masters and their slaves]

YARN reservation-subsystem
• Now we can run various subsystems on YARN
  • Interactive query engines: Spark, Impala, ...
  • Batch processing engines: MapReduce, Tez, ...
• Problem
  • Interactive query engines allocate resources at the same time, which can delay the daily batch
• Time-based reservation scheduling
  • 8:00am - 6:00pm: allocating resources for Impala
  • 6:00pm - 0:00am: allocating resources for MapReduce
[Figure: a timeline with allocation for the interactive query engine from 8:00am to 6:00pm, followed by batch processing for the next day from 6:00pm to 0:00am]
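The time-based split can be sketched as a lookup from the time of day to the framework that holds the reservation. This is only an illustration of the schedule on the slide: `RESERVATIONS` and `reserved_for` are hypothetical names, and the sketch says nothing about how the YARN reservation system is actually configured or enforced.

```python
# Reservation windows from the slide, in minutes since midnight: [start, end).
RESERVATIONS = [
    (8 * 60, 18 * 60, "Impala"),      # 8:00am - 6:00pm
    (18 * 60, 24 * 60, "MapReduce"),  # 6:00pm - 0:00am
]


def reserved_for(hour, minute=0):
    """Return the framework that holds the reservation at the given time, if any."""
    now = hour * 60 + minute
    for start, end, framework in RESERVATIONS:
        if start <= now < end:
            return framework
    # The slide defines no reservation between 0:00am and 8:00am.
    return None


print(reserved_for(9, 30))
print(reserved_for(18))
print(reserved_for(3))
```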

Support for admin-specified labels in YARN
• YARN-796
• Handling heterogeneous machines in one YARN cluster
  • GPU cluster
  • High-memory cluster
  • 40 Gbps network cluster
• Labeling them and scheduling based on labels
• Admins can add/remove labels via yarn rmadmin commands
[Figure: a client submits jobs “On GPU!” to the ResourceManager; the cluster has one group of NodeManagers labeled GPU and another labeled 40G network]
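Label-based scheduling boils down to filtering nodes by the label a job asks for. The sketch below is a toy model: the node names, the `NODE_LABELS` table, and `candidate_nodes` are hypothetical, and letting a request without a label run on any node is an assumption of this example, not a statement about the YARN scheduler.

```python
# Hypothetical cluster: node name -> labels assigned by the admin.
NODE_LABELS = {
    "nm1": {"GPU"},
    "nm2": {"GPU"},
    "nm3": {"40Gnetwork"},
    "nm4": set(),
}


def candidate_nodes(required_label, node_labels=NODE_LABELS):
    """Return the nodes a request may run on, sorted by name."""
    if required_label is None:
        # Assumption of this sketch: an unlabeled request can run anywhere.
        return sorted(node_labels)
    return sorted(
        name for name, labels in node_labels.items() if required_label in labels
    )


print(candidate_nodes("GPU"))
print(candidate_nodes("40Gnetwork"))
```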

And more!
• Timeline service security
  • YARN-1935
• Minimal support for running long-running services on YARN
  • YARN-896
• Support for an automatic, shared cache for YARN application artifacts
  • YARN-1492
• And more!
  • Please check the wiki: http://wiki.apache.org/hadoop/Roadmap

Summary
• Hadoop 2 is evolving rapidly
  • I would appreciate it if you could catch up via this presentation!
• New components from V2
  • HDFS
    • Quorum Journal Manager
    • NameNode Federation
  • ResourceManager
  • NodeManager
  • ApplicationMaster
• New features in 2.6:
  • Discardable memory store on HDFS, and so on
  • Rolling upgrades, labels for heterogeneous clusters on YARN, the reservation system, and so on
• Questions or feedback -> user@hadoop.apache.org
• Issues -> https://issues.apache.org/jira/browse/{HDFS,YARN,HADOOP,MAPREDUCE}

• YARN-666
  • https://www.youtube.com/watch?v=O4Q73e2ua9Y&feature=youtu.be
  • http://www.slideshare.net/Hadoop_Summit/t-145p230avavilapalli-mac

More Related Content

What's hot

Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshootingmapr-academy
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterAltoros
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureJianfeng Zhang
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batchboorad
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedInDataWorks Summit
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn Yahoo Developer Network
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.finalHortonworks
 
Ambari Meetup: NameNode HA
Ambari Meetup: NameNode HAAmbari Meetup: NameNode HA
Ambari Meetup: NameNode HAHortonworks
 
Hadoop ha system admin
Hadoop ha system adminHadoop ha system admin
Hadoop ha system adminTrieu Dao Minh
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 

What's hot (19)

MHUG - YARN
MHUG - YARNMHUG - YARN
MHUG - YARN
 
Anatomy of Hadoop YARN
Anatomy of Hadoop YARNAnatomy of Hadoop YARN
Anatomy of Hadoop YARN
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Chicago spark meetup-april2017-public
Chicago spark meetup-april2017-publicChicago spark meetup-april2017-public
Chicago spark meetup-april2017-public
 
U rpm-v2
U rpm-v2U rpm-v2
U rpm-v2
 
Hadoop Internals
Hadoop InternalsHadoop Internals
Hadoop Internals
 
70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshooting
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batch
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedIn
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 
52 nfs
52 nfs52 nfs
52 nfs
 
Ambari Meetup: NameNode HA
Ambari Meetup: NameNode HAAmbari Meetup: NameNode HA
Ambari Meetup: NameNode HA
 
Hadoop ha system admin
Hadoop ha system adminHadoop ha system admin
Hadoop ha system admin
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 

Similar to [db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史

Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Tsuyoshi OZAWA
 
Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Tsuyoshi OZAWA
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Hakka Labs
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNHortonworks
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwordsSzehon Ho
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPOmkar Joshi
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Hortonworks
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
Writing app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnWriting app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnDataWorks Summit
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Cloudera, Inc.
 
Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012hitesh1892
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Hortonworks
 
Apache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextApache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextDataWorks Summit
 

Similar to [db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史 (20)

Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014
 
Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014
 
Yarn
YarnYarn
Yarn
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOP
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
 
Yarn
YarnYarn
Yarn
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
Writing app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnWriting app framworks for hadoop on yarn
Writing app framworks for hadoop on yarn
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
 
Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012
 
Apache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextApache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's Next
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 

More from Insight Technology, Inc.

グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?Insight Technology, Inc.
 
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Insight Technology, Inc.
 
事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明するInsight Technology, Inc.
 
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーンInsight Technology, Inc.
 
MBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごとMBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごとInsight Technology, Inc.
 
グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?Insight Technology, Inc.
 
DBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォームDBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォームInsight Technology, Inc.
 
SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門Insight Technology, Inc.
 
db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉 db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉 Insight Technology, Inc.
 
db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也Insight Technology, Inc.
 
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー Insight Technology, Inc.
 
難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?Insight Technology, Inc.
 
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介Insight Technology, Inc.
 
そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?Insight Technology, Inc.
 
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...Insight Technology, Inc.
 
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。 複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。 Insight Technology, Inc.
 
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...Insight Technology, Inc.
 
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]Insight Technology, Inc.
 

More from Insight Technology, Inc. (20)

グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?
 
Docker and the Oracle Database
Docker and the Oracle DatabaseDocker and the Oracle Database
Docker and the Oracle Database
 
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
 
事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する
 
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
 
[db tech showcase Tokyo 2014] C32: Hadoop Frontline - From the Development Field, by Tsuyoshi Ozawa (NTT)

  • 1. Copyright©2014 NTT corp. All Rights Reserved. Apache Hadoop - What's next? - @db tech showcase 2014. Tsuyoshi Ozawa, ozawa.tsuyoshi@lab.ntt.co.jp
  • 2. About me
      • Tsuyoshi Ozawa
      • Researcher & Engineer @ NTT; Twitter: @oza_x86_64
      • A Hadoop developer
      • Merged patches: 53 patches!
      • Author of "Hadoop 徹底入門 2nd Edition", Chapter 22 (YARN)
  • 3. Quiz!!
  • 4. Quiz: Does Hadoop have a SPoF?
  • 5. Quiz: All master nodes in Hadoop can run in highly available mode.
  • 6. Quiz: Is Hadoop only for MapReduce?
  • 7. Quiz: Hadoop is not only for MapReduce but also for Spark, Tez, Storm, and so on.
  • 8. Agenda
      • Current status of Hadoop - new features since Hadoop 2 -
          • HDFS
              • No SPoF with NameNode HA + JournalNode
              • Scaling out the NameNode with NameNode Federation
          • YARN
              • Resource management with YARN
              • No SPoF with ResourceManager HA
          • MapReduce
              • No SPoF with ApplicationMaster restart
      • What's next? - coming features in the 2.6 release -
          • HDFS
              • Heterogeneous storage
              • Memory as a storage tier
          • YARN
              • Label-based scheduling
              • RM HA Phase 2
  • 9. HDFS IN HADOOP 2
  • 10. NameNode with JournalNode
      • Once upon a time, the NameNode was a SPoF
      • In Hadoop 2, the NameNode has the Quorum Journal Manager
      • Replication is done by a Paxos-based protocol
      • See also: http://blog.cloudera.com/blog/2012/10/quorum-based-journaling-in-cdh4-1/
      • Figure: a NameNode with QuorumJournalManager and three JournalNodes, each with its own local disk
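The quorum idea on slide 10 can be illustrated with a small sketch. It only shows the majority-acknowledgement rule; it is not the actual Quorum Journal Manager protocol, which also handles epochs and recovery. The names `quorum_write`, `healthy`, and `down` are hypothetical and exist only for this illustration.

```python
from typing import Callable, List


def quorum_write(edit: str, journals: List[Callable[[str], bool]]) -> bool:
    """Return True when a strict majority of journals acknowledge the edit."""
    acks = 0
    for write in journals:
        try:
            if write(edit):
                acks += 1
        except ConnectionError:
            # An unreachable journal simply does not count toward the quorum.
            pass
    return acks > len(journals) // 2


def healthy(edit: str) -> bool:
    """Hypothetical journal that always stores the edit."""
    return True


def down(edit: str) -> bool:
    """Hypothetical journal that is unreachable."""
    raise ConnectionError("journal unreachable")


if __name__ == "__main__":
    # Two of three journals respond: the edit is considered durable.
    print(quorum_write("mkdir /a", [healthy, healthy, down]))
    # Only one of three responds: no majority, so the write fails.
    print(quorum_write("mkdir /a", [healthy, down, down]))
```

With three JournalNodes, as drawn on the slide, this rule tolerates the loss of one node.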
  • 11. NameNode Federation
      • Once upon a time, the scalability of the NameNode was limited by memory
      • In Hadoop 2, the NameNode has a Federation feature
      • Distributing metadata per namespace
      • Figures from: https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
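As a rough illustration of "distributing metadata per namespace", the sketch below maps a path to the NameNode that owns it using a longest-prefix mount table. This is only one way a client could route requests; the function name, the mount points, and the host names are all hypothetical and do not come from the slides.

```python
from typing import Dict


def resolve_namenode(path: str, mounts: Dict[str, str]) -> str:
    """Return the NameNode for the longest mount prefix that contains path.

    Mount prefixes are assumed to be absolute and without a trailing slash.
    """
    best = None
    for prefix in mounts:
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namespace is mounted for {path}")
    return mounts[best]


if __name__ == "__main__":
    # Hypothetical mount table: each namespace is served by its own NameNode.
    mounts = {
        "/user": "nn1.example.com",
        "/data": "nn2.example.com",
        "/data/logs": "nn3.example.com",
    }
    print(resolve_namenode("/data/logs/2014", mounts))
    print(resolve_namenode("/data/raw", mounts))
```

Because each NameNode holds only the metadata of its own namespace, adding a namespace adds metadata capacity without growing any single NameNode's heap.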
  • 12. RESOURCE MANAGEMENT IN HADOOP 2
  • 13. What's YARN?
      • Generic resource management framework
          • YARN = Yet Another Resource Negotiator
          • Proposed by Arun C Murthy in 2011
      • Container-level resource management
          • A container is a more generic unit of resource than a slot
      • Separates the JobTracker's roles
          • Job scheduling / resource management / isolation
          • Task scheduling
      • Figure: MRv1 architecture (JobTracker; TaskTracker with map slots and reduce slots) next to the MRv2 and YARN architecture (YARN ResourceManager; Impala Master, Spark Master, MRv2 Master; YARN NodeManager with containers)
  • 14. Why YARN? (Use case)
      • Running various processing frameworks on the same cluster
          • Batch processing with MapReduce
          • Interactive query with Impala
          • Interactive deep analytics (e.g. machine learning) with Spark
      • Figure: MRv2/Tez, Impala, and Spark on YARN on HDFS, serving periodic long batch queries, interactive aggregation queries, and interactive machine learning queries
  • 15. Why YARN? (Technical reason)
      • More effective resource management for multiple processing frameworks
          • It is difficult to use the entire cluster's resources without thrashing
          • *Real* big data cannot be moved from HDFS/S3
      • Figure: a Master for MapReduce and a Master for Impala share slaves that run an Impala slave, a MapReduce slave (map slot, reduce slot), and an HDFS slave; each framework has its own scheduler, so Job1 and Job2 thrash
  • 16. MRv1 Architecture
      • Resources are managed by the JobTracker
          • Job-level scheduling
          • Resource management
      • Figure: a Master for MapReduce with three slaves, each running a MapReduce slave with a map slot and a reduce slot, and a separate Master for Impala; schedulers only know their own resource usage
  • 17. YARN Architecture
      • Idea
          • One global resource manager (ResourceManager)
          • A common resource pool for all frameworks (NodeManager and Container)
          • A scheduler for each framework (AppMaster)
      • Figure: a client, the ResourceManager, and three slave NodeManagers whose containers host masters and slaves. Steps: 1. Submit jobs 2. Launch Master 3. Launch Slaves
  • 18. YARN and Mesos
      • YARN
          • An AppMaster is launched for each job
          • More scalability
          • Higher latency
          • One container per request
          • One master per job
      • Mesos
          • An AppMaster is launched for each app (framework)
          • Less scalability
          • Lower latency
          • Bundle of containers per request
          • One master per framework
      • Policy/philosophy is different
      • Figure: ResourceManager with three NMs (YARN) next to ResourceMaster with three Slaves (Mesos), each running Master1 and Master2
  • 19. YARN Eco-system
      • MapReduce
          • Of course, it works
      • DAG-style processing frameworks
          • Spark on YARN
          • Hive on Tez on YARN
      • Interactive query
          • Impala on YARN (via Llama)
      • Users
          • Yahoo!
          • Twitter
          • LinkedIn
      • Hadoop 2 @ Twitter: http://www.slideshare.net/Hadoop_Summit/t-235p210-cvijayarenuv2
  • 20. YARN COMPONENTS
  • 21. ResourceManager (RM)
      • Master node of YARN
      • Role
          • Accepting requests from (1) ApplicationMasters, for allocating containers, and (2) clients, for submitting jobs
          • Managing cluster resources
              • Job-level scheduling
              • Container management
          • Launching application-level masters (e.g. for MapReduce)
      • Figure steps: 1. Submitting jobs 2. Launching the master of jobs 3. Container allocation requests 4. Container allocation requests to the NodeManager
  • 22. NodeManager (NM)
      • Slave node of YARN
      • Role
          • Accepting requests from the RM
          • Monitoring the local machine and reporting to the RM
              • Health check
          • Managing local resources
      • Figure steps: 1. Clients or a master request containers 2. The ResourceManager allocates containers 3. The NodeManager launches containers 4. Container information (host, port, etc.) is returned; the NodeManager sends a periodic health check via heartbeat
  • 23. ApplicationMaster (AM)
      • Master of an application (e.g. the master of MapReduce, Tez, Spark, etc.)
      • Runs in a container
      • Roles
          • Getting containers from the ResourceManager
          • Application-level scheduling
              • How many map tasks run, and where?
              • When will reduce tasks be launched?
      • Figure: the Master of MapReduce, running in a container on a NodeManager, talks to the ResourceManager: 1. Request containers 2. List of allocated containers
  • 24. RESOURCE MANAGER HA
  • 25. ResourceManager High Availability
      • What happens when the ResourceManager fails?
          • New jobs cannot be submitted
      • NOTE:
          • Launched apps continue to run
          • AppMaster recovery is done in each framework
              • MRv2
      • Figure: the ResourceManager is down, so the client cannot submit jobs; masters and slaves in containers on the three NodeManagers continue to run each job
  • 26. ResourceManager High Availability
      • Approach
          • Storing RM information in ZooKeeper
          • Automatic failover by the EmbeddedElector
          • Manual failover by RMHAUtils
          • NodeManagers use a local RMProxy to access the RMs
      • Figure: an active and a standby ResourceManager, each with an EmbeddedElector, and a three-node ZooKeeper ensemble holding RMState. Steps: 1. The active node stores all state into the RMStateStore 2. Failure 3. The EmbeddedElector detects the failure 4. Failover: the standby node becomes active 5. The new active node loads state from the RMStateStore
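The failover steps on slide 26 can be walked through with a toy model. It is a minimal sketch under strong assumptions: an in-memory dictionary stands in for the ZooKeeper-backed RMStateStore, failure detection is not modeled, and the classes `StateStore` and `ResourceManager` below are hypothetical illustrations, not Hadoop classes.

```python
class StateStore:
    """Stand-in for the shared RMStateStore (kept in ZooKeeper on the slide)."""

    def __init__(self):
        self.apps = {}


class ResourceManager:
    """Toy ResourceManager that is either active or standby."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.active = False
        self.apps = {}

    def submit(self, app_id, state):
        if not self.active:
            raise RuntimeError(f"{self.name} is standby")
        self.apps[app_id] = state
        # Step 1: the active node stores all state into the shared store.
        self.store.apps[app_id] = state

    def become_active(self):
        # Step 5: load state from the shared store before serving requests.
        self.apps = dict(self.store.apps)
        self.active = True


def failover(failed, standby):
    """Steps 2 to 4: the failed node stops serving and the standby takes over."""
    failed.active = False
    standby.become_active()
    return standby


if __name__ == "__main__":
    store = StateStore()
    rm1 = ResourceManager("rm1", store)
    rm2 = ResourceManager("rm2", store)
    rm1.become_active()
    rm1.submit("app_1", "RUNNING")
    active = failover(rm1, rm2)
    print(active.name, active.apps)
```

The point of the design is that the standby never needs to talk to the failed node: everything it needs is in the shared store.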
  • 27. CAPACITY PLANNING ON YARN
  • 28. Resource definition on NodeManager
      • Define resources with XML (etc/hadoop/yarn-site.xml)
          <property>
            <name>yarn.nodemanager.resource.cpu-vcores</name>
            <value>8</value>
          </property>
          <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>8192</value>
          </property>
      • Figure: a NodeManager with 8 CPU cores and 8 GB of memory
  • 29. Container allocation on ResourceManager
      • The RM accepts a container request and sends it to an NM, but the request can be rewritten
          • Small requests will be rounded up to minimum-allocation-mb
          • Large requests will be rounded down to maximum-allocation-mb
          <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>1024</value>
          </property>
          <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>8192</value>
          </property>
      • Figure: a client or master, the ResourceManager, and three NodeManagers; a request for 512 MB becomes a request for 1024 MB
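The rewriting rule on slide 29 can be written down as a short function. This sketch follows the slide's description (round up to the minimum, cap at the maximum); rounding intermediate sizes up to the next multiple of the minimum is an added assumption, and the real scheduler's behavior may differ by version and scheduler type. The function name `normalize_memory_mb` is hypothetical.

```python
import math


def normalize_memory_mb(requested_mb: int,
                        minimum_mb: int = 1024,
                        maximum_mb: int = 8192) -> int:
    """Rewrite a memory request the way the slide describes.

    Defaults mirror the slide's yarn.scheduler.minimum-allocation-mb and
    yarn.scheduler.maximum-allocation-mb values.
    """
    if requested_mb <= 0:
        raise ValueError("requested_mb must be positive")
    # Assumption: sizes are rounded up to a multiple of the minimum allocation.
    rounded = math.ceil(requested_mb / minimum_mb) * minimum_mb
    # Large requests are capped at the maximum allocation.
    return min(rounded, maximum_mb)


if __name__ == "__main__":
    print(normalize_memory_mb(512))    # 1024, as in the slide's figure
    print(normalize_memory_mb(10000))  # 8192, capped at the maximum
```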
  • 30. Container allocation at framework side
      • Define how much resource MapTasks and ReduceTasks use
      • MapReduce: etc/hadoop/mapred-site.xml
          <property>
            <name>mapreduce.map.memory.mb</name>
            <value>1024</value>
          </property>
          <property>
            <name>mapreduce.reduce.memory.mb</name>
            <value>4096</value>
          </property>
      • Figure: a NodeManager with 8 CPU cores and 8 GB of memory; the master asks for containers for map tasks (1024 MB memory, 1 CPU core) and receives a container with 1024 MB memory and 1 core
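Putting the node settings from slide 28 together with the task settings from slide 30 gives a quick capacity estimate. The sketch below is back-of-the-envelope arithmetic only: it assumes one vcore per task, ignores the ApplicationMaster's own container, and assumes the scheduler accounts for both memory and vcores. The function name is hypothetical.

```python
def containers_per_node(node_memory_mb: int, node_vcores: int,
                        task_memory_mb: int, task_vcores: int = 1) -> int:
    """Upper bound on concurrent task containers on one NodeManager."""
    by_memory = node_memory_mb // task_memory_mb
    by_vcores = node_vcores // task_vcores
    # Whichever resource runs out first limits the container count.
    return min(by_memory, by_vcores)


if __name__ == "__main__":
    # Node: 8192 MB and 8 vcores (slide 28); tasks sized as on slide 30.
    print(containers_per_node(8192, 8, 1024))  # map tasks: 8
    print(containers_per_node(8192, 8, 4096))  # reduce tasks: 2
```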
  • 31. WHAT'S NEXT? - HDFS -
  • 32. Heterogeneous Storages for HDFS Phase 2
      • HDFS-2832, HDFS-5682
      • Handling various storage types in HDFS
          • SSD, memory, disk, and so on
      • Setting a quota per storage type
          • Setting the SSD quota on /home/user1 to 10 TB
          • Setting the SSD quota on /home/user2 to 10 TB
          • Not configuring any SSD quota on the remaining user directories (i.e. leaving it to defaults)
          <configuration>
            ...
            <property>
              <name>dfs.datanode.data.dir</name>
              <value>[DISK]/mnt/sdc2/,[DISK]/mnt/sdd2,[SSD]/mnt/sde2</value>
            </property>
            ...
          </configuration>
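To make the `[TYPE]path` syntax of `dfs.datanode.data.dir` on slide 32 concrete, here is a small parser sketch. It is an illustration, not HDFS code; the function name is hypothetical, and treating untagged directories as `DISK` is an assumption of this sketch.

```python
import re
from typing import List, Tuple

# A storage-type tag in square brackets, followed by the directory path.
_TAGGED = re.compile(r"^\[(?P<type>[A-Za-z_]+)\](?P<path>.+)$")


def parse_data_dirs(value: str, default_type: str = "DISK") -> List[Tuple[str, str]]:
    """Split a comma-separated data-dir value into (storage_type, path) pairs."""
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _TAGGED.match(entry)
        if match:
            result.append((match.group("type").upper(), match.group("path")))
        else:
            # Assumption: an untagged directory falls back to the default type.
            result.append((default_type, entry))
    return result


if __name__ == "__main__":
    value = "[DISK]/mnt/sdc2/,[DISK]/mnt/sdd2,[SSD]/mnt/sde2"
    for storage_type, path in parse_data_dirs(value):
        print(storage_type, path)
```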
  • 33. Support memory as a storage medium
      • HDFS-5851
      • Introducing an explicit "cache" layer in HDFS
          • Discardable Distributed Memory (DDM)
      • Applications can run faster by using memory
          • Discardable Memory and Materialized Queries is one example
      • Difference between RDD and DDM
          • Multi-tenancy aware
          • Handling data in the processing layer or in the storage layer
  • 34. And more!
      • Archival storage
          • HDFS-6584
      • Transparent encryption
          • HDFS-6134
  • 35. WHAT'S NEXT? - YARN -
  • 36. Support for rolling upgrades in YARN
      • Non-stop YARN updating (YARN-666)
          • NodeManager, ResourceManager, applications
      • Before 2.6.0
          • Restarting the RM -> the RM restarts all AMs -> all jobs restart
          • Restarting NMs -> the NMs are removed from the cluster -> containers are restarted!
      • After 2.6.0
          • Restarting the RM -> AMs continue to run
          • Restarting an NM -> the NM restores its state from local data
      • Figure: the ResourceManager and three slave NodeManagers whose containers host masters and slaves
  • 37. YARN reservation subsystem
      • Now we can run various subsystems on YARN
          • Interactive query engines: Spark, Impala, ...
          • Batch processing engines: MapReduce, Tez, ...
      • Problem
          • Interactive query engines allocate resources at the same time as batch jobs, which can delay the daily batch
      • Time-based reservation scheduling
          • 8:00am to 6:00pm: allocating resources for Impala
          • 6:00pm to 0:00am: allocating resources for MapReduce
      • Figure: a timeline from 8:00am through 6:00pm to 0:00am; allocation for the interactive query engine during the day, then batch processing for the next day
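The time windows on slide 37 amount to a lookup from time of day to the framework holding the reservation. The sketch below encodes only the two windows the slide lists; it is not the YARN reservation API, the names `RESERVATIONS` and `reserved_for` are hypothetical, and hours outside the listed windows are treated as unreserved because the slide does not cover them.

```python
from typing import Optional

# (start_hour, end_hour, framework), end exclusive, taken from the slide.
RESERVATIONS = [
    (8, 18, "Impala"),      # 8:00am to 6:00pm: interactive queries
    (18, 24, "MapReduce"),  # 6:00pm to 0:00am: batch for the next day
]


def reserved_for(hour: int) -> Optional[str]:
    """Return the framework holding the reservation at the given hour."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in the range 0..23")
    for start, end, framework in RESERVATIONS:
        if start <= hour < end:
            return framework
    # The slide does not say who owns 0:00am to 8:00am.
    return None


if __name__ == "__main__":
    for hour in (3, 9, 18, 23):
        print(hour, reserved_for(hour))
```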
  • 38. Support for admin-specified labels in YARN
      • YARN-796
      • Handling heterogeneous machines in one YARN cluster
          • GPU cluster
          • High-memory cluster
          • 40 Gbps network cluster
      • Labeling them and scheduling based on labels
      • Admins can add/remove labels via yarn rmadmin commands
      • Figure: the ResourceManager with one group of NodeManagers labeled GPU and another labeled 40G network; a client submits jobs "on GPU!"
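Label-based placement, as described on slide 38, boils down to filtering candidate nodes by the requested label. The following is a minimal sketch of that filter, not the YARN scheduler; the node names are hypothetical, and letting an unlabeled request use any node is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Set


def nodes_for_label(nodes: Dict[str, Set[str]], label: Optional[str]) -> List[str]:
    """Return the names of nodes eligible for a request with the given label."""
    if label is None:
        # Assumption: a request without a label may run on any node.
        return sorted(nodes)
    return sorted(name for name, labels in nodes.items() if label in labels)


if __name__ == "__main__":
    # Hypothetical cluster: labels as an admin might assign them.
    cluster = {
        "nm1": {"GPU"},
        "nm2": set(),
        "nm3": {"40G"},
        "nm4": {"GPU"},
    }
    print(nodes_for_label(cluster, "GPU"))  # ['nm1', 'nm4']
```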
  • 39. And more!
      • Timeline service security
          • YARN-1935
      • Minimal support for running long-running services on YARN
          • YARN-896
      • Support for an automatic, shared cache for YARN application artifacts
          • YARN-1492
      • And more!
          • Please check the Wiki: http://wiki.apache.org/hadoop/Roadmap
  • 40. Summary
      • Hadoop 2 is evolving rapidly
          • I would appreciate it if you can catch up via this presentation!
      • New components from V2
          • HDFS
              • Quorum Journal Manager
              • NameNode Federation
          • ResourceManager
          • NodeManager
          • ApplicationMaster
      • New features in 2.6:
          • Discardable memory store on HDFS, and so on
          • Rolling update, labels for heterogeneous clusters on YARN, reservation system, and so on
      • Questions or feedback -> user@hadoop.apache.org
      • Issues -> https://issues.apache.org/jira/browse/{HDFS,YARN,HADOOP,MAPREDUCE}
  • 41. References
      • YARN-666
      • https://www.youtube.com/watch?v=O4Q73e2ua9Y&feature=youtu.be
      • http://www.slideshare.net/Hadoop_Summit/t-145p230avavilapalli-mac