2016/10/27
Kai Fukazawa, Yahoo Japan Corporation
Network for the Large-scale
Hadoop cluster at Yahoo! JAPAN
Agenda
Hadoop and Related Network
Yahoo! JAPAN’s Hadoop Network Transition
Network Related Problems and Solutions
  • Network Related Problems
  • Network Requirements of the Latest Cluster
  • Adopted IP CLOS Network for Solving Problems
Yahoo! JAPAN’s IP CLOS Network
  • Architecture
  • Performance Tests
  • New Problems
Future Plan
Hadoop and Related Network
• Hadoop has various communication events
  - Heartbeat
  - Reports (Job/Block/Resource)
  - Block Data Transfer
[Figure labels added across the build slides: North/South, East/West, High, Low]
“HDFS Architecture”. Apache Hadoop. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html. (10/06/2016).
“Google I/O 2011: App Engine MapReduce”. (05/11/2011). Retrieved https://www.youtube.com/watch?v=EIxelKcyCC0. (10/06/2016).
“Introduction to Facebook’s data center fabric”. (11/14/2014). Retrieved https://www.youtube.com/watch?v=mLEawo6OzFM. (10/06/2016).
• Oversubscription
  - Commonly expressed as the ratio of the bandwidth required to the bandwidth available
  - Example: 40 nodes × 1Gbps NIC = 40Gbps of demand behind a 10Gbps uplink
  - Oversubscription = 40 : 10 = 4 : 1
“Hadoop Operations by Eric Sammer (O’Reilly). Copyright 2012 Eric Sammer, 978-1-449-32705-7.”
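The ratio above can be checked with a few lines of Python. This is a minimal sketch; the function name `oversubscription` and its parameter names are illustrative and do not come from the slides.

```python
def oversubscription(nodes: int, nic_gbps: float, uplink_gbps: float) -> float:
    """Return demanded bandwidth divided by available uplink bandwidth."""
    if uplink_gbps <= 0:
        raise ValueError("uplink_gbps must be positive")
    return (nodes * nic_gbps) / uplink_gbps


# Slide example: 40 nodes with 1 Gbps NICs behind a 10 Gbps uplink.
ratio = oversubscription(nodes=40, nic_gbps=1, uplink_gbps=10)
print(f"{ratio:g} : 1")
```

A value of 1 would mean the uplink can carry every NIC at line rate at the same time; larger values mean more contention when nodes talk across racks.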
Yahoo! JAPAN’s
Hadoop Network Transition
[Figure: Cluster Volume (PB), vertical axis 0 to 80, for Cluster1 (Jun. 2011), Cluster2 (Jan. 2013), Cluster3 (Apr. 2014), Cluster4 (Dec. 2015), Cluster5 (Jun. 2016)]
Cluster1: Stack Architecture
  4 Switches/Stack
  Up to ~10 switches
  Nodes/Rack         90 nodes
  Server NIC         1Gbps
  UpLink             20Gbps
  Oversubscription   4.5 : 1
Cluster2: Spanning Tree Protocol
  Nodes/Rack         40 nodes
  Server NIC         1Gbps
  UpLink             10Gbps
  Oversubscription   4 : 1
  [Figure labels: 10Gbps, Blocking]
Cluster3: L2 Fabric/Channel
  Nodes/Rack         40 nodes
  Server NIC         1Gbps
  UpLink             20Gbps
  Oversubscription   2 : 1
  [Figure labels: L2 Fabric, 20Gbps, 20Gbps]
Cluster4: L2 Fabric/Channel
  Nodes/Rack         16 nodes
  Server NIC         10Gbps
  UpLink             80Gbps
  Oversubscription   2 : 1
  [Figure labels: L2 Fabric, 80Gbps, 80Gbps]
Yahoo! JAPAN’s Hadoop Network Transition
Release    Volume    #Nodes/Switch   NIC      Oversubscription
Cluster1   3PByte    90              1Gbps    4.5:1
Cluster2   20PByte   40              1Gbps    4:1
Cluster3   38PByte   40              1Gbps    2:1
Cluster4   58PByte   16              10Gbps   2:1
Cluster5   75PByte   ?               ?Gbps    ?:?
Network Related Problems
And Solutions
Network Related Problems
• Effect of switch failure in the Stack Architecture
• Load on the switch due to BUM Traffic
• Limitations for the DataNode Decommission
• Limitations for the Scale-out
Effect of switch failure in the Stack Architecture
• One of the switches forming a Stack failed
• The failure affected the other switches in the same Stack
• Communication was interrupted among 90 nodes (5 racks)
• Result: insufficient computing resources and processing stoppage
Load on the switch due to BUM Traffic
[Figure: L2 Fabric, 4400 nodes]
• BUM: Broadcast, Unknown unicast, and Multicast traffic
• ARP traffic from the servers increases the load on the core switch CPU
• Mitigation: tuning of the ARP cache entry timeout
• The underlying problem is the large network address range
Limitations for the DataNode Decommission
• The impact on jobs has to be considered
• The number of nodes decommissioned at one time is limited
Limitations for the Scale-out
• Stack Architecture
  - Up to ~10 switches
• L2 Fabric Architecture
  - Depends on the number of chassis
Network Requirements of the Latest Cluster
• 120~200 racks
• Scale-out possible up to 10000 nodes
• 100~200Gbps uplink per rack
• 10Gbps server NIC
• 20 nodes per rack
• Data center located in the US
How to solve these problems?
We adopted IP CLOS Network!
Adopted IP CLOS Network for Solving Problems
Over-the-top (OTT) companies such as Google, Facebook, Amazon, Yahoo… have adopted this DC network architecture
“Introducing data center fabric, the next-generation Facebook data center network”. Facebook Code. https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/. (10/06/2016).
• Improved scalability
• Improved high availability
• Copes with the increase in East-West traffic
• Reduction in operating cost
Yahoo! JAPAN’s
IP CLOS Network
Architecture
BoxSwitch Architecture
• No limitation on scale-out
• Requires many switches
[Figure: topology with Spine, Leaf, and ToR tiers]
[Figure: Internet, Core Router, Spine, and Leaf tiers, with Layer3 and Layer2 marked]
• Why was this architecture adopted?
  - Fewer items to manage: IP addresses, cables, interfaces, BGP neighbors, ...
  - Overcomes physical constraints, such as a one-floor limit
  - Reduction in cost
• ECMP
• BGP is used between Spine and Leaf
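ECMP spreads traffic over equal-cost paths, commonly by hashing header fields so that all packets of one flow follow the same path. The sketch below illustrates that idea only. The hash function, the 5-tuple layout, the function name `pick_uplink`, and the choice of four paths are assumptions for illustration; real switches use their own hardware hash.

```python
import hashlib

UPLINKS = 4  # assumption: four equal-cost uplinks per Leaf


def pick_uplink(src_ip: str, dst_ip: str, proto: int,
                src_port: int, dst_port: int, n_paths: int = UPLINKS) -> int:
    """Map a flow 5-tuple to one of n_paths equal-cost paths (0-based)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# Every packet of a flow hashes to the same uplink, so packets stay in order.
flow = ("192.168.0.10", "192.168.0.100", 6, 50010, 40000)  # hypothetical flow
assert pick_uplink(*flow) == pick_uplink(*flow)
print("flow uses uplink", pick_uplink(*flow))
```

One consequence of per-flow hashing is that a single degraded uplink affects only the flows that hash onto it, while the others continue normally.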
• Addressing
  - Between Spine and Leaf: /31
  - Rack: /26, /27
• Resolved the “BUM Traffic problem”
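To give a sense of what these prefix lengths mean, the sketch below counts usable addresses per subnet with Python's standard `ipaddress` module. Only the prefix lengths come from the slide; the concrete prefixes and the helper name `usable_hosts` are placeholders chosen for illustration.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Usable host addresses in an IPv4 subnet.

    A /31 is a point-to-point link where both addresses are usable (RFC 3021);
    larger subnets reserve the network and broadcast addresses.
    """
    net = ipaddress.ip_network(cidr)
    if net.prefixlen >= 31:
        return net.num_addresses
    return net.num_addresses - 2


# Placeholder prefixes; only the prefix lengths are taken from the slide.
for cidr in ("10.0.0.0/31", "192.168.0.0/26", "192.168.0.0/27"):
    print(cidr, usable_hosts(cidr))
```

With one small subnet per rack, broadcast traffic such as ARP stays within that rack's subnet instead of spanning thousands of nodes.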
• Leaf uplink: 40Gbps x 4 = 160Gbps (uplinks ① ② ③ ④)
• Rack: 20 nodes with 10Gbps NICs
• Oversubscription: 200 : 160 = 1.25 : 1
• Resolved the “Limitations for the DataNode Decommission”
• Improved high availability
Architecture
• Effect of switch failure in the Stack Architecture
• Load on the switch due to BUM Traffic
• Limitations for the DataNode Decommission
• Limitations for the Scale-out
[Three check marks (✔) appear beside items of this list on the slide]
Yahoo! JAPAN’s Hadoop Network Transition
Release    Volume    #Nodes/Switch   NIC      Oversubscription
Cluster1   3PByte    90              1Gbps    4.5:1
Cluster2   20PByte   40              1Gbps    4:1
Cluster3   38PByte   40              1Gbps    2:1
Cluster4   58PByte   16              10Gbps   2:1
Cluster5   75PByte   20              10Gbps   1.25:1
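The Oversubscription column can be recomputed from the node count, NIC speed, and uplink figures given on the per-cluster slides (20, 10, 20, 80, and 160Gbps). The sketch below does that; the dictionary layout and the function name `ratio` are illustrative only.

```python
from fractions import Fraction

# (nodes per switch, NIC Gbps, uplink Gbps) as stated on the per-cluster slides.
CLUSTERS = {
    "Cluster1": (90, 1, 20),
    "Cluster2": (40, 1, 10),
    "Cluster3": (40, 1, 20),
    "Cluster4": (16, 10, 80),
    "Cluster5": (20, 10, 160),
}


def ratio(nodes: int, nic_gbps: int, uplink_gbps: int) -> Fraction:
    """Oversubscription as an exact fraction: demand / uplink capacity."""
    return Fraction(nodes * nic_gbps, uplink_gbps)


for name, spec in CLUSTERS.items():
    print(f"{name}: {float(ratio(*spec)):g} : 1")
```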
Performance Tests (5TB Terasort)
Performance Tests (40TB DistCp)
• 16 nodes/rack
• 8Gbps/node
• About 30Gbps x 4 = 120Gbps
New Problems
• Delay in data transfer
  - Error packets were generated on 1 of the 4 uplinks
  - That one uplink delayed the data transfer
  - “org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror”
  [Figure label: Slow]
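One simple way to watch for this symptom is to count how often the DataNode warning quoted above appears in a log. The sketch below matches only the message text shown on the slide; the function name and the sample log lines are hypothetical, and nothing is assumed about the rest of the log line format.

```python
from typing import Iterable

# Message text quoted on the slide; the rest of each log line is not assumed.
SLOW_MIRROR = "Slow BlockReceiver write packet to mirror"


def count_slow_mirror_writes(lines: Iterable[str]) -> int:
    """Count log lines that contain the slow-mirror-write warning."""
    return sum(1 for line in lines if SLOW_MIRROR in line)


# Hypothetical sample lines, for illustration only.
sample = [
    "WARN org.apache.hadoop.hdfs.server.datanode.DataNode: " + SLOW_MIRROR,
    "INFO org.apache.hadoop.hdfs.server.datanode.DataNode: unrelated message",
]
print(count_slow_mirror_writes(sample))
```

Because the function accepts any iterable of strings, an open log file object can be passed in directly.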
• The IP address changes when a server moves to another rack
  - Each rack has its own network address
• Access control uses IP addresses
  - The ACL must be updated whenever a server is relocated
Example: 192.168.0.10 (rack 192.168.0.0/26) becomes 192.168.0.100 (rack 192.168.0.64/26)
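The relocation example can be reproduced with the standard `ipaddress` module. This is a sketch under stated assumptions: the subnets and host addresses are the ones on the slide, while the ACL set and the helper name `rack_subnet` are hypothetical.

```python
import ipaddress

# Rack subnets and host addresses taken from the slide example.
RACK_SUBNETS = [ipaddress.ip_network("192.168.0.0/26"),
                ipaddress.ip_network("192.168.0.64/26")]


def rack_subnet(ip: str):
    """Return the rack subnet containing ip, or None if no rack matches."""
    addr = ipaddress.ip_address(ip)
    for net in RACK_SUBNETS:
        if addr in net:
            return net
    return None


# An ACL keyed on the old host address no longer matches after relocation.
acl_allowed = {ipaddress.ip_address("192.168.0.10")}  # hypothetical ACL entry
print(rack_subnet("192.168.0.10"), rack_subnet("192.168.0.100"))
print(ipaddress.ip_address("192.168.0.100") in acl_allowed)
```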
Future Plan
• Detect error-packet failures before they affect the data transfer
  [Figure labels: Error!, Auto Shutdown]
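The detection step could look roughly like the following: compare two samples of per-uplink error counters and flag any uplink whose counter grew. This is only a sketch of the idea, not the mechanism used in the deck; the counter source, the threshold, and all names are assumptions.

```python
def links_to_shut(prev: dict, curr: dict, max_new_errors: int = 0) -> list:
    """Return uplinks whose error counter grew by more than max_new_errors.

    prev and curr map an uplink name to a cumulative error-packet counter,
    sampled at two points in time.
    """
    flagged = []
    for link, now in curr.items():
        delta = now - prev.get(link, now)  # a newly seen link has delta 0
        if delta > max_new_errors:
            flagged.append(link)
    return sorted(flagged)


# Hypothetical counters for four uplinks; only uplink3 shows new errors.
before = {"uplink1": 0, "uplink2": 5, "uplink3": 10, "uplink4": 0}
after = {"uplink1": 0, "uplink2": 5, "uplink3": 42, "uplink4": 0}
print(links_to_shut(before, after))
```

An automated shutdown would probably also need a guard that keeps a minimum number of uplinks in service, which this sketch leaves out.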
• Use Erasure Coding
  [Figure: the original raw data is striped in 64kB units into raw data D1 to D6, with parity P1 to P3; D1 to D6 and P1 to P3 are then shown individually, with a Read step]
  - Low data locality
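The striping layout in the figure can be sketched as follows: the raw data is cut into 64kB cells that are dealt round-robin to six data blocks. Parity computation is left out on purpose, since the slides do not name the coding scheme. The function names are illustrative; the cell size and the 6 data plus 3 parity layout are read from the figure labels.

```python
CELL = 64 * 1024      # striping cell size shown on the slide (64kB)
DATA_BLOCKS = 6       # D1..D6
PARITY_BLOCKS = 3     # P1..P3


def stripe(raw: bytes, cell: int = CELL, k: int = DATA_BLOCKS) -> list:
    """Distribute raw data round-robin, one cell at a time, over k data blocks."""
    blocks = [bytearray() for _ in range(k)]
    for i in range(0, len(raw), cell):
        blocks[(i // cell) % k] += raw[i:i + cell]
    return [bytes(b) for b in blocks]


def storage_overhead(k: int = DATA_BLOCKS, m: int = PARITY_BLOCKS) -> float:
    """Stored bytes per byte of raw data, i.e. (k + m) / k."""
    return (k + m) / k


blocks = stripe(bytes(7 * CELL))          # seven cells of zero bytes
print([len(b) // CELL for b in blocks])   # cells held by D1..D6
print(storage_overhead())
```

Because consecutive cells land in different blocks, reading a contiguous range touches several blocks that may sit on different nodes, which is consistent with the low data locality noted on the slide.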
• Interconnecting various platforms
  [Figure label: BOTTLENECK]
• Isolation of computing and storage
  [Figure legend: Storage Machine, Computing Machine]
Thank You for Listening!
Appendix
JANOG38
http://www.janog.gr.jp/meeting/janog38/program/clos

More Related Content

What's hot

Machine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFiMachine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFi
DataWorks Summit/Hadoop Summit
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
DataWorks Summit/Hadoop Summit
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
DataWorks Summit/Hadoop Summit
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
DataWorks Summit/Hadoop Summit
 
Next Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormNext Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache Storm
DataWorks Summit
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
DataWorks Summit/Hadoop Summit
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
 
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and TesseractScalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
DataWorks Summit/Hadoop Summit
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Hortonworks
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARN
DataWorks Summit/Hadoop Summit
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
#HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course #HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 

What's hot (20)

Machine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFiMachine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFi
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 
Next Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache StormNext Generation Execution Engine for Apache Storm
Next Generation Execution Engine for Apache Storm
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and TesseractScalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARN
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
#HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course #HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course
 

Viewers also liked

From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
DataWorks Summit/Hadoop Summit
 
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBaseComparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
DataWorks Summit/Hadoop Summit
 
Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.
DataWorks Summit/Hadoop Summit
 
The truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on HadoopThe truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on Hadoop
DataWorks Summit/Hadoop Summit
 
The real world use of Big Data to change business
The real world use of Big Data to change businessThe real world use of Big Data to change business
The real world use of Big Data to change business
DataWorks Summit/Hadoop Summit
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
DataWorks Summit/Hadoop Summit
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
DataWorks Summit/Hadoop Summit
 
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for ScaleRebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
DataWorks Summit/Hadoop Summit
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
DataWorks Summit/Hadoop Summit
 
SEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile gamesSEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile games
DataWorks Summit/Hadoop Summit
 
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
DataWorks Summit/Hadoop Summit
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
DataWorks Summit/Hadoop Summit
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
DataWorks Summit/Hadoop Summit
 
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
DataWorks Summit/Hadoop Summit
 
Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
DataWorks Summit/Hadoop Summit
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
DataWorks Summit/Hadoop Summit
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
DataWorks Summit/Hadoop Summit
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
DataWorks Summit/Hadoop Summit
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
DataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
 
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBaseComparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
 
Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.
 
The truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on HadoopThe truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on Hadoop
 
The real world use of Big Data to change business
The real world use of Big Data to change businessThe real world use of Big Data to change business
The real world use of Big Data to change business
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
 
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for ScaleRebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
SEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile gamesSEGA : Growth hacking by Spark ML for Mobile games
SEGA : Growth hacking by Spark ML for Mobile games
 
Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop Hadoop Summit Tokyo HDP Sandbox Workshop
Hadoop Summit Tokyo HDP Sandbox Workshop
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
 
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
 
Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 

Network for the Large-scale Hadoop cluster at Yahoo! JAPAN

Editor's Notes

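  1. I'm Fukazawa from Yahoo! JAPAN, and my talk is titled "Network for the Large-scale Hadoop cluster at Yahoo! JAPAN".
  2. Here is today's agenda. First, I will briefly explain how Hadoop relates to the network. Then I will cover the networks Yahoo! JAPAN has adopted for Hadoop and the network-related problems we ran into, and explain the IP CLOS network we actually deployed as the solution, using Yahoo! JAPAN's own configuration as the example.
  3. First, a brief look at how Hadoop relates to the network.
  4. Hadoop exchanges many kinds of data. For example, slave-node components such as the DataNode send heartbeats to master components such as the NameNode for liveness monitoring. There are also reports, such as job and block reports. In addition, block data is transferred for data replication, rebalancing, and similar operations.
  5. Block data transfer in particular generates the most traffic. In HDFS this means block replication and rebalancing; in MapReduce it means the shuffle phase.
  6. Also, Hadoop does not only have North/South (vertical) traffic, like the traditional access from users to servers.
  7. Machine-to-machine communication also occurs, for example for data replication. As the number of machines grows, this communication crosses racks, which produces East/West (horizontal) traffic.
  8. In Hadoop, East/West traffic is heavier than North/South traffic.
  9. This is from Facebook's blog. As shown here, it says that machine-to-machine traffic, rather than machine-to-user traffic, is what is growing. This suggests that paying attention to inter-rack traffic matters whether or not you run Hadoop.
  10. I have said that inter-rack communication matters regardless of Hadoop. For inter-rack communication you need to keep oversubscription in mind. Oversubscription is the ratio of the bandwidth that could be demanded to the bandwidth actually available. Take a rack like this one, holding 40 servers with 1 Gbps NICs. If the rack switch's uplink is 10 Gbps, the ratio is 40:10, so the oversubscription is 4:1.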
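  11. Next, I will talk about the networks Yahoo! JAPAN has used for Hadoop so far.
  12. Please look at this graph. It plots the size of each of Yahoo! JAPAN's Hadoop clusters to date. The horizontal axis is the release date of the cluster and the vertical axis is the cluster size. Since the first cluster in 2011, our clusters have grown as shown. The network adopted for Hadoop has changed along with the clusters. From here I will walk through that network transition.
  13. First, the network configuration of Cluster 1, the oldest. This cluster's network puts multiple rack switches in a Stack configuration. That is, several switches are presented virtually as a single switch, and that switch is connected to the core switch.
  14. Not all switches form one stack; each group of four switches forms one stack.
  15. Each stack has 90 servers with 1 Gbps NICs connected to it. In this configuration, when a server communicates with a rack in another stack through the upstream core switch,
  16. the traffic takes a path like this.
  17. The uplink is 20 Gbps, so if we compute the oversubscription from the number of servers and the uplink capacity,
  18. we get 4.5:1.
  19. The problem with the Stack configuration is that scale-out is limited: a stack can be built from only about 10 switches at most.
  20. The network configuration of the second cluster, Cluster 2, is a standard design that uses the Spanning Tree Protocol.
  21. In the Hadoop cluster on this network, each rack holds 40 servers with 1 Gbps NICs. When racks communicate in this configuration,
  22. the Spanning Tree Protocol means only one of the two uplinks is used, so traffic takes a path like this.
  23. Also, in this network configuration the uplink is 10 Gbps,
  24. so the oversubscription is 4:1.
  25. In this configuration one of the uplinks is blocked to prevent loops, so the available bandwidth is not fully used.
  26. This is the network configuration of the third cluster we released. It uses L2 Fabric together with Channel, a technique that presents multiple physical ports as a single logical port. Unlike the previous configuration, which used only one uplink, this one uses both uplinks.
  27. In this configuration each rack holds 40 servers with 1 Gbps NICs.
  28. As for the inter-rack path, the two uplinks are bundled into a Channel as just described, so both uplinks are used.
  29. As for uplink bandwidth, the rack switch uplink is 20 Gbps.
  30. The oversubscription is therefore 2:1, better than the earlier configuration that used the Spanning Tree Protocol.
  31. Finally, Cluster 4 uses the same configuration as Cluster 3.
  32. Inter-rack communication works the same way, but here each rack holds 16 servers with 10 Gbps NICs.
  33. The uplink in this configuration is 80 Gbps, so even with 10 Gbps servers the oversubscription stays at 2:1.
  34. This table summarizes the clusters and their networks so far. Each newer cluster is larger, yet the oversubscription has improved.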
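  35. Next, I will talk about the failures and problems that occurred on the Yahoo! JAPAN Hadoop networks described so far, and how we solved them.
  36. These are the network-related failures and problems we have seen while operating Hadoop at Yahoo! JAPAN. I will go through them from the top.
  37. The first is a failure in the Stack network used by the first cluster I introduced. One of the switches in a stack malfunctioned, which affected the other switches in the same stack, and we lost network reachability to 90 servers. As a result, compute resources ran short and processing stopped.
  38. Next is the load that BUM traffic placed on the switches. BUM traffic means broadcast, unknown unicast, and multicast traffic. At the time, more than 4,000 nodes from Cluster 3 and Cluster 4 sat in the same network address range, and each of them generated ARP traffic at short intervals. These ARP broadcasts from servers in the network put load on the upstream core switches and drove up their CPU usage. We reduced the load on the switches by lengthening the time servers retain their ARP entries.
  39. Next is the limit on DataNode decommissioning. Because of uplink bandwidth and the jobs already running, we decommissioned only a limited number of DataNodes at a time. As you know, decommissioning transfers data in order to re-place replicas. At our company, decommissioning several racks at once is part of normal operations, for example for power maintenance. In those cases we did not decommission all machines simultaneously but split them into batches.
  40. The scale-out limit refers to physical and other constraints that cap how far the cluster can be scaled out. As I mentioned, a Stack configuration tops out at about 10 switches. Designs such as L2 Fabric also depend on the number of ports in the chassis.
  41. With these problems in the background, around spring of last year the Hadoop team gave the network team the following requirements: a scale of 120 to 200 racks, and a network that works without problems even for a 10,000-node-class cluster. In addition, the location is a data center in the United States rather than in Japan.
  42. To solve these problems, Yahoo! JAPAN
  43. chose an IP CLOS network.
  44. To begin with, an IP CLOS network is a network architecture adopted by the world's leading technology companies.
  45. An IP CLOS network has the following characteristics. It improves scalability and fault tolerance, and it can cope with growing East-West traffic. The last item, reduced operating cost, comes from two things: better fault tolerance lowers the operational load, and because it uses common routing protocols such as BGP and OSPF, operations no longer depend on a particular switch vendor. I will explain each characteristic in detail later, together with the configuration.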
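  46. From here I will describe Yahoo! JAPAN's IP CLOS network.
  47. One form of IP CLOS network is a three-tier design built from box switches, like this. In this design you can scale out as far as you like by adding switches called Spine and Leaf.
  48. The network currently used for Hadoop is not the three-tier box-switch design just shown. It is a two-tier Spine-Leaf design that uses chassis switches, like this.
  49. The Spine/Leaf part of the previous design is contained inside the chassis.
  50. We chose this design because a three-tier box-switch design would mean more IP addresses and cables to manage, and because there were physical constraints, such as being limited to a single floor. This was the first time Yahoo! JAPAN built a CLOS network, so we wanted to keep management cost as low as possible. The falling cost of chassis switches was another factor.
  51. In this network, routes between Spine and Leaf are advertised with BGP, a common network routing protocol. Communication between Spine and Leaf uses ECMP (Equal-Cost Multi-Path), so all paths are used rather than only some of them.
  52. For the Spine-Leaf connections, each interface on each link has its own IP address, so each link is assigned a /31 network. Each rack also has its own /26 or /27 network.
  53. Because each rack has its own network address in this design, the L2 domain is smaller, so the impact of BUM traffic is smaller, and the BUM traffic problem described earlier was resolved.
  54. The uplink bandwidth from each rack switch is 4 links of 40 Gbps, for a total of 160 Gbps.
  55. In this Hadoop cluster each rack holds up to 20 servers with 10 Gbps NICs,
  56. so the oversubscription is 1.25:1, as shown here.
  57. As a result, when decommissioning DataNodes we no longer need to consider, as far as the network is concerned, the impact on running jobs.
  58. In addition, with four uplinks the design is more redundant, so fault tolerance improved as well. As I mentioned, this Hadoop cluster is built in the United States. The extra redundancy reduces how often a network failure demands an immediate response. That is a big advantage in a US data center where we do not have staff on site 24 hours a day, 365 days a year.
  59. To sum up so far, by adopting the IP CLOS network, the problems we had been having
  60. were resolved as shown here. As for the scale-out limit, in this deployment there were other constraints such as the floor, as I mentioned, so a limit still exists.
  61. Adding the Hadoop cluster on the IP CLOS network to the earlier table gives this. The oversubscription has improved substantially.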
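  62. This is the network traffic observed when we ran a 5 TB TeraSort on the actual Hadoop cluster built on the IP CLOS network. The left graph is input traffic as seen from the network device and the right graph is output traffic. As the graphs show, both input and output traffic are spread almost evenly across the four uplinks.
  63. These are the results of running DistCp to see whether we could actually drive the network to its full performance.
  64. Each rack had 16 nodes, and each node was sending about 8 Gbps of traffic.
  65. We were able to push about 120 Gbps out of a single rack switch.
  66. However, moving to the IP CLOS network also brought new problems. The first is delayed data transfer. At one point users reported that data puts and MapReduce jobs were slow.
  67. When we investigated, we found log entries for SlowBlockReceiver on a particular rack. This log is written in situations such as slow data transfer.
  68. The cause was that one of the four uplinks was producing error packets, which delayed data transfer to that rack. In other words, more links also means more points of failure.
  69. There is also an operational caveat. Because each server rack has its own network address, moving a rack changes the IP addresses of its servers. In our case ACLs are configured per IP address, so relocating servers also requires changing the ACLs.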
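  70. Finally, our future plans.
  71. The first is handling network failures before they affect data transfer. When something like the rise in error packets on one uplink that I described occurs, we want to detect it.
  72. We aim to detect things such as rising error counters and automatically take action, such as shutting down the interface, before any impact appears. This is possible because the four uplinks provide redundancy, so taking down roughly one link carries less risk of a total outage.
  73. Next is the adoption of Erasure Coding. We have started using Erasure Coding on the Hadoop cluster on the IP CLOS network. To explain what Erasure Coding is:
  74. Next is the adoption of Erasure Coding. We have started using Erasure Coding on the Hadoop cluster on the IP CLOS network. With Erasure Coding, the data of a block is split into six parts,
  75. and three parity parts are created from them. So nine pieces of data are generated for each piece of data.
  76. Erasure Coding then places these nine pieces, as a rule, each on a different rack.
  77. So when a particular node wants to read the data,
  78. inter-rack communication occurs, as shown here.
  79. As you can see, data locality is lower, so reading one block causes more inter-rack data transfer than with ordinary replication. Inter-rack network traffic therefore becomes important.
  80. Next, we want to put not only Hadoop but various other platforms on the CLOS network, so that platforms can interconnect without worrying about network bandwidth. Today each platform sits under a different core switch, which makes the network a bottleneck. We hope to solve this by putting all platforms on the CLOS network.
  81. Finally, instead of deploying servers that balance CPU and storage as we have done until now, we aim to stop taking data locality into account and to deploy compute-optimized machines and storage-optimized machines separately. That way, if processing resources run short we can buy servers with lots of CPU, and if capacity runs short we can buy servers with lots of storage, which makes resource use more efficient.
  82. That is all. Thank you for your attention.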
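
The oversubscription figures quoted in notes 10, 18, 24, 30, 33 and 56 all follow from the same arithmetic: the total NIC bandwidth of the servers behind a rack switch (or stack) divided by its uplink bandwidth. The sketch below simply recomputes them from the numbers given in the notes. The function name and the cluster labels are illustrative assumptions, not names used in the talk.

```python
from fractions import Fraction


def oversubscription(servers: int, nic_gbps: int, uplink_gbps: int) -> Fraction:
    """Bandwidth the servers could demand, divided by the uplink bandwidth."""
    return Fraction(servers * nic_gbps, uplink_gbps)


# (servers per rack or stack, NIC speed in Gbps, total uplink in Gbps),
# taken from the speaker notes. The labels are hypothetical.
clusters = {
    "Cluster 1 (Stack)": (90, 1, 20),
    "Cluster 2 (Spanning Tree)": (40, 1, 10),
    "Cluster 3 (L2 Fabric)": (40, 1, 20),
    "Cluster 4 (L2 Fabric)": (16, 10, 80),
    "IP CLOS cluster": (20, 10, 4 * 40),  # four 40 Gbps uplinks
}

ratios = {name: oversubscription(*spec) for name, spec in clusters.items()}

for name, ratio in ratios.items():
    # Print as "N:1", e.g. 4.5:1
    print(f"{name}: {float(ratio):g}:1")
```

Exact fractions are used so that ratios such as 4.5:1 and 1.25:1 can be compared without floating-point rounding. Note that for Cluster 2 the 10 Gbps figure is the usable uplink, since the second link is blocked by the Spanning Tree Protocol.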