SlideShare a Scribd company logo
1 of 21
Download to read offline
Hadoop Hardware @Twitter:
Size does matter.
@joep and @eecraft
Hadoop Summit 2013

v2.3
About us
Joep Rottinghuis

•

•
•
•

Engineering Manager Hadoop/HBase team @ Twitter
Follow me @joep

Jay Shenoy

•

•
•
•
•

Software Engineer @ Twitter

Hardware Engineer @ Twitter
Engineering Manager HW @ Twitter
Follow me @eecraft

HW & Hadoop teams @ Twitter, Many others

#HadoopSummit2013

@Twitter

2
Agenda
•

Scale of Hadoop Clusters

•

Single versus multiple clusters

•

Twitter Hadoop Architecture
Hardware investigations

•

Results

•

#HadoopSummit2013

@Twitter

3
Scale

•

Scaling limits

# Nodes

• JobTracker 10’s thousands of jobs per day; 10’s Ks concurrent
slots

•
•
•
•

Namenode 250-300 M objects in single namespace
Namenode @~100 GB heap -> full GC pauses
Shipping job jars to 1,000’s of nodes

#HadoopSummit2013

JobHistory server at a few 100’s K job history/conf files

@Twitter

4
When / why to split clusters ?
•

In principle preference for single cluster

•

Common logs, shared free space, reduced admin burden, more rack
diversity

•

Varying SLA’s

•

Workload diversity

•
•
•
•

Storage intensive
Processing (CPU / Disk IO) intensive
Network intensive

Data access

•

Hot, Warm, Cold

#HadoopSummit2013

@Twitter

5
Cluster Architecture

#HadoopSummit2013

@Twitter

6
Hardware investigations

#HadoopSummit2013

@Twitter

7
Service criteria for hardware
•

•

•

Hadoop does not need live HDD swap
Twitter DC : No SLA on data nodes
Rack SLA : Only 1 rack down at any time in a cluster

#HadoopSummit2013

@Twitter

8
Baseline Hadoop Server (~ early 2012)
DIMM
DIMM

GbE

E56xx

PCH

NIC

Characteristics:

DIMM

• Standard 2U
HBA

• 20 servers / rack

DIMM
DIMM

E56xx

DIMM

Works for the general cluster,
but...

• Need more density for storage
• Potential IO bottlenecks
#HadoopSummit2013

server

Expander

• E5645 CPU
• Dual 6-core
• 72GB memory
• 12 x 2TB HDD
• 2 x 1 GbE

@Twitter

9
Hadoop Server: Possible evolution
DIMM
DIMM
DIMM

E5-26xx or
E5-24xx

GbE

NIC

10GbE ?

Characteristics:

DIMM

+ CPU performance
? 20 servers / rack

DIMM
DIMM
DIMM

E5-26xx or
E5-24xx

HBA

DIMM

Expander

16 x 2T?
16 x 3T?
24 x 3T?

•

Candidate for
DW

Can deploy into the general DW cluster, but...

• Too much CPU for storage intensive apps
• Server failure domain too large if we scale up
disks

#HadoopSummit2013

@Twitter

10
Rethinking hardware evolution
•

Debunking myths

• Bigger is always better
• One size fits all
•

Back to Hadoop Hardware Roots:

• Scale horizontally, not vertically

Twitter Hadoop Server - “THS”
#HadoopSummit2013

@Twitter

11
THS for backups
GbE

NIC

DIMM

+ IO Performance

E3-12xx

• Few fast cores

DIMM

SAS
HBA

Storage focus:

• Cost efficient (single socket, 3T
drives)

Characteristics:

PCH

• E3-1230 V2 CPU
• 16 GB memory
• 12 x 3 TB HDD
• SSD boot
• 2 x 1 GbE

• Less memory needed
#HadoopSummit2013

@Twitter

12
THS variant for Hadoop-Proc and HBase
NIC

DIMM

10GbE

Characteristics:
+ IO Performance

E3-12xx
DIMM

• Few fast cores
SAS
HBA

Processing / throughput focus:

• Cost efficient (single socket, 1T
drives)

PCH

• E3-1230 V2 CPU
• 32 GB memory
• 12 x 1 TB HDD
• SSD boot
• 1 x 10 GbE

• More disk and network IO per
socket
#HadoopSummit2013

@Twitter

13
THS for cold cluster
GbE

NIC

DIMM

• Disk Efficiency
• Some compute

E3-12xx
DIMM

SAS
HBA

Combination of previous 2 use cases:

Characteristics:

PCH

• E3-1230 V2 CPU
• 32 GB memory
• 12 x 3 TB HDD
• 2 x 1 GbE

• Space & power efficient
• Storage dense and some processing
capabilities

#HadoopSummit2013

@Twitter

14
Rack-level view

1G TOR

Baseline

1G TOR
1G TOR

10G TOR

1G TOR
1G TOR

Twitter Hadoop Server
Backups

Proc

Cold

~ 8 kW

~ 8 kW

~ 8 kW

~ 8 kW

CPU sockets; DRAM

40; 1440 GB

40; 640 GB

40; 1280 GB

40; 1280 GB

Spindles; TB raw

240; 480 TB

480; 1,440 TB

480; 480 TB

480; 1,440 TB

Uplink; Internal BW

20 ; 40 Gbps

20 ; 80 Gbps

40 ; 400 Gbps

20 ; 80 Gbps

Power

#HadoopSummit2013

@Twitter

15
Processing performance comparison
Benchmark

Baseline Server

THS (-Cold)

TestDFSIO (write replication = 1)

360 MB/s / node

780 MB/s / node

TeraGen (30TB replication = 3)

1:36 hrs

1:35 hrs

TeraSort (30 TB, replication = 3)

6:11 hrs

4:22 hrs

2 Parallel TeraSort (30 TB each, replication = 3)

10:36 hrs

6:21 hrs

Application #1

4:37 min

3:09 min

Application set #2

13:3 hrs

10:57 hrs

Performance benchmark set up:

• Each clusters 102 nodes of respective type
• Efficient server = 3 racks, Baseline 5+ racks
• “Dated” stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
#HadoopSummit2013

@Twitter

16
Results

#HadoopSummit2013

@Twitter

17
LZO performance comparison

#HadoopSummit2013

@Twitter

18
16
Recap
•

At a certain scale it makes sense to split into multiple clusters

• For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
•

For large enough clusters, depending on use-case, it may be worth to choose
different HW configurations

#HadoopSummit2013

@Twitter

19
Conclusion

@Twitter our “Twitter Hadoop Server”
not only saves many $$$, it is also
faster !

#HadoopSummit2013

@Twitter

20
#ThankYou
@joep and @eecraft
Come talk to us at booth 26

More Related Content

What's hot

Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsYifeng Jiang
 
HBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHortonworks
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopKai Sasaki
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaCloudera, Inc.
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Puppet
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanNarayana B
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Nicolas Poggi
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon
 
Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24wyukawa
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini Cloudera, Inc.
 
Tuning up with Apache Tez
Tuning up with Apache TezTuning up with Apache Tez
Tuning up with Apache TezGal Vinograd
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance SmackdownDataWorks Summit
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCERCEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCERCeph Community
 
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and AerospikeHow to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and AerospikeAerospike, Inc.
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 

What's hot (20)

Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
 
HBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minute
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoop
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on Mesos
 
Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
 
Tuning up with Apache Tez
Tuning up with Apache TezTuning up with Apache Tez
Tuning up with Apache Tez
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance Smackdown
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCERCEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
 
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and AerospikeHow to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 

Viewers also liked

2014 GITC 帶上數據去創業 talkingdata—高铎
 2014 GITC 帶上數據去創業 talkingdata—高铎 2014 GITC 帶上數據去創業 talkingdata—高铎
2014 GITC 帶上數據去創業 talkingdata—高铎Michael Zhang
 
HKIX Upgrade to 100Gbps-Based Two-Tier Architecture
HKIX Upgrade to 100Gbps-Based Two-Tier ArchitectureHKIX Upgrade to 100Gbps-Based Two-Tier Architecture
HKIX Upgrade to 100Gbps-Based Two-Tier ArchitectureMichael Zhang
 
Q con shanghai2013-[刘海锋]-[京东文件系统简介]
Q con shanghai2013-[刘海锋]-[京东文件系统简介]Q con shanghai2013-[刘海锋]-[京东文件系统简介]
Q con shanghai2013-[刘海锋]-[京东文件系统简介]Michael Zhang
 
Cuda 6 performance_report
Cuda 6 performance_reportCuda 6 performance_report
Cuda 6 performance_reportMichael Zhang
 
2014 Hpocon 姚仁捷 唯品会 - data driven ops
2014 Hpocon 姚仁捷   唯品会 - data driven ops2014 Hpocon 姚仁捷   唯品会 - data driven ops
2014 Hpocon 姚仁捷 唯品会 - data driven opsMichael Zhang
 
2014 Hpocon 吴磊 ucloud - 由点到面 提升公有云服务可用性
2014 Hpocon 吴磊   ucloud - 由点到面 提升公有云服务可用性2014 Hpocon 吴磊   ucloud - 由点到面 提升公有云服务可用性
2014 Hpocon 吴磊 ucloud - 由点到面 提升公有云服务可用性Michael Zhang
 
Q con shanghai2013-[韩军]-[超大型电商系统架构解密]
Q con shanghai2013-[韩军]-[超大型电商系统架构解密]Q con shanghai2013-[韩军]-[超大型电商系统架构解密]
Q con shanghai2013-[韩军]-[超大型电商系统架构解密]Michael Zhang
 
Lvs在大规模网络环境下的应用pukong
Lvs在大规模网络环境下的应用pukongLvs在大规模网络环境下的应用pukong
Lvs在大规模网络环境下的应用pukongMichael Zhang
 
Erlang分布式系统的的领域语言
Erlang分布式系统的的领域语言Erlang分布式系统的的领域语言
Erlang分布式系统的的领域语言Feng Yu
 
廣告系統在Docker/Mesos上的可靠性實踐
廣告系統在Docker/Mesos上的可靠性實踐廣告系統在Docker/Mesos上的可靠性實踐
廣告系統在Docker/Mesos上的可靠性實踐Michael Zhang
 

Viewers also liked (10)

2014 GITC 帶上數據去創業 talkingdata—高铎
 2014 GITC 帶上數據去創業 talkingdata—高铎 2014 GITC 帶上數據去創業 talkingdata—高铎
2014 GITC 帶上數據去創業 talkingdata—高铎
 
HKIX Upgrade to 100Gbps-Based Two-Tier Architecture
HKIX Upgrade to 100Gbps-Based Two-Tier ArchitectureHKIX Upgrade to 100Gbps-Based Two-Tier Architecture
HKIX Upgrade to 100Gbps-Based Two-Tier Architecture
 
Q con shanghai2013-[刘海锋]-[京东文件系统简介]
Q con shanghai2013-[刘海锋]-[京东文件系统简介]Q con shanghai2013-[刘海锋]-[京东文件系统简介]
Q con shanghai2013-[刘海锋]-[京东文件系统简介]
 
Cuda 6 performance_report
Cuda 6 performance_reportCuda 6 performance_report
Cuda 6 performance_report
 
2014 Hpocon 姚仁捷 唯品会 - data driven ops
2014 Hpocon 姚仁捷   唯品会 - data driven ops2014 Hpocon 姚仁捷   唯品会 - data driven ops
2014 Hpocon 姚仁捷 唯品会 - data driven ops
 
2014 Hpocon 吴磊 ucloud - 由点到面 提升公有云服务可用性
2014 Hpocon 吴磊   ucloud - 由点到面 提升公有云服务可用性2014 Hpocon 吴磊   ucloud - 由点到面 提升公有云服务可用性
2014 Hpocon 吴磊 ucloud - 由点到面 提升公有云服务可用性
 
Q con shanghai2013-[韩军]-[超大型电商系统架构解密]
Q con shanghai2013-[韩军]-[超大型电商系统架构解密]Q con shanghai2013-[韩军]-[超大型电商系统架构解密]
Q con shanghai2013-[韩军]-[超大型电商系统架构解密]
 
Lvs在大规模网络环境下的应用pukong
Lvs在大规模网络环境下的应用pukongLvs在大规模网络环境下的应用pukong
Lvs在大规模网络环境下的应用pukong
 
Erlang分布式系统的的领域语言
Erlang分布式系统的的领域语言Erlang分布式系统的的领域语言
Erlang分布式系统的的领域语言
 
廣告系統在Docker/Mesos上的可靠性實踐
廣告系統在Docker/Mesos上的可靠性實踐廣告系統在Docker/Mesos上的可靠性實踐
廣告系統在Docker/Mesos上的可靠性實踐
 

Similar to Hadoop Hardware @Twitter: Size does matter.

DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudNicolas Poggi
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...HostedbyConfluent
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015Yousun Jeong
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
Gruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in TelcoGruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in TelcoGruter
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesRose Toomey
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesRose Toomey
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)Nicolas Poggi
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationScott Miao
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph Community
 

Similar to Hadoop Hardware @Twitter: Size does matter. (20)

DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Gruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in TelcoGruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in Telco
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelines
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter Migration
 
tdtechtalk20160330johan
tdtechtalk20160330johantdtechtalk20160330johan
tdtechtalk20160330johan
 
Scalable Hadoop in the cloud
Scalable Hadoop in the cloudScalable Hadoop in the cloud
Scalable Hadoop in the cloud
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der Ster
 
Galaxy Big Data with MariaDB
Galaxy Big Data with MariaDBGalaxy Big Data with MariaDB
Galaxy Big Data with MariaDB
 

More from Michael Zhang

Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket LinxiaofengMichael Zhang
 
2014 Hpocon 李志刚 1号店 - puppet在1号店的实践
2014 Hpocon 李志刚   1号店 - puppet在1号店的实践2014 Hpocon 李志刚   1号店 - puppet在1号店的实践
2014 Hpocon 李志刚 1号店 - puppet在1号店的实践Michael Zhang
 
2014 Hpocon 高驰涛 云智慧 - apm在高性能架构中的应用
2014 Hpocon 高驰涛   云智慧 - apm在高性能架构中的应用2014 Hpocon 高驰涛   云智慧 - apm在高性能架构中的应用
2014 Hpocon 高驰涛 云智慧 - apm在高性能架构中的应用Michael Zhang
 
2014 Hpocon 黄慧攀 upyun - 平台架构的服务监控
2014 Hpocon 黄慧攀   upyun - 平台架构的服务监控2014 Hpocon 黄慧攀   upyun - 平台架构的服务监控
2014 Hpocon 黄慧攀 upyun - 平台架构的服务监控Michael Zhang
 
2014 Hpocon 周辉 大众点评 - 大众点评混合开发模式下的加速尝试
2014 Hpocon 周辉   大众点评 - 大众点评混合开发模式下的加速尝试2014 Hpocon 周辉   大众点评 - 大众点评混合开发模式下的加速尝试
2014 Hpocon 周辉 大众点评 - 大众点评混合开发模式下的加速尝试Michael Zhang
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and HadoopMichael Zhang
 
Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]
Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]
Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]Michael Zhang
 
Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]
Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]
Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]Michael Zhang
 
Q con shanghai2013-[黄舒泉]-[intel it openstack practice]
Q con shanghai2013-[黄舒泉]-[intel it openstack practice]Q con shanghai2013-[黄舒泉]-[intel it openstack practice]
Q con shanghai2013-[黄舒泉]-[intel it openstack practice]Michael Zhang
 
Q con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodologyQ con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodologyMichael Zhang
 
Q con shanghai2013-赵永明-ats与cdn实践
Q con shanghai2013-赵永明-ats与cdn实践Q con shanghai2013-赵永明-ats与cdn实践
Q con shanghai2013-赵永明-ats与cdn实践Michael Zhang
 
Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0
Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0
Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0Michael Zhang
 
Q con shanghai2013-黄慧攀-又拍云cdn技术探秘
Q con shanghai2013-黄慧攀-又拍云cdn技术探秘Q con shanghai2013-黄慧攀-又拍云cdn技术探秘
Q con shanghai2013-黄慧攀-又拍云cdn技术探秘Michael Zhang
 
Jedex stec DRAM Module Market Overview
Jedex stec DRAM Module Market  OverviewJedex stec DRAM Module Market  Overview
Jedex stec DRAM Module Market OverviewMichael Zhang
 
Percona live linux filesystems and my sql
Percona live   linux filesystems and my sqlPercona live   linux filesystems and my sql
Percona live linux filesystems and my sqlMichael Zhang
 
Velocity china2012kit life on edge —— 如何使用 esi 完成任务
Velocity china2012kit life on edge —— 如何使用 esi 完成任务Velocity china2012kit life on edge —— 如何使用 esi 完成任务
Velocity china2012kit life on edge —— 如何使用 esi 完成任务Michael Zhang
 
加快互联网核心协议,提高Web速度yuchungcheng
加快互联网核心协议,提高Web速度yuchungcheng加快互联网核心协议,提高Web速度yuchungcheng
加快互联网核心协议,提高Web速度yuchungchengMichael Zhang
 
前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpo前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpoMichael Zhang
 

More from Michael Zhang (20)

Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket Linxiaofeng
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
2014 Hpocon 李志刚 1号店 - puppet在1号店的实践
2014 Hpocon 李志刚   1号店 - puppet在1号店的实践2014 Hpocon 李志刚   1号店 - puppet在1号店的实践
2014 Hpocon 李志刚 1号店 - puppet在1号店的实践
 
2014 Hpocon 高驰涛 云智慧 - apm在高性能架构中的应用
2014 Hpocon 高驰涛   云智慧 - apm在高性能架构中的应用2014 Hpocon 高驰涛   云智慧 - apm在高性能架构中的应用
2014 Hpocon 高驰涛 云智慧 - apm在高性能架构中的应用
 
2014 Hpocon 黄慧攀 upyun - 平台架构的服务监控
2014 Hpocon 黄慧攀   upyun - 平台架构的服务监控2014 Hpocon 黄慧攀   upyun - 平台架构的服务监控
2014 Hpocon 黄慧攀 upyun - 平台架构的服务监控
 
2014 Hpocon 周辉 大众点评 - 大众点评混合开发模式下的加速尝试
2014 Hpocon 周辉   大众点评 - 大众点评混合开发模式下的加速尝试2014 Hpocon 周辉   大众点评 - 大众点评混合开发模式下的加速尝试
2014 Hpocon 周辉 大众点评 - 大众点评混合开发模式下的加速尝试
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]
Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]
Q con shanghai2013-[ben lavender]-[long-distance relationships with robots]
 
Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]
Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]
Q con shanghai2013-[jains krums]-[real-time-delivery-archiecture]
 
Q con shanghai2013-[黄舒泉]-[intel it openstack practice]
Q con shanghai2013-[黄舒泉]-[intel it openstack practice]Q con shanghai2013-[黄舒泉]-[intel it openstack practice]
Q con shanghai2013-[黄舒泉]-[intel it openstack practice]
 
Q con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodologyQ con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodology
 
Q con shanghai2013-赵永明-ats与cdn实践
Q con shanghai2013-赵永明-ats与cdn实践Q con shanghai2013-赵永明-ats与cdn实践
Q con shanghai2013-赵永明-ats与cdn实践
 
Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0
Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0
Q con shanghai2013- 荣先乾-qzone_touch跨终端优化_v2.0
 
Q con shanghai2013-黄慧攀-又拍云cdn技术探秘
Q con shanghai2013-黄慧攀-又拍云cdn技术探秘Q con shanghai2013-黄慧攀-又拍云cdn技术探秘
Q con shanghai2013-黄慧攀-又拍云cdn技术探秘
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
Jedex stec DRAM Module Market Overview
Jedex stec DRAM Module Market  OverviewJedex stec DRAM Module Market  Overview
Jedex stec DRAM Module Market Overview
 
Percona live linux filesystems and my sql
Percona live   linux filesystems and my sqlPercona live   linux filesystems and my sql
Percona live linux filesystems and my sql
 
Velocity china2012kit life on edge —— 如何使用 esi 完成任务
Velocity china2012kit life on edge —— 如何使用 esi 完成任务Velocity china2012kit life on edge —— 如何使用 esi 完成任务
Velocity china2012kit life on edge —— 如何使用 esi 完成任务
 
加快互联网核心协议,提高Web速度yuchungcheng
加快互联网核心协议,提高Web速度yuchungcheng加快互联网核心协议,提高Web速度yuchungcheng
加快互联网核心协议,提高Web速度yuchungcheng
 
前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpo前瞻性Web性能优化pwpo
前瞻性Web性能优化pwpo
 

Recently uploaded

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 

Hadoop Hardware @Twitter: Size does matter.

  • 1. Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3
  • 2. About us Joep Rottinghuis • • • • Engineering Manager Hadoop/HBase team @ Twitter Follow me @joep Jay Shenoy • • • • • Software Engineer @ Twitter Hardware Engineer @ Twitter Engineering Manager HW @ Twitter Follow me @eecraft HW & Hadoop teams @ Twitter, Many others #HadoopSummit2013 @Twitter 2
  • 3. Agenda • Scale of Hadoop Clusters • Single versus multiple clusters • Twitter Hadoop Architecture Hardware investigations • Results • #HadoopSummit2013 @Twitter 3
  • 4. Scale • Scaling limits # Nodes • JobTracker 10’s thousands of jobs per day; 10’s Ks concurrent slots • • • • Namenode 250-300 M objects in single namespace Namenode @~100 GB heap -> full GC pauses Shipping job jars to 1,000’s of nodes #HadoopSummit2013 JobHistory server at a few 100’s K job history/conf files @Twitter 4
  • 5. When / why to split clusters ? • In principle preference for single cluster • Common logs, shared free space, reduced admin burden, more rack diversity • Varying SLA’s • Workload diversity • • • • Storage intensive Processing (CPU / Disk IO) intensive Network intensive Data access • Hot, Warm, Cold #HadoopSummit2013 @Twitter 5
  • 8. Service criteria for hardware • • • Hadoop does not need live HDD swap Twitter DC : No SLA on data nodes Rack SLA : Only 1 rack down at any time in a cluster #HadoopSummit2013 @Twitter 8
  • 9. Baseline Hadoop Server (~ early 2012) DIMM DIMM GbE E56xx PCH NIC Characteristics: DIMM • Standard 2U HBA • 20 servers / rack DIMM DIMM E56xx DIMM Works for the general cluster, but... • Need more density for storage • Potential IO bottlenecks #HadoopSummit2013 server Expander • E5645 CPU • Dual 6-core • 72GB memory • 12 x 2TB HDD • 2 x 1 GbE @Twitter 9
  • 10. Hadoop Server: Possible evolution DIMM DIMM DIMM E5-26xx or E5-24xx GbE NIC 10GbE ? Characteristics: DIMM + CPU performance ? 20 servers / rack DIMM DIMM DIMM E5-26xx or E5-24xx HBA DIMM Expander 16 x 2T? 16 x 3T? 24 x 3T? • Candidate for DW Can deploy into the general DW cluster, but... • Too much CPU for storage intensive apps • Server failure domain too large if we scale up disks #HadoopSummit2013 @Twitter 10
  • 11. Rethinking hardware evolution • Debunking myths • Bigger is always better • One size fits all • Back to Hadoop Hardware Roots: • Scale horizontally, not vertically Twitter Hadoop Server - “THS” #HadoopSummit2013 @Twitter 11
  • 12. THS for backups GbE NIC DIMM + IO Performance E3-12xx • Few fast cores DIMM SAS HBA Storage focus: • Cost efficient (single socket, 3T drives) Characteristics: PCH • E3-1230 V2 CPU • 16 GB memory • 12 x 3 TB HDD • SSD boot • 2 x 1 GbE • Less memory needed #HadoopSummit2013 @Twitter 12
  • 13. THS variant for Hadoop-Proc and HBase NIC DIMM 10GbE Characteristics: + IO Performance E3-12xx DIMM • Few fast cores SAS HBA Processing / throughput focus: • Cost efficient (single socket, 1T drives) PCH • E3-1230 V2 CPU • 32 GB memory • 12 x 1 TB HDD • SSD boot • 1 x 10 GbE • More disk and network IO per socket #HadoopSummit2013 @Twitter 13
  • 14. THS for cold cluster GbE NIC DIMM • Disk Efficiency • Some compute E3-12xx DIMM SAS HBA Combination of previous 2 use cases: Characteristics: PCH • E3-1230 V2 CPU • 32 GB memory • 12 x 3 TB HDD • 2 x 1 GbE • Space & power efficient • Storage dense and some processing capabilities #HadoopSummit2013 @Twitter 14
  • 15. Rack-level view 1G TOR Baseline 1G TOR 1G TOR 10G TOR 1G TOR 1G TOR Twitter Hadoop Server Backups Proc Cold ~ 8 kW ~ 8 kW ~ 8 kW ~ 8 kW CPU sockets; DRAM 40; 1440 GB 40; 640 GB 40; 1280 GB 40; 1280 GB Spindles; TB raw 240; 480 TB 480; 1,440 TB 480; 480 TB 480; 1,440 TB Uplink; Internal BW 20 ; 40 Gbps 20 ; 80 Gbps 40 ; 400 Gbps 20 ; 80 Gbps Power #HadoopSummit2013 @Twitter 15
  • 16. Processing performance comparison Benchmark Baseline Server THS (-Cold) TestDFSIO (write replication = 1) 360 MB/s / node 780 MB/s / node TeraGen (30TB replication = 3) 1:36 hrs 1:35 hrs TeraSort (30 TB, replication = 3) 6:11 hrs 4:22 hrs 2 Parallel TeraSort (30 TB each, replication = 3) 10:36 hrs 6:21 hrs Application #1 4:37 min 3:09 min Application set #2 13:3 hrs 10:57 hrs Performance benchmark set up: • Each clusters 102 nodes of respective type • Efficient server = 3 racks, Baseline 5+ racks • “Dated” stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3 #HadoopSummit2013 @Twitter 16
  • 19. Recap • At a certain scale it makes sense to split into multiple clusters • For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP • For large enough clusters, depending on use-case, it may be worth to choose different HW configurations #HadoopSummit2013 @Twitter 19
  • 20. Conclusion @Twitter our “Twitter Hadoop Server” not only saves many $$$, it is also faster ! #HadoopSummit2013 @Twitter 20
  • 21. #ThankYou @joep and @eecraft Come talk to us at booth 26