Proprietary & Confidential. Copyright © 2014.
Hado’ops’ or Had’oops’
We’re Hiring
rocketfuel.com/careers
Kishore Kumar Yellamraju
Abhijit Pol
The Web Is Monetized By Advertising
Delivery Methods
»Display
»Video
»Mobile
»Social
Overview
[Diagram: ad request flow between the Browser, Publishers, the Ad Exchange (some exchange partners shown), and the Rocket Fuel Platform]
1. Page Request
2. Ad Request
3. Bid Request
4. Bid & Ad
5. Rocketfuel Winning Ad
6. Ad Served
Rocket Fuel Platform: Real-time Bidder, Automated Decisions, Models, Model Scores, Refresh learning, Optimize, Data Store, Ads & Budget, Events
Other labels: User Segments, User Engagements, Data Partners, Advertisers
Real Time Bidding and Serving
Always buying the best impressions & serving the best ad
[Figure: dollar amounts shown per impression (for example $2.11, $1.26, $2.78), alongside the signals User, Brand Affinity, Time of Day, Geo/Weather, and Site/Page]
Real Time Bidding and Serving
Campaign goals shown: Leads & sales; Coupon downloads; Brand awareness
Signals: Site/Page, Geo/Weather, Time of Day, Brand Affinity, Demo
Impression Scorecard (three scorecards, values in slide order)
Factor            Scorecard 1   Scorecard 2   Scorecard 3
Demo              +100          +40           +10
Brand Affinity    +40           -70           -10
Time of Day       -20           -20           -20
Geo/Weather       +20           +10           +20
Site/Page         +15           +15           +10
Ad Position       +10           -25           -35
In-Market         +40           -40           -25
Behavior          +35           -18           +10
Response          +9.7%         +0.7%         +1.4%
Throughput (requests per day)
» Facebook likes: 5 B
» Searches on Google: 6 B
» Bid Requests Considered by Rocketfuel: 45 B
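As a rough back-of-the-envelope check, the 45 B bid requests per day can be converted to an average per-second rate. This is only a sketch: it assumes the load is spread evenly over the day, which real traffic is not, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(requests_per_day: float) -> float:
    """Average requests per second, assuming a perfectly even daily load."""
    return requests_per_day / SECONDS_PER_DAY


# 45 B bid requests per day is the figure quoted on the slide.
bid_requests_per_day = 45e9
rate = average_rate_per_second(bid_requests_per_day)
print(f"{rate:,.0f} bid requests per second on average")
```

Peak rates would be higher than this average, so capacity planning would need a margin on top of it.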
Latency (time in ms)
» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird's wing: 20
» Look up in Blackbird: 2
Architecture and Scale
»Datacenters
»Scale
»Growth
»Architecture
Data Center Expansion
»abc
Data Center Design
• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities
Hadoop Server
20 2U servers (8.5kW)
Bidders
40 2-U Twin 2 servers (17kW)
Rocket Fuel Scale
»34,474 CPU processor cores
–2655 servers
–187.4 Teraflops of computing
»188 Terabytes of memory
–13X the memory of the IBM Watson computer that played Jeopardy
»42 Petabytes of storage
–106X the data volume of the entire Library of Congress
Hadoop at Rocket Fuel
»1400 servers
»15K Disks
»15K Cores
»90 TB
»30K MR slots
»12K daily MR jobs
Growth
» Servers: 200 → 1400
» Storage: 5 PB → 41 PB
» 8x
Data Architecture 3.0
Hadoop Setup
[Diagram: JT, Active NN and Standby NN (each with a ZKFC), a QJM / ZK quorum, and DN/TT worker nodes]
Hardware specs listed on the slide:
» 6x2TB Disks, 2x6 core, 196 GB RAM, 2x1G NIC
» 12x3TB Disks, 2x6 core, 64 GB RAM, 10G NIC
» QJM / ZK quorum: same as DN’s, with a dedicated disk for ZK or JN
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN
Maintenance is Not Easy
Puppet + Infradb
Automation is key
Puppet and Infradb
» Automate as much as you can
» Adding a slave node to Hadoop cluster < 120 seconds
» Bringing up a new Hadoop cluster < 500 seconds
» MR slots are automatically determined based on hardware config
Just define once. Isn’t it cool?
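The slides do not give the formula used to derive MR slots from the hardware config, so the following is only a sketch of the idea. The formula, the parameter names (`slots_per_core`, `gb_per_slot`, `reserved_gb`, `slots_per_disk`), and the 2:1 map/reduce split are all assumptions made for illustration, not Rocket Fuel's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    cores: int
    memory_gb: int
    disks: int


def mr_slots(hw: Hardware, slots_per_core: int = 2, gb_per_slot: int = 2,
             reserved_gb: int = 8, slots_per_disk: int = 2) -> dict:
    """Derive map/reduce slot counts from hardware (hypothetical formula).

    The total is capped by whichever resource runs out first: cores,
    memory left after reserving some for the OS and daemons, or disks.
    """
    by_cpu = hw.cores * slots_per_core
    by_memory = max(hw.memory_gb - reserved_gb, 0) // gb_per_slot
    by_disk = hw.disks * slots_per_disk
    total = max(min(by_cpu, by_memory, by_disk), 2)
    # Assumed 2:1 split between map and reduce slots.
    reduce_slots = max(total // 3, 1)
    return {"map": total - reduce_slots, "reduce": reduce_slots}


# A spec listed on the Hadoop Setup slide: 2x6 cores, 64 GB RAM, 12 disks.
worker = Hardware(cores=12, memory_gb=64, disks=12)
print(mr_slots(worker))
```

Because the slot counts are computed from the hardware record, a node with a different spec gets different values without any per-host configuration.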
Performance Tuning
No issues when the cluster is small. Problems start when it grows.
Performance Tuning
ZK: maxClientCnxns
HDFS: dfs.namenode.handler.count, dfs.image.transfer.timeout, ha.*-timeout.ms
MR: mapred.reduce.parallel.copies, mapreduce.reduce.shuffle.parallelcopies, mapred.job.tracker.handler.count, io.sort.mb, io.sort.factor
IMP: MAPREDUCE-2026
JVM: -XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSInitiatingOccupancyFraction=60
We Have an Issue!
» MAPREDUCE-5351
» MAPREDUCE-5508
» "keep.failed.task.files=true"
JT OOM
Number of “JobInProgress” instances = number of users who submitted jobs × mapred.jobtracker.completeuserjobs.maximum
Related properties:
» mapred.jobtracker.completeuserjobs.maximum
» mapred.jobtracker.retirejob.interval
» mapred.jobtracker.retiredjobs.cache.size
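To see why this matters for JobTracker heap, here is the formula worked through with made-up numbers. The user count and the per-user limits below are hypothetical; only the multiplication itself comes from the slide.

```python
def retained_job_objects(users_with_jobs: int, completed_jobs_per_user: int) -> int:
    """Upper bound on completed JobInProgress objects held in JT memory.

    completed_jobs_per_user stands for
    mapred.jobtracker.completeuserjobs.maximum.
    """
    return users_with_jobs * completed_jobs_per_user


# Hypothetical cluster: 50 users submitting jobs.
users = 50
for per_user_limit in (100, 25, 5):
    total = retained_job_objects(users, per_user_limit)
    print(f"limit={per_user_limit:>3} -> up to {total} JobInProgress objects")
```

Lowering the per-user limit shrinks the bound linearly, which is why it is a natural first knob to try when the JobTracker runs out of memory.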
Monitoring
Wall of Ops
Monitoring
» hadoop.namenode.CallQueueLength
» hadoop.jobtracker.jvm.memheapusedm
Don’t fly blind, you will crash!
MR Workload Monitoring
Network Monitoring
Don’t blame the network; monitor it instead.
A network mesh can be a mess.
Alerting
Monitoring is not enough; you need better alerting.
Alerts
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We rely heavily on custom scripts that query /jmx for NN and JT
http://hostname:port/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
    NameDirStatuses, DeadNodes, NumberOfMissingBlocks
http://hostname:port/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState
    FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
http://hostname:port/jmx?qry=hadoop:service=JobTracker,name=JobTrackerInfo
    Blacklisted TT’s, #jobs, #slots_used, ThreadCount
http://hostname:port/jmx?qry=java.lang:type=Memory
    Used JVM, free JVM, etc.
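A custom /jmx check along these lines might look like the sketch below. It is not the authors' script: the function names, the thresholds, and the alert wording are assumptions. What it relies on is that the /jmx servlet returns JSON of the form `{"beans": [...]}` and that FSNamesystemState exposes the attributes named on the slide. Several checks are folded into one summary line, in the spirit of reducing alert noise.

```python
import json
from urllib.request import urlopen


def first_bean(payload: dict) -> dict:
    """The /jmx servlet wraps results as {"beans": [...]}; take the first."""
    beans = payload.get("beans", [])
    if not beans:
        raise LookupError("no matching bean in /jmx response")
    return beans[0]


def fetch_bean(base_url: str, query: str, timeout: float = 10.0) -> dict:
    """Fetch the first bean matching `query`, e.g. from http://hostname:port."""
    with urlopen(f"{base_url}/jmx?qry={query}", timeout=timeout) as resp:
        return first_bean(json.load(resp))


def summarize(fs_state: dict, max_dead: int = 0,
              max_under_replicated: int = 1000) -> str:
    """Fold several checks into one summary alert (thresholds are assumed)."""
    problems = []
    # FSState is expected to read "Operational" outside safe mode.
    if fs_state.get("FSState") != "Operational":
        problems.append(f"FSState={fs_state.get('FSState')}")
    if fs_state.get("NumDeadDataNodes", 0) > max_dead:
        problems.append(f"{fs_state['NumDeadDataNodes']} dead DataNodes")
    if fs_state.get("UnderReplicatedBlocks", 0) > max_under_replicated:
        problems.append(
            f"{fs_state['UnderReplicatedBlocks']} under-replicated blocks")
    return "OK" if not problems else "ALERT: " + "; ".join(problems)


# Offline example with a hand-written payload instead of a live NameNode.
sample = {"beans": [{"FSState": "Operational",
                     "NumDeadDataNodes": 3,
                     "UnderReplicatedBlocks": 12}]}
print(summarize(first_bean(sample)))
```

Against a live cluster, the call would be something like `summarize(fetch_bean("http://hostname:port", "Hadoop:service=NameNode,name=FSNamesystemState"))`, with the host and port filled in.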
MR Workload Alerting
» Monitor the MR workload and alert on it
– In-house tool that uses the “houdah” Ruby gem
– Watches for long-running jobs, jobs with many map tasks, blacklisted TT’s with high failure counts, etc.
» Collect details and auto-restart blacklisted TT’s
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White-elephant or hraven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
Multi Tenancy
» Modeling
» OPS
» ETL
» Ad-hoc
Scheduling
No scheduler is perfect unless you understand and tune it properly
BCP
» BCP: Business Continuity Plan
» Near real time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data
Data BCP Cluster
[Diagram: two data clusters, INW and LSV, each connected to the US, EU, and HK serving clusters. Workloads shown: Modeling, Reporting, User Queries, Research, Ad-hoc Queries. Also shown: Processed Data and Amazon Backup]
YARN
» Resource Manager
- Global resource scheduler
- Hierarchical queues
- Application management
» Node Manager
- Per-machine agent
- Manages life cycle of container
- Container resource monitoring
» Application Master
- Per-application
- Manages application scheduling and task execution
YARN at Rocket Fuel
» YARN is in production
» 700+ nodes
» 31 TB RAM, 8500 disks, 8500 cores
» Primary use case: MapReduce
» No more static slots
» Tez, Spark, and Storm are in the race
YAY!!!
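“No more static slots” means the number of tasks a node can run depends on what each container asks for. The sketch below illustrates that idea only; the node size and container sizes are hypothetical, and whether vcores are counted at all depends on how the scheduler's resource calculator is configured.

```python
def containers_per_node(node_memory_mb: int, node_vcores: int,
                        container_memory_mb: int,
                        container_vcores: int = 1) -> int:
    """How many identical containers fit on one node (illustrative only)."""
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# Hypothetical NodeManager offering 40 GB and 12 vcores to YARN.
node_memory_mb, node_vcores = 40 * 1024, 12
for container_mb in (2048, 4096, 8192):
    n = containers_per_node(node_memory_mb, node_vcores, container_mb)
    print(f"{container_mb} MB containers: {n} per node")
```

With fixed MR slots the per-node task count would be the same for all three request sizes; here it changes with the request.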
Obligatory “we are hiring” slide!
http://rocketfuel.com/careers
THANKS
kishore@rocketfuel.com
apol@rocketfuel.com



Editor's Notes

  1. Most people have probably used IMDb, but they probably wouldn’t use it if they had to pay.
  2. What we do: Display, video, mobile, social
  3. Loop 1-6 is 200 ms; 3-4 is 100 ms for RF. We do this 45B times a day.
  4. Real-time auction: selecting the right ad for each auction.
  5. Automatically learning from every response and getting better. Nobody else is doing this as fast, precisely, and consistently for our customers.
  6. Loop 1-6 is 200 ms; 3-4 is 100 ms for RF. We do this 45B times a day.
  7. We are now in 8 data centers in the world
  8. We have optimized the design of our data centers as well. We custom-design our racks and get servers assembled, racked, and tested in a California facility, then ship them to the data center. We do this not just for US data centers but also for data centers in Europe or Asia. Each rack can weigh 1,500 lb or more, and many racks are sent by air for the initial install. Now, let’s look at the two kinds of racks shown above. Hadoop servers (the full racks): Data (Hadoop) servers are bigger, as they have 12 x 3 TB drives, and 20 servers fill the whole rack. Bidders: Bidders have a lot of cores but take less space because they have only two 2.5” drives each. 40 servers fill up half the rack, but we run out of switch ports. And this is 5% of Rocket Fuel.
  9. Just say “We have amazing scale” – let the numbers speak for themselves.
  10. Managing a Hadoop cluster is not easy. Start early. We are heavy users of Puppet. Infradb is similar to Puppet Hiera, but Infradb was written in-house 4 years ago. -> Puppet and Infradb are tightly integrated. We use Puppet and Infradb to make maintenance easy. Infradb helps us populate Hadoop property values based on the hardware configuration we have. For example, our fair-share slot distribution is handled automatically by Infradb whenever we add new nodes.
  11. -> Here is an example: we define the formula that decides the number of MR slots per server based on memory, CPU, and disks. -> We always want homogeneous hardware for easy maintenance and planning, but that is impossible since “need changes with time”. -> Automation like this lets you not worry about having heterogeneous servers. -> Beyond configuration, we use Infradb to define alerts once, and all newly added hosts and clusters are automatically monitored by our Nagios.
  12. A typical Hadoop problem: you start with a small cluster and want to grow. Hadoop’s default properties work well on small clusters; the problems start when your cluster grows, and they are bigger on large clusters. Aren’t we supposed to get better performance after adding nodes?
  13. -> Too many properties to change. -> Be careful when tuning; make changes cautiously. -> Have metrics to compare before and after changes. -> MAPREDUCE-2026: JobTracker.getJobCounter() locks the JobTracker and calls JobInProgress.getCounters(). JobInProgress.getCounters() can be very expensive because it aggregates all the task counters. We found from the JobTracker jstacks that this method is one of the bottlenecks of JobTracker performance.
  14. -> A few JIRAs discuss memory leaks in the JobTracker. -> None of them really fixed the issue, even though the bugs are marked as resolved. -> 5351 was fixed but introduced 5508. 5508 was later fixed, and there is a workaround: “set keep.failed.task.files=true”. -> None of them really resolved the JT OOM issue.
  15. -> any HDFS heavy job can impact your HDFS performance , you will not realize unless you monitor the metrics -> don’t let any engineer impact your cluster.
  16. -> Monitoring your applications is not enough when you are running at scale in multiple data centers across the world. -> We should monitor the network mesh.
  17. -> Find bad queries immediately and kill them before they impact the cluster. -> Don’t lose capacity to mass TaskTracker blacklisting caused by a single job. -> Long-running jobs should be killed; there is no point in letting them run.
  18. -> Understand the workload on your cluster to better tune the scheduler properties. -> Whenever you add more nodes, the new slots should be distributed automatically to the different queues. -> No scheduler is perfect unless you understand and tune it. -> Have ACLs in place; don’t let any one engineer impact your MR workload. -> Have proper accounting for teams that use more MR capacity.
  19. How we operated for the initial few years. Recently added a Data BCP (Business Continuity Plan). Latency-critical and important data goes to both places. Other data goes after processing. Make use of the BCP cluster to do meaningful things until a disaster happens.
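
Note 11 mentions a formula that decides the number of MR slots per server from memory, CPU, and disks, but the talk does not give it. The sketch below is one plausible shape for such a formula, not the one used in Infradb: the function name slots_per_server and every ratio in it (memory per slot, slots per core, slots per disk, reserved memory, map/reduce split) are assumptions for illustration.

```python
def slots_per_server(mem_gb, cores, disks,
                     mem_per_slot_gb=2,     # assumed memory budget per task slot
                     slots_per_core=1.5,    # assumed CPU oversubscription
                     slots_per_disk=2,      # assumed slots one disk can feed
                     reserved_mem_gb=8,     # assumed memory kept for OS and daemons
                     map_fraction=0.7):     # assumed share of slots given to maps
    """Return (map_slots, reduce_slots) for one server.

    The scarcest resource (memory, CPU, or disk) caps the total slot count,
    so heterogeneous servers each get a count that fits their hardware.
    """
    by_mem = int(mem_gb - reserved_mem_gb) // mem_per_slot_gb
    by_cpu = int(cores * slots_per_core)
    by_disk = disks * slots_per_disk
    total = max(1, min(by_mem, by_cpu, by_disk))
    map_slots = max(1, int(total * map_fraction))
    reduce_slots = max(1, total - map_slots)
    return map_slots, reduce_slots


if __name__ == "__main__":
    # Two hypothetical hardware generations in the same cluster.
    print(slots_per_server(mem_gb=64, cores=12, disks=12))
    print(slots_per_server(mem_gb=16, cores=8, disks=2))
```

Driving the slot counts from a hardware inventory in this way is what lets a configuration system regenerate per-host properties when new node types are added.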