Big data summit

Proprietary & Confidential. Copyright © 2014.
Hadoop Operations @ Rocket Fuel
We’re Hiring
rocketfuel.com/careers
Kishore Kumar Yellamraju

The Web Is Monetized By Advertising

Delivery Methods
»Display
»Video
»Mobile
»Social

Overview
[Diagram: ad-serving flow between Browser, Publishers, Ad Exchange (some exchange partners shown), the Rocket Fuel Platform, Data Partners, and Advertisers]
1. Page Request
2. Ad Request
3. Bid Request
4. Bid & Ad
5. Rocketfuel Winning Ad
6. Ad Served
Rocket Fuel Platform: Real-time Bidder, Automated Decisions, Models, Model Scores, Refresh learning, Data Store, Events, Ads & Budget, Optimize
Also shown: User Segments, User Engagements

Real Time Bidding and Serving
Always buying the best impressions & serving the best ad
[Figure: sample per-impression dollar values (from $0.00 to $2.78) alongside the signals Site/Page, Geo/Weather, Time of Day, Brand Affinity, User]

Real Time Bidding and Serving
Impression Scorecard
[Figure: three scorecards for campaigns with the goals leads & sales, coupon downloads, and brand awareness. Rows: Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market Behavior, Response. Two of the three scorecards are marked X.]
Scores: +100, +40, -20, +20, +15, +10, +40, +35 (Response +9.7%)
Scores: +40, -70, -20, +10, +15, -25, -40, -18 (Response +0.7%)
Scores: +10, -10, -20, +20, +10, -35, -25, +10 (Response +1.4%)

Throughput
Requests per day
»Facebook likes: 5 B
»Searches on Google: 6 B
»Bid requests considered by Rocketfuel: 45 B
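For a sense of scale, the 45 B daily figure can be converted to an average per-second rate. This is a rough calculation that assumes traffic is spread evenly over the day; real traffic will peak above the average.

```python
# Average request rate implied by 45 B bid requests per day.
# Assumption: uniform spread over 24 hours (peaks will be higher).
BID_REQUESTS_PER_DAY = 45e9
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(requests_per_day: float) -> float:
    """Convert a daily request count into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


avg_qps = average_rate_per_second(BID_REQUESTS_PER_DAY)
print(f"{avg_qps:,.0f} bid requests per second on average")
```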

Latency
Time (ms)
»Blink of an eye: 400
»SF to Tokyo network round trip: 100
»One beat of a hummingbird's wing: 20
»Look up in Blackbird: 2

Architecture and Scale
»Datacenters
»Scale
»Growth
»Architecture

Data Center Expansion

Data Center Design
• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities
• Hadoop server rack: 24 2U servers (8.5 kW)
• Bidder rack: 40 2U Twin2 servers (17 kW)

Rocket Fuel Scale
»34,474 CPU processor cores
–2655 servers
–187.4 Teraflops of computing
»188 Terabytes of memory
–13X the memory of Watson, the IBM computer that played Jeopardy
»42 PB of storage
–106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel
»1400 servers
»15K Disks
»15K Cores
»90 TB RAM
»30K MR slots
»12K daily MR jobs

Growth
»Servers: 200 to 1400
»Storage: 5 PB to 41 PB (8x)
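The growth multiples can be checked directly from the figures on the slide. The slide does not say which dimension the "8x" measures; the arithmetic below suggests it matches storage rather than server count.

```python
# Figures taken from the slide.
servers_before, servers_after = 200, 1400
storage_before_pb, storage_after_pb = 5, 41

server_growth = servers_after / servers_before          # growth in node count
storage_growth = storage_after_pb / storage_before_pb   # growth in stored data

print(f"servers: {server_growth:.1f}x, storage: {storage_growth:.1f}x")
```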

Data Architecture 3.0

Batch and Real Time Pipelines
Webservers
STORM
MySQL
Zookeeper

Hadoop Setup
[Diagram: Active NN and Standby NN, each with a ZKFC; JT; a QJM / ZK quorum; and worker nodes each running DN + TT]
Master nodes (NN, JT):
» 6x2TB Disks
» 2x6 core
» 196 GB RAM
» 2x1G NIC
Worker nodes (DN + TT):
» 12x3TB Disks
» 2x6 core
» 64 GB RAM
» 10G NIC
QJM / ZK quorum nodes:
» same as DNs
» Dedicated disk to ZK or JN

Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Puppet
+
Infradb
Automation is key
Maintenance is Not Easy

Puppet and Infradb
» Automate as much as you can
» Adding a slave node to a Hadoop cluster takes < 120 seconds
» Bringing up a new Hadoop cluster takes < 500 seconds
» MR slots are automatically determined based on hardware config
Isn’t it cool?
Just define once
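To illustrate what deriving MR slots from hardware config could look like, here is a minimal Python sketch. The heuristic (reserve some cores and memory for the node's daemons, cap slots by the scarcer of CPU and RAM, then split between map and reduce) and every name and constant in it are assumptions for illustration, not Rocket Fuel's actual Puppet logic.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    cores: int    # physical cores on the node
    ram_gb: int   # total memory in GB


def mr_slots(hw: Hardware, gb_per_slot: float = 2.0,
             reserved_cores: int = 2, reserved_gb: int = 8,
             map_fraction: float = 2 / 3) -> tuple[int, int]:
    """Return (map_slots, reduce_slots) for one TaskTracker.

    Hypothetical heuristic: total slots are limited by whichever is
    scarcer, usable cores (2x oversubscribed) or usable memory.
    """
    by_cpu = max(hw.cores - reserved_cores, 1) * 2
    by_ram = int(max(hw.ram_gb - reserved_gb, gb_per_slot) // gb_per_slot)
    total = max(min(by_cpu, by_ram), 2)
    map_slots = max(int(total * map_fraction), 1)
    reduce_slots = max(total - map_slots, 1)
    return map_slots, reduce_slots


# One of the hardware profiles listed under Hadoop Setup: 2x6 cores, 64 GB RAM.
print(mr_slots(Hardware(cores=12, ram_gb=64)))  # (13, 7)
```

Keeping the rule in one function is what makes "just define once" work: a new hardware profile only needs its core and memory counts recorded, and the slot settings follow.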

No issues when the cluster is small. Problems start when it grows.
Performance Tuning

Performance Tuning
ZK:
maxClientCnxns
HDFS:
dfs.namenode.handler.count
dfs.image.transfer.timeout
ha.*-timeout.ms
MR:
mapred.reduce.parallel.copies
mapreduce.reduce.shuffle.parallelcopies
mapred.job.tracker.handler.count
io.sort.mb
io.sort.factor
IMP: MAPREDUCE-2026
JVM:
-XX:+UseConcMarkSweepGC
-XX:CMSFullGCsBeforeCompaction=1
-XX:CMSInitiatingOccupancyFraction=60
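The HDFS and MR settings above are ordinary Hadoop properties, which are written as `<property>` entries in the cluster's `*-site.xml` files. The Python sketch below renders such entries from a dictionary. The values are placeholders chosen only to make the example concrete; they are not recommendations and not Rocket Fuel's settings.

```python
import xml.etree.ElementTree as ET


def render_site_xml(props: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> block from a dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
hdfs_site = {
    "dfs.namenode.handler.count": "64",
    "dfs.image.transfer.timeout": "600000",
}
mapred_site = {
    "mapred.reduce.parallel.copies": "20",
    "io.sort.mb": "256",
    "io.sort.factor": "50",
}

print(render_site_xml(hdfs_site))
```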

Monitoring
Wall of Ops

Monitoring
hadoop.namenode.CallQueueLength
hadoop.jobtracker.jvm.memheapusedm
Don’t fly blind, you will crash!

MR Workload Monitoring

Network Monitoring
Don’t blame the network; monitor it instead. A network mesh can be a mess.

Alerting
Monitoring is not enough; you need better alerting

Alerts
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We rely heavily on custom scripts that query /jmx for NN and JT
http://hostname:port/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
NameDirStatuses, DeadNodes, NumberOfMissingBlocks
qry=Hadoop:service=NameNode,name=FSNamesystemState
FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
qry=hadoop:service=JobTracker,name=JobTrackerInfo
Blacklisted TTs, #jobs, #slots_used, ThreadCount
qry=java.lang:type=Memory
Used JVM memory, free JVM memory, etc.
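A custom check of this kind might look like the following Python sketch. It reads the FSNamesystemState bean from the /jmx servlet and folds several checks into a single summary, in the spirit of reducing alert noise. The thresholds, function names, host, and port are hypothetical; the attribute names are the ones listed on the slide.

```python
import json
from urllib.request import urlopen

# Bean name taken from the slide; host and port further down are placeholders.
FS_STATE_QUERY = "Hadoop:service=NameNode,name=FSNamesystemState"


def fetch_bean(base_url: str, query: str) -> dict:
    """Fetch the first MBean matching `query` from a daemon's /jmx servlet."""
    with urlopen(f"{base_url}/jmx?qry={query}", timeout=10) as resp:
        beans = json.load(resp).get("beans", [])
    return beans[0] if beans else {}


def namenode_problems(bean: dict, max_dead_nodes: int = 0,
                      max_under_replicated: int = 1000) -> list[str]:
    """Collect every breached check so they can go out as one summary alert."""
    problems = []
    # Assumption: a healthy NameNode reports FSState "Operational".
    if bean.get("FSState") != "Operational":
        problems.append(f"FSState is {bean.get('FSState')!r}")
    dead = bean.get("NumDeadDataNodes", 0)
    if dead > max_dead_nodes:
        problems.append(f"{dead} dead DataNodes")
    under = bean.get("UnderReplicatedBlocks", 0)
    if under > max_under_replicated:
        problems.append(f"{under} under-replicated blocks")
    return problems


# Offline example with a hand-written bean (values are made up).
sample = {"FSState": "Operational", "NumDeadDataNodes": 3,
          "UnderReplicatedBlocks": 12}
print(namenode_problems(sample))  # ['3 dead DataNodes']

# Live use would look like this (hypothetical host; 50070 assumed as NN web port):
# bean = fetch_bean("http://namenode.example.com:50070", FS_STATE_QUERY)
```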

MR Workload Alerting
» Monitor the MR workload and alert
– An in-house tool built on the “houdah” Ruby gem does the monitoring
– Long-running jobs, jobs with many map tasks, blacklisted TTs with high failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White-elephant or hraven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
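The workload checks above boil down to comparing per-job figures against limits. The Python sketch below shows that idea on hand-written job records. It does not use the houdah gem (which is Ruby) or any JobTracker API; the `Job` record, the limits, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    user: str
    runtime_minutes: float
    map_tasks: int


def flag_jobs(jobs: list[Job], max_runtime_minutes: float = 240,
              max_map_tasks: int = 50_000) -> dict[str, list[str]]:
    """Return {job_id: [reasons]} for jobs that breach either limit."""
    flagged = {}
    for job in jobs:
        reasons = []
        if job.runtime_minutes > max_runtime_minutes:
            reasons.append("long running")
        if job.map_tasks > max_map_tasks:
            reasons.append("too many map tasks")
        if reasons:
            flagged[job.job_id] = reasons
    return flagged


# Made-up sample data.
jobs = [
    Job("job_001", "etl", runtime_minutes=35, map_tasks=1_200),
    Job("job_002", "adhoc", runtime_minutes=600, map_tasks=80_000),
    Job("job_003", "modeling", runtime_minutes=300, map_tasks=4_000),
]
print(flag_jobs(jobs))
```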

Modeling
OPS
ETL
Ad-hoc
Multi Tenancy

Multi Tenancy
» Create separate queues
» Enable ACLs for queues
» Limit the number of jobs per user and per queue
» Set pre-emption timeouts based on priority
» Set weights based on priority
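Two of these controls, weights and job limits, can be illustrated with a toy model in Python. This is not scheduler code or configuration. The queue names come from the Multi Tenancy slide and the 30K slot total from the cluster figures, while the weights and limits are invented for the example.

```python
def fair_shares(weights: dict[str, float], total_slots: int) -> dict[str, int]:
    """Split slots across queues in proportion to their weights."""
    total_weight = sum(weights.values())
    return {queue: int(total_slots * weight / total_weight)
            for queue, weight in weights.items()}


def admit(queue_jobs: list[str], user: str,
          max_per_user: int, max_per_queue: int) -> bool:
    """Decide whether `user` may start another job in a queue.

    `queue_jobs` holds the owner of each job already running in the queue.
    """
    if len(queue_jobs) >= max_per_queue:
        return False
    return queue_jobs.count(user) < max_per_user


# Hypothetical weights: higher-priority queues get a larger share.
weights = {"Modeling": 4, "ETL": 3, "OPS": 2, "Ad-hoc": 1}
print(fair_shares(weights, total_slots=30_000))

running = ["alice", "alice", "bob"]
print(admit(running, "alice", max_per_user=2, max_per_queue=10))  # False
print(admit(running, "bob", max_per_user=2, max_per_queue=10))    # True
```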

No Scheduler is perfect unless you understand and tune it properly
Scheduling

BCP
» BCP: Business Continuity Plan
» Near real time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster
[Diagram: US, EU, and HK serving clusters; INW data cluster and LSV data cluster; processed data; Amazon backup. Workloads shown: modeling, reporting, user queries, research, ad-hoc queries]

YARN
» Resource Manager
- Global resource scheduler
- Hierarchical queues
- Application management
» Node Manager
- Per-machine agent
- Manages life cycle of container
- Container resource monitoring
» Application Master
- Per-application
- Manages application scheduling and task execution

YARN at Rocket Fuel
» YARN in production
» 1000+ nodes
» 51 TB RAM, 123K disks, 123K cores
» Primary use case: MapReduce
» HBase on YARN
» Tez, Spark, and Storm are in the race

We Are Hiring!

THANKS
kishore@rocketfuel.com
