Proprietary & Confidential. Copyright © 2014.
Hadoop Operations @ Rocket Fuel
We’re Hiring
rocketfuel.com/careers
Kishore Kumar Yellamraju
The Web Is Monetized By Advertising
Delivery Methods
»Display
»Video
»Mobile
»Social
Overview
[Diagram: the ad-serving loop between Browser, Publishers, Ad Exchange (Some Exchange Partners), and the Rocket Fuel Platform. Steps: 1. Page Request; 2. Ad Request; 3. Bid Request; 4. Bid & Ad; 5. Rocketfuel Winning Ad; 6. Ad Served. Rocket Fuel Platform components: Real-time Bidder, Automated Decisions, Models, Data Store. Other labels: Advertisers, Data Partners, User Segments, User Engagements, Ads & Budget, Model Scores, Events, Refresh learning, Optimize.]
Real Time Bidding and Serving
Always buying the best impressions & serving the best ad
[Figure: dollar amounts (for example $2.11, $1.26, $2.78) shown against the factors User, Brand Affinity, Time of Day, Geo/Weather, and Site/Page.]
Real Time Bidding and Serving
Goals: Leads & sales; Coupon downloads; Brand awareness
Impression Scorecard (three shown):

Factor           Scorecard 1   Scorecard 2   Scorecard 3
Demo             +100          +40           +10
Brand Affinity   +40           -70           -10
Time of Day      -20           -20           -20
Geo/Weather      +20           +10           +20
Site/Page        +15           +15           +10
Ad Position      +10           -25           -35
In-Market        +40           -40           -25
Behavior         +35           -18           +10
Response         +9.7%         +0.7%         +1.4%
Overview
[Diagram: the ad-serving loop, steps 1-6, repeated from the earlier Overview slide.]
Throughput
Requests per day:
» Facebook likes: 5 B
» Searches on Google: 6 B
» Bid Requests Considered by Rocketfuel: 45 B
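As a rough back-of-envelope check (not from the slides), the 45 B requests per day figure can be turned into an average per-second rate. The sketch below assumes traffic is spread evenly over 24 hours, which real traffic is not, so peak rates would be higher than this average.

```python
# Back-of-envelope: average request rate implied by 45 B bid requests per day.
# Assumption: traffic is uniform over 24 hours (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(requests_per_day: float) -> float:
    """Return the mean requests per second for a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


bid_requests_per_day = 45e9  # figure from the Throughput slide
rate = average_rate_per_second(bid_requests_per_day)
print(f"{rate:,.0f} requests/second on average")
```

Under that uniform-traffic assumption the average works out to a little over half a million bid requests per second.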
Latency
Time (ms):
» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird's wing: 20
» Look up in Blackbird: 2
Architecture and Scale
»Datacenters
»Scale
»Growth
»Architecture
Data Center Expansion
»abc
Data Center Design
• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities
Hadoop Server: 24 2U servers (8.5kW)
Bidders: 40 2-U Twin 2 servers (17kW)
Rocket Fuel Scale
»34,474 CPU processor cores
–2655 servers
–187.4 Teraflops of computing
»188 Terabytes of memory
–13X the memory of Watson, the IBM computer that played Jeopardy
»42 Petabytes of storage
–106X the data volume of the entire Library of Congress
Hadoop at Rocket Fuel
»1400 servers
»15K Disks
»15K Cores
»90 TB
»30K MR slots
»12K daily MR jobs
Growth
» Servers: 200 to 1400
» Storage: 5 PB to 41 PB (8x)
Data Architecture 3.0
Batch and Real Time Pipelines
[Diagram components: Webservers, Storm, MySQL, Zookeeper]
Hadoop Setup
[Diagram: Active NN and Standby NN, each with a ZKFC; JT; a QJM / ZK quorum; and worker nodes each running DN and TT.]
Hardware specs shown on the slide (three groups):
» 6x2TB Disks, 2x6 core, 196 GB RAM, 2x1G NIC
» 12x3TB Disks, 2x6 core, 64 GB RAM, 10G NIC
» QJM / ZK quorum: same as DN’s, with a dedicated disk for ZK or JN
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN
Maintenance is Not Easy
Automation is key
Puppet + Infradb
Puppet and Infradb
» Automate as much as you can
» Adding a slave node to Hadoop cluster < 120 seconds
» Bringing up a new Hadoop cluster < 500 seconds
» MR slots are automatically determined based on hardware config
Isn’t it cool?
Just define once
Performance Tuning
No issues when the cluster is small. Problems start when it grows.
Performance Tuning
HDFS:
  dfs.namenode.handler.count
  dfs.image.transfer.timeout
  ha.*-timeout.ms
MR:
  mapred.job.tracker.handler.count
  mapred.reduce.parallel.copies
  mapreduce.reduce.shuffle.parallelcopies
  io.sort.mb
  io.sort.factor
  IMP: MAPREDUCE-2026
ZK:
  maxClientCnxns
JVM:
  -XX:+UseConcMarkSweepGC
  -XX:CMSFullGCsBeforeCompaction=1
  -XX:CMSInitiatingOccupancyFraction=60
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN
Monitoring
Wall of Ops
Monitoring
Don’t fly blind, you will crash!
hadoop.namenode.CallQueueLength
hadoop.jobtracker.jvm.memheapusedm
MR Workload Monitoring
Network Monitoring
Don’t blame the network; monitor it instead.
A network mesh can be a mess.
Alerting
Monitoring is not enough; you need better alerting.
Alerts
>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We rely heavily on custom scripts that query /jmx for NN and JT
http://hostname:port/jmx?qry=<query>, with these queries and attributes:
qry=Hadoop:service=NameNode,name=NameNodeInfo
    NameDirStatuses, DeadNodes, NumberOfMissingBlocks
qry=Hadoop:service=NameNode,name=FSNamesystemState
    FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
qry=hadoop:service=JobTracker,name=JobTrackerInfo
    Blacklisted TT’s, #jobs, #slots_used, ThreadCount
qry=java.lang:type=Memory
    Used JVM memory, free JVM memory, etc.
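A custom /jmx check of the kind described above might look roughly like the sketch below. It is not the scripts used at Rocket Fuel; the function names, thresholds, and the "Operational" healthy value for FSState are assumptions for illustration. It relies on the /jmx servlet returning JSON with a top-level "beans" list, and it folds several checks into one summary list so that a single aggregate alert can be raised instead of one alert per metric.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

FS_STATE_BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"


def jmx_url(host: str, port: int, bean: str) -> str:
    """Build the /jmx query URL for one bean name."""
    return f"http://{host}:{port}/jmx?qry={quote(bean, safe=':=,')}"


def parse_bean(payload: str) -> dict:
    """Return the first bean from a /jmx JSON response, or {} if none."""
    beans = json.loads(payload).get("beans", [])
    return beans[0] if beans else {}


def fetch_bean(host: str, port: int, bean: str, timeout: float = 10.0) -> dict:
    """Query a live daemon (needs network access to the NN or JT)."""
    with urlopen(jmx_url(host, port, bean), timeout=timeout) as resp:
        return parse_bean(resp.read().decode("utf-8"))


def namenode_problems(state: dict, max_dead: int = 0,
                      max_under_replicated: int = 1000) -> list:
    """Collect all problems so one summary alert can be sent.

    Thresholds are hypothetical and would be tuned per cluster.
    """
    problems = []
    if not state:
        return ["no FSNamesystemState bean returned"]
    if state.get("FSState") != "Operational":  # assumed healthy value
        problems.append(f"FSState is {state.get('FSState')}")
    dead = state.get("NumDeadDataNodes", 0)
    if dead > max_dead:
        problems.append(f"{dead} dead DataNodes (limit {max_dead})")
    under = state.get("UnderReplicatedBlocks", 0)
    if under > max_under_replicated:
        problems.append(f"{under} under-replicated blocks")
    return problems


# Offline example with a canned response instead of a live NameNode.
sample = json.dumps({"beans": [{
    "name": FS_STATE_BEAN, "FSState": "Operational",
    "CapacityRemaining": 123456789, "NumDeadDataNodes": 2,
    "UnderReplicatedBlocks": 5}]})
print(namenode_problems(parse_bean(sample)))
```

Against a real cluster, `fetch_bean("<NN-hostname>", <port>, FS_STATE_BEAN)` would replace the canned sample; the host and port are placeholders to be filled in for the cluster at hand.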
MR Workload Alerting
» Monitor the MR workload and alert
– An in-house tool that uses the “houdah” Ruby gem monitors long-running jobs, jobs with a large number of map tasks, blacklisted TT’s with high failure counts, etc.
» Collect details and auto-restart blacklisted TT’s
» Parse the JT logfile for rogue jobs
» Parse the JT log and collect all job-related info
» White-elephant or hraven could help
» Parse the scheduler HTML page or use the metrics page
http://<JT-hostname>:50030/scheduler?advanced
http://<JT-hostname>:50030/metrics
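The workload checks listed above could be sketched as a simple filter over job records. This is an illustration only: the `JobInfo` record, its fields, and the thresholds are hypothetical, and how the records are collected (JT logs, the metrics page, or a client library) is left out.

```python
from dataclasses import dataclass


@dataclass
class JobInfo:
    """Hypothetical per-job record gathered from the JobTracker."""
    job_id: str
    user: str
    runtime_seconds: int
    map_tasks: int


def find_suspect_jobs(jobs, max_runtime_seconds=6 * 3600, max_map_tasks=50_000):
    """Return (job_id, reasons) for jobs that exceed either threshold."""
    suspects = []
    for job in jobs:
        reasons = []
        if job.runtime_seconds > max_runtime_seconds:
            reasons.append("long running")
        if job.map_tasks > max_map_tasks:
            reasons.append("too many map tasks")
        if reasons:
            suspects.append((job.job_id, reasons))
    return suspects


jobs = [
    JobInfo("job_0001", "etl", runtime_seconds=1200, map_tasks=400),
    JobInfo("job_0002", "adhoc", runtime_seconds=9 * 3600, map_tasks=120_000),
]
for job_id, reasons in find_suspect_jobs(jobs):
    print(job_id, ", ".join(reasons))
```

Returning every reason per job, rather than stopping at the first, keeps the resulting alert to one message per offending job.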
Multi Tenancy
[Diagram: Modeling, OPS, ETL, Ad-hoc]
Multi Tenancy
» Create separate queues
» Enable ACL’s for queues
» Limit the number of jobs per user and per queue
» Set pre-emption timeouts based on priority
» Set weights based on priority
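The queue settings above could be generated rather than hand-edited. The sketch below assumes the MR1 Fair Scheduler allocation-file format, in which queues are called pools; the element names (`pool`, `weight`, `maxRunningJobs`, `minSharePreemptionTimeout`, `userMaxJobsDefault`) are recalled from that format and should be checked against the Hadoop version in use. The pool names and values are hypothetical, and queue ACLs are configured elsewhere and are not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical pools: higher-priority pools get more weight and a shorter
# pre-emption timeout (in seconds).
POOLS = {
    "etl":      {"weight": 3.0, "maxRunningJobs": 50, "minSharePreemptionTimeout": 60},
    "modeling": {"weight": 2.0, "maxRunningJobs": 30, "minSharePreemptionTimeout": 300},
    "adhoc":    {"weight": 1.0, "maxRunningJobs": 10, "minSharePreemptionTimeout": 900},
}


def build_allocations(pools: dict, user_max_jobs_default: int) -> str:
    """Render pool settings as an allocation-file XML string."""
    root = ET.Element("allocations")
    for name, settings in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        for key in ("weight", "maxRunningJobs", "minSharePreemptionTimeout"):
            if key in settings:
                ET.SubElement(pool, key).text = str(settings[key])
    # Default cap on concurrently running jobs per user.
    ET.SubElement(root, "userMaxJobsDefault").text = str(user_max_jobs_default)
    return ET.tostring(root, encoding="unicode")


xml_text = build_allocations(POOLS, user_max_jobs_default=5)
print(xml_text)
```

Keeping the pool definitions in one data structure fits the "define once" approach described for Puppet and Infradb, though how those tools actually render scheduler config is not shown in the slides.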
Scheduling
No scheduler is perfect unless you understand and tune it properly
Operations
» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN
BCP
» BCP: Business Continuity Plan
» Near real time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data
Data BCP Cluster
[Diagram: two data clusters, INW and LSV. Labels: US, EU, and HK Serving Clusters; Modeling; Reporting; User Queries; Research; Ad-hoc Queries; Processed Data; Amazon Backup.]
YARN
» Resource Manager
- Global resource scheduler
- Hierarchical queues
- Application management
» Node Manager
- Per-machine agent
- Manages life cycle of container
- Container resource monitoring
» Application Master
- Per-application
- Manages application scheduling and task execution
YARN at Rocket Fuel
» YARN in production
» 1000+ nodes
» 51TB RAM, 123K disks, 123K cores
» Primary use case: MapReduce
» HBase on YARN
» Tez, Spark, and Storm are in the race
We Are Hiring!
THANKS
kishore@rocketfuel.com

Big data summit

  • 1. Proprietary & Confidential. Copyright © 2014. Hadoop Operations @ Rocket Fuel We’re Hiring rocketfuel.com/careers Kishore Kumar Yellamraju
  • 2. Proprietary & Confidential. Copyright © 2014. The Web Is Monetized By Advertising
  • 3. Proprietary & Confidential. Copyright © 2014. Delivery Methods »Display »Video »Mobile »Social
  • 4. Proprietary & Confidential. Copyright © 2014. 6. Ad Served User Segment s 3. Bid Reques t Overview Publishers 2. Ad Request 1. Page Request 4. Bid & Ad User Engagement s Data Partners Advertisers Browser Some Exchange Partners Ad Exchange Optimize Rocket Fuel Platform Real-time Bidder Automated Decisions Model s Refresh learning Data Store Ads & Budget Model Scores Events 5. Rocketfuel Winning Ad
  • 5. Proprietary & Confidential. Copyright © 2014. 1.25 $2.11 $1.26 $2.78 $1.256 $1.809 $2.42 1.25 $2.11 $1.26 $2.78 $0.586 $2.009 1.25 $2.11 $1.26 $2.78 $1.56 $0.00 [ + ][ + ] Site/PageGeo/WeatherTime of DayBrand AffinityUser Always buying the best impressions & serving the best ad Real Time Bidding and Serving
  • 6. Proprietary & Confidential. Copyright © 2014. Goal: Leads & sales Goal: Coupon downloads Goal: Brand awareness Site/PageGeo/WeatherTime of DayBrand AffinityDemo Impression Scorecard Demo Brand Affinity Time of Day Geo/Weather Site/Page Ad Position In-market Behavior Response Impression Scorecard Demo Brand Affinity Time of Day Geo/Weather Site/Page Ad Position In-Market Behavior Response X Impression Scorecard Demo Brand Affinity Time of Day Geo/Weather Site/Page Ad Position In-Market Behavior Response +100 +40 -20 +20 +15 +10 +40 +35 +9.7% +40 -70 -20 +10 +15 -25 -40 -18 +0.7% +10 -10 -20 +20 +10 -35 -25 +10 +1.4% Real Time Bidding and Serving X
  • 7. Proprietary & Confidential. Copyright © 2014. 6. Ad Served User Segment s 3. Bid Reques t Overview Publishers 2. Ad Request 1. Page Request 4. Bid & Ad User Engagement s Data Partners Advertisers Browser Some Exchange Partners Ad Exchange Optimize Rocket Fuel Platform Real-time Bidder Automated Decisions Model s Refresh learning Data Store Ads & Budget Model Scores Events 5. Rocketfuel Winning Ad
  • 8. Proprietary & Confidential. Copyright © 2014. 5 B 6 B 45 B Facebook likes Searches on Google Bid Requests Considered by Rocketfuel Requests per day Throughput
  • 9. Proprietary & Confidential. Copyright © 2014. 400 100 20 2 Blink of an eye SF to Tokyo network round trip One beat of a hummindbird's wing Look up in Blackbird Time (ms) Latency
  • 10. Proprietary & Confidential. Copyright © 2014. Architecture and Scale »Datacenters »Scale »Growth »Architecture
  • 11. Proprietary & Confidential. Copyright © 2014. Data Center Expansion »abc
  • 12. Proprietary & Confidential. Copyright © 2014. Data Center Design • Racks custom built at Rocket Fuel • Leased space/bandwidth in colocation facilities Hadoop Server 24 2U servers (8.5kW) Bidders 40 2-U Twin 2 servers (17kW)
  • 13. Proprietary & Confidential. Copyright © 2014. Rocket Fuel Scale »34,474 CPU processor cores –2655 servers –187.4 Teraflops of computing »188 Terabytes of memory –13X the memory of IBM computer Watson that played Jeopardy »42PB Petabytes of storage –106X the data volume of the entire Library of Congress
  • 14. Proprietary & Confidential. Copyright © 2014. Hadoop at Rocket Fuel »1400 servers »15K Disks »15K Cores »90 TB »30K MR slots »12K daily MR jobs
  • 15. Proprietary & Confidential. Copyright © 2014. 200 Servers 1400 Servers 5 PB 41 PB 8x Growth
  • 16. Proprietary & Confidential. Copyright © 2014. Data Architecture 3.0
  • 17. Proprietary & Confidential. Copyright © 2014. Batch and Real Time Pipelines Webservers STORM MySq l Zookeeper
  • 18. Proprietary & Confidential. Copyright © 2014. Hadoop Setup QJM ZK Quorum » 6x2TB Disks » 2x6 core » 196 GB RAM » 2x1G NIC » 12x3TB Disks » 2x6 core » 64 GB RAM » 10G NIC » same as DN’s » Dedicated disk to ZK or JN JT Standby NN ZKFCZKFC Active NN DN TT DN TT DN TT DN TT DN TT DN TT
  • 19. Proprietary & Confidential. Copyright © 2014. Operations » Maintenance » Performance Tuning » Monitoring » BCP » YARN
  • 20. Proprietary & Confidential. Copyright © 2014. Puppet + Infradb Automation is key Maintenance is Not Easy
  • 21. Proprietary & Confidential. Copyright © 2014. Puppet and Infradb » Automate as much as you can » Adding a slave node to Hadoop cluster < 120 seconds » Bringing up a new Hadoop cluster < 500 seconds » MR slots are automatically determined based on hardware config Isn’t it cool ? Just define once
  • 22. Proprietary & Confidential. Copyright © 2014. No issues when cluster is small Problems starts when it grows Performance Tuning
  • 23. Proprietary & Confidential. Copyright © 2014. dfs.namenode.handler.count dfs.image.transfer.timeout mapred.reduce.parallel.copies mapred.job.tracker.handler.count io.sort.mbio.sort.factor maxClientCnxns ZK : HDFS : MR : IMP : MAPREDUCE-2026 -XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSInitiatingOccupancyFraction=60 ha.*-timeout.ms JVM: Performance Tuning mapreduce.reduce.shuffle.parallelcopies
  • 24. Proprietary & Confidential. Copyright © 2014. Operations » Maintenance » Performance Tuning » Monitoring » BCP » YARN
  • 25. Proprietary & Confidential. Copyright © 2014. Monitoring Wall of Ops
  • 26. Proprietary & Confidential. Copyright © 2014. Monitoring hadoop.namenode.CallQueueLength hadoop.jobtracker.jvm.memheapusedm Don’t fly blind, you will crash!
  • 27. Proprietary & Confidential. Copyright © 2014. MR Workload Monitoring
  • 28. Proprietary & Confidential. Copyright © 2014. Network Monitoring Don’t blame network, instead monitor it Network Mesh can be mess
  • 29. Proprietary & Confidential. Copyright © 2014. Alerting Monitoring is not enough, need better Alerting
  • 30. Proprietary & Confidential. Copyright © 2014. Alerts http://hostname:port/jmx? qry=Hadoop:service=NameNode,name=NameNodeInfo >> Checking whether NN and JT are up is a no brainer >> Reduce alert noise by having summary/aggregate alerts >> We heavily rely on custom scripts that query /jmx for NN and JT qry=hadoop:service=JobTracker,name=JobTrackerInfo NameDirStatuses, DeadNodes, NumberOfMissingBlocks , qry=Hadoop:service=NameNode,name=FSNamesystemState FSState , CapacityRemaining , NumDeadDataNodes , UnderReplicatedBlocks Blacklisted TT’s , #jobs , #slots_used , ThreadCount , qry=java.lang:type=Memory" Used jvm , free jvm etc
  • 31. Proprietary & Confidential. Copyright © 2014. MR Workload Alerting » Monitoring MR workload and alert – In-house tool that use “houdah” ruby gem monitors – Long running jobs , jobs with more map tasks , blacklisted TT’s with more failure counts etc… » Collect details and auto-restart blacklisted TT’s » Parse the JT logfile for rouge jobs. » Parse the JT log and collects all Job related info » White-elephant or hraven could help » Parse the scheduler html page or use metrics page http://<JT-hostname>:50030/scheduler?advanced http://<JT-hostname>:50030/metrics
  • 32. Proprietary & Confidential. Copyright © 2014. Modeling OPS ETL Ad-hoc Multi Tenancy
  • 33. Proprietary & Confidential. Copyright © 2014. Multi Tenancy » create separate Queues » Enable ACL’s for queues » limit no. of jobs per user and per queue » set pre-emption timeouts based on priority » set weight based on priority
  • 34. Proprietary & Confidential. Copyright © 2014. No Scheduler is perfect unless you understand and tune it properly Scheduling
  • 35. Proprietary & Confidential. Copyright © 2014. Operations » Maintenance » Performance Tuning » Monitoring » BCP » YARN
  • 36. Proprietary & Confidential. Copyright © 2014. BCP » BCP  Business Continuity Plan » Near real time reporting over 15+ TB of daily data » Freshness of models trained over petabytes of data
  • 37. Proprietary & Confidential. Copyright © 2014. Data BCP Cluster INW Data Cluster US Serving Clusters EU Serving Clusters HK Serving Clusters Modeling Repor ting User Queries Amazon Backup LSV Data Cluster US/EU/HK Serving Clusters Research Ad-hoc Queries Processed Data
  • 38. Proprietary & Confidential. Copyright © 2014. YARN » Resource Manager - Global resource scheduler - Hierarchical queues - Application management » Node Manager - Per-machine agent - Manages life cycle of container - Container resource monitoring » Application Master - Per-application - Manages application scheduling and task execution
  • 39. Proprietary & Confidential. Copyright © 2014. YARN at Rocket FueI » Yarn in production » 1000+ nodes » 51TB RAM , 123K disks , 123K cores » Primary use case Map-Reduce » HBase on Yarn » Tez , Spark , Storm are in race
  • 40. We Are Hiring!
  • 41. THANKS kishore@rocketfuel.com
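
The checks listed on the MR Workload Alerting slide (31) can be sketched roughly as follows. This is a minimal illustration in Python, not the in-house tool from the talk (which uses the “houdah” Ruby gem). The `JobInfo` record, both function names, and every threshold are hypothetical, and the job data is assumed to have already been collected from the JobTracker log or metrics page.

```python
from dataclasses import dataclass


@dataclass
class JobInfo:
    """Hypothetical record of one MapReduce job, filled in by a log or metrics parser."""
    job_id: str
    user: str
    runtime_secs: int
    map_tasks: int


def flag_jobs(jobs, max_runtime_secs=6 * 3600, max_map_tasks=50000):
    """Return {job_id: [reasons]} for jobs that exceed either assumed threshold."""
    flagged = {}
    for job in jobs:
        reasons = []
        if job.runtime_secs > max_runtime_secs:
            reasons.append("long-running")
        if job.map_tasks > max_map_tasks:
            reasons.append("too-many-maps")
        if reasons:
            flagged[job.job_id] = reasons
    return flagged


def trackers_to_restart(blacklist_failures, min_failures=4):
    """Pick blacklisted TaskTrackers whose failure count reaches an assumed limit.

    blacklist_failures maps a TaskTracker hostname to its failure count.
    """
    return sorted(host for host, count in blacklist_failures.items()
                  if count >= min_failures)
```

A scheduled task could run both functions against fresh data and hand the results to whatever alerting or restart mechanism is in place; that wiring is left out because the slides do not describe it.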

Editor's Notes

  1. Most people have probably used IMDb, but they probably won’t use it if they have to pay.
  2. What we do: Display, video, mobile, social
  3. Steps 1-6 of the loop take 200 ms; steps 3-4 take 100 ms for RF. We do this 45B times a day.
  4. Real-time auction: selecting the right ad for each auction.
  5. Automatically learning from every response and getting better. Nobody else is doing this as fast, precisely, and consistently for our customers.
  6. Steps 1-6 of the loop take 200 ms; steps 3-4 take 100 ms for RF. We do this 45B times a day.
  7. We are now in 8 data centers in the world
  8. We have optimized the design of our data centers as well. We custom-design our racks and get servers assembled, racked, and tested in a California facility, then ship them to the data center. We do this not just for US data centers but also for data centers in Europe or Asia. Each rack can weigh 1,500 lb or more, and many racks are sent by air for the initial install. Now, let’s look at the two kinds of racks shown above. Hadoop servers (the full racks): data (Hadoop) servers are bigger, as they have 12 x 3 TB drives, and 20 servers fill the whole rack. Bidders: bidders have a lot of cores but take less space because they have only two 2.5” drives each. 40 servers fill up half the rack, but we run out of switch ports. And this is 5% of Rocket Fuel.
  9. Just say “We have amazing scale” – let the numbers speak for themselves.
  10. Managing a Hadoop cluster is not easy. Start early. We are heavy users of Puppet. Infradb is similar to Puppet Hiera, but infradb was written in-house 4 years ago. -> Puppet and infradb are tightly integrated. We use Puppet and infradb to make maintenance easy. Infradb helps us populate Hadoop property values based on the hardware config we have. For example, our fair-share slot distribution is automatically handled by infradb whenever we add new nodes.
  11. -> Here is an example: we define the formula that decides the number of MR slots per server based on memory, CPU, and disks. -> We always want homogeneous hardware for easy maintenance and planning, but that is impossible since “need changes with time”. -> Automation like this lets you not worry about having heterogeneous servers. -> Not just configuration: we use infradb to define alerts once, and all newly added hosts and clusters are automatically monitored by our Nagios.
  12. A typical Hadoop problem: you start with a small cluster and want to grow. Hadoop’s default properties work well on small clusters; the problems start when your cluster grows, and they are bigger on large clusters. Aren’t we supposed to get better performance after adding nodes?
  13. -> Too many properties to change. -> Be careful when making any tuning changes. -> Have metrics to compare before and after changes. -> MAPREDUCE-2026: JobTracker.getJobCounter() will lock the JobTracker and call JobInProgress.getCounters(). JobInProgress.getCounters() can be very expensive because it aggregates all the task counters. We found from the JobTracker jstacks that this method is one of the bottlenecks of JobTracker performance.
  14. -> Any HDFS-heavy job can impact your HDFS performance; you will not realize it unless you monitor the metrics. -> Don’t let any one engineer impact your cluster.
  15. -> Monitoring your applications is not enough when you are running at scale in multiple data centers across the world. -> We should monitor the network mesh.
  16. -> Find bad queries immediately and kill them before they impact the cluster. -> Don’t lose capacity to mass TaskTracker blacklisting caused by a single job. -> Long-running jobs should be killed. There is no point in letting them run.
  17. -> Understand the workload on your cluster to better tune the scheduler properties. -> Whenever you add more nodes, the new slots should be automatically distributed to the different queues. -> No scheduler is perfect unless you understand and tune it. -> Have ACLs in place; don’t let any one engineer impact your MR workload. -> Have proper accounting for teams that use more MR capacity.
  18. How we operated for the initial few years. Recently added Data BCP (Business Continuity Plan). Latency-critical and important data goes to both places; other data goes after processing. Make use of the BCP cluster to do meaningful things until disaster happens.
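
Notes 10, 11, and 17 describe deriving the number of MR slots per server from memory, CPU, and disks, and redistributing slots to queues whenever nodes are added. The talk does not give the actual formula, so the sketch below is only an illustration: the function names, the per-slot memory, the reserved cores, the slots-per-disk ratio, and the queue weights are all assumptions, not Rocket Fuel's values or infradb's logic.

```python
def slots_per_server(mem_gb, cores, disks, mem_per_slot_gb=4, reserved_mem_gb=8):
    """Hypothetical formula: the slot count is capped by the scarcest resource."""
    by_mem = max(0, (mem_gb - reserved_mem_gb) // mem_per_slot_gb)
    by_cpu = max(0, cores - 2)  # assumed: keep two cores free for the node daemons
    by_disk = disks * 2         # assumed: at most two slots per disk
    return int(min(by_mem, by_cpu, by_disk))


def distribute_slots(total_slots, queue_weights):
    """Split total_slots across queues in proportion to weight.

    Uses the largest-remainder method so the allocations sum to total_slots.
    """
    total_weight = sum(queue_weights.values())
    if total_weight <= 0:
        raise ValueError("queue weights must sum to a positive number")
    shares = {q: total_slots * w / total_weight for q, w in queue_weights.items()}
    alloc = {q: int(share) for q, share in shares.items()}
    leftover = total_slots - sum(alloc.values())
    # Hand the remaining slots to the queues with the largest fractional share.
    by_remainder = sorted(shares, key=lambda q: shares[q] - alloc[q], reverse=True)
    for q in by_remainder[:leftover]:
        alloc[q] += 1
    return alloc
```

With heterogeneous hardware, summing `slots_per_server` over every node gives the cluster total, and `distribute_slots` can be rerun with the same weights each time nodes are added, which is the kind of automation the notes argue for.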