How Klout migrated from CDH3 to CDH4 …and survived to tell about it
Ian Kallen
A short talk on Klout's journey from CDH3 to CDH4.
1.
How Klout migrated from CDH3 to CDH4 …and survived to tell about it
Large Scale Production Engineering Meetup
September 19, 2013
Ian Kallen
Lead Engineer, Klout
© 2013 Klout
2.
About Klout
● recognizing & rewarding online influence
● major social network activity signals
● Facebook, Twitter, Google+, LinkedIn, 4sq
● billions of data points consumed & processed
● pipelines update scores & topics
● hive & oozie driven jobs & workflow
3.
By The Numbers
● 2 TB data intake, 200 TB processed daily
● jobs clusters x 2 (dev/staging + production)
● hbase x 6 (dev/staging + production x 5)
● hbase: 350M req/day, 17K req/sec peak
● jobs, hbase & zookeeper total =~ 350 hosts
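To put the HBase and data-volume figures in proportion, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes the 350M daily requests are averaged over a full 24 hours and that "17K" means 17,000. Under those assumptions the average load comes to roughly 4,050 req/sec, so the stated peak is about 4.2 times the average, and each TB taken in corresponds to about 100 TB processed.

```python
# Back-of-the-envelope figures derived from the "By The Numbers" slide.
# Assumption: requests are averaged over a full 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = 350_000_000    # "350M req/day" from the slide
peak_requests_per_sec = 17_000    # "17K req/sec peak" from the slide
intake_tb_per_day = 2             # "2 TB data intake"
processed_tb_per_day = 200        # "200 TB processed daily"

# Average request rate implied by the daily total.
average_requests_per_sec = requests_per_day / SECONDS_PER_DAY

# How far the stated peak sits above that average.
peak_to_average = peak_requests_per_sec / average_requests_per_sec

# TB processed per TB of new data taken in.
processing_ratio = processed_tb_per_day / intake_tb_per_day

print(f"average HBase load: {average_requests_per_sec:,.0f} req/sec")
print(f"peak / average:     {peak_to_average:.1f}x")
print(f"processed / intake: {processing_ratio:.0f}x")
```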
4.
Motivations
● pipelines unstable, slow on cdh3 (v0.20.2)
● HBase performance predictability
● old hive version limited pipeline developers
● cdh3 EOL’d 6/2013
● cdh4 (v2.0.x) supports NN H/A, impala
● more shiny things
5.
The Environment
● data center hosted
● I/O subsystems are under our control
● network latencies are under our control
● FAQ: Why not AWS?
● saved millions of dollars last year
● that's a lot of beer money.
● elasticity need is low, but...
6.
Cloud Envy
● this is super easy on AWS
● bring up a replacement cluster
● double-write or migrate data to replacement
● tear down old cluster
● have a celebratory drink
● if you have any beer money left
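The replacement-cluster approach listed on this slide (the one Klout could not use) can be sketched roughly as follows. This is a hypothetical illustration only, not anything from the talk: plain Python dicts stand in for the old and replacement clusters, and `DualWriteStore` and `backfill` are made-up names.

```python
class DualWriteStore:
    """Hypothetical wrapper: every write goes to both the old and the
    replacement store, while reads keep coming from the old one."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def put(self, key, value):
        # Double-write so the replacement never falls behind on new data.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        # The old store stays authoritative until cutover.
        return self.old[key]


def backfill(old, new):
    """Copy data that predates the double-write period; returns rows copied."""
    copied = 0
    for key, value in old.items():
        if key not in new:
            new[key] = value
            copied += 1
    return copied


# Dicts stand in for the two clusters in this sketch.
old_cluster = {"user:1": 42}
new_cluster = {}

store = DualWriteStore(old_cluster, new_cluster)
store.put("user:2", 57)                      # lands in both stores
copied = backfill(old_cluster, new_cluster)  # migrates pre-existing "user:1"

# Once both stores agree, the old cluster can be torn down.
in_sync = old_cluster == new_cluster
print(copied, in_sync)
```

The point of the sketch is the ordering: start double-writing first, backfill older data second, and tear down the old side only after the two agree.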
7.
Ops Infra
● nagios, pager duty for monitoring
● monit for process watchdogging
● jmx+, graphite, gdash+ for metrics
● ubuntu boot images for provisioning
● puppet for configuration management
● … no Cloudera Manager
8.
Making Plans
● no replacement infra to migrate to
  ○ so upgrades must be done in place
● Cloudera prefers Cloudera Manager
  ○ so we were on our own to devise a plan
● Cloudera helped vet our plan (thanks!)
● confidence building on dev/staging clusters
● lots of rehearsals on VMs, bug reports
9.
Execution
● detailed checklists, kanban board
● small test clusters, the dev clusters
● planned SLA miss for prod cluster upgrade
● lined up phone consult availability w/Cloudera
  ○ we needed it about 10 hours into prod jobs cluster
● nobody died
10.
Aftermath
● jobs run faster (speculative execution?)
● pipelines are faster
● metrics exposed are improved
● HBase clusters lose block locality in transit
  ○ fixable
● no animals were harmed in this production
11.
Retrospect
● we had many post-mortems along the way
● lots of engineering time & attention
● sweating the details paid off
● mostly because we’re “power users” of hive
● lessons learned:
  ○ re-align clusters
  ○ improve use of vendor tools where possible
    ■ e.g. Cloudera Manager
12.
Onward
● dev/staging + prod clusters x 2
● better use of HDFS paths & job scheduling
● consolidating zookeeper ensembles
● implementing NameNode H/A
● evaluating Cloudera Manager
● evaluating Impala (maybe)
13.
Gratuitous Recruiting Slide
Klout is hiring awesome people passionate about optimizing for innovation & stability, crunching big data & robust systems
If you are a great Hadoop DevOps Engineer
Join Us!
ian@klout.com
Thanks!