Hadoop Applications at Facebook


Jeff Hammerbacher
Manager, Data
May 28 - 29, 2008
Initial Hadoop Deployment
▪   Tested in mid-2006: not great performance, small community
▪   Already had Cheetah and another Hadoop-like project underway
▪   Strong resistance to Java
▪   Early adopters: Yahoo!, Powerset, Quantcast, Last.fm
▪   First serious cluster: spring 2007
    ▪   Pulled sixty web server boxes and put 3 x 500 GB SATA disks in the back
    ▪   Loaded two separate log files: clickstream and activity logs
    ▪   Clickstream was nearly 600 GB per day, activity logs around 200 GB
    ▪   Lots of difficulties just getting data into the system
    ▪   All sorts of fun learning to operate the file system
Initial Hadoop Applications
Hadoop Streaming
▪   Almost all applications at Facebook use Hadoop Streaming
▪   Mapper and Reducer take inputs from a pipe and write outputs to a pipe
▪   Facebook users write in Python, PHP, C++ (though Hadoop Pipes would be better for C++)
▪   Allows for library reuse, faster development
▪   Eats way too much CPU
▪   More info: http://hadoop.apache.org/core/docs/r0.17.0/streaming.html
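The pipe contract above can be sketched as a word count. This is a minimal illustration, not code from the talk: it assumes tab-separated key/value lines (the Hadoop Streaming default) and reducer input already sorted by key. The names map_lines, reduce_lines and STAGES are hypothetical.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word read from the input pipe."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Assumes the lines arrive sorted by key, as the framework delivers them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


# Hypothetical convention: run the script with "map" or "reduce" as its argument.
STAGES = {"map": map_lines, "reduce": reduce_lines}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STAGES:
    for out in STAGES[sys.argv[1]](sys.stdin):
        print(out)
```

With Hadoop Streaming, a script like this would be supplied through the -mapper and -reducer options. Every record crosses a process boundary as text, which is one plausible source of the CPU overhead noted above.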
Unstructured text analysis
▪   Intern asked to understand brand sentiment and influence
▪   First began by building an online language classifier for wall posts
▪   Ported application to Hadoop for offline processing
▪   Many tools for supporting his project had to be built
    ▪   Understanding serialization format of wall post logs
    ▪   Common data operations: project, filter, join, group by
    ▪   Developed using Hadoop streaming for rapid prototyping in Python
    ▪   Scheduling regular processing and recovering from failures
    ▪   Making it easy to regularly load new data
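The four common data operations named above (project, filter, join, group by) can be sketched over in-memory records. This is only an illustration of what each operation does: the dict-based record layout and the field names user_id, lang and country are assumptions, and a Hadoop job would apply the same logic to streamed records rather than to lists.

```python
from collections import defaultdict


def project(rows, fields):
    """Keep only the named fields of each record."""
    return [{f: row[f] for f in fields} for row in rows]


def filter_rows(rows, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Inner join: index the right side by key, then merge matching records."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_by(rows, key, aggregate):
    """Bucket records by key and apply aggregate to each bucket."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: aggregate(v) for k, v in groups.items()}


# Hypothetical sample data: wall posts tagged with a language, plus user records.
posts = [
    {"user_id": 1, "lang": "en"},
    {"user_id": 2, "lang": "fr"},
    {"user_id": 1, "lang": "en"},
]
users = [{"user_id": 1, "country": "IE"}, {"user_id": 2, "country": "FR"}]

english = filter_rows(posts, lambda row: row["lang"] == "en")
posts_per_country = group_by(join(english, users, "user_id"), "country", len)
```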
Lexicon
Lexicon: Future Directions
▪   Further segmentation and visualization of term intensities
    ▪   Age
    ▪   Gender
    ▪   Geography
▪   TF-IDF
▪   Topic modeling
▪   Sentiment analysis
▪   Augment with data sources from around the internet
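TF-IDF appears above only as a future direction, with no weighting scheme given. The sketch below assumes one common variant: term frequency is a term's count divided by the document length, and inverse document frequency is the natural log of the number of documents over the number of documents containing the term. The function name tf_idf and the sample posts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenised document.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Hypothetical wall posts, already split into tokens.
posts = [["happy", "birthday"], ["happy", "friday"]]
weights = tf_idf(posts)
```

A term that appears in every document, such as "happy" here, scores zero, so the terms that distinguish one post from the rest carry the weight.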
Ensemble Learning
▪   Build a lot of Decision Trees and average them
    ▪   Random Forests are a combination of tree predictors such that each
        tree depends on the values of a random vector sampled independently
        and with the same distribution for all trees in the forest
    ▪   Can be used for regression or classification
    ▪   See “Random Forests” by Leo Breiman
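The "build many trees and average them" idea can be sketched as follows. This is a deliberately simplified illustration, not Breiman's full algorithm and not Facebook's implementation: depth-one regression trees (stumps) stand in for full decision trees, and each tree's random vector is a bootstrap sample of the rows plus one randomly chosen feature. The names fit_stump, fit_forest and predict are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a depth-one regression tree on one feature by minimising squared error."""
    best = None
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
        right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
        l_mean, r_mean = mean(left), mean(right)
        error = sum((t - l_mean) ** 2 for t in left)
        error += sum((t - r_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, l_mean, r_mean)
    if best is None:
        # The feature is constant in this sample: always predict the sample mean.
        m = mean(targets)
        return feature, float("inf"), m, m
    return feature, best[1], best[2], best[3]


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_targets, feature))
    return forest


def predict(forest, row):
    """Average the predictions of all trees (regression)."""
    return mean(l if row[f] <= thr else r for f, thr, l, r in forest)


# Hypothetical data: the target steps from 0 to 10 along the first feature.
rows = [[x, 0] for x in range(10)]
targets = [0.0] * 5 + [10.0] * 5
forest = fit_forest(rows, targets, n_trees=30, seed=1)
```

Each tree is trained independently of the others, so the training loop could in principle be spread across parallel tasks, with the averaging done at prediction time.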
More Hadoop Applications
Insights
▪   Monitor performance of your Facebook Ad, Page, Application
▪   Regular aggregation of high volumes of log file data
▪   First hourly pipelines
▪   Publish data back to a MySQL tier
▪   System currently only running partially on Hadoop
Platform Application Reputation Scoring
▪   Users complaining about being spammed by Platform applications
▪   Now, every Platform Application has a set of quotas
    ▪   Notifications
    ▪   News Feed story insertion
    ▪   Invitations
    ▪   Emails
▪   Quotas determined by calculating a “reputation score” for the
    application
Recommendation Engines and Affinity Scores
▪   People You May Know (PYMK)
▪   Other application areas
    ▪   Pages
    ▪   Applications
    ▪   News Feed
    ▪   Search
    ▪   Ads
    ▪   Chat
Miscellaneous
▪   Experimentation Platform back end
    ▪   A/B Testing
    ▪   Champion/Challenger Testing
▪   Lots of internal analyses
    ▪   Export smaller data sets to R
▪   Ad targeting optimization
▪   Search index building
▪   Load testing for new storage systems
▪   Language prediction for translation targeting
(c) 2008 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
