Hadoop at Meebo
Lessons learned in the real world
Vikram Oberoi
August 2010
Hadoop Day, Seattle
About me
- SDE Intern at Amazon, '07: R&D on item-to-item similarities
- Data Engineer Intern at Meebo, '08: built an A/B testing system
- CS at Stanford, '09: senior project on Ext3 and XFS under Hadoop MapReduce workloads
- Data Engineer at Meebo, '09–present: data infrastructure, analytics
About Meebo
- Products: browser-based IM client (www.meebo.com), mobile chat clients, social widgets (the Meebo Bar)
- Company: founded 2005; over 100 employees, 30 engineers
- Engineering: strong engineering culture; contributions to CouchDB, Lounge, and Hadoop components
The Problem
- Hadoop is powerful technology that meets today's demand for big data
- But it's still a young platform, with evolving components and best practices
- Real-world usage brings many challenges: day-to-day operational headaches, missing ecosystem features (e.g., recurring jobs)
- Lots of reinventing the wheel to solve these
Purpose of this talk
- Discuss some real problems we've seen
- Explain our solutions
- Propose best practices so you can avoid these problems
What will I talk about?
- Background: Meebo's data processing needs; Meebo's pre- and post-Hadoop data pipelines
- Lessons: better workflow management (scheduling, reporting, monitoring, etc.), with a look at Azkaban
- Lessons: get wiser about data serialization, with Protocol Buffers (or Avro, or Thrift)
Meebo's Data Processing Needs
What do we use Hadoop for?
- ETL
- Analytics
- Behavioral targeting
- Ad hoc data analysis, research
- The data produced helps power internal/external dashboards and our ad server
What kind of data do we have?
Log data from all our products:
- The Meebo Bar
- Meebo Messenger (www.meebo.com)
- Android/iPhone/mobile web clients
- Rooms
- Meebo Me
- Meebo Notifier
- Firefox extension
How much data?
- 150MM uniques/month from the Meebo Bar
- Around 200 GB of uncompressed daily logs
- We process a subset of our logs
Meebo's Data Pipeline
Pre- and post-Hadoop
A data pipeline in general
1. Data collection
2. Data processing
3. Data storage
4. Workflow management
Our data pipeline, pre-Hadoop
- Collection: Python/shell scripts pull log data from servers
- Processing: Python/shell scripts process the data
- Storage: MySQL, CouchDB, flat files
- Glue: cron and wrapper shell scripts tie everything together
Our data pipeline, post-Hadoop
- Collection: servers push logs to HDFS
- Processing: Pig scripts process the data
- Storage: MySQL, CouchDB, flat files
- Glue: Azkaban, a workflow management system, ties everything together
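As a minimal sketch of the "push logs to HDFS" step, here is one way to upload rotated log files with the standard hadoop fs CLI; the local and HDFS paths are assumptions, not Meebo's actual layout:

import glob
import os
import subprocess

# Upload each rotated, compressed log file to HDFS (paths are hypothetical).
for path in glob.glob("/var/log/products/*.log.gz"):
    dest = "/data/raw/" + os.path.basename(path)
    subprocess.check_call(["hadoop", "fs", "-put", path, dest])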
Our transition to using Hadoop
- Deployed early '09
- Motivation: processing data took ages!
- Catalyst: Hadoop Summit
- Turbulent and time consuming: new tools, new paradigms, pitfalls
- Totally worth it: from 24 hours to process a day's logs to under an hour
- A leap in our ability to analyze our data; the basis for new core product features
Workflow Management
What is workflow management?
What is workflow management?
- It's the glue that binds your data pipeline together: scheduling, monitoring, reporting, etc.
- Most people use scripts and cron, but end up spending too much time managing them
- We need a better way
Workflow management consists of:
- Executing jobs with arbitrarily complex dependency chains
Split up your jobs into discrete chunks with dependencies:
- Allows engineers to work on chunks separately
- Monolithic scripts are no fun
Workflow management consists of:
- Executing jobs with arbitrarily complex dependency chains
- Scheduling recurring jobs to run at a given time
Workflow management consists of:
- Executing jobs with arbitrarily complex dependency chains
- Scheduling recurring jobs to run at a given time
- Monitoring job progress
Workflow management consists of:
- Executing jobs with arbitrarily complex dependency chains
- Scheduling recurring jobs to run at a given time
- Monitoring job progress
- Reporting when jobs fail and how long jobs take
Workflow management consists of:
- Executing jobs with arbitrarily complex dependency chains
- Scheduling recurring jobs to run at a given time
- Monitoring job progress
- Reporting when jobs fail and how long jobs take
- Logging job execution and exposing logs so that engineers can deal with failures swiftly
Workflow management consists of:
- Executing jobs with arbitrarily complex dependency chains
- Scheduling recurring jobs to run at a given time
- Monitoring job progress
- Reporting when jobs fail and how long jobs take
- Logging job execution and exposing logs so that engineers can deal with failures swiftly
- Providing resource management capabilities
(Diagram: five jobs, each doing "Export to DB somewhere," all hitting the same DB at once.)
Don't DoS yourself.
(Diagram: the same five export jobs, now holding permits (2, 1, 0, 0, 0) issued by a permit manager that gates access to the DB, so only a limited number of exports run at once.)
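As a rough illustration of the permit idea (this is not Azkaban's API; the class and the in-process semaphore are assumptions for the sketch):

import threading

class PermitManager:
    """Grants at most max_permits concurrent permits (illustrative only)."""
    def __init__(self, max_permits):
        self._sem = threading.BoundedSemaphore(max_permits)
    def __enter__(self):
        self._sem.acquire()
        return self
    def __exit__(self, *exc):
        self._sem.release()

# Allow at most two exports to hit the DB at once.
db_permits = PermitManager(2)

def export_to_db(rows):
    with db_permits:
        pass  # write rows to the DB here

In practice the jobs run as separate processes, so the permits would live in a shared service rather than an in-process semaphore.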
Don't roll your own scheduler!
- Building a good scheduling framework is hard: a myriad of small requirements and precise bookkeeping, with many edge cases
- Many roll their own, and it's usually inadequate. So much repeated effort!
- Mold an existing framework to your requirements and contribute
Two emerging frameworks
- Oozie: built at Yahoo; open-sourced at Hadoop Summit '10; used in production for [don't know]; packaged by Cloudera
- Azkaban: built at LinkedIn; open-sourced in March '10; used in production for over nine months as of March '10; now in use at Meebo
Azkaban
Azkaban jobs are bundles of configuration and code.
Configuring a job

process_log_data.job:
type=command
command=python process_logs.py
failure.emails=datateam@whereiwork.com

process_logs.py:
import os
import sys
# Do useful things
…
Deploying a job
Step 1: Shove your config and code into a zip archive.
process_log_data.zip → .job, .py
Deploying a job
Step 2: Upload the archive to Azkaban.
Scheduling a job
Done through the Azkaban front-end. (Screenshot of the scheduling UI.)
What about dependencies?
get_users_widgets
A four-job flow: process_widgets.job and process_users.job feed join_users_widgets.job, which feeds export_to_db.job.
get_users_widgets

process_widgets.job:
type=command
command=python process_widgets.py
failure.emails=datateam@whereiwork.com

process_users.job:
type=command
command=python process_users.py
failure.emails=datateam@whereiwork.com
get_users_widgets

join_users_widgets.job:
type=command
command=python join_users_widgets.py
failure.emails=datateam@whereiwork.com
dependencies=process_widgets,process_users

export_to_db.job:
type=command
command=python export_to_db.py
failure.emails=datateam@whereiwork.com
dependencies=join_users_widgets
get_users_widgets
get_users_widgets.zip → four .job files, four .py files
You deploy and schedule a job flow as you would a single job.
Hierarchical configuration

process_widgets.job:
type=command
command=python process_widgets.py
failure.emails=datateam@whereiwork.com

process_users.job:
type=command
command=python process_users.py
failure.emails=datateam@whereiwork.com

This is silly. Can't I specify failure.emails globally?
azkaban-job-dir/
  system.properties
  get_users_widgets/
    process_widgets.job
    process_users.job
    join_users_widgets.job
    export_to_db.job
  some-other-job/
  …
Hierarchical configuration

system.properties:
failure.emails=datateam@whereiwork.com
db.url=foo.whereiwork.com
archive.dir=/var/whereiwork/archive
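With those globals in place, the per-job files can drop the repeated settings. A sketch of what the jobs might reduce to; the ${db.url} property substitution in the second file is an assumption about Azkaban's config handling, not something the deck shows:

process_widgets.job:
type=command
command=python process_widgets.py
# failure.emails now comes from system.properties

export_to_db.job:
type=command
# ${db.url} substitution is an assumption
command=python export_to_db.py ${db.url}
dependencies=join_users_widgets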
What is type=command?
Azkaban supports a few ways to execute jobs:
- command: a Unix command in a separate process
- javaprocess: a wrapper to kick off Java programs
- java: a wrapper to kick off Runnable Java classes; can hook into Azkaban in useful ways
- pig: a wrapper to run Pig scripts through Grunt
What's missing?
Scheduling and executing multiple instances of the same job at the same time.
(Timeline: job FOO's 3:00 PM run takes longer than expected and is still running at 4:00 PM, when the next scheduled instance should start.)
(Timeline: FOO's 3:00 PM run fails and is restarted at 4:25 PM, so the restarted run overlaps the 4:00 PM and 5:00 PM instances.)
What's missing?
Scheduling and executing multiple instances of the same job at the same time.
- AZK-49, AZK-47
- Stay tuned for complete, reviewed patch branches: www.github.com/voberoi/azkaban
What's missing?
Scheduling and executing multiple instances of the same job at the same time.
- AZK-49, AZK-47
- Stay tuned for complete, reviewed patch branches: www.github.com/voberoi/azkaban
Passing arguments between jobs.
- Write a library used by your jobs; put your arguments anywhere you want (see the sketch below)
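A minimal sketch of such a library, assuming the jobs in a flow agree on a shared scratch directory; every name here is hypothetical:

import json
import os

ARGS_DIR = "/var/whereiwork/flow-args"  # hypothetical shared location

def put_args(flow, job, args):
    """Called by an upstream job to publish arguments for a downstream one."""
    os.makedirs(ARGS_DIR, exist_ok=True)
    with open(os.path.join(ARGS_DIR, "%s.%s.json" % (flow, job)), "w") as f:
        json.dump(args, f)

def get_args(flow, job):
    """Called by the downstream job to read its arguments."""
    with open(os.path.join(ARGS_DIR, "%s.%s.json" % (flow, job))) as f:
        return json.load(f)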
What did we get out of it?
- No more monolithic wrapper scripts
- Massively reduced job setup time: it's configuration, not code!
- More code reuse, less hair pulling
- Still porting over jobs; it's time consuming
Data Serialization
What's the problem?
- Serializing data in simple formats (CSV, XML, etc.) is convenient
- Problems arise when the data changes and needs backwards compatibility
- Does this really matter? Let's discuss.
v1
(Mock-up: clickabutton.com — a login form (Username, Password, Go!) and a single button.)
"Click a Button" Analytics PRD
We want to know the number of unique users who clicked on the button:
- over an arbitrary range of time
- broken down by whether they're logged in or not
- with hour granularity
"I KNOW!"
Every hour, process logs and dump lines that look like this to HDFS with Pig:
unique_id,logged_in,clicked
"I KNOW!"

-- 'clicked' and 'logged_in' are either 0 or 1
LOAD '$IN' USING PigStorage(',') AS (
    unique_id:chararray,
    logged_in:int,
    clicked:int
);
-- Munge data according to the PRD
…
v2
(Mock-up: clickabutton.com now has two buttons, red and green.)
"Click a Button" Analytics PRD
Break users down by which button they clicked, too.
"I KNOW!"
Every hour, process logs and dump lines that look like this to HDFS with Pig:
unique_id,logged_in,red_click,green_click
"I KNOW!"

-- 'red_clicked', 'green_clicked', and 'logged_in' are either 0 or 1
LOAD '$IN' USING PigStorage(',') AS (
    unique_id:chararray,
    logged_in:int,
    red_clicked:int,
    green_clicked:int
);
-- Munge data according to the PRD
…
v3
(Mock-up: the red button is gone; only the green button remains.)
"Hmm."
Bad Solution 1
Remove red_click:
unique_id,logged_in,red_click,green_click
becomes
unique_id,logged_in,green_click
Why it's bad
Your old script thinks green clicks are red clicks:
LOAD '$IN' USING PigStorage(',') AS (
    unique_id:chararray,
    logged_in:int,
    red_clicked:int,
    green_clicked:int
);
-- Munge data according to the PRD
…
Why it's bad
And an updated script won't work for all the data you've collected so far:
LOAD '$IN' USING PigStorage(',') AS (
    unique_id:chararray,
    logged_in:int,
    green_clicked:int
);
-- Munge data according to the PRD
…
"I'll keep multiple scripts lying around"
LOAD '$IN' USING PigStorage(',') AS (
    unique_id:chararray,
    logged_in:int,
    green_clicked:int
);

My data has three fields. Which script do I use?

LOAD '$IN' USING PigStorage(',') AS (
    unique_id:chararray,
    logged_in:int,
    orange_clicked:int
);
Bad Solution 2
Assign a sentinel to red_click when it should be ignored, e.g. -1:
unique_id,logged_in,red_click,green_click
Why it's bad
It's a waste of space.
Why it's bad
Sticking logic in your data is iffy.
The Preferable Solution
Serialize your data using backwards-compatible data structures!
Protocol Buffers and Elephant Bird
Protocol Buffers
- A serialization system (Avro and Thrift are alternatives)
- Compiles interface definitions to language modules that let you:
  - construct a data structure
  - access it (in a backwards-compatible way)
  - ser/deser the data structure in a standard, compact, binary format
uniqueuser.proto:
message UniqueUser {
    optional string id = 1;
    optional int32 logged_in = 2;
    optional int32 red_clicked = 3;
}
protoc compiles this to .h/.cc, .java, and .py modules.
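For example, after compiling the schema for Python (protoc --python_out=. uniqueuser.proto), the generated module is used like this; a minimal sketch:

import uniqueuser_pb2  # generated by protoc from uniqueuser.proto

user = uniqueuser_pb2.UniqueUser()
user.id = "bak49jsn"
user.logged_in = 0
user.red_clicked = 1
blob = user.SerializeToString()  # compact binary encoding

same = uniqueuser_pb2.UniqueUser()
same.ParseFromString(blob)       # round-trips losslessly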
Elephant Bird
- Generates protobuf-based Pig load/store functions, plus lots more
- Developed at Twitter
- Blog post: http://engineering.twitter.com/2010/04/hadoop-at-twitter.html
- Available at: http://www.github.com/kevinweil/elephant-bird
uniqueuser.proto:
message UniqueUser {
    optional string id = 1;
    optional int32 logged_in = 2;
    optional int32 red_clicked = 3;
}
Elephant Bird generates:
*.pig.load.UniqueUserLzoProtobufB64LinePigLoader
*.pig.store.UniqueUserLzoProtobufB64LinePigStorage
LzoProtobufB64?
LzoProtobufB64 serialization:
(bak49jsn, 0, 1)
→ protobuf binary blob
→ Base64-encoded protobuf binary blob
→ LZO-compressed, Base64-encoded protobuf binary blob
LzoProtobufB64 deserialization:
LZO-compressed, Base64-encoded protobuf binary blob
→ Base64-encoded protobuf binary blob
→ protobuf binary blob
→ (bak49jsn, 0, 1)
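A rough sketch of the two inner steps (the LZO layer is applied to the whole file by Hadoop's LZO codec, not per record), assuming the generated uniqueuser_pb2 module from above:

import base64
import uniqueuser_pb2

def encode_line(user):
    # One Base64-encoded protobuf record per line.
    return base64.b64encode(user.SerializeToString()) + b"\n"

def decode_line(line):
    user = uniqueuser_pb2.UniqueUser()
    user.ParseFromString(base64.b64decode(line.strip()))
    return user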
Setting it up
Prereqs:
- Protocol Buffers 2.3+
- LZO codec for Hadoop
Check out the docs: http://www.github.com/kevinweil/elephant-bird
Time to revisit.
v1
(Mock-up: clickabutton.com v1 again, with its single button.)
Every hour, process logs and dump lines to HDFS that use this protobuf interface:
uniqueuser.proto:
message UniqueUser {
    optional string id = 1;
    optional int32 logged_in = 2;
    optional int32 red_clicked = 3;
}
-- 'red_clicked' and 'logged_in' are either 0 or 1
LOAD '$IN' USING myudfs.pig.load.UniqueUserLzoProtobufB64LinePigLoader AS (
    unique_id:chararray,
    logged_in:int,
    red_clicked:int
);
-- Munge data according to the PRD
…
v2
(Mock-up: clickabutton.com v2, with red and green buttons.)
Every hour, process logs and dump lines to HDFS that use this protobuf interface:
uniqueuser.proto:
message UniqueUser {
    optional string id = 1;
    optional int32 logged_in = 2;
    optional int32 red_clicked = 3;
    optional int32 green_clicked = 4;
}
-- 'red_clicked', 'green_clicked', and 'logged_in' are either 0 or 1
LOAD '$IN' USING myudfs.pig.load.UniqueUserLzoProtobufB64LinePigLoader AS (
    unique_id:chararray,
    logged_in:int,
    red_clicked:int,
    green_clicked:int
);
-- Munge data according to the PRD
…
v3
(Mock-up: clickabutton.com v3, with the red button gone.)
No need to change your scripts.
They'll work on old and new data!
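That's the payoff in miniature. A sketch of why it works, assuming the generated module from above: a record serialized before green_clicked existed still parses under the v2 schema, with the new optional field reading as its default.

import uniqueuser_pb2  # compiled from the v2 .proto

# A "v1-era" record: green_clicked was never set.
old = uniqueuser_pb2.UniqueUser(id="bak49jsn", logged_in=1, red_clicked=1)
blob = old.SerializeToString()

new = uniqueuser_pb2.UniqueUser()
new.ParseFromString(blob)
print(new.green_clicked)  # 0: an unset optional int32 reads as its default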
