SlideShare a Scribd company logo
1 of 56
Download to read offline
Erik Bernhardsson
erikbern@spotify.com
Batchdataprocessingin
Python
Focusingmostlyonmusicdiscoveryandlargescalemachinelearning
Previouslymanagedthe“Analyticsteam”inStockholm
I’matSpotify in NYC
BtwI’mErikBernhardsson
Background
Billionsoflogmessages(severalTBs)everyday
Usageandbackendstats,debuginformation
Whatwewanttodo
AB-testing
Musicrecommendations
Monthly/daily/hourlyreporting
Businessmetricdashboards
Weexperimentalot–needquickdevelopmentcycles
Wecrunchalot of data
WhydidwebuildLuigi?
Oursecondcluster(in2009):
WelikeHadoop
Longstoryshort:)
Ourfifthcluster
Runningonejobiseasy
Lotsoflong-runningprocesseswithdependencies
Needmonitoring
Handlefailures
Gofromexperimentationtoproductioneasily
Butwhataboutrunning1000sofjob every day?
Butalsonon-Hadoopstuff
MostthingsarePythonMap/Reducejobs
AlsoPig,Hive
SCPfilesfromonehosttoanother
Trainamachinelearningmodel
PutdatainCassandra
Inthepre-Luigiworld
Hownottodoworkflows
“Streams”isalistof(username,track,artist,timestamp)tuples
Example:ArtistToplist
Streams
Artist
Aggregation
Top 10 Database
Pre-Luigiexampleofartisttoplists
Don’tdothisathome
OK,sochainthetasks
Cronnicer,yay!
That’sOK,butdon’tleavebrokendatasomewhere
(btw,LuigigivesyouatomicfileoperationslocallyandinHDFS)
Errorswilloccur
Thesecondstepfails,youfixit,thenyouwanttoresume
Don’trunthingstwice
Tousedataflowsascommandlinetools
Parametrizetasks
Youwanttorunthedataflowforasetofsimilarinputs
Puttasksinloops
Plumbingsucks
Graphalgorithmsrock!
Plumbingsucks...
Who’stheworld’ssecond
mostfamousplumber?
Hint:hewearsgreen
APythonframeworkfordataflowdefinitionandexecution
IntroducingLuigi
OnsteroidsandPCP
...withatoolboxofmainlyHadooprelatedstuff
Simpledependencydefinitions
EmphasisonHadoop/HDFSintegration
Atomicfileoperations
Dataflowvisualization
Commandlineintegration
Mainfeatures
Luigiis“kindoflike
Makefile”inPython
LuigiTask
Luigi-AggregateArtists
Luigi-AggregateArtists
Run on the command line:
$ python dataflow.py AggregateArtists
DEBUG: Checking if AggregateArtists() is complete
INFO: Scheduled AggregateArtists()
DEBUG: Checking if Streams() is complete
INFO: Done scheduling tasks
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 74375] Running AggregateArtists()
INFO: [pid 74375] Done AggregateArtists()
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
Top10artists-WrappedarbitraryPythoncode
Completingthetoplist
BasicfunctionalityforexportingtoPostgres.Cassandrasupportisintheworks
Databasesupport
Runningitall...
DEBUG: Checking if ArtistToplistToDatabase() is complete
INFO: Scheduled ArtistToplistToDatabase()
DEBUG: Checking if Top10Artists() is complete
INFO: Scheduled Top10Artists()
DEBUG: Checking if AggregateArtists() is complete
INFO: Scheduled AggregateArtists()
DEBUG: Checking if Streams() is complete
INFO: Done scheduling tasks
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 3
INFO: [pid 74811] Running AggregateArtists()
INFO: [pid 74811] Done AggregateArtists()
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 2
INFO: [pid 74811] Running Top10Artists()
INFO: [pid 74811] Done Top10Artists()
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 74811] Running ArtistToplistToDatabase()
INFO: Done writing, importing at 2013-03-13 15:41:09.407138
INFO: [pid 74811] Done ArtistToplistToDatabase()
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
Imaginehowcoolthiswouldbewithrealdata...
Theresults
Taskshaveimplicit__init__
TaskParameters
Generatescommandlineinterfacewithtypinganddocumentation
Classvariableswithsomemagic
$ python dataflow.py AggregateArtists --date 2013-03-05
Combinedusageexample
TaskParameters
RunningHadoopMapReduceutilizingHadoopStreamingorcustomjar-files
RunningHiveand(soon)Pigqueries
InsertingdatasetsintoPostgres
LuigicomeswithatoolboxofabstractTasksfor...
...howtorunanything,really
Tasktemplatesandtargets
Writingnew onesareaseasyasdefininganinterfaceand
implementingrun()
Built-inHadoopStreamingPythonframework
HadoopMapReduce
Tinyinterface–justimplementmapperandreducer
FetcheserrorlogsfromHadoopclusteranddisplaysthemtotheuser
ClassinstancevariablescanbereferencedinMapReducecode,whichmakesit
easytosupplyextradataindictionariesetc.formapsidejoins
EasytosendalongPythonmodulesthatmightnotbeinstalledonthecluster
Supportforcounters,secondarysort,combiners,distributedcache,etc.
RunsonCPythonsoyoucanuseyourfavoritelibs(numpy,pandasetc.)
Features
Built-inHadoopStreamingPythonframework
HadoopMapReduce
Morefeatures
Luigi’s“visualiser”
Diveintoanytask
Basicmulti-processing
Multipleworkers
$ python dataflow.py --workers 3 AggregateArtists --date_interval 2013-W08
Greatforautomatedexecution
Errornotifications
Preventstwoidenticaltasksfromrunningsimultaneously
ProcessSynchronization
Luigi worker 1 Luigi worker 2
A
B C
A C
F
Luigi central planner
...whathappens
ProcessSynchronization
Luigi worker 1 Luigi worker 2
A
B C
A
C
F
...whathappens
ProcessSynchronization
Luigi worker 1 Luigi worker 2
A
B C
A
C
F
...whathappens
ProcessSynchronization
Luigi worker 1 Luigi worker 2
A
B C
A
C
F
Largedataflows
(Screenshotfromwebinterface)
ThingsLuigiisnot
Yes,youcanrunPythonHadoopjobsinLuigi. Butthemainfocusisworkflow
management.
Luigiisnottryingto
replacemrjob
Youstillneedtofigureouthoweachtaskruns
Luigidoesnotgiveyou
scalability
Mapreduce/Pig/Hive/etcarewonderfultoolsfordoingthisandLuigiismorethan
happytodelegateittothem.
Luigidoesnothelpyou
transformthedata
AlthoughOozieiskindofannoying
...butit’ssortoflikeOozie
Oozie Luigi
Only Hadoop Yes!
Horrible XML Yes!
Easy Yes!
Fun & powerful Yes!
“Oozieexample”
<workflow-app xmlns='uri:oozie:workflow:0.1' name='processDir'>
<start to='getDirInfo' />
<!-- STEP ONE -->
<action name='getDirInfo'>
<!--writes 2 properties: dir.num-files: returns -1 if dir doesn't exist,
otherwise returns # of files in dir dir.age: returns -1 if dir doesn't exist,
otherwise returns age of dir in days -->
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>com.navteq.oozie.GetDirInfo</main-class>
<arg>${inputDir}</arg>
<capture-output />
</java>
<ok to="makeIngestDecision" />
<error to="fail" />
</action>
<!-- STEP TWO -->
<decision name="makeIngestDecision">
<switch>
<!-- empty or doesn't exist -->
<case to="end">
${wf:actionData('getDirInfo')['dir.num-files'] lt 0 ||
(wf:actionData('getDirInfo')['dir.age'] lt 1 and
wf:actionData('getDirInfo')['dir.num-files'] lt 24)}
</case>
<!-- # of files >= 24 -->
<case to="ingest">
${wf:actionData('getDirInfo')['dir.num-files'] gt 23 ||
wf:actionData('getDirInfo')['dir.age'] gt 6}
</case>
<default to="sendEmail"/>
</switch>
</decision>
<!--EMAIL-->
<action name="sendEmail">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>com.navteq.oozie.StandaloneMailer</main-class>
<arg>probedata2@navteq.com</arg>
<arg>gregory.titievsky@navteq.com</arg>
<arg>${inputDir}</arg>
<arg>${wf:actionData('getDirInfo')['dir.num-files']}</arg>
<arg>${wf:actionData('getDirInfo')['dir.age']}</arg>
Instead,focusonridiculouslylittleboilerplatecode
Generalsoyoucanbuildwhateverontopofit
Aswellasrapidexperimentationcycle
Oncethingswork,trivialtoputinproduction
Luigidoesnothave999
features
WhatweuseLuigifor
HadoopStreaming
JavaHadoopMapReduce
Hive
Pig
Trainmachinelearningmodels
Import/exportdatato/fromPostgres
InsertdataintoCassandra
scp/rsync/ftpdatafilesandreports
Dumpandloaddatabases
OthersusingitwithScalaMapReduceandMRJobaswell
Beoneofthecoolkids!
OriginatedatSpotify
MainlybuiltbymeandEliasFreider
Basedonmanyyearsofexperiencewithdataprocessing
OpensourcesinceSeptember2012
https://github.com/spotify/luigi
Luigiisopensource
•Pig
•EC2
•Scalding
•Cassandra
Futureplans!
Formoreinformationfeelfreetoreachoutat
http://github.com/spotify/luigi
Thankyou!
Oh,andwe’rehiring–http://spotify.com/jobs
Erik Bernhardsson
erikbern@spotify.com

More Related Content

What's hot

H2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi MehtaH2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi MehtaSri Ambati
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonYahoo Developer Network
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPeter Skomoroch
 
Web Scraping in Python with Scrapy
Web Scraping in Python with ScrapyWeb Scraping in Python with Scrapy
Web Scraping in Python with Scrapyorangain
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridMatt Williams
 
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Romain Dorgueil
 
Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2Kyunghoon Kim
 
Migrations With Transmogrifier
Migrations With TransmogrifierMigrations With Transmogrifier
Migrations With TransmogrifierRok Garbas
 
Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010Yahoo Developer Network
 
DIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitDIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitHenry Schreiner
 
Emphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudEmphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudgfodor
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...InfluxData
 
Theming Plone with Deliverance
Theming Plone with DeliveranceTheming Plone with Deliverance
Theming Plone with DeliveranceRok Garbas
 
Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...jasonscheirer
 
Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"Provectus
 
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Viach Kakovskyi
 
XPath for web scraping
XPath for web scrapingXPath for web scraping
XPath for web scrapingScrapinghub
 

What's hot (20)

H2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi MehtaH2O World - PySparkling Water - Nidhi Mehta
H2O World - PySparkling Water - Nidhi Mehta
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In Python
 
Elasticwulf Pycon Talk
Elasticwulf Pycon TalkElasticwulf Pycon Talk
Elasticwulf Pycon Talk
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org
 
Web Scraping in Python with Scrapy
Web Scraping in Python with ScrapyWeb Scraping in Python with Scrapy
Web Scraping in Python with Scrapy
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing grid
 
Pybind11 - SciPy 2021
Pybind11 - SciPy 2021Pybind11 - SciPy 2021
Pybind11 - SciPy 2021
 
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
 
Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2Network Analysis with networkX : Real-World Example-2
Network Analysis with networkX : Real-World Example-2
 
Migrations With Transmogrifier
Migrations With TransmogrifierMigrations With Transmogrifier
Migrations With Transmogrifier
 
Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010Workflow on Hadoop Using Oozie__HadoopSummit2010
Workflow on Hadoop Using Oozie__HadoopSummit2010
 
DIANA: Recent developments in GooFit
DIANA: Recent developments in GooFitDIANA: Recent developments in GooFit
DIANA: Recent developments in GooFit
 
MAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodianMAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodian
 
Emphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudEmphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloud
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
 
Theming Plone with Deliverance
Theming Plone with DeliveranceTheming Plone with Deliverance
Theming Plone with Deliverance
 
Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...Esri International User Conference 2011: Python: Integrating Standard and Thi...
Esri International User Conference 2011: Python: Integrating Standard and Thi...
 
Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"Блохин Леонид - "Mist, как часть Hydrosphere"
Блохин Леонид - "Mist, как часть Hydrosphere"
 
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
Austin Python Meetup 2017: What's New in Pythons 3.5 and 3.6?
 
XPath for web scraping
XPath for web scrapingXPath for web scraping
XPath for web scraping
 

Viewers also liked

Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupErik Bernhardsson
 
Las tics en la educacion 1234
Las tics en la educacion 1234Las tics en la educacion 1234
Las tics en la educacion 1234marreju99
 
Luigi Paris.py meetup presentation
Luigi Paris.py meetup presentationLuigi Paris.py meetup presentation
Luigi Paris.py meetup presentationJonàs Bru Monserrat
 
The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009Brian Whitman
 
Cut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of DorkbotCut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of DorkbotBrian Whitman
 
The echo nest-music_discovery(1)
The echo nest-music_discovery(1)The echo nest-music_discovery(1)
The echo nest-music_discovery(1)Sophia Yeiji Shin
 
Music data is scary, beautiful and exciting
Music data is scary, beautiful and excitingMusic data is scary, beautiful and exciting
Music data is scary, beautiful and excitingBrian Whitman
 
The future music platform
The future music platformThe future music platform
The future music platformBrian Whitman
 
The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009Brian Whitman
 
Echo nest-api-boston-2012
Echo nest-api-boston-2012Echo nest-api-boston-2012
Echo nest-api-boston-2012Paul Lamere
 
Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017Luigi Galluccio
 
俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkabanhdhappy001
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsErik Bernhardsson
 
Azkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInAzkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInRussell Jurney
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...David Chen
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 

Viewers also liked (20)

Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetup
 
Las tics en la educacion 1234
Las tics en la educacion 1234Las tics en la educacion 1234
Las tics en la educacion 1234
 
Luigi Paris.py meetup presentation
Luigi Paris.py meetup presentationLuigi Paris.py meetup presentation
Luigi Paris.py meetup presentation
 
The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009The Echo Nest at Music and Bits, October 21 2009
The Echo Nest at Music and Bits, October 21 2009
 
Cut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of DorkbotCut Bait - 10 Years of Dorkbot
Cut Bait - 10 Years of Dorkbot
 
The echo nest-music_discovery(1)
The echo nest-music_discovery(1)The echo nest-music_discovery(1)
The echo nest-music_discovery(1)
 
Music data is scary, beautiful and exciting
Music data is scary, beautiful and excitingMusic data is scary, beautiful and exciting
Music data is scary, beautiful and exciting
 
The future music platform
The future music platformThe future music platform
The future music platform
 
The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009The Echo Nest Remix at Dorkbot NYC, March 4 2009
The Echo Nest Remix at Dorkbot NYC, March 4 2009
 
Echo nest-api-boston-2012
Echo nest-api-boston-2012Echo nest-api-boston-2012
Echo nest-api-boston-2012
 
Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017Luigi Galluccio Booklet 2017
Luigi Galluccio Booklet 2017
 
俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban
 
Quartz
QuartzQuartz
Quartz
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive Analytics
 
Azkaban
AzkabanAzkaban
Azkaban
 
Azkaban and Pig at LinkedIn
Azkaban and Pig at LinkedInAzkaban and Pig at LinkedIn
Azkaban and Pig at LinkedIn
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Dataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJDataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJ
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 

Similar to Luigi Presentation at OSCON 2013

Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)PyData
 
Puppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Jason Dai
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
Deep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceDeep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceRoy Cecil
 
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...Paris Open Source Summit
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabeDataiku
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsJim Dowling
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonInsuk (Chris) Cho
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxLex Avstreikh
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkDatabricks
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.Marcel Caraciolo
 
Accurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksAccurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksDaniel Lemire
 
Talk in Google fest 2013
Talk in Google fest 2013Talk in Google fest 2013
Talk in Google fest 2013David Chen
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxPyData
 
TTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpressTTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpressDylan Jay
 

Similar to Luigi Presentation at OSCON 2013 (20)

Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)Luigi - Batch Data Processing in Python (PyData SV 2013)
Luigi - Batch Data Processing in Python (PyData SV 2013)
 
Puppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops Rolls
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Deep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceDeep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science Experience
 
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
#OSSPARIS19 - Computer Vision framework for GeoSpatial Imagery: RoboSat.pink ...
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabe
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on Hops
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of Python
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Accurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarksAccurate and efficient software microbenchmarks
Accurate and efficient software microbenchmarks
 
Spring Batch Introduction
Spring Batch IntroductionSpring Batch Introduction
Spring Batch Introduction
 
Talk in Google fest 2013
Talk in Google fest 2013Talk in Google fest 2013
Talk in Google fest 2013
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
 
TTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpressTTW FTW: Plone as the new wordpress
TTW FTW: Plone as the new wordpress
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Luigi Presentation at OSCON 2013