SlideShare a Scribd company logo
1 of 18
Download to read offline
Luigi - A Python framework for data
processing an pipelining
Mickaël Mendez
Tech Talk – Jun 27, 2016
Summary
● What is a pipeline?
● Luigi
● Tips and Tricks
● Conclusion
A Simple Pipeline
● Count the number of interval per chromosome
A more robust pipeline
Add logs
GNU Make 1/2
http://www.gnu.org/software/make/manual/make.html
GNU Make 2/2
http://www.gnu.org/software/make/manual/make.html
Luigi
● Python Framework
● Developed at Spotify©
“
“
http://luigi.readthedocs.io/en/stable/
Luigi – Main features
● Task templating
● Dependency graphs
● Resumption of data flows after intermediate
failure
● Command line integration
● Run Tasks in parallel
Luigi – Cheat sheet
Luigi - Example
Luigi -Example
Running a pipeline
● python luigi_word_count.py
● On mordor:
– qlogin -pe smp 10 #From the main node
– Python luigi_word_count.py --workers 10
● A specific task:
– python luigi_word_count.py CountLetters
Luigi – Parameters
Configurations
● Allows to specify the parameters in an external
file
● luifi.cfg
[MyTask]
a_parameter=21
Tips and Tricks
Tips
Conclusion
● Luigi is a workflow manager
● It handles dependencies, job failures and
parallelism
● It is well maintained (More than 100
contributors)
● Mailing list is very active
● Used by popular companies (foursquare...)

More Related Content

Similar to Luigi - A Python framework for data processing and pipelining

Programming with Python - Basic
Programming with Python - BasicProgramming with Python - Basic
Programming with Python - BasicMosky Liu
 
Testing cloud and kubernetes applications - ElasTest
Testing cloud and kubernetes applications - ElasTestTesting cloud and kubernetes applications - ElasTest
Testing cloud and kubernetes applications - ElasTestMicael Gallego
 
MicroPython&electronics prezentācija
MicroPython&electronics prezentācija MicroPython&electronics prezentācija
MicroPython&electronics prezentācija CRImier
 
Plone Intranet under the hood
Plone Intranet under the hoodPlone Intranet under the hood
Plone Intranet under the hoodGuido Stevens
 
How to plan and define your CI-CD pipeline
How to plan and define your CI-CD pipelineHow to plan and define your CI-CD pipeline
How to plan and define your CI-CD pipelineElasTest Project
 
AGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServerAGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServerNg'eno Victor
 
Python and big data : a good match?
Python and big data : a good match?Python and big data : a good match?
Python and big data : a good match?PyDataParis
 
Build and deploy scientific Python Applications
Build and deploy scientific Python Applications  Build and deploy scientific Python Applications
Build and deploy scientific Python Applications Ramakrishna Reddy
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniquesTuomas Suutari
 
Hong Kong Drupal User Group - 2014 March 8th
Hong Kong Drupal User Group - 2014 March 8thHong Kong Drupal User Group - 2014 March 8th
Hong Kong Drupal User Group - 2014 March 8thWong Hoi Sing Edison
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...OW2
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDoku
 
What's New in OpenLDAP
What's New in OpenLDAPWhat's New in OpenLDAP
What's New in OpenLDAPLDAPCon
 
An overview of data and web-application development with Python
An overview of data and web-application development with PythonAn overview of data and web-application development with Python
An overview of data and web-application development with PythonSivaranjan Goswami
 
MQTT on Raspberry Pi with node.js
MQTT on Raspberry Pi with node.jsMQTT on Raspberry Pi with node.js
MQTT on Raspberry Pi with node.jsPaul Tanner
 
Python Django Intro V0.1
Python Django Intro V0.1Python Django Intro V0.1
Python Django Intro V0.1Udi Bauman
 

Similar to Luigi - A Python framework for data processing and pipelining (20)

Programming with Python - Basic
Programming with Python - BasicProgramming with Python - Basic
Programming with Python - Basic
 
Testing cloud and kubernetes applications - ElasTest
Testing cloud and kubernetes applications - ElasTestTesting cloud and kubernetes applications - ElasTest
Testing cloud and kubernetes applications - ElasTest
 
MicroPython&electronics prezentācija
MicroPython&electronics prezentācija MicroPython&electronics prezentācija
MicroPython&electronics prezentācija
 
Plone Intranet under the hood
Plone Intranet under the hoodPlone Intranet under the hood
Plone Intranet under the hood
 
How to plan and define your CI-CD pipeline
How to plan and define your CI-CD pipelineHow to plan and define your CI-CD pipeline
How to plan and define your CI-CD pipeline
 
AGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServerAGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServer
 
Python and big data : a good match?
Python and big data : a good match?Python and big data : a good match?
Python and big data : a good match?
 
Build and deploy scientific Python Applications
Build and deploy scientific Python Applications  Build and deploy scientific Python Applications
Build and deploy scientific Python Applications
 
ngconf2015
ngconf2015ngconf2015
ngconf2015
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniques
 
Hong Kong Drupal User Group - 2014 March 8th
Hong Kong Drupal User Group - 2014 March 8thHong Kong Drupal User Group - 2014 March 8th
Hong Kong Drupal User Group - 2014 March 8th
 
Mikrotik fasttrack
Mikrotik fasttrackMikrotik fasttrack
Mikrotik fasttrack
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winner
 
OpenNTF Essentials
OpenNTF EssentialsOpenNTF Essentials
OpenNTF Essentials
 
What's New in OpenLDAP
What's New in OpenLDAPWhat's New in OpenLDAP
What's New in OpenLDAP
 
Software maintenance PyConUK 2016
Software maintenance PyConUK 2016Software maintenance PyConUK 2016
Software maintenance PyConUK 2016
 
An overview of data and web-application development with Python
An overview of data and web-application development with PythonAn overview of data and web-application development with Python
An overview of data and web-application development with Python
 
MQTT on Raspberry Pi with node.js
MQTT on Raspberry Pi with node.jsMQTT on Raspberry Pi with node.js
MQTT on Raspberry Pi with node.js
 
Python Django Intro V0.1
Python Django Intro V0.1Python Django Intro V0.1
Python Django Intro V0.1
 

More from Hoffman Lab

GNU Parallel: Lab meeting—technical talk
GNU Parallel: Lab meeting—technical talkGNU Parallel: Lab meeting—technical talk
GNU Parallel: Lab meeting—technical talkHoffman Lab
 
Efficient querying of genomic reference databases with gget
Efficient querying of genomic reference databases with ggetEfficient querying of genomic reference databases with gget
Efficient querying of genomic reference databases with ggetHoffman Lab
 
WashU Epigenome Browser
WashU Epigenome BrowserWashU Epigenome Browser
WashU Epigenome BrowserHoffman Lab
 
Wireguard: A Virtual Private Network Tunnel
Wireguard: A Virtual Private Network TunnelWireguard: A Virtual Private Network Tunnel
Wireguard: A Virtual Private Network TunnelHoffman Lab
 
Plotting heatmap with matplotlib/seaborn
Plotting heatmap with matplotlib/seabornPlotting heatmap with matplotlib/seaborn
Plotting heatmap with matplotlib/seabornHoffman Lab
 
Go Get Data (GGD)
Go Get Data (GGD)Go Get Data (GGD)
Go Get Data (GGD)Hoffman Lab
 
fastp: the FASTQ pre-processor
fastp: the FASTQ pre-processorfastp: the FASTQ pre-processor
fastp: the FASTQ pre-processorHoffman Lab
 
R markdown and Rmdformats
R markdown and RmdformatsR markdown and Rmdformats
R markdown and RmdformatsHoffman Lab
 
File searching tools
File searching toolsFile searching tools
File searching toolsHoffman Lab
 
Better BibTeX (BBT) for Zotero
Better BibTeX (BBT) for ZoteroBetter BibTeX (BBT) for Zotero
Better BibTeX (BBT) for ZoteroHoffman Lab
 
Awk primer and Bioawk
Awk primer and BioawkAwk primer and Bioawk
Awk primer and BioawkHoffman Lab
 
Terminals and Shells
Terminals and ShellsTerminals and Shells
Terminals and ShellsHoffman Lab
 
BioRender & Glossary/Acronym
BioRender & Glossary/AcronymBioRender & Glossary/Acronym
BioRender & Glossary/AcronymHoffman Lab
 
BioSyntax: syntax highlighting for computational biology
BioSyntax: syntax highlighting for computational biologyBioSyntax: syntax highlighting for computational biology
BioSyntax: syntax highlighting for computational biologyHoffman Lab
 
Get Good With Git
Get Good With GitGet Good With Git
Get Good With GitHoffman Lab
 
Tech Talk: UCSC Genome Browser
Tech Talk: UCSC Genome BrowserTech Talk: UCSC Genome Browser
Tech Talk: UCSC Genome BrowserHoffman Lab
 
MultiQC: summarize analysis results for multiple tools and samples in a singl...
MultiQC: summarize analysis results for multiple tools and samples in a singl...MultiQC: summarize analysis results for multiple tools and samples in a singl...
MultiQC: summarize analysis results for multiple tools and samples in a singl...Hoffman Lab
 
dreamRs: interactive ggplot2
dreamRs: interactive ggplot2dreamRs: interactive ggplot2
dreamRs: interactive ggplot2Hoffman Lab
 

More from Hoffman Lab (20)

GNU Parallel: Lab meeting—technical talk
GNU Parallel: Lab meeting—technical talkGNU Parallel: Lab meeting—technical talk
GNU Parallel: Lab meeting—technical talk
 
TCRpower
TCRpowerTCRpower
TCRpower
 
Efficient querying of genomic reference databases with gget
Efficient querying of genomic reference databases with ggetEfficient querying of genomic reference databases with gget
Efficient querying of genomic reference databases with gget
 
WashU Epigenome Browser
WashU Epigenome BrowserWashU Epigenome Browser
WashU Epigenome Browser
 
Wireguard: A Virtual Private Network Tunnel
Wireguard: A Virtual Private Network TunnelWireguard: A Virtual Private Network Tunnel
Wireguard: A Virtual Private Network Tunnel
 
Plotting heatmap with matplotlib/seaborn
Plotting heatmap with matplotlib/seabornPlotting heatmap with matplotlib/seaborn
Plotting heatmap with matplotlib/seaborn
 
Go Get Data (GGD)
Go Get Data (GGD)Go Get Data (GGD)
Go Get Data (GGD)
 
fastp: the FASTQ pre-processor
fastp: the FASTQ pre-processorfastp: the FASTQ pre-processor
fastp: the FASTQ pre-processor
 
R markdown and Rmdformats
R markdown and RmdformatsR markdown and Rmdformats
R markdown and Rmdformats
 
File searching tools
File searching toolsFile searching tools
File searching tools
 
Better BibTeX (BBT) for Zotero
Better BibTeX (BBT) for ZoteroBetter BibTeX (BBT) for Zotero
Better BibTeX (BBT) for Zotero
 
Awk primer and Bioawk
Awk primer and BioawkAwk primer and Bioawk
Awk primer and Bioawk
 
Terminals and Shells
Terminals and ShellsTerminals and Shells
Terminals and Shells
 
BioRender & Glossary/Acronym
BioRender & Glossary/AcronymBioRender & Glossary/Acronym
BioRender & Glossary/Acronym
 
Linters in R
Linters in RLinters in R
Linters in R
 
BioSyntax: syntax highlighting for computational biology
BioSyntax: syntax highlighting for computational biologyBioSyntax: syntax highlighting for computational biology
BioSyntax: syntax highlighting for computational biology
 
Get Good With Git
Get Good With GitGet Good With Git
Get Good With Git
 
Tech Talk: UCSC Genome Browser
Tech Talk: UCSC Genome BrowserTech Talk: UCSC Genome Browser
Tech Talk: UCSC Genome Browser
 
MultiQC: summarize analysis results for multiple tools and samples in a singl...
MultiQC: summarize analysis results for multiple tools and samples in a singl...MultiQC: summarize analysis results for multiple tools and samples in a singl...
MultiQC: summarize analysis results for multiple tools and samples in a singl...
 
dreamRs: interactive ggplot2
dreamRs: interactive ggplot2dreamRs: interactive ggplot2
dreamRs: interactive ggplot2
 

Recently uploaded

Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Luigi - A Python framework for data processing and pipelining