Engineering @ Exzeo
Building Data Science
Pipelines in Python
PyData Delhi Meetup
Exzeo, Noida
Feb 10, 2018
Shivam Bansal
Shwet Kamal Mishra
Contents
● Introduction
● Typical Data Science Workflow
● Challenges in the Data Science Workflow
● Data Science Pipelines
● Why use a Data Science Pipeline
● Luigi - Pipeline in Python
● Luigi Features
● Luigi Demo
Who We Are
Exzeo is a software development company specializing in core tech products
and services that optimize human capital.
It was registered with the Registrar of Companies on 9th August 2012.
We are a part of HCI Group (NYSE: HCI), a multinational conglomerate based
in Tampa, FL, USA.
The key focus of Exzeo is to improve the insurance sector using technology,
analytics and data science.
Our Products and Services
ATLAS VIEWER
A data visualization product to view real-time feeds
and massive datasets on a map.
EXZEO HQ
Cloud-based process management and intelligent
automation for the insurance industry.
PROPLET
Innovative policy quoting application leveraging
multiple proprietary data sources.
TYPTAP
A complete, quick and secure platform to access
users' insurance policies and loss information.
JUSTER
An intelligent app that helps organize claim
inspections and sync information with Exzeo Cloud.
HARMONY
Project Harmony offers insurance solutions, from
buying a policy to filing a claim.
Our Tech Stack
Backend | Frontend | Data Storage | Data Science | DevOps / Platforms
Data Science Problems @ Exzeo
● Property Risk Scoring from Multidimensional Data
● Detecting Roof Shape from Satellite Images
● Fraud Detection in Insurance Claims
● Claim Cause and Cost Prediction
● Knowledge Graph: Root Claim Cause Detection using NLP
● Climate Risk Forecasting
● Insurance Price Quoting Chatbot
● Object Detection from Property Interior Images
Typical Data Science Workflow
$ python procure_data.py
$ python clean_data.py
$ python feature_engineering.py
$ python exploratory_data_analysis.py
$ python modelling.py
$ python visualize_results.py
Too many tasks
procure_data()
clean_data()
feature_engineering()
exploratory_data_analysis()  <-- Error
modelling()
visualize_results()
Failure Recovery
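The failure-recovery problem can be sketched in plain Python. This is a minimal illustration, not the speakers' code: each step leaves a marker file when it finishes, so a rerun skips completed steps and resumes at the one that failed. The helper names `run_chain` and `demo`, the `.done` marker convention, and the simulated failure are all assumptions made for the example.

```python
import os
import tempfile


def run_chain(steps, out_dir):
    """Run (name, func) steps in order, skipping any whose marker file exists."""
    log = []
    for name, func in steps:
        marker = os.path.join(out_dir, name + ".done")
        if os.path.exists(marker):
            log.append("skip " + name)
            continue
        try:
            result = func()
        except Exception:
            log.append("fail " + name)
            break  # later steps depend on this one, so stop here
        # Write to a temporary name first, then rename, so a crash mid-write
        # never leaves behind a marker for an unfinished step.
        with open(marker + ".tmp", "w") as f:
            f.write(str(result))
        os.replace(marker + ".tmp", marker)
        log.append("ran " + name)
    return log


def demo():
    """Fail once in the analysis step, then rerun after the 'fix'."""
    out_dir = tempfile.mkdtemp()
    state = {"broken": True}

    def exploratory_data_analysis():
        if state["broken"]:
            raise RuntimeError("simulated failure")
        return "plots"

    steps = [
        ("procure_data", lambda: "raw"),
        ("clean_data", lambda: "clean"),
        ("exploratory_data_analysis", exploratory_data_analysis),
        ("modelling", lambda: "model"),
    ]
    first = run_chain(steps, out_dir)
    state["broken"] = False
    second = run_chain(steps, out_dir)
    return first, second
```

Calling `demo()` runs the chain twice: the first run stops at the failing step, and the second run skips the two finished steps instead of repeating them. Checking for an existing output is the same idea a pipeline tool applies for you.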
Reproducibility
generic_data_cleaning()
generic_data_processing()
generic_data_analysis()
generic_data_modeling()
Too Much Boilerplate Code
if __name__ == '__main__':
Solution - Pipeline
Continuous Integration of data processing steps and analysis tasks
Why use a pipeline
- Reuse the models
- Quick Implementation of Ideas
- Focus more on science instead of engineering
- Production ready products
Pipelines in Python - Luigi
● Python tool for workflow task management
● Developed and maintained by Spotify
● Open Source: https://github.com/spotify/luigi
pip install luigi
What’s so special about Luigi
● Task Templating
● Task Scheduling
● Task Monitoring
● Command Line Integration
● Batch and Parallel Processing
● Dependency Graphs
● Failure Recovery and Error Emails
Luigi Tasks
Monitoring Tasks
Visualizing Tasks Workflow
Central Scheduler
Problem Statement
Build a pipeline to predict the Performance Score of a mobile game user.
The game consists of 120 different characters (heroes), and every hero has some capabilities.
Input Data
Training Data: user score for given characters
Independent Variables: User ID, Character ID, User-Character ID, Num Tries, Boost Used (0/1), Attack Duration
Dependent Variable: Performance Score
Character Metadata: data for each character
Variables: Character ID, Character Type, Hitpoints
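The two inputs share the Character ID column, so one plausible way to combine them is a join on that key. The sketch below assumes that is what combining the inputs involves; the slides do not say. The rows and values are invented purely for illustration and are not from the talk's dataset, and the function name `join_with_metadata` is hypothetical.

```python
# Made-up example rows using the column names listed above.
TRAINING = [
    {"user_id": 1, "character_id": 7, "num_tries": 3, "boost_used": 1,
     "attack_duration": 12.5, "performance_score": 61.0},
    {"user_id": 2, "character_id": 9, "num_tries": 1, "boost_used": 0,
     "attack_duration": 30.0, "performance_score": 48.0},
]
CHARACTERS = [
    {"character_id": 7, "character_type": "ranged", "hitpoints": 800},
]


def join_with_metadata(training_rows, character_rows):
    """Attach character metadata to each training row by character_id."""
    by_character = {row["character_id"]: row for row in character_rows}
    joined = []
    for row in training_rows:
        meta = by_character.get(row["character_id"])
        if meta is None:
            continue  # drop rows whose character has no metadata
        joined.append({**row, **meta})
    return joined
```

Dropping unmatched rows is a design choice made for brevity; a real pipeline might instead keep them and fill in missing values.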
Solution Pipeline
● Load Data
● Aggregate Data
● PreProcess Data
● Model Training
    ○ Linear Regression
    ○ Random Forest
● Model Selection
● Model Prediction
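The stages above form a dependency graph rather than a strict line, since the two models can be trained independently. A minimal sketch using the standard library's `graphlib` (Python 3.9+) shows the order a scheduler could derive. The stage identifiers and the exact edges are assumptions, in particular that both models train on the same preprocessed data and that model selection waits for both.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (assumed edges).
PIPELINE = {
    "load_data": set(),
    "aggregate_data": {"load_data"},
    "preprocess_data": {"aggregate_data"},
    "train_linear_regression": {"preprocess_data"},
    "train_random_forest": {"preprocess_data"},
    "model_selection": {"train_linear_regression", "train_random_forest"},
    "model_prediction": {"model_selection"},
}


def execution_batches(graph):
    """Group stages into batches; stages within a batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

`execution_batches(PIPELINE)` yields six batches, with the two training stages sharing the fourth. This mirrors how a pipeline tool can run independent tasks side by side while still respecting dependencies.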
Luigi Pipeline Demo
Limitations of Luigi
- Not ideal for streaming data
- No built-in triggering (crontab or a message broker is used instead)
Shivam Bansal | shivam5992@gmail.com | www.shivambansal.com
Shwet Kamal Mishra | shwetmishraa@gmail.com | www.shwetkmishra.com
Thanks !


Editor's Notes

  1. Manage, Monitor, Visualize
  2. Boilerplate
  3. Data science tasks are repetitive, so there needs to be a workflow that can reproduce a set of tasks. Data science involves a long chain of sequential processes, and failure can happen at any step. There needs to be a framework that can help us resume the work from the point of failure. Tasks should generalise to different sets of parameters. When running a large process, monitoring is required to track the progress of tasks and to find the error at the exact point of failure.