Experimenting with Data!
Andrea Montemaggio
head of data practice @ mashfrog group
andrea.montemaggio@mashfrog.com
github.com/klinamen
linkedin.com/in/amontemaggio
Data Science Trend
(Chart; source: Google Trends, keyword: “Data Science”)
Wikipedia on “Data Science”:
Data science is a "concept to unify statistics, data analysis, informatics, and their related methods" in order to "understand and analyze actual phenomena" with data.[3] It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge.[4]
Workflow Overview
- Scoping & Data Acquisition
- Data Analysis, Modeling and Prototyping
  - Exploratory Data Analysis
  - Experimental loop: Data preparation → Feature selection and extraction → Model selection → Evaluation
  - Solution Prototyping
- Engineering & Deployment
- Monitoring and Tuning
Scenario
We are a wholesale distributor and want to improve our receivables collection strategy. Knowing as early as possible whether an invoice will be paid on time would allow us to plan ahead and target our collection efforts at the most critical situations first.
How long will it take to cash a given invoice?
Scoping + Data Acquisition: invoice data + customer data + …
Data preparation → Feature selection and extraction → Model selection → Evaluation → Trained Classification Model
New invoice → Trained Classification Model → one of: on time | 1-30 days late | > 30 days late
Supervised Machine Learning
A trained classification model assigns an invoice to one of a predefined set of classes.
Using historical enterprise data and machine learning to predict whether an invoice is likely to be paid on time can help organizations optimize their invoice-to-cash flow.
Problem: How long will it take to cash a given invoice?
- Classification: predicting a discrete label (e.g. “on-time”, “1-30”)
- Regression: predicting a continuous quantity (e.g. 17.5 days)
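The two framings are closely related: the regression target is the delay in days, and the classification label is that same delay bucketed. A minimal Python sketch, assuming that paying on or before the due date counts as "on time" and using the three classes from the diagram; `days_late` and `delay_class` are hypothetical helper names, not taken from the project repository.

```python
from datetime import date


def days_late(due_date: date, payment_date: date) -> int:
    """Regression target: days between due date and payment (0 or negative = not late)."""
    return (payment_date - due_date).days


def delay_class(due_date: date, payment_date: date) -> str:
    """Classification target: the delay bucketed into the three classes of the diagram.

    Assumption: paying on or before the due date counts as "on time".
    """
    late = days_late(due_date, payment_date)
    if late <= 0:
        return "on time"
    if late <= 30:
        return "1-30 days late"
    return "> 30 days late"


print(days_late(date(2021, 3, 1), date(2021, 3, 18)))    # 17
print(delay_class(date(2021, 3, 1), date(2021, 3, 18)))  # 1-30 days late
```

Only invoices that have already been paid carry a label; open invoices are the ones the trained model is asked to score.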
Workflow: Scoping & Data Acquisition → Exploratory Data Analysis → Data preparation → Feature engineering → Model selection & training → Evaluation
Dataset
Dataset: https://www.kaggle.com/datasets/himanshu007121/invoice-data
Description:
Wholesale invoice data in CSV format, apparently extracted from an accounting system (possibly SAP).
Each record describes one document and includes, among other fields:
- the branch that issued the document
- customer information
- total amount
- due date
- payment date
Shape (rows × columns): 50,000 × 19
Size: 7.17 MB
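A minimal standard-library sketch for confirming the shape of the downloaded file before any analysis. The file name `invoice_data.csv` and the sample columns below are hypothetical placeholders, not the dataset's actual names, and UTF-8 encoding is an assumption.

```python
import csv
import io
from typing import TextIO, Tuple


def csv_shape(stream: TextIO) -> Tuple[int, int]:
    """Return (rows, columns) for a CSV with a header row (the header is not counted)."""
    reader = csv.reader(stream)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return n_rows, len(header)


# With the real file (hypothetical name and encoding):
#   with open("invoice_data.csv", newline="", encoding="utf-8") as f:
#       print(csv_shape(f))
sample = io.StringIO("branch,customer,amount\nB1,C001,120.50\nB2,C002,80.00\n")
print(csv_shape(sample))  # (2, 3)
```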
Exploratory Data Analysis
Getting to know your data.
Goals
- Data understanding
- Data quality assessment (e.g. missing data, encoding problems, inconsistencies)
- Assessing value distributions and correlations
Tools
- Excel (!)
- Programming languages: Python¹, R¹
- Low/no-code integrated data analysis tools such as OpenRefine¹, Orange¹, KNIME, RapidMiner
- Statistical software packages
¹ FLOSS (free/libre and open-source software)
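A first data-quality pass can be sketched with the standard library alone: count missing cells per column and list the most frequent values. The sample column names and the set of tokens treated as missing are assumptions for illustration; pandas or the tools listed above would do the same with less code.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO

MISSING_MARKERS = {"", "NA", "N/A", "null"}  # assumption: tokens treated as missing


def profile_columns(stream: TextIO, top: int = 3) -> Dict[str, dict]:
    """Per column: number of missing cells and the most frequent values."""
    reader = csv.DictReader(stream)
    columns = reader.fieldnames or []
    missing = Counter()
    values = {name: Counter() for name in columns}
    for row in reader:
        for name in columns:
            cell = (row.get(name) or "").strip()  # short rows yield None
            if cell in MISSING_MARKERS:
                missing[name] += 1
            else:
                values[name][cell] += 1
    return {name: {"missing": missing[name], "top": values[name].most_common(top)}
            for name in columns}


sample = io.StringIO(
    "branch,customer,amount,due_date,payment_date\n"
    "B1,C001,120.50,2021-03-01,2021-03-05\n"
    "B1,C002,,2021-03-10,\n"
    "B2,C001,80.00,2021-04-01,2021-03-30\n"
)
for column, stats in profile_columns(sample).items():
    print(column, stats)
```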
Data Preparation
Cook before eating.
Goals
- Processing raw (primary) data, which is rarely ready to feed to your algorithms
- Fixing missing values and inconsistencies
- Converting between different representations of the same datum (e.g. dates, decimal numbers)
Tools
- Python¹
- Visual tools: OpenRefine¹, AWS Glue DataBrew
¹ FLOSS (free/libre and open-source software)
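Converting between representations can be sketched as two small normalizers: one for dates written in several formats, one for localized decimal numbers such as `1.234,56`. The accepted formats are assumptions, not verified against the Kaggle file; empty cells are returned as `None` so that filling missing values stays a separate, explicit step.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical formats; ambiguous ones (e.g. 03/04/2021) are read day-first here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def parse_date(raw: str) -> Optional[date]:
    """Normalize a date string to datetime.date; empty cells become None (missing)."""
    raw = raw.strip()
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def parse_amount(raw: str, decimal_sep: str = ",", thousands_sep: str = ".") -> Optional[Decimal]:
    """Normalize a localized number (default: 1.234,56) to Decimal; empty cells become None."""
    raw = raw.strip()
    if not raw:
        return None
    normalized = raw.replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(normalized)
    except InvalidOperation as exc:
        raise ValueError(f"unrecognized amount: {raw!r}") from exc


print(parse_date("31/12/2021"), parse_amount("1.234,56"))  # 2021-12-31 1234.56
```

Decimal is used instead of float so that currency amounts keep their exact value.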
Feature Engineering
Knowledge is power.
Goals
- Using domain knowledge to augment the data with derived information (feature extraction), which usually improves the performance of ML models
- Selecting the smallest set of features with the greatest predictive significance (feature selection)
- Removing redundant or useless information
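Two illustrative derived features, sketched in Python under stated assumptions: payment terms (days from document date to due date) and the customer's average delay on previously paid invoices. The `Invoice` record and all field names are hypothetical; the key design point is that the customer history only includes invoices already paid when the current one was issued, which keeps future information out of the training data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Invoice:  # hypothetical record layout, not the dataset's actual columns
    customer: str
    doc_date: date
    due_date: date
    payment_date: Optional[date]  # None while the invoice is still open


def add_features(invoices: List[Invoice]) -> List[Dict[str, object]]:
    """Derive features per invoice, returned in input order."""
    by_customer: Dict[str, List[Invoice]] = defaultdict(list)
    for inv in invoices:
        by_customer[inv.customer].append(inv)
    rows = []
    for inv in invoices:
        # Only use invoices already paid when this one was issued, so the
        # feature never peeks into the future (avoids target leakage).
        past_delays = [
            (p.payment_date - p.due_date).days
            for p in by_customer[inv.customer]
            if p.payment_date is not None and p.payment_date < inv.doc_date
        ]
        rows.append({
            "payment_terms_days": (inv.due_date - inv.doc_date).days,
            "customer_paid_before": len(past_delays),
            "customer_avg_delay": sum(past_delays) / len(past_delays) if past_delays else None,
        })
    return rows
```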
Model Selection and Training
One model does not fit all.
Goals
- Identifying candidate models for the problem and dataset at hand
- Splitting the dataset into a training set and a test set
- Training the candidates and comparing their performance
- Optimizing hyperparameters (i.e. settings that control the learning process rather than being learned from the data) and fine-tuning to select “The One”
Dataset split: whole dataset (100%) = training set (~80%, used for model selection) + test set (~20%, used for evaluation)
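The split above, plus the k-fold cross-validation commonly used for hyperparameter tuning on the training set, can be sketched with the standard library. The 80/20 ratio comes from the slide; the fixed seed, the contiguous-fold scheme and the function names are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then hold out test_fraction of the items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """(train, validation) index pairs for k-fold cross-validation on the training set.

    Folds are contiguous, so the training set should already be shuffled.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        folds.append((train, validation))
        start += size
    return folds


train, test = train_test_split(range(50_000))
print(len(train), len(test))  # 40000 10000
```

The test set stays untouched until the final evaluation; cross-validation only reshuffles roles inside the training set. Because invoices are time-ordered, splitting by document date (train on older, test on newer) may give a more realistic estimate; that is a design alternative, not what the slide describes.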
Evaluation
Is it really “The One”?
Goals
- Testing the best candidate model on the test set to see how it behaves on unseen data (generalization)
(Figure; x-axis: model complexity, # of parameters)
Classification Metrics
Confusion Matrix (figure; its diagonal holds the correct predictions)
TP = true positives, FP = false positives, FN = false negatives
Precision (“1-5” class) = TP / (TP + FP) = 6674 / 10087 = 66.16%
  10,087 samples predicted as “1-5”: 6,674 TP + 3,413 FP
Recall (“1-5” class) = TP / (TP + FN) = 6674 / 10542 = 63.31%
  10,542 samples that are actually “1-5”: 6,674 TP + 3,868 FN
Accuracy (unweighted) = Pc / Pt = 23872 / 32382 = 73.72%
  32,382 total predictions (Pt): 23,872 correct (Pc) + 8,510 errors
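A minimal standard-library sketch that recomputes these metrics from raw counts and shows how per-class precision and recall fall out of a full confusion matrix. The slide only reports counts for the “1-5” class and the overall totals, so those are what the example uses; the row/column orientation (rows = actual, columns = predicted) and the 0.0 convention for empty denominators are assumptions.

```python
from typing import Dict, List, Sequence


def safe_div(num: int, den: int) -> float:
    return num / den if den else 0.0  # convention: 0.0 when the denominator is empty


def precision_recall(tp: int, fp: int, fn: int) -> Dict[str, float]:
    return {"precision": safe_div(tp, tp + fp), "recall": safe_div(tp, tp + fn)}


def accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Unweighted accuracy: diagonal (correct predictions) over all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return safe_div(correct, total)


def per_class(confusion: Sequence[Sequence[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Assumed layout: rows = actual class, columns = predicted class."""
    result = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fp = sum(row[i] for row in confusion) - tp  # predicted as label, actually another class
        fn = sum(confusion[i]) - tp                 # actually label, predicted as another class
        result[label] = precision_recall(tp, fp, fn)
    return result


# Counts reported on the slide for the "1-5" class:
m = precision_recall(tp=6674, fp=3413, fn=3868)
print(f"precision={m['precision']:.2%} recall={m['recall']:.2%} accuracy={23872 / 32382:.2%}")
# precision=66.16% recall=63.31% accuracy=73.72%
```

In practice, scikit-learn's `sklearn.metrics` module (`confusion_matrix`, `precision_score`, `recall_score`, `accuracy_score`) computes the same quantities; its `confusion_matrix` also puts true labels on rows and predicted labels on columns.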
Don’t Try This at Home!
Just clone the following repository and have fun!
https://github.com/klinamen/ds0-experimenting-with-data
Thank you.
experiments never fail.
