OPTIMIZING
IRRIGATION
TEAM-O
SONALI GULERIA
INCHARA B. DIWAKA
CONTENT
‣ OBJECTIVE
‣ DATASET
‣ DATA PRE-PROCESSING
‣ BASELINE MODEL
‣ PREDICTION APPROACH 1
‣ PREDICTION APPROACH 2
‣ PREDICTION APPROACH 3
‣ MODEL COMPARISON : CHOOSING THE MODEL
‣ RECOMMENDATIONS
OBJECTIVE
‣ Build a prediction engine that the government can use to forecast and budget a village's irrigation demand, and that farmers can use to plan their irrigation methods.
‣ The model predicts the percentage of agricultural land irrigated.
‣ This will help reduce dependence on rainfall and strengthen farming decisions through a more empirical approach.
DATASET
▸Source: India Open data portal
▸Picked states with diverse socio-economic conditions to get a broad representation in the data.
▸350+ dimensions across various domains (population, education, irrigation sources, household, etc.) at the village level.
▸Merged in rainfall data to enrich the dataset.
DATA PRE-PROCESSING
[Diagram labels: SANITATION, AGRO, EDUCATION, LAND USAGE, CONNECTIVITY, WATER, INDEXED DATA]
FEATURE SELECTION
DATA PRE-PROCESSING
▸Exhaustive backward feature selection.
▸Reduced the features from 51 (indexed data) to the 17 most informative features.
FEATURE
District Name
Total Geographical Area in Hectares
Total Population of Village
EDUCATION_GOVT
Water
Sanitation
AGRO_Rating
Power Supply For Agriculture Use Status (Active: 1 NA 2)
Power Supply For Commercial Use Summer April Sept per day in Hours
Power Supply For All Users Status (Active: 1 NA 2)
Agro_commodity
Manufacture
Area under Non Agricultural Uses in Hectares
Cultivable Waste Land Area in Hectares
Fallows Land other than Current Fallows Area in Hectares
Current Fallows Area in Hectares
Net Area Sown in Hectares
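The backward selection step above can be sketched as a greedy elimination loop: start from all candidate features and repeatedly drop the one whose removal costs the least, until the target count is reached. This is a minimal sketch and not the team's code. `backward_select`, `toy_score`, and the toy weights are hypothetical, and a greedy loop only approximates an exhaustive search.

```python
from typing import Callable, List, Sequence


def backward_select(
    features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    n_keep: int,
) -> List[str]:
    """Greedy backward elimination: higher score_fn means a better subset."""
    selected = list(features)
    while len(selected) > n_keep:
        # Try removing each remaining feature and keep the best reduced subset.
        selected = max(
            ([f for f in selected if f != drop] for drop in selected),
            key=score_fn,
        )
    return selected


# Hypothetical usefulness weights standing in for a cross-validated model score.
toy_weights = {"Water": 3.0, "Sanitation": 2.0, "Manufacture": 0.5, "noise": 0.0}


def toy_score(subset: List[str]) -> float:
    return sum(toy_weights[f] for f in subset)


kept = backward_select(list(toy_weights), toy_score, n_keep=2)
```

In practice `score_fn` would be something like the negative cross-validated RMSE of a model fitted on the candidate subset, and `n_keep` would be 17 for the reduction described above.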
BASELINE MODEL
CLUSTERING
‣ Random Forest on all 350+ raw predictors.
‣ Root Mean Square Error: 12.2
REGRESSION
‣ Default Model: Mean of Percentage Irrigated
‣ Root Mean Square Error: 29.16
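To make the default model concrete, the sketch below computes the RMSE of always predicting the training-set mean. The function names and the sample percentages are hypothetical and are not the deck's data.

```python
import math
from statistics import mean
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mean_baseline_rmse(train_y: Sequence[float], test_y: Sequence[float]) -> float:
    """RMSE of always predicting the training-set mean."""
    constant = mean(train_y)
    return rmse(test_y, [constant] * len(test_y))


# Hypothetical percentages of irrigated land, for illustration only.
train = [10.0, 20.0, 30.0]
test = [20.0, 40.0]
baseline = mean_baseline_rmse(train, test)
```

Any candidate model should beat this number on the same held-out rows to be worth considering.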
PREDICTION PART 1
CLUSTERING
▸State-wise clustering using Partitioning Around Medoids (PAM).
▸2 clusters chosen based on the highest silhouette width.
THE BIG PICTURE
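The silhouette criterion used to pick the number of clusters can be illustrated as follows. This sketch computes only the average silhouette width for a given labelling. It does not implement PAM itself, and the one-dimensional points and labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def average_silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette width over all points (Euclidean distance)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    widths: List[float] = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from p to the other points of each cluster.
        mean_dist: Dict[int, float] = {}
        for c in clusters:
            members = [
                q for j, (q, lab) in enumerate(zip(points, labels))
                if lab == c and j != i
            ]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        if own not in mean_dist:
            widths.append(0.0)  # Convention: a singleton cluster scores 0.
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical data: two well-separated groups on a line.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = average_silhouette(pts, [0, 0, 1, 1])
bad = average_silhouette(pts, [0, 1, 0, 1])
```

Running a clustering for several candidate cluster counts and keeping the count with the highest average width mirrors the selection rule stated on the slide.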
PREDICTION PART 1 (CONTD)
BAGGING
▸10-fold cross-validation, tuning the mtry parameter.
▸Achieved R-squared of approx. 0.95 at mtry = 12.
BOOSTING
▸Gradient boosting tuned over n.trees and interaction depth, with the learning rate and the minimum number of observations per node held constant.
LINEAR REGRESSION
▸10-fold cross-validation performed to calculate OLS estimates.
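The 10-fold cross-validation used throughout this part can be sketched with a one-predictor least-squares fit. This is an illustration under simplifying assumptions: folds are contiguous (real data would normally be shuffled first), the model has a single predictor, and all names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def cv_rmse(xs: Sequence[float], ys: Sequence[float], k: int = 10) -> float:
    """Cross-validated RMSE pooled over all held-out predictions."""
    squared_errors: List[float] = []
    for train, test in k_fold_indices(len(xs), k):
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors += [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical, exactly linear data: the held-out error should be close to 0.
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x for x in xs]
score = cv_rmse(xs, ys, k=10)
```

Tuning a parameter such as mtry or n.trees follows the same pattern: compute the cross-validated error for each candidate value and keep the value with the lowest error.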
PREDICTION PART 2
REGULARIZED REGRESSION
▸Performed 10-fold cross-validation for regularized regression on both raw and indexed data.
▸Better performance for indexed data.
▸Lasso (alpha = 1) outperformed ridge (alpha = 0) and the mid-point mix (alpha = 0.5).
[Figure: Top Left: Lasso (MSE vs log(Lambda)); Top Right: Mid (MSE vs log(Lambda)); Bottom Left: Ridge (MSE vs log(Lambda)); Bottom Right: RMSE comparison of all three]
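The three alpha settings can be read as points on one penalty family. The sketch below assumes alpha is a glmnet-style mixing parameter, which the values 0, 0.5, and 1 suggest but the slides do not state. The function name and coefficients are hypothetical.

```python
from typing import Sequence


def elastic_net_penalty(coefs: Sequence[float], lam: float, alpha: float) -> float:
    """glmnet-style penalty: lam * (alpha * L1 + (1 - alpha) / 2 * L2^2)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


beta = [3.0, -4.0, 0.0]  # hypothetical coefficients (intercept excluded)
lasso = elastic_net_penalty(beta, lam=1.0, alpha=1.0)  # pure L1 penalty
ridge = elastic_net_penalty(beta, lam=1.0, alpha=0.0)  # pure L2 penalty
mid = elastic_net_penalty(beta, lam=1.0, alpha=0.5)    # equal mix of both
```

The L1 term is what lets the lasso set some coefficients exactly to zero, while lambda is the strength chosen by cross-validation along the curves in the figure.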
PREDICTION PART 3
PRINCIPAL COMPONENT ANALYSIS
▸Performed PCA on both raw and indexed data.
▸Higher performance with indexed data.
▸Performed 10-fold cross-validated regression using various numbers of top principal components.

Principal Components   Variance   R-squared   CV RMSE
PC5 (Knee)             52.00%     0.28        24
PC15                   81.22%     0.436       21.4
PC19                   89.78%     0.498       20.1
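The variance column above reads like a cumulative share of variance retained by the leading components. Under that assumption, the sketch below shows how such shares are derived from per-component variances and how many components a given target needs. The eigenvalues are hypothetical and are not the deck's data.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_variance(variances: Sequence[float]) -> List[float]:
    """Cumulative share of total variance, components sorted largest first."""
    total = sum(variances)
    ordered = sorted(variances, reverse=True)
    return [running / total for running in accumulate(ordered)]


def components_needed(variances: Sequence[float], target: float) -> int:
    """Smallest number of leading components whose cumulative share >= target."""
    for count, share in enumerate(cumulative_variance(variances), start=1):
        if share >= target:
            return count
    return len(variances)


# Hypothetical component variances (eigenvalues), for illustration only.
eigenvalues = [4.0, 3.0, 2.0, 1.0]
shares = cumulative_variance(eigenvalues)
needed = components_needed(eigenvalues, target=0.85)
```

A high retained variance does not guarantee predictive power, because the components are chosen without looking at the response.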
MODEL COMPARISON
▸Bagging performs best, with the lowest RMSE and the highest lift.

DEFAULT MODEL   DEFAULT RMSE   MODEL               MODEL RMSE   LIFT
Random Forest   12.2           Bagging             5.12         0.580328
Random Forest   12.2           Boosting            7.62         0.37541
Mean            29.16          Linear Regression   13.8         0.526749
Mean            29.16          Lasso               18.2         0.375857
Mean            29.16          Ridge               18.3         0.372428
Mean            29.16          Mid                 18.3         0.372428
Mean            29.16          PC model            23.9         0.180384
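The lift values in the comparison are consistent with the fractional RMSE reduction relative to each row's default model, that is, (default RMSE - model RMSE) / default RMSE. The sketch below recomputes them from the reported RMSEs. The function and variable names are hypothetical.

```python
from typing import List, Tuple


def lift(baseline_rmse: float, model_rmse: float) -> float:
    """Fractional RMSE reduction relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# (default model, default RMSE, model, model RMSE) as reported in the comparison.
rows: List[Tuple[str, float, str, float]] = [
    ("Random Forest", 12.2, "Bagging", 5.12),
    ("Random Forest", 12.2, "Boosting", 7.62),
    ("Mean", 29.16, "Linear Regression", 13.8),
    ("Mean", 29.16, "Lasso", 18.2),
    ("Mean", 29.16, "Ridge", 18.3),
    ("Mean", 29.16, "Mid", 18.3),
    ("Mean", 29.16, "PC model", 23.9),
]

lifts = {model: round(lift(base, model_rmse), 6) for _, base, model, model_rmse in rows}
best = max(lifts, key=lifts.get)
```

Because two different default models are used, lifts are directly comparable only between rows that share the same default. The RMSE column is the common yardstick across all rows.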
RECOMMENDATIONS
▸The most important features in our data set are power supply, electricity, and education.
▸Various other features are correlated with one another and give an empirical picture of the socio-economic condition of villages.
▸The government can use these features to drive the budget and other village-related policies.
▸The model can be improved further by incorporating other informative features such as temperature and soil composition and type.

Predictive modelling
