American Express Campus
Analyze This 2019
Final Submission
Team Details
Name | Campus | Roll No. | Mobile No. | Email Id
Krishna Priya | IIT Roorkee | 15411007 | 8910091352 | kpriya@es.iitr.ac.in
Manish Kumar Kushwaha | IIT Roorkee | 15110013 | 9456522346 | mkushwaha@ar.iitr.ac.in
Team Name : KMAnalytica
Estimation Technique Used
Please provide the estimation/modeling technique(s)/approach
used to arrive at the solution/equation
• Iterative imputer (tried because it gave the closest approximation of the missing values, but not used in the end because the results were not good enough).
• Feature transformation.
• Outlier removal.
• Creation of interaction features (between variables).
• XGBoost, LightGBM.
• Model tuning and optimization.
Strategy to decide final list
Please provide the strategy employed to decide the final list for
submission
• Filled VAR10 by forward filling after observing the pattern in its values.
• Identified outliers using quantiles and the mean, and removed them.
• Tried to impute missing values with an iterative imputer, but the results were not satisfactory.
• Applied reciprocal, square-root and Gaussian transformations to skewed variables.
• Prevented overfitting by controlling max_depth and min_child_weight.
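The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the team's actual code: the 1st/99th percentile cut-offs, the function names, and the reading of "Gaussian transformation" as a rank-based mapping to normal quantiles are all assumptions made for the example.

```python
import math
from statistics import NormalDist, quantiles


def forward_fill(values):
    """Replace each None with the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def quantile_bounds(values, lower_pct=1, upper_pct=99):
    """Return (low, high) cut-offs at the given percentiles (assumed: 1 and 99)."""
    present = [v for v in values if v is not None]
    cuts = quantiles(present, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def drop_outlier_rows(rows, column, lower_pct=1, upper_pct=99):
    """Keep rows whose value in `column` is missing or inside the bounds."""
    low, high = quantile_bounds([r[column] for r in rows], lower_pct, upper_pct)
    return [r for r in rows if r[column] is None or low <= r[column] <= high]


def reciprocal(x):
    """1/x, left missing when x is missing or zero."""
    return None if x is None or x == 0 else 1.0 / x


def square_root(x):
    """sqrt(x), left missing when x is missing or negative."""
    return None if x is None or x < 0 else math.sqrt(x)


def rank_gaussian(values):
    """Map values to standard-normal quantiles by rank (one assumed
    reading of 'Gaussian transformation'); expects no missing values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = normal.inv_cdf((rank + 0.5) / n)
    return out


# Hypothetical data: VAR10 with gaps, and one column with an extreme value.
var10 = forward_fill([None, 1, None, None, 4, None])
rows = [{"x": v} for v in list(range(1, 101)) + [10000]]
kept = drop_outlier_rows(rows, "x")
```

Rows with a missing value are kept by the outlier filter on purpose, since the slides say the missing values were ultimately left for the tree models to handle.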
Details of each Variable used in the logic/model/strategy
Please provide details of each variable used in the final logic
These are a few variables that I added on top of the existing features:
• FICO / CMV.
• Reported annual business revenue / Average amount paid towards card bills in the last 3 months.
• Risk score associated with probability of default * Average utilization of credit line in the last six months (Balance / credit line).
• Reported annual business income / Average amount paid towards card bills in the last 3 months.
• TSR / Months in Business.
• FICO / TSR.
• TSR * CMV.
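As a rough sketch of how ratio and product features like these might be computed, the snippet below works on one record at a time. The dictionary keys (`fico`, `cmv`, `tsr`, and so on) are hypothetical names chosen for this example, not the competition's column names, and leaving a feature as None when an input is missing or a denominator is zero is an assumption.

```python
def safe_ratio(numerator, denominator):
    """numerator / denominator, or None if an input is missing or the denominator is 0."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def safe_product(a, b):
    """a * b, or None if either input is missing."""
    if a is None or b is None:
        return None
    return a * b


def add_interaction_features(row):
    """Return a copy of `row` with the ratio and product features added."""
    get = row.get
    out = dict(row)
    out["fico_per_cmv"] = safe_ratio(get("fico"), get("cmv"))
    out["revenue_per_paid_3m"] = safe_ratio(get("annual_revenue"), get("avg_paid_3m"))
    out["risk_x_util_6m"] = safe_product(get("risk_score"), get("avg_util_6m"))
    out["income_per_paid_3m"] = safe_ratio(get("annual_income"), get("avg_paid_3m"))
    out["tsr_per_months"] = safe_ratio(get("tsr"), get("months_in_business"))
    out["fico_per_tsr"] = safe_ratio(get("fico"), get("tsr"))
    out["tsr_x_cmv"] = safe_product(get("tsr"), get("cmv"))
    return out


# Hypothetical record; avg_paid_3m is missing, so both ratios that use it stay None.
example = add_interaction_features({
    "fico": 700, "cmv": 350, "tsr": 10, "months_in_business": 20,
    "annual_revenue": 120000, "annual_income": 60000, "avg_paid_3m": None,
    "risk_score": 0.2, "avg_util_6m": 0.5,
})
```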
On top of these, I generated three statistical features for each column:
• The value count of each value in that column, mapped back onto the column, with a maximum limit of 10.
• (Feature value - mean) * value count, with a minimum limit of 1.
• Feature value * value count, with a minimum limit of 2.
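The slides do not say exactly what the limits apply to. The sketch below takes one possible reading, in which each limit clips the value count before it is used; that interpretation, the function name, and the handling of missing values are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean


def count_features(values, max_count=10, min_count_centered=1, min_count_scaled=2):
    """For each value return a tuple of three features:
    (capped count, (value - mean) * count, value * count).
    Missing values (None) produce (None, None, None)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    column_mean = mean(present)
    features = []
    for v in values:
        if v is None:
            features.append((None, None, None))
            continue
        c = counts[v]
        features.append((
            min(c, max_count),                           # count, capped at 10
            (v - column_mean) * max(c, min_count_centered),  # count floored at 1
            v * max(c, min_count_scaled),                # count floored at 2
        ))
    return features


# Hypothetical column: the value 1 occurs three times, 5 occurs once; the mean is 2.
feats = count_features([1, 1, 1, 5])
```

Count-based features of this kind let a tree model see how common a value is, which may explain why they helped on scaled data where the raw magnitudes carried little meaning.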
Reasons for Technique(s) Used
Why do you think this is the best technique(s) for this particular
problem?
• First of all, the missing values could not be imputed with any logic, since most of the columns had values missing completely at random. That left only two models to explore on this dataset, LightGBM and XGBoost, both of which can handle missing values natively.
• Two types of feature engineering also played a major role:
1) Interaction features between variables, as they captured meanings and relations.
2) Statistical features. Surprisingly, they worked on this dataset: because the values were scaled, those interaction features could not work, but the statistical features brought out meaning for the machine.
• Finally, XGBoost performed a bit better than LightGBM both before and after tuning. Since the dataset was not that large, training time was not an issue, so I opted for XGBoost to gain that extra 0.5%.
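The tuning mentioned above, with max_depth and min_child_weight controlling overfitting, could be organised as a small grid of candidate settings. The sketch below only builds the candidates. The specific values, the base settings, and the function name are hypothetical, and the slides do not state which values were searched or chosen.

```python
from itertools import product


def candidate_params(max_depths=(3, 4, 5), min_child_weights=(1, 5, 10)):
    """Build one parameter dict per (max_depth, min_child_weight) pair.
    All numeric values here are placeholders, not the team's settings."""
    base = {"learning_rate": 0.05, "n_estimators": 500}
    return [
        {**base, "max_depth": depth, "min_child_weight": weight}
        for depth, weight in product(max_depths, min_child_weights)
    ]


grid = candidate_params()
```

Each dictionary uses parameter names accepted by the scikit-learn style wrappers of both XGBoost and LightGBM, so the same grid could be scored with either library under cross-validation. Shallower trees and larger min_child_weight values both restrict how finely the model can split, which is why they act as regularisers.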
Final Submission File
Please embed your final submission file (.csv) here.
THANK YOU
