RESUME CLASSIFICATION
1.) MR. MOIN DALVI
2.) MR. ZOHEB KAZI
3.) MR. SOUD ALHODA
4.) SNEHAL LAWANDE
5.) MR. ANAND JAGDALE
6.) MR. SWAPNIL WADKAR
7.) MR. NAGENDRA P
BUSINESS OBJECTIVE:
The document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.
Abstract:
A resume is a brief summary of a candidate's skills and experience. Company recruiters and HR teams have a tough time scanning thousands of qualified resumes. Spending too many labor hours segregating candidates' resumes manually is a waste of a company's time, money, and productivity. Recruiters therefore use resume classification to streamline the resume and applicant screening process. NLP technology allows recruiters to electronically gather, store, and organize large quantities of resumes. Once acquired, the resume data can be easily searched and analyzed.
Resumes are an ideal example of unstructured data. Since there is no widely accepted resume layout, each resume may have its own style of formatting, different text blocks, and different category titles. Building a resume classifier and gathering text from resumes is no easy task because resumes come in so many different layouts.
INTRODUCTION:
In this project we build a machine learning model for resume classification using Python and basic natural language processing (NLP) techniques. We use Python libraries to implement various NLP techniques such as tokenization, lemmatization, and part-of-speech tagging.
Resume classification technology is needed to make it easy for companies to process the huge number of resumes they receive. This technology converts unstructured resume data into a structured format. Resumes arrive as documents, so the text must first be extracted before it can be classified according to the requirements. A resume classifier analyzes the resume data and extracts the information into machine-readable output. It helps automatically store, organize, and analyze resume data to find suitable candidates for a particular job position and its requirements. This helps organizations eliminate the error-prone and time-consuming process of going through thousands of resumes manually and improves recruiters’ efficiency.
The basic data analysis process is followed: data collection, data cleaning, exploratory data analysis, data visualization, and model building. The dataset consists of two columns, Role Applied and Resume: the ‘Role Applied’ column holds the domain or industry field, and the ‘Resume’ column holds the text extracted from the resume document for each domain and industry.
The aim of this project is achieved by applying these data analysis methods together with machine learning models and natural language processing, which help in classifying resumes into categories and building the Resume Classification Model.
EXPLORATORY DATA ANALYSIS:
The dataset contains 9 profile types in total, and most of the resumes are for the Workday profile.
Text is extracted from the individual resume files and collected into a data frame with one column for the resume text and one for the profile each resume applied for.
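The extraction step could be sketched as follows. This is a minimal illustration that assumes the resumes are .docx files (the slides do not state the file format); the function names `docx_to_text` and `build_rows` are hypothetical, and only the column names Role Applied and Resume come from the slides. It relies on the fact that a .docx file is a zip archive whose body text lives in `word/document.xml`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside every .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx file given a path or a file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph holds its text in one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text.strip())
    return "\n".join(paragraphs)


def build_rows(files_by_profile):
    """Turn {profile: [docx sources]} into rows with the two dataset columns."""
    rows = []
    for profile, sources in files_by_profile.items():
        for source in sources:
            rows.append({"Role Applied": profile, "Resume": docx_to_text(source)})
    return rows
```

The resulting list of rows can be handed to whatever data-frame library the project uses; other formats such as PDF would need a different reader.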
FEATURE ENGINEERING:
The data extracted above is converted into a data frame so that it can be used as the features (predictors, attributes, or inputs) from which the model predicts the different classes.
TEXT PRE-PROCESSING:
Text pre-processing includes converting text to lowercase; removing extra spaces, HTML links, emails, symbols, numbers, and stop-words; and tokenization and lemmatization.
Removing all unwanted characters
Word tokenization - Tokenization is essentially splitting a phrase, sentence, paragraph, or an entire text document into smaller units, such as individual words or terms. Each of these smaller units is called a token.
Removing stop-words - A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.
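The cleaning steps named in this section could be sketched roughly as below. This is an illustration under stated assumptions, not the project's actual code: the function name `preprocess`, the regular expressions, and the very short `STOP_WORDS` list are all hypothetical, and lemmatization is left out because it needs a dictionary-backed library.

```python
import re

# A tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "and", "to", "is", "for", "with", "at"}


def preprocess(text):
    """Lowercase, strip links/emails/symbols/numbers, tokenize, drop stop-words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links
    text = re.sub(r"\S+@\S+", " ", text)                  # email addresses
    text = re.sub(r"[^a-z\s]", " ", text)                 # symbols and numbers
    tokens = text.split()                                 # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

Links and emails are removed before symbols so that their punctuation does not leave stray fragments behind.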
Figures: sample resume text before and after text pre-processing.
The Porter stemming algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflectional endings from words in English.
Figures: sample resume text before and after applying Porter stemming.
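To give a feel for the kind of suffix stripping the Porter stemmer performs, the sketch below implements only its first rule group (step 1a, which handles plural endings). It is deliberately incomplete: the full algorithm has several further steps, and a real project would normally rely on a library implementation. The function name `porter_step_1a` is hypothetical.

```python
def porter_step_1a(word):
    """Apply step 1a of the Porter algorithm, which handles plural endings."""
    if word.endswith("sses"):
        return word[:-2]      # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]      # ponies -> poni
    if word.endswith("ss"):
        return word           # caress -> caress
    if word.endswith("s"):
        return word[:-1]      # cats -> cat
    return word
```

Note that stems such as "poni" are not dictionary words; stemming only aims to map related word forms to the same token.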
EXPLORATORY DATA ANALYSIS:
WORD CLOUD:
The 10 most common words used in the resumes of each profile
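Counting the most common words per profile could look like the sketch below. It assumes each resume has already been tokenized and is paired with its profile name; the function name `top_words_per_profile` is hypothetical.

```python
from collections import Counter, defaultdict


def top_words_per_profile(rows, n=10):
    """Return {profile: [(word, count), ...]} with the n most frequent words."""
    counts = defaultdict(Counter)
    for profile, tokens in rows:
        counts[profile].update(tokens)
    return {profile: counter.most_common(n) for profile, counter in counts.items()}
```

The same per-profile counts can also feed a word cloud, where word size is driven by frequency.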
Classes in the data frame
Plotting classes for insights
There are 4 classes in total in the data frame, which means this is a multiclass classification problem.
The dataset is imbalanced, so oversampling techniques can be used.
10 Most Common Words Used in Different Classes
FEATURE ENGINEERING:
Count Vectorizer with N-grams (bigrams and trigrams)
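What a count vectorizer with bigrams and trigrams computes can be sketched in plain Python as below. This is an illustration of the idea only, not the vectorizer used in the project; the function names `ngrams` and `count_vectorize` are hypothetical, and the input is assumed to be already tokenized.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectorize(docs, ngram_sizes=(2, 3)):
    """Build a sorted vocabulary of n-grams and one count vector per document."""
    per_doc = [Counter(g for n in ngram_sizes for g in ngrams(doc, n)) for doc in docs]
    vocab = sorted(set().union(*per_doc))
    matrix = [[counts[term] for term in vocab] for counts in per_doc]
    return vocab, matrix
```

Each row of the matrix is one resume and each column is one n-gram, so multi-word skills such as "sql server" become features in their own right.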
TF-IDF Vectorizer with N-grams (bigrams and trigrams)
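The TF-IDF weighting can be sketched as follows. The slides do not say which TF-IDF formula was used, so this example assumes a commonly used smoothed form, idf = ln((1 + N) / (1 + df)) + 1, followed by L2 normalization of each row. The function name `tfidf` is hypothetical, and each document is assumed to be a list of terms (which may be the bigrams and trigrams).

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, matrix) of L2-normalised TF-IDF weights for tokenised docs."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    matrix = []
    for c in counts:
        row = [c[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        matrix.append([w / norm for w in row])
    return vocab, matrix
```

Terms that appear in every resume get a lower weight than terms that are specific to a few resumes, which is what makes TF-IDF more discriminative than raw counts.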
Problems with imbalanced data classification
Put simply, the main problem with prediction on an imbalanced dataset is how accurately we are actually predicting both the majority and the minority class.
• SMOTE: Synthetic Minority Oversampling Technique
SMOTE is an oversampling technique in which synthetic samples are generated for the minority class. This algorithm helps to overcome the overfitting problem posed by random oversampling. It works in feature space, generating new instances by interpolating between positive (minority-class) instances that lie close together.
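The interpolation at the heart of SMOTE can be sketched as below. This is a simplified illustration, not a full implementation: it picks a random minority sample, chooses one of its k nearest minority neighbours, and places a new point somewhere on the line segment between them. The function name `smote` and its parameters are hypothetical, and the minority set is assumed to contain at least two numeric feature vectors.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic
```

Because the new points lie between existing samples rather than on top of them, the classifier sees a broader minority region instead of exact duplicates. Oversampling is normally applied to the training data only.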
TRAIN TEST SPLIT:
Problems with random data splitting
Put simply, the main problem with splitting the data randomly is that the ratio of the classes is not necessarily preserved in the training and testing sets. With random splitting, one class can be heavily sampled into the training set, creating a majority and minority class issue (imbalanced data), which gives rise to bad scores on the test data, poor overall performance, and misclassification.
• Stratified sampling:
In stratified sampling, the ratio of all the classes is maintained in both the training and the testing data, so this type of split generally gives more reliable accuracy and overall model performance.
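A stratified split can be sketched by splitting each class separately, as below. The function name `stratified_split` and the 25% test fraction are assumptions for illustration; the slides do not state the split ratio used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class, with at least one test sample each.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

With 8 resumes of one class and 4 of another, the test set receives 2 and 1 respectively, so the 2:1 class ratio is the same in both parts.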
FEATURE ENGINEERING:
Figures: class distribution before and after oversampling.
When one class has many more records than the others, the classifier may become biased towards predicting that majority class. The traditional approach of judging a classifier by its overall accuracy is therefore not useful for an imbalanced dataset.
In this case, the confusion matrix for the classification problem shows how well the model classifies each target class, and the accuracy of the model can be derived from it.
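How a confusion matrix is built, and how accuracy falls out of it, can be sketched as follows. The function names and the class labels used in the usage example are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

The off-diagonal cells show which classes are confused with each other, which a single accuracy figure hides.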
MODEL BUILDING:
If we use random sampling to split the dataset into a training set and a test set, we might get a majority of one class in training and a minority of another in testing. A model trained on such a split will obviously get bad evaluation scores.
Stratified sampling is the solution: it maintains the ratio of all classes in both the training and the testing data.
The solution to the first problem, getting different accuracy scores for different values of the random state parameter, is to use K-Fold Cross-Validation. But K-Fold Cross-Validation also suffers from the second problem, i.e. random sampling.
The solution to both the first and the second problem is to use Stratified K-Fold Cross-Validation. Stratified k-fold cross-validation is the same as plain k-fold cross-validation, except that it does stratified sampling instead of random sampling.
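Stratified k-fold cross-validation can be sketched by dealing the samples of each class across the folds in turn, as below. The function name `stratified_kfold` and its defaults are hypothetical; this is an illustration of the idea rather than the routine used in the project.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve the class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx
```

Every sample is used for testing exactly once, and averaging the scores over the k folds removes the dependence on a single random split.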
MODEL EVALUATION:
Accuracy on Test Data
Precision on Test Data
Recall Score on Test Data
F1-Score on Test Data
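For a multiclass problem, precision, recall, and F1-score are usually computed per class and then averaged. The sketch below uses macro (unweighted) averaging; the slides do not say which averaging was used, so that choice and the function names are assumptions.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class precision, recall and F1."""
    labels = sorted(set(y_true))
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    return tuple(sum(column) / len(labels) for column in zip(*scores))
```

Macro averaging gives every class the same weight, so poor performance on a small class is not hidden by a large one.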
The Random Forest classification model has 100% accuracy on both the test and the training dataset: 0% error and 100% recall, precision, and F1-score, with no overfitting, underfitting, or misclassification.
MODEL SELECTION:
DEPLOYMENT:
Resume_Clasification.pptx

More Related Content

What's hot

はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018
はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018
はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018
Yoshiki Hayama
 
これからはじめる Power Platform
これからはじめる Power Platformこれからはじめる Power Platform
これからはじめる Power Platform
Rie Okuda
 
「顧客の声を聞かない」とはどういうことか
「顧客の声を聞かない」とはどういうことか「顧客の声を聞かない」とはどういうことか
「顧客の声を聞かない」とはどういうことか
Yoshiki Hayama
 
ノーコードでAIサービスを使ってみよう!「AI Bulder」
ノーコードでAIサービスを使ってみよう!「AI Bulder」ノーコードでAIサービスを使ってみよう!「AI Bulder」
ノーコードでAIサービスを使ってみよう!「AI Bulder」
典子 松本
 
Webマーケティングの全体像
Webマーケティングの全体像Webマーケティングの全体像
Webマーケティングの全体像
ナイル株式会社
 
DMMで自己組織化に向けてやってきたこと
DMMで自己組織化に向けてやってきたことDMMで自己組織化に向けてやってきたこと
DMMで自己組織化に向けてやってきたこと
satoshinaito3
 

What's hot (6)

はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018
はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018
はじめてのUXデザイン、はじめてのデザイン思考 〜現場で使えるように〜:ISE Technical Conference 2018
 
これからはじめる Power Platform
これからはじめる Power Platformこれからはじめる Power Platform
これからはじめる Power Platform
 
「顧客の声を聞かない」とはどういうことか
「顧客の声を聞かない」とはどういうことか「顧客の声を聞かない」とはどういうことか
「顧客の声を聞かない」とはどういうことか
 
ノーコードでAIサービスを使ってみよう!「AI Bulder」
ノーコードでAIサービスを使ってみよう!「AI Bulder」ノーコードでAIサービスを使ってみよう!「AI Bulder」
ノーコードでAIサービスを使ってみよう!「AI Bulder」
 
Webマーケティングの全体像
Webマーケティングの全体像Webマーケティングの全体像
Webマーケティングの全体像
 
DMMで自己組織化に向けてやってきたこと
DMMで自己組織化に向けてやってきたことDMMで自己組織化に向けてやってきたこと
DMMで自己組織化に向けてやってきたこと
 

Similar to Resume_Clasification.pptx

Resume_Clasification.pptx
Resume_Clasification.pptxResume_Clasification.pptx
Resume_Clasification.pptx
MOINDALVS
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
IRJET Journal
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
Datacademy.ai
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
Aditya Bhattacharya
 
Text Mining Analytics 101
Text Mining Analytics 101Text Mining Analytics 101
Text Mining Analytics 101
Manohar Swamynathan
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
Derek Kane
 
1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx
1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx
1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx
honey725342
 
Building an Immersive, Interactive Customer Experience using AI and Augmented...
Building an Immersive, Interactive Customer Experience using AI and Augmented...Building an Immersive, Interactive Customer Experience using AI and Augmented...
Building an Immersive, Interactive Customer Experience using AI and Augmented...
Amazon Web Services
 
Gender voice recognition.pptx
Gender voice recognition.pptxGender voice recognition.pptx
Gender voice recognition.pptx
Rohith572864
 
IRJET - Response Analysis of Educational Videos
IRJET - Response Analysis of Educational VideosIRJET - Response Analysis of Educational Videos
IRJET - Response Analysis of Educational Videos
IRJET Journal
 
Vikram emerging technologies
Vikram emerging technologiesVikram emerging technologies
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
Anshik Bansal
 
What is pattern recognition (lecture 4 of 6)
What is pattern recognition (lecture 4 of 6)What is pattern recognition (lecture 4 of 6)
What is pattern recognition (lecture 4 of 6)
Randa Elanwar
 
Data Science Using Python.pptx
Data Science Using Python.pptxData Science Using Python.pptx
Data Science Using Python.pptx
Sarkunavathi Aribal
 
Predicting Tweet Sentiment
Predicting Tweet SentimentPredicting Tweet Sentiment
Predicting Tweet Sentiment
Lucinda Linde
 
Screening of Mental Health in Adolescents using ML.pptx
Screening of Mental Health in Adolescents using ML.pptxScreening of Mental Health in Adolescents using ML.pptx
Screening of Mental Health in Adolescents using ML.pptx
NitishChoudhary23
 
Top 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdfTop 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdf
Suraj Kumar
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
AlgoAnalytics Financial Consultancy Pvt. Ltd.
 
Feature Engineering in NLP.pdf
Feature Engineering in NLP.pdfFeature Engineering in NLP.pdf
Feature Engineering in NLP.pdf
bilaje4244prolugcom
 
Case Study 2 SCADA WormProtecting the nation’s critical infra.docx
Case Study 2 SCADA WormProtecting the nation’s critical infra.docxCase Study 2 SCADA WormProtecting the nation’s critical infra.docx
Case Study 2 SCADA WormProtecting the nation’s critical infra.docx
wendolynhalbert
 

Similar to Resume_Clasification.pptx (20)

Resume_Clasification.pptx
Resume_Clasification.pptxResume_Clasification.pptx
Resume_Clasification.pptx
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
 
Text Mining Analytics 101
Text Mining Analytics 101Text Mining Analytics 101
Text Mining Analytics 101
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
 
1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx
1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx
1 Exploratory Data Analysis (EDA) by Melvin Ott, PhD.docx
 
Building an Immersive, Interactive Customer Experience using AI and Augmented...
Building an Immersive, Interactive Customer Experience using AI and Augmented...Building an Immersive, Interactive Customer Experience using AI and Augmented...
Building an Immersive, Interactive Customer Experience using AI and Augmented...
 
Gender voice recognition.pptx
Gender voice recognition.pptxGender voice recognition.pptx
Gender voice recognition.pptx
 
IRJET - Response Analysis of Educational Videos
IRJET - Response Analysis of Educational VideosIRJET - Response Analysis of Educational Videos
IRJET - Response Analysis of Educational Videos
 
Vikram emerging technologies
Vikram emerging technologiesVikram emerging technologies
Vikram emerging technologies
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
 
What is pattern recognition (lecture 4 of 6)
What is pattern recognition (lecture 4 of 6)What is pattern recognition (lecture 4 of 6)
What is pattern recognition (lecture 4 of 6)
 
Data Science Using Python.pptx
Data Science Using Python.pptxData Science Using Python.pptx
Data Science Using Python.pptx
 
Predicting Tweet Sentiment
Predicting Tweet SentimentPredicting Tweet Sentiment
Predicting Tweet Sentiment
 
Screening of Mental Health in Adolescents using ML.pptx
Screening of Mental Health in Adolescents using ML.pptxScreening of Mental Health in Adolescents using ML.pptx
Screening of Mental Health in Adolescents using ML.pptx
 
Top 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdfTop 40 Data Science Interview Questions and Answers 2022.pdf
Top 40 Data Science Interview Questions and Answers 2022.pdf
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 
Feature Engineering in NLP.pdf
Feature Engineering in NLP.pdfFeature Engineering in NLP.pdf
Feature Engineering in NLP.pdf
 
Case Study 2 SCADA WormProtecting the nation’s critical infra.docx
Case Study 2 SCADA WormProtecting the nation’s critical infra.docxCase Study 2 SCADA WormProtecting the nation’s critical infra.docx
Case Study 2 SCADA WormProtecting the nation’s critical infra.docx
 

Recently uploaded

Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 

Recently uploaded (20)

Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 

Resume_Clasification.pptx

  • 1. RESUME CLASSIFICATION 1 . ) M R . M O I N D A L V I 2 . ) M R . Z O H E B K A Z I 3 . ) M R . S O U D A L H O D A 5 . ) M R . A N A N D J A G D A L E 6 . ) M R . S W A P N I L W A D K A R 7 . ) M R . N A G E N D R A P 4 . ) S N E H A L L A W A N D E
  • 2. B U S I N E S S O B J E C T I V E - The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention Abstract: A resume is a brief summary of your skills and experience. Companies recruiters and HR teams have a tough time scanning thousands of qualified resumes. Spending too many labor hours segregating candidates resume's manually is a waste of a company's time, money, and productivity. Recruiters, therefore, use resume classification in order to streamline the resume and applicant screening process. NLP technology allows recruiters to electronically gather, store, and organize large quantities of resumes. Once acquired, the resume data can be easily searched through and analyzed. Resumes are an ideal example of unstructured data. Since there is no widely accepted resume layout, each resume may have its own style of formatting, different text blocks and different category titles. Building a resume classification and gathering text from it is no easy task as there are so many kinds of layouts of resumes that you could imagine
  • 3. I N T R O D U C T I O N : In this project we dive into building a Machine learning model for Resume Classification using Python and basic Natural language processing techniques. We would be using Python's libraries to implement various NLP (natural language processing) techniques like tokenization, lemmatization, parts of speech tagging, etc. A resume classification technology needs to be implemented in order to make it easy for the companies to process the huge number of resumes that are received by the organizations. This technology converts an unstructured form of resume data into a structured data format. The resumes received are in the form of documents from which the data needs to be extracted first such that the text can be classified or predicted based on the requirements. A resume classification analyzes resume data and extracts the information into the machine readable output. It helps automatically store, organize, and analyze the resume data to find out the candidate for the particular job position and requirements. This thus helps the organizations eliminate the error-prone and time-consuming process of going through thousands of resumes manually and aids in improving the recruiters’ efficiency. The basic data analysis process is performed such as data collection, data cleaning, exploratory data analysis, data visualization, and model building. The dataset consists of two columns, namely, Role Applied and Resume, where ‘role applied’ column is the domain field of the industry and ‘resume’ column consists of the text extracted from the resume document for each domain and industry. The aim of this project is achieved by performing the various data analytical methods and using the Machine Learning models and Natural Language Processing which will help in classifying the categories of the resume and building the Resume Classification Model.
  • 4. E X P L O R A T O R Y D A T A A N A L Y S I S :
  • 5. E X P L O R A T O R Y D A T A A N A L Y S I S : The resumes in this project cover 9 profile types in total, and most of them are for the Workday profile.
  • 6. E X P L O R A T O R Y D A T A A N A L Y S I S : Extracting text from the different resume files and creating a data frame with one column for the text of each resume and one column for the profile it was submitted for.
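The extraction step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the resumes have already been converted to plain-text files and stored in one sub-folder per profile, and the function name `load_resumes` is hypothetical. The column names follow the dataset description (Role Applied, Resume).

```python
from pathlib import Path


def load_resumes(root):
    """Return one row per resume as {'Role Applied': ..., 'Resume': ...}.

    Assumed layout (hypothetical): <root>/<profile name>/<resume>.txt
    """
    rows = []
    for path in sorted(Path(root).glob("*/*.txt")):
        rows.append({
            "Role Applied": path.parent.name,  # folder name is used as the label
            "Resume": path.read_text(encoding="utf-8", errors="ignore"),
        })
    return rows
```

A list of rows in this shape can be handed to a data-frame library to get the two-column table described on this slide.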
  • 7. F E A T U R E E N G I N E E R I N G :
  • 8. F E A T U R E E N G I N E E R I N G :
  • 9. F E A T U R E E N G I N E E R I N G :
  • 10. F E A T U R E E N G I N E E R I N G :
  • 11. F E A T U R E E N G I N E E R I N G :
  • 12. F E A T U R E E N G I N E E R I N G :
  • 13. F E A T U R E E N G I N E E R I N G : Converting the data extracted above into a data frame to use as features (predictors, attributes, or inputs) for the model to predict the different classes.
  • 14. T E X T P R E - P R O C E S S I N G : Text pre-processing includes converting to lowercase; removing extra spaces, HTML links, emails, symbols, numbers, and stop-words; tokenization; and lemmatization. Removing all unwanted characters. Word tokenization - Tokenization splits a phrase, sentence, paragraph, or entire text document into smaller units, such as individual words or terms. Each of these smaller units is called a token. Removing stop-words - A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.
  • 15. T E X T P R E - P R O C E S S I N G : Resume text before and after text pre-processing.
  • 16. T E X T P R E - P R O C E S S I N G : The Porter stemming algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflectional endings from words in English. Resume text before and after applying Porter stemming.
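A minimal sketch of the cleaning steps listed above, using only the Python standard library. The stop-word set here is a tiny illustrative assumption (real lists are much longer), the function name `preprocess` is hypothetical, and lemmatization is left out because it needs a language resource that this sketch does not include.

```python
import re

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "in", "and", "or", "of", "to",
              "is", "for", "with", "at", "me"}


def preprocess(text):
    """Lowercase, strip links/emails/symbols/numbers, tokenize, drop stop-words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove links
    text = re.sub(r"\S+@\S+", " ", text)                # remove emails
    text = re.sub(r"[^a-z\s]", " ", text)               # remove symbols and numbers
    tokens = text.split()                               # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("5 years in Workday HCM!")` keeps only the tokens `years`, `workday`, and `hcm`.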
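To give a feel for how suffix stripping works, here is a sketch of only the first rule group of the Porter algorithm (step 1a, which handles plural endings). It is not the full stemmer, and the function name `porter_step_1a` is hypothetical; in practice a library implementation of the complete algorithm would normally be used.

```python
def porter_step_1a(word):
    """Apply Porter step 1a only: SSES -> SS, IES -> I, SS -> SS, S -> ''."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word
```

Note that a stem such as `poni` is not a dictionary word; stemming only aims to map related word forms to the same token.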
  • 17. E X P L O R A T O R Y D A T A A N A L Y S I S :
  • 18. W O R D C L O U D :
  • 19. E X P L O R A T O R Y D A T A A N A L Y S I S : 10 most common words used in each Profile Resumes
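Counting the most common words per profile can be sketched as below. It assumes each resume is already tokenized and paired with its profile label; the function name `top_words_per_profile` and the sample profile names in the usage are hypothetical.

```python
from collections import Counter, defaultdict


def top_words_per_profile(rows, n=10):
    """rows: iterable of (profile, tokens). Returns {profile: [(word, count), ...]}."""
    counts = defaultdict(Counter)
    for profile, tokens in rows:
        counts[profile].update(tokens)  # accumulate word counts per profile
    return {profile: c.most_common(n) for profile, c in counts.items()}
```

The resulting (word, count) pairs are what a bar chart of the top 10 words per profile would be drawn from.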
  • 20. E X P L O R A T O R Y D A T A A N A L Y S I S :
  • 21. E X P L O R A T O R Y D A T A A N A L Y S I S :
  • 22. E X P L O R A T O R Y D A T A A N A L Y S I S :
  • 23. E X P L O R A T O R Y D A T A A N A L Y S I S :
  • 24. E X P L O R A T O R Y D A T A A N A L Y S I S :
  • 25. E X P L O R A T O R Y D A T A A N A L Y S I S : Classes in the data frame, plotted for insights. There are 4 classes in total in the data frame, which makes this a multiclass classification problem. The dataset is imbalanced, so oversampling techniques can be used.
  • 26. E X P L O R A T O R Y D A T A A N A L Y S I S : 10 Most Common Words Used in Different Classes
  • 27. F E A T U R E E N G I N E E R I N G : Count Vectorizer with N-grams (Bigrams & Trigrams)
  • 28. F E A T U R E E N G I N E E R I N G : TF-IDF Vectorizer with N-grams (Bigrams & Trigrams)
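What a count vectorizer with bigrams and trigrams computes can be sketched in plain Python as follows. This is an illustration of the idea, not the library class the slides used; the names `ngrams` and `count_vectorize` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectorize(docs, ngram_range=(2, 3)):
    """docs: list of token lists. Returns (vocabulary, document-term count matrix)."""
    per_doc = []
    for tokens in docs:
        counts = Counter()
        for n in range(ngram_range[0], ngram_range[1] + 1):
            counts.update(ngrams(tokens, n))
        per_doc.append(counts)
    vocab = sorted(set().union(*per_doc))  # every n-gram seen in any document
    matrix = [[counts[term] for term in vocab] for counts in per_doc]
    return vocab, matrix
```

Each row of the matrix is one resume, and each column counts how often one bigram or trigram occurs in it.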
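The TF-IDF weighting can be sketched with the textbook formula: term frequency in the document times the log of (number of documents / number of documents containing the term). Library vectorizers often use smoothed or normalised variants, so their numbers may differ from this sketch. The function name `tfidf` is hypothetical; the tokens passed in could equally be bigrams or trigrams.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in docs for term in set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: (tf[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in tf
        })
    return weights
```

A term that appears in every document gets weight 0 under this formula, which is why TF-IDF down-weights words common to all resumes.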
  • 29. F E A T U R E E N G I N E E R I N G : Problems with imbalanced data classification. Put simply, the main problem with prediction on an imbalanced dataset is how accurately we are actually predicting both the majority and the minority class. • SMOTE (Synthetic Minority Oversampling Technique): SMOTE is an oversampling technique in which synthetic samples are generated for the minority class. This algorithm helps to overcome the overfitting problem posed by random oversampling. It works in feature space, generating new instances by interpolating between minority-class instances that lie close together.
  • 30. T R A I N T E S T S P L I T : Problems with random data splitting. Put simply, the main problem is that when the data is split randomly, the class ratios are not preserved in the training and testing sets. Random splitting can sample one class heavily into the training set, creating a majority and minority class issue (imbalanced data) that leads to poor scores on test data, weaker overall performance, and misclassification. • Stratified sampling: In stratified sampling, the ratio of all the classes is maintained in both the training and testing data, so this type of split generally gives better accuracy and overall model performance.
  • 31. F E A T U R E E N G I N E E R I N G : Class distribution before and after oversampling. When one class has many more records than the others, the classifier may become biased towards the majority class. Judging a classifier by accuracy alone is therefore not useful for an imbalanced dataset.
  • 32. F E A T U R E E N G I N E E R I N G : When one class has many more records than the others, the classifier may become biased towards the majority class. In this case, the confusion matrix shows how well the model classifies each target class, and the model's accuracy can be computed from it.
  • 33. M O D E L B U I L D I N G : If we split the dataset into training and test sets by random sampling, we might get a majority of one class in training and a minority of another in testing. A model trained on such a split will likely get poor evaluation scores. Stratified sampling solves this by maintaining the ratio of all classes in both the training and the testing data.
  • 34. M O D E L B U I L D I N G : The first problem, getting different accuracy scores for different random state parameter values, is addressed by K-Fold Cross-Validation. But K-Fold Cross-Validation still suffers from the second problem, random sampling. The solution to both problems is Stratified K-Fold Cross-Validation, which is the same as K-Fold Cross-Validation except that it uses stratified sampling instead of random sampling.
  • 35. M O D E L E V A L U A T I O N : Accuracy, precision, recall, and F1-score on test data.
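The interpolation idea behind SMOTE can be sketched as follows: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. This is a simplified illustration that generates a single synthetic sample, not the full algorithm or a library implementation; the function name `smote_sample` and its parameters are hypothetical.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point from a list of minority-class feature vectors.

    Assumes at least two minority samples; k is the number of neighbours considered.
    """
    rng = rng or random.Random(0)
    x = rng.choice(minority)
    others = [p for p in minority if p is not x]
    # k nearest minority neighbours of x by squared Euclidean distance
    neighbours = sorted(
        others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p))
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment from x to the neighbour
    return [a + gap * (b - a) for a, b in zip(x, neighbour)]
```

Repeating this until the minority class matches the majority class size gives the balanced class distribution that oversampling aims for.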
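A stratified train/test split can be sketched by splitting each class separately and then combining the pieces, so both sets keep the original class ratio. This is a minimal illustration; the function name `stratified_split` and its default parameters are assumptions, not the project's actual code.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.25, seed=0):
    """Return (train_indices, test_indices), preserving class ratios."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction from every class (at least one test sample each)
        n_test = max(1, round(len(idxs) * test_size))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)
```

With 8 resumes of one class and 4 of another, a 25% test split takes 2 and 1 respectively, so the 2:1 ratio holds in both sets.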
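How accuracy is read off a confusion matrix can be sketched as below: rows are true classes, columns are predicted classes, and accuracy is the diagonal total divided by the overall total. The function names and the class labels in the usage are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```

The off-diagonal cells show which classes are confused with each other, which accuracy alone hides on an imbalanced dataset.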
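Stratified k-fold splitting can be sketched by dealing the indices of each class round-robin into k folds, so every fold has roughly the same class mix. This illustration does not shuffle and is not the library implementation; the function name `stratified_kfold` is hypothetical.

```python
from collections import defaultdict


def stratified_kfold(labels, k=5):
    """Yield (train_indices, test_indices) for each of k stratified folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal each class evenly across the folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Training and scoring the model once per fold and averaging the scores gives an estimate that depends less on any single split.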
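For reference, precision, recall, and F1-score for one class can be computed from true and predicted labels as sketched below. The function name `per_class_scores` is hypothetical; averaging these values over all classes gives a macro-averaged score for a multiclass problem.

```python
def per_class_scores(y_true, y_pred, cls):
    """Return (precision, recall, f1) treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
    fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
    fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```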
  • 36. M O D E L E V A L U A T I O N :
  • 37. M O D E L S E L E C T I O N : The Random Forest classification model has 100% accuracy on both the test and training datasets: 0% error and 100% recall, precision, and F1-score, with no overfitting, underfitting, or misclassification.
  • 38. D E P L O Y M E N T :
  • 39. D E P L O Y M E N T :
  • 40. D E P L O Y M E N T :
  • 41. D E P L O Y M E N T :
  • 42. D E P L O Y M E N T :
  • 43. D E P L O Y M E N T :
  • 44. D E P L O Y M E N T :
  • 45. D E P L O Y M E N T :
  • 46. D E P L O Y M E N T :