Fake Job Recruitment Detection Using
Machine Learning Approach
ABSTRACT
• To avoid fraudulent post for job in the internet, an automated tool
using machine learning based classification techniques is proposed in
the project.
• Different classifiers are used for checking fraudulent post in the web
and the results of those classifiers are compared for identifying the
best employment scam detection model.
• It helps in detecting fake job posts from an enormous number of
posts.
• Two major types of classifiers, such as single classifier and ensemble
classifiers are considered for fraudulent job posts detection.
• However, experimental results indicate that ensemble classifiers are
the best classification to detect scams over the single classifiers.
INTRODUCTION
• Employment scam is one of the serious issues in recent times
addressed in the domain of Online Recruitment Frauds (ORF).
• In recent days, many companies prefer to post their
vacancies online so that these can be accessed easily and
timely by the job-seekers.
• However, this intention may be one type of scam by the fraud
people because they offer employment to job-seekers in
terms of taking money from them.
• Fraudulent job advertisements can be posted against a
reputed company for violating their credibility.
4
LITERATURE SURVEY
TITLE AUTHOR Pros Cons DATASET METRICES
A Survey of Machine Learning
Based Approaches for blood
Disease Prediction 2021
Shubham Bind1 Used Realtime NIR
Images
Less Accuracy NIR Dataset 75
Classifcation of blood’s
disease and its stages using
machine learning
2022
John MichaelTempleton active NIR
patches are
extracted
The training
database consist less
image
NIR2D 65
Analysis and Prediction of
blood's Disease using Machine
Learning Algorithms
2022
Sohom Sen
Used 30 NIR Images The system was
trained with Less type
of Disease.
BRAINNIR 84
Detection of blood Disease
Using Machine Learning
2020
G.Priyadharshini,
2T.Gowtham
High Accuracy lowest accuracy with
40%.
NIRDATA 90
Predicting Severity Of blood's
Disease Using Deep Learning
2020
Srishti Grover Dataset
belonging to
Multiple category
Accuracy Low database with
Disease by Name
65
Existing system
• The Existing system are predicted using
classifiers that have been learned.
• When detecting fraudulent job postings, the
following classifiers are used-
• A. Naive Bayes
• B. Support Vector Machine
• C. Logistic Regression
Review Spam Detection
 People frequently share their opinions on the things they
buy on online forums.
 It might be useful to other buyers while they're deciding
what to buy.
 In this setting, spammers can modify reviews for financial
advantage, necessitating the development of
algorithms to detect spam reviews.
 This can be done by extracting features from the
reviews and using Natural Language Processing to do so
(NLP).
 These features are subjected to machine learning
algorithms.
Fake News Detection
 In social media, fake news is defined by malicious user
accounts and echo chamber effects.
 Fake news identification is based on three perspectives:
how fake news is written, how fake news spreads, and
how a user is connected to fake news.
 To identify fake news, features linked to news content
and social context are retrieved, and machine learning
algorithms are applied.
DISADVANTAGES:
• Less Features considered while training the Data
which results in Low Accuracy.
• Support Vector Machine model Having Less
Processing Accuracy compare to the Proposed
system.
• Also it’s a complex and Time Consuming Process.
PROPOSED SYSTEM:
• This system's main purpose is to identify whether a job posting is
genuine or not.
• Job seekers will be able to focus entirely on legitimate job
openings if fake job postings are identified and deleted.
• In this system, we plan to use a Kaggle dataset that contains
information on the job, including attributes such as job id, title,
location, and department.
• Then there's data preprocessing, which involves removing things
like trivial spaces, null entries, stopwords, and so on.
• The data is provided to the classifier for predictions after it has
been preprocessed and cleaned to make it prediction ready.
ARCHITECTURE
METHODOLOGY
 The proposed method is entirely composed of Artificial
Intelligence approaches, which is critical to
accurately classify between the real and the fake,
instead of using algorithms that are unable to mimic
cognitive functions.
 The three-part method is a combination between
Machine Learning algorithms that subdivide into
supervised learning techniques, and natural language
processing methods.
 Although each of these approaches can be solely
used to classify and detect fake Job, in order to
increase the accuracy and be applicable to the
social media domain, they have been combined into
an integrated algorithm as a method for fake news
detection
ALGORITHM
 A multilayer perceptron (MLP) Regional
Convolutional Neural Network is an artificial neural
network, with an input layer, one or more hidden
layers, and an output layer.
 RCNN can be as simple as having each of the three
layers; however, in our experiments we have fine-
tuned the model with various parameters and
number of layers to generate an optimum predicting
mode
Advantages
• success in detection of fake Jobs and posts using
various Machine learning approaches.
• everchanging characteristics and features of fake
Jobs in Online networks is posing a challenge in
categorization of fake news but proposed system
performs with the Better Accuracy of the system.
MODULES
 Data Collection
 Dataset
 Data Preparation
 Feature Extraction
 Implementation of Classifier
Data Collection
 Dataset Details
 This kaggle dataset contains 17,880 job posting
data entries.
 We must first preprocess this data in order to
prepare it for prediction before fitting it into any of
the machine learning models or classifiers.
 Some pre-processing techniques are used on this
dataset before it is fitted to any classifier.
Data Preprocessing
 Preprocessing data is the process of transforming raw
data into a clean data set.
 Before running the algorithm, the dataset is
preprocessed to check for missing values, noisy data,
and other irregularities.
 It also removes noise and uninformative characters and
words from the text, as well as stop-words, extraneous
attributes, and extra space.
Feature Extraction
 Feature extraction is a step in the dimensionality
reduction process, which divides and reduces a
large set of raw data into smaller groupings.
 The fact that these enormous data sets have a
large number of variables is the most crucial
feature.
 To process these variables, a large amount of
computational power is required.
 So, by selecting and merging variables into
features, feature extraction aids in extracting the
best feature from those large data sets,
effectively lowering the amount of data.
Implementation of
Classifier
 In this section, proper parameters are used to train
classifiers.
 For predictions, this framework used Multi Layer
Perception (MLP) models.
 MLP has a number of distinguishing characteristics, as a
result of which it has gained notoriety and has shown
promising experimental results.
 To partition the data points, MLP constructs a
 hyper level in authentic input space.
SOFTWARE REQUIREMENTS
 Front End – Anaconda IDE
 Backend – SQL
 Language – Python 3.8
HARDWARE REQUIREMENTS:
• RAM
• Processor
• Hard Disk
: 4GB and Higher
: Intel i3 and above
: 500GB:
Minimum
UML DIAGRAM’S USE CASE
CLASS
DIAGRAM
SEQUENCE
DIAGRAM
ACTIVITY
DIAGRAM
CONCLUTION
• Only reputable business offers will be sent to you.
• Several machine learning methods are proposed for
detecting employment scams.
• In this work, we discuss counter measures. Supervised
mechanism is used to demonstrate the utilization of
many mechanisms.
• Classifiers for detecting job scams. The results of the
experiments show that MLP is effective.
• The classifier exceeds its peers in classification. The
proposed method had a 98 percent accuracy rate.
Which is significantly greater than current approaches.
REFERENCE
[1] S. Anita, P. Nagarajan, G. A. Sairam, P. Ganesh, and G. Deepakkumar,
“Fake Job Detection and Analysis Using Machine Learning and Deep Learning
Algorithms,” Rev. GEINTECGESTAO Inov. E Tecnol., vol. 11, no. 2, pp. 642–650,
2021.
[2] B. Alghamdi and F. Alharby, “An intelligent model for online recruitment
fraud detection,” J. Inf. Secur., vol. 10, no. 03, p. 155, 2019.
[3] “Report | Cyber.gov.au.” https://www.cyber.gov.au/acsc/report
(accessed Jun. 19, 2021).
[4] A. Pagotto, “Text Classification with Noisy Class Labels.” Carleton University,
2020. [5] “Employment Scam Aegean Dataset.”
http://emscad.samos.aegean.gr/ (accessed Jun. 19, 2021).
[6] S. Vidros, C. Kolias, G. Kambourakis, and L. Akoglu, “Automatic detection of
online recruitment frauds: Characteristics, methods, and a public dataset,”
Futur. Internet, vol. 9, no. 1, p. 6, 2017.
– THANK YOU

Fake Job Detection PPT.pptx using python

  • 1.
    Fake Job RecruitmentDetection Using Machine Learning Approach
  • 2.
    ABSTRACT • To avoidfraudulent post for job in the internet, an automated tool using machine learning based classification techniques is proposed in the project. • Different classifiers are used for checking fraudulent post in the web and the results of those classifiers are compared for identifying the best employment scam detection model. • It helps in detecting fake job posts from an enormous number of posts. • Two major types of classifiers, such as single classifier and ensemble classifiers are considered for fraudulent job posts detection. • However, experimental results indicate that ensemble classifiers are the best classification to detect scams over the single classifiers.
  • 3.
    INTRODUCTION • Employment scamis one of the serious issues in recent times addressed in the domain of Online Recruitment Frauds (ORF). • In recent days, many companies prefer to post their vacancies online so that these can be accessed easily and timely by the job-seekers. • However, this intention may be one type of scam by the fraud people because they offer employment to job-seekers in terms of taking money from them. • Fraudulent job advertisements can be posted against a reputed company for violating their credibility.
  • 4.
    4 LITERATURE SURVEY TITLE AUTHORPros Cons DATASET METRICES A Survey of Machine Learning Based Approaches for blood Disease Prediction 2021 Shubham Bind1 Used Realtime NIR Images Less Accuracy NIR Dataset 75 Classifcation of blood’s disease and its stages using machine learning 2022 John MichaelTempleton active NIR patches are extracted The training database consist less image NIR2D 65 Analysis and Prediction of blood's Disease using Machine Learning Algorithms 2022 Sohom Sen Used 30 NIR Images The system was trained with Less type of Disease. BRAINNIR 84 Detection of blood Disease Using Machine Learning 2020 G.Priyadharshini, 2T.Gowtham High Accuracy lowest accuracy with 40%. NIRDATA 90 Predicting Severity Of blood's Disease Using Deep Learning 2020 Srishti Grover Dataset belonging to Multiple category Accuracy Low database with Disease by Name 65
  • 5.
    Existing system • TheExisting system are predicted using classifiers that have been learned. • When detecting fraudulent job postings, the following classifiers are used- • A. Naive Bayes • B. Support Vector Machine • C. Logistic Regression
  • 6.
    Review Spam Detection People frequently share their opinions on the things they buy on online forums.  It might be useful to other buyers while they're deciding what to buy.  In this setting, spammers can modify reviews for financial advantage, necessitating the development of algorithms to detect spam reviews.  This can be done by extracting features from the reviews and using Natural Language Processing to do so (NLP).  These features are subjected to machine learning algorithms.
  • 7.
    Fake News Detection In social media, fake news is defined by malicious user accounts and echo chamber effects.  Fake news identification is based on three perspectives: how fake news is written, how fake news spreads, and how a user is connected to fake news.  To identify fake news, features linked to news content and social context are retrieved, and machine learning algorithms are applied.
  • 8.
    DISADVANTAGES: • Less Featuresconsidered while training the Data which results in Low Accuracy. • Support Vector Machine model Having Less Processing Accuracy compare to the Proposed system. • Also it’s a complex and Time Consuming Process.
  • 9.
    PROPOSED SYSTEM: • Thissystem's main purpose is to identify whether a job posting is genuine or not. • Job seekers will be able to focus entirely on legitimate job openings if fake job postings are identified and deleted. • In this system, we plan to use a Kaggle dataset that contains information on the job, including attributes such as job id, title, location, and department. • Then there's data preprocessing, which involves removing things like trivial spaces, null entries, stopwords, and so on. • The data is provided to the classifier for predictions after it has been preprocessed and cleaned to make it prediction ready.
  • 10.
  • 11.
    METHODOLOGY  The proposedmethod is entirely composed of Artificial Intelligence approaches, which is critical to accurately classify between the real and the fake, instead of using algorithms that are unable to mimic cognitive functions.  The three-part method is a combination between Machine Learning algorithms that subdivide into supervised learning techniques, and natural language processing methods.  Although each of these approaches can be solely used to classify and detect fake Job, in order to increase the accuracy and be applicable to the social media domain, they have been combined into an integrated algorithm as a method for fake news detection
  • 12.
    ALGORITHM  A multilayerperceptron (MLP) Regional Convolutional Neural Network is an artificial neural network, with an input layer, one or more hidden layers, and an output layer.  RCNN can be as simple as having each of the three layers; however, in our experiments we have fine- tuned the model with various parameters and number of layers to generate an optimum predicting mode
  • 13.
    Advantages • success indetection of fake Jobs and posts using various Machine learning approaches. • everchanging characteristics and features of fake Jobs in Online networks is posing a challenge in categorization of fake news but proposed system performs with the Better Accuracy of the system.
  • 14.
    MODULES  Data Collection Dataset  Data Preparation  Feature Extraction  Implementation of Classifier
  • 15.
    Data Collection  DatasetDetails  This kaggle dataset contains 17,880 job posting data entries.  We must first preprocess this data in order to prepare it for prediction before fitting it into any of the machine learning models or classifiers.  Some pre-processing techniques are used on this dataset before it is fitted to any classifier.
  • 16.
    Data Preprocessing  Preprocessingdata is the process of transforming raw data into a clean data set.  Before running the algorithm, the dataset is preprocessed to check for missing values, noisy data, and other irregularities.  It also removes noise and uninformative characters and words from the text, as well as stop-words, extraneous attributes, and extra space.
  • 17.
    Feature Extraction  Featureextraction is a step in the dimensionality reduction process, which divides and reduces a large set of raw data into smaller groupings.  The fact that these enormous data sets have a large number of variables is the most crucial feature.  To process these variables, a large amount of computational power is required.  So, by selecting and merging variables into features, feature extraction aids in extracting the best feature from those large data sets, effectively lowering the amount of data.
  • 18.
    Implementation of Classifier  Inthis section, proper parameters are used to train classifiers.  For predictions, this framework used Multi Layer Perception (MLP) models.  MLP has a number of distinguishing characteristics, as a result of which it has gained notoriety and has shown promising experimental results.  To partition the data points, MLP constructs a  hyper level in authentic input space.
  • 19.
    SOFTWARE REQUIREMENTS  FrontEnd – Anaconda IDE  Backend – SQL  Language – Python 3.8
  • 20.
    HARDWARE REQUIREMENTS: • RAM •Processor • Hard Disk : 4GB and Higher : Intel i3 and above : 500GB: Minimum
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
    CONCLUTION • Only reputablebusiness offers will be sent to you. • Several machine learning methods are proposed for detecting employment scams. • In this work, we discuss counter measures. Supervised mechanism is used to demonstrate the utilization of many mechanisms. • Classifiers for detecting job scams. The results of the experiments show that MLP is effective. • The classifier exceeds its peers in classification. The proposed method had a 98 percent accuracy rate. Which is significantly greater than current approaches.
  • 26.
    REFERENCE [1] S. Anita,P. Nagarajan, G. A. Sairam, P. Ganesh, and G. Deepakkumar, “Fake Job Detection and Analysis Using Machine Learning and Deep Learning Algorithms,” Rev. GEINTECGESTAO Inov. E Tecnol., vol. 11, no. 2, pp. 642–650, 2021. [2] B. Alghamdi and F. Alharby, “An intelligent model for online recruitment fraud detection,” J. Inf. Secur., vol. 10, no. 03, p. 155, 2019. [3] “Report | Cyber.gov.au.” https://www.cyber.gov.au/acsc/report (accessed Jun. 19, 2021). [4] A. Pagotto, “Text Classification with Noisy Class Labels.” Carleton University, 2020. [5] “Employment Scam Aegean Dataset.” http://emscad.samos.aegean.gr/ (accessed Jun. 19, 2021). [6] S. Vidros, C. Kolias, G. Kambourakis, and L. Akoglu, “Automatic detection of online recruitment frauds: Characteristics, methods, and a public dataset,” Futur. Internet, vol. 9, no. 1, p. 6, 2017.
  • 27.