SALARY PREDICTION
Introduction:
• In today's dynamic job market, predicting salaries accurately plays a pivotal role in various aspects of workforce
management, recruitment, and financial planning. The ability to estimate salaries based on a range of factors
empowers organizations to make informed decisions regarding budget allocation, employee compensation, and
talent acquisition strategies. Therefore, the development of robust salary prediction models has become
increasingly valuable in modern business operations.
• The goal of our project is to construct a reliable salary prediction system that leverages machine learning
techniques to forecast salaries for individuals based on relevant attributes such as education, experience, skills,
and geographic location. By analyzing historical salary data and identifying patterns within the job market, our
aim is to create a model capable of providing accurate salary estimates for new job listings or assessing the
competitiveness of compensation packages offered by employers.
• Through this project, we seek to address several key challenges in salary prediction, including the inherent
variability in compensation across industries, regions, and job roles, as well as the complex interplay of factors
influencing salary determination. By applying advanced machine learning algorithms and feature engineering
techniques to large-scale datasets, we aim to develop a predictive model that not only achieves high accuracy
but also provides insights into the factors driving salary disparities and trends within the job market.
• Ultimately, our salary prediction project aims to empower businesses, recruiters, and job seekers alike with
actionable insights into salary expectations, thereby facilitating more transparent and equitable negotiations,
optimizing resource allocation, and supporting informed decision-making in the realm of human resource
management.
Problem Statement:
• In today's competitive job market, accurately predicting salaries for job positions is essential for organizations
to make informed decisions regarding budget allocation, compensation strategies, and talent acquisition.
However, the task of salary prediction presents several challenges due to the multifaceted nature of salary
determinants and the inherent variability within the job market.
• The primary challenge we aim to address with our salary prediction project is the accurate estimation of
salaries for individuals based on a diverse set of attributes, including but not limited to education level, years of
experience, specialized skills, industry sector, and geographic location. Additionally, we seek to account for the
complex interactions between these factors and their impact on salary levels across different job roles and
industries.
• Furthermore, the availability and quality of data for salary prediction can vary significantly, posing challenges in
terms of data preprocessing, feature selection, and model generalization. Additionally, factors such as inflation,
market demand, and economic conditions introduce temporal variability that must be accounted for in the
prediction process.
• By developing a robust salary prediction model, our objective is to address these challenges and provide
stakeholders with a reliable tool for estimating salaries with a high degree of accuracy and precision. This model
will not only aid organizations in optimizing their recruitment and compensation strategies but also assist job
seekers in negotiating fair and competitive salaries based on their qualifications and market demand.
• In summary, our salary prediction project seeks to bridge the gap between employer expectations and
candidate aspirations by leveraging machine learning techniques to provide transparent and data-driven salary
estimations, thereby facilitating more equitable and informed decision-making in the realm of human resource
management.
About Dataset
• The "Salary Prediction Dataset" is a synthetic dataset generated for exploring salary prediction tasks. It
contains simulated data reflecting factors that influence salary levels, such as education, experience, location, job title,
age, and gender. It can be used for predictive modeling tasks that estimate salaries from these factors.
• # Data Collection
• Salary Prediction Data: predict the salary according to the features
• (From kaggle.com)
• Explore, clean, and prepare the dataset:
• Check the shape of the original dataset:
• We have a total of 1000 rows and 7 columns
• Imported the dataset from file.csv
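The loading and shape check above can be sketched as follows. This is a minimal sketch, not the project's actual notebook: the column names and the four sample rows are hypothetical stand-ins, and the text is read from memory so the example runs without file.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV (the real file has 1000 rows).
# Column names are assumptions, not taken from the original dataset.
csv_text = """Education,Experience,Location,Job_Title,Age,Gender,Salary
High School,8,Urban,Manager,63,Male,84620
PhD,11,Suburban,Director,59,Male,142591
Bachelor,28,Suburban,Manager,61,Female,97800
Master,3,Rural,Analyst,29,Female,68450
"""

# With the real file this would be pd.read_csv("file.csv").
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
print(df.head())
```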
Detailed steps of data exploration:
Data exploration covers the data info, unique values, duplicate values, null values, and a description of the data
that shows the count, minimum, mean, and maximum of each column.
1. Data info 2. Unique values 3. Null values
4. Data description 5. Duplicate values
The data has 0 null values and 0 duplicate values.
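The five exploration steps could look roughly like this in pandas. The small frame and its column names are hypothetical; only the sequence of checks follows the slide.

```python
import pandas as pd

# Hypothetical sample; column names are assumptions for illustration.
df = pd.DataFrame({
    "Education": ["High School", "PhD", "Bachelor", "Master"],
    "Experience": [8, 11, 28, 3],
    "Location": ["Urban", "Suburban", "Suburban", "Rural"],
    "Age": [63, 59, 61, 29],
    "Salary": [84620, 142591, 97800, 68450],
})

df.info()                                    # 1. column types and non-null counts
unique_counts = df.nunique()                 # 2. unique values per column
null_counts = df.isnull().sum()              # 3. null values per column
summary = df.describe()                      # 4. count, mean, min, max, quartiles
n_duplicates = int(df.duplicated().sum())    # 5. fully duplicated rows

print(unique_counts)
print(null_counts)
print(summary)
print("duplicate rows:", n_duplicates)
```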
Exploratory Data Analysis (EDA):
First we plotted a pie chart of the gender value counts to see the gender split.
Male accounts for 51.60% of the records
and female for 48.40%.
# Note: this chart shows each gender's share of the records, not salary, so on its own it does not prove that men earn more than women.
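A share of records and a level of pay are two different quantities. The sketch below, with hypothetical values and column names, computes both so they can be read separately: the percentages a pie chart would display, and the mean salary per gender.

```python
import pandas as pd

# Hypothetical values chosen so that the larger group is not the better paid one.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# What the pie chart shows: percentage of records per gender.
share = df["Gender"].value_counts(normalize=True) * 100

# What a salary comparison needs: mean salary per gender.
mean_salary = df.groupby("Gender")["Salary"].mean()

print(share.round(2))
print(mean_salary)
```

In this made-up sample the majority group has the lower mean salary, which is why the two tables should be checked separately.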
• A bar plot is generated to display the "Job Title" distribution.
• The bar plot offers insight into the frequency of the different job titles.
As the bar plot shows, the Manager job title has the highest frequency and
the Engineer job title the lowest.
The Director and Analyst job titles have an average frequency.
• Donut charts are plotted to visualize the education distribution and the location distribution.
By education, the shares are almost equal, and
people who qualified from high school are noted as having a high salary package.
The location distribution chart suggests that people in rural and
suburban areas have a high salary package.
A heatmap is generated to visualize the correlation matrix of the dataset, providing a comprehensive overview of the relationships
between numerical variables.
Each cell in the heatmap corresponds to the correlation coefficient between the variables represented by its row and column.
The color intensity indicates the strength and direction of the correlation:
1. Dark blue indicates a strong negative correlation.
2. Dark red indicates a strong positive correlation.
3. White indicates no correlation (correlation coefficient close to zero).
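The matrix behind such a heatmap can be computed directly. This is a minimal sketch on synthetic numbers; the column names and the way the values are generated are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic numeric columns (assumed names); salary is built to rise with experience.
rng = np.random.default_rng(0)
age = rng.integers(22, 65, size=200)
experience = np.clip(age - 22 - rng.integers(0, 5, size=200), 0, None)
salary = 40000 + 2500 * experience + rng.normal(0, 5000, size=200)

df = pd.DataFrame({"Age": age, "Experience": experience, "Salary": salary})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))
```

Passing `corr` to a plotting call such as seaborn's `heatmap` with a diverging colormap would give the blue-white-red picture described above.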
A stacked column chart is plotted for the "Age Distribution", offering insight into the frequency of the different age groups.
1. In the plot we found that the 25 to 45 age group has a high frequency of good salary packages.
2. Other age groups have an average salary package.
3. We also found that the middle age group has a high salary package, while the 30 to 40 and 50 to 60 age groups
have a lower salary package.
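Statements about age groups are easier to check when the groups are defined explicitly. A minimal sketch, assuming ten-year bins and hypothetical column names and values, that counts records and averages salary per age group:

```python
import pandas as pd

# Hypothetical ages and salaries for illustration.
df = pd.DataFrame({
    "Age": [23, 28, 34, 37, 45, 52, 58, 63],
    "Salary": [60000, 80000, 110000, 120000, 130000, 100000, 90000, 70000],
})

# Ten-year bins, closed on the left: [20, 30), [30, 40), and so on (an assumption).
bins = [20, 30, 40, 50, 60, 70]
labels = ["20-29", "30-39", "40-49", "50-59", "60-69"]
df["Age_Group"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

# Frequency and mean salary per age group.
by_age = df.groupby("Age_Group", observed=False)["Salary"].agg(["count", "mean"])
print(by_age)
```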
A box plot is generated to visualize the distribution of salary.
• It gives the min, median, and max values of the salary package in the dataset.
• As the box plot shows, the min salary is 40k.
• The median salary package is 1.5 lakhs.
• The max salary package is 1.9 lakhs.
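The values a box plot draws can also be computed numerically. The salaries below are hypothetical, and the 1.5 × IQR whisker rule is the common default rather than something the slides state.

```python
import numpy as np

# Hypothetical salaries for illustration only.
salaries = np.array([40000, 75000, 90000, 105000, 110000,
                     120000, 135000, 150000, 190000])

q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1                      # height of the box

# Points beyond these fences would be drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print("min:", salaries.min(), "median:", median, "max:", salaries.max())
print("IQR:", iqr, "outliers:", outliers)
```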
1. A scatter plot displays the relationship between age and salary. 2. A scatter plot displays the relationship between experience and salary.
1. In the first scatter plot, most people have a salary package between 1 lakh and 1.2 lakh.
2. Highly experienced people are less frequent, and only a few of them have a high salary package of 2 lakh.
3. Older people with more experience get a lower salary package.
#Mean salary for each category
Generated multiple plots to show the mean salary
of each category
1. In the education category, people with a master's degree have
the highest mean salary.
2. In the location category, people located in suburban areas have the highest mean salary.
3. In the job title category, the manager job role has the highest mean salary.
4. In the gender category, the salary package is distributed equally.
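The per-category means behind such plots can be obtained with a group-by. A minimal sketch with hypothetical rows and column names; the numbers are not the project's results.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions.
df = pd.DataFrame({
    "Education": ["Master", "Master", "Bachelor", "Bachelor", "High School", "PhD"],
    "Location": ["Suburban", "Urban", "Suburban", "Rural", "Rural", "Urban"],
    "Job_Title": ["Manager", "Analyst", "Manager", "Engineer", "Analyst", "Director"],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Salary": [120000, 100000, 90000, 70000, 60000, 130000],
})

categorical = ["Education", "Location", "Job_Title", "Gender"]

# One sorted table of mean salaries per categorical column.
mean_salary = {
    col: df.groupby(col)["Salary"].mean().sort_values(ascending=False)
    for col in categorical
}

for col, means in mean_salary.items():
    print(f"\nMean salary by {col}:")
    print(means.round(0))
```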
#Data distribution using pair plots
Pair plot 1: "Age", "Experience", "Salary" by "Education". Pair plot 2: "Age", "Experience", "Salary" by "Job Title".
The pair plot visualizes the relationships between "Age", "Experience", and "Salary" for different levels of "Education". Each
scatterplot in the pair plot represents the relationship between two variables, and the diagonal contains histograms showing
the distribution of each variable.
#Encoding data
1. Import the LabelEncoder class from scikit-learn and
iterate over each categorical variable in the list 'categorical'.
2. Check the data types of all columns in the DataFrame.
There are 2 types of data: int and object.
3. Call df.head() to inspect the first rows and the columns of the data.
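The encoding loop described above might look like the following. The sample frame and the contents of the `categorical` list are assumptions; keeping one fitted encoder per column is an addition so the integer codes can be mapped back to labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows; column names are assumptions.
df = pd.DataFrame({
    "Education": ["Master", "Bachelor", "High School", "PhD"],
    "Location": ["Urban", "Rural", "Suburban", "Urban"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Experience": [10, 4, 20, 15],
    "Salary": [120000, 70000, 90000, 150000],
})

categorical = ["Education", "Location", "Gender"]

encoders = {}
for col in categorical:
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(df[col])   # strings -> integer codes
    encoders[col] = encoder                    # kept to decode later

print(df.dtypes)
print(df.head())
```

Label encoding assigns integers in alphabetical order of the labels, which gives the categories an arbitrary numeric order; a one-hot encoding is a common alternative when that order could mislead a model.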
# Train-test split & # Scaling the data
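A minimal sketch of this step on synthetic data. The 80/20 split ratio, the random seed, and the use of StandardScaler are assumptions, since the slide does not state them. The scaler is fitted on the training split only so that no information from the test split leaks into it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features (stand-ins for age and experience) and a salary-like target.
rng = np.random.default_rng(42)
X = rng.normal(loc=[40, 15], scale=[10, 8], size=(100, 2))
y = 40000 + 2500 * X[:, 1] + rng.normal(0, 5000, size=100)

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data, then apply the same transform to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```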
Linear regression
Linear regression is not suitable for my dataset; it shows low accuracy (0.57%).
R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that is explained
by the independent variables in a regression model. It is a key metric used to evaluate the goodness of fit of the model to the
observed data.
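As a worked example of the definition, R-squared equals 1 minus the ratio of the residual sum of squares to the total sum of squares. The numbers below are made up; the sketch computes the value by hand and compares it with scikit-learn's `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean

r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_pred)

print(r2_manual, r2_sklearn)
```

R-squared is a fraction rather than a percentage: 1 is a perfect fit, 0 is no better than always predicting the mean, and the value can be negative for a model that does worse than that.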
Random forest
A random forest model with hyperparameter tuning shows 97% accuracy.
AdaBoost regressor
The AdaBoost regression model gives 81% accuracy.
We will try another model to get better accuracy.
Support vector regressor
The support vector regressor provides flexible options for customizing the SVR model, including the choice of kernel function,
regularization parameter, and other hyperparameters.
The support vector regressor gives 2.14% accuracy. This model is not suitable for my dataset.
XGBoost Regressor
After applying 5 different models, the XGBoost regression model gives 99% accuracy, so this is the
best model for my dataset.
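A comparison of this kind can be sketched with a single loop that fits each model and scores it with R-squared on the held-out split. The data here is synthetic, all hyperparameters are defaults or assumptions, and scikit-learn's GradientBoostingRegressor is used as a stand-in for XGBoost so the sketch needs no extra package. The scores it prints are therefore not the project's results.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: two numeric features and a salary-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(300, 2))
y = 40000 + 3000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 5000, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear regression": make_pipeline(StandardScaler(), LinearRegression()),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "Gradient boosting (stand-in for XGBoost)": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
```

With default settings, SVR tends to score poorly on an unscaled salary target because its default `C` and `epsilon` are small relative to values in the tens of thousands. That is one possible explanation for a very low SVR score, not a confirmed diagnosis of the result reported above.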
Conclusion and Insights:
Gender
1. The pie chart shows the percentage distribution of gender. Each slice of the pie represents a gender category, and the size of each slice
corresponds to the proportion of that gender category in the dataset. The percentage labels on each slice provide additional information about
the relative frequency of each gender category. We found that male employees have a good salary package compared with female employees.
Job Title
2. In the second slide, the bar plot shows the job title distribution. Each bar represents a unique job title, and the height of each bar corresponds to the frequency
(count) of that job title in the dataset. The text labels on top of each bar provide the exact count for each job title category. People
working in a manager position have a good salary package.
Education & Location
3. The donut charts show the education distribution and the location distribution. The percentage labels on each slice provide additional
information about the relative frequency of each category. In summary, these visualizations provide insights into the distribution of categorical
variables in the dataset. The outcome is almost the same across categories: people who studied up to high school and people with a master's degree have
a good salary package, and the location distribution shows rural and suburban people earning equally.
A heatmap is used to find correlations.
Insights from the Heatmap:
1. Strong positive correlations (values close to 1) between variables appear as bright red cells in the heatmap.
2. Strong negative correlations (values close to -1) between variables appear as bright blue cells in the heatmap.
3. Weak correlations (values close to 0) appear as cells with colors closer to white or gray.
4. By examining the heatmap, you can identify patterns and relationships between different numerical variables in the dataset. For
example, variables with high positive correlations may indicate dependencies or interactions between them, while variables with high
negative correlations may indicate inverse relationships.
Application and Further Analysis:
1. The heatmap provides valuable insights into the relationships between variables, which can inform feature selection, model building, and
data preprocessing steps in data analysis and machine learning tasks.
2. Further analysis can involve investigating the identified correlations in more detail, exploring causality, and validating the relationships
through additional statistical tests or domain knowledge.
Histogram:
Application and Further Analysis:
1. The histogram provides insights into the age distribution of the dataset, which can inform demographic analysis, segmentation, and
targeted marketing strategies.
2. Further analysis can involve comparing the age distribution across different groups or segments in the dataset, identifying outliers or
anomalies, and assessing the impact of age on other variables or outcomes of interest.
• In summary, the histogram visualization of the age distribution helps in understanding the demographic composition of the dataset and
provides valuable insights for data-driven decision-making and analysis.
• Insights from the Box Plot:
The box plot provides insights into the central tendency (median) and spread of salary values in the dataset. The length of the box (IQR)
indicates the spread of salary values, with longer boxes representing greater variability. The position of the median line within the box indicates
the central tendency of salary values. Outliers, if present, are identified as individual data points beyond the whiskers, suggesting potential
extreme or unusual salary values.
In the box plot, the min salary is 40k, the median is 1.10 lakh, and the max is 1.90 lakh.
Scatter plot
The scatter plot visualization of the age-salary relationship provides insights into the patterns and variability in the dataset, facilitating data
exploration and analysis for salary prediction or related tasks.
We found that as age increases the salary package decreases, and that the 20 to 40 age group has
an average salary package but a high frequency. Highly experienced people are less frequent, and only a few of them have a high salary
package of 2 lakh.
Mean salary for each category
• Insights from the Bar Plots:
The height of each bar indicates the average salary for the corresponding category.
By comparing the heights of bars within each plot, you can identify variations in mean salary across different categories within the same
categorical variable.
Differences in mean salary between categories may suggest potential factors influencing salary variations within the dataset.
Model used:
We used all of these models (linear regression, random forest, AdaBoost regressor, support vector regressor, XGBoost regressor) to measure
accuracy on the dataset. After applying the 5 different models, the XGBoost regression model gives 99% accuracy, so this is the best
model for my dataset.
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPT

  • 4. About Dataset • The "Salary Prediction Dataset" is a synthetic dataset generated for exploring salary prediction tasks. It contains simulated data reflecting factors that influence salary levels: education, experience, location, job title, age, and gender. It can be used for predictive modeling tasks that estimate salaries from these factors. • # Data Collection • Salary Prediction Data: predict the salary according to the features (from kaggle.com). • Explore, clean, and prepare the dataset: • Check the shape of the original dataset: 1000 rows and 7 columns in total. • The dataset was imported from the file.csv
  • 5. Detailed steps of Data Exploration: Data exploration covers the data info, unique values, duplicate values, null values, and a describe summary that shows the count, minimum, mean, and maximum of each column. 1. Data info 2. Unique values 3. Null values 4. Data describe 5. Duplicate values. The data has 0 null values and 0 duplicate values.
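The exploration steps listed above can be sketched with pandas roughly as follows. This is a minimal illustration, not the presenter's notebook: the six-row frame is a hypothetical stand-in for the Kaggle file, and the column names are assumed.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle file; column names and values are assumed.
df = pd.DataFrame({
    "Education": ["High School", "Bachelor", "Master", "PhD", "Bachelor", "Master"],
    "Experience": [2, 5, 8, 12, 3, 10],
    "Location": ["Urban", "Rural", "Suburban", "Urban", "Rural", "Suburban"],
    "Job_Title": ["Analyst", "Engineer", "Manager", "Director", "Analyst", "Manager"],
    "Age": [24, 29, 35, 44, 26, 39],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "Salary": [40000, 65000, 110000, 150000, 52000, 130000],
})

print(df.shape)                                # (rows, columns)
df.info()                                      # 1. column dtypes and non-null counts
unique_counts = df.nunique()                   # 2. distinct values per column
null_counts = df.isnull().sum()                # 3. missing values per column
summary = df.describe()                        # 4. count, mean, min, max, quartiles
duplicate_count = int(df.duplicated().sum())   # 5. fully duplicated rows

print(unique_counts, null_counts, summary, duplicate_count, sep="\n\n")
```

With the real file, the frame would instead come from `pd.read_csv` on the downloaded CSV.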
  • 6. Exploratory Data Analysis (EDA): First, we plotted a pie chart of the gender value counts: male accounts for 51.60% of records and female for 48.40%. #These percentages are shares of records, so on their own they do not show that men earn more than women; that comparison requires the salary values.
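To make the difference between a group's share of records and its salary concrete, here is a small sketch on made-up numbers (the frame and its values are hypothetical, chosen only for illustration). The group with more records is not necessarily the group with the higher mean salary.

```python
import pandas as pd

# Hypothetical data: three male records and two female records.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# Share of records per gender, in percent (what a pie chart of value counts shows).
gender_share = df["Gender"].value_counts(normalize=True) * 100

# Mean salary per gender (what a salary comparison actually needs).
mean_salary_by_gender = df.groupby("Gender")["Salary"].mean()

print(gender_share)
print(mean_salary_by_gender)
```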
  • 7. • A bar plot is generated to display the "Job Title" distribution, as a first step toward relating job title to salary. • The bar plot offers insight into the frequency of each job title. As the plot shows, the Manager job title has the highest frequency and the Engineer job title the lowest, while Director and Analyst have average frequency.
  • 8. • Donut charts are plotted to visualize the education distribution and the location distribution. For education, the shares are almost equal, but people with a high school qualification have the highest-paid packages. The location distribution chart suggests that people in rural and suburban areas have high packages.
  • 9. A heatmap is generated to visualize the correlation matrix of the entire dataset, providing a comprehensive overview of the relationships between numerical variables. Each cell in the heatmap corresponds to the correlation coefficient between the variables represented by its row and column. The color intensity indicates the strength and direction of the correlation: 1. Dark blue indicates a strong negative correlation. 2. Dark red indicates a strong positive correlation. 3. White indicates no correlation (correlation coefficient close to zero).
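The matrix behind such a heatmap can be computed with pandas. The sketch below uses a small hypothetical frame with assumed column names; it only builds the correlation matrix and leaves the plotting step out.

```python
import pandas as pd

# Hypothetical numeric columns; values are invented for illustration.
df = pd.DataFrame({
    "Age": [24, 29, 35, 44, 26, 39],
    "Experience": [2, 5, 8, 12, 3, 10],
    "Salary": [40000, 65000, 110000, 150000, 52000, 130000],
})

# Pearson correlation between every pair of numeric columns.
# The result is symmetric, with 1.0 on the diagonal.
corr = df[["Age", "Experience", "Salary"]].corr()
print(corr.round(2))
```

The resulting matrix could then be passed to a plotting function such as seaborn's `heatmap` with a diverging colormap, so that negative and positive coefficients get the blue and red shades described on the slide.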
  • 10. A stacked column chart is plotted for the "Age Distribution", offering insight into the frequency of different age groups. 1. The plot shows that the 25 to 45 age group has a high frequency of good salary packages. 2. Other age groups have average salary packages. 3. The middle age group has high salary packages, while the 30 to 40 and 50 to 60 age groups have lower salary packages.
  • 11. A box plot is generated to visualize the distribution of salary. • It gives the minimum, median, and maximum salary package in the dataset. • The minimum salary shown in the box plot is 40k. • The median salary package is 1.5 lakhs. • The maximum salary package is 1.9 lakhs.
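The quantities a box plot draws can also be computed directly. The following sketch uses NumPy on a hypothetical list of salaries (not the presentation's data) and applies the common 1.5 × IQR rule for the whiskers, which is an assumption about how the plot was configured.

```python
import numpy as np

# Hypothetical salaries, for illustration only.
salaries = np.array([40000, 65000, 90000, 110000, 130000, 150000, 190000])

# Quartiles: the box spans q1..q3 and the line inside the box is the median.
q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(salaries.min(), median, salaries.max(), iqr, outliers)
```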
  • 12. Two scatter plots are used: 1. One displays the relationship between age and salary. 2. The other displays the relationship between experience and salary. Observations: 1. In the first scatter plot, most people earn a package between 1 lakh and 1.2 lakh. 2. Highly experienced people are less frequent, and only a few of them earn a high salary package of 2 lakh. 3. Older people with more experience get a lower salary package.
  • 13. #Mean salary for each category. Multiple plots with a common layout are generated to show the mean salary of each category. 1. In the education category, master's degree holders stand out on mean salary. 2. In the location category, suburban residents stand out on mean salary. 3. In the job title category, the manager role stands out on mean salary. 4. In the gender category, the salary package is distributed equally.
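The mean salary per category that these plots display can be computed with a group-by. This is a sketch on a hypothetical six-row frame with assumed column names; the numbers are invented and do not reproduce the slide's findings.

```python
import pandas as pd

# Hypothetical stand-in for the dataset; names and values are assumed.
df = pd.DataFrame({
    "Education": ["High School", "Bachelor", "Master", "PhD", "Bachelor", "Master"],
    "Location": ["Urban", "Rural", "Suburban", "Urban", "Rural", "Suburban"],
    "Job_Title": ["Analyst", "Engineer", "Manager", "Director", "Analyst", "Manager"],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "Salary": [40000, 65000, 110000, 150000, 52000, 130000],
})

categorical = ["Education", "Location", "Job_Title", "Gender"]

# One Series of mean salaries per categorical column, highest mean first.
mean_salary = {
    col: df.groupby(col)["Salary"].mean().sort_values(ascending=False)
    for col in categorical
}

for col, means in mean_salary.items():
    print(f"\nMean salary by {col}:")
    print(means)
```

Each Series maps directly onto one bar plot, with one bar per category.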
  • 14. #Data distribution using pair plots. Two pair plots are drawn over "Age", "Experience", and "Salary": one grouped by "Education" and one grouped by "Job Title". The first pair plot visualizes the relationships between "Age", "Experience", and "Salary" for different levels of "Education". Each scatterplot in the pair plot represents the relationship between two variables, and the diagonal contains histograms showing the distribution of each variable.
  • 15. #Encoding data 1. Import the LabelEncoder class from scikit-learn and iterate over each categorical variable in the list 'categorical'. 2. Check the data types of all columns in the DataFrame. There are 2 types of data: int and object. 3. Call df.head() to inspect the rows and columns of the data. # Train-test split & # Scaling the data
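The encoding, splitting, and scaling steps above might look like the following with scikit-learn. The frame, the 80/20 split, the random seed, and the choice of StandardScaler are assumptions for illustration; the slide does not state which scaler or split ratio was used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for the dataset; names and values are assumed.
df = pd.DataFrame({
    "Education": ["High School", "Bachelor", "Master", "PhD", "Bachelor", "Master"],
    "Experience": [2, 5, 8, 12, 3, 10],
    "Location": ["Urban", "Rural", "Suburban", "Urban", "Rural", "Suburban"],
    "Job_Title": ["Analyst", "Engineer", "Manager", "Director", "Analyst", "Manager"],
    "Age": [24, 29, 35, 44, 26, 39],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "Salary": [40000, 65000, 110000, 150000, 52000, 130000],
})

# Replace each object column with integer codes, one encoder per column.
categorical = ["Education", "Location", "Job_Title", "Gender"]
for col in categorical:
    df[col] = LabelEncoder().fit_transform(df[col])

print(df.dtypes)
print(df.head())

# Separate features from the target, then hold out a test set.
X = df.drop(columns="Salary")
y = df["Salary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split only keeps information from the test rows out of the preprocessing step.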
  • 16. Linear Regression: The linear regression method is not suitable for this dataset; it shows low accuracy (0.57%). The metric used is R-squared, a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model. It is a key metric for evaluating the goodness of fit of the model to the observed data.
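The definition of R-squared given above corresponds to 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with NumPy follows; the function name `r_squared` and the sample values are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot

# Invented salaries and predictions, for illustration only.
y_true = [40000, 60000, 80000, 100000]
y_pred = [45000, 55000, 85000, 95000]
score = r_squared(y_true, y_pred)
print(round(score, 4))
```

R-squared is a fraction, usually at most 1, so a value such as 0.57 means that about 57% of the variance is explained. Assuming the "accuracy" figures on these slides are R-squared scores, that is how they would be read.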
  • 17. Random Forest: Using a random forest model with hyperparameter tuning gives 97% accuracy.
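One common way to tune a random forest is a cross-validated grid search. The sketch below assumes scikit-learn's GridSearchCV, a small illustrative parameter grid, and synthetic data generated on the spot; the slide does not say which search method or parameters were actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 3))
y = 40000 + 100000 * X[:, 0] + 30000 * X[:, 1] ** 2 + rng.normal(0, 2000, size=120)

# Assumed grid: a real search would usually cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# 3-fold cross-validation, scoring each candidate by R-squared.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

The cross-validated score is a more conservative estimate than a score measured on the data the model was fitted to.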
  • 18. AdaBoost Regressor: The AdaBoost regression model reached 81% accuracy. We will try another model to get better accuracy.
  • 19. Support Vector Regressor: The support vector regressor provides flexible options for customizing the SVR model, including the choice of kernel function, regularization parameter, and other hyperparameters. The support vector regressor reached only 2.14% accuracy, so this model is not suitable for this dataset.
  • 20. XGBoost Regressor: After applying 5 different models, the XGBoost regression model reached 99% accuracy, so it is the best model for this dataset.
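A comparison across the five models could be organized as a loop that fits each model on the same training split and scores it on the same held-out split. The sketch below runs on synthetic data with default or near-default settings, treats "accuracy" as the R-squared score (an assumption), and includes XGBoost only if the `xgboost` package is installed. It illustrates the procedure and does not reproduce the reported numbers.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data: salary depends on experience and age plus noise (invented).
rng = np.random.default_rng(0)
experience = rng.uniform(0, 20, size=200)
age = rng.uniform(22, 60, size=200)
X = np.column_stack([experience, age])
y = 30000 + 4000 * experience + 500 * age + rng.normal(0, 5000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "SVR": SVR(),
}

# XGBoost is optional here; skip it when the package is not available.
try:
    from xgboost import XGBRegressor
    models["XGBoost"] = XGBRegressor(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit every model on the same split and score it on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {value:.3f}")
print("Best:", best_model)
```

Scoring on a held-out split matters here, because tree ensembles can fit the training rows almost perfectly. An SVR with default settings can also score poorly when the target is on a scale of tens of thousands, since its regularization and epsilon parameters are not tuned to that scale.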
  • 21. Conclusion and Insights: 1. Gender: The pie chart shows the percentage distribution of gender. Each slice represents a gender category, its size corresponds to the proportion of that category in the dataset, and the percentage label gives its relative frequency. We reported that male employees have a better salary package than female employees, though the pie chart itself shows only the share of records. 2. Job Title: The second plot, a bar plot, has one bar per job title; the height of each bar gives the frequency (count) of that title in the dataset, and the text label on top of each bar gives the exact count. People working in manager positions have a good salary package. 3. Education & Location: The donut charts show the education and location distributions, with percentage labels giving the relative frequency of each category. The outcome is almost the same across categories: people with a high school qualification or a master's degree have a good salary package, and rural and suburban residents earn about equally. In summary, these visualizations provide insight into the distribution of the categorical variables in the dataset. A heatmap was used to find correlations. Insights from the Heatmap: 1. Strong positive correlations (values close to 1) appear as bright red cells. 2. Strong negative correlations (values close to -1) appear as bright blue cells. 3. Weak correlations (values close to 0) appear as cells with colors closer to white or gray. 4. Examining the heatmap reveals patterns and relationships between the numerical variables in the dataset.
For example, variables with high positive correlations may indicate dependencies or interactions between them, while variables with high negative correlations may indicate inverse relationships.
  • 22. Heatmap: Application and Further Analysis: 1. The heatmap provides insight into the relationships between variables, which can inform feature selection, model building, and data preprocessing in data analysis and machine learning tasks. 2. Further analysis can investigate the identified correlations in more detail, explore causality, and validate the relationships through additional statistical tests or domain knowledge. Histogram: Application and Further Analysis: 1. The histogram provides insight into the age distribution of the dataset, which can inform demographic analysis, segmentation, and targeted marketing strategies. 2. Further analysis can compare the age distribution across groups or segments in the dataset, identify outliers or anomalies, and assess the impact of age on other variables or outcomes of interest. • In summary, the histogram of the age distribution helps in understanding the demographic composition of the dataset and supports data-driven decision-making and analysis. • Insights from the Box Plot: The box plot shows the central tendency (median) and spread of salary values in the dataset. The length of the box (the IQR) indicates the spread of salary values, with longer boxes representing greater variability. The position of the median line within the box indicates the central tendency of salary values. Outliers, if present, appear as individual data points beyond the whiskers, suggesting extreme or unusual salary values. In the box plot, the minimum salary is 40k, the median is 1.10 lakh, and the maximum is 1.90 lakh.
  • 23. Scatter plot: The scatter plot of the age-salary relationship shows the patterns and variability in the dataset, supporting data exploration and analysis for salary prediction and related tasks. We found that as age increases the salary package decreases, and that the 20 to 40 age group has an average salary package but a high frequency. Highly experienced people are less frequent, and only a few of them earn a high salary package of 2 lakh. Mean salary for each category • Insights from the Bar Plots: The height of each bar indicates the average salary for the corresponding category. Comparing bar heights within each plot reveals variations in mean salary across the categories of a categorical variable. Differences in mean salary between categories may point to factors influencing salary variation within the dataset. Models used: All five models (linear regression, random forest, AdaBoost regressor, support vector regressor, and XGBoost regressor) were evaluated on the dataset. The XGBoost regression model reached 99% accuracy, so it is the best model for this dataset.