IT Skills Analysis

Skills demand analysis based on the data from online HR websites: Using web scraping and text mining applications: IT Sector

Published in: Data & Analytics

  1. DATA MINING AND STATISTICAL ANALYSIS SOLUTIONS. Skills demand analysis based on data from online HR websites, using web scraping and text mining applications: IT Sector. Habet Madoyan, Vahe Movsisyan. Sunday, July 03, 2016. The analysis is funded by a research grant from the American University of Armenia. Presented at: IX International School-Seminar, Town of Tsakhkadzor, Republic of Armenia.
  2. Methodology: Overview (Datamotus LLC)
  3. Introduction. In recent years, online job ads have become a popular job-search channel, which is why the research community is increasingly experimenting with detailed breakdowns of online job ads to study labor market dynamics. It is estimated that in the USA 60-70 percent of job openings are now posted on the Internet. However, these job ads are biased toward industries and occupations that seek high-skilled, “white-collar” workers.
  4. Introduction. Job seekers, employers, students, researchers, policymakers, higher education institutions, career advisors, and curriculum developers now view online job ads data as a practical source for exploring today’s dynamic labor market. Online job ads can show the relative demand for different types of skills and levels of education. The real-time nature of job ads data also allows for the early detection of labor demand trends, which gives job seekers, employers, and policymakers a forward-looking analytical tool. Real-time labor market indicators can be particularly useful in aligning education and training curricula with workforce needs in emerging or rapidly changing industries, such as healthcare and information technology.
  5. Two observations about job ads data:
     • Job ads provide an incomplete picture of labor demand
     • Online job ads data strongly correlate with job openings data
  6. Web Scraping. Text Mining.
  7. Synopsis of the study
     • Develop an algorithm for web scraping job announcement data (careercenter.am)
     • Text mining and parsing algorithms to structure job announcements
     • Algorithms to assess and track vacancy rates by:
       • Industry
       • Job role
       • Specific skills
  8. What was done
     • Around 20,000 posts were scraped from the web.
     • Posts come in a rough, unstructured form. An algorithm was developed to structure them.
  9. A variable for each “section”
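
The structuring step on slides 8 and 9 (one variable per section of a raw post) could look roughly like the sketch below. It is a minimal illustration, assuming each section starts with an uppercase `HEADER:` label at the beginning of a line. The header names, the helper `split_sections`, and the sample post are all hypothetical and are not taken from the study or from careercenter.am.

```python
import re

# Hypothetical section labels; real posts may use different ones.
SECTION_HEADERS = [
    "TITLE",
    "JOB DESCRIPTION",
    "REQUIRED QUALIFICATIONS",
    "APPLICATION DEADLINE",
]


def split_sections(post, headers=SECTION_HEADERS):
    """Return a dict with one entry (variable) per section found in a raw post."""
    # Try longer labels first so a label is never shadowed by a shorter prefix.
    alternatives = "|".join(
        re.escape(h) for h in sorted(headers, key=len, reverse=True)
    )
    pattern = re.compile(rf"^\s*({alternatives})\s*:", re.MULTILINE)
    matches = list(pattern.finditer(post))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next label, or to the end of the post.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(post)
        # Collapse line breaks and repeated spaces into single spaces.
        sections[match.group(1)] = " ".join(post[match.end():end].split())
    return sections


# Hypothetical sample post, for illustration only.
sample_post = """TITLE: Java Software Developer
JOB DESCRIPTION: Develop and maintain
  server-side applications.
REQUIRED QUALIFICATIONS: Knowledge of Java and JavaScript.
"""

record = split_sections(sample_post)
```

Sections that are absent from a post are simply missing from the resulting dict, so downstream code can treat them as missing values.
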
  10. Total vacancy rate (Careercenter) and Official Labor Demand (2004 to first quarter of 2016). [Figure: quarterly series, 2004Q1 to 2016Q1. Series: Total jobs (Careercenter); Job Demand (NSS, right scale). Correlation = 0.76]
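
The slide reports a single correlation coefficient between the two quarterly series. The sketch below shows how a Pearson correlation of that kind can be computed. It assumes the reported figure is a Pearson coefficient, which the slide does not state, and the numbers are hypothetical placeholders, not the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical quarterly values, NOT the Careercenter or NSS figures.
careercenter_jobs = [120, 150, 170, 160, 210, 260]
official_demand = [800, 950, 900, 1100, 1300, 1500]

r = pearson(careercenter_jobs, official_demand)
```
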
  11. Job Market Overview: IT sector
  12. ICT sector and overall economy. [Figure: annual series, 2005 to 2015. Series: average yearly wage in the Transport and Communication sector divided by average yearly wage in RA; weight of the Transport and Communication sector (including IT sector) in GDP (right scale, in %)]
  13. Total vacancy and IT sector vacancy rates (Careercenter, 2004-2016). [Figure: quarterly series, 2004Q1 to 2016Q1. Series: Non IT Jobs (Careercenter); IT Jobs (Careercenter, right scale). Correlation = 0.81]
  14. Hard Skills in IT Sector
  15. Time series: Annual demand for top 5 programming languages. [Figure: annual series, 2004 to 2015. Series: C++, Javascript, Java, C#, PHP]
  16. Time series: Annual demand for top 5 programming languages (parabolic trend). [Figure: polynomial trend lines, 2004 to 2015. Series: C++, Javascript, Java, C#, PHP]
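
A parabolic trend such as the one on slide 16 is a second-degree polynomial fitted by least squares. The sketch below solves the normal equations with the standard library only. The function names and the demand figures are hypothetical, and the slides do not say which tool produced the original trend lines.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a*t**2 + b*t + c, where t = x - mean(x)."""
    if len(xs) != len(ys) or len(set(xs)) < 3:
        raise ValueError("need at least three distinct x values")
    x0 = sum(xs) / len(xs)
    ts = [x - x0 for x in xs]  # centering the years keeps the system well conditioned
    s = [sum(t ** k for t in ts) for k in range(5)]              # power sums of t
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Augmented normal-equation matrix for the unknowns (c, b, a).
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c, x0


def predict(coeffs, x):
    """Evaluate the fitted parabola at x."""
    a, b, c, x0 = coeffs
    t = x - x0
    return a * t * t + b * t + c


# Hypothetical annual demand counts for one language, NOT the study's data.
years = list(range(2004, 2016))
demand = [12, 15, 21, 30, 28, 25, 33, 47, 60, 82, 101, 118]

coeffs = fit_parabola(years, demand)
trend = [predict(coeffs, year) for year in years]
```
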
  17. Analyzing demand for programming languages using association rules
  18. Arules
     • Association rules mining is used to analyse the co-occurrence of programming languages in a job post
     • The R packages “arules” and “arulesViz” are used for the analysis
     • Analysis is done for IT jobs only
  19. Association rules: Measures of rule interestingness. Suppose we have the rule IF $\{A\} \Rightarrow \{B\}$.
     • Measure 1: Support $= P(A \cap B)$
     • Measure 2: Confidence $= P(B \mid A) = \dfrac{P(B \cap A)}{P(A)}$
     • Measure 3: Lift $= \dfrac{P(B \mid A)}{P(B)} = \dfrac{P(A \cap B)}{P(A)} \cdot \dfrac{1}{P(B)}$
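
The three measures can be computed directly from a list of job posts, each represented as the set of programming languages it mentions. The sketch below is plain Python, not the arules API, and the posts are hypothetical examples.

```python
def support(transactions, items):
    """P(items): share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """P(rhs | lhs) = P(lhs and rhs) / P(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """P(rhs | lhs) / P(rhs); values above 1 indicate positive association."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical job posts, each reduced to the set of languages it mentions.
posts = [
    {"java", "javascript"},
    {"java", "javascript", "php"},
    {"java"},
    {"php", "javascript"},
    {"cplusplus"},
]

rule_support = support(posts, {"java", "javascript"})          # 2 of 5 posts
rule_confidence = confidence(posts, {"java"}, {"javascript"})  # 2 of 3 java posts
rule_lift = lift(posts, {"java"}, {"javascript"})
```

Here the rule {java} => {javascript} holds in 2 of the 3 posts that mention Java, while JavaScript appears in 3 of 5 posts overall, so the lift is slightly above 1.
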
  20. Visualizing the rules
  21. Association Mining for Programming languages: C++
     • A set of association rules is generated for the top 20 programming languages.
     • Rules are subsetted with a minimum support of 0.01 and a minimum confidence of 0.1.
     [Figure panels: Two items on the left; One item on the left]
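
The filtering described on this slide (rules with one or two items on the left, minimum support 0.01, minimum confidence 0.1) can be sketched as below. This is a brute-force enumeration that is practical for a small item set such as 20 languages. It is not the Apriori algorithm that the arules package implements, and the function name and posts are hypothetical.

```python
from itertools import combinations

MIN_SUPPORT = 0.01     # thresholds quoted on the slide
MIN_CONFIDENCE = 0.1


def mine_rules(transactions, max_lhs=2,
               min_support=MIN_SUPPORT, min_confidence=MIN_CONFIDENCE):
    """Enumerate rules lhs => rhs with up to `max_lhs` items on the left."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            lhs_set = frozenset(lhs)
            lhs_support = supp(lhs_set)
            if lhs_support == 0:
                continue  # the left side never occurs, so no rule can be formed
            for rhs in items:
                if rhs in lhs_set:
                    continue
                rule_support = supp(lhs_set | {rhs})
                rule_confidence = rule_support / lhs_support
                if rule_support >= min_support and rule_confidence >= min_confidence:
                    rules.append((lhs, rhs, rule_support, rule_confidence))
    return rules


# Hypothetical job posts, each reduced to the set of languages it mentions.
posts = [
    {"java", "javascript"},
    {"java", "javascript", "php"},
    {"java"},
    {"php", "javascript"},
    {"cplusplus"},
]

rules = mine_rules(posts)
one_item_rules = [rule for rule in rules if len(rule[0]) == 1]
two_item_rules = [rule for rule in rules if len(rule[0]) == 2]
```
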
  22. Association Mining for Programming languages: Java
  23. Rules visualization: Java (all rules)
  24. Rules visualization: Javascript
  25. Job Title Analysis
  26. IT Job Titles Frequency. Most popular job titles (2004Q1-2016Q1):

     | Job title | Percentage |
     |---|---|
     | software developer/engineer | 18.29% |
     | quality assurance engineer | 5.42% |
     | java software developer | 4.98% |
     | system administrator | 4.00% |
     | web developer | 3.66% |
     | .net developer | 2.94% |
     | php developer | 2.33% |
     | graphic designer | 1.89% |
     | ios developer | 1.31% |
     | android developer | 1.26% |
     | deep submicron | 0.98% |
     | database developer | 0.96% |
     | support specialist | 0.96% |
     | database administrator | 0.92% |
     | technical support | 0.89% |
     | technical writer | 0.83% |
     | support engineer | 0.80% |
     | application developer | 0.72% |
     | design engineer | 0.72% |
     | r&d engineer | 0.68% |
     | team leader | 0.67% |
     | frontend developer | 0.55% |
     | monitoring evaluation | 0.52% |
     | information security | 0.50% |
     | senior r&d | 0.50% |
     | Total | 57.29% |
  27. Software developer/engineer. [Figure: annual series, 2004 to 2015]
  28. Quality assurance engineer. [Figure: annual series, 2004 to 2015]
  29. Java software developer. [Figure: annual series, 2004 to 2015]
  30. System administrator. [Figure: annual series, 2004 to 2015]
  31. Web developer. [Figure: annual series, 2004 to 2015]
  32. IT Job Titles vs Programming languages

     | Job Title => Programming language | Confidence |
     |---|---|
     | {software developer/engineer} => {csharp} | 0.33 |
     | {software developer/engineer} => {java} | 0.30 |
     | {software developer/engineer} => {javascript} | 0.20 |
     | {software developer/engineer} => {asp} | 0.20 |
     | {software developer/engineer} => {php} | 0.12 |
     | {software developer/engineer} => {j} | 0.12 |
     | {software developer/engineer} => {tcl} | 0.09 |
     | {software developer/engineer} => {python} | 0.07 |
     | {software developer/engineer} => {cplusplus} | 0.06 |
     | {software developer/engineer} => {ruby} | 0.03 |
     | {software developer/engineer} => {visual.basic} | 0.02 |
     | {software developer/engineer} => {verilog} | 0.02 |
     | {quality assurance engineer} => {java} | 0.27 |
     | {quality assurance engineer} => {shell} | 0.25 |
     | {quality assurance engineer} => {perl} | 0.22 |
     | {quality assurance engineer} => {python} | 0.14 |
     | {quality assurance engineer} => {tcl} | 0.12 |
     | {quality assurance engineer} => {bash} | 0.04 |
     | {quality assurance engineer} => {verilog} | 0.04 |
     | {java software developer} => {java} | 0.98 |
     | {java software developer} => {javascript} | 0.47 |
     | {java software developer} => {j} | 0.39 |
     | {java software developer} => {shell} | 0.11 |
     | {java software developer} => {ruby} | 0.05 |
     | {system administrator} => {perl} | 0.09 |
     | {system administrator} => {shell} | 0.09 |
     | {system administrator} => {bash} | 0.03 |
     | {system administrator} => {pl.sql} | 0.02 |
     | {web developer} => {javascript} | 0.76 |
     | {web developer} => {php} | 0.57 |
     | {web developer} => {asp} | 0.36 |
     | {web developer} => {csharp} | 0.27 |
     | {web developer} => {ruby} | 0.02 |
     | {.net developer} => {asp} | 0.82 |
     | {.net developer} => {csharp} | 0.80 |
     | {.net developer} => {javascript} | 0.42 |
     | {.net developer} => {visual.basic} | 0.03 |
     | {php developer} => {php} | 1.00 |
     | {php developer} => {javascript} | 0.71 |
     | {php developer} => {ruby} | 0.08 |
     | {php developer} => {python} | 0.07 |
  33. Next Steps:
     • Develop a machine learning algorithm to classify job ads by sector
     • Develop state-of-the-art text mining and topic modeling algorithms to predict demand for skills, professions and job roles
     • Create an interactive web dashboard (using R Shiny) to help:
       • Potential job seekers
       • Potential employees
       • Policy makers
       • Universities
  34. Thank You For Your Attention!