3. Introduction and Expectations
A few words about the sponsor: Curtis Foundation
Assigned Tasks
1) Create a simulated data model based on LinkedIn
2) Create a MySQL database
3) Analysis:
A) Is success related to the number of endorsements?
B) Classification of endorsements: taxonomy creation
C) Creation of a Credibility Score
D) Creation of a Success Score
5. The Beginnings…
Understanding the proposal
• Define success
• Normalization of titles
• What variables to consider
Research / Literature Review
• How do we define variables?
• How to create a LinkedIn-based data model
Coding
• How to write loops?
• How do I assign multiple titles to one person?
• Figuring out logistics
• How to avoid biases
6. Assumptions and Definitions
Age group: 22-65 (birth years 1950-1995)
Applies to the IT industry only
Success is defined objectively, not subjectively, in this model
Job titles are normalized based on the responsibilities of the jobs.
Titles are divided into 6 groups: 1 = lowest, 6 = highest
Assignment of job titles is based on education. Per our job requirement analysis, most IT jobs require an undergraduate degree or above.
Only the US population is considered in this model.
Created 3 levels of endorsers.
7. Creation of Data
Demographics
• Generator to create names, DOB
• usmap: location
• Derived age, YOE, no. of jobs changed
• No. of titles: 6
• Randomly assigned: no. of connections
• Education, gender, race
• Salary is based on current title and location
Endorsers
• Randomly assigned: YOE, endorsers' titles, no. of endorser IDs
Titles
• We derived up to 5 previous titles. Previous titles are based on education and no. of titles.
• Titles are based on education and YOE.
Skill
• Scraped data on skills and endorsements from LinkedIn
• Normalized the skills based on titles
• Assigned skills randomly within each group based on current title and years of experience
• No. of endorsements randomly assigned
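The demographic step (generate a DOB, then derive age and YOE) can be sketched roughly as follows. This is only an illustration in Python, not the generator used in the project: the reference date, the career start age of 22 (taken from the lower bound of the stated age group), the range for connections, and all function names are assumptions.

```python
import random
from datetime import date

# Assumption: the slides do not say on which date ages were computed.
REFERENCE_DATE = date(2017, 1, 1)
# Assumption: careers start at 22, the lower bound of the stated age group.
CAREER_START_AGE = 22


def random_dob(rng):
    """Draw a date of birth with a birth year in the stated 1950-1995 range."""
    year = rng.randint(1950, 1995)
    month = rng.randint(1, 12)
    day = rng.randint(1, 28)  # at most 28 so the day is valid in every month
    return date(year, month, day)


def age_on(dob, on):
    """Whole years elapsed between dob and the date `on`."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


def make_person(person_id, rng):
    """Build one simulated record with derived age and years of experience."""
    dob = random_dob(rng)
    age = age_on(dob, REFERENCE_DATE)
    return {
        "id": person_id,
        "dob": dob,
        "age": age,
        "yoe": max(0, age - CAREER_START_AGE),  # derived, never negative
        "connections": rng.randint(0, 500),     # hypothetical range
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulated data is reproducible
    for person in (make_person(i, rng) for i in range(3)):
        print(person)
```

Passing a seeded `random.Random` instance keeps every run of the simulation repeatable, which matters when the generated table is later loaded into a database and analyzed.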
11. Success Score
What variables are highly correlated?
Success Score = sum(Title Level + (Current_Salary / 1000) + Fortune500) / Age
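As a per-person illustration, the formula can be written as a small Python function. The function name is hypothetical, and treating Fortune500 as a 0/1 indicator is an assumption; the slide does not say how that variable is coded.

```python
def success_score(title_level, current_salary, fortune500, age):
    """Success Score = (title level + salary / 1000 + Fortune500) / age.

    Assumption: fortune500 is 1 if the employer is a Fortune 500 company
    and 0 otherwise; the slide does not define its coding.
    """
    if age <= 0:
        raise ValueError("age must be positive")
    return (title_level + current_salary / 1000 + fortune500) / age


if __name__ == "__main__":
    # Same title, salary, and employer flag at two different ages.
    print(success_score(4, 90_000, 1, 45))
    print(success_score(4, 90_000, 1, 30))
```

Dividing by age means that, for the same title, salary, and employer, a younger person receives the higher score.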
12. Credibility Score (CS) vs. High Credibility Score (HCS)
CS = Title level + years of experience + (median of total endorsements)
HCS = CS x no. of similar skills for which the endorser has been endorsed x no. of endorsers who fulfill similar criteria
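The two scores can be sketched in Python as below. The function and parameter names are hypothetical, and reading "median of total endorsements" as the median of one endorser's per-skill endorsement counts is an assumption; the slide does not say what the median is taken over.

```python
from statistics import median


def credibility_score(title_level, years_of_experience, endorsement_counts):
    """CS = title level + years of experience + median of endorsements.

    Assumption: endorsement_counts holds one endorser's endorsement
    counts per skill, and the median is taken over that list.
    """
    return title_level + years_of_experience + median(endorsement_counts)


def high_credibility_score(cs, similar_skills_endorsed, qualifying_endorsers):
    """HCS = CS x similar skills endorsed x endorsers meeting the criteria."""
    return cs * similar_skills_endorsed * qualifying_endorsers


if __name__ == "__main__":
    cs = credibility_score(3, 10, [5, 12, 7])  # 3 + 10 + median(5, 12, 7)
    print(cs)
    print(high_credibility_score(cs, 2, 3))
```

Because HCS is a product, it is zero whenever the endorser has no similar skills endorsed or no endorser meets the criteria, so HCS only rewards endorsements backed by relevant expertise.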
13. Does the Number of Endorsements Matter?
No. of titles, no. of jobs changed, and years of experience are strongly negatively correlated with Success Score.
Success Score is strongly negatively correlated.
14. Regression: Model 1, Success Score vs. Total Endorsements
The results are not significant: total endorsements do not predict Success Score in this model. This supports the view that the current endorsement system is broken.
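A one-predictor regression of this kind can be sketched in plain Python as follows. This is not the code used in the project, and the function name and return layout are assumptions. It returns the least-squares slope and intercept, the Pearson correlation r, and the t statistic for the null hypothesis that the slope is zero. Turning t into a p-value needs a t distribution with n - 2 degrees of freedom, which is left out here.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r, t), where t tests slope = 0
    with len(x) - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    unexplained = 1 - r * r
    if unexplained <= 0:
        # Perfect linear fit: the t statistic is unbounded.
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / unexplained)
    return slope, intercept, r, t
```

Here `x` would be total endorsements and `y` the Success Score per person. A t statistic close to zero corresponds to the non-significant result reported on this slide.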
16. Future Pathways…
Testing the simulated model against real-life data
Creation of a Neo4j graph database
Creation and validation of a High Credibility Score per person per skill
Measuring the impact of the scoring system on personal success
Use of the system in future skill-gap analysis
When I started the project, it felt exactly like trying to eat more than your stomach can handle. It was very hard because there was no data, and that is why I decided to call it a data story. The project is based on LinkedIn data, which is hard to get because of encryption and similar protections. I tried to scrape it and found some code for that, but it did not work.
This proposal was written by the Curtis Foundation, an NGO that works in education and human resources enhancement, especially with law enforcement.
Many of us are on LinkedIn and feel that endorsements don't matter. The current endorsement system is broken: anyone can endorse anyone, and no credibility is assigned to endorsements. That is the motivation for this project.
Major questions: How do we define success? How do we normalize the titles? Which variables should we consider? What should we look at in the next steps?
List of publications I used to create this data -
Finally, the data was ready. Major hurdles:
1) Assigning titles (explain how we did it)
2) Assigning skills: finding and assigning skills, issues with the scraped data in R, and then transforming it into the database
3) Normalization of titles
This is not an ideal way to create this database.
A Neo4j graph database is the way to go, and I would recommend it.
Data classes: character, integer, numeric
Converted character vectors into factors for data analysis.
This will reduce the bias due to age. My observation is that endorsements are more common for young people who work in the IT industry, as well as for those who have more access to computers and the internet.
For HCS I propose: CS x number of similar skills for which the endorser has been endorsed x no. of endorsers who fulfill the similarity criteria.
This model doesn't have multiple endorsers, so I have adjusted the score accordingly.
I also propose that, in future, a High Credibility Score be assigned to each individual skill to bring more credibility to the endorsement system.