Could You Be a Data Scientist? Quantifying Data Scientist Profiles Using Machine Learning and the LinkedIn API

A short presentation on my final project at Zipfian Academy: quantifying data scientist profiles using LinkedIn data.
The prototype web app is available at: bit.ly/cybads

Published in: Technology, Education

  1. Could You Be a Data Scientist? Carlo Torniai, Ph.D. (@carlotorniai)
  2. Goal
     • Quantify the features of data scientist profiles
     • Analyze the profiles of aspiring data scientists
     • Provide useful feedback
  3. Why is this relevant?
     • A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters.
     Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
  4. Data Collection
     • LinkedIn API:
       – General information
       – Past work history
       – Education
     • Web scraping:
       – Skills
     • 1500 profiles:
       – Data Scientists
       – Software Engineers
       – Business Analysts
       – Mathematicians
       – Statisticians
  5. Data Analysis: Feature Extraction
     [Figure: features by profile category (Software Engineers, Business Analysts, Data Scientists, Statisticians, Mathematicians)]
  6. Data Analysis: Feature Extraction
     [Figure: number of PhDs by topic and profile category. Topics: Astronomy, Bioinformatics, Biology, Computer Science, Economics, Electronics, Engineering, Math, Neuroscience, Other, Physics, Psychology, Stats]
  7. Model Testing
     For this project I trained the following models on skills and education features:
     • Random Forest: classify the profile
     • Naïve Bayes: multi-class probabilities to assess the background components of a profile
     • K-means: suggest similar and relevant profiles
  8. Model Testing
     For this project I trained the following models on skills and education features:

     Model          Training set                            Purpose
     Random Forest  All 5 categories                        Classify the profile
     Naïve Bayes    4 classic categories (SE, BA, MT, ST)   Assess the background components of a profile with multi-class probabilities
     K-means        All 5 categories                        Identify similar profiles

  9. Data Product: bit.ly/cybads
  10. Data Product: Naïve Bayes multi-class probabilities; Random Forest
  11. Data Product: K-means clustering
  12. Next Steps
     Pipeline stages: Data Collection, Data Analysis, Feature Extraction, Model Testing, Data Product
     • Get more data:
       – Other websites
       – Indeed
       – User input on the web app
     • Fine-grained parsing of education
     • Experiment with additional features (industry, years of experience)
     • Extend the feature set and test more models
     • Fuzzy C-means
     • Add interactive data collection
     • Personalized links for skills
     • Explanation of similarity results
     • Close the loop by analyzing job offers and suggesting matching profiles
  13. Thank you!
     Technologies
     • Web app: Flask, jQuery, Vega, MongoDB
     • Models (NMF, HC, RF, DT, NB, K-means): scikit-learn
     • Visualizations: Vincent, Vega, NetworkX, Gephi
     Acknowledgements
     • yatish27: Ruby LinkedIn public profile web scraper
     • ozgut: LinkedIn API Python wrapper
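
The Naïve Bayes step on slides 7 and 8 (multi-class probabilities over the four classic categories) can be illustrated with a minimal sketch. This is not the project's code: the slides say the models were built with scikit-learn, while the version below is a small pure-Python multinomial Naïve Bayes with Laplace smoothing, written only to show the idea. The toy profiles, the skill names, and the helpers `train` and `predict_proba` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (category, listed skills). Not taken from the project.
PROFILES = [
    ("SE", ["python", "java", "git"]),
    ("SE", ["java", "sql", "git"]),
    ("BA", ["excel", "sql", "reporting"]),
    ("BA", ["excel", "reporting", "powerpoint"]),
    ("ST", ["r", "statistics", "sas"]),
    ("ST", ["r", "statistics", "sql"]),
    ("MT", ["matlab", "algebra", "latex"]),
    ("MT", ["matlab", "statistics", "latex"]),
]


def train(profiles):
    """Count skill occurrences per category and profiles per category."""
    skill_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for label, skills in profiles:
        class_counts[label] += 1
        skill_counts[label].update(skills)
        vocab.update(skills)
    return skill_counts, class_counts, vocab


def predict_proba(skills, model, alpha=1.0):
    """Return {category: probability} for a list of skills."""
    skill_counts, class_counts, vocab = model
    total_profiles = sum(class_counts.values())
    log_scores = {}
    for label, n_profiles in class_counts.items():
        total_skills = sum(skill_counts[label].values())
        score = math.log(n_profiles / total_profiles)  # class prior
        for skill in skills:
            if skill not in vocab:
                continue  # ignore skills never seen in training
            # Laplace-smoothed likelihood of this skill given the category.
            score += math.log(
                (skill_counts[label][skill] + alpha)
                / (total_skills + alpha * len(vocab))
            )
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


model = train(PROFILES)
probs = predict_proba(["python", "r", "statistics", "sql"], model)
for label, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {p:.2f}")
```

Read this way, the per-category probabilities act as "background components": a profile that scores mostly ST with some SE looks like a statistician with some software engineering background.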

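The K-means step (identifying similar profiles) can be sketched in the same spirit. Again, this is an assumption-laden illustration and not the author's implementation: profiles are encoded as binary skill vectors, a tiny pure-Python K-means with a deterministic start (the first `k` points) groups them, and profiles in the same cluster are suggested as similar. The profile names, skills, and the helpers `kmeans` and `similar_profiles` are hypothetical.

```python
# Hypothetical toy profiles: name -> set of listed skills.
SKILLS = {
    "alice": {"python", "sql", "statistics"},
    "bob": {"excel", "reporting", "powerpoint"},
    "carol": {"python", "statistics", "r"},
    "dave": {"excel", "reporting", "sql"},
}

VOCAB = sorted(set().union(*SKILLS.values()))


def to_vector(skills):
    """Encode a skill set as a binary vector over the shared vocabulary."""
    return [1.0 if s in skills else 0.0 for s in VOCAB]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


names = list(SKILLS)
clusters = kmeans([to_vector(SKILLS[n]) for n in names], k=2)


def similar_profiles(name):
    """Return the other profiles that share a cluster with `name`."""
    target = clusters[names.index(name)]
    return [n for n, c in zip(names, clusters) if c == target and n != name]


print(similar_profiles("alice"))
```

With real data the seeding would normally be randomized or use k-means++, and the number of clusters would need tuning; the fixed start here just keeps the toy example reproducible.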