1. Squeezing biggish job market data
onto a laptop
Alan Mark Berg BSc, MSc, PGCE
a.m.berg@uva.nl
2. Agenda
•Overview
• Who am I and what am I doing?
• Area of research
• Technique
•Example Results:
• Stereotypes
• Female
• IT
• Discrimination
•Questions and Answers
• Refinements
• References
3. Who am I?
1. A rather mature, external PhD candidate in Learning Analytics.
2. Hard science background: physics, microelectronics with computational engineering, experimental science
3. Pragmatic: for the last 17 years involved in the design and development of large-scale IT systems @UvA
○ Wishes to use the simplest technique possible for a given task.
4. Author of 4 books
5. Busy with open source communities.
○ Considers them the best place to curate software
6. Stephan and Gabor are my co-supervisors. Prof Robin Boast is my supervisor.
7. Status: In the process of writing up the research and then finishing the PhD.
8. Initial infrastructure and standards papers published (see references)
5. Technique
❑ 3 million UK job adverts, 1,150 million words. Thank you, Monsterboard.
❑ Simplest possible scenario
❑ Bag of words
❑ Unigrams
❑ Perl to process the text
❑ R language: inferential statistics and visualization
❑ CATA (computer-aided text analysis): frequency of words
❑ Mapped the job dataset to SOC 2010 occupation categories
❑ Merged with the UK Labour Force Survey via the SOC 2010 categories
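The bag-of-words steps above can be sketched in a few lines. This is only a minimal illustration, written in Python rather than the Perl and R used in the study; the word list, the adverts, and the category labels `code_A` and `code_B` are all hypothetical placeholders, and the adverts are assumed to have been assigned to an occupation category already.

```python
import re
from collections import Counter

# Hypothetical dictionary; the word lists used in the study are not given here.
IT_WORDS = {"python", "sql", "linux"}

def unigrams(text):
    """Lowercase an advert and split it into single-word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequency_by_category(adverts, dictionary):
    """Return {category: (dictionary hits, total words)} for (category, text) pairs."""
    hits = Counter()
    totals = Counter()
    for category, text in adverts:
        tokens = unigrams(text)
        totals[category] += len(tokens)
        hits[category] += sum(1 for t in tokens if t in dictionary)
    return {c: (hits[c], totals[c]) for c in totals}

# Toy adverts with placeholder occupation codes (not real SOC 2010 codes).
adverts = [
    ("code_A", "Junior developer wanted: Python and SQL required."),
    ("code_A", "Database administrator with SQL and Linux experience."),
    ("code_B", "Friendly sales assistant wanted for a busy shop."),
]

result = frequency_by_category(adverts, IT_WORDS)
for category, (hit_count, total) in sorted(result.items()):
    # Report dictionary hits per 1,000 words so categories of different size compare.
    print(category, hit_count, total, round(1000 * hit_count / total, 1))
```

Because word order is ignored and only counts are kept per category, the memory footprint stays small, which is presumably what makes a laptop-sized run feasible. The per-category counts could then be joined to survey figures that share the same category key.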
9. Dispersion of IT Skills
- Monitor skill dispersion
- Has implications for policy and training
- Has implications for risks within occupations, such as the deployment of IT projects.
12. Discrimination
Diffusion process into the central region, where men and women are more equally represented.
Color:
Red = highest percentage of female wording
Notice the large amount of green (less wording) in 2013.
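One way a "percentage of female wording" could be computed per group of adverts is sketched below. This is an assumption-laden illustration, not the method behind the slide: the two word lists are hypothetical, the group labels are placeholders, and the percentage is assumed to be female-coded hits divided by all gender-coded hits.

```python
import re
from collections import defaultdict

# Hypothetical word lists, for illustration only.
FEMALE_WORDS = {"supportive", "caring", "collaborative"}
MALE_WORDS = {"competitive", "dominant", "ambitious"}

def female_wording_pct(adverts):
    """Return {group: percentage of gender-coded words that are female-coded}.

    A group with no gender-coded words at all maps to None.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [female hits, male hits]
    for group, text in adverts:
        pair = counts[group]  # register the group even if nothing matches
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in FEMALE_WORDS:
                pair[0] += 1
            elif token in MALE_WORDS:
                pair[1] += 1
    return {
        group: (100.0 * f / (f + m) if f + m else None)
        for group, (f, m) in counts.items()
    }

# Toy adverts with placeholder group labels.
adverts = [
    ("A", "A supportive and caring team seeks a collaborative colleague."),
    ("A", "Competitive salary for an ambitious candidate."),
    ("B", "Office assistant needed."),
]
pct = female_wording_pct(adverts)
print(pct)
```

Returning None for groups without any gender-coded words keeps "no wording" distinct from "0% female wording", which matters when the values are later mapped to a color scale.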
14. Refinements
❏Inferential Statistics
❏From Unigram to Bigram
❏Cleaner data sources
❏Multiple languages
❏Compare to specific surveys
❏Generation of many dictionaries
❏From dictionary to taxonomies
❏From research to practice
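The step from unigrams to bigrams listed above can be illustrated with a small sketch. It assumes the same simple tokenisation as a bag-of-words approach; the helper `ngrams` and the example advert are hypothetical.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Return the list of n-word sequences in a lowercased text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

advert = "Experience with machine learning and project management required."

unigram_counts = Counter(ngrams(advert, 1))
bigram_counts = Counter(ngrams(advert, 2))

# A multi-word skill is only visible as a single item at the bigram level.
print(bigram_counts["machine learning"], unigram_counts["machine"])
```

Bigrams capture multi-word terms that unigrams split apart, at the cost of a much larger vocabulary to store and count.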
15. References
Motivation: We can develop large scale systems without using sensitive data.
Berg, A. M., Mol, S. T., Kismihók, G., & Sclater, N. (2016). The role of a reference synthetic data generator within the field of learning analytics. Journal of Learning Analytics, 3, 107–128. http://doi.org/10.18608/jla.2016.31.7
Motivation: We need to add new xAPI profiles and be consistent to avoid issues with connecting systems
Berg, A., Scheffel, M., Drachsler, H., Ternier, S., & Specht, M. (2016). The Dutch xAPI experience. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge - LAK ’16 (pp. 544–545). New York, New York, USA: ACM Press. http://doi.org/10.1145/2883851.2883968
Motivation: We need to add new xAPI profiles and be consistent to avoid issues with connecting systems
Berg, A., Scheffel, M., Drachsler, H., Ternier, S., & Specht, M. (2016). Dutch Cooking with xAPI Recipes: The Good, the Bad, and the Consistent. In 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT) (pp. 234–236). IEEE. http://doi.org/10.1109/ICALT.2016.48
Motivation: To contribute to the discussion around LA infrastructural elements, thereby providing a means to achieve consistency
Sclater, N., Berg, A., & Webb, M. (2015). Developing an open architecture for learning analytics. Proceedings of the EUNIS 2015 Congress. ISSN: 2409-1340