Invited talk on "Discovering Insight from Scholarly Research Output in Higher Educational Institutions", given to faculty staff from the Faculty of Computing and Informatics [1] at Namibia University of Science and Technology [2]. A video screencast of the talk is available online [3].
[1] https://fci.nust.na
[2] https://nust.na
[3] https://youtu.be/pP7a15T67oo
Data-Driven Insights in Higher Ed
1. Data-Driven Problem Solving in Higher Education Institutions
Lighton Phiri <lighton.phiri@unza.zm>
Department of Library & Information Science
University of Zambia
http://lis.unza.zm/~lightonphiri
Discovering Insight from Scholarly Research Output in Higher Educational Institutions
2. About The DataLab Research Group at The University of Zambia
October 27, 2021
● The DataLab research group at The University of Zambia is composed of faculty staff and students (undergraduate and postgraduate) working in three main areas
○ Data Mining
○ Digital Libraries
○ Technology-Enhanced Learning
http://datalab.unza.zm
3. Outline
● Part I. Data-Driven Problem Solving
● Part II. Insights from Scholarly Research Data
4. Outline
● Part I. Data-Driven Problem Solving
○ Introduction
○ Data Mining Pipelines
○ Data Mining Models
○ Past and Current Projects
● Part II. Insights from Scholarly Research Data
5. Machine Learning Instrumental in Deriving Insights
https://commons.wikimedia.org/
● Artificial Intelligence encompasses a broad spectrum of sub-fields
○ Traditional machine learning techniques and approaches
○ Deep Learning approaches
7. Past and Current Projects Leverage Machine Learning With Data as a Key Ingredient
8. Project #1: Predicting Student Learning Outcomes (Problem)
● ICT 1110 performance is an issue. The poor performance cuts across all assessments: quizzes, tests and practical programming questions.
9. Project #1: Predicting Student Learning Outcomes (Current Work)
● Potential solution: implement a prediction model aimed at identifying at-risk students.
○ Initiate interventions for at-risk students.
Chaibela, M., Chisha, I., Pungwa, D., Siabbaba, D. and Simukoko, B. (2021)
“Performance Predictor: Machine Learning Tool for Student Performance Outcomes”.
Work-in-Progress
10. Project #2: Medical Imaging Workflows in Zambia (Problem)
https://mjz.co.zm/index.php/mjz/article/view/560
13. Data-Driven Problem Solving Pipelines are Generic
● Fundamentally, machine learning aims to extract knowledge from data
○ Historical data is used to infer/predict outcomes associated with new observations
14. Data-Driven Problem Solving Pipelines are Generic
● Input features identified during feature engineering are used to train models
○ Features correlated with the outcome need to be identified
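Identifying features that correlate with the outcome can be sketched as below. This is only an illustration, not the method used in the projects described in this talk: the feature names (`quiz_avg`, `attendance`, `id_last_digit`), the outcome (`final_mark`) and all values are made up, and Pearson correlation is just one possible way to rank candidate features.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(features, outcome):
    """Rank features by absolute correlation with the outcome, strongest first."""
    scores = {name: pearson(values, outcome) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical candidate features for six students (illustrative values only).
features = {
    "quiz_avg": [35, 48, 52, 61, 70, 82],
    "attendance": [0.5, 0.7, 0.6, 0.8, 0.9, 0.95],
    "id_last_digit": [3, 9, 1, 7, 2, 5],
}
final_mark = [30, 45, 50, 60, 72, 85]  # hypothetical outcome to be predicted

ranking = rank_features(features, final_mark)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Features near the top of the ranking are candidates for model training; a feature with a correlation close to zero, such as the arbitrary `id_last_digit` column here, would normally be dropped.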
15. Data-Driven Problem Solving Pipelines are Generic
● The ML inference model is used to predict future patterns
○ Models can then be deployed as Web services and/or standalone applications
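The generic pipeline described on these slides (learn from historical data, then predict outcomes for new observations) can be sketched as follows. This is a minimal sketch under stated assumptions: a nearest-centroid classifier stands in for whatever model a real project would use, and the features (two assessment marks per student), the labels (`at_risk`, `on_track`) and all values are hypothetical.

```python
from collections import defaultdict


def train_centroids(rows, labels):
    """Training step: compute the mean feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, row):
    """Inference step: assign the label whose centroid is closest to the row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, row))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Historical data: [quiz mark, test mark] per student, with known outcomes.
historical_rows = [[35, 40], [42, 38], [48, 45], [65, 70], [72, 68], [80, 85]]
historical_labels = ["at_risk", "at_risk", "at_risk", "on_track", "on_track", "on_track"]

model = train_centroids(historical_rows, historical_labels)

# New observations whose outcomes are not yet known.
new_rows = [[45, 41], [75, 78]]
predictions = [predict(model, row) for row in new_rows]
print(predictions)
```

The `predict` function is the part that would sit behind a Web service or standalone application once the model has been trained.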
16. Established Data Mining Models Crucial to Data-Driven Problem Solving (1/4)
https://doi.org/10.1017/S0269888910000032
● Numerous data mining models and frameworks have been proposed
○ Most trace their roots to the KDD Process proposed by Fayyad et al.
17. Established Data Mining Models Crucial to Data-Driven Problem Solving (2/4)
https://doi.org/10.1017/S0269888906000737
19. Established Data Mining Models Crucial to Data-Driven Problem Solving (3/4)
https://www.kdnuggets.com
● The CRISP-DM model is one of the most widely used data mining models
● Data understanding and data preparation are the most time-consuming phases
20. Established Data Mining Models Crucial to Data-Driven Problem Solving (4/4)
https://arxiv.org/abs/2003.05155
21. Outline
● Part I. Data-Driven Problem Solving
● Part II. Insights from Scholarly Research Data
○ Problem and Motivation
○ Data Sources, Preprocessing and Preparation
○ Scholarly Research Output Insights
22. Online Visibility of Scholarly Research in Zambia (Problem, 1/3)
https://worldmapper.org
26. Online Visibility of Scholarly Research in Zambia (Problem, 3/3)
http://www.webometrics.info
28. Data Sources, Collection, Preprocessing and Preparation (1/5)
https://ir.nust.na
29. Data Sources, Collection, Preprocessing and Preparation (2/5)
http://dspace.unza.zm
30. Data Sources, Collection, Preprocessing and Preparation (2/5)
http://dspace.unza.zm
http://journals.unza.zm
31. Data Sources, Collection, Preprocessing and Preparation (3/5)
● Textual content mined from PDF manuscripts
○ Cover/title pages
○ Preliminary pages
● Textual content mined from metadata for training
● PDF document metadata
● Curated datasets from external repositories
35. Data Sources, Collection, Preprocessing and Preparation (4/5)
● OAI-PMH used to harvest all ETD descriptive metadata elements
● OAI-ORE used to harvest all ETD PDF documents
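The metadata harvesting step can be sketched as below. This is a rough sketch and not the harvester used in the work described: it builds an OAI-PMH `ListRecords` request URL and parses Dublin Core titles and subjects out of a response. The base URL, the sample XML and the helper names (`list_records_url`, `parse_list_records`) are assumptions for illustration, the response is an inline string rather than a live HTTP call, and the OAI-ORE step for PDF documents is not covered.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [el.text for el in record.findall(".//dc:title", NS)],
            "subjects": [el.text for el in record.findall(".//dc:subject", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Hypothetical response, trimmed to the elements used above.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123456789/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Thesis Title</dc:title>
          <dc:subject>Data Mining</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

url = list_records_url("https://repository.example.org/oai/request")  # placeholder endpoint
records, next_token = parse_list_records(SAMPLE_RESPONSE)
print(url)
print(records, next_token)
```

A full harvest would repeat the request with each returned resumption token until the repository stops returning one.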
36. Data Sources, Collection, Preprocessing and Preparation (5/5)
● Text features extracted from a set of core bitstream portions (ETD Title, ETD Abstract, ETD Title Page and ETD pages) to classify ETD manuscripts
ETD Type
ETD Subjects
IR Collection
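Classifying ETD manuscripts from extracted text can be sketched as below. This is a toy illustration under loud assumptions: the slides do not say which features or classifier were used, so a bag-of-words multinomial Naive Bayes model with add-one smoothing stands in here, and the title-page snippets and the labels (`doctoral`, `masters`) are invented examples of an "ETD Type" target.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(documents):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in documents:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented title-page snippets with hypothetical ETD Type labels.
training_documents = [
    ("A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy", "doctoral"),
    ("A thesis submitted for the degree of Doctor of Philosophy in Computer Science", "doctoral"),
    ("A dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science", "masters"),
    ("A dissertation submitted for the degree of Master of Education", "masters"),
]

model = train(training_documents)
print(classify(model, "Submitted for the degree of Master of Public Health"))
print(classify(model, "Thesis for the degree of Doctor of Philosophy"))
```

The same structure would apply to the other targets listed on the slide by swapping in subject or collection labels and text drawn from the title, abstract or preliminary pages.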
37. Quantifying The Online Visibility of Research in Zambia
Phiri, L. (2018). ICICT 2018
“Towards Increased Online Visibility of Scholarly Research Output in Zambia”.
URL: http://lis.unza.zm/archive/handle/123456789/227
40. Improved Visibility through Automatic Classification of ETDs
● Implementation of classification models to automatically classify IR digital objects using the minimum possible input from graduate students: “The ETD Manuscript”
○ The ETD manuscript bitstream is considered the “single source of truth”
○ Metadata prepared by staff who work with the IR potentially has inconsistencies
Phiri, L. (2020). IJMSO Vol. 14, No. 3
“Automatic Classification of Digital Objects for Improved Metadata Quality of ETDs”
URL: https://doi.org/10.1504/IJMSO.2020.112804
41. Improved Discoverability of Digital Objects in Institutional Repositories
Chipangila, B. et al. (2021). JCDL 2021
“Improved Discoverability of Digital Objects in IRs Using Controlled Vocabularies”
URL: https://doi.org/10.1109/JCDL52503.2021.00022
42. Beyond Insights into Scholarly Research Landscape in Zambia
http://lis.unza.zm/portal
44. Bibliography
[1] Phiri, L. (2018). Research Visibility in the Global South: Towards Increased Online Visibility of Scholarly Research Output in Zambia. IEEE International Conference in Information and Communication Technologies.
[2] Chipangila, B. et al. (2021). Improved Discoverability of Digital Objects in Institutional Repositories Using Controlled Vocabularies. JCDL 2021.
[3] Phiri, L. (2020). Automatic classification of digital objects for improved metadata quality of electronic theses and dissertations in institutional repositories. International Journal of Metadata, Semantics and Ontologies, 14(3), 234-248.