Invited talk, on Using Machine Learning Techniques for Solving Locally Relevant Problems, given as part of the Deep Learning Indaba X - Zambia event [1]. The video screencast of the talk is available online [2].
[1] https://deeplearningindaba.com/2021/indabax/indabax-zambia
[2] https://youtu.be/rYi-Jg10Rno
Using Machine Learning Techniques for Solving Locally Relevant Problems
1. Deep Learning Indaba X - Zambia 2021
Lighton Phiri <lighton.phiri@unza.zm>
Department of Library & Information Science
University of Zambia
http://lis.unza.zm/~lightonphiri
2.
May 25, 2021
About The DataLab Research Group at The University of Zambia
● The DataLab research group at The University of Zambia is composed of faculty staff and students (undergraduate and postgraduate) working in three main areas
○ Data Mining
○ Digital Libraries
○ Technology-Enhanced Learning
http://datalab.unza.zm
3.
Outline
● Part I. Data-Driven Problem Solving
● Part II. Past and Current Projects
● Part III. Potential Problems
4.
Outline
● Part I. Data-Driven Problem Solving
○ Introduction
○ Data Mining Pipelines
○ Data Mining Models
● Part II. Past and Current Projects
● Part III. Potential Problems
5.
Machine Learning 101 [...]
https://commons.wikimedia.org/
● Artificial Intelligence encompasses a broad spectrum of sub-fields
○ Traditional machine learning techniques and approaches
○ Deep Learning approaches
8.
Data Mining Pipelines
● Fundamentally, machine learning aims to extract knowledge from data
○ Historical data is used to infer/predict outcomes associated with new observations
9.
Data Mining Pipelines
● Input features identified during feature engineering are used to train models
○ Features correlated with the outcome need to be identified
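One common way to check which candidate features are correlated with the outcome is to rank them by the absolute value of their Pearson correlation coefficient. The sketch below is illustrative only: the feature names and the toy data are hypothetical, and the talk does not state which feature selection method was actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(rows, outcome):
    """Rank feature names by absolute correlation with the outcome."""
    names = list(rows[0].keys())
    scores = {name: pearson([row[name] for row in rows], outcome) for name in names}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: three candidate features and a numeric outcome.
rows = [
    {"hours_studied": 1, "logins": 2, "shoe_size": 42},
    {"hours_studied": 2, "logins": 4, "shoe_size": 38},
    {"hours_studied": 3, "logins": 5, "shoe_size": 41},
    {"hours_studied": 4, "logins": 9, "shoe_size": 39},
]
outcome = [40, 50, 60, 70]

ranking = rank_features(rows, outcome)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Correlation only captures linear relationships, so a low score does not prove a feature is useless; it is a first filter, not a final decision.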
10.
Data Mining Pipelines
● The ML inference model is used to predict future patterns
○ Models can then be deployed as Web services and/or standalone applications
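The pipeline described on these slides (train on historical data, then predict outcomes for new observations) can be sketched with a very small nearest-centroid classifier. This is a minimal illustration under stated assumptions: the features, labels and data are hypothetical, and the classifier is chosen for brevity, not because the talk uses it.

```python
from collections import defaultdict


def train(observations, labels):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    grouped = defaultdict(list)
    for features, label in zip(observations, labels):
        grouped[label].append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the new observation."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical historical data: [test score, attendance rate] -> outcome.
history = [[35, 0.5], [42, 0.6], [78, 0.9], [85, 0.95]]
outcomes = ["fail", "fail", "pass", "pass"]

model = train(history, outcomes)          # training step
prediction = predict(model, [80, 0.8])    # inference on a new observation
print(prediction)
```

Once trained, a function like `predict` is the piece that would sit behind a Web service endpoint or inside a standalone application.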
11.
Data Mining Models (1/5)
https://doi.org/10.1017/S0269888910000032
● Numerous data mining models and frameworks have been proposed
○ Most trace their roots to the KDD Process proposed by Fayyad et al.
12.
Data Mining Models (2/5)
https://doi.org/10.1017/S0269888910000032
15.
Data Mining Models (3/5)
https://doi.org/10.1017/S0269888906000737
17.
Data Mining Models (4/5)
https://www.kdnuggets.com
● The CRISP-DM model is one of the most widely used data mining models
● Data understanding and preparation are the most time-consuming phases
19.
Outline
● Part I. Data-Driven Problem Solving
● Part II. Past and Current Projects
○ Scholarly Research Output in Zambia
○ Predicting Learning Outcome at UNZA
○ Medical Imaging Workflows in Zambia
○ Automatic Weather Prediction in Zambia
● Part III. Potential Problems
21.
Project #1: Online Visibility of Research in Zambia - Problem (1/4)
https://worldmapper.org
25.
Project #1: Online Visibility of Research in Zambia - Problem (3/4)
http://www.webometrics.info
27.
Project #1: Online Visibility of Research in Zambia - Problem (4/4)
Phiri, L. (2018)
“Towards Increased Online Visibility of Scholarly Research Output in Zambia”.
URL: http://lis.unza.zm/archive/handle/123456789/227
34.
Project #1: Online Visibility of Research in Zambia - ETDs Automatic Classification (1/7)
● Implementation of classification models to automatically classify IR digital objects using the minimum possible input from graduate students: “The ETD Manuscript”
○ The ETD manuscript bitstream is considered the “single source of truth”
○ Metadata prepared by staff who work with the IR potentially has inconsistencies
Phiri, L. (2021)
“Automatic Classification of Digital Objects for Improved Metadata Quality of ETDs”
URL: https://doi.org/10.1504/IJMSO.2020.112804
35.
Project #1: Online Visibility of Research in Zambia - ETDs Automatic Classification (2/7)
● Text features extracted from a set of core bitstream portions (ETD Title, ETD Abstract, ETD Title Page and ETD pages) are used to classify ETD manuscripts
ETD Type
ETD Subjects
IR Collection
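To make the idea of classifying a manuscript from its text concrete, the sketch below trains a small multinomial naive Bayes classifier on title-page text to predict an ETD type. It is an illustration only: the slides do not name the algorithm used in the project, and the training sentences and the two labels ("Masters", "Doctoral") are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(documents, labels):
    """Collect per-class word counts for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(documents, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary


def classify(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical title-page snippets and ETD type labels.
title_pages = [
    "A dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science",
    "A thesis submitted for the degree of Doctor of Philosophy",
    "Dissertation submitted for the award of Master of Education",
    "Thesis submitted in fulfilment of the degree of Doctor of Philosophy in Computer Science",
]
etd_types = ["Masters", "Doctoral", "Masters", "Doctoral"]

model = train_naive_bayes(title_pages, etd_types)
print(classify(model, "Submitted for the degree of Master of Arts"))
```

The same structure would apply to the other targets (subjects, collection) by swapping the labels and the text portion used as input.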
36.
Project #1: Online Visibility of Research in Zambia - ETDs Automatic Classification (3/7)
● Textual content mined from PDF manuscripts
○ Cover/title pages
○ Preliminary pages
● Textual content mined from metadata for training
● PDF document metadata
● Curated datasets from external repositories
40.
Project #1: Online Visibility of Research in Zambia - ETDs Automatic Classification (4/7)
● OAI-PMH used to harvest all ETD descriptive metadata elements
● OAI-ORE used to harvest all ETD PDF documents
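As a rough illustration of the metadata harvesting step, the sketch below builds an OAI-PMH `ListRecords` request URL and extracts Dublin Core titles and subjects from a response. The base URL, the sample response and the helper names are hypothetical; the XML namespaces are the standard OAI-PMH 2.0 and Dublin Core ones. The example parses an inline sample instead of contacting a repository, and it does not cover the OAI-ORE step used for the PDF documents.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard namespaces for OAI-PMH 2.0 responses carrying Dublin Core metadata.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles and subjects from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NAMESPACES):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NAMESPACES)
        titles = [element.text for element in record.iterfind(".//dc:title", NAMESPACES)]
        subjects = [element.text for element in record.iterfind(".//dc:subject", NAMESPACES)]
        records.append({"identifier": identifier, "titles": titles, "subjects": subjects})
    return records


# Hypothetical response body; a real harvester would fetch it over HTTP and
# keep requesting with the returned resumptionToken until none is left.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123456789/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis Title</dc:title>
          <dc:subject>Example subject</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai/request")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```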
41.
Project #1: Online Visibility of Research in Zambia - ETDs Automatic Classification (5/7)
● ETD Type: 98.1%
● ETD Collection: 81.1%
● ETD Subjects: 81.7%
● The models would still need to be incorporated into an application that requires “some” human intervention
42.
Project #1: Online Visibility of Research in Zambia - ETDs Automatic Classification (6/7)
https://github.com/lightonphiri/etd_autoclassifier
43.
Project #1: Online Visibility of Research in Zambia - ETDs Automatic Classification (7/7)
https://datalab-apis.herokuapp.com/api/collection
44.
Project #1: Online Visibility of Research in Zambia - Current Work (1/3)
M’sendo, R. (2019-Present)
MSc Computer Science, University of Zambia
“Multi-Faceted Automatic Classification of Institutional Repository Objects”
45.
Project #1: Online Visibility of Research in Zambia - Current Work (2/3)
Chisale, A. (2021-Present)
MLIS, University of Zambia
“Automatic Generation of Electronic Theses and Dissertations Metadata”
46.
Project #1: Online Visibility of Research in Zambia - Current Work (3/3)
http://lis.unza.zm/portal
47.
Outline
● Part I. Data-Driven Problem Solving
● Part II. Past and Current Projects
○ Scholarly Research Output in Zambia
○ Predicting Learning Outcome at UNZA
○ Medical Imaging Workflows in Zambia
○ Automatic Weather Prediction in Zambia
● Part III. Potential Problems
48.
Project #2: Predicting Student Learning Outcomes - Problem (1/2)
● Performance in ICT 1110 is an issue. The poor performance transcends all assessments: quizzes, tests and practical programming questions.
50.
Project #2: Predicting Student Learning Outcomes - Problem (2/2)
● Potential solution: implement a prediction model aimed at identifying at-risk students.
○ Initiate interventions on at-risk students.
52.
Project #2: Predicting Student Learning Outcomes - Data Sources (2/5)
● Assessment results broken down by question
○ Concepts associated with question
○ Topics associated with question
53.
Project #2: Predicting Student Learning Outcomes - Data Sources (3/5)
● Assessment results broken down by question
○ Concepts associated with question
○ Topics associated with question
54.
Project #2: Predicting Student Learning Outcomes - Data Sources (4/5)
● LMS interaction logs
○ How often students access Moodle (login attempts)
○ Which Moodle features are being accessed (GradeBook, Messaging)
○ Time spent on Moodle
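The three log-derived signals listed above (login frequency, features accessed, time spent) could be turned into per-student features roughly as follows. This is a sketch under loud assumptions: the event fields `student`, `action` and `minutes` are hypothetical and do not reflect Moodle's actual log schema.

```python
from collections import defaultdict


def summarise_logs(events):
    """Aggregate raw LMS events into per-student features."""
    summary = defaultdict(
        lambda: {"logins": 0, "features_used": set(), "minutes": 0.0}
    )
    for event in events:
        row = summary[event["student"]]
        if event["action"] == "login":
            row["logins"] += 1                        # how often the LMS is accessed
        else:
            row["features_used"].add(event["action"])  # which features are accessed
        row["minutes"] += event["minutes"]            # total time spent
    return dict(summary)


# Hypothetical event records; real logs would be exported from the LMS.
events = [
    {"student": "s1", "action": "login", "minutes": 0},
    {"student": "s1", "action": "gradebook", "minutes": 5},
    {"student": "s1", "action": "login", "minutes": 0},
    {"student": "s1", "action": "messaging", "minutes": 3},
    {"student": "s2", "action": "login", "minutes": 0},
    {"student": "s2", "action": "gradebook", "minutes": 2},
]

features = summarise_logs(events)
for student, row in sorted(features.items()):
    print(student, row["logins"], sorted(row["features_used"]), row["minutes"])
```

Rows like these could then be joined with assessment results and survey responses before training a prediction model.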
55.
Project #2: Predicting Student Learning Outcomes - Data Sources (5/5)
● ICT 1110 information survey to capture information not available in SIS
○ Experience with computers
○ Motivation for taking the course
○ Specific location where student lives (although this can be inferred from next of kin address perhaps?)
56.
Project #2: Predicting Student Learning Outcomes - Current Work
Chaibela, M., Chisha, I., Pungwa, D., Siabbaba, D. and Simukoko, B. (2021)
“Performance Predictor: Machine Learning Tool for Student Performance Outcomes”.
Work-in-Progress
57.
Outline
● Part I. Data-Driven Problem Solving
● Part II. Past and Current Projects
○ Scholarly Research Output in Zambia
○ Predicting Learning Outcome at UNZA
○ Medical Imaging Workflows in Zambia
○ Automatic Weather Prediction in Zambia
● Part III. Potential Problems
58.
Project #3: Medical Imaging Workflows in Zambia - Problem
https://mjz.co.zm/index.php/mjz/article/view/560
64.
Outline
● Part I. Data-Driven Problem Solving
● Part II. Past and Current Projects
○ Scholarly Research Output in Zambia
○ Predicting Learning Outcome at UNZA
○ Medical Imaging Workflows in Zambia
○ Automatic Weather Prediction in Zambia
● Part III. Potential Problems
66.
Outline
● Part I. Data-Driven Problem Solving
● Part II. Past and Current Projects
● Part III. Potential Problems
○ Exemplar Projects in Zambia
○ Potential Locally Relevant Problems
68.
Agriculture: Automatic Identification and Early Warning of Fall Armyworms
http://dspace.unza.zm/handle/123456789/7141
71.
Outline
● Part I. Data-Driven Problem Solving
● Part II. Past and Current Projects
● Part III. Potential Problems
○ Exemplar Projects in Zambia
○ Potential Locally Relevant Problems
72.
Potential Locally Relevant Problems in Zambia (1/6)
● Impact-driven research/studies
○ Education
○ Health
○ So-called ICT for development perhaps?
73.
Potential Locally Relevant Problems in Zambia (2/6)
● Impact-driven research/studies
○ Education
○ Health
○ So-called ICT for development perhaps?
Zambia Daily Mail | August 18, 2019 | Volume 22 No. 033
74.
Potential Locally Relevant Problems in Zambia (3/6)
● Impact-driven research/studies
○ Education
○ Health
○ So-called ICT for development perhaps?
76.
Potential Locally Relevant Problems in Zambia (4/6)
● Impact-driven research/studies
○ Education
○ Health
○ So-called ICT for development perhaps?
80.
Potential Locally Relevant Problems in Zambia (5/6)
● Impact-driven research/studies
○ Education
○ Health
○ So-called ICT for development perhaps?
82.
Potential Locally Relevant Problems in Zambia (6/6)
● Education
● Health
● So-called ICT for development perhaps?
85.
Bibliography
[1] Phiri, L. (2018). Research Visibility in the Global South: Towards Increased Online Visibility of Scholarly Research Output in Zambia. IEEE International Conference in Information and Communication Technologies.
[2] Phiri, L. (2020). A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs in Zambia. Cadernos BAD, (1).
https://doi.org/10.1017/S0269888910000032
[3] Phiri, L. (2020). Automatic classification of digital objects for improved metadata quality of electronic theses and dissertations in institutional repositories. International Journal of Metadata, Semantics and Ontologies, 14(3), 234-248.