Keynote at the IEEE ICSC Workshop on Semantic Machine Learning (#SML21: https://ist.gmu.edu/~hpurohit/events/sml21/#keynote):
Video of SML21: https://www.youtube.com/watch?v=cx-l0XDk9Tw
Recent deep learning innovations have shown enormous potential to impact individuals and society, both positively and negatively. Deep learning models that utilize massive computing power and enormous datasets have significantly outperformed prior benchmarks on increasingly difficult, well-defined research tasks across technology domains such as computer vision, natural language processing, signal processing, and human-computer interaction. However, the black-box nature of deep learning models and their over-reliance on massive amounts of data condensed into labels and dense representations pose challenges for the interpretability and explainability of these systems. Furthermore, deep learning methods have not yet proven their ability to effectively utilize the relevant domain knowledge and experience critical to human understanding. This aspect was missing in early data-focused approaches and has necessitated knowledge-infused learning and other strategies for incorporating computational knowledge. Rapid advances in our ability to create and reuse structured knowledge as knowledge graphs make this task viable. In this talk, we will outline how knowledge, provided as a knowledge graph, is incorporated into deep learning methods using knowledge-infused learning. We then discuss how this makes a fundamental difference in the interpretability and explainability of current approaches and illustrate it with examples relevant to a few domains.
Semantics of the Black-Box: Using knowledge-infused learning approach to make AI systems more interpretable and explainable
1. Semantics of the Black-Box:
Using knowledge-infused learning approach
to make AI systems more interpretable and
explainable
Keynote @ KGSWC 2020: http://www.kgswc.org/
2.
Amit Sheth
Founding Director,
Artificial Intelligence Institute http://aiisc.ai
The University of South Carolina
amit@sc.edu https://www.linkedin.com/in/amitsheth/
Special Thanks
Kaushik Roy
AIISC, kaushikr@email.sc.edu
Manas Gaur
AIISC, mgaur@email.sc.edu
Some of the K-iL collaborators:
Ruwan Wickramarachchi (AI Institute)
Shweta Yadav
Ugur Kurşuncu (AI Institute)
Keyur Faldu (Embibe Inc.)
Qi Zhang (AI Institute)
Vishal Pallagani (AI Institute)
...
4. Outline of the talk
❏ Knowledge Graph
❏ Knowledge Graph meets Deep Learning:
Knowledge-infused Learning
❏ K-IL in Explainability and Interpretability in
Healthcare
❏ K-IL for Explainability and Interpretability in
Adaptive Contagion Control
❏ K-IL: Explainable Improvement of Learning
Outcomes
6. Definition
A Knowledge Graph (KG) is
structured knowledge in a graph
representation (in many cases, a labeled
property graph, RDF, or one of its variants). We
cannot escape the classic
expressivity-computability
trade-off.
The community is still debating the exact
definition.
Key differentiator: Relationships
(“relationships at the heart of semantics”).
Different/Related forms:
● Ontology : Knowledge graph after human
curation of entities and relations;
“ontological commitment”, richer KR
● Knowledge Base: flattened graph
● Lexicons: Small application-specific
flattened graph
● Knowledge Networks (KN) integrate
and combine knowledge (usually
captured as KGs) to serve a network
(community).
Knowledge Graphs and Knowledge Networks: The Story in Brief
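To make the definition concrete, here is a minimal sketch of a knowledge graph held as (subject, relation, object) triples. The facts and the helper names (`build_index`, `subjects_with`) are illustrative assumptions, not part of the talk; the point is only that the relation label travels with every edge, which is the "key differentiator" named on the slide.

```python
from collections import defaultdict

# Toy triples (subject, relation, object); the facts are illustrative only.
TRIPLES = [
    ("asthma", "is_a", "respiratory_disease"),
    ("albuterol", "treats", "asthma"),
    ("pollen", "triggers", "asthma"),
    ("wheezing", "symptom_of", "asthma"),
]

def build_index(triples):
    """Index outgoing edges by subject, keeping the relation label on each edge."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[subj].append((rel, obj))
    return index

def subjects_with(triples, relation, obj):
    """Return every subject linked to `obj` by the given relation type."""
    return sorted(s for s, r, o in triples if r == relation and o == obj)

index = build_index(TRIPLES)
print(index["albuterol"])
print(subjects_with(TRIPLES, "triggers", "asthma"))
```

A flattened knowledge base or lexicon, in the slide's terms, would keep only the entity lists and lose the ability to ask relation-typed questions such as the second query.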
7.
First commercial semantic search/browsing/… on the Web and
for the content on the Web using KG. Term used for KR:
WorldModel, Ontology http://bit.ly/15yrSemS
Creation and Use of Knowledge ~ 2000
8. Proliferation of Broad-based & Domain-Specific KGs
Examples of General Purpose Knowledge Graphs
1. DBpedia [Auer 2007, Lehmann 2015]
2. Yago [Rebele 2016]
3. Freebase [Bollacker 2008]
4. ConceptNet [Speer 2017]
5. Knowledge Vault [Dong 2014]
6. NELL [Mitchell 2018]
7. Wikidata [Vrandečić 2014]
Example of Healthcare-specific Knowledge Graphs
1. SNOMED-CT [ACL Chang 2020]
2. Unified Medical Language System (UMLS) [Yip 2019]
3. DataMed [JAMIA Chen 2018]
4. International Classification of Diseases (ICD-10)
[JAMIA Choi 2016]
5. DrugBank, Rx-NORM and MedDRA [ BMC Celebi 2019]
6. Drug Abuse Ontology [BMI Cameron 2013]
Many are also community-developed.
9. Enterprise Knowledge Graphs are also very popular
KG enabled Web and
Enterprise Applications:
Google, Amazon, Microsoft,
Siemens, LinkedIn, Airbnb,
eBay, and Apple, as well as
smaller companies (e.g. ezDI,
Franz, Metaphactory/
Metaphacts, Semantic Web
Company, Mondeca, Stardog,
Diffbot, Siren).
Enterprise KG development
services are also available
(Maana).
Industry-Scale Knowledge Graphs: Lessons and Challenges (Communications of the ACM, August 2019)
10.
Health Knowledge Graph
Empathi Ontology
IRI: https://w3id.org/empathi/1.0
Download: https://raw.githubusercontent.com/shekarpour/empathi.io/master/empathi.owl
[Shah and Sheth US patent 2015]
11.
Purohit, Hemant, Valerie L. Shalin, and Amit P. Sheth. "Knowledge Graphs to Empower Humanity-Inspired AI Systems." IEEE Internet Computing 24.4 (2020): 48-54.
13. Why Knowledge Graphs? Challenges in NLP/NLU
● Natural Language Processing Challenges:
○ How do you learn quickly from a small amount of data?
○ How do you mine (varied) relationships from existing text?
○ How do you reliably classify entities into a known ontology?
○ Better contextualization of words
● Natural Language Understanding Challenges:
○ Query Interpretation or Understanding the user question
○ Answering the question with Trust and Transparency
○ How to measure “reasonability” and “meaningfulness” of the response to a
question?
○ How much context is needed to provide a precise response?
[Stanford Knowledge Graph Seminar 2020, Amit Prakash, Leilani Gilpin]
14. Better Contextualization of Words : Retrofitting
Why Knowledge Graphs: NLP/NLU Challenges
[Figure: vector representations (embeddings) of words in Tweets, e.g. "damage", "infrastructure", "affected", "population", before and after retrofitting. Knowledge sources shown: MOAC Ontology, Empathi ontology, Disaster Ontology, DBpedia.]
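As a rough illustration of what retrofitting does to word vectors, the sketch below applies the iterative update from Faruqui et al. (cited later in the deck as [Faruqui 2014]): each vector moves toward the average of its knowledge-graph neighbors while staying anchored to its original value. The two-dimensional vectors and the single "damage"/"infrastructure" edge are invented for the example and are not taken from the ontologies named on the slide.

```python
import math

def retrofit(embeddings, neighbors, iterations=10, alpha=1.0):
    """Pull each vector toward its KG neighbors while staying close to its original value."""
    new = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, nbrs in neighbors.items():
            nbrs = [n for n in nbrs if n in new]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # equal weight per neighbor, weights sum to 1
            updated = []
            for d in range(len(new[word])):
                neighbor_sum = sum(beta * new[n][d] for n in nbrs)
                # (alpha * original + weighted neighbors) / (alpha + sum of betas)
                updated.append((alpha * embeddings[word][d] + neighbor_sum) / (alpha + 1.0))
            new[word] = updated
    return new

# Hypothetical embeddings and one assumed ontology edge.
EMB = {"damage": [1.0, 0.0], "infrastructure": [0.0, 1.0], "banana": [-1.0, -1.0]}
NEIGHBORS = {"damage": ["infrastructure"], "infrastructure": ["damage"]}

out = retrofit(EMB, NEIGHBORS)
print(math.dist(EMB["damage"], EMB["infrastructure"]))
print(math.dist(out["damage"], out["infrastructure"]))
```

Words linked in the knowledge source end up closer together, while words with no edges (here "banana") keep their original vectors.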
16.
● Knowledge Extraction
○ Data Extraction (NLP, Web)
○ Wrapper Induction (DB, DM: Data Mining)
○ Web Tables (DB)
○ Text Mining (DM)
● Knowledge Alignment
○ Entity and Relationship Linking [Perera 2016]
○ Schema Mapping and Ontology Mapping [Jain 2010]
○ Universal Schema [Sheth 1990]
● Knowledge Cleaning
○ Data Cleaning [Jadhav 2016]
○ Anomaly Detection [Anantharam 2012, 2016]
○ Knowledge Fusion [Sheth 2020, Kapanipathi 2020, Gaur 2018, Kursuncu 2020]
● Knowledge Mining & Knowledge-based QA
○ Graph Mining [Lalithsena 2016, 2017, 2018]
○ Knowledge Embedding [Wickramarachchi 2020, Gaur 2018]
○ Search [Sheth 2003, Cheekula 2015, Kho 2019]
○ QA [Alambo 2019, Shekarpour 2017]
[Stanford Knowledge Graph Seminar 2020, Luna Dong]
Knowledge Graphs in DL pipeline for NLP
17. Knowledge graphs in Conversational AI
Personalization: taking into account
the contextual factors such as user’s
health history, physical
characteristics, environmental
factors, activity, and lifestyle.
Chatbot with contextualized (e.g., asthma) knowledge is
potentially more personalized and engaging.
Without
Contextualized Personalization
With
Contextualized Personalization
18. Knowledge for Multimodal Data: Example of City Traffic Event
Anantharam, Pramod, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit Sheth. "Extracting city traffic events from social
streams." ACM Transactions on Intelligent Systems and Technology (TIST) 6, no. 4 (2015): 1-27.
19. Why Knowledge Graphs: Shortcomings of Deep Learning
● Graph Convolutional Neural Networks (GCN) are blind to relation types. For example,
<shelter-in-place causes anxiety> and <shelter-in-place prevents anxiety> have similar
representations in a GCN.
● Deep clustering over unlabeled data exploits the inherent latent semantics to generate diverse
and cohesive clusters. However, interpreting the clusters requires knowledge graphs.
ODKG: Opioid
Drug Knowledge
Graph
[Kamdar 2019]
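The relation-blindness point can be seen in a toy message-passing step. The sketch below is not a full GCN: it uses a simple mean over a node and its in-neighbors, with made-up feature vectors, and a hypothetical per-relation scalar weight standing in for the relation-specific weight matrices used by relation-aware variants.

```python
# Two tiny graphs that differ only in the relation label on one edge.
GRAPH_CAUSES = [("shelter_in_place", "causes", "anxiety")]
GRAPH_PREVENTS = [("shelter_in_place", "prevents", "anxiety")]

# Made-up node features.
FEATURES = {"shelter_in_place": [1.0, 0.5], "anxiety": [0.2, 0.9]}

def gcn_layer(edges, features):
    """Relation-agnostic aggregation: average a node's features with its in-neighbors'."""
    out = {}
    for node, feat in features.items():
        msgs = [features[s] for s, _rel, o in edges if o == node]
        stacked = [feat] + msgs
        out[node] = [sum(col) / len(stacked) for col in zip(*stacked)]
    return out

# Hypothetical per-relation weights (assumed, not learned here).
REL_WEIGHT = {"causes": 1.0, "prevents": -1.0}

def relational_layer(edges, features, rel_weight):
    """Relation-aware aggregation: each message is scaled by its relation's weight."""
    out = {}
    for node, feat in features.items():
        msgs = [[rel_weight[rel] * x for x in features[s]]
                for s, rel, o in edges if o == node]
        stacked = [feat] + msgs
        out[node] = [sum(col) / len(stacked) for col in zip(*stacked)]
    return out

print(gcn_layer(GRAPH_CAUSES, FEATURES) == gcn_layer(GRAPH_PREVENTS, FEATURES))
print(relational_layer(GRAPH_CAUSES, FEATURES, REL_WEIGHT)["anxiety"])
print(relational_layer(GRAPH_PREVENTS, FEATURES, REL_WEIGHT)["anxiety"])
```

The relation-agnostic layer returns identical representations for "causes" and "prevents", while the relation-aware layer separates them.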
20. Symbolic glued with Statistical: Knowledge-infused Learning
STATISTICAL AI
CONNECTIONIST
“Unreasonable effectiveness of big data”
in machine processing &
powering bottom up processing
“Unreasonable effectiveness of small
data” in human decision making - can this
be emulated to power top down
processing?
SYMBOLIC AI
FORMAL
KG will play an increasing role in developing hybrid neuro-symbolic systems (that is bottom-up
deep learning with top-down symbolic computing) as well as in building explainable AI systems
for which KGs will provide scaffolding for punctuating neural computing.
Cognitive Science Analogy: Combining Top Brain - Bottom Brain Processes.
22. How do you ensure consistency of
labeling, especially when the label is not
binary?
Do labels represent adequate
semantics (e.g., number of
alternatives)?
Do they have adequate domain
knowledge?
How do you ensure consistency of
labeling (interpretation)?
A good KG has addressed these
issues:
● a schema is rich in representation
(and captures much more than
labeling)
● KG design incorporates
substantial domain knowledge
● Instance level knowledge is
created through (usually)
collective intelligence and
Challenges in Deep Learning : Why K-IL
23. Why Knowledge Infused Learning (K-IL)?
By changing the inputs, it can enrich the
representation (E.g. Radicalization on Social
Media)
By changing parameters, we can control
the learned patterns/correlations so that they
adhere to the knowledge.
Deep infusion would allow us finer-grained
control over learned patterns to
ensure adherence to knowledge at every
step of the hierarchy.
Explanations are easy to derive from the KG
used.
Jiang, Shan, William Groves, Sam Anzaroot, and Alejandro Jaimes. "Crisis Sub-Events on Social Media: A
Case Study of Wildfires."
Contextual Modeling to
analyze Radicalization on
Social Media
24.
Knowledge-infused Learning (K-IL)
● Shallow Infusion: ... of knowledge graphs to improve the semantic and conceptual processing of data.
● Semi-Deep Infusion: deeper and congruent incorporation or integration of the knowledge graphs in the learning techniques.
● Deep Infusion (Part of Future KG Strategy): combines statistical AI (bottom-up) and symbolic AI learning techniques (top-down) for hybrid and integrated intelligent systems.
Sheth, Gaur, Kursuncu, Wickramarachchi: Shades of Knowledge-Infused Learning for Enhancing Deep Learning
25.
Shallow Infusion of Knowledge for Machine/ Deep Learning in
Brief
Chronological
arrangement of shallow
Infusion techniques
From NLP domain
26.
K-IL: Shallow Infusion (shallow KR, shallow merging technique)
The knowledge infused is shallow, and the method of infusion is weak.
Shallow external knowledge refers to forms of information that are
extracted from text using heuristics, often designed for task-specific problems:
○ Bag of Words/Phrases from Corpus [Hagoort 2004, Zhang 2019, Sun 2019]
○ Bag of Words/Phrases from Semantic Lexicons [Faruqui 2014, Mrkšić 2016]
○ Count of Nouns, Pronouns, Verbs [Gkotsis 2017, 2016]
○ Sentiment and Emotions of the sentence [Gaur 2019, Vedula 2017, Kursuncu 2019]
○ Latent topics describing the documents [Jiang 2016, Li 2016, Meng 2020]
○ Label assignment to words or phrases in sentence (Semantic Role Labeling):
Mary [Agent] sold [Predicate] the book [Theme] to John [Recipient]
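A minimal sketch of shallow infusion along the lines listed above: lexicon-derived counts are simply appended to a bag-of-words vector before any learning happens. The vocabulary, the lexicon categories, and the function names are hypothetical; a real system would draw them from a corpus and a curated semantic lexicon.

```python
from collections import Counter

# Hypothetical semantic lexicon: category -> cue words (illustrative only).
LEXICON = {
    "infrastructure": {"bridge", "road", "power"},
    "impact": {"damage", "affected", "destroyed"},
}
VOCAB = ["bridge", "damage", "affected", "many", "people"]

def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def lexicon_features(tokens, lexicon):
    """One count per lexicon category, in alphabetical category order."""
    return [sum(t in words for t in tokens) for _, words in sorted(lexicon.items())]

def shallow_infuse(text):
    """Shallow infusion: concatenate text features with lexicon-derived features."""
    tokens = text.lower().split()
    return bag_of_words(tokens, VOCAB) + lexicon_features(tokens, LEXICON)

print(shallow_infuse("Bridge damage affected many people"))
```

The knowledge only changes the model input; the learning algorithm itself is untouched, which is why the slide calls both the knowledge and the merging technique shallow.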
27.
K-IL: Shallow Infusion: Explaining Clustering
Identifiable Suicide Risk Factors from Electronic
Healthcare Records
Identifiable Suicide Risk Factors from Social
Media
Question: What do people say to a clinician?
Question: What do people hide from a clinician?
Question: What do people say on social media?
Question: What do people hide from social media?
Missing Information
28.
K-IL: Shallow Infusion:
Knowledge Graph Embeddings for Autonomous Driving
Scene KG → KG Embeddings of objects/events → Computed Scene Similarity
Wickramarachchi, Ruwan., Henson, Cory., and Sheth, Amit. An evaluation of knowledge graph embeddings for autonomous driving data: Experience and practice.
In AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020).
30.
K-IL: Semi-Deep Infusion : Matching Reddit Conversation to DSM-5
Scenario
Really struggling with my bisexuality which is causing chaos in my relationship with a girl. Being
a fan of LGBTQ community, I am equal to worthless for her. I’m now starting to get drunk
because I can’t cope with the obsessive, intrusive thoughts, and need to get out of my head.
BPD
DICD PND SAD SBI OCD
Don’t want to live anymore. Sexually assault, ignorant family members and my never
ending loneliness brights up my path to death.
SCW
PND SBI SAD DPR DICD
DPR
I do have a potential to live a decent life but not with people who abandon me.
Hopelessness and feelings of betrayal have turned my nights to days. I am developing
insomnia because of my restlessness.
SBI DPR DICD
BPD I just can’t take it anymore. Been abandoned yet again by someone I cared about. I've been
diagnosed with borderline for a while, and I’m just going to isolate myself and sleep forever.
SBI PND
Reddit DSM-5 [Gaur 2018]
31.
TwADR
AskaPatient
Drug Abuse
Ontology
DSM-5 Lexicon
Suicide Risk
Severity Lexicon
Treatment Information
Observation and
Drug-related
Information
Mental Health Condition
Suicide Risk Levels
Ideation
Behavior
Attempt
K-IL: Semi-Deep Infusion : Matching Reddit Conversation to DSM-5
Mapping Subreddit to DSM-5 categories using Mental health Knowledge Bases
32.
Medical KnowledgeBases
N-grams
(n=1, 2, 3)
LDA
LDA over
Bi-grams
Normalized
Hit
Score
DSM-5
Lexicon
<Reddit Post>
<Subreddit Label>
Input
<Reddit Post>
<DSM-5 Label>
Output
DAO
Drug Abuse
Ontology
K-IL: Semi-Deep Infusion : Matching Reddit Conversation to DSM-5
Matching process from Reddit to DSM-5
34.
[Figure: the Reddit word embedding model R (12808 words, 300 dimension embedding) and the DSM-5-DAO lexicon D (20 DSM-5 categories, 300 dimension embedding) are related through a weight matrix W, obtained from a solvable Sylvester equation.]
K-IL: Semi-Deep Infusion : Matching Reddit Conversation to DSM-5
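One way to see where a solvable Sylvester equation can come from is sketched below. It assumes an autoencoder-style encoding-decoding objective with a trade-off weight $\lambda$; the objective, $\lambda$, and the matrix shapes ($R \in \mathbb{R}^{12808 \times 300}$, $D \in \mathbb{R}^{20 \times 300}$, $W \in \mathbb{R}^{20 \times 12808}$) are assumptions made for illustration and are not stated on the slide.

```latex
\min_{W}\; \lVert R - W^{\top} D \rVert_F^{2} \;+\; \lambda\, \lVert W R - D \rVert_F^{2}
\quad\Longrightarrow\quad
\underbrace{D D^{\top}}_{A}\, W \;+\; W\, \underbrace{\lambda\, R R^{\top}}_{B}
\;=\; \underbrace{(1+\lambda)\, D R^{\top}}_{C}
```

Setting the gradient with respect to $W$ to zero gives the right-hand form $AW + WB = C$, which is a Sylvester equation. Under these assumptions, $DD^{\top}$, $RR^{\top}$, and $DR^{\top}$ play the roles of category-to-category, word-to-word, and category-to-word correlation terms.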
35.
I know you want me to say no and that it is a
part of me blah blah blah. But I can't.
Honestly, not having bipolar disorder would be
a huge blessing. I would be so much happier
and could control my life better. I wouldn't
have frantic, scattered thoughts and
depression. I would be normal, happy, and
less dramatic.
Bipolar Subreddit
DSM-5: Depressive Disorder
BiPolar
Depression
Disorder
Subreddits DSM-5
Chapter
BiPolarReddit
BiPolarSOS
Depression
Addiction
Substance use &
Addictive Disorder
Crippling Alcoholism
Opiates Recovery
Opiates
Self-Harm
Stop Self-Harm
K-IL: Semi-Deep Infusion : Matching Reddit Conversation to DSM-5
Example posts after Mapping Subreddit to DSM-5
categories
Mappings provide explainability
36.
K-IL: Semi-Deep Infusion : Matching Reddit Conversation to DSM-5
Domain-specific
Knowledge lowers
False Alarm Rates.
2005-2016
550K Users
8 Million
Conversations
15 Mental Health
Subreddits
[Gkotsis 2017][Saravia 2016]
[Park 2018]
Performance Gains in the outcomes
37. Semi-deep infusion in Reinforcement Learning
Consider a gathering event at a
rally [Tablighi Jamaat
Movement].
Many fatalities and much economic
cost are incurred before an SIR
model recognises this event
(delay).
Any policy set by the policy maker
at this point might be too late
to instate.
A knowledge-infused policy,
where the knowledge is
[lock down the location of the rally
and test everyone], can greatly
mitigate this effect.
Image taken from: https://towardsdatascience.com/reinforcement-learning-for-covid-19-simulation-and-optimal-policy-b90719820a7f
How?
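The delay argument can be illustrated with a small discrete-time SIR simulation. This is a sketch of the intuition only, not the reinforcement-learning setup from the talk: all rates, the population size, and the detection threshold are assumed values, and the "knowledge rule" is reduced to starting the lockdown on the day of the event instead of waiting for the infected fraction to become visible in the data.

```python
def simulate_sir(days, lockdown_start, n=10_000, i0=10,
                 beta=0.3, beta_lockdown=0.05, gamma=0.1, detect_fraction=0.05):
    """Discrete-time SIR model returning cumulative infections.

    `lockdown_start` is a day index (knowledge rule), or None to wait until the
    infected fraction exceeds `detect_fraction` (data-driven trigger).
    All parameter values are assumptions chosen for illustration.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    locked = False
    for day in range(days):
        if lockdown_start is not None:
            locked = locked or day >= lockdown_start
        else:
            locked = locked or i / n > detect_fraction
        b = beta_lockdown if locked else beta
        new_inf = b * s * i / n   # new infections this day
        new_rec = gamma * i       # new recoveries this day
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return n - s

early = simulate_sir(120, lockdown_start=0)     # knowledge-infused: act at the event
late = simulate_sir(120, lockdown_start=None)   # data-driven: act after detection
print(round(early), round(late))
```

With these assumed parameters, acting at the time of the event keeps cumulative infections far below the case where the policy waits for the model to detect the outbreak.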
39. An explainable system would comprise collectively
exhaustive interpretable subsystems and orchestration
among them.
Explanations would be in natural language, explaining
the decision-making process.
An interpretable system provides the ability to discern the
internal mechanisms of any module.
Neural attention models are endowed with a certain
degree of interpretability in visualizing parts of the
input, without providing human-understandable
explanations.
An explainable system is
interpretable, but not
vice versa.
40.
Really struggling with my bisexuality which is causing chaos in my relationship with a girl.
Being a fan of LGBTQ community, I am equal to worthless for her. I’m now starting to get
drunk because I can’t cope with the obsessive, intrusive thoughts, and need to get it out of
my head.
Is mental health related ? Yes: 0.71 , No: 0.29
Which Mental Health condition?
Predicted: Depression (False)
True: Obsessive Compulsive Disorder
Reasoning over Model:
Why model predicted
Depression?
Unknown
41.
Really struggling with my bisexuality which is causing chaos in my relationship with a girl.
Being a fan of LGBTQ community, I am equal to worthless for her. I’m now starting to get
drunk because I can’t cope with the obsessive intrusive thoughts, and need to get it out of
my head.
Is mental health related ? Yes: 0.82 , No: 0.18
Which Mental Health condition?
Predicted: Obsessive Compulsive Disorder(True)
True: Obsessive Compulsive Disorder
DSM-5 Knowledge
Graph
DSM-5 and Post
Correlation Matrix
Reasoning over Model:
Why model predicted
Obsessive Compulsive
Disorder ? known
Interpretable learning
$D \in \mathbb{R}^{N}$, $P \in \mathbb{R}^{N}$, $W$, $f(W)$
42.
Really struggling with my bisexuality which
is causing chaos in my relationship with a
girl. Being a fan of LGBTQ community, I am
equal to worthless for her. I’m now starting
to get drunk because I can’t cope with the
obsessive, intrusive thoughts, and need to
get out of my head.
288291000119102: High risk bisexual behavior
365949003: Health-related behavior finding 365949003: Health-related behavior finding
307077003: Feeling hopeless
365107007: level of mood
225445003: Intrusive thoughts
55956009: Disturbance in content of thought
26628009: Disturbance in thinking
1376001: Obsessive compulsive personality disorder
Multi-hop traversal on
Medical knowledge
graphs
<is symptom>
Achieving Explainability through Medical Entity Normalization :
Replacing Entities in the post with Concepts in the Medical Knowledge Graph through Semantic Annotation
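The multi-hop traversal on this slide can be sketched as a breadth-first search from an annotated concept to a target concept. The concept IDs and labels below are copied from the slide, but the edges between them and the phrase-to-concept dictionary are assumptions made for illustration; they are not extracted from the medical knowledge graph itself.

```python
from collections import deque

# IDs and labels as shown on the slide.
LABELS = {
    "225445003": "Intrusive thoughts",
    "55956009": "Disturbance in content of thought",
    "26628009": "Disturbance in thinking",
    "1376001": "Obsessive compulsive personality disorder",
}
# ASSUMED edges for illustration only.
EDGES = {
    "225445003": ["55956009"],
    "55956009": ["26628009"],
    "26628009": ["1376001"],
}
# Hypothetical phrase-to-concept dictionary standing in for a semantic annotator.
ANNOTATIONS = {"intrusive thoughts": "225445003"}

def find_path(start, goal, edges):
    """Breadth-first search returning the shortest hop sequence, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def explain(post, goal):
    """Return a readable concept path from the first annotated phrase to the goal."""
    for phrase, concept in ANNOTATIONS.items():
        if phrase in post.lower():
            path = find_path(concept, goal, EDGES)
            if path:
                return " -> ".join(LABELS[c] for c in path)
    return None

print(explain("I can't cope with the obsessive, intrusive thoughts", "1376001"))
```

The returned path is what makes the prediction explainable: each hop is a named concept that a clinician can inspect.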
43.
Really struggling with my [health-related behavior] which is causing [health-related
behavior] with a girl. Being a fan of [LGBTQ] community, I am equal to [level of mood] for
her. I’m now starting to [drinking] because I can’t cope with the [obsessive compulsive
personality disorder] [disturbance in thinking], and [disturbance in thinking].
Is mental health related ? Yes: 0.96 , No: 0.04
Which Mental Health condition?
Predicted: Obsessive Compulsive Disorder(True)
True: Obsessive Compulsive Disorder
DSM-5 Knowledge
Graph
DSM-5 and Post
Correlation Matrix
Reasoning over Model:
Why model predicted
Obsessive Compulsive
Disorder ? known
Interpretable and
Explainable Learning
$D \in \mathbb{R}^{N}$, $P \in \mathbb{R}^{N}$, $W$, $f(W)$
45. Semi-deep infusion in RL
Consider a gathering event at a
rally [Tablighi Jamaat
Movement].
Many fatalities and much economic
cost are incurred before an SIR model
recognises this event (delay).
Any policy set by the policy maker at
this point might be too late to
instate.
A knowledge-infused policy,
where the knowledge is
[lock down the location of the rally
and test everyone], can greatly
mitigate this effect.
Image taken from: https://towardsdatascience.com/reinforcement-learning-for-covid-19-simulation-and-optimal-policy-b90719820a7f
How?->
46. Explainable COVID-19 Policy
◎ Knowledge in dynamics: People go to work everyday
and do groceries at either shops in the neighborhood
or shops en-route to work.
◎ Knowledge traceable in policy choice: “There exists a
‘shop1’ en-route to a workplace, there are many
people in a neighborhood that work here and take this
route” -> encoded as a relational feature
◎ Learning algorithm assigns high weight to this feature
when the policy output is lockdown(shop1)
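A minimal sketch of how a relational feature can make a policy choice traceable. The facts, the feature name `commuters_passing`, and the weights are all hypothetical; in the setting described above, a learning algorithm would fit the weights rather than have them written by hand.

```python
# Hypothetical facts: who works where and which shops lie on each commute.
WORKS_AT = {"ann": "plant", "bob": "plant", "cy": "office"}
EN_ROUTE = {"plant": ["shop1"], "office": ["shop2"]}

def commuters_passing(shop):
    """Relational feature: number of people whose commute passes the shop."""
    return sum(shop in EN_ROUTE[w] for w in WORKS_AT.values())

# Assumed weights; a learner would normally estimate these.
WEIGHTS = {"commuters_passing": 0.8, "bias": -1.0}

def decide(shop):
    """Return the action and the per-feature contributions behind it."""
    contributions = {
        "commuters_passing": WEIGHTS["commuters_passing"] * commuters_passing(shop),
        "bias": WEIGHTS["bias"],
    }
    score = sum(contributions.values())
    action = f"lockdown({shop})" if score > 0 else f"keep_open({shop})"
    return action, contributions

print(decide("shop1"))
print(decide("shop2"))
```

Because each contribution is tied to a named relational feature, the output `lockdown(shop1)` can be traced back to the statement that many commuters pass that shop.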
48. Bayesian Knowledge Tracing for Improving Learning Outcomes
in Education
Question: What is the name of the compound formed after the addition of phosphate to
glucose?
Answer: Glucose Monophosphate
Response from Student: Glucose Phosphate
Question: What is the name of the compound formed after the addition of phosphate to
adenosine diphosphate?
Answer: Adenosine Triphosphate
Response from Student: Adenosine 3-Phosphate
Can we conclude from the correct responses (if any) provided by the student that the student
knows Phosphorylation?
Piech, Chris, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J. Guibas, and Jascha Sohl-Dickstein. "Deep knowledge tracing." In Advances in neural
information processing systems, pp. 505-513. 2015.
Using Knowledge infusion, we can see
the answer is close to correct
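The sketch below shows one standard Bayesian Knowledge Tracing update and how a knowledge lookup can change its input. The update formulas are the usual BKT ones (posterior given the response, then a learning transition); the parameter values and the alias table are assumptions, with the alias pair following the talk's own example rather than a real knowledge graph query.

```python
# Assumed alias table standing in for a KG lookup (pair taken from the talk's example).
ALIASES = {"adenosine 3-phosphate": "adenosine triphosphate"}

def normalize(answer, aliases):
    """Lower-case the answer and map known aliases to a canonical name."""
    key = answer.strip().lower()
    return aliases.get(key, key)

def bkt_update(p_known, correct, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """One BKT step: posterior given the response, then the learning transition."""
    if correct:
        num = p_known * (1 - p_slip)
        den = num + (1 - p_known) * p_guess
    else:
        num = p_known * p_slip
        den = num + (1 - p_known) * (1 - p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p_learn

def trace(responses, aliases, p_init=0.3):
    """Run BKT over (given, expected) answer pairs, matching answers after normalization."""
    p = p_init
    for given, expected in responses:
        correct = normalize(given, aliases) == normalize(expected, aliases)
        p = bkt_update(p, correct)
    return p

responses = [("Adenosine 3-Phosphate", "Adenosine Triphosphate")]
plain = trace(responses, aliases={})
infused = trace(responses, aliases=ALIASES)
print(round(plain, 3), round(infused, 3))
```

With exact string matching the response counts as wrong and the mastery estimate drops; with the alias lookup the same response counts as correct and the estimate rises.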
49. KG + BKT/DKT → Explainability
CQ: Concepts in the questions asked
CQ, CQ: Relationships between the concepts asked in the
questions
CQ, CKG: Relationships between the concepts asked in the
questions and the concepts in the Knowledge graphs (e.g.
epubs from Amazon, NCERT textbooks, Books specific to
entrance exams, etc.)
50.
Donda, Chintan, Sayan Dasgupta, Soma S. Dhavala, Keyur Faldu, and Aditi Avasthi. "A framework for predicting, interpreting, and improving Learning Outcomes." arXiv preprint arXiv:2010.02629 (2020).
K-IL for Improving Learning Outcomes
Tutorial @
ACM CoDS COMAD
https://aiisc.ai/xaikg/
53. ROBOTICS
Cross-domain Knowledge
1) Observational (sensory
data) and common-sense
knowledge to perceive the
surrounding environment
2) Knowledge
representation to model
the knowledge concerning
the surrounding
environment
3) Appropriate cross-
domain knowledge
reasoning mechanisms
COGNITIVE SCIENCE
Human Intelligence
“Inject” human
intelligence into AI
assistants such as
Amazon Alexa,
utilization of cross-
domain knowledge
of social
interactions,
emotions and
linguistic variations
of natural language.
SELF-DRIVING CARS PERSONAL ASSISTANT
Empathy and
Morality
For AI agents to mimic
human emotions
and decisions, we
need to model
human emotional
knowledge of
empathy, morals,
and ethics.
Personalization
Smart health agents are
adapting to answer real-
world personalized complex
health queries in simple
interactive language.
Requires patients’
environmental knowledge,
health data, and
coordination with their
healthcare physicians.
Promising K-IL Impacts
55.
5 faculty, >12 PhDs, few Masters, >5
undergrads, 2 Post-Docs, >10 Research Interns
Alumni in/as
Industry: IBM T.J. Watson, Almaden, Amazon, Samsung
America, LinkedIn, Facebook, Bosch
Start-ups: AppZen, AnalyticsFox, Cognovi Labs
Faculty: George Mason, University of Kentucky, Case Western
Reserve, North Carolina State University, University of Dayton
Core AI
Neuro-symbolic computing/Hybrid AI, Knowledge
Graph Development, Deep Learning,
Reinforcement Learning, Natural Language
Processing, Knowledge-infused Learning (for deep
learning and NLP), Multimodal AI (including
IoT/sensor data streams, images), Collaborative
Assistants, Multiagent Systems (incl. Coordinating
systems of decision making agents including
humans, robots, sensors), Semantic-Cognitive-
Perceptual Computing, Brain-inspired computing,
Interpretation/Explainability/Trust/Ethics in AI
systems, Search, Gaming
Interdisciplinary AI and application
domains: Medicine/Clinical, Biomedicine, Social
Good/Harm, Public Health (mental health,
addiction), Education, Manufacturing, Disaster
Management
56. Thanks!
Open to Questions?
You can find me at:
amit@sc.edu
https://aiisc.ai/
https://www.linkedin.com/company/1054055/
http://bit.ly/AIISC
Editor's Notes
Slide 3: Inner circle : talks about our research areas and strength
A nice knowledge graph, which is a knowledge graphs ----- picture over here
One side ---> Empathi
ezDI image → other side
When an agent communicates with humans,
Empathy, policies, trustworthy → inform the behavior of the agent
---- Slide before PAC Learning
----- Explaining one of them -- why a knowledge graph would help
There are many NLP challenges, why knowledge Graph would work
GPT-3 --- issues
Can KG solve it
How to get a better context for effective output
You can have relationship between concepts
Video : When the slide will be uploaded.
a(i) Domain knowledge of traffic in the form of concepts and relationships (mostly causal) from the ConceptNet
a(ii) Probabilistic Graphical Model (PGM) that explains the conditional dependencies between variables in traffic domain is enriched by adding the missing random variables, links, and link directions extracted from ConceptNet
b : Shows how this enriched PGM is used to correlate contextually related data of different modalities.
3 sources of knowledge (Geo-Spatially and temporally) [ We need to put this in the slide]
→ Open Street Map
→ Smart City Knowledge Graph
→ [Find the third one]
---- We should have another insight:
--- it is still a coarse-grained use of KG for making sense of clusters
--- We need to provide a more detailed example: there is an explicit relationship between two concepts
---- A person owns a company or works for a company
Both Example
Kaushik: points
Manas: points (enriching the embedding)
What is knowledge infusion in deep learning? Using knowledge to change input (shallow), to change parameters (semi-deep), to change parameters by mapping to a stratified hierarchy (Deep) (Ex: 1st layer knowledge x, 2nd layer knowledge y, etc). Can use diagram from pydata Berlin talk.
→ Integration of Knowledge Representation with Statistical Representation of Text is also straightforward → Devoid of Semantic Representation
→ Shallow merging needs to be demonstrated
Possible proposal material
In the semi-deep infusion paradigm, the learning system of the model is altered either through a probabilistic threshold (e.g. attention or constraints) or data redundancy for gains in performance. There are three broad categories of semi-deep infusion:
● Forcing methods: the model's prediction from the learnt representation is improved by mixing (through a sigmoid, concatenation, or multiplication) the representation of the input as ground truth to enrich the latent representation.
● Attention methods: these methods improve upon the forcing methods by making the model capable of selecting the parts of the learnt representation that need to be modified.
● Knowledge-base methods: both forcing and attention methods rely on the input data, which is a poor manifestation of the real world, so models suffer from problems such as exposure bias. Knowledge-base methods shift the model's dependency for attention and forcing from the input text to a knowledge base.
In knowledge-based LSTMs, rather than putting attention on the input text, the method uses attention as a switch: when the switch is open, it contextualizes the latent representation through representations of relevant concepts in the knowledge base; when the switch is off, the latent representation is used as it is.
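The switch behaviour described here can be sketched as a scalar gate that blends a latent vector with a knowledge-base concept vector. This is an illustrative simplification, not the architecture from any specific paper: the gate is a single sigmoid over the concatenated vectors, and the weights, bias, and vectors are hypothetical values rather than learned parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def knowledge_gate(hidden, concept, gate_weights, gate_bias=0.0):
    """Blend a latent vector with a KB concept vector using a scalar gate in (0, 1)."""
    score = sum(w * x for w, x in zip(gate_weights, hidden + concept)) + gate_bias
    g = sigmoid(score)
    blended = [(1 - g) * h + g * c for h, c in zip(hidden, concept)]
    return g, blended

hidden = [0.5, -0.2]    # hypothetical latent representation
concept = [1.0, 1.0]    # hypothetical KB concept embedding
weights = [0.0, 0.0, 0.0, 0.0]

# Switch off: the latent representation passes through almost unchanged.
print(knowledge_gate(hidden, concept, weights, gate_bias=-20.0))
# Switch open: the representation is pulled toward the KB concept.
print(knowledge_gate(hidden, concept, weights, gate_bias=20.0))
```

In a trained model the gate value would depend on the input, so the knowledge base is consulted only where it is judged relevant.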
In knowledge-based GANs, the model learns by maximizing a reward for generating a representation of the input that matches the output. One way of formulating this reward is as the minimization of KL divergence. In this architecture, the attention module is influenced by the reward function, which is a learnable constraint.
Correlation matrix is the parameters for the deep learning algorithm for DSM-5
Method: Semantic encoding - decoding optimization
(Pearson Correlation)
DD: Correlation between DSM-5 Categories
RR: Correlation between concepts in Reddit posts irrespective of the user
DR: Correlation between the concepts in Reddit
Qualitatively, this is the outcome of the semantic encoding and decoding method.
You are able to label a post in a subreddit with an appropriate DSM-5 category.
On the Left, is all such mapping that the model learnt.
Why? Shallow: can help enrich neural representations. Semi-deep: can help with tweaking parameters to follow correlations present in knowledge (in addition to data) in constructing representations. Deep: can identify which correlations in the knowledge, in addition to data, matter in which layer to finally construct a representation that benefits from knowledge infusion at all layers. Ex: Shallow: Wikipedia-based GNN training to answer questions - hopefully captures relationships. Semi-deep: Force understanding that Obama is correlated to Michelle Obama through relationships like spouse, by explicitly modifying the attention (correlation matrix) - definitely captures relationships. Deep: Identify number relationships, how they relate to metrics, how those metrics relate to what is being measured (blood pressure), how blood pressure relates to what is being predicted - definitely captures nested/hierarchical relationship semantics
<Example of Deep Knowledge Infusion>
Definition of Interpretability and Explainability
Multi-hop
Two-hop
Changing the post
Example of explanation: For each time of knowledge infusion
Shallow: TSNE clusters can show that KG relationships were captured, sports words come together
Semi-Deep: Attention matrix can show if KG relationships were captured, sports words attend to each other with high correlation
Deep: Representations at each layer can be visualized through concept maps in the stratified KG. Members of a hierarchical concept lower in the hierarchy correlate highly with those higher in the hierarchy on visualization of concepts from a class hierarchy. (Ex: 30, systolic pressure, heart attack all would be close as they map to the same hierarchical concept)
Explainability example in Education
The current approach to assessing a student's mastery of a course and providing multiple pathways for improving learning outcomes relies on a predictive algorithm: Bayesian Knowledge Tracing (BKT).
The approach assesses the following about the student:
Whether the student knew the answer
Whether the student guessed the answer correctly
How much the student improves after multiple attempts
However, it does not tell:
How far from the correct answer, is the student’s answer?
What relevant concepts the student needs to learn?
Also, the algorithm does not provide the capability to assess whether the student has mastered a topic in a course or course itself.
On this slide, a student was asked two questions from the topic of “Phosphorylation”.
BKT would consider these questions independently, whereas knowledge infusion would find the relation between the two questions through the entity Phosphorylation.
Since the answers do not match the true answers, BKT would not accept them as correct.
The question in red could not be answered by BKT, because it does not know the relation between the questions.
However, if we use knowledge Infusion:
It knows the relation between the concepts through Phosphorylation, so, it can answer the question in red.
It knows that “adenosine 3-phosphate” is an alias of “Adenosine Triphosphate”, so it would accept the response
It would measure the distance between “Glucose Phosphate” and “Glucose Monophosphate” to see:
How far from the correct answer is the student’s answer?
What new concepts the student needs to learn to achieve mastery on this topic
The concepts asked in the question are addition of phosphate to glucose and addition of phosphate to adenosine diphosphate
From the KG, the relation between the two is obtained as relating to phosphorylation
The answer the student provides which is adenosine 3-phosphate might be predicted as wrong by the DN (because it is not adenosine tri-phosphate). The wrong answer triggers search through the KG to figure out how far from the right answer.
The explanation is that adenosine tri-phosphate is an alias of adenosine 3-phosphate; it therefore shows that the student was actually correct and hence has attained explainable mastery.
Education knowledge graph can be constructed using the content from MOOC, Coursera, Khan Academy, Udemy, Udacity, Books, epubs from Amazon
https://khanacademy.fandom.com/wiki/Knowledge_Map
Bayesian knowledge tracing is not adequate, as explanations are required to know what other concepts the student might need to attain mastery
These concepts can be found in the KG
Furthermore, the KG can provide explanation for how far the current level is from mastery.
Do we need to provide a list of workshop and tutorials conducted