Andre Freitas
Department of Computer Science
AI Beyond Deep Learning
AI Systems Lab
Talk@Peak Ensemble, October 2019
This Talk
• Articulate a more comprehensive representation-
based perspective of AI (NLP-centered).
• Complements a Deep Learning based
perspective.
• Targeting the AI Engineer.
• How:
– Pointers to existing works from the AI Systems lab
linking to major trends.
Who are we?
• Established in September 2018.
• Applied, systems-oriented research.
– Fuzziness between science and engineering.
• Service to areas outside computer science.
– Bio, engineering, law.
• NLP as epistemological engineering.
Current Limits of Deep Learning
Deep learning thus far:
1. Is data hungry
2. Is shallow and has limited capacity for transfer
3. Has no natural way to deal with hierarchical structure
4. Is not sufficiently transparent
5. Has not been well integrated with prior knowledge
6. Cannot inherently distinguish causation from correlation
7. Presumes a largely stable world, in ways that may be problematic
8. Works well as an approximation, but its answers often cannot be fully trusted
9. Is difficult to engineer with
Marcus, Gary. "Deep Learning: A Critical Appraisal." arXiv preprint arXiv:1801.00631 (2018).
AI Users
Common Requirements
(End-users & AI Developers)
• Data is available but it is unstructured/messy. How can I use it to
solve my problem (answering, predicting, discovering)?
• How can I develop systems which generalise from fewer
examples?
• How can I be sure that the system got it right? How do I verify it?
• I built a model for data/population X. How does it transport to
data/population Y?
• Which architecture, representation and tools should I use to build
my AI system?
The same requirements, mapped to research themes:
• Extracting and Representing Meaning
• Few-shot / Small Data Learning
• Explainability
• Transportability
• AI Engineering, Multi-component AI Systems
Outline
• Extracting and Representing Meaning
– (from Text, at Scale)
– Few-shot Learning
– Explainability
– Transportability
• Explainable Reasoning (From Deep Learning to Deep
Semantics)
– Neuro-symbolic models
• NLP Engineering
– Building Multi-Component NLP Systems
Extracting and Representing
Meaning (from Text, at Scale)
Message #1:
A little linguistics goes a
long way.
The Challenge of Text Interpretation
Where to start?
“A fluoroscopic study known as an upper gastrointestinal series is typically
the next step in management, although if volvulus is suspected, caution
with non water soluble contrast is mandatory as the usage of barium can
impede surgical revision and lead to increased post operative
complications.”
?
Sentence Splitting
A fluoroscopic study is
typically the next step in
management.
This fluoroscopic study
is known as an upper
gastrointestinal series.
Caution with non water soluble
contrast is mandatory.
The usage of barium can
impede surgical revision.
Volvulus is suspected.
The usage of barium can
lead to increased post
operative complications.
[Discourse tree: core and context units linked by the rhetorical relations ELABORATION, LIST, BACKGROUND, CONTRAST and CONDITION.]
In a Nutshell
• Transformation of complex sentences into a set of
hierarchically ordered and semantically interconnected
sentences that present a simplified syntax:
○ minimal semantic units
○ semantic hierarchy
• Manually curated rules over syntactic and lexical patterns
Transforming Complex Sentences into a Semantic Hierarchy, ACL 2019.
https://github.com/Lambda-3/DiscourseSimplification
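The output of such a transformation can be pictured as a small tree of minimal units linked by rhetorical relations. A minimal sketch of such a data structure, rebuilding part of the running example (hypothetical class names, not the DiscourseSimplification API):

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    """A minimal semantic unit produced by sentence splitting."""
    text: str
    children: list = field(default_factory=list)  # (relation, Unit) pairs

    def attach(self, relation, unit):
        self.children.append((relation, unit))
        return self

# Rebuilding part of the running example.
core = Unit("A fluoroscopic study is typically the next step in management.")
core.attach("ELABORATION", Unit(
    "This fluoroscopic study is known as an upper gastrointestinal series."))
caution = Unit("Caution with non water soluble contrast is mandatory.")
caution.attach("CONDITION", Unit("Volvulus is suspected."))
core.attach("CONTRAST", caution)

def flatten(unit, depth=0):
    """Linearise the hierarchy for inspection."""
    lines = ["  " * depth + unit.text]
    for rel, child in unit.children:
        lines.append("  " * (depth + 1) + "[" + rel + "]")
        lines += flatten(child, depth + 1)
    return lines

print("\n".join(flatten(core)))
```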
Knowledge Graph Extraction
https://github.com/Lambda-3/Graphene
A Sentence Simplification System for Improving Relation Extraction, COLING
(2017)
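Simplification is what makes downstream relation extraction tractable: over short, single-clause sentences even trivial patterns fire reliably. A toy pattern-based extractor (illustrative only; Graphene itself works over syntactic parses, not regexes):

```python
import re

# Naive patterns over already-simplified sentences.
PATTERNS = [
    re.compile(r"^(?P<s>.+?) is known as (?P<o>.+?)\.$"),
    re.compile(r"^(?P<s>.+?) can (?P<v>impede|lead to) (?P<o>.+?)\.$"),
]

def extract(sentence):
    """Return a (subject, relation, object) triple, or None."""
    for pat in PATTERNS:
        m = pat.match(sentence)
        if m:
            d = m.groupdict()
            return (d["s"], d.get("v", "is known as"), d["o"])
    return None

print(extract("The usage of barium can impede surgical revision."))
```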
Extracting Knowledge Graphs
from Text
Argumentation KGs
[Proof graph: nodes typed Assumption, Fact, ProofStep, ProofStart and ProofEnd, linked by entails, contradicts, has-fact, searches and KB-lookup edges, over facts such as "Q and D are positive integers", "Q and D are relatively prime" and "Q² is a square, so the exponents of all its prime factors must be even".]
Mathematical Knowledge Graphs
Crossing Modalities
Visual Genome
Message #2:
Build your knowledge graph
one semantic phenomenon
at a time.
Vector Representations
of Meaning (Embeddings)
[Diagram: word vectors for car, dog, cat, bark, run and leash, with the angle θ between vectors measuring relatedness.]
Commonsense and semantic approximation live in this space: a semantic model with low acquisition effort.
https://colah.github.io/posts/2015-01-Visualizing-Representations/
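The geometric intuition, that related words point in similar directions, reduces to cosine similarity between vectors. A toy sketch with made-up 3-d vectors (illustrative only; trained embeddings have hundreds of dimensions):

```python
import math

# Hypothetical 3-d "embeddings"; trained models use hundreds of dimensions.
vec = {
    "dog": (0.9, 0.8, 0.1),
    "cat": (0.8, 0.9, 0.1),
    "car": (0.1, 0.2, 0.9),
}

def cosine(a, b):
    """Cosine of the angle θ between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Related words point in similar directions.
assert cosine(vec["dog"], vec["cat"]) > cosine(vec["dog"], vec["car"])
```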
BERT (Devlin et al. 2018)
• Transformer: attention mechanism that learns contextual
relations between tokens/affixes in text.
• Word and Sentence-level representation.
https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
Neuro-KGs
[Example: Barack Obama (nominated) Sonia Sotomayor (:is_a) "First Supreme Court Justice of Hispanic descent", with graph elements grounded in embedding spaces (W2V, GloVe, BERT, …).]
Distributional Relational Networks, AAAI Symposium (2013).
A Compositional-Distributional Semantic Model for Searching Complex Entity
Categories, ACL *SEM (2016)
Natural Language Queries over Heterogeneous Linked Data Graphs:
A Distributional-Compositional Semantics Approach, IUI (2014).
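The payoff of pairing a KG with embeddings shows up when a query relation has no exact match among the edge labels: the symbolic lookup falls back to embedding similarity. A minimal sketch with toy vectors and a hypothetical two-edge graph (not the StarGraph implementation):

```python
import math

# Toy KG and hypothetical relation embeddings (real systems would use
# W2V/GloVe/BERT vectors over a much larger graph).
kg = [
    ("Barack Obama", "nominated", "Sonia Sotomayor"),
    ("Barack Obama", "born_in", "Honolulu"),
]
emb = {
    "nominated": (0.9, 0.1),
    "born_in":   (0.1, 0.8),
    "appointed": (0.85, 0.2),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def query(subject, relation_word):
    """Answer a query whose relation has no exact KG match by
    picking the edge whose label embedding is closest."""
    best = max((t for t in kg if t[0] == subject),
               key=lambda t: cosine(emb[t[1]], emb[relation_word]))
    return best[2]

# "Who did Obama appoint?": no edge is literally named "appointed",
# but "nominated" is the semantically closest relation.
print(query("Barack Obama", "appointed"))
```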
Summary
• KG Models (Fine-grained):
– Facilitate querying, integration and explainability.
• Embedding Models (Coarse-grained):
– Support semantic approximation, coping with vocabulary variation.
• The two have complementary semantic value.
Summary
• Building Knowledge Graphs:
– Text Simplification, Contextual Open IE.
– Discourse relations (argumentation, stories, pragmatics).
– Entity, relation and event extraction.
– Co-reference resolution, entity linking.
Message #3:
KGs + Embeddings are a
representation (data model)
convenient for AI
engineering.
Data Management for AI
Query Processing, Indexing, Caching …
• Structured queries (Database + IR): inverted indexes, sharding, disk-access optimisation, query planning, cardinality estimation, skyline queries, bitmap indexes, …
• Approximation queries: Multiple Randomized K-d Tree algorithm, Priority Search K-Means Tree algorithm, …
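On the structured-query side, the workhorse is the inverted index: a map from each term to the statements that mention it, with posting lists intersected at query time. A minimal sketch over a toy triple store (illustrative, not the DBee implementation):

```python
from collections import defaultdict

triples = [
    ("barium", "impedes", "surgical revision"),
    ("barium", "causes", "complications"),
    ("contrast", "requires", "caution"),
]

# Inverted index: token -> set of triple ids mentioning it.
index = defaultdict(set)
for i, t in enumerate(triples):
    for term in t:
        for tok in term.split():
            index[tok].add(i)

def lookup(*terms):
    """Intersect posting lists to answer a conjunctive query."""
    ids = set.intersection(*(index[t] for t in terms))
    return [triples[i] for i in sorted(ids)]

print(lookup("barium", "complications"))
```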
DBee
DBee: A Database for Creating and Managing Knowledge Graphs and
Embeddings, TextGraphs@EMNLP (2019).
Software
https://github.com/Lambda-3/Stargraph
Natural Language Queries over Heterogeneous Linked Data Graphs: A
Distributional-Compositional Semantics Approach, IUI (2014).
Semantic Relatedness for All (Languages): A Comparative Analysis of Multilingual Semantic Relatedness using Machine Translation, EKAW (2016).
https://github.com/Lambda-3/indra
Message #4:
Processing KGs + Embeddings at scale requires specific data management strategies.
Explainable Reasoning
(From Deep Learning to Deep Semantics)
Neural vs Symbolic AI Systems
Graph Networks (GNs)
• Deep learning methods often:
– Follow an end-to-end design philosophy.
– Emphasize minimal a priori representational and computational assumptions.
– Seek to avoid explicit structures.
• GNs as a solution: project embeddings and graphs into the same representation space.
• Abstract away from fine-grained differences and capture more general commonalities.
Multi-hop QA & Document
Graph Networks
Document Graph Networks (DGNs)
DGN = Gated Graph Neural Networks (GGNN) over Entity/Passage KGs.
Identifying Supporting Facts for Multi-hop Question Answering with
Document Graph Networks, TextGraphs@EMNLP
T: IBM cleared $18.2 billion in the first quarter.
H: IBM’s revenue in the first quarter was $18.2 billion.
Entailment?
YES
Why?
- To clear is to yield as a net profit
- A net profit is an excess of revenues
over outlays in a given period of time
Exploring Knowledge Graphs in an Interpretable Composite Approach for Text Entailment, AAAI
(2019).
Recognizing and Justifying Text Entailment through Distributional Navigation on Definition Graphs,
AAAI (2018).
Explainable Reasoning using
Distributional Navigation (DNA)
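The justification above can be read as a path through a graph of dictionary definitions connecting the divergent terms ("cleared" vs. "revenue"). A toy breadth-first sketch over hand-written definition links (illustrative only, not the published DNA system):

```python
from collections import deque

# Hypothetical definition graph: word -> terms appearing in its definition.
defs = {
    "clear": ["yield", "net profit"],
    "net profit": ["excess", "revenue", "outlays"],
}

def justify(premise_term, hypothesis_term):
    """BFS over definitions, returning the chain linking the two terms."""
    queue = deque([(premise_term, [premise_term])])
    seen = {premise_term}
    while queue:
        word, path = queue.popleft()
        if hypothesis_term in word:
            return path
        for nxt in defs.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

# Why does "IBM cleared $18.2bn" entail "IBM's revenue was $18.2bn"?
print(justify("clear", "revenue"))
```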
Explainable Science QA
Q: A student climbs up a rocky mountain trail in Maine. She sees many small pieces of rock
on the path. Which action most likely made the small pieces of rock?
[0]: sand blowing into cracks
[1]: leaves pressing down tightly
[2]: ice breaking large rocks apart
[3]: shells and bones sticking together
A: ice breaking large rocks apart
Explanation:
• weathering means breaking down (rocks ; surface materials) from larger whole into smaller
pieces by weather
• ice wedging is a kind of mechanical weathering
• ice wedging is when ice causes rocks to crack by expanding in openings
• cycles of freezing and thawing water cause ice wedging
• cracking something may cause that something to break apart
• to cause means to make
Jansen, Peter A., et al. "Worldtree: A corpus of explanation graphs for elementary science questions supporting multi-hop inference."
arXiv preprint arXiv:1802.03052 (2018).
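Assembling such an explanation amounts to chaining facts that share content words, from a question term to the answer. A greedy lexical-overlap sketch over three of the facts above (illustrative only; WorldTree inference is far richer):

```python
facts = [
    "ice wedging is a kind of mechanical weathering",
    "ice wedging is when ice causes rocks to crack",
    "cracking something may cause that something to break apart",
]

STOP = {"is", "a", "of", "the", "when", "that", "may", "to", "in", "its"}

def overlap(a, b):
    """Content words shared by two facts."""
    return (set(a.split()) & set(b.split())) - STOP

def chain(question_term, answer_term, facts):
    """Greedily link facts by lexical overlap, from question to answer."""
    path = [next(f for f in facts if question_term in f)]
    pool = [f for f in facts if f != path[0]]
    while answer_term not in path[-1] and pool:
        nxt = max(pool, key=lambda f: len(overlap(f, path[-1])))
        path.append(nxt)
        pool.remove(nxt)
    return path

print(chain("ice", "break apart", facts))
```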
Message #5:
Learning + Inference on
Neuro-KG models facilitate
explainability and
generalisation.
Engineering Multi-Component
NLP Systems
[Architecture diagram: an NL query enters via querying components (Semantic Parsing, Query By Example); extraction and linking components (Open IE, NER, Argument, Definition Extraction, Opinion, Story, Text Classification, Entity Linking, Co-reference Resolution) feed an inference & integration layer (KG Completion, NL Inference) built over embeddings, returning answers and explanations.]
Systems-level Representation & Evaluation
Neuro-symbolic Representation
Measuring diagram quality through semiotic morphisms, Semiotica (2019).
The Diagrammatic AI Language (DIAL): Version 0.1 (2019).
Switching contexts: Transportability measures for NLP (under review).
Evaluating Explanations &
Interpretability
On the Semantic Interpretability of Artificial Intelligence Models (under review).
An Explainable Few-Shot Learning Semantic Parser for Natural Language Programming
(under review).
Summary
• AI Engineering still has a long way to go. Open questions:
• How to represent AI systems at an architectural level (so
that we facilitate systems design, reasoning, learning and
communication)?
• How do we measure and estimate performance and cost
in AI systems for new domains and languages?
• How to measure explainability and transportability?
Take-away Message
Final Considerations
• AI Engineers as representation and algorithm
designers.
• Working with NLP?
– Understanding language/semantics precedes ML.
• Critical roles:
– AI Architect
– Data/Representation/Resources/Experiment/Workflow Curator
– DBA4AI
– AI Evaluator and Tester
– Data Engineer
Take-away Message
• Many meaningful AI tasks require large-scale background
knowledge.
• Knowledge graphs + Embeddings provide a realistic and
scalable representation model.
• Neuro-Symbolic models (GNs, DNA) operate on top of these representations, enabling:
– Few-shot learning
– Explainability
– Transportability/Transferability
References
http://andrefreitas.org
andre.freitas@manchester.ac.uk
These slides:
https://www.slideshare.net/andrenfreitas/ai-beyond-deep-learning
