AI2 day
San Kim
2021.06.30
1. “Going on a vacation” takes longer than “Going for a walk”: A Study of Temporal
Commonsense Understanding (EMNLP 19) – MCTACO
2. TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions (EMNLP 20)
3. Temporal Reasoning on Implicit Events from Distant Supervision (NAACL 21)-TRACIE
MCTACO (Multiple choice temporal common-sense)
Temporal commonsense
Given two events “going on a vacation” and “going for a walk,” most humans would know that a
vacation is typically longer and occurs less often than a walk, but it is still challenging for computers
to understand and reason about temporal commonsense.
5 temporal properties
• Duration (how long an event takes)
• Temporal ordering (typical order of events)
• Typical time (when an event happens)
• Frequency (how often an event occurs)
• Stationarity (whether a state holds for a very long time or
indefinitely)
MCTACO
• MCTACO comprises 13k tuples of the form (sentence, question, candidate answer).
• The sentences in those tuples are randomly selected from MultiRC.
• Questions and candidate answers (both correct and incorrect ones) are collected using AMT.
• To ensure the quality of the results, they limit the annotations to native speakers and use
qualification tryouts.
• Step1. Question Generation
• Should ask about one of the five temporal phenomena
defined earlier
• Should not be solvable simply by a word or phrase
copied from the original sentence
• Crowdsourcers are also required to provide a correct
answer for each of their questions
• Step2. Question verification
• They ask another two crowdsourcers to
check the questions generated in Step 1,
(a) whether the two requirements are
satisfied and (b) whether the question is
grammatically and logically correct.
• For valid questions, they continue to ask
crowdsourcers to give one correct answer
and one incorrect answer
MCTACO
• Step3. Candidate answer expansion
• Until this stage, they have collected a small
set of candidate answers (3 positive and 2
negative) for each question.
• Automatically expand this set in three ways
• Use a set of rules to extract numbers
and quantities (“2”, “once”) and temporal
terms (e.g. “a.m.”, “1990”, “afternoon”,
“day”), and then randomly perturb them
based on a list of temporal
units (“second”), adjectives (“early”),
points (“a.m.”) and adverbs (“always”).
(“2 a.m.” → “3 p.m.”, “1 day” → “10
days”, “once a week” → “twice a month”)
• Mask each individual token in a
candidate answer (one at a time) and
use BERT to predict replacements for
each missing term; they rank those
predictions by BERT’s confidence and
keep the top three (see the sketch after
this slide).
• For candidates that represent events, the
token-level perturbations above rarely lead to
an interesting and diverse set of candidate
answers, and may produce invalid phrases
(e.g., “he left the house” → “he walked
the house”). Therefore, to perturb such
candidates, they create a pool of 60k
event phrases using PropBank and
perturb the candidate answers to the
most similar ones extracted by an
information retrieval (IR) system.
• Expand the candidate answer set to 20
candidates per question.
• Step4. Answer labeling
• Each (sentence, question, answer) tuple
produced earlier is labeled by 4
crowdsourcers, with three options: “likely”,
“unlikely”, or “invalid”.
• A tuple is kept only if all 4 annotators
agree on “likely” or “unlikely”.
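The masking-based expansion in Step 3 can be sketched with an off-the-shelf fill-mask model. This is a minimal illustration, not the authors’ pipeline: the model name, whitespace tokenization, and the lack of filtering are assumptions; only the mask-one-token-at-a-time, keep-top-3 behavior follows the slide above.

```python
# Minimal sketch of MCTACO Step 3's mask-and-replace expansion (not the authors' code).
# Assumes the HuggingFace `transformers` library; "bert-base-uncased" is a stand-in model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def expand_candidate(answer: str, top_k: int = 3) -> set[str]:
    """Mask each token of a candidate answer in turn and keep BERT's top-k replacements."""
    tokens = answer.split()
    variants = set()
    for i in range(len(tokens)):
        masked = tokens.copy()
        masked[i] = fill_mask.tokenizer.mask_token        # e.g. "[MASK]"
        for pred in fill_mask(" ".join(masked), top_k=top_k):
            new_tokens = tokens.copy()
            new_tokens[i] = pred["token_str"].strip()
            variants.add(" ".join(new_tokens))
    variants.discard(answer)
    return variants

print(expand_candidate("2 hours"))   # hypothetical output, e.g. {"3 hours", "2 days", ...}
```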
MCTACO
TORQUE
• Time is important for understanding events and stories
described in natural language text.
• “he won the championship yesterday” is different from “he
will win the championship tomorrow” (explicit)
• If we read that a woman is “expecting the birth of her first
child”, we know that the birth is in the future, while if she
is “mourning the death of her mother”, the death is in the
past. (implicit)
• These relationships between an event and a time point
(e.g. “won the championship yesterday”) or between two
events (e.g., “expecting” is before “birth” and “mourning”
is after “death”) are called temporal relations.
TORQUE
• Challenges of RC for temporal relationships
1. Reading comprehension work rarely requires event understanding. Most datasets largely only
require an understanding of predicates and arguments, and would ask questions like “what was a
woman trapped in?”. But a temporal relation question would be “what started before a woman was
trapped?” To answer it, the system needs to identify events (e.g., LANDSLIDE is an event and “body”
is not), the time of these events (e.g., LANDSLIDE is a correct answer, while SAID is not, because of
the time when the two events happen), and look at the entire passage rather than the local
predicate-argument structures within a sentence (e.g., SNOW and RAINFALL are correct answers to
the question above).
2. There are many events in a typical passage of text, so temporal relation questions typically query
more than one relationship at the same time. This means that a question can have multiple
answers (e.g., “what happened after the landslide?”), or no answers, because the question may be
beyond the time scope of the passage (e.g., “what happened before the snow started?”).
TORQUE
3. Temporal relations queried by natural language questions are often sensitive to a few key words
such as before, after, and start. Such questions can easily be changed to make contrasting
questions with dramatically different answers. Models that are not sensitive to these small
changes in question words will perform poorly on this task.
[Figure: an example news passage about a landslide, with contrast questions and their answer sets,
e.g., {landslide}; {searching, said, found}; {searching}; no answers; {causing, disruption, bringing,
flooding, searching, trapped, landslide, said, found}; {landslide, trapped, found, said, disruption,
flooding}; {landslide, trapped}; {said}; no answers.]
TORQUE
• Annotate 3.2k text snippets randomly selected from the TempEval3 dataset.
• TORQUE has 25k events and 21k user-generated and fully answered temporal relation questions.
• RoBERTa-large achieves 51% in exact match on TORQUE after fine-tuning, about 30% behind
human performance.
• Generally speaking, an event involves a predicate and its arguments.
• When studying time, events were defined as actions/states triggered by verbs, adjectives, and
nominals.
• This work follows this line of event definition and uses event and event trigger interchangeably.
• Define an event to be either a verb or a noun.
• In copular constructions, they choose to label the verb as the event, instead of an adjective or
preposition, for consistent treatment of “she was on the east coast yesterday” and “she was happy”,
which is easy to teach to crowd workers. (Note that from the perspective of data collection, labeling the
copula does not lose information, as one can always do post-processing using dependency parsing
or semantic role labeling to recover the connection between “was” and “happy”.)
Events
TORQUE
• Events expressed in text are not always factual. They
can be negated, uncertain, hypothetical or have
associated modalities.
• Prior work dealing with events often tried to
categorize and label these various aspects because
they were crucial for determining temporal relations.
• Simply have people label all events, irrespective of
their modality, and use natural language to describe
relations between them.
Events
TORQUE
Temporal Relations
• The relationship between two events with respect to time, or
between one event and a fixed time point.
• (A, r, B) – A and B are events or time points, and r is a
temporal relation. (e.g. (HAD, happened before, SLEPT) – first
sentence in Fig. 3)
• In previous works, every event is assumed to be associated
with a time interval. When comparing two events, there are
13 possible relation labels.
• There are still many relations that cannot be expressed
because the assumption that every event has a time interval
is inaccurate: The time scope of an event may be fuzzy, an
event can have a non-factual modality, or events can be
repetitive and invoke multiple intervals.
• To better handle these phenomena, they use natural
language to annotate the relationships between events.
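The “13 possible relation labels” between two intervals are those of Allen’s interval algebra. A minimal sketch of how the label follows from start/end points, purely for reference; TORQUE deliberately avoids committing to these categorical labels:

```python
# Illustrative only: the 13 interval relations (Allen's interval algebra) that arise when
# every event is assumed to occupy a single time interval (start < end). TORQUE avoids
# this assumption and uses natural-language questions instead.
def allen_relation(a_start, a_end, b_start, b_end):
    if a_end < b_start:   return "before"
    if b_end < a_start:   return "after"
    if a_end == b_start:  return "meets"
    if b_end == a_start:  return "met-by"
    if a_start == b_start and a_end == b_end: return "equal"
    if a_start == b_start: return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:     return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:   return "during"
    if a_start < b_start and b_end < a_end:   return "contains"
    if a_start < b_start < a_end < b_end:     return "overlaps"
    return "overlapped-by"    # the only remaining case for valid intervals

print(allen_relation(1, 3, 2, 5))  # "overlaps"
```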
TORQUE
Natural Language Annotation of Temporal Relations
• (A, r, B): a temporal relation between two events
• (?, r, B) : a temporal relation question
• (?, happened before, SLEPT): natural language
expression → “what happened before a lion slept?”
• (A, r, B) holds under two assumptions: any deictic
expression in A or B is interpreted relative to the time
the passage was written, and the passage is true.
TORQUE
Advantages of Natural Language Annotation
• DISRUPTION and FLOODING happened at about
the same time, but we do not know for sure which
one is earlier, so we have to choose vague.
• SNOW and DISRUPTION, we do not know which
one ends earlier and have to choose vague.
• The question-answer (QA) pairs can naturally
capture these fuzzy relations.
TORQUE
Advantages of Natural Language Annotation
• Natural language questions can conveniently
incorporate different modes of events.
• The figure above shows the relation between “having a
meal” and “sleeping”.
• If we could only choose one label, we would have to
choose before for all of these relations, although
the relations are actually different.
• → a repetitive event may be a series of intervals
rather than a single one, and “often before” is
very different from “before”.
TORQUE
Advantages of Natural Language Annotation
• The format of natural language questions
bypasses the need for explicit annotation of
properties of events or other theories.
• The annotator naturally avoids event pairs that
do not have relations.
• “what happened after the service
industries are hardest hit?”
• “what happened after a passerby reported
the body?”
• “what was expected to happen when the
crisis hit America?”
• “what was supposed to happen after a
passerby called the police?”
• It still remains difficult to have a theory
explaining
• → why hit can be compared to expected and crisis,
but not to gains.
TORQUE
Penalize Shortcuts by Contrast Sets
• An important problem in building datasets is to
avoid trivial solutions.
• Contrast questions slightly modify the
original questions but dramatically change the
answers (see the sketch below).
• For an existing question (?, r, B) (e.g., “what
happened after he ate his breakfast?”)
• Keep using B and change r (e.g., “what
happened before/shortly after/… he ate his
breakfast?”)
• Modify it to ask about the start/end time
(e.g., “what happened after he started
eating his breakfast?” or “what would finish
after he ate his breakfast?”)
• Check that the answers to the new question
are different from the original one to avoid
trivial modifications (e.g., changing “what
happened” to “what occurred”)
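A minimal sketch of the modify-and-check loop behind contrast questions. The relation swaps and the answer_fn oracle are hypothetical stand-ins (in TORQUE the contrast questions and their answers were written by crowd workers, not generated automatically):

```python
# Sketch of the contrast-question idea: perturb the relation keyword, keep the new question
# only if its answer set actually changes. `answer_fn` is a hypothetical oracle/annotator.
RELATION_SWAPS = {"after": ["before", "shortly after"], "before": ["after", "shortly before"]}

def contrast_questions(question: str, relation: str, answer_fn):
    original_answers = answer_fn(question)
    contrasts = []
    for new_rel in RELATION_SWAPS.get(relation, []):
        new_q = question.replace(relation, new_rel, 1)
        new_answers = answer_fn(new_q)
        if new_answers != original_answers:           # reject trivial rephrasings
            contrasts.append((new_q, new_answers))
    return contrasts
```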
TORQUE
Data Collection
• Passages that consist of two contiguous
sentences, as this is sufficient to capture the
vast majority of non-trivial temporal relations.
• Create a pool of 26k two-sentence passages
from the TempEval3 workshop (2.8k articles)
• 1. Label all the events
• 2. Repeatedly do the following
• (a) Ask a temporal relation question and
point out all the answers from the list of
events
• (b) Modify the temporal relation to create
one or more new questions and answer
them.
Quality Control
• Qualification: crowd workers were trained and
tested on 3 capabilities: labeling events,
asking temporal relation questions, and
question answering. Crowd workers were
considered level-1 qualified if they could pass
the test within 3 attempts. (1/3 workers
passed the qualification.)
• Pilot: asked level-1 crowd workers to do a
small amount of the real task. They manually
checked the annotations and gave feedback
to them. Roughly 1 out of 3 pilot submissions
received a level-2 qualification. In the end,
there were 63 level-2 annotators, and 60 of
them actually worked on large-scale task.
• Validation: for 20% of the articles, 5 different level-2
annotators (including the original annotator) validate
the events and answers. They intentionally
added noise to the original data for quality
control. They did not do additional validation
of the questions because there were no bad
questions in a random sample of 100.
TORQUE
Cost
• 3 passages were presented.
• The crowd worker could decide to use some
or all of them.
• For each passage a worker decided to use,
they needed to label the events, answer 3 hard-
coded warm-up questions, and then ask and
answer at least 12 questions (including
contrast questions). The final reward is a base
pay of $6 plus $0.5 for each extra question (up
to $4).
• Incentive
• (1) use fewer passages so that they can
do event labeling and warm-up questions
fewer times.
• (2) modify questions instead of asking
from scratch
• (3) ask extra questions in each job.
• In practice, crowd workers on average used 2
passages in each job.
• Validating the events in each passage and the
answers to a specific question both cost $0.1.
• In total, TORQUE cost $15k for an average of
$0.7/question.
statistics
• 3.2k passage annotations (~50 tokens/passage)
• 24.9k events (7.9 events/passage)
• 21.2k user-provided questions (half of them
were labeled by crowd workers as
modifications of existing ones)
• 94 of 200 sampled questions query relations
that cannot be directly represented by the
previous single-interval-based labels.
TORQUE
statistics
TORQUE
Experiments
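TORQUE answers are sets of event triggers, so evaluation uses exact match over the full answer set (the 51% RoBERTa-large figure quoted earlier) together with an F1-style overlap score. A generic sketch of such set-level metrics; the paper’s exact matching and aggregation details may differ:

```python
# Sketch of set-level metrics for questions whose answers are sets of event triggers.
# Generic EM/F1 over sets; the paper's exact aggregation may differ.
def exact_match(pred: set, gold: set) -> float:
    return float(pred == gold)

def f1(pred: set, gold: set) -> float:
    if not pred and not gold:
        return 1.0                       # "no answers" predicted correctly
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)

print(f1({"landslide", "trapped"}, {"landslide", "trapped", "found"}))  # 0.8
```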
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
MATRES (ACL 18)
TempEval 1-3, TimeBank-Dense (TB-Dense), EventTimeCorpus
Labels: before, after, equal, vague
Based on start-points
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• When reading a story, a human can construct
a latent timeline about events’ start and end
times.
• The timeline not only contains the placements
of explicitly mentioned events (e.g., ride a
bicycle), but also accounts for implicit events
(e.g. Farrah was distracted so she looked away).
• The ability to construct such a timeline is
essential for understanding the causal
dynamics of a situation.
• Contributions
• A temporal relation dataset TRACIE
focusing on implicit events
• A distant supervision process for temporal
understanding of implicit events
• A reasoning model that makes end-time
comparisons using predictions of start-
time distances and durations
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Such tests in TRACIE take the form of multi-premise textual entailment (TE)
• Each TRACIE instance contains
• A context story (or premise) consisting of a sequence of explicit narrative events
• An implicit event in the form of a natural language phrase that is unmentioned but has
some role in the story
• An explicit event also in the form of a phrase
• A comparator of either {starts, ends}
• A temporal relation of either {before, after} that marks the relationship in the dimension
defined by the comparator between the implicit-event and the explicit-event
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Such tests in TRACIE take the form of multi-premise textual entailment (TE)
• Premise: context story
• Hypotheses: temporal queries about pair-wise relations between implicit and explicit events
• E.g. “avoids” is implicit-event, “starts” is the comparator, “removed” is explicit-event and “before”
is the temporal-relation.
• Flip the temporal-relation (i.e., “before” to “after” and vice versa) to create
negative (contradiction) instances.
• Use start times of explicit-events as reference points and compare the implicit-event’s start or
end time with them, according to the label definitions (Fig. 3)
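A TRACIE instance can be pictured as the small record below, with the hypothesis template and the label flipping described above. The field names and the exact hypothesis wording are illustrative, not the dataset’s schema:

```python
# Sketch of a TRACIE instance and its entailment/contradiction hypotheses.
# Field names and the hypothesis template are illustrative, not the dataset's exact schema.
from dataclasses import dataclass

@dataclass
class TracieInstance:
    premise: str          # the context story
    implicit_event: str   # e.g. "Tom avoids the bees" (hypothetical phrasing)
    explicit_event: str   # e.g. "the nest is removed"
    comparator: str       # "starts" or "ends"
    relation: str         # "before" or "after" (the gold relation)

    def hypothesis(self, relation: str) -> str:
        return f"{self.implicit_event} {self.comparator} {relation} {self.explicit_event}"

    def entailment_pair(self):
        flipped = "after" if self.relation == "before" else "before"
        return [(self.hypothesis(self.relation), "entailment"),
                (self.hypothesis(flipped), "contradiction")]
```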
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Randomly sample short stories from the
ROCStories dataset
• For each story, one annotator writes 5 implicit
event phrases that are not explicitly mentioned
by the given story, but are inferable and
relevant.
• Additionally rewrites two explicit events closest
to the implicit event’s start and end time,
respectively.
• Build two TRACIE instances (minus the
temporal-relation) per implicit event
Implicit Event Generation
Automatic Instance Generation
• Extract all verbs and relevant arguments with
the semantic role labeling model in AllenNLP
• Construct a pool of explicit events in the form
of short phrases (using verbs and their
arguments)
• For each implicit event, randomly select two
{explicit-event, comparator} pairs from the pool.
Label Collection
• For each of the 20 instances per story,
annotate the temporal-relation with four
different annotators.
• Majority agreement is used as the final label, and instances
without agreement are filtered out.
• Two authors additionally verified the instances
with ambiguous verbs (e.g., “have”) and
corrected 5% of the end-time instances.
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Distant Supervision
• Within-sentence Extraction
• Collect start time comparisons between pairs of events heuristically from free-text using
“before/after” keywords
• Use AllenNLP’s SRL model to process each input sentence and find verbs with a temporal
argument that starts with either “before” or “after”, and contains at least another verb.
• If there are multiple verbs in the temporal argument, take the one with the largest number
of tokens in its arguments.
• 2.8M instances from a Wikipedia dump (May 2020)
Pattern-Based Pre-Training
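The within-sentence extraction can be sketched with AllenNLP’s SRL predictor: keep verbs whose temporal argument (ARGM-TMP) starts with “before” or “after”. The model archive path and the simplified filtering below are assumptions; the paper additionally requires the temporal argument to contain another verb and picks the one with the most argument tokens:

```python
# Sketch of the within-sentence "before/after" extraction using AllenNLP SRL.
# Assumes the allennlp / allennlp_models packages; the model URL is the public SRL-BERT
# archive and may change. The filtering here simplifies the paper's heuristics.
from allennlp.predictors.predictor import Predictor

SRL_MODEL = ("https://storage.googleapis.com/allennlp-public-models/"
             "structured-prediction-srl-bert.2020.12.15.tar.gz")
predictor = Predictor.from_path(SRL_MODEL)

def extract_pairs(sentence: str):
    """Yield (event_verb, relation_keyword, temporal_argument_text) triples."""
    out = predictor.predict(sentence=sentence)
    words = out["words"]
    for frame in out["verbs"]:
        # collect the ARGM-TMP span of this predicate (B-/I- tags both end with "ARGM-TMP")
        tmp = [w for w, t in zip(words, frame["tags"]) if t.endswith("ARGM-TMP")]
        if tmp and tmp[0].lower() in {"before", "after"}:
            yield frame["verb"], tmp[0].lower(), " ".join(tmp[1:])

for triple in extract_pairs("He went to the park before he wrote the review."):
    print(triple)   # e.g. ('went', 'before', 'he wrote the review')
```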
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Distant Supervision
• Cross-sentence Extraction
• The data collected from the within-sentence patterns
does not reveal the relative distance between two
start times.
• Finds direct temporal expressions of hours and dates.
• Because these temporal expressions (e.g., 2021-01-01)
are globally comparable, the compared events can be
anywhere in a document.
• This process collects more supervision signals about
time-point comparisons and their relative distances
on event pairs with trivial causal relation.
• Find exact temporal values by filling unmentioned
elements of a temporal expression with the nearest
previous mention (e.g., add “January” to the expression
“the 10th” in Fig. 4)
Pattern-Based Pre-Training (PTNTIME)
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Cross-sentence Extraction
• Construct supervision instances under the assumption
that the extracted temporal expressions describe the
start times of the associated verbs (e.g., went started
on January 1st)
• Represent the difference between the two start times
as one of seven coarse temporal units: {<= minutes,
hours, days, weeks, months, years, >= decades}
• “Go to park” starts weeks before “write review”, as
shown in Fig. 4
• Couple the specialized temporal pre-training data
described above with additional paragraphs that are used
to perform conventional language model pretraining using
the original denoising task (T5).
• Input sequence: “event: [EventA] starts [Relation]
[EventB] story: [Paragraph]”; output sequence:
“answer: [Label] [Distance]”. [Paragraph] is non-empty only
for cross-sentence extractions; [Label] is either positive or
negative; [Distance] is one of the 7 coarse temporal units,
represented with a set of blank tokens [extra_id_N].
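A sketch of how a start-time difference could be bucketed into the seven coarse units and turned into a PTNTIME-style training pair. The string template follows the slide; the bucket thresholds and sign convention are my own guesses, not the paper’s exact cut-offs:

```python
# Sketch: bucket a start-time difference into the 7 coarse units and build a PTNTIME-style
# training pair. Thresholds are illustrative guesses; the template follows the slide.
from datetime import datetime

def coarse_unit(delta_seconds: float) -> str:
    s = abs(delta_seconds)
    if s < 3600:             return "<=minutes"
    if s < 86400:            return "hours"
    if s < 7 * 86400:        return "days"
    if s < 30 * 86400:       return "weeks"
    if s < 365 * 86400:      return "months"
    if s < 10 * 365 * 86400: return "years"
    return ">=decades"

def ptntime_pair(event_a, event_b, time_a: datetime, time_b: datetime, paragraph=""):
    delta = (time_b - time_a).total_seconds()
    label = "positive" if delta > 0 else "negative"   # "A starts before B" holds iff A is earlier
    src = f"event: {event_a} starts before {event_b} story: {paragraph}"
    tgt = f"answer: {label} {coarse_unit(delta)}"
    return src, tgt

print(ptntime_pair("go to park", "write review",
                   datetime(2020, 1, 1), datetime(2020, 1, 10)))
# ('event: go to park starts before write review story: ', 'answer: positive weeks')
```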
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• This model makes end-time comparisons by symbolically combining start-time distance and
duration predictions from separate components.
• It does not rely on explicit annotations of time points, only on relative comparisons between
them.
Symbolic Temporal Reasoning Model (SYSTIME)
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Symbolic Temporal Reasoning Model (SYSTIME)
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Duration estimation – pretrain sequence-to-sequence model
$r_{\text{ends}}(e_1, e_2) = \text{before} \Leftrightarrow \text{dist}(e_1, e_2) + \text{dur}(e_1) < 0$
$r_{\text{ends}}(e_1, e_2) = \text{after} \Leftrightarrow \text{dist}(e_1, e_2) + \text{dur}(e_1) > 0$
$r_{\text{starts}}(e_1, e_2) = \text{before} \Leftrightarrow \text{dist}(e_1, e_2) < 0$
$r_{\text{starts}}(e_1, e_2) = \text{after} \Leftrightarrow \text{dist}(e_1, e_2) > 0$
• Use duration data from TimeM (1M events and
duration values)
• Input sequence: “event: [Event] story: [Story]”
• Output sequence: “answer: [Value]”
• [Event] represents the tokens of an event, with
the trigger verb marked by a special token to its
left
• [Story] represents tokens from the context story
• [Value] is one of the 7 unit labels (i.e., {<= minutes,
hours, …})
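A sketch of the duration model’s input/output strings as described above. The trigger marker token and the story truncation length are assumptions; only the overall format follows the slide:

```python
# Sketch of the duration model's I/O strings. The trigger marker "[VERB]" and the story
# truncation length are assumptions; only the overall format follows the slide.
UNITS = ["<=minutes", "hours", "days", "weeks", "months", "years", ">=decades"]

def duration_example(event_tokens, trigger_index, story_tokens, value: str,
                     max_story_tokens: int = 100):
    assert value in UNITS
    marked = list(event_tokens)
    marked.insert(trigger_index, "[VERB]")            # mark the trigger verb on its left
    src = f"event: {' '.join(marked)} story: {' '.join(story_tokens[:max_story_tokens])}"
    tgt = f"answer: {value}"
    return src, tgt

src, tgt = duration_example(["she", "rode", "a", "bicycle"], 1,
                            "Farrah rode a bicycle around the park .".split(), "<=minutes")
print(src)   # event: she [VERB] rode a bicycle story: Farrah rode a bicycle around the park .
print(tgt)   # answer: <=minutes
```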
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Approximate dist() function using output from PTNTIME
• Input sequences of event : [EventA] starts
[Relation] [EventB] . Story: [Paragraph] and
output sequences of answer: [Label] [Distance] .
• [EventA]: the textual description of e1
• [EventB]: the textual description of e2
• [Paragraph]: the context (premise)
• Fix [Relation] to be before.
• By taking the values of the vocabulary indices
corresponding to “positive” and “negative” from
the logits of [Label] and applying a softmax
operation, get P_before and P_after; P = [P_before,
P_after].
• Apply softmax to the logits of [Distance] over the
7 words representing the temporal units to obtain
7 values that approximate the probability of each distance.
Place the 7 values in the temporal units’ increasing
order in vector d; c = [0, 1, 2, 3, 4, 5, 6].
• To get the direction, apply the tanh function to
the difference between the probabilities in P
(see the sketch below).
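Putting the pieces together, a minimal numeric sketch of how the [Label]/[Distance] logits and a predicted duration could be combined into a starts/ends decision per the rules on the earlier slide, reading dist(e1, e2) as start(e1) − start(e2). The magnitude handling in coarse unit-index space is an assumption; the paper’s exact scaling is not shown on the slide:

```python
# Minimal numeric sketch of the SYSTIME combination step (assumed details marked below).
import numpy as np

UNITS = ["<=minutes", "hours", "days", "weeks", "months", "years", ">=decades"]
c = np.arange(7, dtype=float)                 # unit indices 0..6, as on the slide

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def systime_decision(label_logits, distance_logits, duration_logits, comparator):
    """label_logits: logits of "positive"/"negative" for "e1 starts before e2";
    distance_logits / duration_logits: logits over the 7 coarse units."""
    p_before, p_after = softmax(np.asarray(label_logits, dtype=float))
    d = softmax(np.asarray(distance_logits, dtype=float))
    direction = np.tanh(p_after - p_before)   # < 0 if e1 likely starts before e2
    dist = direction * (d @ c)                # signed, in coarse unit-index space (assumption)
    if comparator == "starts":
        return "before" if dist < 0 else "after"
    dur = softmax(np.asarray(duration_logits, dtype=float)) @ c   # expected duration of e1
    return "before" if dist + dur < 0 else "after"

print(systime_decision([2.0, -1.0], [0.1, 0.3, 2.0, 0.2, 0.1, 0.0, 0.0],
                       [2.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0], "ends"))
# -> "before" under these toy logits
```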
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• T5-Large for PTNTIME and the duration model.
• PTNTIME – 45k steps (1.4M instances); duration
model – 80k steps (2.6M instances)
• Using these pretrained weights in SYSTIME without
TRACIE supervision gives SYSTIME-ZEROSHOT.
• Story-wide exact match metric, which is the
percentage of stories with all its related
hypotheses answered correctly.
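A sketch of the story-wide exact-match metric (a story counts only if every one of its hypotheses is answered correctly):

```python
# Story-wide exact match: a story counts as correct only if all of its hypotheses are right.
from collections import defaultdict

def story_em(examples):
    """examples: iterable of (story_id, predicted_label, gold_label) triples."""
    per_story = defaultdict(list)
    for story_id, pred, gold in examples:
        per_story[story_id].append(pred == gold)
    return sum(all(v) for v in per_story.values()) / len(per_story)

print(story_em([("s1", "entail", "entail"), ("s1", "contra", "entail"),
                ("s2", "entail", "entail")]))   # 0.5
```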
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Uniform-dist: in the i.i.d. training set, 70% of the examples with the comparator ends and relation after are
positive, so instances are randomly removed from the majority classes to balance the distribution.
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Train and evaluate on only the instances with a label
of either “before” or “after”, which account for
about 80% of all instances.
• OT-NS (original test, no story): train and test with
only the sentences containing the trigger verbs
• OT (original test): train and test with the entire document as an
auxiliary input
• OT-MS (original test, minimal supervision): train
with 1.2k (6%) of the training instances
• PT (perturbed test): train with the complete
training set and test on a perturbed test set built following
“Evaluating Models’ Local Decision Boundaries via
Contrast Sets”.
TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
More Related Content

Similar to AI2 day.pptx

NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdfSoha82
 
Finding Structure in Time NEURAL NETWORKS
Finding Structure in Time NEURAL NETWORKSFinding Structure in Time NEURAL NETWORKS
Finding Structure in Time NEURAL NETWORKSESCOM
 
Event templates for Question answering
Event templates for Question answeringEvent templates for Question answering
Event templates for Question answeringBarbara Starr
 
Debs 2010 context based computing tutorial
Debs 2010 context based computing tutorialDebs 2010 context based computing tutorial
Debs 2010 context based computing tutorialOpher Etzion
 
Foundations of Knowledge Representation in Artificial Intelligence.pptx
Foundations of Knowledge Representation in Artificial Intelligence.pptxFoundations of Knowledge Representation in Artificial Intelligence.pptx
Foundations of Knowledge Representation in Artificial Intelligence.pptxkitsenthilkumarcse
 
Learning from Time-to-Event Data from Online Learning Contexts
Learning from Time-to-Event Data from Online Learning Contexts Learning from Time-to-Event Data from Online Learning Contexts
Learning from Time-to-Event Data from Online Learning Contexts Shalin Hai-Jew
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 
Event templatesfor qa2
Event templatesfor qa2Event templatesfor qa2
Event templatesfor qa2Barbara Starr
 
Event templates for improved narrative understanding in Question Answering sy...
Event templates for improved narrative understanding in Question Answering sy...Event templates for improved narrative understanding in Question Answering sy...
Event templates for improved narrative understanding in Question Answering sy...Barbara Starr
 
Brain Clock IM Keynote brief 2007
Brain Clock IM Keynote brief 2007Brain Clock IM Keynote brief 2007
Brain Clock IM Keynote brief 2007Kevin McGrew
 
Temporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the WebTemporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the WebTu Nguyen
 
2015-02-25 research seminal, Paul Seitlinger
2015-02-25 research seminal, Paul Seitlinger2015-02-25 research seminal, Paul Seitlinger
2015-02-25 research seminal, Paul Seitlingerifi8106tlu
 
Choices, modelling and Frankenstein Ontologies
Choices, modelling and Frankenstein OntologiesChoices, modelling and Frankenstein Ontologies
Choices, modelling and Frankenstein Ontologiesbenosteen
 
using fuzzy logic in educational measurement
using fuzzy logic in educational measurementusing fuzzy logic in educational measurement
using fuzzy logic in educational measurementSunShine9793
 
A Meaning-Based Statistical English Math Word Problem Solver.pdf
A Meaning-Based Statistical English Math Word Problem Solver.pdfA Meaning-Based Statistical English Math Word Problem Solver.pdf
A Meaning-Based Statistical English Math Word Problem Solver.pdfAnna Landers
 
Understanding Concepts
Understanding ConceptsUnderstanding Concepts
Understanding ConceptsDr. N. Asokan
 

Similar to AI2 day.pptx (20)

NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdf
 
Monte Carlo
Monte CarloMonte Carlo
Monte Carlo
 
Finding Structure in Time NEURAL NETWORKS
Finding Structure in Time NEURAL NETWORKSFinding Structure in Time NEURAL NETWORKS
Finding Structure in Time NEURAL NETWORKS
 
On the nature of medial temporal lobe contributions to the constructive simul...
On the nature of medial temporal lobe contributions to the constructive simul...On the nature of medial temporal lobe contributions to the constructive simul...
On the nature of medial temporal lobe contributions to the constructive simul...
 
Event templates for Question answering
Event templates for Question answeringEvent templates for Question answering
Event templates for Question answering
 
Knowledge representation
Knowledge representationKnowledge representation
Knowledge representation
 
Debs 2010 context based computing tutorial
Debs 2010 context based computing tutorialDebs 2010 context based computing tutorial
Debs 2010 context based computing tutorial
 
Foundations of Knowledge Representation in Artificial Intelligence.pptx
Foundations of Knowledge Representation in Artificial Intelligence.pptxFoundations of Knowledge Representation in Artificial Intelligence.pptx
Foundations of Knowledge Representation in Artificial Intelligence.pptx
 
Wither OWL
Wither OWLWither OWL
Wither OWL
 
Learning from Time-to-Event Data from Online Learning Contexts
Learning from Time-to-Event Data from Online Learning Contexts Learning from Time-to-Event Data from Online Learning Contexts
Learning from Time-to-Event Data from Online Learning Contexts
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Event templatesfor qa2
Event templatesfor qa2Event templatesfor qa2
Event templatesfor qa2
 
Event templates for improved narrative understanding in Question Answering sy...
Event templates for improved narrative understanding in Question Answering sy...Event templates for improved narrative understanding in Question Answering sy...
Event templates for improved narrative understanding in Question Answering sy...
 
Brain Clock IM Keynote brief 2007
Brain Clock IM Keynote brief 2007Brain Clock IM Keynote brief 2007
Brain Clock IM Keynote brief 2007
 
Temporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the WebTemporal models for mining, ranking and recommendation in the Web
Temporal models for mining, ranking and recommendation in the Web
 
2015-02-25 research seminal, Paul Seitlinger
2015-02-25 research seminal, Paul Seitlinger2015-02-25 research seminal, Paul Seitlinger
2015-02-25 research seminal, Paul Seitlinger
 
Choices, modelling and Frankenstein Ontologies
Choices, modelling and Frankenstein OntologiesChoices, modelling and Frankenstein Ontologies
Choices, modelling and Frankenstein Ontologies
 
using fuzzy logic in educational measurement
using fuzzy logic in educational measurementusing fuzzy logic in educational measurement
using fuzzy logic in educational measurement
 
A Meaning-Based Statistical English Math Word Problem Solver.pdf
A Meaning-Based Statistical English Math Word Problem Solver.pdfA Meaning-Based Statistical English Math Word Problem Solver.pdf
A Meaning-Based Statistical English Math Word Problem Solver.pdf
 
Understanding Concepts
Understanding ConceptsUnderstanding Concepts
Understanding Concepts
 

More from San Kim

20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...San Kim
 
2023 EMNLP day_san.pptx
2023 EMNLP day_san.pptx2023 EMNLP day_san.pptx
2023 EMNLP day_san.pptxSan Kim
 
LongT5_Efficient Text-toText Transformer for Long Sequences_san.pptx
LongT5_Efficient Text-toText Transformer for Long Sequences_san.pptxLongT5_Efficient Text-toText Transformer for Long Sequences_san.pptx
LongT5_Efficient Text-toText Transformer for Long Sequences_san.pptxSan Kim
 
slide-acl2022-combined_san.pptx
slide-acl2022-combined_san.pptxslide-acl2022-combined_san.pptx
slide-acl2022-combined_san.pptxSan Kim
 
Compeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptxCompeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptxSan Kim
 
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...San Kim
 
Temporal reasoning task
Temporal reasoning taskTemporal reasoning task
Temporal reasoning taskSan Kim
 
Answering complex open domain questions with multi-hop dense retrieval
Answering complex open domain questions with multi-hop dense retrievalAnswering complex open domain questions with multi-hop dense retrieval
Answering complex open domain questions with multi-hop dense retrievalSan Kim
 
Measuring massive multitask language understanding
Measuring massive multitask language understandingMeasuring massive multitask language understanding
Measuring massive multitask language understandingSan Kim
 
Abductive commonsense reasoning
Abductive commonsense reasoningAbductive commonsense reasoning
Abductive commonsense reasoningSan Kim
 
XLnet RoBERTa Reformer
XLnet RoBERTa ReformerXLnet RoBERTa Reformer
XLnet RoBERTa ReformerSan Kim
 
Transformer xl
Transformer xlTransformer xl
Transformer xlSan Kim
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1San Kim
 
Gan seminar
Gan seminarGan seminar
Gan seminarSan Kim
 
Deep learning study 3
Deep learning study 3Deep learning study 3
Deep learning study 3San Kim
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2San Kim
 
Deep learning study 1
Deep learning study 1Deep learning study 1
Deep learning study 1San Kim
 
Back propagation
Back propagationBack propagation
Back propagationSan Kim
 

More from San Kim (19)

20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
20230419-LLaMA-Adapter_ Efficient Fine-tuning of Language Models with Zero-in...
 
2023 EMNLP day_san.pptx
2023 EMNLP day_san.pptx2023 EMNLP day_san.pptx
2023 EMNLP day_san.pptx
 
LongT5_Efficient Text-toText Transformer for Long Sequences_san.pptx
LongT5_Efficient Text-toText Transformer for Long Sequences_san.pptxLongT5_Efficient Text-toText Transformer for Long Sequences_san.pptx
LongT5_Efficient Text-toText Transformer for Long Sequences_san.pptx
 
slide-acl2022-combined_san.pptx
slide-acl2022-combined_san.pptxslide-acl2022-combined_san.pptx
slide-acl2022-combined_san.pptx
 
Compeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptxCompeition-Level Code Generation with AlphaCode.pptx
Compeition-Level Code Generation with AlphaCode.pptx
 
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tu...
 
Temporal reasoning task
Temporal reasoning taskTemporal reasoning task
Temporal reasoning task
 
Answering complex open domain questions with multi-hop dense retrieval
Answering complex open domain questions with multi-hop dense retrievalAnswering complex open domain questions with multi-hop dense retrieval
Answering complex open domain questions with multi-hop dense retrieval
 
Measuring massive multitask language understanding
Measuring massive multitask language understandingMeasuring massive multitask language understanding
Measuring massive multitask language understanding
 
Abductive commonsense reasoning
Abductive commonsense reasoningAbductive commonsense reasoning
Abductive commonsense reasoning
 
Electra
ElectraElectra
Electra
 
XLnet RoBERTa Reformer
XLnet RoBERTa ReformerXLnet RoBERTa Reformer
XLnet RoBERTa Reformer
 
Transformer xl
Transformer xlTransformer xl
Transformer xl
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1
 
Gan seminar
Gan seminarGan seminar
Gan seminar
 
Deep learning study 3
Deep learning study 3Deep learning study 3
Deep learning study 3
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2
 
Deep learning study 1
Deep learning study 1Deep learning study 1
Deep learning study 1
 
Back propagation
Back propagationBack propagation
Back propagation
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

AI2 day.pptx

  • 1. AI2 day San Kim 2021.06.30 1. “Going on a vacation” tasks longer than “Going for a walk”: A Study of Temporal Commonsense Understanding (EMNLP 19) – MCTACO 2. TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions (EMNLP 20) 3. Temporal Reasoning on Implicit Events from Distant Supervision (NAACL 21)-TRACIE
  • 2. MCTACO (Multiple choice temporal common-sense) Temporal commonsense Given two events “going on a vacation” and “going for a walk,” most humans would know that a vacation is typically longer and occurs less often than a walk, but it is still challenging for computers to understand and reason about temporal commonsense. 5 temporal properties • Duration (how long an event takes) • Temporal ordering (typical order of events) • Typical time (when an event happens) • Frequency (how often an event occurs) • Stationarity (whether a state holds for a very long time or indefinitely)
  • 3. MCTACO • MCTACO is comprised of 13k tuples, in the form of (sentence, question, candidate answer). • The sentences in those tuples are randomly selected from MultiRC • Collect questions and candidate answers(both correct question and wrong ones) using AMT. • To ensure the quality of the results, they limit the annotations to native speakers and use qualification tryouts. • Step1. Question Generation • Should ask about one of the five temporal phenomena the defined earlier • Should not be solved simply by a word or phrase from the original sentence • They also require crowd-sourcers to provide a correct answer for each of their questions(correct and incorrect answers) • Step2. Question verification • The ask another two crowdsourcers to check the questions generated in Step 1, (a) whether the two requirements are satisfied and (b) whether the question is grammatically and logically correct. • For valid questions, they continue to ask crowdsourcers to give one correct answer and one incorrect answer
  • 4. MCTACO • For those candidates that represent events, the previously-mentioned token- level perturbations rarely lead to interesting and diverse set of candidate answers. It may lead to invalid phrases (e.g., “he left the house”  “he walked the house”.) Therefore, to perturb such candidates, they create a pool of 60k event phrases using PropBank and perturb the candidate answers to be the most similar ones extracted by an information retrieval(IR) system. • Expand the candidate answer set to 20 candidates per question. • Step3. Candidate answer expansion • Until this stage, they have collected a small set of candidate answers (3 positive and 2 negative) for each question. • Automatically expand this set in three ways • Use a set of rules to extract numbers and quantities (“2”, “once”) and temporal terms (e.g. “a.m.”, “1990”, “afternoon”, “day”), and then randomly perturb them based on a list of temporal units(“second”), adjectives (“early”), points (“a.m.”) and adverbs (“always”). ( “2 a.m.”  “3 p.m.”, “1 day”  “10 days”, “once a week”  “twice a month”) • Mask each individual token in a candidate answer (one at a time) and use BERT to predict replacements for each missing term; they rank those predictions by the confidence level of BERT and keep the top three. • Step4. Answer labeling • Each (sentence, question, answer) tuple produced earlier is labeled by 4 crowdsourcers, with three options: “likely”, “unlikely”, or “invalid”. • A tuple is kept only if all 4 annotators agree on “likely” or “unlikely”.
  • 6. TORQUE • Time is important for understanding events and stories described in natural language text. • “he won the championship yesterday” is different from “he will win the championship tomorrow” (explicit) • If we read that a woman is “expecting the birth of her first child”, we know that the birth is in the future, while if she is “mourning the death of her mother”, the death is in the past. (implicit) • These relationships between an event and a time point (e.g. “won the championship yesterday”) or between two events (e.g., “expecting” is before “birth” and “mourning” is after “death”) are called temporal relations.
  • 7. TORQUE • Challenges of RC for temporal relationships 1. Reading comprehension works rarely require event understanding. Most datasets largely only require an understanding of predicates and arguments, and would ask questions like “what was a woman trapped in?”. But a temporal relation question would be “what started before a woman was trapped?” To answer it, the system needs to identify events (e.g., LANDSLIDE is an event and “body” is not), the time of these events (e.g., LANDSLIDE is correct answer, while SAID is not because of the time when the two events happen), and look at the entire passage rather than the local predicate-argument structures within a sentence (e.g., SNOW and RAINFALL are correct answers to the question above). 2. There are many events in a typical passage of text, so tempral relation questions typically query more than one relationship at the same time. This means that a question can have multiple answers (e.g., “what happened after the landslide?”), or no answers, because the question may be beyond the time scope (e.g., “what happened before the snow started?”)
  • 8. TORQUE 3. Temporal relations queried by natural language questions are often sensitive to a few key words such as before, after, and start. Those questions can easily be changed to make contrasting questions with dramatically different answers. Those questions can easily be changed to make contrasting questions with dramatically different answers. Models that are not sensitive to these small changes in question words will perform poorly on this task. landslide searching, said, found searching No answers Causing, disruption, brining, flooding, searching, trapped, landslide, said, found Landslide, trapped, found, said, disruption, flooding Landslide, trapped said No answers
  • 9. TORQUE • Annotate 3.2k text snippets randomly selected from the TempEval3 dataset. • TORQUE has 25k events and 21k user-generated and fully answered temporal relation questions. • RoBERTa-large achieves 51% in exact match on TORQUE after fine-tuning, about 30% behind human performance. • Generally speaking, an event involves a predicate and its arguments. • When studying time, events were defined as actions/states triggered by verbs, adjectives, and nominals. • This work follows this line of event definition and uses event and event trigger interchangeably. • Define an event to be either a verb or a noun. • In copular constructions, they choose to label the verb as the event, instead of an adjective or preposition. (for consistent treatment of “she was on the east coast yesterday” and “she was happy” – easily teach to crowd workers) (Note that from the perspective of data collection, labeling the copula does not lose information as one can always do post-processing using dependency parsing or semantic role labeling to recover the connection between “was” and “happy”.) Events
  • 10. TORQUE • Events expressed in text are not always factual. They can be negated, uncertain, hypothetical or have associated modalities. • Prior work dealing with events often tried to categorize and label these various aspects because they were crucial for determining temporal relation. • Simply have people label all events, irrespective of their modality, and use natural language to describe relations between them. Events
  • 11. TORQUE Temporal Relations • The relationship between two events with respect to time, or between one event and a fixed time point. • (A, r, B) – A and B are events or time points, and r is a temporal relation. (e.g. (HAD, happened before, SLEPT) – first sentence in Fig. 3) • In previous works, every event is assumed to be associated with a time interval. When comparing two events, there are 13 possible relation labels. • There are still many relations that cannot be expressed because the assumption that every event has a time interval is inaccurate: The time scope of an event may be fuzzy, an event can have a non-factual modality, or events can be repetitive and invoke multiple intervals. • To better handle these phenomena, they use natural language to annotate the relationships between events.
  • 12. TORQUE Natural Language Annotation of Temporal Relations • (A, r, B): a temporal relation between two events • (?, r, B) : a temporal relation question • (?, happened before, SLEPT): natural language expression  “what happened before a lion slept?” • (A, r, B) holds, assuming for any deictic expression A or B the time point when the passage was written, and assuming that the passage is true.
  • 13. TORQUE Advantages of Natural Language Annotation • DISRUPTION and FLOODING happened at about the same time, but we do not know for sure which one is earlier, so we have to choose vague. • SNOW and DISRUPTION, we do not know which one ends earlier and have to choose vague. • The question-answer (QA) pairs can naturally capture these fuzzy relations.
  • 14. TORQUE Advantages of Natural Language Annotation • Natural language questions can conveniently incorporate different modes of events. • ▲ the relation between “having a meal”, and “sleeping” • If we could only choose one label, we must choose before for all these relations, although these relations are actually different. •  a repetitive event may be a series of intervals rather than a single one, and often before is very different from before.
  • 15. TORQUE Advantages of Natural Language Annotation • The format of natural language questions bypasses the need for explicit annotation of properties of events or other theories. • The annotator naturally avoids event pairs that do not have relations. • “what happened after the service industries are hardest hit?” • “what happened after a passerby reported the body?” • “what was expected to happen when the crisis hit America?” • “what was supposed to happen after a passerby called the police?” • It still remains difficult to have a theory explaining •  why hit can compare to expected and crisis, but not to gains.
  • 16. TORQUE Penalize Shortcuts by Contrast Sets • An important problem in building datasets is to avoid trivial solutions. • Contrast questions: which slightly modify the original questions, but dramatically change the answers • For an existing question (?, r, B) (e.g., “what happened after he ate his breakfast?”) • Keep using B and change r (e.g., “what happened before/shortly after/… he ate his breakfast?”) • Modify it to ask about the start/end time (e.g., “what happened after he started eating his breakfast?” or “what would finish after he ate his breakfast?”) • Check that the answers to the new question are different from the original one to avoid trivial modifications (e.g., changing “what happened” to “what occurred”)
  • 17. TORQUE Data Collection • Passages that consist of two contiguous sentences, as this is sufficient to capture the vast majority of non-trivial temporal relations. • Create a pool of 26k two-sentence passages from the TempEval3 workshop (2.8k articles) • 1. Label all the events • 2. Repeatedly do the following • (a) Ask a temporal relation question and point out all the answers from the list of events • Modify the temporal relation to create one or more new questions and answer them. Quality Control • Qualification: crowd workers were trained and tested on 3 capabilities: labeling events, asking temporal relation questions, and question answering. Crowd workers were considered level-1 qualified if they could pass the test within 3 attempts. (1/3 workers passed the qualification.) • Pilot: asked level-1 crowd workers to do a small amount of the real task. They manually checked the annotations and gave feedback to them. Roughly 1 out of 3 pilot submissions received a level-2 qualification. In the end, there were 63 level-2 annotators, and 60 of them actually worked on large-scale task. • Validation: 20% of articles. 5 different level-2 annotators(include original annotator) validate the event and answers. They intentionally added noise to the original data for quality control. They did not do additional validation for the question because there is no bad questions in a random sample of 100. Quality Control
  • 18. TORQUE Cost • 3 passages were presented. • The crowd worker could decide to use some or all of the. • For each passage a worker decided to use, they needed to label the vents, answer 3 hard- coded warm-up questions, and them ask and answer at least 12 questions (including contrast questions). The final reward is a base pay of $6 plus $0.5 for each extra question (up to $4). • Incentive • (1) use fewer passages so that they can do event labeling and warm-up questions fewer times. • (2) modify questions instead of asking from scratch • (3) ask extra questions in each job. • In practice, crowd workers on average used 2 passages in each job. • Validating the events in each passage and the answers to a specific question both cost $0.1. • In total, TORQUE cost $15k for an average of $0.7/question. statistics • 3.2k passage annotations (~50 tokens/passage) • 24.9k events (7.9 events/passage) • 21.2k user-provided questions (half of them were labeled by crowd workers as modifications of existing ones) • 94 / 200 questions querying about relations that cannot be directly represented by the previous single-interval-based labels.
• 21. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Existing temporal relation datasets: MATRES (ACL 18), TempEval 1-3, TimeBank-Dense (TB-Dense), EventTimeCorpus
• Relation labels: before, after, equal, vague
• Based on start-points
  • 22. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• 23. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• When reading a story, a human can construct a latent timeline about events’ start and end times.
• The timeline not only contains the placements of explicitly mentioned events (e.g., ride a bicycle), but also accounts for implicit events (e.g., Farrah was distracted so she looked away).
• The ability to construct such a timeline is essential for understanding the causal dynamics of a situation.
• Contributions
• A temporal relation dataset, TRACIE, focusing on implicit events
• A distant supervision process for temporal understanding of implicit events
• A reasoning model that makes end-time comparisons using predictions of start-time distances and durations
• 24. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Such tests in TRACIE take the form of multi-premise textual entailment (TE).
• Each TRACIE instance contains (see the sketch after this list)
• A context story (or premise) consisting of a sequence of explicit narrative events
• An implicit event in the form of a natural language phrase that is unmentioned but has some role in the story
• An explicit event, also in the form of a phrase
• A comparator of either {starts, ends}
• A temporal relation of either {before, after} that marks the relationship, in the dimension defined by the comparator, between the implicit event and the explicit event
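As a rough illustration of this instance structure, here is a minimal Python sketch; the class and field names are mine, not taken from the TRACIE release.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class TracieInstance:
    """One hypothesis about an implicit event, judged against a story (premise)."""
    story: str                                    # context story / premise
    implicit_event: str                           # unmentioned but inferable event
    explicit_event: str                           # event phrase taken from the story
    comparator: Literal["starts", "ends"]         # which endpoint of the implicit event is compared
    relation: Literal["before", "after"]          # asserted relation to the explicit event's start
    label: Literal["entailment", "contradiction"]
```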
• 25. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Such tests in TRACIE take the form of multi-premise textual entailment (TE).
• Premise: context story
• Hypotheses: temporal queries about pair-wise relations between implicit and explicit events
• E.g., “avoids” is the implicit event, “starts” is the comparator, “removed” is the explicit event and “before” is the temporal relation.
• Flip the temporal relation (i.e., “before” to “after” and vice versa) to create negative (contradiction) instances.
• Use start times of explicit events as reference points and compare the implicit event’s start or end time with them, according to the label definitions (Fig. 3).
• 26. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Implicit Event Generation
• Randomly sample short stories from the ROCStories dataset.
• For each story, one annotator writes 5 implicit event phrases that are not explicitly mentioned in the given story, but are inferable and relevant.
• The annotator additionally rewrites the two explicit events closest to the implicit event’s start and end time, respectively.
• Build two TRACIE instances (minus the temporal relation) per implicit event.
Automatic Instance Generation
• Extract all verbs and relevant arguments with the semantic role labeling model in AllenNLP.
• Construct a pool of explicit events in the form of short phrases (using verbs and their arguments).
• For each implicit event, randomly select two {explicit-event, comparator} pairs from the pool.
Label Collection
• For each of the 20 instances per story, annotate the temporal relation with four different annotators.
• Take the majority agreement as the final label and filter out unagreeable instances.
• Two authors additionally verified the instances with ambiguous verbs (e.g., “have”) and corrected 5% of the end-time instances.
• 27. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Pattern-Based Pre-Training
• Distant Supervision – Within-sentence Extraction
• Collect start-time comparisons between pairs of events heuristically from free text using “before/after” keywords.
• Use AllenNLP’s SRL model to process each input sentence and find verbs with a temporal argument that starts with either “before” or “after” and contains at least one other verb.
• If there are multiple verbs in the temporal argument, take the one with the largest number of tokens as arguments.
• 2.8M instances from a Wikipedia dump (May 2020). A rough sketch of the extraction heuristic follows this slide.
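Below is a hedged sketch of the within-sentence extraction heuristic, assuming AllenNLP's SRL predictor and BIO tags of the form B-/I-ARGM-TMP. The model archive path is a commonly used public one, not necessarily the one used in the paper, and the tie-breaking by argument size is omitted.

```python
from allennlp.predictors.predictor import Predictor

# Public SRL-BERT archive; assumed here, not confirmed to be the paper's exact model.
SRL_MODEL = ("https://storage.googleapis.com/allennlp-public-models/"
             "structured-prediction-srl-bert.2020.12.15.tar.gz")
predictor = Predictor.from_path(SRL_MODEL)

def extract_before_after_pairs(sentence: str):
    """Return (main_verb, keyword, temporal_argument) triples where the ARGM-TMP
    argument starts with 'before'/'after' and contains at least one other verb."""
    out = predictor.predict(sentence=sentence)
    words = out["words"]
    all_verbs = {frame["verb"] for frame in out["verbs"]}
    triples = []
    for frame in out["verbs"]:
        # Tokens tagged as the temporal modifier (ARGM-TMP) of this predicate.
        tmp = [w for w, t in zip(words, frame["tags"]) if t.endswith("ARGM-TMP")]
        if tmp and tmp[0].lower() in ("before", "after"):
            # Keep the pair only if another predicate verb appears inside the argument.
            if any(tok in all_verbs for tok in tmp[1:]):
                triples.append((frame["verb"], tmp[0].lower(), " ".join(tmp)))
    return triples

# e.g. "He went to the park before he wrote a review." ->
# [("went", "before", "before he wrote a review")]
```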
• 28. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Pattern-Based Pre-Training (PTNTIME)
• Distant Supervision – Cross-sentence Extraction
• The data collected from the within-sentence patterns does not reveal the relative distance between two start times.
• Find direct temporal expressions of hours and dates.
• Because these temporal expressions (e.g., 2021-01-01) are globally comparable, the compared events can be anywhere in a document.
• This process collects more supervision signals about time-point comparisons and their relative distance, on event pairs with trivial causal relation.
• Find exact temporal values by filling unmentioned elements of a temporal expression with the nearest previous mention (e.g., add “January” to the expression “the 10th” in Fig. 4).
• 29. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Cross-sentence Extraction
• Construct supervision instances under the assumption that the extracted temporal expressions describe the start times of the associated verbs (e.g., “went” started on January 1st).
• Represent the difference between the two start times as one of seven coarse temporal units: {<= minutes, hours, days, weeks, months, years, >= decades}.
• “Go to park” is weeks before “write review”, as shown in Fig. 4.
• Couple the specialized temporal pre-training data described above with additional paragraphs that are used to perform conventional language-model pre-training using the original denoising task (T5).
• Input sequence: event: [EventA] starts [Relation] [EventB] . story: [Paragraph]; output sequence: answer: [Label] [Distance].
• [Paragraph] is non-empty only for cross-sentence extractions. [Label] is either positive or negative. [Distance] is one of the 7 coarse temporal units, represented with a set of blank tokens [extra_id_N]. A sketch of constructing such pairs follows this slide.
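A minimal sketch of how such pre-training pairs could be built, assuming the start-time gap between the two events is already available as a Python timedelta. The bucket boundaries and the plain-string template are my simplifications; in the actual data the distance is expressed with [extra_id_N] sentinel tokens.

```python
from datetime import timedelta

# The seven coarse units; the bucket boundaries below are my assumption.
COARSE_UNITS = ["<=minutes", "hours", "days", "weeks", "months", "years", ">=decades"]

def coarse_distance(gap: timedelta) -> str:
    """Map the absolute start-time gap between two events to a coarse unit."""
    s = abs(gap.total_seconds())
    if s < 3600:                 return "<=minutes"
    if s < 24 * 3600:            return "hours"
    if s < 7 * 24 * 3600:        return "days"
    if s < 30 * 24 * 3600:       return "weeks"
    if s < 365 * 24 * 3600:      return "months"
    if s < 10 * 365 * 24 * 3600: return "years"
    return ">=decades"

def make_pretraining_pair(event_a, relation, event_b, paragraph, label, distance=""):
    """Source/target strings in (roughly) the format described above; the distance
    field is empty for within-sentence extractions."""
    source = f"event: {event_a} starts {relation} {event_b} . story: {paragraph}"
    target = f"answer: {label} {distance}".strip()
    return source, target

# e.g. make_pretraining_pair("go to park", "before", "write review",
#                            "...", "positive", coarse_distance(timedelta(days=9)))
# -> the distance bucket is "weeks", matching the Fig. 4 example.
```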
• 30. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Symbolic Temporal Reasoning Model (SYSTIME)
• This model makes end-time comparisons by symbolically combining start-time distance and duration, obtained from separate predictions of its components.
• It does not rely on explicit annotations of time points, only on relative comparisons between them.
  • 31. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision Symbolic Temporal Reasoning Model (SYSTIME)
• 32. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• End-time comparisons are decided from start-time distance and duration (sketched in code below):
• r_ends(e1, e2) = before ⇔ dist(e1, e2) + dur(e1) < 0
• r_ends(e1, e2) = after ⇔ dist(e1, e2) + dur(e1) > 0
• r_starts(e1, e2) = before ⇔ dist(e1, e2) < 0
• r_starts(e1, e2) = after ⇔ dist(e1, e2) > 0
Duration estimation – pre-train a sequence-to-sequence model
• Use duration data from TimeM (1M events and duration values).
• Input sequence: event: [Event] story: [Story]; output sequence: answer: [Value]
• [Event] represents the tokens of an event, with the trigger verb marked by a special token to its left
• [Story] represents tokens from the context
• [Value] is one of the 7 unit labels (i.e., {<= minutes, hours, …})
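The decision rules above can be written as a tiny function. This is a sketch under the assumption that dist() and dur() return values on a common signed scale (negative dist meaning e1 starts before e2); it is not the actual SYSTIME implementation.

```python
def temporal_relation(comparator: str, dist_e1_e2: float, dur_e1: float) -> str:
    """Decision rules above: compare e1's start or end time to e2's start time.
    dist_e1_e2 < 0 means e1 starts before e2; dur_e1 >= 0 is e1's duration."""
    value = dist_e1_e2 if comparator == "starts" else dist_e1_e2 + dur_e1
    return "before" if value < 0 else "after"

# e1 starts 2 units before e2 (dist = -2) but lasts 5 units (dur = 5):
assert temporal_relation("starts", -2, 5) == "before"
assert temporal_relation("ends", -2, 5) == "after"
```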
• 33. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
Approximate the dist() function using output from PTNTIME (see the sketch after this list)
• Input sequence: event: [EventA] starts [Relation] [EventB] . story: [Paragraph]; output sequence: answer: [Label] [Distance].
• [EventA]: the textual description of e1
• [EventB]: the textual description of e2
• [Paragraph]: the context (premise)
• Fix [Relation] to be “before”.
• Take the logits of [Label] at the vocabulary indices corresponding to “positive” and “negative”, apply a softmax, and obtain P_before and P_after; P = [P_before, P_after].
• Apply a softmax to the logits of [Distance] over the 7 words representing the temporal units to obtain 7 values approximating the probability of each distance. Place the 7 values in increasing order of temporal unit in a vector d, with c = [0, 1, 2, 3, 4, 5, 6].
• To get the direction, apply the tanh function to the difference between the probabilities in P.
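A hedged numpy sketch of this approximation. The sign convention (negative output meaning e1 starts before e2, matching the rules on slide 32) and the exact way the expected distance and the direction are combined are my assumptions about how the pieces fit together.

```python
import numpy as np

C = np.arange(7, dtype=float)  # c = [0, 1, 2, 3, 4, 5, 6], units in increasing order

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def approx_dist(label_logits, distance_logits):
    """label_logits: logits at the [Label] position for the ("positive", "negative") ids.
    distance_logits: logits at the [Distance] position for the 7 unit tokens."""
    p_before, p_after = softmax(np.asarray(label_logits, dtype=float))
    d = softmax(np.asarray(distance_logits, dtype=float))  # approx. prob. of each coarse unit
    magnitude = float(C @ d)                               # expected coarse distance
    # Assumed sign convention: negative output <=> e1 starts before e2,
    # consistent with r_starts(e1, e2) = before <=> dist(e1, e2) < 0.
    direction = np.tanh(p_after - p_before)
    return direction * magnitude
```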
• 34. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• T5-Large for PTNTIME and the duration model.
• PTNTIME – 45k steps (1.4M instances); duration model – 80k steps (2.6M instances).
• These pre-trained weights are plugged into SYSTIME; SYSTIME ZEROSHOT uses no TRACIE supervision.
• Story-wide exact match metric: the percentage of stories with all of their related hypotheses answered correctly (a small sketch follows this slide).
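For concreteness, a small sketch of the story-wide exact match metric, assuming predictions are available as (story_id, gold_label, predicted_label) triples; the grouping key and data layout are mine.

```python
from collections import defaultdict

def story_wide_exact_match(examples):
    """examples: iterable of (story_id, gold_label, predicted_label) triples.
    A story counts as correct only if every hypothesis belonging to it is correct."""
    story_correct = defaultdict(lambda: True)
    for story_id, gold, pred in examples:
        story_correct[story_id] = story_correct[story_id] and (gold == pred)
    return sum(story_correct.values()) / len(story_correct) if story_correct else 0.0
```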
• 35. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Uniform-dist: in the i.i.d. training set, 70% of the examples with the comparator “ends” and the relation “after” are positive; instances are randomly removed from the majority classes to balance the distribution.
• 36. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision
• Train and evaluate only on the instances whose label is either “before” or “after”, which account for about 80% of all instances.
• OT-NS (original test, no story): train and test with only the sentences containing the trigger verbs
• OT: train and test with the entire document as an auxiliary input
• OT-MS (original test, minimal supervision): train with 1.2k (6%) training instances
• PT (perturbed test): train with the complete training set and test on a perturbed test set from “Evaluating Models’ Local Decision Boundaries via Contrast Sets”.
  • 37. TRACIE: Temporal Reasoning on Implicit Events from Distant Supervision