3. Data
2224 records (now 5134)
65 interaction types (now 68)
809 proteins (now 1434 + 9 and 2295 pairs)
984 articles (now 3099)
Average 1.9 interactions per PP (max = 23)
Average 5.9 interactions per article (max = 90)
4. Goal
For every “triple”
− PP
− A (Article with unique pmid)
Find the interaction type
− (ignore 7.7% of the triples with > 1 interaction)
5. NER
LocusLink
“Conservative” approach
No coreference analysis
Low recall
High precision
6. Method – assuming one interaction
For a subset of all the PPs (45%)
− Get all full-text articles
− Get the sentences that mention both proteins of the PP
− Group as “papers”
Also for a triple PPA
− Get the papers that cite A
− Get the sentences that have PP and mention A
− Group as “citances”
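The sentence-selection step can be sketched as follows. This is only an illustration: it assumes plain token matching on protein names, whereas the slides rely on a LocusLink-based NER step, and the function name `sentences_with_pair` is hypothetical.

```python
import re


def sentences_with_pair(sentences, protein_a, protein_b):
    """Keep the sentences that mention both proteins of a pair.

    Assumption: a protein is "mentioned" when its name appears as a
    token; a real system would use the NER output instead.
    """
    a, b = protein_a.lower(), protein_b.lower()
    selected = []
    for sentence in sentences:
        # Lowercased alphanumeric tokens (hyphens kept inside names).
        tokens = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
        if a in tokens and b in tokens:
            selected.append(sentence)
    return selected
```

Run over the full-text articles, the selected sentences would form the “papers” group; run over sentences that also cite A, they would form the “citances” group.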
7. Training Data Construction
“papers”
− 0.5 sentence per triple (max 79)
− 50.6 sentences per interaction type (max 119)
“citances”
− 0.4 sentence per triple (max 105)
− 49.2 sentences per interaction type (max 162)
Include an interaction type only if it has > 40 sentences in both “papers” and “citances”
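A minimal sketch of this inclusion rule, assuming each group is available as a list of (sentence, interaction type) pairs. The function name `eligible_types` and the data layout are illustrative, not taken from the slides.

```python
from collections import Counter


def eligible_types(papers, citances, min_count=40):
    """Return the interaction types with more than `min_count`
    sentences in both the "papers" and the "citances" groups."""
    paper_counts = Counter(itype for _, itype in papers)
    citance_counts = Counter(itype for _, itype in citances)
    # Counter returns 0 for types missing from one of the groups.
    return {
        itype
        for itype in paper_counts
        if paper_counts[itype] > min_count and citance_counts[itype] > min_count
    }
```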
12. DM – Assumptions
There is an interaction
Single interaction per sentence
As many role states as words
Words = features
− One feature node per role
− Roles are hidden
− Protein names may be masked
13. Evaluation
Document-level
− (Not all the sentences describe an interaction)
− For every triple an interaction is assigned to the whole document
− Using two methods:
  Mj
  Cf
14. Mj
for each triple
  for each sentence of the triple
    find the interaction that maximises the posterior probability of the interaction given features
  assign to all sentences of the triple the most frequent interaction
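The Mj (majority vote) procedure can be sketched in Python as below. The input layout is an assumption: a dict mapping each triple to a list of per-sentence posteriors, each a dict from interaction type to P(interaction | features). The slides do not say how ties are broken; here the label seen first wins.

```python
from collections import Counter


def mj(sentence_posteriors):
    """Majority vote over per-sentence argmax labels.

    sentence_posteriors: {triple: [{interaction: probability}, ...]}
    Returns {triple: interaction}; triples without sentences are skipped.
    """
    labels = {}
    for triple, posteriors in sentence_posteriors.items():
        if not posteriors:
            continue
        # Best interaction for each sentence, then count the votes.
        votes = Counter(max(p, key=p.get) for p in posteriors)
        labels[triple] = votes.most_common(1)[0][0]
    return labels
```

With three sentences whose posteriors are {binds: 0.6, inhibits: 0.4}, {binds: 0.55, inhibits: 0.45} and {binds: 0.1, inhibits: 0.9}, two of the three sentences vote for "binds", so Mj assigns "binds" to the triple.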
15. Cf
get all conditional probabilities (do not assign per sentence)
for each triple
  choose the interaction that maximises the sum over all the triple's sentences
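A sketch of Cf under the same assumed input layout as for Mj (a dict from triple to a list of per-sentence posterior dicts): instead of committing to a label per sentence, the probabilities are summed over the triple's sentences and the interaction with the largest total is chosen.

```python
from collections import defaultdict


def cf(sentence_posteriors):
    """Pick, per triple, the interaction with the largest summed posterior.

    sentence_posteriors: {triple: [{interaction: probability}, ...]}
    Returns {triple: interaction}; triples without sentences are skipped.
    """
    labels = {}
    for triple, posteriors in sentence_posteriors.items():
        totals = defaultdict(float)
        for posterior in posteriors:
            for interaction, prob in posterior.items():
                totals[interaction] += prob
        if totals:
            labels[triple] = max(totals, key=totals.get)
    return labels
```

The two methods can disagree. For posteriors {binds: 0.6, inhibits: 0.4}, {binds: 0.55, inhibits: 0.45} and {binds: 0.1, inhibits: 0.9}, the sums are 1.25 for "binds" and 1.75 for "inhibits", so Cf chooses "inhibits" even though a per-sentence majority vote would choose "binds".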
17. Comparison
Trigger word
− 70 triggers for 10 interactions
− Co-occurrence
− Choose the “most specific” type
− If both specific or no trigger, choose nothing
− Backoff: if in doubt, choose the most frequent interaction
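One possible reading of the trigger-word baseline, as a sketch only. The trigger list and the specificity ranking are not given in the slides, so the `triggers` and `specificity` arguments (and the toy values in the usage note) are hypothetical. "Both specific" is interpreted here as two different interactions tied at the highest specificity.

```python
def trigger_baseline(tokens, triggers, specificity, backoff_label=None):
    """Assign an interaction from trigger words found in a sentence.

    tokens: words of a sentence that mentions both proteins.
    triggers: {trigger word: interaction}          (hypothetical list)
    specificity: {interaction: rank}, higher means more specific
                                                   (hypothetical ranking)
    backoff_label: returned when no decision is made; None means
                   "choose nothing", otherwise the most frequent interaction.
    """
    found = {triggers[t] for t in tokens if t in triggers}
    if not found:
        return backoff_label
    top = max(specificity.get(i, 0) for i in found)
    best = [i for i in found if specificity.get(i, 0) == top]
    # A unique most-specific interaction wins; a tie is "in doubt".
    return best[0] if len(best) == 1 else backoff_label
```

For example, with toy triggers {"binds": "binds", "interacts": "interacts with", "inhibits": "inhibits"} and specificity {"interacts with": 0, "binds": 1, "inhibits": 1}, the sentence "P1 binds and interacts with P2" yields "binds", while a sentence containing both "binds" and "inhibits" yields nothing unless a backoff label is supplied.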
18. Comparison
− Key(B): trigger word (backoff)
− Base: the most frequent interaction
19. Sentence-Level Experiments
Manual annotation of 2114 sentences
68.3% disagreed with HIV database
Contacted some of the authors
− DB error
− Contradiction
  “require” but under certain conditions “inhibit”