Using an English noun phrase grammar defined by Hulth (2004a) as a starting point, we built an English noun phrase chunker to extract complex noun phrases from web-based articles. These phrases served as candidates for anchor texts linking articles within the About.com network of content sites. Prior to full-scale deployment, a group of annotators with domain authority in their respective fields evaluated articles that received these machine-generated anchor texts. Because of the limited time before deployment, we could not create a reference set of documents for comparing anchor texts among the annotators, and as a result we could not compute inter-labeler agreement. However, by treating the anchor text generator as another annotator, we could compute the average Cohen's Kappa coefficient (Landis and Koch, 1977) across all pairings of the generator with an annotator. Our approach showed a fair level of agreement on average (as described in Pustejovsky and Stubbs (2013, pp. 131–132)).
Evaluation of Anchor Texts for Automated Link Discovery in Semi-structured Web Documents
1. Evaluation of Anchor Texts for Automated Link Discovery in Semi-structured Web Documents
Na’im Tyson, Jon Roberts, Jeff Allen and Matt Lipson
Sciences, About.com
Novel Incentives for Collecting Data & Annotation from People
Tyson, Roberts, Allen, Lipson (About.com) Evaluation of Annotations of Anchor Texts LREC 2016 1 / 19
5. Purpose: Research Questions
Q: What do you do when you have little time and funding to annotate web pages?
A: Create an algorithm to annotate web pages with anchor texts.
Q: How do you measure quality and consistency between your algorithm and human annotators?
A: Create an evaluation framework to determine consistency between the annotations of the algorithm and the humans!
13. Introduction: What is About.com?
• Composition
  • Intent-driven website of two million articles divided into seven major verticals: food, health¹, home, money, style, tech and travel
  • Over 200 million monthly visits from the U.S., Western Europe and parts of India
• Content Structure
  • Content written by a large number of writers (experts) using a content-management system (CMS), with each expert having their own topic area
  • Nascent content can be linked to other articles with hypertext links ("inline links")
  • Inline links are necessary for user recirculation
  • Inline links have higher clicks per session than article listings (at the bottom of the page), trending articles and navigation units
¹ Top health content migrated to verywell.com.
14. Introduction: What makes Inline Links problematic?
• Experts hardly add links!
Figure 1: Histogram of link density of articles prior to the launch of automated link discovery.
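The link density plotted in Figure 1 can be computed per article. The deck does not define the exact metric, so the sketch below assumes a links-per-100-words definition; `link_density` is a hypothetical helper, not code from the study:

```python
import re

def link_density(html_body):
    """Inline links per 100 words of article body (assumed definition)."""
    links = len(re.findall(r"<a\s", html_body))
    text = re.sub(r"<[^>]+>", " ", html_body)      # strip markup
    words = len(re.findall(r"\b\w+\b", text))
    return 100.0 * links / max(words, 1)

print(round(link_density('<p>See <a href="/x">this guide</a> for more.</p>'), 1))  # → 20.0
```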
18. Introduction: What makes Inline Links problematic? (continued)
• Experts do not receive extra incentives for annotating anchor texts for inline links
• Producing quality inline links takes time
• Experts must know their own content and other neighboring content in order to link to it
• Experts are not compensated for directing traffic outside their site
20. Generating Anchor Texts: Making Anchor Texts from Keyword Extraction Algorithms
• Empirically-driven inline linking produces long sequences
Figure 2: Part-of-speech (POS) histogram of expert-generated anchor texts consisting of six words in full-text articles.
• TextRank [Mihalcea and Tarau, 2004], KEA [Witten et al., 1999] and Hulth [2004] produce sequences that are too short
26. Generating Anchor Texts: Making Anchor Texts using Chunk Parsing
• Starting point: generic grammar of POS sequences originally derived from Hulth (2004)
• Grammars used to identify candidates for anchor text suggestions
• Augmented with entity and datetime tags
• POS sequences expressed as chunk rules implemented in Python's NLTK
• Candidates selected based on a weighted sum of document-level features between source and target documents
• Weights based on existing expert links
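The chunk-rule step above can be sketched with NLTK's `RegexpParser`. The grammar below is a hypothetical single-rule fragment (optional adjectives followed by nouns), not the study's full Hulth-derived rule set with entity and datetime augmentations:

```python
from nltk import RegexpParser

# Hypothetical grammar fragment; the study's actual rule set is not reproduced here.
GRAMMAR = r"NP: {<JJ>*<NN.*>+}"  # optional adjectives, then one or more nouns

def candidate_anchor_texts(tagged_tokens):
    """Extract NP chunks from a POS-tagged sentence as anchor-text candidates."""
    tree = RegexpParser(GRAMMAR).parse(tagged_tokens)
    return [" ".join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees() if subtree.label() == "NP"]

tagged = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN")]
print(candidate_anchor_texts(tagged))  # → ['quick brown fox', 'lazy dog']
```

As the Discussion slides note, `RegexpParser` applies the first matching rule, so a richer grammar would need careful rule ordering.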
33. Evaluating Anchor Texts: Quality Assurance Setup & Workflow
• Annotation of the top 86k documents
• Done across 13 annotators paid hourly
• Annotators modified documents within an annotation environment:
  1. Keep anchor text
  2. Modify anchor text (by expanding/contracting)
  3. Delete anchor text
  4. Modify link target
34. Evaluating Anchor Texts: Quality Assurance Setup & Workflow (continued)
Figure 3: Example annotation environment used for investopedia.com.
35. Evaluating Anchor Texts: Computing Inter-labeler Agreement
• For each annotator...

             B positive   B negative
A positive       a            b
A negative       c            d

Table 1: Contingency table for the anchor text generator (A) and a single annotator (B).

Token:      ^   the   quick   brown   fox   jumps   over   the   lazy   dog   $
Algorithm:  -    -      +       +      +      +       +     +      +     +    -
Annotator:  -    +      +       +      +      -       -     -      -     -    -
Label:      d    c      a       a      a      b       b     b      b     b    d

Example 1: Phrase alignment between the anchor text generator ("quick brown fox jumps over the lazy dog") and an annotator ("the quick brown fox"); each token, including the boundary markers ^ and $, is labeled a–d according to the cells of Table 1.
36. Evaluating Anchor Texts: Computing Inter-labeler Agreement (continued)

             B positive    B negative
A positive   3/11 = 0.27   5/11 = 0.45
A negative   1/11 = 0.09   2/11 = 0.18

Table 2: Contingency table computed from the relative word agreements of Example 1, using the cell definitions of Table 1, for the generator (A) and annotator (B).
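The cell values in Table 2 follow mechanically from the alignment labels. A minimal sketch, assuming each anchor is represented as a set of token positions over the boundary-marked sentence (`contingency` is a hypothetical helper, not code from the study):

```python
def contingency(n_tokens, algo_span, human_span):
    """Relative contingency cells (a, b, c, d) from two anchor-text spans.

    Each span is the set of token positions (over the ^ ... $ sentence)
    that the labeler marked as anchor text.
    """
    a = len(algo_span & human_span)             # both positive
    b = len(algo_span - human_span)             # algorithm only
    c = len(human_span - algo_span)             # annotator only
    d = n_tokens - len(algo_span | human_span)  # both negative
    return tuple(x / n_tokens for x in (a, b, c, d))

# "^ the quick brown fox jumps over the lazy dog $" -> 11 tokens, positions 0..10
algo = set(range(2, 10))   # "quick brown fox jumps over the lazy dog"
human = set(range(1, 5))   # "the quick brown fox"
print([round(x, 2) for x in contingency(11, algo, human)])
# → [0.27, 0.45, 0.09, 0.18], matching Table 2
```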
40. Evaluating Anchor Texts: Computing Inter-labeler Agreement (final)

Cohen's Kappa²
    K = (Pr(a) − Pr(e)) / (1 − Pr(e))    (1)

Average Cohen's Kappa, where K_i is the kappa between the generator and annotator i in the annotator set A
    K̄ = (1/|A|) Σ_{i=1}^{|A|} K_i    (2)

Annotation Precision of a Document
    Precision(d_k) = a / (a + b)    (3)

Mean Average Precision
    MAP = (1/|A|) Σ_{i=1}^{|A|} (1/m) Σ_{k=1}^{m} Precision(d_k)    (4)

² See Pustejovsky and Stubbs [2013, pp. 133–134] for a detailed example of how to compute both Pr(a) and Pr(e).
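Equations (1) and (2) applied to the Table 2 cells can be sketched as follows (a toy illustration, not the study's full computation; `cohen_kappa` and `mean_kappa` are hypothetical helpers):

```python
def cohen_kappa(a, b, c, d):
    """Equation (1): Cohen's kappa from relative contingency cells."""
    pr_a = a + d                                   # observed agreement
    pr_e = (a + b) * (a + c) + (c + d) * (b + d)   # chance agreement
    return (pr_a - pr_e) / (1 - pr_e)

def mean_kappa(kappas):
    """Equation (2): average kappa over the annotator set A."""
    return sum(kappas) / len(kappas)

# Toy cells from Table 2 (generator vs. one annotator):
print(round(cohen_kappa(3/11, 5/11, 1/11, 2/11), 2))  # → 0.03
```

On this single toy alignment the kappa is near zero; the study's average of 0.33 comes from aggregating over many documents and annotator pairings.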
41. Results
• Average K̄: 0.33
  • Fair level of agreement
  • slight < fair < moderate < substantial < perfect
• MAP: 0.40
  • Roughly 40% agreement, on average, between the linker and an annotator for this dataset
42. Discussion: Two-tier Approach to Web Page Annotation
Testing Labeling Consistency
• Create the same set of documents for annotation
• Documents already linked using the automated linking process
• Measure the mean average relative agreement, MAR
• Using the agreements a and d from Table 1 for an anchor text t, compute the average relative agreement between one annotator and another for a document containing n anchor texts t_1, ..., t_n:

    (1/n) Σ_{k=1}^{n} (a_{t_k} + d_{t_k})    (5)

• Compute the mean across the set of all documents, D, to get the mean average relative agreement between any two annotators i and j:

    MAR<i,j> = (1/|D|) Σ_{k=1}^{|D|} (1/n) Σ_{l=1}^{n} (a_{t_l} + d_{t_l})    (6)

• Average all MAR<i,·> scores for each annotator i
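Equations (5) and (6) can be sketched as follows (a minimal illustration with made-up cell values; `avg_relative_agreement` and `mar` are hypothetical helpers):

```python
def avg_relative_agreement(doc_cells):
    """Equation (5): mean of (a + d) over a document's anchor texts.

    `doc_cells` is a list of relative (a, b, c, d) tuples, one per
    anchor text in the document.
    """
    return sum(a + d for a, _, _, d in doc_cells) / len(doc_cells)

def mar(documents):
    """Equation (6): mean average relative agreement over all documents D."""
    return sum(avg_relative_agreement(doc) for doc in documents) / len(documents)

# Made-up cells for two documents with one and two anchor texts respectively:
docs = [[(0.27, 0.45, 0.09, 0.18)],
        [(0.50, 0.20, 0.10, 0.20), (0.40, 0.30, 0.10, 0.20)]]
print(round(mar(docs), 2))  # → 0.55
```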
43. Discussion: Two-tier Approach to Web Page Annotation
Establish Best Practices
• Use a threshold on MAR to remove bad actors within the group [Neuendorf, 2002]
• Hold a general meeting of annotators to expose good and bad annotation practices
44. Discussion: Improvements to Anchor Text Selection
• Offer more parses of sentences given the noun phrase grammar
  • NLTK returns the first rule that matches the grammar
• Use a probabilistic noun phrase grammar in NLTK
  • Compute probabilities based on the anchor texts used in this study's annotations
45. Conclusions: Lessons Learned
• Use a reference corpus
• Introduce annotators to one another to share best practices
  • Establish a social media group to foster communication
• Static rules require updating to take experts' choices into account
  • A probabilistic grammar accommodates parsing flexibility
46. References
R. Mihalcea and P. Tarau. TextRank: Bringing Order into Texts. Conference on Empirical Methods in Natural Language Processing, 2004.
I. H. Witten et al. KEA: Practical automatic keyphrase extraction. Fourth ACM Conference on Digital Libraries, pp. 254–256, 1999.
A. Hulth. Combining Machine Learning and Natural Language Processing for Automatic Keyword Extraction. PhD thesis, Department of Computer and Systems Sciences, Stockholm University, 2004.
J. Pustejovsky and A. Stubbs. Natural Language Annotation for Machine Learning. O'Reilly Media, Inc., 2013.
K. A. Neuendorf. The Content Analysis Guidebook. Thousand Oaks, California: Sage Publications, 2002.