SlideShare a Scribd company logo
1 of 23
Download to read offline
Interactive Second Language
Learning from News Websites	
Tao Chen, Naijia Zheng, Yue Zhao,
Muthu Chandrasekaran and Min-Yen Kan
kanmy@comp.nus.edu.sg
Slides available at: dwz.cn/kan-nlptea2
Image courtesy: skyscanner.net	
Formal language learning is
time-consuming, and learning
materials are often limited.
Slides available at: 
dwz.cn/kan-nlptea2
NLP-TEA-2 - WordNews	
 3
Image courtesy: anid.com.br	
  
…but we also spend over
35B hours every month
keeping up with the news
31 Jul 2015 NLP-TEA-2 - WordNews	
 4
✚	
  
A browser extension for vocabulary
learning when reading online news	
WordNews
531 Jul 2015 NLP-TEA-2 - WordNews	
The WordNews 
Chrome Extension
News Context	
•  Identify the news category by URL pattern
http://edition.cnn.com/2015/07/08/entertainment/feat-
tom-selleck-droughtshaming-water/index.html

7 categories: Entertainment, World, Finance, Sports, Fashion,
Technology, Travel
•  Classify words based on category document frequency
E.g.,“superstar” belongs to “Entertainment”
•  For both English and Chinese news and words
31 Jul 2015 NLP-TEA-2 - WordNews	
 6
Outline	
•  Introduction
•  Conclusion
31 Jul 2015 NLP-TEA-2 - WordNews	
 7
•  Translating	
•  Testing	
Word Sense Disambiguation (WSD)	
Distractor Generation
Word Sense Disambiguation	
•  Expanded College EnglishTest 4 Dictionary
–  English, Chinese (relative frequency), part-of-speech
–  33,664 English-Chinese pairs and
~4k unique English words
•  Baseline: always choose the most frequent relative
Chinese translation
–  100% of coverage as it always has a translation
–  Low accuracy as it lacks context modeling
31 Jul 2015 NLP-TEA-2 - WordNews	
 8
Word Sense Disambiguation	
•  Approach 1: News Category
–  Pick the Chinese translation with the same category as the
news article
–  E.g.,“利息” =“interest” in Finance news
•  Approach 2: Part-of-Speech (POS)
–  Pick up the Chinese translation with the same POS as the
target English word
–  E.g.,“book” = “书” (noun) and “预定” (verb)
31 Jul 2015 NLP-TEA-2 - WordNews	
 9
WSD: BingTranslator Based Methods	
•  Approach 3: Substring Match
31 Jul 2015 NLP-TEA-2 - WordNews	
 10
… into the world’s top 40
clubs
noun: 俱乐部, 棒, …
verb: 凑份子, …
…进入世界顶级40名
俱乐部
1.BingTranslator	
2. Look up dictionary	
3. Matched: 俱乐部
(Bing) is a substring of
俱乐部 (dict)	
Limited by dictionary coverage!
WSD: BingTranslator Based Methods	
31 Jul 2015 NLP-TEA-2 - WordNews	
 11
… into the world’s top 40
clubs
noun: 顶部,顶端, 顶,
adj:上面的, 最大的,…
verb: 盖
…进入世界顶级40名
俱乐部
1.BingTranslator
2. Look up dictionary	
3. No output using
substring match
WSD: BingTranslator Based Methods	
•  Approach 4: Relaxed Match
31 Jul 2015 NLP-TEA-2 - WordNews	
 12
… into the world’s top 40
clubs
noun: 顶部,顶端, 顶,
adj:上面的, 最大的,…
verb: 盖
… 进入 世界 顶级 40名
俱乐部
1.BingTranslator +
Chinese Segmentation	
2. Look up dictionary	
3. Relaxed Match: 顶级
(Bing) is superset of 顶
(dict)
WSD: BingTranslator Based Methods	
31 Jul 2015 NLP-TEA-2 - WordNews	
 13
... state department
spokeswomen said …
noun: 态, 国, 州, …
verb: 声明, 陈述, … 发言, …
adj:国家的
…	
  国家 部门 的 发言人
说 …
1.BingTranslator +
Chinese Segmentation
2. Look up dictionary	
3.Two relaxed matches, both
wrong
国家 (Bing) =国(dict)
发言人(Bing) =发言(dict)
WSD: BingTranslator Based Methods	
•  Approach 5: Bing Alignment	
31 Jul 2015 NLP-TEA-2 - WordNews	
 14
... state department
spokeswomen said …
noun: 态, 国, 州, …
verb: 声明, 陈述, … 发言, …
adj:国家的
…	
  国家 部门 的 发言人
说 …
1.BingTranslator +
Bing Alignment
2. Look up dictionary	
Better - No output if the
alignment is phrase to phrase
WSD: Evaluation	
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Baseline 1. News
Category
2. POS 3. Bing -
Substring
4. Bing - Relaxed 5. Bing - Align
Coverage
Accuracy
31 Jul 2015 NLP-TEA-2 - WordNews	
 15
Outline	
•  Introduction
•  Conclusion
31 Jul 2015 NLP-TEA-2 - WordNews	
 16
•  Translating	
•  Testing	
Word Sense Disambiguation (WSD)	
Distractor Generation
What is a set of suitable distractors?
•  Have the same form as the target word
– Same POS tag
•  Fit the sentence context
– News category
•  Have proper difficulty level according to user’s
level of mastery
– Difficult distractors are more semantically similar to
the target words
31 Jul 2015 NLP-TEA-2 - WordNews	
 17
Generating proper distractors	
The difficulty level is measured by Lin distance
between the target word and candidate distractor
in WordNet
A distractor is deemed hard when its similarity to
target word is above threshold (e.g., 0.1)	
31 Jul 2015 NLP-TEA-2 - WordNews	
 18
Lowest common
subsumer synset
Distractor Generation	
1.WordNews Hard: Same word form + 
Same news category + 
Semantically Similar
2. Random News: Same word form +
Same news category
Vary the number of hard distractors based on
user’s knowledge level
– Beginner: two random + one hard
– Intermediate: three hard
31 Jul 2015 NLP-TEA-2 - WordNews	
 19
Human Evaluation	
•  Baseline
– WordGap System (Knoop and Wilske, 2013)
•  Distractor: target’s synonyms of synonyms in WordNet
•  Evaluation 1: WordGap vs. Random News
•  Evaluation 2: WordGap vs.WordNews Hard
	
31 Jul 2015 NLP-TEA-2 - WordNews	
 20
Human Evaluation	
31 Jul 2015 NLP-TEA-2 - WordNews	
 21
One is the target word, three are from
WordGap, and the other three are from
WordNews Hard or Random News
Human Evaluation	
•  WordGap vs. Random News.
•  WordGap vs.WordNews Hard.
31 Jul 2015 NLP-TEA-2 - WordNews	
 22
# of wins Avg. Score
WordGap 27 3.84
Random News 23 4.10
# of wins Avg. Score
WordGap 21 4.16
WordNews Hard 29 3.49
Lower scores
are better
Conclusion	
•  WordNews: a Chrome extension enabling interactive
vocabulary learning when reading online news
•  Future work
– Mobile client and 
longitudinal user study
31 Jul 2015 NLP-TEA-2 - WordNews	
 23
Word Sense Disambiguation 
based on MachineTranslation	
Distractor Generation based on
news context and semantic similarity
Image Courtesy: www.heley-int.com	
Thanks
for listening!
Slides available at: 
dwz.cn/kan-nlptea2

More Related Content

Similar to Tao Chen - 2015 - Interactive Second Language Learning from News Websites

thesis_presentation_v1 shorter split
thesis_presentation_v1 shorter splitthesis_presentation_v1 shorter split
thesis_presentation_v1 shorter splitMiklas Njor
 
Social media tools for audience research and measurement and relevant influen...
Social media tools for audience research and measurement and relevant influen...Social media tools for audience research and measurement and relevant influen...
Social media tools for audience research and measurement and relevant influen...Brilliant Noise
 
Eswc2013 audience short
Eswc2013 audience shortEswc2013 audience short
Eswc2013 audience shortClaudia Wagner
 
Monitoring real time public vaccine confidence through social media (Francesc...
Monitoring real time public vaccine confidence through social media (Francesc...Monitoring real time public vaccine confidence through social media (Francesc...
Monitoring real time public vaccine confidence through social media (Francesc...Data Driven Innovation
 
The Research Software Encyclopedia
The Research Software EncyclopediaThe Research Software Encyclopedia
The Research Software EncyclopediaVanessa S
 
Projects for Database and Data Mining
Projects for Database and Data MiningProjects for Database and Data Mining
Projects for Database and Data MiningJames Ndukwane
 
Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017
Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017
Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017MLconf
 
Information Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ DeloitteInformation Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ DeloitteDeep Kayal
 
thesis_presentation_v1 shorter split
thesis_presentation_v1 shorter splitthesis_presentation_v1 shorter split
thesis_presentation_v1 shorter splitMiklas Njor
 
Talk of the City: Londoners and Social Media
Talk of the City: Londoners and Social MediaTalk of the City: Londoners and Social Media
Talk of the City: Londoners and Social MediaDaniele Quercia
 
Building Your Online PLN
Building Your Online PLNBuilding Your Online PLN
Building Your Online PLNLauren Zucker
 
Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...
Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...
Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...Kristen T
 
Digging for data: opportunities and challenges in an open research landscape_...
Digging for data: opportunities and challenges in an open research landscape_...Digging for data: opportunities and challenges in an open research landscape_...
Digging for data: opportunities and challenges in an open research landscape_...Platforma Otwartej Nauki
 
CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...
CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...
CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...keelangreen
 
Relevance Mining and Detection System
Relevance Mining and Detection SystemRelevance Mining and Detection System
Relevance Mining and Detection SystemAlexandre Pinto
 
Leveraging Social Media for Student Engagement - Updated 8/8/11
Leveraging Social Media for Student Engagement - Updated 8/8/11Leveraging Social Media for Student Engagement - Updated 8/8/11
Leveraging Social Media for Student Engagement - Updated 8/8/11Swift Kick
 
Nycon social media nyfa presentation
Nycon social media nyfa presentationNycon social media nyfa presentation
Nycon social media nyfa presentationAndrew Marietta
 
Social Media Case Studies Compilation #1 - 110210
Social Media Case Studies Compilation #1 - 110210Social Media Case Studies Compilation #1 - 110210
Social Media Case Studies Compilation #1 - 110210FullsourceWP
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisDiana Maynard
 

Similar to Tao Chen - 2015 - Interactive Second Language Learning from News Websites (20)

thesis_presentation_v1 shorter split
thesis_presentation_v1 shorter splitthesis_presentation_v1 shorter split
thesis_presentation_v1 shorter split
 
Social media tools for audience research and measurement and relevant influen...
Social media tools for audience research and measurement and relevant influen...Social media tools for audience research and measurement and relevant influen...
Social media tools for audience research and measurement and relevant influen...
 
Eswc2013 audience short
Eswc2013 audience shortEswc2013 audience short
Eswc2013 audience short
 
Monitoring real time public vaccine confidence through social media (Francesc...
Monitoring real time public vaccine confidence through social media (Francesc...Monitoring real time public vaccine confidence through social media (Francesc...
Monitoring real time public vaccine confidence through social media (Francesc...
 
The Research Software Encyclopedia
The Research Software EncyclopediaThe Research Software Encyclopedia
The Research Software Encyclopedia
 
Projects for Database and Data Mining
Projects for Database and Data MiningProjects for Database and Data Mining
Projects for Database and Data Mining
 
Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017
Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017
Dr. Steve Liu, Chief Scientist, Tinder at MLconf SF 2017
 
Information Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ DeloitteInformation Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ Deloitte
 
thesis_presentation_v1 shorter split
thesis_presentation_v1 shorter splitthesis_presentation_v1 shorter split
thesis_presentation_v1 shorter split
 
Talk of the City: Londoners and Social Media
Talk of the City: Londoners and Social MediaTalk of the City: Londoners and Social Media
Talk of the City: Londoners and Social Media
 
Building Your Online PLN
Building Your Online PLNBuilding Your Online PLN
Building Your Online PLN
 
Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...
Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...
Rea2015 Workshop 2 Everything You Didn't Know You Needed to Know About the So...
 
Digging for data: opportunities and challenges in an open research landscape_...
Digging for data: opportunities and challenges in an open research landscape_...Digging for data: opportunities and challenges in an open research landscape_...
Digging for data: opportunities and challenges in an open research landscape_...
 
CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...
CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...
CPRS Ottawa-Gatineau - Measuring Social Media Workshop - Sean Howard - thornl...
 
Relevance Mining and Detection System
Relevance Mining and Detection SystemRelevance Mining and Detection System
Relevance Mining and Detection System
 
Leveraging Social Media for Student Engagement - Updated 8/8/11
Leveraging Social Media for Student Engagement - Updated 8/8/11Leveraging Social Media for Student Engagement - Updated 8/8/11
Leveraging Social Media for Student Engagement - Updated 8/8/11
 
Nycon social media nyfa presentation
Nycon social media nyfa presentationNycon social media nyfa presentation
Nycon social media nyfa presentation
 
Social Media Case Studies Compilation #1 - 110210
Social Media Case Studies Compilation #1 - 110210Social Media Case Studies Compilation #1 - 110210
Social Media Case Studies Compilation #1 - 110210
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
 
Connected Research Workshop
Connected Research WorkshopConnected Research Workshop
Connected Research Workshop
 

More from Association for Computational Linguistics

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 

Recently uploaded (20)

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 

Tao Chen - 2015 - Interactive Second Language Learning from News Websites

  • 1. Interactive Second Language Learning from News Websites Tao Chen, Naijia Zheng, Yue Zhao, Muthu Chandrasekaran and Min-Yen Kan kanmy@comp.nus.edu.sg Slides available at: dwz.cn/kan-nlptea2
  • 2. Image courtesy: skyscanner.net Formal language learning is time-consuming, and learning materials are often limited. Slides available at: dwz.cn/kan-nlptea2
  • 3. NLP-TEA-2 - WordNews 3 Image courtesy: anid.com.br   …but we also spend over 35B hours every month keeping up with the news
  • 4. 31 Jul 2015 NLP-TEA-2 - WordNews 4 ✚   A browser extension for vocabulary learning when reading online news WordNews
  • 5. 531 Jul 2015 NLP-TEA-2 - WordNews The WordNews Chrome Extension
  • 6. News Context •  Identify the news category by URL pattern http://edition.cnn.com/2015/07/08/entertainment/feat- tom-selleck-droughtshaming-water/index.html 7 categories: Entertainment, World, Finance, Sports, Fashion, Technology, Travel •  Classify words based on category document frequency E.g.,“superstar” belongs to “Entertainment” •  For both English and Chinese news and words 31 Jul 2015 NLP-TEA-2 - WordNews 6
  • 7. Outline •  Introduction •  Conclusion 31 Jul 2015 NLP-TEA-2 - WordNews 7 •  Translating •  Testing Word Sense Disambiguation (WSD) Distractor Generation
  • 8. Word Sense Disambiguation •  Expanded College EnglishTest 4 Dictionary –  English, Chinese (relative frequency), part-of-speech –  33,664 English-Chinese pairs and ~4k unique English words •  Baseline: always choose the most frequent relative Chinese translation –  100% of coverage as it always has a translation –  Low accuracy as it lacks context modeling 31 Jul 2015 NLP-TEA-2 - WordNews 8
  • 9. Word Sense Disambiguation •  Approach 1: News Category –  Pick the Chinese translation with the same category as the news article –  E.g.,“利息” =“interest” in Finance news •  Approach 2: Part-of-Speech (POS) –  Pick up the Chinese translation with the same POS as the target English word –  E.g.,“book” = “书” (noun) and “预定” (verb) 31 Jul 2015 NLP-TEA-2 - WordNews 9
  • 10. WSD: BingTranslator Based Methods •  Approach 3: Substring Match 31 Jul 2015 NLP-TEA-2 - WordNews 10 … into the world’s top 40 clubs noun: 俱乐部, 棒, … verb: 凑份子, … …进入世界顶级40名 俱乐部 1.BingTranslator 2. Look up dictionary 3. Matched: 俱乐部 (Bing) is a substring of 俱乐部 (dict) Limited by dictionary coverage!
  • 11. WSD: BingTranslator Based Methods 31 Jul 2015 NLP-TEA-2 - WordNews 11 … into the world’s top 40 clubs noun: 顶部,顶端, 顶, adj:上面的, 最大的,… verb: 盖 …进入世界顶级40名 俱乐部 1.BingTranslator 2. Look up dictionary 3. No output using substring match
  • 12. WSD: BingTranslator Based Methods •  Approach 4: Relaxed Match 31 Jul 2015 NLP-TEA-2 - WordNews 12 … into the world’s top 40 clubs noun: 顶部,顶端, 顶, adj:上面的, 最大的,… verb: 盖 … 进入 世界 顶级 40名 俱乐部 1.BingTranslator + Chinese Segmentation 2. Look up dictionary 3. Relaxed Match: 顶级 (Bing) is superset of 顶 (dict)
  • 13. WSD: BingTranslator Based Methods 31 Jul 2015 NLP-TEA-2 - WordNews 13 ... state department spokeswomen said … noun: 态, 国, 州, … verb: 声明, 陈述, … 发言, … adj:国家的 …  国家 部门 的 发言人 说 … 1.BingTranslator + Chinese Segmentation 2. Look up dictionary 3.Two relaxed matches, both wrong 国家 (Bing) =国(dict) 发言人(Bing) =发言(dict)
  • 14. WSD: BingTranslator Based Methods •  Approach 5: Bing Alignment 31 Jul 2015 NLP-TEA-2 - WordNews 14 ... state department spokeswomen said … noun: 态, 国, 州, … verb: 声明, 陈述, … 发言, … adj:国家的 …  国家 部门 的 发言人 说 … 1.BingTranslator + Bing Alignment 2. Look up dictionary Better - No output if the alignment is phrase to phrase
  • 15. WSD: Evaluation 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Baseline 1. News Category 2. POS 3. Bing - Substring 4. Bing - Relaxed 5. Bing - Align Coverage Accuracy 31 Jul 2015 NLP-TEA-2 - WordNews 15
  • 16. Outline •  Introduction •  Conclusion 31 Jul 2015 NLP-TEA-2 - WordNews 16 •  Translating •  Testing Word Sense Disambiguation (WSD) Distractor Generation
  • 17. What is a set of suitable distractors? •  Have the same form as the target word – Same POS tag •  Fit the sentence context – News category •  Have proper difficulty level according to user’s level of mastery – Difficult distractors are more semantically similar to the target words 31 Jul 2015 NLP-TEA-2 - WordNews 17
  • 18. Generating proper distractors The difficulty level is measured by Lin distance between the target word and candidate distractor in WordNet A distractor is deemed hard when its similarity to target word is above threshold (e.g., 0.1) 31 Jul 2015 NLP-TEA-2 - WordNews 18 Lowest common subsumer synset
  • 19. Distractor Generation 1.WordNews Hard: Same word form + Same news category + Semantically Similar 2. Random News: Same word form + Same news category Vary the number of hard distractors based on user’s knowledge level – Beginner: two random + one hard – Intermediate: three hard 31 Jul 2015 NLP-TEA-2 - WordNews 19
  • 20. Human Evaluation •  Baseline – WordGap System (Knoop and Wilske, 2013) •  Distractor: target’s synonyms of synonyms in WordNet •  Evaluation 1: WordGap vs. Random News •  Evaluation 2: WordGap vs.WordNews Hard 31 Jul 2015 NLP-TEA-2 - WordNews 20
  • 21. Human Evaluation 31 Jul 2015 NLP-TEA-2 - WordNews 21 One is the target word, three are from WordGap, and the other three are from WordNews Hard or Random News
  • 22. Human Evaluation •  WordGap vs. Random News. •  WordGap vs.WordNews Hard. 31 Jul 2015 NLP-TEA-2 - WordNews 22 # of wins Avg. Score WordGap 27 3.84 Random News 23 4.10 # of wins Avg. Score WordGap 21 4.16 WordNews Hard 29 3.49 Lower scores are better
  • 23. Conclusion •  WordNews: a Chrome extension enabling interactive vocabulary learning when reading online news •  Future work – Mobile client and longitudinal user study 31 Jul 2015 NLP-TEA-2 - WordNews 23 Word Sense Disambiguation based on MachineTranslation Distractor Generation based on news context and semantic similarity Image Courtesy: www.heley-int.com Thanks for listening! Slides available at: dwz.cn/kan-nlptea2