Highlighting Weasel Sentences
for Promoting Critical
Information Seeking on the Web
Fumiaki Saito1, Yoshiyuki Shoji2, Yusuke Yamamoto1
1: Shizuoka University, Japan
2: Aoyama Gakuin, Japan
January 19, 2020
WISE 2019: Session S7
Background: Web information is not always correct
The proportion of medical Web sites authorized by medical experts: < 50%*
* E. Sillence et al., “Trust and Mistrust of Online Health Sites”, ACM CHI, pp.663-670, 2004
Possible approach in information science
Obtaining correct information
(Semi-) automatic analysis of information credibility
Examples of credibility analysis systems
TruthFinder*1
Scores the consistency of facts describing objects
CowSearch*2
Provides supporting information for credibility judgment
Limitation
The analysis does not guarantee the correctness of information
*1 X. Yin, J. Han, and P. S. Yu. Truth Discovery with Multiple Conflicting Information Providers on the Web. IEEE Transactions on Knowledge and Data Engineering, 20(6), 796–808, 2008.
*2 Y. Yamamoto and K. Tanaka. Enhancing Credibility Judgment of Web Search Results. In Proceedings of the 29th ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2011), pages 1235–1244, 2011.
Possible approach in information science
Obtaining correct information
(Semi-) automatic analysis of information credibility
Careful examination of information by users
Many people are not aware of Web information credibility!
57%
82%
Percentage of young Japanese people who trust Web information *1
Percentage of people who trust information on SERP *2
(Illustration captions: “Never mind, never mind.” and “I trust Google”)
*1 Adobe Inc., “The State of Content: Rules of Engagement”, 2015
*2 S. Nakamura et al., “Trustworthiness Analysis of Web Search Results”, ECDL 2007
Why do people often pay little attention to Web information credibility?
• Trust in search engines’ ranking *1
• Wrong metrics for quality judgment (e.g., appearance of websites *2)
• Cognitive bias *3
*1 Pan, B., Hembrooke, H., Joachims, T., et al. In Google We Trust: Users’ Decisions on Rank, Position, and Relevance
*2 Fogg, B. J., Soohoo, C., Danielson, D. R., et al. How Do Users Evaluate the Credibility of Web Sites? A Study with over 2,500 Participants
*3 Kahneman, D. Thinking, Fast and Slow. Macmillan (2011)
Research question
How can we promote critical information seeking while users browse the Web?
Proposed system
Automatically detects and highlights ambiguous sentences on the Web
Ambiguous sentences in this study: sentences that lack evidence but seem to have no problems in their claims
Proposed system to highlight ambiguous sentences
Makes users aware of information credibility and promotes careful information seeking
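The detect-and-highlight idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: `split_sentences`, `highlight`, and `toy_detector` are hypothetical names, the sentence splitter is deliberately naive, and the detector is a stand-in for a trained classifier.

```python
import html
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def highlight(text: str, is_ambiguous: Callable[[str], bool]) -> str:
    """Return HTML where sentences flagged by `is_ambiguous` are wrapped in <mark>."""
    parts = []
    for sentence in split_sentences(text):
        escaped = html.escape(sentence)
        parts.append(f"<mark>{escaped}</mark>" if is_ambiguous(sentence) else escaped)
    return " ".join(parts)


def toy_detector(sentence: str) -> bool:
    """Stand-in detector (hypothetical): matches two example openings only."""
    return sentence.lower().startswith(("it is often said", "research has shown"))


page = "It is often said that X works. The sources are listed below."
print(highlight(page, toy_detector))
```

Passing the detector as a callback keeps the highlighting step independent of how ambiguous sentences are identified, so a rule list, a classifier, or manual labels could be swapped in.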
1. Ambiguous sentence classifier
Q. How to detect ambiguous sentences?
A. We focus on weasel expressions
What are weasel expressions?
• “A certain person concerned is ...”
• “Research has shown ...”
• “It is often said ...”
• “It is widely thought ...”
Who said that? What is the truth?
Weasel expressions create an impression that something specific and meaningful has been said, although the claim is ambiguous and lacks evidence.
Classification of weasel sentences
We built a simple weasel sentence classifier with Wikipedia data
[Diagram: Wikipedia → ambiguous sentences / non-ambiguous ones → training/test set → ML (SVM)]
Training/test dataset to classify weasel sentences (1/2)
We focused on Wikipedia editing rules.
Some weasel expressions are annotated with special tags in Wikipedia.
Ex: Who said?
It is said that Company A is behind this hoax. [by whom?]
Weasel expressions in Wikipedia
• “The hoax is said to have been the work of Company A, a media and advertising company that wanted to show off its influence. [by whom?]”
• “~ is an established theory, but there are some objections. [who?]”
Image reference https://en.wikipedia.org/wiki/Wikipedia_logo
Training/test dataset to classify weasel sentences (2/2)
Positive examples (2236 sentences)
Sentences annotated with [who?] or [by whom?] on Wikipedia
Negative examples (2236 sentences)
Sentences without [who?][by whom?] tags in
Wikipedia articles where weasel expressions
appear.
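The labeling rule described on these two slides can be sketched as follows. This is a minimal illustration, not the authors' extraction pipeline: the regular expression matches the inline tags as they are printed on the slides (actual wikitext markup differs, so the pattern is an assumption), and `label_sentences` and `balance` are hypothetical helper names.

```python
import re

# Matches the tags as printed on the slides; real wikitext markup is an open assumption.
TAG = re.compile(r"\s*\[(?:by whom|who)\?\]", re.IGNORECASE)


def label_sentences(sentences):
    """Return (clean_text, label) pairs: label 1 if the sentence carried a tag, else 0."""
    labeled = []
    for sentence in sentences:
        has_tag = TAG.search(sentence) is not None
        labeled.append((TAG.sub("", sentence).strip(), 1 if has_tag else 0))
    return labeled


def balance(labeled):
    """Keep equally many positive and negative examples, as in the 2236/2236 split."""
    positives = [item for item in labeled if item[1] == 1]
    negatives = [item for item in labeled if item[1] == 0]
    n = min(len(positives), len(negatives))
    return positives[:n] + negatives[:n]
```

Stripping the tag from the text before training matters: if the tag stayed in the sentence, a classifier could learn the tag itself rather than the wording around it.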
Features for classification
• Bag-of-words: nouns, verbs, and adjectives
• Typical examples of ambiguous expressions listed in Wikipedia (27 phrases)
– "It is said that ..."
– "An expert argues that ..."
and so on
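A rough sketch of how these two feature groups could be combined into one vector is shown below. It makes several assumptions: only the two phrases quoted on the slide are included (the full 27-phrase list is not given), the part-of-speech filter for nouns, verbs, and adjectives is omitted because it needs a tagger, and `tokenize`, `build_vocabulary`, and `featurize` are hypothetical names.

```python
import re
from collections import Counter

# Only the two phrases quoted on the slide; the remaining phrases are not listed there.
WEASEL_PHRASES = ["it is said that", "an expert argues that"]


def tokenize(sentence):
    """Lowercase word tokens; no part-of-speech filtering in this sketch."""
    return re.findall(r"[a-z']+", sentence.lower())


def build_vocabulary(sentences):
    """Map every token seen in the training sentences to a column index."""
    vocab = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(sentence, vocab):
    """Bag-of-words counts followed by one 0/1 flag per weasel phrase."""
    counts = Counter(tokenize(sentence))
    bow = [counts.get(token, 0) for token in vocab]  # dict order equals index order
    lowered = sentence.lower()
    flags = [1 if phrase in lowered else 0 for phrase in WEASEL_PHRASES]
    return bow + flags
```

Vectors produced this way have a fixed length (vocabulary size plus the number of phrases), which is the input shape a classifier such as an SVM expects.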
Classifier performance
Performance evaluation with 5-fold cross-validation
            Accuracy  Precision  Recall  F1
SVM (RBF)   0.764     0.772      0.748   0.760
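For reference, the four reported metrics and a 5-fold split can be computed as sketched below. The function names are hypothetical, and the interleaved fold assignment is one simple choice; the slides do not say how the folds were formed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = weasel sentence)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold i holds out every k-th item starting at i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test
```

As a consistency check, the reported F1 agrees with the reported precision and recall: 2 × 0.772 × 0.748 / (0.772 + 0.748) ≈ 0.760.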
Improvements for better classification (1/2)
Improvements in dataset
Frequency of weasel annotation varies depending on topics on Wikipedia
Topic categories of positive examples may be biased
We need to get more data from editing histories on Wikipedia
Improvements for better classification (2/2)
Improvements in classification features
Because bag-of-words is a simple feature, features that consider surrounding sentences and word order are required.
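One possible direction, given only as an illustration and not something the slides propose specifically, is to add word n-grams (which keep local word order) and a window of neighboring sentences. `word_ngrams` and `with_context` are hypothetical helpers.

```python
def word_ngrams(tokens, n=2):
    """Contiguous n-token sequences, which preserve local word order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def with_context(sentences, index, window=1):
    """Return the target sentence together with up to `window` neighbors on each side."""
    start = max(0, index - window)
    return sentences[start:index + window + 1]
```

For example, the bigram "is said" survives in an n-gram feature, whereas a bag-of-words vector records "is" and "said" separately.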
2. Effect of our prototype on user behaviors
How does our prototype affect user behaviors?
Q. Can our prototype encourage users to browse webpages more carefully? If so, how do user behaviors change?
Hypotheses
• H1: The proposed system extends the time spent in web information seeking.
• H2: The proposed system increases the number of visited webpages.
• H3: The proposed system improves users’ confidence in the decisions they make through web information seeking.
• H4: The above-mentioned effects vary with users’ familiarity with the search topics.
User study
Participants were asked to search for webpages to answer medical questions.
• Fixed search results: Participants searched among 100 fixed search results.
• 4 search tasks: We prepared 4 search tasks about medical topics.
“Is cinnamon effective for diabetes? Report your answer by Web search”
Participants
The participants were recruited by crowdsourcing and randomly assigned to one of two groups by UI condition
• Proposed group (105 participants)
– The system highlighted the weasel sentences while participants viewed the webpages
– The authors manually decided which sentences should be highlighted as weasel sentences
– Participants learned what highlighted sentences meant before the task started
• Control group (83 participants)
– Highlighting function was disabled
Focused behaviors
Measuring behavior on the web
– Session time
– SERP dwell time
– Average dwell time per webpage
– Pageview (the number of visited webpages)
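The four measures can be derived from a browsing log roughly as sketched below. The log format is an assumption made for illustration: each `Visit` records whether the participant was on the SERP or on a result page, with start and end times in seconds. Revisits to the same page are counted as separate pageviews here, which may differ from the study's definition.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    kind: str     # "serp" or "page" (hypothetical labels)
    start: float  # seconds since the session started
    end: float


def behavior_metrics(visits):
    """Compute session time, SERP dwell time, average page dwell time, and pageview."""
    if not visits:
        return {"session_time": 0.0, "serp_dwell_time": 0.0,
                "avg_page_dwell_time": 0.0, "pageview": 0}
    pages = [v for v in visits if v.kind == "page"]
    serps = [v for v in visits if v.kind == "serp"]
    page_time = sum(v.end - v.start for v in pages)
    return {
        "session_time": max(v.end for v in visits) - min(v.start for v in visits),
        "serp_dwell_time": sum(v.end - v.start for v in serps),
        "avg_page_dwell_time": page_time / len(pages) if pages else 0.0,
        "pageview": len(pages),
    }
```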
Questionnaire
• Pre-questionnaire (6 levels)
– Degree of prior knowledge about the search topic
– Expected answer
– Confidence about the answer
• Post-questionnaire (6 levels)
– Answer for the search task
– Confidence about the answer
Analysis method
• We use a Bayesian GLMM (generalized linear mixed model) to model user behaviors
• Fixed effects
– UI condition
– Topic familiarity
– Interaction between UI condition & topic familiarity
• Random effects
– User
– Search topic
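Written out, a model with these fixed and random effects has the following general form. The response distribution $\mathcal{D}$, the link function $g$, and the priors are not stated on the slide, so they are left generic here; the symbols ($y_{ut}$ for the behavior of user $u$ on topic $t$, $\mathrm{UI}_u$, $\mathrm{Fam}_{ut}$, $a_u$, $b_t$) are notation introduced for this sketch.

```latex
\begin{aligned}
y_{ut} &\sim \mathcal{D}\left(\mu_{ut}\right), \\
g(\mu_{ut}) &= \beta_0 + \beta_1\,\mathrm{UI}_u + \beta_2\,\mathrm{Fam}_{ut}
             + \beta_3\,\mathrm{UI}_u \cdot \mathrm{Fam}_{ut} + a_u + b_t, \\
a_u &\sim \mathcal{N}\left(0, \sigma_a^2\right), \qquad
b_t \sim \mathcal{N}\left(0, \sigma_b^2\right).
\end{aligned}
```

Here $\beta_1$ captures the effect of the UI condition, $\beta_2$ the effect of topic familiarity, and $\beta_3$ their interaction, while $a_u$ and $b_t$ absorb per-user and per-topic variation.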
Session time
Participants with our prototype spent a longer time on search tasks.
The more familiar participants were with the search topic, the greater the difference in session time between UI conditions.
Pageview
Participants with our prototype viewed more webpages in search tasks.
The higher the participants’ topic familiarity, the more webpages they viewed.
Change of confidence
The interaction between UI condition and topic familiarity affected the change in confidence.
If participants were not familiar with the topics, our prototype changed their prior confidence to a greater extent.
If participants were familiar with the topics, our prototype changed their prior confidence to a lesser extent.
Discussion
• The user study suggests that the proposed system can enhance user engagement in careful information seeking on the web
– The system increased pageview and session time.
• We need to investigate how users feel about and use highlighted sentences
– We did not determine why participants with our prototype viewed more webpages and spent a longer time in the search session
Conclusion
• Designing a system to highlight weasel sentences
– The classification performance for weasel sentences was generally good but needs further improvement
• User study to examine the system effect
– The proposed system can enhance user engagement in critical information seeking on the web.
• Future work
– Improvement of the weasel classifier
– Investigating weasel sentences on the Web
