The document appears to be a presentation in Japanese about Wikipedia and its usage statistics. Some key points discussed include:
1. Age demographics of Wikipedia users, with the highest percentage being ages 25-34.
2. A graph showing usage trends of Wikipedia and blogs over time, with Wikipedia usage steadily increasing.
3. Metrics on the number of Wikipedia editors, articles, data storage usage, and pages viewed per second.
4. Methods for measuring credibility of Wikipedia editors, such as reliability degree based on edits over time.
The presentation provides an overview and analysis of Wikipedia usage and editor statistics.
The document summarizes a research paper presented at WikiSym 2012 on calculating the quality of Wikipedia articles. The researchers propose a method to:
1) Identify the editors of each article.
2) Analyze the edit history of each editor to calculate their quality value (QV).
3) Use the editors' QVs to calculate the QV of text within the articles.
4) Iteratively calculate editors' and texts' QVs until they converge to obtain the final article QV.
The method improves precision in identifying high-quality articles compared to not considering editors' QVs. It addresses the "chicken-and-egg" problem of text QV depending on editor QV and vice versa.
This document discusses quality assessment of information and research. It includes a self-introduction, research topics including editor credibility and reliability degree, graphs showing the number of credible and non-credible editors by reliability degree, an analysis of credibility over time, and pie charts showing credibility percentages. Overall, the document presents research on methods for assessing the credibility and reliability of information and its sources.
1. The document contains graphs and tables about Wikipedia data such as the number of editors over time, article quality metrics, and calculation times for important editor identification methods.
2. It analyzes the impact of reducing the number of editors on metrics like description amount, number of articles, and calculation time.
3. Methods that consider both description amount and number of articles showed higher correlation and lower increased rank than methods using a single metric.
[Slides 4-5: bar chart "Wikipedia (blog)?" comparing Wikipedia and blog usage rates (0-100%) across age groups from under 18 to 65-74, with the under-18 and 65-and-over groups highlighted on slide 5. Source: Oxford University SPIRE Project, "Results and analysis of Web 2.0 services survey", http://spire.conted.ox.ac.uk/]
I appreciate the opportunity to give this presentation. I am Yu Suzuki, from the Information Technology Center at Nagoya University. The title of today's presentation is "Credibility Assessment of Wikipedia Articles Using Edit History." The purpose of this presentation is to explain how to calculate credibility degrees for Wikipedia articles.
Today, I would like to talk about how to calculate credibility, one aspect of quality, for Wikipedia articles. First, I explain what credibility values are and why we should calculate them. Next, I describe our proposed system: first how to calculate credibility degrees, and then, because this method is time-consuming, how to speed up the credibility calculation. Finally, I talk about the experimental evaluation, conclusions, and future work.
First, I talk about the motivation of my study. In this part, I discuss what the quality of articles is, why article quality is important, and how quality information is useful for users.
This slide shows data on user age and the usage rate of each service. The questionnaire was conducted by the SPIRE Project at Oxford University. The red bars show Wikipedia and the blue bars show blogs. From this graph, users under 18 and over 65 use Wikipedia more frequently than other Web services. These users may not have enough background knowledge, so if there is a wrong statement in Wikipedia, they are likely to believe it. This is a problem.
This slide shows another graph, about the purpose of using Wikipedia. From this graph, more than 56 percent of users use Wikipedia for work or study. This shows that Wikipedia is trusted by many users; at least 56 percent of users trust it. However, do you think Wikipedia is reliable?
This graph shows the relationship between credibility degrees and the number of articles. The credibility here is calculated by our proposed system, which I will describe later. From this graph, if our system calculates accurate credibility values, about 80% of all articles are not credible. In other words, almost all users trust Wikipedia, whereas most articles are not credible. So I think credibility values are important to keep many users from believing wrong articles.
The objective of this study is to calculate credibility degrees automatically, quickly, and accurately. Such credibility degrees are useful for readers, editors, and administrators. Readers can see which articles are credible and which are not. Editors can decide which articles need to be edited. Administrators can decide which articles are not appropriate for Wikipedia, to keep the quality of articles high. This study is state-of-the-art work. The long-term goal is to calculate the quality of articles, but in this presentation I calculate credibility, one aspect of quality.
Next, I describe our proposed system. In this part, I explain how to calculate credibility values for Wikipedia articles.
This is the output of our proposed system. In our system, the original Wikipedia article is overlaid with three colors of underlines. Blue lines mark credible parts, red lines mark non-credible parts, and yellow lines mark unknown parts. The upper-left area shows the overall credibility degree, and the blue, red, and yellow bars show the ratio of credible, non-credible, and unknown parts.
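As a rough illustration of this overlay, here is a minimal sketch assuming per-part credibility scores, arbitrary thresholds of 0.6 and 0.4 for the blue/yellow/red classes, and simple HTML markup; the slides do not specify the thresholds, the markup, or any of the function names used below, so all of them are assumptions.

def color_for(credibility, hi=0.6, lo=0.4):
    if credibility >= hi:
        return "blue"    # credible part
    if credibility <= lo:
        return "red"     # not credible part
    return "yellow"      # unknown part

def render(parts):
    # parts: list of (text, credibility); returns simple HTML with colored underlines
    return "".join('<span style="border-bottom:2px solid %s">%s</span>' % (color_for(c), t)
                   for t, c in parts)

print(render([("Tokyo is the capital of Japan. ", 0.8),
              ("It has 50 million residents. ", 0.2)]))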
To calculate credibility values, I first have to define a credibility measurement method. To define it, I consider three questions: who evaluates articles, what quality we measure, and how we evaluate articles. Readers' decisions could be used, for example through voting or personalization. In our system, I choose editors' reputation, because I think this method is fair. Next, I measure editors' credibility rather than the credibility of articles or parts of articles, because I assume the same editor writes text of similar quality. Finally, I evaluate using the edit history, because this method is simple and effective.
Let me summarize the plan for measuring credibility. I use a reputation-based approach because users' votes are not always truthful; on YouTube, for example, almost all votes are the highest score. I use editors' credibility because we assume the same editor writes text of similar quality. I use the edit history because this method is simple and makes our proposed system language independent.
This is an overview of our proposed system. First, when I analyze an article, I identify its editors; in this example, I identify editors A and B from the edit history. Next, I collect the edit histories of those editors for other articles. Then I analyze these edit histories and calculate each editor's credibility value; here, the credibility value of A is 70% and that of B is 40%. Finally, by combining the two editors' credibility values, I calculate the article's credibility value; in this case, the article's credibility degree is 55%.
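The slides do not give the exact combination formula for this final step. As a minimal sketch, assume the article value is a contribution-weighted average of editor credibility values; the function and variable names below are hypothetical, not the paper's.

def article_credibility(editor_credibility, contribution_chars):
    # editor_credibility: editor -> credibility in [0, 1]
    # contribution_chars: editor -> number of characters contributed to this article
    total = sum(contribution_chars.values())
    if total == 0:
        return 0.0
    return sum(editor_credibility[e] * contribution_chars[e] / total
               for e in contribution_chars)

# With equal contributions this reduces to the plain average in the slide
# example: editors A (70%) and B (40%) give an article value of 55%.
print(article_credibility({"A": 0.7, "B": 0.4}, {"A": 1000, "B": 1000}))  # 0.55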
The key idea is the remain ratio. If a part of an article is credible, it is not deleted by other editors; if it is not credible, it is soon deleted or replaced. Consider the situation where editor A writes a part, editor B adds another part, and editor C deletes editor A's part and replaces it. In this case, editor B kept editor A's part, so B implicitly judges A's text to be credible. Editor C kept editor B's part, so C judges B's text to be credible. However, editor C deleted editor A's part, so C judges A's text to be not credible.
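As a sketch of how these implicit keep/delete judgments could be counted, assume each revision is a set of text units and the remain ratio is simply kept judgments over total judgments; the paper's actual text diffing and scoring may differ, and all names here are assumptions.

from collections import defaultdict

def remain_ratios(revisions):
    # revisions: list of (editor, set of text units present after that edit)
    kept = defaultdict(int)    # judgments that kept an editor's text
    judged = defaultdict(int)  # total judgments of an editor's text
    origin = {}                # text unit -> editor who first wrote it
    prev_units = set()
    for editor, units in revisions:
        # Each unit written by an earlier editor receives one judgment:
        # kept if it survives this revision, deleted otherwise.
        for unit in prev_units:
            author = origin[unit]
            if author != editor:
                judged[author] += 1
                if unit in units:
                    kept[author] += 1
        for unit in units - prev_units:
            origin.setdefault(unit, editor)
        prev_units = units
    return {a: kept[a] / judged[a] for a in judged}

# Editor A writes x, B adds y, C deletes x and writes z:
history = [("A", {"x"}), ("B", {"x", "y"}), ("C", {"y", "z"})]
print(remain_ratios(history))  # A: kept once, deleted once -> 0.5; B: kept once -> 1.0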
However, this method is time-consuming, because I have to analyze the remain ratio for every editor. The number of articles in Wikipedia is more than six hundred thousand, amounting to more than four hundred gigabytes; the number of active Wikipedians is more than seven hundred thousand; and the number of edits per person reaches up to one hundred and twenty thousand per month. These numbers show that the calculation cost of this method is very large, so reducing the calculation time is important.
To reduce the calculation cost, I use a method to identify key persons. This graph shows the assumed relationship between credibility degree and the number of editors: the numbers of clearly credible and clearly non-credible editors are small. Following Zipf's law, 20% of all editors contribute 80% of the articles. Therefore, if I can identify the 20% who are key persons, I can reduce the calculation time and also improve the accuracy of credibility, because editors who are not key persons mostly add noise.
I propose three methods to identify key persons. Method 1 is based on the number of words: editors who write many words are key persons. Method 2 is based on the number of articles: editors who write many articles are key persons. Method 3 is a combination of Methods 1 and 2: editors who write many words in a small number of articles are key persons. These methods come from ideas in the information retrieval field: Method 1 corresponds to term frequency, Method 2 to document frequency, and Method 3 to TF-IDF.
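A rough sketch of the three scores follows, under the assumption that Method 3 uses a TF-IDF-style product; the slides name the underlying IR concepts but not the exact formulas, so the formulas and names below are illustrative only.

import math

def key_person_scores(contributions, total_articles):
    # contributions: editor -> {article: words written}
    scores = {}
    for editor, per_article in contributions.items():
        words = sum(per_article.values())       # Method 1: like term frequency
        articles = len(per_article)             # Method 2: like document frequency
        combined = words * math.log(total_articles / articles)  # Method 3 (assumed form)
        scores[editor] = (words, articles, combined)
    return scores

contribs = {
    "A": {"art1": 5000, "art2": 3000},            # many words in few articles
    "B": {"art%d" % i: 10 for i in range(200)},   # few words spread over many articles
}
for editor, s in key_person_scores(contribs, total_articles=85028).items():
    print(editor, s)  # under the assumed Method 3, editor A scores highest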
Next, I show the experimental evaluation.
I used Japanese Wikipedia edit history data downloaded from the Wikipedia site. I used 85,028 articles, about 13.6% of all articles, written by 705,713 editors, excluding bots. As credible articles, I used the featured articles and good articles selected by Wikipedians. In this experiment I used the Japanese Wikipedia, but the method can be applied to any language edition. However, the English Wikipedia edit history dump is not available at the moment, so I could not use the English version.
In this evaluation experiment, I use two metrics: calculation time and accuracy. I do not use the precision and recall ratios commonly used in the information retrieval field, because those values are too small here to compare meaningfully. Next, I discuss which articles are judged credible and whether ignoring editors with small contributions is effective for improving accuracy.
This graph shows the decreasing ratio of editors versus calculation time. The two are directly proportional: if I set the decreasing ratio to 40%, the calculation time ratio is also about 40%. Therefore, by reducing the number of editors, I can reduce the calculation time.
This graph shows the average rank improvement of featured and good articles when articles are ordered by credibility value, for Methods 1, 2, 3 and the original method. From this graph, Method 3 improves the rank by about 5.8 positions, so accuracy improves. This is because many editors with small contributions are ignored in Method 3.
We can reduce the calculation cost to 40%. However, the average precision is about 0.02, which is too small; the method is still inaccurate. This is because I cannot cover all types of articles: several types of articles receive unexpectedly high credibility degrees, for example long articles and append-only articles such as episode lists of TV shows, anime programs, and so on.
In this study, I ultimately want to measure information quality in a way that matches human intuition. I think information quality should be calculated from many factors. As I already mentioned, who evaluates, what quality we measure, and how we evaluate are the three key factors to consider, and I will try several methods based on them. On the next slide, I show one example of future work.
One future work plan is visualizing author relationships. In this study, I estimate author relationships such as opposition, subordination, and cooperation. In this example, users A and B make opposing edits, A and C have a subordinate editing relationship, and B and D edit cooperatively. Using these relationships, I can categorize editors into groups and then calculate the quality of each group.
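As an illustration only, the grouping step could look like the following sketch, where the opposition/subordination/cooperation labels are supplied by hand rather than inferred from edit histories (the inference itself is the open part of this future work); the library and all names are assumptions.

import networkx as nx  # assumed to be available; any graph library would do

g = nx.Graph()
g.add_edge("A", "B", relation="opposition")
g.add_edge("A", "C", relation="subordination")
g.add_edge("B", "D", relation="cooperation")

# Group editors that are connected by cooperative edges only.
coop = nx.Graph((u, v) for u, v, d in g.edges(data=True) if d["relation"] == "cooperation")
coop.add_nodes_from(g.nodes)
groups = list(nx.connected_components(coop))
print(groups)  # e.g. [{'A'}, {'B', 'D'}, {'C'}]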
I am also considering several other directions, such as content analysis techniques: estimating terms that appear frequently in credible articles but do not appear in non-credible ones. Another direction is using articles in multiple languages; I think the English Wikipedia is the richest, so if a Japanese article is similar to its English counterpart, the article is likely credible or rich. I also want to apply my system to Web documents and social networking services, but there is no edit history for ordinary Web documents, so I need to find a way to calculate quality without an edit history.