The document summarizes a research paper presented at WikiSym 2012 on calculating the quality of Wikipedia articles. The researchers propose a method to:
1) Identify the editors of each article.
2) Analyze the edit history of each editor to calculate their quality value (QV).
3) Use the editors' QVs to calculate the QV of text within the articles.
4) Iteratively calculate editors' and texts' QVs until they converge to obtain the final article QV.
The method improves precision in identifying high-quality articles compared to ignoring editors' QVs, and it addresses the "chicken-and-egg" problem of text QVs and editor QVs depending on each other.
A bird’s-eye view of academic library ebooks, outlining how different considerations can affect the decisions that libraries make regarding this format. Presented by Sofia Slutskaya and Tessa Minchew at the 24th annual COMO conference (GaCOMO12) in Macon, GA, 2012.
Measuring Impact: Towards a data citation metric (Edward Baker)
How the ViBRANT and eMonocot projects are building tools, including a modified implementation of Bourne and Fink's 'Scholar Factor', the Biodiversity Data Journal, and Scratchpad's user metrics and statistics modules.
Today’s digital information is being linked, cross-linked, and indexed in a multitude of ways across the global Web to give articles greater visibility and reach. The recent proliferation of APIs (application programming interfaces) and social networks has further facilitated and expanded the possibilities for articles. This presentation will focus on the main products and services that extend the reach and impact of articles on the Web.
Presentation by panelist Matthew Cockerill, BioMed Central, for the OASPA-hosted webinar "A Q&A with five publishers working with Open Access" on 20 October 2009. www.oaspa
Sense About Science held a workshop on peer review in collaboration with the Research Information Network, Vitae, Elsevier and the Voice of Young Science.
This afternoon event was held at the University of Sussex, Brighton on 5 March 2010 and was free and for early career researchers in all sciences, engineering and medicine (PhD students, post-docs or equivalent in first job).
The workshop discussed the process of peer review in journal publishing and explored the criticisms of the peer review process. What does peer review do for science? Does it detect fraud and misconduct? Will it illuminate good ideas or shut them down?
The RIN’s Liaison and Partnerships Officer, Branwen Hide, spoke at the event on ‘The changing scholarly communications landscape: What does this mean for peer review?’
For more information on the programme, visit http://www.rin.ac.uk/news/events/research-publishing-it-reviewing-it-and-talking-about-it-publicly
A look at how the GeoWeb Community is embracing the world of Web 2.0 and social media. Ideas on how to grow and develop the community in the future. Presented at the 2009 ESRI Developer Summit in Palm Springs, CA
EDN Team
Publishing Scientific Research & How to Write High-Impact Research Papers (jjuhlrich)
Presentation on publishing scientific research and how to write high-impact scientific papers, by Dr. John Uhlrich, Editor-in-Chief of Energy Technology (Wiley-VCH), given on 25 October 2018 at the National Energy Technology Laboratory, USA.
Event:
Digital Curation Institute Symposium
November 22, 2011
4:30-6:30pm
iSchool, University of Toronto
Abstract:
This presentation reports select findings from two descriptive studies of blogs and bloggers in the areas of history, economics, law, biology, chemistry and physics. The first study focused on scholar bloggers' preferences for digital preservation, as well as their publishing behaviors and blog characteristics that influence preservation action. Findings are drawn from 153 questionnaires, 24 interviews, and content analysis of 93 blogs. Briefly, questionnaire respondents are generally interested in blog preservation with a strong sense of personal responsibility. Most feel their blogs should be preserved for both personal and public access and use into the indefinite, rather than short-term, future. Over half of questionnaire respondents report saving their blog content, in whole or in part, and many interviewees expressed a sophisticated understanding of issues of digital preservation. However, the findings also indicate that bloggers exhibit behaviors and preferences complicating preservation action, including issues related to rights and use, co-producer dependencies, and content integrity.
The second study, currently ongoing, looks toward the public availability of scholar blogs over time, with findings drawn from a sample of 644 blogs. Content analysis is currently underway on inactive blogs, characterized as available, but with no new posts published within three months of coding. Initial analysis of the most recent post published to these inactive blogs shows that some bloggers did provide indicators of their respective blog's declining activity or, in some cases, blog stoppage. However, such indicators are only present in a clear minority of publicly available, yet inactive blogs. These preliminary findings offer implications for both personal and programmatic preservation approaches, including, notably, issues related to selection and appraisal.
This presentation shows how to find where a publication is indexed and how to manage your research profile across international research platforms.
A lecture on evaluating AR interfaces, from the graduate course on Augmented Reality, taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury.
Publishing Scientific Research & How to Write High-Impact Research Papers (jjuhlrich)
Presentation on publishing scientific research and how to write high-impact scientific papers, by Dr. John Uhlrich, Editor-in-Chief of Energy Technology (Wiley-VCH), given on 25 October 2018 at the University of Pittsburgh, USA.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe (Paige Cruz)
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
A look at the key trends across hardware, cloud and open source: how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
Welcome to ViralQR, your best QR code generator (ViralQR)
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, enhancing customer interaction and making business more fluid. We firmly believe in the ability of QR codes to change how businesses interact with their customers, and we are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, ViralQR has successfully served many clients, providing QR codes for marketing, service delivery, and feedback collection across various industries. Our platform has been recognized for its ease of use and impressive features, which help businesses create QR codes.
Our Services
At ViralQR, we offer a comprehensive suite of services that caters to your needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR offers a 14-day free trial, an excellent opportunity for new users to get a feel for the platform. From there, you can easily subscribe and experience the full power of dynamic QR codes. The subscription plans are priced flexibly so that virtually every business can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and similar industries. QR codes can be placed on fliers, packaging, merchandise, and banners, and can even substitute for cash and cards in a restaurant or coffee shop. By integrating QR codes into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools that give a clear view of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
Thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet the needs of diverse businesses!
1. WikiSym 2012
Mutual Evaluation of Editors and Texts for Assessing Quality of Wikipedia Articles
Yu Suzuki, Nagoya University, Japan
Masatoshi Yoshikawa, Kyoto University, Japan
2. Have you ever used Wikipedia?
[Bar chart: percentage usage of Wikipedia (red) and blogs (blue) by age group, from under 18 to 65-74 years old. Source: Oxford University SPIRE Project, results and analysis of a Web 2.0 services survey, http://spire.conted.ox.ac.uk/]
3. Have you ever used Wikipedia?
Users under 18 and over 65 years old (novice users) use Wikipedia frequently.
[Same bar chart as slide 2.]
4. What is the main purpose?
56% of users use Wikipedia for work and study. But really?
5. What is the main purpose?
[Pie chart: For Study 36%, For Fun 28%, For Work 20%, Never used 8%, Never heard 8%.]
56% of users use Wikipedia for work and study. Wikipedia is trusted by many users. But really?
6. Are Wikipedia articles high quality?
80% of all articles are low quality.
[Histogram: number of articles vs. quality degree from low to high, calculated using our proposed method.]
7. Objectives
• Calculate quality values for articles automatically and accurately.
• For readers: readers can tell which articles are high quality.
• For editors: editors can decide which articles need to be edited.
• For administrators: administrators can decide which articles are not appropriate for Wikipedia, to maintain the quality of articles.
8. Output of our proposed system
[Screenshot: an article with quality value 40%; high quality parts and low quality parts are highlighted.]
9. What is quality?
From a dictionary:
【Quality】the degree of excellence of something
【Credibility】the quality of being trusted and believed in
From psychology (Fogg 2003):
Trustworthiness: how many users believe something
Expertise: an expert’s opinion
We use “trustworthiness” as the definition of quality. Quality is not true or false, but how many users believe.
10. Related Work
Link-analysis-based methods [Bellomi 2005, Chin 2011]: identify high quality articles using HITS or PageRank. These methods can easily identify major articles, but cannot identify minor but high quality articles.
Editor-reputation-based methods [Adler 2007, Wilkinson 2007] (the approach we use): identify which articles are high quality using the reputation editors earn from other editors.
Good point: these methods can calculate accurate quality, because editors and viewers do not directly decide text quality.
Bad point: vandals (bad editors) can easily change text quality.
11. Plan for Calculating Quality (slides 11-14, a progressive build)
Who evaluates?
・reader (voting)
・reader themselves (personalization)
・editor (reputation-based) ← our choice
What quality do we measure?
・whole article
・a part of an article
・editor ← our choice
How do we evaluate?
・readers’ voting
・article analysis
・article edit history ← our choice
15. Plan to Measure Quality
• Why do we use a reputation-based approach? Users’ votes are not always truthful: on YouTube, almost all votes are 5 stars (the highest score).
• Why do we calculate editors’ quality? We assume that the same editor writes articles of the same quality.
• Why do we use the edit history? Our proposed system should be language independent.
16. Overview
1. Identify the editors of articles.
2. Get the edit history of each editor.
3. Calculate each text’s quality value (QV).
4. Calculate each editor’s QV.
5. Calculate the article’s QV.
[Diagram: editors A and B identified from the edit history; QV of A = 70%, QV of B = 40%; combined article quality degree 55%.]
17. Key Idea
High quality texts survive beyond multiple edits.
・If a text remains, the QV of the text goes up.
・If a text is deleted, the QV of the text goes down.
[Diagram: Editor A adds text, Editor B adds text, Editor C deletes text.]
18. Calculating a text’s quality value
• A writes 100 letters. A’s text gains no QV from this, because A cannot evaluate herself.
• B deletes 20 letters and keeps A’s remaining 80 letters: B thereby evaluates A’s 80 letters as good.
• C deletes 60 more letters and keeps A’s remaining 20 letters: C thereby evaluates A’s 20 letters as good.
• A’s text QV = log 80 + log 20.
[Chart: number of A’s surviving letters over versions 1-4: 100 → 80 → 20.]
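To make the slide's arithmetic concrete, here is a minimal Python sketch of this survival-based credit. The slide only says "log 80 + log 20", so the log base and any normalization are assumptions, not the authors' implementation.

```python
import math

def text_qv(surviving_letters):
    # Credit a text earns from later editors: each later version in which
    # the text survives contributes log(number of surviving letters).
    # (Assumption: natural log; the slide does not state the base.)
    return sum(math.log(n) for n in surviving_letters if n > 0)

# Slide 18's example: B keeps 80 of A's letters, then C keeps 20.
print(text_qv([80, 20]))  # log 80 + log 20, about 7.38
```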
19. Problem
• The editor’s quality is not considered. When C deletes A’s text, A’s QV decreases. But:
• If C has low quality, C may delete high quality texts, so A’s QV should NOT be decreased.
• If C has high quality, C should delete only low quality texts, so A’s QV should be decreased.
[Diagram: A adds text, B adds text, C deletes text.]
20. Use the editor’s QV for the text’s QV
• If B’s QV is 100%, B should delete only low quality texts: when B deletes 20 of A’s letters, all of them count as deleted.
• If C’s QV is 50%, C may delete high quality texts: when C deletes 60 of A’s letters, only 30 letters (60 letters × 50%) count as deleted.
[Chart: A’s surviving letters per version, without the editor’s QV (100 → 80 → 20) and with the editor’s QV (100 → 80 → 50).]
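A minimal sketch of this deletion discounting, with the numbers taken from the slide's example; the helper name `effective_deleted` is ours, not the paper's.

```python
def effective_deleted(deleted_letters, deleter_qv):
    # Discount a deletion by the deleting editor's QV (slide 20's idea):
    # a 50%-quality editor's deletions only half count against the text.
    return deleted_letters * deleter_qv

remaining = 100                          # A writes 100 letters
remaining -= effective_deleted(20, 1.0)  # B (QV 100%): all 20 count
remaining -= effective_deleted(60, 0.5)  # C (QV 50%): only 30 count
print(remaining)                         # 50, the "with editor's QV" curve
```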
21. Chicken-or-the-egg problem
• A text’s QV is calculated from both the edit history and the editors’ QVs.
• An editor’s QV is calculated from text QVs.
• Editor QV ⇆ text QV is a chicken-or-the-egg problem.
→ Mutually calculate editor and text QVs until they converge.
[Diagram: the running example, with QV of A = 70%, QV of B = 40%, article QV = 60%.]
22. Our proposed method
1. Identify the editors of articles.
2. Get the edit history of each editor.
3. Calculate text QVs using editor QVs. (On the first pass, every editor’s QV is set to 1, the highest value.)
4. Calculate editor QVs.
5. If the text QVs and editor QVs have not converged, return to step 3.
6. Calculate article QVs.
[Diagram: the running example, with QV of A = 70%, QV of B = 40%, article QV = 60%.]
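The fixed-point loop in steps 3-5 can be sketched as below. The slide does not give the exact update formulas, so the rules here are illustrative assumptions: a text's credit is its kept letters, its penalty is deletions discounted by the deleter's QV, and an editor's QV is the kept fraction aggregated over everything they wrote.

```python
# Each judgment event: (author, judging editor, letters kept, letters deleted).
events = [
    ("A", "B", 80, 20),   # B keeps 80 of A's letters and deletes 20
    ("A", "C", 20, 60),   # C keeps 20 of A's letters and deletes 60
]
editors = {"A", "B", "C"}

qv = {e: 1.0 for e in editors}          # step 3: every editor starts at 1
for _ in range(20):                     # slide 24: QVs converge in ~20 rounds
    kept = {e: 0.0 for e in editors}
    lost = {e: 0.0 for e in editors}
    for author, judge, k, d in events:
        kept[author] += k               # survivals count in full
        lost[author] += d * qv[judge]   # deletions discounted by deleter's QV
    new_qv = {e: kept[e] / (kept[e] + lost[e]) if kept[e] + lost[e] > 0
              else qv[e] for e in editors}
    converged = max(abs(new_qv[e] - qv[e]) for e in editors) < 1e-9
    qv = new_qv
    if converged:                       # step 5: stop once the values settle
        break

print(qv)  # A's QV reflects how much of A's writing survived later edits
```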
23. Experimental Setup
• Data set
• Japanese Wikipedia edit history data (as of Nov. 2, 2010)
• 1,889,129 articles, 2,178,003 editors (including bots and anonymous IP users)
• High quality articles (correct dataset)
• “Featured articles” and “Good articles” selected by Wikipedians
• Evaluation measure
• 11-point interpolated recall-precision graph
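The 11-point interpolated recall-precision graph is a standard information-retrieval measure: at each recall level r in {0.0, 0.1, ..., 1.0}, plot the maximum precision achieved at any recall of at least r. A textbook Python sketch, not the authors' evaluation code:

```python
def eleven_point_precision(ranked_is_relevant, n_relevant):
    # ranked_is_relevant: booleans down the system's ranking, True where
    # the article is in the correct set (featured/good articles).
    points, hits = [], 0
    for rank, rel in enumerate(ranked_is_relevant, start=1):
        if rel:
            hits += 1
            points.append((hits / n_relevant, hits / rank))  # (recall, prec.)
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in (i / 10 for i in range(11))]

# Toy ranking of 10 articles, 5 of which are in the correct set.
print(eleven_point_precision(
    [True, False, True, True, False, False, True, False, False, True], 5))
```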
24. Experimental result
[Graph: 11-point interpolated recall-precision curves, with and without the editor’s QV.]
• Precision improves by about 10%.
• At recall 0 to 0.5, precision improves by about 20%, whereas precision does not improve at recall 0.6 to 1.
• When an article about current events is high quality, our system can judge it as high quality, but such articles are not featured articles.
• When one editor writes excellent texts and the other editors do not edit, the article can be “featured” yet not be judged as high quality.
• Text and editor QVs converge when we calculate the QVs about 20 times each.
25. Conclusion
• We calculate texts’ quality values using editors’ QVs.
• The relation between texts’ quality values and editors’ quality values is a chicken-or-the-egg problem.
• We mutually calculate text quality values and editor quality values until they converge.
• Average precision improves by about 10%; at low recall, precision improves by about 20%.
• Future work
• Confidence of quality values: when A edits 100 articles many times, B edits only ONE article once, and A and B have the same QV, the system judges their quality as the same. But it should differ, because the confidence differs.
• Other effective assumptions: when a high quality editor confirms a text, the text should be judged high quality even if it was written by a low quality editor.
26. Open problems
• Using content analysis
• Estimate terms that appear frequently in high quality articles but do not appear in low quality articles.
• Using articles in multiple languages
• If an article in Japanese is similar to its English counterpart, is the article high quality?
• For Web documents, SNS, ...
• How do we calculate quality degrees without an edit history?
I am Yu Suzuki, from the Information Technology Center at Nagoya University. The title of today's presentation is quality assessment of Wikipedia articles using edit history. The purpose of this presentation is to show how to calculate quality values for Wikipedia articles.
This chart shows users' ages against the percentage usage of different services. The survey was conducted by the SPIRE Project at Oxford University. The red bars show Wikipedia, and the blue bars show blogs. From this graph, users under 18 and over 65 years old use Wikipedia more frequently than other Web services. These users may not have enough knowledge, so if there is a wrong story in Wikipedia, they will believe it. This is a problem.
Here is another graph, about the purpose of using Wikipedia. From this graph, more than 56 percent of users use Wikipedia for work and study. This shows that Wikipedia is trusted by many users; at least 56 percent of users trust it. However, do you think Wikipedia is reliable?
This graph shows the relationship between quality degrees and the number of articles. The quality is calculated by our proposed system, which I will describe later. From this graph, when our system calculates quality values, about 80% of all articles are not credible. This means that almost all users trust Wikipedia, whereas almost all articles are not credible. So I think quality values are important to keep users from believing wrong articles.
The objective of this study is to calculate quality degrees automatically, quickly, and accurately. These quality degrees are useful for readers, editors, and administrators. Readers can tell which articles are credible. Editors can decide which articles need to be edited. And administrators can decide which articles are not appropriate for Wikipedia, to maintain the quality of articles.
This is the output of our proposed system. In our system, the original Wikipedia article is overlaid with two kinds of colored lines: blue lines show credible parts, and red lines show non-credible parts. The upper-left corner shows the overall quality degree, and the blue and red bars show the ratio of credible and non-credible parts.
First, we should define what quality is. This is a very difficult question. From a dictionary, quality is defined as the degree of excellence of something, and credibility as the quality of being trusted and believed in. But these definitions are ambiguous, so I took a definition from psychology. Fogg defined quality with two components: trustworthiness and expertise. Trustworthiness is how many users believe something, and expertise is an expert's opinion. In our study, we use trustworthiness as the definition of quality. Therefore, quality is not true or false, but how many users believe.
Next, I introduce several related works. There are two approaches: link-analysis-based methods and editor-reputation-based methods. Link-analysis methods identify high quality articles using techniques such as HITS or PageRank. They can easily identify major articles, but cannot identify minor but high quality articles. The other approach uses editor reputation: editors are rated by the other editors. Our method is based on this approach. The good point of these methods is that they can calculate accurate quality, because editors and viewers of articles do not directly decide text quality; the decision is implicit. The bad point is that vandals, that is, bad editors, can easily change text quality.
To calculate quality values, I should define a quality measurement method. To do so, I should consider three questions: who evaluates articles, what quality we measure, and how we evaluate articles. A reader's decision can be used, such as voting or personalization, but in our system I chose the editors' reputation, because I think this method is fair. Next, I measure the editors' quality instead of measuring whole articles or parts of articles, because I think the same user writes articles of the same quality. And I evaluate using the edit history, because this method is simple and effective.
Let me revisit the plan to measure quality. I used a reputation-based approach because users' votes are not always truthful; on YouTube, almost all votes are the highest score. I used the editors' quality because we assume that the same editor writes articles of the same quality. I used the edit history because this method is simple and our proposed system should be language independent; if I used linguistic analysis, the system would be language dependent.
This is an overview of our proposed system. First, I analyze an article and identify its editors. In this example, I identified editors A and B from the edit history. Next, I get the editors' edit history for the other articles. Then I analyze this edit history and calculate the texts' and editors' quality values. Here, the quality value of A is 70% and that of B is 40%. Finally, combining these two editors' quality values, I calculate the article's quality value. In this case, the article's quality degree is 55%.
The key idea is the survival ratio of texts. If a part of an article is high quality, the part is not deleted by the other editors. If a part is low quality, the part is soon deleted or replaced. Consider the situation where editor A writes a part, editor B adds a part, and editor C deletes editor A's part and replaces it. In this case, editor B keeps editor A's part, so editor B decides that A's part is high quality. Editor C keeps editor B's part, so editor C decides that B's part is high quality. However, editor C deletes editor A's part, so editor C decides that A's part is low quality.
I explain how to calculate the quality value of texts. First, A writes 100 letters to an article, then B deletes 20 letters from A's text, and then C deletes 60 letters from A's text. In this case, at version 1, A gains no quality value because A cannot evaluate herself. At version 2, B keeps 80 of A's letters, so B evaluates those 80 letters as good, and A gains 80 positive evaluations from B. At version 3, C keeps 20 of A's letters, so C evaluates those 20 letters as good, and A gains 20 positive evaluations from C. As a result, from this edit history, editor A gains log 80 plus log 20 quality value from editors B and C.
However, the problem is that this approach does not consider the editors' quality. In this case, C deletes A's text, and our system decreases A's quality value. However, if C has low quality, C may delete high quality texts; in that case A's quality value should not be decreased. But if C has high quality, C should delete only low quality texts; then A's quality value should be decreased. Therefore, the editors' quality is important for calculating text quality values.
I explain how to calculate the quality value of texts using the editors' quality values. If B's quality value is 100%, meaning B is a high quality editor, then B should delete only low quality texts. Therefore, when B deletes 20 letters, all 20 of A's letters count as deleted. However, when C deletes 60 letters and C's quality value is 50%, C may be deleting high quality texts half the time. Therefore, only 30 of A's letters, half of the actually deleted letters, count as deleted by C.
However, there is another problem. A text's quality value is calculated from both the edit history and the editors' quality values, while an editor's quality value is calculated from text quality values. Therefore, calculating the editors' and the texts' quality values is a kind of chicken-or-the-egg problem. To solve this, we mutually calculate the editors' and texts' quality values until the values converge.
Using these observations, we refine our proposed method. First, we identify the editors of articles. Then we get the edit history of each editor. Then we calculate the texts' quality values using the editors' quality values; the first time we calculate text quality values, we set every editor's quality value to 1, the highest value. Then we calculate the editors' quality values. Next, if the text quality values and editor quality values have not converged, we return to step 3. Finally, we calculate the articles' quality values.
I used Japanese Wikipedia edit history data from the Wikipedia site. I used 85,028 articles, about 13.6% of all articles. These articles were written by 705,713 editors, excluding bots. As credible articles I used the featured articles and good articles selected by Wikipedians. In this experiment I used the Japanese Wikipedia, but any language edition could be used. However, the English Wikipedia edit history was not available at the time, so I could not use the English version.
This is the experimental result. From this recall-precision graph, we can confirm that precision improves by about 10%. From recall 0 to 0.5, precision improves by about 20%, whereas precision does not improve at recall 0.6 to 1. When an article about current events is high quality, our system can judge it as high quality, but such articles are not among the featured articles. When one editor writes excellent texts and the other editors do not edit, the article may be featured but is not judged as high quality by our method. Moreover, the quality values of texts and editors converge when we calculate the quality values 20 times each.
Finally, I conclude our study. In this study, we calculate texts' quality values using the editors' quality values. The relation between text and editor quality values is a kind of chicken-or-the-egg problem; to solve it, we mutually calculate text and editor quality values until they converge. As a result, we improve average precision by about 10%; at low recall, precision improves by about 20%. Next, I introduce our future work. The first topic is the confidence of quality values. If A edits 100 articles many times, B edits only one article once, and A and B have the same quality value, the system judges their quality as the same. But they should differ, because the confidence differs. Another topic is other effective assumptions: when a high quality editor confirms a text, the text should be judged high quality even if it was written by a low quality editor.
I am considering several open problems, such as content analysis techniques: estimating terms that appear frequently in credible articles but not in non-credible articles. Next, using articles in multiple languages: I think the English Wikipedia is the richest, so if an article in Japanese is similar to its English counterpart, the article may be credible or rich. I also want to apply my system to Web documents and SNS, but there is no edit history for Web documents, so I must discover how to calculate quality without an edit history.