Chunyang Chen, Zhenchang Xing, Yang Liu
By the Community & For the Community:
A Deep Learning Approach to Assist
Collaborative Editing in Q&A Sites
Chen, Chunyang, Zhenchang Xing, and Yang Liu. "By the community & for the community:
a deep learning approach to assist collaborative editing in q&a sites." Proceedings of the
ACM on Human-Computer Interaction 1, no. CSCW (2017): 32.
Collaborative edits
Background
• Collaborative editing is the practice of groups producing a work together
through individual contributions. Effective choices in group awareness,
participation, and coordination are critical to successful collaborative writing
outcomes
• Widely used on crowdsourcing websites and collaborative platforms:
• Wikipedia
• Google Docs, GitHub
Collaborative edits in Stack Overflow
Background
• Stack Overflow
• 17M questions, 26M answers, 9.6M users
• 7K new questions/day, many new users
• Edits:
• https://stackoverflow.com/help/privileges/edit
Community collaborative edits
Observation
• 21,759,565 edits before Dec 2017
• 1,857,568 (9%) question-title edits
• 2,622,955 (12%) question-tag edits
• 17,279,042 (79%) post-body edits
What has been edited
Observation
• Edit type
• Title, tag and post body
• Edit content
• https://stackoverflow.com/help/privileges/edit
• Formatting, spelling, grammar, readability
[Figure: number of edits per year (counts in millions), 2008 to 2016, for post-body, title, and tag edits]
What is the scale of changes that post edits involve
Observation
• Calculate the change scale
• character-level Longest Common Subsequence (LCS)
• Similarity
• 71.29% of post-body edits have a similarity score between 0.8 and 1
• 64.47% of title edits have a similarity score between 0.8 and 1
[Figure: distribution of edits by similarity score (x-axis: similarity, 0 to 0.9; y-axis: counts in millions) for title and body edits. Bar labels: 5.48%, 6.73%, 9.60%, 18.72%, 45.75% (title); 4.21%, 6.16%, 9.24%, 16.02%, 55.28% (body)]
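A minimal sketch of how a character-level LCS similarity could be computed. The slide does not say how the score is normalized, so the formula 2 * LCS / (len(original) + len(edited)) is an assumption, and the function names are illustrative only.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings (character level)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of `a`
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(original: str, edited: str) -> float:
    """Similarity in [0, 1]; the normalization is an assumption, not taken from the slides."""
    total = len(original) + len(edited)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 2 * lcs_length(original, edited) / total


if __name__ == "__main__":
    print(similarity("pls check the docs", "Please check the docs"))
```

Under this normalization, a score near 1 means the edit changed only a few characters, which is the sense in which most edits above are minor.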
Assist or even automate collaborative edits
Goal
• Develop a tool that reminds users about potential edits
• Minor revisions: many post edits are about spelling, formatting, grammar …
• Sentence-level: most minor revisions happen within single sentences.
Collecting the dataset of <original-post, post-body-edit-type>
Data Collection
• Regular expression and text differencing
• 13,806,188 sentence pairs
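One possible way to combine a regular expression with text differencing to pair original and edited sentences, sketched with Python's standard `re` and `difflib` modules. The sentence-splitting pattern and the rule of keeping only one-to-one replacements are assumptions for illustration, not the authors' pipeline.

```python
import difflib
import re

# Assumed, simplistic sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def changed_sentence_pairs(original: str, edited: str) -> list[tuple[str, str]]:
    """Return (original, edited) sentence pairs that differ between two post versions."""
    a = split_sentences(original)
    b = split_sentences(edited)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only replacements that map the same number of sentences on
        # both sides, so each original sentence has exactly one counterpart.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(a[i1:i2], b[j1:j2]))
    return pairs


if __name__ == "__main__":
    before = "pls check the docs. It works fine."
    after = "Please check the docs. It works fine."
    print(changed_sentence_pairs(before, after))
```

Insertions, deletions, and uneven replacements are dropped here because they have no clear one-to-one sentence alignment.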
Method
• Data collection
• 7,545,979 original-edited sentence pairs
• Edit model
• Char-level seq2seq
• Copy domain-specific words
[Figure: pipeline with two stages, data collection and edit model]
Character-Level RNN Encoder-Decoder Model
Method
• Basic RNN model
• Char-based seq2seq
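To make "char-based" concrete, here is a small sketch of how a sentence might be turned into a sequence of character indices before it is fed to an encoder-decoder. The special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the helper names are assumptions; the slides do not describe the vocabulary or the network in this detail.

```python
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(sentences):
    """Map every character seen in the corpus (plus special tokens) to an integer id."""
    chars = sorted({ch for s in sentences for ch in s})
    itos = [PAD, SOS, EOS, UNK] + chars
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


def encode(sentence, stoi):
    """Character ids wrapped in start/end markers; unseen characters map to <unk>."""
    ids = [stoi.get(ch, stoi[UNK]) for ch in sentence]
    return [stoi[SOS]] + ids + [stoi[EOS]]


def decode(ids, itos):
    """Inverse of encode, dropping padding and start/end markers."""
    specials = {PAD, SOS, EOS}
    return "".join(itos[i] for i in ids if itos[i] not in specials)


if __name__ == "__main__":
    stoi, itos = build_vocab(["pls check", "Please check"])
    ids = encode("pls check", stoi)
    print(ids, decode(ids, itos))
```

Working at the character level keeps the vocabulary small and lets a model learn spelling and capitalization fixes that a word-level vocabulary would miss.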
Character-Level RNN Encoder-Decoder Model
Method
• Copy domain-specific words
• URL, API calls, variable names
Original: pls check https://docs.python.org/2/library/itertools.html#itertools.groupby for itertools.groupby() …
Preprocess: pls check UNK_URL1 for UNK_API1 …
Edited: Please check UNK_URL1 for UNK_API1 …
Post-process: Please check https://docs.python.org/2/library/itertools.html#itertools.groupby for itertools.groupby() …
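The preprocess and post-process steps in the example above can be sketched as follows. The regular expressions for URLs and API calls are simplified assumptions (variable names are not handled), and `mask` / `unmask` are hypothetical helper names rather than the authors' code.

```python
import re

# Simplified, assumed patterns for domain-specific tokens.
URL_RE = re.compile(r"https?://\S+")
API_RE = re.compile(r"\b[A-Za-z_][\w.]*\(\)")


def mask(sentence):
    """Replace URLs and API calls with numbered placeholders; return text and mapping."""
    mapping = {}

    def make_replacer(kind):
        counter = 0

        def repl(match):
            nonlocal counter
            counter += 1
            token = f"UNK_{kind}{counter}"
            mapping[token] = match.group(0)
            return token

        return repl

    masked = URL_RE.sub(make_replacer("URL"), sentence)
    masked = API_RE.sub(make_replacer("API"), masked)
    return masked, mapping


def unmask(sentence, mapping):
    """Copy the original domain-specific words back into the edited sentence."""
    # Longer tokens first, so UNK_URL10 is not partially replaced via UNK_URL1.
    for token in sorted(mapping, key=len, reverse=True):
        sentence = sentence.replace(token, mapping[token])
    return sentence


if __name__ == "__main__":
    original = ("pls check https://docs.python.org/2/library/itertools.html"
                "#itertools.groupby for itertools.groupby()")
    masked, mapping = mask(original)
    print(masked)
    print(unmask("Please check UNK_URL1 for UNK_API1", mapping))
```

Because the model only sees the placeholders, it cannot corrupt a URL or identifier while rewriting the surrounding prose.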
Performance comparison between our model and baselines
Results
• Evaluation metrics
• GLEU:
• Baseline
• LanguageTool, Statistical Machine Translation (SMT)
Further evaluation
Results
• Influence of sentence length
• As sentence length increases, the baselines become comparable
to our model
• Assisting real edits
• Of 50 randomly selected recent posts, our model
recommended edits to 39.
• 36 edits (92.3%) were accepted, while 3 (7.7%) were rejected
Feedback from Real Users
• Survey
• Q1: How much do you care about spelling, grammar, formatting edits?
• Q2: What percentage of your edits are spelling, grammar, formatting edits?
• Q3: How much could our tool help with such edits?
• Feedback
• Emailed 410 users ranked in the top 2,000 on Stack Overflow
• 61 valid replies
• Other suggestions:
• “SO needs this tool and I hope to see it in action soon. I believe the resulting tool might be useful outside the context of SO
websites.”
• “How will it get integrated with the SO site?”
• “Not interested. Same reason I abhor spell and grammar checkers. Generally way too many false positives”
[Figure: number of users giving each rating (1 to 5) for survey questions Q1 to Q3]
