Paper accepted on CSCW'17 by Chunyang Chen. Community edits to questions and answers (called post edits) plays an important role in improving content quality in Stack Overflow. Our study of post edits in Stack Overflow shows that a large number of edits are about formatting, grammar and spelling. These post edits usually involve small-scale sentence edits and our survey of trusted contributors suggests that most of them care much or very much about such small sentence edits. To assist users in making small sentence edits, we develop an edit-assistance tool for identifying minor textual issues in posts and recommending sentence edits for correction. We formulate the sentence editing task as a machine translation problem, in which an original sentence is “translated” into an edited sentence. Our tool implements a character-level Recurrent Neural Network (RNN) encoder-decoder model, trained with about 6.8 millions original-edited sentence pairs from Stack Overflow post edits. We evaluate our edit assistance tool using a large-scale archival post edits, a field study of assisting a novice post editor, and a survey of trusted contributors. Our evaluation demonstrates the feasibility of training a deep learning model with post edits by the community and then using the trained model to assist post editing for the community.
08448380779 Call Girls In Friends Colony Women Seeking Men
By the Community & For the Community: A Deep Learning Approach to Assist Collaborative Editing in Q&A Sites
1. Chunyang Chen, Zhenchang Xing, Yang Liu
By the Community & For the Community:
A Deep Learning Approach to Assist
Collaborative Editing in Q&A Sites
Chen, Chunyang, Zhenchang Xing, and Yang Liu. "By the community & for the community:
a deep learning approach to assist collaborative editing in q&a sites." Proceedings of the
ACM on Human-Computer Interaction 1, no. CSCW (2017): 32.
2. Collaborative edits
Background
• Collaborative editing is the editing of groups producing works together
through individual contributions. Effective choices in group awareness,
participation, and coordination are critical to successful collaborative writing
outcomes
• Widely used for crowd-sourcing websites or collaborative platforms:
• Wikipedia
• Google Doc, GitHub
3. Collaborative edits in Stack Overflow
Background
• Stack Overflow
• 17M questions, 26M answers, 9.6M users
• 7K new questions/day, many new users
• Edits:
• https://stackoverflow.com/help/privileges/edit
4. Community collaborative edits
Observation
• 21,759,565 edits before Dec 2017
• 1,857,568 (9%) question-title edits
• 2,622,955 (12%) question-tag edits
• 17,279,042 (79%) post-body edits
5. What has been edited
Observation
• Edit type
• Title, tag and post body
• Edit content
• https://stackoverflow.com/help/privileges/edit
• Formatting, spelling, grammar, readability
0
1
2
3
4
5
6
2008 2009 2010 2011 2012 2013 2014 2015 2016
CountsMillions
Year
Post
Body Edit
Title Edit
Tags Edit
6. What is the scale of changes that post edits involve
Observation
• Calculate the change scale
• character-level Longest Common Subsequence (LCS)
• Similarity
• 71.29% post body edits are with similarity score between 0.8 and 1
• 64.47% post body edits are with similarity score between 0.8 and 1
5.48% 6.73% 9.60% 18.72%
45.75%4.21% 6.16%
9.24%
16.02%
55.28%
0
2
4
6
8
10
12
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
CountsMillions
Similarity
title body
7. Assist or even automate collaborative edits
Goal
• Develop a tool for reminding users about the potential edits
• Minor revision: many posts edits are about spelling, formatting, grammar …
• Sentence-level: most minor revision happen in single sentences.
8. Collecting the dataset of <original-post, post-body-edit-type>
Data Collection
• Regular expression and text differencing
• 13,806,188 sentence pairs
9. Method
• Data collection
• 7,545,979 original-edited sentence pairs
• Edit model
• Char-level seq2seq
• Copy domain-specific word
Data collection Edit model
11. Character-Level RNN Encoder-Decoder Model
Method
• Copy domain-specific words
• URL, API calls, variable names
Original: pls check https://docs.python.org/2/library/itertools.html#itertools.groupby for itertools.groupby() …
Preprocess: pls check 𝑈𝑁𝐾_𝑈𝑅𝐿1 for 𝑈𝑁𝐾_𝐴𝑃𝐼1…
Edited: Please check 𝑈𝑁𝐾_𝑈𝑅𝐿1 for 𝑈𝑁𝐾_𝐴𝑃𝐼1…
Post-process: Please check https://docs.python.org/2/library/itertools.html#itertools.groupby for itertools.groupby() …
12. Performance comparison between our model and baselines
Results
• Evaluation metrics
• GLEU:
• Baseline
• LanguageTools, Statistical Machine Translation (SMT)
13. Further evaluation
Results
• Influence by length
• With the length increase, the baselines are comparable
to our model
• Assist real edit
• Within randomly selected 50 latest posts, our model
recommend edits to 39 of them.
• 36 edits (92.3%) are accepted, while 3 (7.7) rejected
14. Feedback from Real Users
• Survey
• Q1: How much do yo care about spelling, grammar, formatting edits?
• Q2: What percentage of your edits are spelling, grammar, formatting edits?
• Q3: How much could our tool help with such edits?
• Feedback
• Send email to 410 users who ranked top 2000 in Stack Overflow
• 61 valid replies
• Other suggestions:
• “SO needs this tool and I hope to see it in action soon. I believe the resulting tool might be useful outside the context of SO
websites.”
• “How will it get integrated with the SO site?”
• “Not interested. Same reason I abhor spell and grammar checkers. Generally way too many false positives”
0
10
20
30
40
Q1 Q2 Q3
Usernumber
Question ID
1 2 3 4 5