Deficient Documentation Detection: A Methodology to Locate Deficient Project Documentation using Topic Analysis

Chenlei Zhang, TA/RA at University of Alberta
DEFICIENT DOCUMENTATION DETECTION
A Methodology to Locate Deficient Project Documentation using Topic Analysis
Joshua Charles Campbell, Department of Computing Science
Chenlei Zhang, Department of Computing Science
Zhen Xu, Department of Electrical and Computer Engineering
Abram Hindle, Department of Computing Science
James Miller, Department of Electrical and Computer Engineering
The 10th Working Conference on Mining Software Repositories
MOTIVATION
[Figure: developers draw on two sources, official project documentation and a crowd-sourced Q&A website]
MSR 2013
RESEARCH QUESTION
• Answer the question “Can we identify deficient
areas of project documentation by relating it to
Stack Overflow questions?”
• Provide a method to relate crowd-sourced
questions and project documentation.
METHODOLOGY
[Figure: Stack Overflow questions and project documentation both pass through data extraction]
Two-phase Processing
• Data extraction
• Topic analysis
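The topic analysis phase uses LDA. The slides do not say how the model was fit, so the following is only a minimal sketch of one standard inference method for LDA, collapsed Gibbs sampling, in plain Python. It assumes that data extraction has already produced token lists, and that Stack Overflow questions and documentation pages are modeled together so both corpora share one set of topics. The function name `lda_gibbs`, the hyperparameters, and the toy corpora are hypothetical.

```python
import random
from typing import List


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> List[List[float]]:
    """Hypothetical LDA fit by collapsed Gibbs sampling.

    Returns a document/topic matrix: one row per document, one column per
    topic, each row a probability distribution over topics.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    num_words = len(vocab)
    topics = range(num_topics)

    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * num_words for _ in topics]              # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
            doc_assignments.append(k)
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + num_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
                assignments[d][i] = k

    # Smoothed per-document topic proportions (theta).
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in topics]
        for d, doc in enumerate(docs)
    ]


# Hypothetical toy corpora: fit one model, then split the rows by source.
so_questions = [["vignette", "imagick", "image"], ["underscore", "loop", "variable"]]
doc_pages = [["imagick", "image", "resize"], ["loop", "range", "iteration"]]
theta = lda_gibbs(so_questions + doc_pages, num_topics=2)
so_matrix, doc_matrix = theta[:len(so_questions)], theta[len(so_questions):]
```

Fitting both corpora in one model is what makes the two resulting matrices comparable column by column; on a real corpus an existing LDA library would likely be preferable to this hand-rolled sampler.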
[Figure: topic analysis with LDA yields a Stack Overflow question/topic matrix and a project documentation/topic matrix; a max step and a subtract step combine them into ranked deficient topics]
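The "Max" and "Subtract" steps can be read in more than one way, and the slides do not define them precisely. The sketch below assumes one plausible reading: for each topic, take the largest weight any Stack Overflow question assigns to it and the largest weight any documentation page assigns to it, then subtract the documentation score from the Stack Overflow score. Topics with the highest difference are heavily asked about but weakly documented. The function names and the toy matrices are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = documents, columns = topics


def column_max(matrix: Matrix) -> List[float]:
    """Largest weight any row (document) assigns to each topic (column)."""
    return [max(column) for column in zip(*matrix)]


def rank_deficient_topics(so_matrix: Matrix,
                          doc_matrix: Matrix) -> List[Tuple[int, float]]:
    """Rank topic indices by (assumed) max-then-subtract deficiency score."""
    so_max = column_max(so_matrix)
    doc_max = column_max(doc_matrix)
    if len(so_max) != len(doc_max):
        raise ValueError("both matrices must cover the same topics")
    scores = [so - doc for so, doc in zip(so_max, doc_max)]
    # Highest score first: strongly present in questions, weak in the docs.
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-topic example.
so = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
docs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
ranked = rank_deficient_topics(so, docs)
print([topic for topic, _ in ranked])  # [2, 0, 1]
```

Here topic 2 ranks first because one question is dominated by it (0.8) while no documentation page gives it more than 0.1. Taking the maximum rather than the mean means a single page that covers a topic well is enough to count it as documented.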
DEFICIENT TOPICS FOUND
PHP EXAMPLE
• Deficient documentation exists
• Stack Overflow question #7321289:
• “I want to apply a vignette effect to an image using PHP with
ImageMagik. I found this function but I’m not sure how to use it.”
• PHP documentation:
• Imagick::vignetteImage
• http://www.php.net/manual/en/imagick.vignetteimage.php
PYTHON EXAMPLE
• Deficient documentation exists
• Stack Overflow question #5893163:
• “What is the meaning of _ after for in this code?”
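The question above concerns a Python idiom the documentation did not explain clearly. As an illustrative sketch (not the code from the question), `_` in a `for` loop is an ordinary variable name; by convention it signals that the loop counter's value is never used. The function `repeat_message` and its data are hypothetical.

```python
def repeat_message(message: str, times: int) -> list:
    lines = []
    # The loop only needs to run `times` times; the counter is never read,
    # so it is conventionally named `_`.
    for _ in range(times):
        lines.append(message)
    return lines


# The same convention marks a value as deliberately ignored when unpacking.
first, _, last = ("Ada", "Augusta", "Lovelace")

print(repeat_message("hi", 3))  # ['hi', 'hi', 'hi']
print(first, last)              # Ada Lovelace
```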
OUT-OF-SCOPE DOCUMENTATION
• Questions that relate to multiple projects
• For example, questions about:
• HTML
• MySQL
• Clear indications and links should be included where users need to consult external project documentation
CONCLUSION
• Developed a method for locating deficiently documented
aspects of project documentation;
• Successfully located deficient project
documentation using Stack Overflow questions.