An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Presented at Software Engineering Conference 2013 in Aachen

  1. An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis
     Advanced Community Information Systems (ACIS), RWTH Aachen University, Germany
     Anna Hannemann, Michael Hackstein, Ralf Klamma, Matthias Jarke
     Lehrstuhl Informatik 5 (Information Systems), Prof. Dr. M. Jarke
     This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
  2. Open Source Software Projects
     • Community-driven development
     • Voluntary participation
     • Communication, project management and development via Web tools
     • Some successful and famous examples
     • Smaller niche projects
     • A long tail of unsuccessful projects
  3. Open Source Software Analysis for Software Engineering
     • Understand, model, simulate and organize community-driven development
     • Agile development practices
     • Distributed and intercultural practices
     • New success factors
     • Long-term freely available datasets
     • Low-cost empirical studies
  4. Open Source Software Analysis Research Results
     Source: Scacchi, “The Future of Research in Free/Open Source Software Development”, 2010
  5. Techniques for Knowledge Mining in Development Repositories
     • Results are only as good as the data!
     • Remember the DNA Phantom? “A hypothesized unknown female serial killer as a result of contaminated cotton swabs used for collecting DNA”
     • Mine data, not noise! Cleaning of artifacts from communication and development repositories is needed
  6. Data Cleaning for Knowledge Mining in Development Repositories
     • Data-structure independence: variable artifact types
     • Additive filtering: filter only new data
     • Filter nesting: sequence of arbitrary order
     • Consistent data format: cross-medium analysis
     • Consistent and easy-to-use interface
     • Extensibility: continuous evolution
     • Adaptive database insertion
  7. Adaptive-Filtering Approach: Cross-Media Mapping
     • Artifact types: mail, comment, post, ...
     • Cross-media mapping
       – Assignment of semantic meaning to artifact elements
       – Extensibility to new data sources
       – Same filters for different data
  8. Adaptive-Filtering Approach: Filter Nesting
     • Sequence of filters F1, F2, …, FN
     • Results in the same predefined format
     • One filter per cleaning (analysis) task
     • Each filter triggers its predecessor
     • Complex filter as a combination of several filters
     • Filtering triggered on demand
     • Filtering of a subset possible
     • Simple filters first, then analysis of the reduced data set with more filters of higher complexity
  9. Adaptive-Filtering Approach: Multi-Threading
     • Only new data is filtered
     • Asynchronous processing: each filtered data subset is passed directly to the next analysis task
     • Synchronous processing: wait until the complete data set is filtered
  10. Dataset Reduction and Content Cleaning Filters
     • Dataset Reduction Filter (DRF)
       – Reduces the number of artifacts
       – Selects artifacts that fulfill certain criteria
       – Example: spam detection, i.e. artifact classification based on the Bayes decision rule
     • Content Cleaning Filter (CCF)
       – Modifies the content of artifacts
       – Example: quotation filter, i.e. detection of predefined patterns in content
  11. Artifact Transformation Filters
     • Filter as analysis task
     • Modifies artifact attributes
     • Example: Core-Periphery Filter
       – Separates the core of the community from the periphery
       – Hierarchical clustering based on power-law distribution
  12. Validation in BioJava, Biopython and BioPerl OSS: Spam Detection
     [Figure: BioJava, spam and spammer level in the mailing lists of the OSS project]
     • Significant amount of spam (up to 60%)
     • Non-monotonic
     • Distortion of dynamics
  13. Validation in BioJava, Biopython and BioPerl OSS: Results Distortion
     [Figure: BioJava, year 2004, mood within the project community]
     • Summarized sentiment of project mails per month
     • Positive sentiment of spam advertisements
     • Incorrect sentiment assignment due to quotation
  14. Adaptive Filter-Framework and OSS Analysis
     • OSS Analysis for SE
       – Methods/metrics for knowledge mining in company communication and development repositories
       – Understanding of community-oriented development: principles, obstacles and advantages
     • Data Cleaning: Results are only as good as the data!
     • Adaptive Filter-Framework
       – Significant noise level in data
       – Adaptable to any Web artifact format
       – Filter nesting
       – Filter as analysis method
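
Slide 6 lists additive filtering, a consistent interface and extensibility among the requirements. The following minimal Python sketch shows one way these could fit together; it is not the framework's actual code, and the names `Artifact`, `Filter`, `keep`, `run` and `MinLengthFilter` are hypothetical. Additive filtering is modelled by letting the base class remember which artifact ids it has already processed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical common record into which every data source is mapped."""
    id: str
    author: str
    body: str


class Filter(ABC):
    """Hypothetical base class: one filter performs one cleaning task.

    Subclasses only implement `keep`; the base class adds additive
    filtering by remembering the ids of artifacts it has already processed.
    """

    def __init__(self):
        self._processed = set()

    @abstractmethod
    def keep(self, artifact: Artifact) -> bool:
        """Return True if the artifact survives this filter."""

    def run(self, artifacts):
        """Filter only the artifacts that were not seen in an earlier run."""
        new = [a for a in artifacts if a.id not in self._processed]
        self._processed.update(a.id for a in new)
        return [a for a in new if self.keep(a)]


class MinLengthFilter(Filter):
    """Example extension: drop artifacts whose body is very short."""

    def __init__(self, min_chars=10):
        super().__init__()
        self.min_chars = min_chars

    def keep(self, artifact):
        return len(artifact.body) >= self.min_chars
```

Calling `run` again on the same list plus one appended artifact filters only the appended one. Keeping this bookkeeping in the base class means a new filter type only has to supply its own `keep` predicate.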
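
The cross-media mapping on slide 7 assigns a semantic role to the elements of each artifact type so that the same filters can run on mails, comments and posts. A minimal Python sketch under that reading follows; the raw field names in `MAPPINGS` (for example `message_id` or `posted_at`) and all function names are hypothetical placeholders, not the schema used by the framework.

```python
from dataclasses import dataclass

# Hypothetical field mappings: for each artifact type, which raw field
# carries which semantic role. Supporting a new data source only means
# adding one more entry here; the filters themselves stay unchanged.
MAPPINGS = {
    "mail": {"id": "message_id", "author": "from", "body": "text", "time": "date"},
    "comment": {"id": "comment_id", "author": "user", "body": "content", "time": "created"},
    "post": {"id": "post_id", "author": "poster", "body": "message", "time": "posted_at"},
}


@dataclass
class Artifact:
    """Source-independent representation that all filters operate on."""
    kind: str
    id: str
    author: str
    body: str
    time: str


def to_artifact(kind, raw):
    """Map a raw record of the given artifact type onto the common format."""
    mapping = MAPPINGS[kind]
    return Artifact(kind=kind, **{role: raw[src] for role, src in mapping.items()})


def is_empty(artifact):
    """One filter predicate that works for mails, comments and posts alike."""
    return not artifact.body.strip()
```

`is_empty` is written once against `Artifact` and therefore applies to every mapped source.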
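
Slide 8 describes filter nesting: filters F1, …, FN that all produce the same format, filtering triggered on demand, and simple filters applied before more complex ones. One way to obtain on-demand behaviour in Python, offered as an assumption rather than the authors' design, is to make each filter a generator, so that the last filter pulls artifacts from its predecessor only when its consumer asks for them. The helpers `drop_if` and `nest` and the dictionary-based artifacts are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# A stage turns a stream of artifacts into a (possibly smaller) stream.
Stage = Callable[[Iterable[dict]], Iterator[dict]]


def drop_if(pred) -> Stage:
    """Build a reduction stage that drops every artifact matching `pred`."""
    def stage(artifacts):
        for a in artifacts:
            if not pred(a):
                yield a
    return stage


def nest(*stages: Stage) -> Stage:
    """Compose stages F1, ..., FN into one complex filter.

    Every stage is a generator, so nothing runs until a consumer asks for
    an artifact: FN then pulls from F(N-1), and so on back to F1.
    """
    def combined(artifacts):
        stream = iter(artifacts)
        for stage in stages:
            stream = stage(stream)
        return stream
    return combined


# Cheap check first, the content check then runs on the reduced stream.
drop_empty = drop_if(lambda a: not a["body"].strip())
drop_ads = drop_if(lambda a: "buy now" in a["body"].lower())
pipeline = nest(drop_empty, drop_ads)
```

Because the composed filter is lazy, taking only the first few results (for example with `itertools.islice`) filters only a subset of the data.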
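
Slide 9 distinguishes asynchronous processing, where each filtered subset goes straight to the next analysis task, from synchronous processing, where the analysis waits for the whole data set. The sketch below illustrates both modes with Python's standard `concurrent.futures` thread pool; the chunking scheme, the `processed_ids` set used to filter only new data, and the placeholder `filter_chunk` cleaning step are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def select_new(artifacts, processed_ids):
    """Additive filtering: keep only artifacts not handled in an earlier run."""
    return [a for a in artifacts if a["id"] not in processed_ids]


def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def filter_chunk(chunk):
    """Placeholder cleaning task: drop artifacts with an empty body."""
    return [a for a in chunk if a["body"].strip()]


def run_async(artifacts, processed_ids, analyse, size=100, workers=4):
    """Hand each filtered chunk to `analyse` as soon as its thread is done."""
    new = select_new(artifacts, processed_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(filter_chunk, c) for c in chunks(new, size)]
        for fut in as_completed(futures):
            analyse(fut.result())
    processed_ids.update(a["id"] for a in new)


def run_sync(artifacts, processed_ids, size=100, workers=4):
    """Wait until all new artifacts are filtered; return them in input order."""
    new = select_new(artifacts, processed_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cleaned = [a for part in pool.map(filter_chunk, chunks(new, size))
                   for a in part]
    processed_ids.update(a["id"] for a in new)
    return cleaned
```

In asynchronous mode chunks reach `analyse` in completion order, so a downstream task must not rely on input order; the synchronous variant uses `pool.map`, which preserves it. For CPU-bound filters in CPython, a process pool may parallelize better than threads.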
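
Slide 10 names spam detection via the Bayes decision rule as a Dataset Reduction Filter and quotation removal as a Content Cleaning Filter. The following Python sketch uses multinomial naive Bayes as one concrete instance of a Bayes-rule classifier; the slides do not say which model or features the framework uses, and the class and function names, the tokenizer and the `>` quotation marker are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; deliberately simplistic."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Dataset Reduction Filter sketch: drops artifacts classified as spam.

    Multinomial naive Bayes with add-one smoothing. Bayes decision rule:
    pick the class c that maximizes log P(c) + sum_w log P(w | c).
    """

    def __init__(self, spam_texts, ham_texts):
        self.counts = {"spam": Counter(), "ham": Counter()}
        for label, texts in (("spam", spam_texts), ("ham", ham_texts)):
            for text in texts:
                self.counts[label].update(tokenize(text))
        n = len(spam_texts) + len(ham_texts)
        self.log_prior = {"spam": math.log(len(spam_texts) / n),
                          "ham": math.log(len(ham_texts) / n)}
        self.vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))

    def log_score(self, label, text):
        """Unnormalized log posterior of `label` given `text`."""
        counts = self.counts[label]
        denom = sum(counts.values()) + self.vocab_size
        return self.log_prior[label] + sum(
            math.log((counts[w] + 1) / denom) for w in tokenize(text))

    def is_spam(self, text):
        return self.log_score("spam", text) > self.log_score("ham", text)

    def run(self, bodies):
        """Keep only the artifact bodies classified as non-spam."""
        return [b for b in bodies if not self.is_spam(b)]


def strip_quotes(body):
    """Content Cleaning Filter sketch: remove quoted reply lines ('> ...')."""
    return "\n".join(line for line in body.splitlines()
                     if not line.lstrip().startswith(">"))
```

Both classes need at least one training text, otherwise a prior is undefined. Removing quotations matters for analyses such as the sentiment study on slide 13, where quotation led to incorrect sentiment assignment.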
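
Slide 11 names a Core-Periphery Filter based on hierarchical clustering of a power-law distribution but gives no further details. The Python sketch below is a stand-in under explicit assumptions, not the authors' algorithm: it ranks contributors by message count, compares counts on a log scale because such activity is typically heavy-tailed, and cuts at the largest gap between consecutive values, which is exactly the two-cluster cut of single-linkage hierarchical clustering in one dimension. The function name `core_periphery` is hypothetical.

```python
import math
from collections import Counter


def core_periphery(authors):
    """Split contributors into (core, periphery) sets by message count.

    `authors` holds one entry per message. Counts are compared on a log
    scale; the split is placed at the largest gap between consecutive
    sorted log counts (single-linkage clustering cut into two clusters).
    """
    counts = Counter(authors)
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return {name for name, _ in ranked}, set()
    logs = [math.log(c) for _, c in ranked]
    gaps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    cut = gaps.index(max(gaps)) + 1  # number of authors in the core
    core = {name for name, _ in ranked[:cut]}
    periphery = {name for name, _ in ranked[cut:]}
    return core, periphery
```

If all contributors have equal counts there is no meaningful gap, and the sketch simply puts the first-ranked author into the core; a real filter would need an explicit rule for that case.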
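
Slide 12 reports the spam and spammer level in the projects' mailing lists. Assuming that spam level means the share of mails flagged as spam per month and spammer level the share of distinct senders with at least one spam mail in that month (the slides define neither), both could be computed from the output of a spam filter as in this Python sketch; the input tuple layout and the function name `monthly_spam_levels` are hypothetical.

```python
from collections import defaultdict


def monthly_spam_levels(mails):
    """mails: iterable of (month, sender, is_spam) tuples, month like '2004-05'.

    Returns {month: (spam_share, spammer_share)} in chronological order.
    """
    total = defaultdict(int)
    spam = defaultdict(int)
    senders = defaultdict(set)
    spammers = defaultdict(set)
    for month, sender, is_spam in mails:
        total[month] += 1
        senders[month].add(sender)
        if is_spam:
            spam[month] += 1
            spammers[month].add(sender)
    return {m: (spam[m] / total[m], len(spammers[m]) / len(senders[m]))
            for m in sorted(total)}
```

Plotting these shares month by month is one way to see whether spam levels rise and fall non-monotonically.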
