

                         An Adaptive Filter-Framework for the
                         Quality Improvement of Open-Source
                                   Software Analysis

                            Advanced Community Information Systems (ACIS)
                                  RWTH Aachen University, Germany
                              Anna Hannemann, Michael Hackstein, Ralf
                                       Klamma, Matthias Jarke
Lehrstuhl Informatik 5
(Information Systems)
   Prof. Dr. M. Jarke
                             This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Open Source Software Projects

                           Community-driven development
                           Voluntary participation
                           Communication, project management and
                            development via Web tools
                           Some successful and famous examples
                           Smaller niche projects
                           A long-tail of unsuccessful projects


Open Source Software Analysis for
                                Software Engineering

                           Understand, model, simulate and organize
                            community-driven development
                           Agile development practices
                           Distributed and intercultural practices
                           New success factors
                           Long-term freely available datasets
                           Low cost empirical studies


Open Source Software Analysis
                               Research Results

                           Scacchi, “The Future of Research in Free/Open Source Software Development”, 2010
Techniques for Knowledge Mining in
                             Development Repositories

                           Results are only as good as the data!
                           Remember the DNA Phantom?
                            “A hypothesized unknown female serial killer as a result of
                            contaminated cotton swabs used for collecting DNA”
                           Mine data, not noise!
                            Cleaning of artifacts from communication and
                            development repositories needed
Data Cleaning for Knowledge Mining
                              in Development Repositories

                           Data-structure independence: variable artifact types
                           Additive filtering: filter only new data
                           Filter nesting: sequence of arbitrary order
                           Consistent data format: cross-medium analysis
                           Consistent and easy-to-use interface
                           Extensibility: continuous evolution
                           Adaptive database insertion
Adaptive-Filtering Approach
                                   Cross-Media Mapping

                         Artifact types
                           Mail
                           Comment
                           Post
                           ...
                         Cross-media mapping
                           Assignment of semantic meaning to artifact elements
                           Extensibility to new data sources
                           Same filters for different data
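The cross-media mapping idea can be illustrated with a minimal Python sketch. It is not the framework's actual code: the `Artifact` fields, the `MAPPINGS` table, and the raw key names (`From`, `Subject`, `user`, `topic`, ...) are all assumptions chosen for illustration. Each medium declares which raw element carries which semantic meaning, so the same filters can later run on mails and posts alike.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Artifact:
    """Common, medium-independent artifact format (hypothetical field names)."""
    source: str        # medium the artifact came from, e.g. "mail" or "forum"
    author: str
    created: datetime
    title: str
    body: str


# One mapping per medium: semantic field -> function reading it from a raw record.
# The raw key names below are assumptions made for this sketch.
MAPPINGS = {
    "mail": {
        "author": lambda raw: raw["From"],
        "created": lambda raw: datetime.fromisoformat(raw["Date"]),
        "title": lambda raw: raw["Subject"],
        "body": lambda raw: raw["Body"],
    },
    "forum": {
        "author": lambda raw: raw["user"],
        "created": lambda raw: datetime.fromisoformat(raw["posted_at"]),
        "title": lambda raw: raw["topic"],
        "body": lambda raw: raw["text"],
    },
}


def to_artifact(source, raw):
    """Translate a raw record of the given medium into the common format."""
    readers = MAPPINGS[source]
    fields = {name: read(raw) for name, read in readers.items()}
    return Artifact(source=source, **fields)


def register_source(source, readers):
    """Extensibility: a new data source only needs a new mapping entry."""
    MAPPINGS[source] = readers
```

Under these assumptions, adding a new medium such as issue-tracker comments means calling `register_source` with four reader functions; no filter has to change.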
Adaptive-Filtering Approach
                                       Filter Nesting

                           Sequence of filters F1, F2, …, FN
                           Results in same predefined format
                           One filter – one cleaning (analysis) task
                           Each filter triggers its predecessor
                           Complex filter as a combination of several filters
                           Filtering triggered on demand
                           Filtering of a subset possible
                           Simple filters first and then analysis of reduced data
                            set with more filters of higher complexity
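Filter nesting could look roughly like the following Python sketch. The class and function names (`Filter`, `NonEmptyFilter`, `LowercaseFilter`, `nest`) are hypothetical and the artifacts are plain dictionaries for brevity. The point shown is that asking the last filter for its result triggers each predecessor in turn, and that every filter returns data in the same format.

```python
class Filter:
    """One filter = one cleaning (or analysis) task; hypothetical base class."""

    def __init__(self, predecessor=None):
        self.predecessor = predecessor

    def apply(self, artifacts):
        """Override in subclasses; takes and returns a list of artifacts."""
        raise NotImplementedError

    def run(self, artifacts):
        # Trigger the predecessor first, so the whole chain runs on demand
        # when the last filter is asked for its result.
        if self.predecessor is not None:
            artifacts = self.predecessor.run(artifacts)
        return self.apply(artifacts)


class NonEmptyFilter(Filter):
    """Simple, cheap filter: drops artifacts with an empty body."""

    def apply(self, artifacts):
        return [a for a in artifacts if a["body"].strip()]


class LowercaseFilter(Filter):
    """Stand-in for a more complex filter that runs on the reduced data set."""

    def apply(self, artifacts):
        return [{**a, "body": a["body"].lower()} for a in artifacts]


def nest(*filter_classes):
    """Build F1, F2, ..., FN so that FN triggers FN-1, and so on back to F1."""
    chain = None
    for cls in filter_classes:
        chain = cls(predecessor=chain)
    return chain


pipeline = nest(NonEmptyFilter, LowercaseFilter)
result = pipeline.run([{"body": "Hello"}, {"body": "   "}])
```

Because `nest` returns an ordinary `Filter`, a nested chain can itself serve as the predecessor of another filter, which is one way to read "complex filter as a combination of several filters".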
Adaptive-Filtering Approach
                                    Multi-Threading

                           Only new data is filtered
                           Asynchronous processing: filtered data subset is
                            provided directly to the next analysis task
                           Synchronous processing: wait until the complete data
                            set is filtered
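The difference between the two processing modes can be sketched in Python with the standard `threading` and `queue` modules. The `AdditiveFilter` class, its `keep` predicate, and the `id` key are assumptions for illustration, not part of the framework described in the slides. In asynchronous mode each surviving artifact is handed on as soon as it is filtered; in synchronous mode the caller waits for the complete result. In both modes only artifacts not seen in an earlier run are processed.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of an asynchronous result stream


class AdditiveFilter:
    """Filters only artifacts it has not seen before (hypothetical sketch)."""

    def __init__(self, keep):
        self.keep = keep        # predicate deciding whether an artifact survives
        self.seen_ids = set()   # ids already filtered in earlier runs

    def _new_artifacts(self, artifacts):
        for artifact in artifacts:
            if artifact["id"] not in self.seen_ids:
                self.seen_ids.add(artifact["id"])
                yield artifact

    def run_async(self, artifacts):
        """Return a queue that receives each surviving artifact immediately."""
        out = queue.Queue()

        def worker():
            for artifact in self._new_artifacts(artifacts):
                if self.keep(artifact):
                    out.put(artifact)
            out.put(DONE)

        threading.Thread(target=worker, daemon=True).start()
        return out

    def run_sync(self, artifacts):
        """Block until the complete new data set is filtered, then return it."""
        return [a for a in self._new_artifacts(artifacts) if self.keep(a)]


def drain(out):
    """Consume an asynchronous result queue until the sentinel arrives."""
    results = []
    while True:
        item = out.get()
        if item is DONE:
            return results
        results.append(item)
```

This sketch is not safe for overlapping runs on the same filter instance, since `seen_ids` is shared without a lock; a real implementation would have to guard it.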
Dataset Reduction and Content
                                     Cleaning Filters

                           Dataset Reduction Filter (DRF)
                            –  Reduces amount of artifacts
                            –  Selects artifacts that fulfill certain criteria
                            –  Example
                                –  Spam detection
                                –  Artifact classification based on Bayes decision rule
                           Content Cleaning Filter (CCF)
                            –  Modifies content of artifacts
                            –  Example
                                –  Quotation filter
                                –  Detection of predefined patterns in content
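Both filter kinds can be illustrated with a short Python sketch. The slides name only a "Bayes decision rule" and a "quotation filter"; the concrete choices below (a word-based naive Bayes model with add-one smoothing, and regular expressions for `>`-quoted lines and "On ... wrote:" headers) are assumptions made for illustration, as are all function and class names.

```python
import math
import re
from collections import Counter

QUOTE_LINE = re.compile(r"^\s*>")               # line quoted from an earlier mail
ATTRIBUTION = re.compile(r"^On .* wrote:\s*$")  # "On <date>, <name> wrote:" header


def quotation_filter(body):
    """Content cleaning: drop quoted lines and attribution headers from a body."""
    kept = [line for line in body.splitlines()
            if not QUOTE_LINE.match(line) and not ATTRIBUTION.match(line)]
    return "\n".join(kept).strip()


class BayesSpamFilter:
    """Dataset reduction: keep only artifacts classified as 'ham'."""

    def __init__(self, labelled):
        # labelled: (text, label) pairs, label in {"spam", "ham"};
        # both labels must occur at least once, otherwise the prior is zero.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        for text, label in labelled:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])

    def log_posterior(self, text, label):
        # log P(label) + sum of log P(word | label), with add-one smoothing
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in text.lower().split():
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(self.vocabulary)))
        return score

    def classify(self, text):
        # Bayes decision rule: choose the class with the highest posterior
        return max(("spam", "ham"), key=lambda label: self.log_posterior(text, label))

    def apply(self, bodies):
        return [b for b in bodies if self.classify(b) == "ham"]
```

Running the spam filter before the quotation filter follows the "simple filters first" idea: the content-cleaning step then only touches artifacts that survive the reduction.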
Artifact Transformation Filters

                           Filter as analysis task
                           Modifies artifact attributes
                           Example:
                            –  Core-Periphery Filter: separates
                               core of community from periphery
                            –  Hierarchical clustering based on
                               power law distribution
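As a rough illustration of a filter that adds an attribute instead of removing data, the Python sketch below tags every artifact with a `role`. It deliberately does not implement the hierarchical clustering mentioned on the slide; it substitutes a much simpler heuristic (the most active authors who together wrote a given share of all artifacts count as core). The function name, the `core_share` parameter, and the `author` and `role` keys are assumptions.

```python
from collections import Counter


def core_periphery_filter(artifacts, core_share=0.8):
    """Add a 'role' attribute ('core' or 'periphery') to each artifact.

    Simplified stand-in heuristic, not the clustering from the slides:
    authors are taken in order of activity until they cover at least
    `core_share` of all artifacts; those authors form the core.
    """
    counts = Counter(a["author"] for a in artifacts)
    total = sum(counts.values())
    core, covered = set(), 0
    for author, n in counts.most_common():
        if covered >= core_share * total:
            break
        core.add(author)
        covered += n
    return [
        {**a, "role": "core" if a["author"] in core else "periphery"}
        for a in artifacts
    ]
```

The output keeps every input artifact and the same format plus one attribute, so the result can be fed into further filters or analysed directly.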
Validation in BioJava, Biopython and
                             BioPerl OSS: Spam Detection

                         [Figure: BioJava]

                         Spam and spammer level in mailing lists of OSS
                           Significant amount (up to 60%)
                           Non-monotonic
                           Distortion of dynamics
Validation in BioJava, Biopython and
                            BioPerl OSS: Results Distortion

                         [Figure: Year 2004, BioJava]

                         Mood within project community
                           Summarized sentiment of project mails per month
                           Positive sentiment of spam advertisement
                           Incorrect sentiment assignment due to quotation
Adaptive Filter-Framework and OSS
                                               Analysis
                           OSS Analysis for SE
                            –  Methods/metrics for knowledge mining in company
                               communication and development repositories
                            –  Understanding of community-oriented development:
                               principles, obstacles and advantages
                         !  Data Cleaning: Results are only as good as the data!
                           Adaptive Filter-Framework
                            –  Significant noise level in data
                            –  Adaptable for any Web artifact format
                            –  Filter nesting
                            –  Filter as analysis method

Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 

An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

  • 1. An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis
      – Anna Hannemann, Michael Hackstein, Ralf Klamma, Matthias Jarke
      – Advanced Community Information Systems (ACIS), RWTH Aachen University, Germany
      – Lehrstuhl Informatik 5 (Information Systems), Prof. Dr. M. Jarke
      – This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
  • 2. Open Source Software Projects
      – Community-driven development
      – Voluntary participation
      – Communication, project management and development via Web tools
      – Some successful and famous examples
      – Smaller niche projects
      – A long tail of unsuccessful projects
  • 3. Open Source Software Analysis for Software Engineering
      – Understand, model, simulate and organize community-driven development
      – Agile development practices
      – Distributed and intercultural practices
      – New success factors
      – Long-term, freely available datasets
      – Low-cost empirical studies
  • 4. Open Source Software Analysis: Research Results
      – [Figure. Source: Scacchi, “The Future of Research in Free/Open Source Software Development”, 2010]
  • 5. Techniques for Knowledge Mining in Development Repositories
      – Results are only as good as the data!
      – Remember the DNA Phantom? “A hypothesized unknown female serial killer as a result of contaminated cotton swabs used for collecting DNA”
      – Mine data, not noise! Cleaning of artifacts from communication and development repositories is needed
  • 6. Data Cleaning for Knowledge Mining in Development Repositories
      – Data-structure independence: variable artifact types
      – Additive filtering: filter only new data
      – Filter nesting: sequence of arbitrary order
      – Consistent data format: cross-medium analysis
      – Consistent and easy-to-use interface
      – Extensibility: continuous evolution
      – Adaptive database insertion
  • 7. Adaptive-Filtering Approach: Cross-Media Mapping
      – Artifact types
          – Mail
          – Comment
          – Post
          – ...
      – Cross-media mapping
          – Assignment of semantic meaning to artifact elements
          – Extensibility to new data sources
          – Same filters for different data
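The cross-media mapping idea can be illustrated with a minimal Python sketch. It is not the framework's actual implementation: the `Artifact` class, its field names, the `MAPPINGS` table and the raw record keys are all hypothetical. The sketch only shows how assigning a semantic meaning to each raw element lets one function work on mails, comments and posts alike.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Common format for all artifact types (hypothetical field names)."""
    source: str
    author: str
    subject: str
    body: str


# Hypothetical mappings: semantic field -> key used by each raw data source.
# Supporting a new source only requires adding one more entry here.
MAPPINGS = {
    "mail": {"author": "From", "subject": "Subject", "body": "Body"},
    "comment": {"author": "user", "subject": "issue_title", "body": "text"},
    "post": {"author": "poster", "subject": "thread", "body": "message"},
}


def to_artifact(source: str, raw: dict) -> Artifact:
    """Translate a raw record into the common format via its source mapping."""
    mapping = MAPPINGS[source]
    fields = {field: raw.get(key, "") for field, key in mapping.items()}
    return Artifact(source=source, **fields)


def word_count(artifact: Artifact) -> int:
    """Works for any source because it only sees the common format."""
    return len(artifact.body.split())


mail = to_artifact("mail", {"From": "anna@example.org",
                            "Subject": "Build fails",
                            "Body": "The build fails on Java 6."})
comment = to_artifact("comment", {"user": "michael",
                                  "issue_title": "Build fails",
                                  "text": "Confirmed here."})
```

Any filter written against `Artifact` can then be reused unchanged for every mapped source.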
  • 8. Adaptive-Filtering Approach: Filter Nesting
      – Sequence of filters F1, F2, …, FN
      – Results in the same predefined format
      – One filter per cleaning (analysis) task
      – Each filter triggers its predecessor
      – Complex filter as a combination of several filters
      – Filtering triggered on demand
      – Filtering of a subset possible
      – Simple filters first, then analysis of the reduced data set with more filters of higher complexity
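Filter nesting can be sketched in a few lines of Python. This is an illustration under assumptions, not the framework's code: the `Filter` class, the dictionary-based artifact format and the two example steps are hypothetical. Each filter first triggers its predecessor, every step reads and writes the same format, and generators keep the evaluation on demand.

```python
from typing import Callable, Iterable, Iterator, Optional

Artifact = dict  # hypothetical common format: {"id": ..., "body": ...}
Step = Callable[[Iterator[Artifact]], Iterator[Artifact]]


class Filter:
    """One filter performs one task and pulls its input from its predecessor."""

    def __init__(self, step: Step, predecessor: Optional["Filter"] = None):
        self.step = step
        self.predecessor = predecessor

    def run(self, artifacts: Iterable[Artifact]) -> Iterator[Artifact]:
        # Trigger the predecessor first; without one, read the raw input.
        if self.predecessor is not None:
            upstream = self.predecessor.run(artifacts)
        else:
            upstream = iter(artifacts)
        return self.step(upstream)


def drop_empty(artifacts: Iterator[Artifact]) -> Iterator[Artifact]:
    """Simple reduction step: discard artifacts with an empty body."""
    return (a for a in artifacts if a["body"].strip())


def lowercase_body(artifacts: Iterator[Artifact]) -> Iterator[Artifact]:
    """Content step: normalise the body, keeping the same format."""
    return ({**a, "body": a["body"].lower()} for a in artifacts)


f1 = Filter(drop_empty)
f2 = Filter(lowercase_body, predecessor=f1)  # F2 is nested on top of F1

data = [{"id": 1, "body": "Hello World"}, {"id": 2, "body": "   "}]
result = list(f2.run(data))        # nothing is computed before this call
subset = list(f2.run(data[:1]))    # filtering only a subset is possible
```

Placing the cheap `drop_empty` step first means the later step only sees the reduced data set.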
  • 9. Adaptive-Filtering Approach: Multi-Threading
      – Only new data is filtered
      – Asynchronous processing: the filtered data subset is provided directly to the next analysis task
      – Synchronous processing: wait until the complete data set is filtered
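The difference between the two processing modes, combined with additive filtering, might look as follows in Python. The sketch is an assumption about how such behaviour could be built with the standard `concurrent.futures` module. The `AdditiveFilter` class, its method names and the id-based bookkeeping are hypothetical and not taken from the framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List

Artifact = dict  # hypothetical common format: {"id": ..., "body": ...}


class AdditiveFilter:
    """Filters only artifacts it has not seen before, using worker threads."""

    def __init__(self, step: Callable[[Artifact], Artifact]):
        self.step = step
        self.done = {}  # artifact id -> already filtered artifact

    def _pending(self, artifacts: List[Artifact]) -> List[Artifact]:
        # Additive filtering: skip everything that was filtered earlier.
        return [a for a in artifacts if a["id"] not in self.done]

    def run_async(self, artifacts: List[Artifact],
                  workers: int = 4) -> Iterator[Artifact]:
        """Yield each filtered artifact while later ones are still processed."""
        pending = self._pending(artifacts)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for artifact, cleaned in zip(pending, pool.map(self.step, pending)):
                self.done[artifact["id"]] = cleaned
                yield cleaned

    def run_sync(self, artifacts: List[Artifact],
                 workers: int = 4) -> List[Artifact]:
        """Return only after the complete pending set has been filtered."""
        return list(self.run_async(artifacts, workers))


def strip_body(artifact: Artifact) -> Artifact:
    return {**artifact, "body": artifact["body"].strip()}


f = AdditiveFilter(strip_body)
first = f.run_sync([{"id": 1, "body": " a "}, {"id": 2, "body": "b "}])
second = f.run_sync([{"id": 2, "body": "b "}, {"id": 3, "body": " c"}])
streamed = [a["id"] for a in f.run_async([{"id": 3, "body": " c"},
                                          {"id": 4, "body": "d"}])]
```

In the second call only artifact 3 is filtered, because artifact 2 was already handled in the first call.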
  • 10. Dataset Reduction and Content Cleaning Filters
      – Dataset Reduction Filter (DRF)
          – Reduces the number of artifacts
          – Selects artifacts that fulfill certain criteria
          – Example: spam detection (artifact classification based on the Bayes decision rule)
      – Content Cleaning Filter (CRF)
          – Modifies the content of artifacts
          – Example: quotation filter (detection of predefined patterns in content)
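Both filter kinds can be illustrated with a small Python sketch. The slides only name the Bayes decision rule and pattern detection. The naive Bayes word model with add-one smoothing, the toy training mails and the two quotation patterns below are assumptions chosen for illustration, not the filters used in the study.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Bayes decision rule: pick the class with the highest posterior.
    Assumes word independence (naive Bayes) and add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_posterior(self, text: str, label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        # log prior + sum of smoothed log likelihoods
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(self.LABELS, key=lambda label: self.log_posterior(text, label))


# Hypothetical quotation patterns: quoted lines and a reply attribution line.
QUOTE_LINE = re.compile(r"^\s*>")
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def strip_quotes(body: str) -> str:
    """Content cleaning: drop quoted text so it is not analysed twice."""
    kept = [line for line in body.splitlines()
            if not QUOTE_LINE.match(line) and not ATTRIBUTION.match(line)]
    return "\n".join(kept).strip()


nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("patch for the sequence parser", "ham")
nb.train("parser fails on this sequence file", "ham")

mails = ["buy cheap pills", "parser fails on sequence"]
kept = [m for m in mails if nb.classify(m) == "ham"]  # dataset reduction
reply = ("Thanks, that works.\n\nOn Mon, Anna wrote:\n"
         "> Try the new parser.\n> It is faster.")
cleaned = strip_quotes(reply)  # content cleaning
```

Removing quoted text before sentiment analysis keeps a reply from inheriting the sentiment of the message it quotes.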
  • 11. Artifact Transformation Filters
      – Filter as analysis task
      – Modifies artifact attributes
      – Example:
          – Core-Periphery Filter: separates the core of the community from the periphery
          – Hierarchical clustering based on power-law distribution
  • 12. Validation in BioJava, Biopython and BioPerl OSS: Spam Detection
      – [Figure: BioJava, spam and spammer level in mailing lists of OSS]
      – Significant amount (up to 60%)
      – Non-monotonic
      – Distortion of dynamics
  • 13. Validation in BioJava, Biopython and BioPerl OSS: Results Distortion
      – [Figure: BioJava, year 2004; mails per month]
      – Mood within project community
      – Summarized sentiment of project
      – Positive sentiment of spam advertisement
      – Incorrect sentiment assignment due to quotation
  • 14. Adaptive Filter-Framework and OSS Analysis
      – OSS Analysis for SE
          – Methods/metrics for knowledge mining in company communication and development repositories
          – Understanding of community-oriented development: principles, obstacles and advantages
          – Data cleaning: results are only as good as the data!
      – Adaptive Filter-Framework
          – Significant noise level in data
          – Adaptable for any Web artifact format
          – Filter nesting
          – Filter as analysis method