Report on the CLEF-IP 2012 Experiments: Search of Topically Organized Patents, by Mike Salampasis
This report describes experiments submitted to the CLEF-IP 2012 campaign. Patent documents produced worldwide carry manually assigned classification codes, which in this work are used to cluster, distribute, and index patents across hundreds or thousands of sub-collections. Different combinations of source-selection algorithms (CORI, BordaFuse, Reciprocal Rank) and results-merging algorithms (SSL, CORI) were tested, along with different combinations of the number of collections requested and the number of documents retrieved from each collection. According to PRES@100, the best submitted DIR approach ranked 7th across 31 submitted results; however, the best DIR run (not submitted) outperforms all submitted runs.
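To make the source-selection idea mentioned above concrete, here is a minimal sketch of BordaFuse-style selection, which is illustrative only and not the authors' implementation: each sub-collection earns Borda points according to the ranks its sampled documents obtain in a centralised ranking. The document and collection identifiers are hypothetical.

```python
from collections import defaultdict

def borda_fuse(ranked_docs, doc_to_collection, n_requested):
    """Score each sub-collection by Borda points: a document at rank r in a
    centralised ranking of N sampled documents contributes (N - r) points
    to the sub-collection it belongs to."""
    n = len(ranked_docs)
    scores = defaultdict(float)
    for rank, doc in enumerate(ranked_docs):
        scores[doc_to_collection[doc]] += n - rank
    return sorted(scores, key=scores.get, reverse=True)[:n_requested]

# Hypothetical IPC-based sub-collections for a patent topic:
best = borda_fuse(["d1", "d2", "d3", "d4"],
                  {"d1": "G06F", "d2": "H04L", "d3": "G06F", "d4": "A61K"},
                  n_requested=2)
```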
This document presents Trustrace, an approach that uses software repository links to improve the trust in automatically recovered traceability links. Trustrace calculates trust values for traceability links based on their similarity scores from information retrieval techniques as well as evidence from other sources like version control commit logs. An empirical study on two systems found that Trustrace improved precision and recall over vector space models and reduced an expert's effort to validate links by up to 50%. The results also tended to improve when using larger version control commit logs.
This document discusses cross-lingual information retrieval. It presents approaches for translating queries from other languages to the document language, including using online machine translation systems and developing a statistical machine translation system. It describes experiments on reranking translations to select the one most effective for retrieval and on adapting the reranking model to new languages. Results show the reranking approach improves over baselines and online translation systems. The document also explores document translation and query expansion techniques.
The document discusses different information retrieval models including boolean, vector, and probabilistic models. The boolean model uses set theory and boolean algebra to represent documents and queries. The vector model assigns weights to terms and ranks documents based on similarity to the query. The probabilistic model calculates the probability of a document being relevant given a query. It also covers structured models for different tasks like filtering and browsing.
2010 CASCON - Towards a integrated network of data and services for the life ..., by Michel Dumontier
Towards an integrated network of data and services for the life sciences. Modern biological knowledge discovery requires access to machine-understandable data that can be searched, retrieved, and subsequently analyzed using a wide array of analytical software and services. The Semantic Automated Discovery and Integration (SADI) framework is a set of conventions to formalize web service inputs and outputs using OWL ontologies that enable the automatic discovery and invocation of Semantic Web services. In this talk, I will walk through a worked example in the design and deployment of chemical semantic web services using the Chemical Development Toolkit, chemical descriptors from the Chemical Information Ontology (CHEMINF), and the Semanticscience Integrated Ontology (SIO) as a unifying, upper-level ontology of basic types and relations. I will discuss how one can make use of the SADI-enabled SHARE client to reason about data obtained from Bio2RDF, the largest linked open data project, and automatically invoke chemical semantic web services to determine a chemical's drug-likeness. If you want to see the potential of the Semantic Web being realized, this talk is for you.
The document provides an overview of a proposed text categorization system using a modified Naive Bayes algorithm. It includes sections on the problem definition, objectives, literature review, methodology, the proposed system architecture (consisting of modules for dataset preprocessing and text categorization using the modified algorithm), and a comparative study. The system would use the 20 Newsgroups dataset, perform text reduction during preprocessing, classify unknown text, and compare the performance of the existing and proposed algorithms. It lists the software and hardware requirements and provides references for related work.
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property, by Dan Sullivan, Ph.D.
The document discusses various text mining techniques including sentiment analysis, topic modeling, classification, named entity recognition, and event extraction. It provides examples of applications and considerations for each technique. Performance factors like scalability, language support, and integration rules are also covered. Overall the document serves as an introduction to common text analytics methods.
This document discusses cross-language information retrieval (CLIR). It defines CLIR as retrieving information written in a language different from the user's query language. It describes approaches to CLIR such as dictionary-based query translation and pseudo-relevance feedback. Dictionary-based query translation uses bilingual dictionaries but requires disambiguation because dictionary entries are ambiguous. Pseudo-relevance feedback assumes the top-ranked documents are relevant and selects terms from them to expand the query. The document also discusses using parallel corpora to estimate cross-lingual relevance models and evaluating CLIR through campaigns such as TREC and CLEF.
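To make the pseudo-relevance feedback idea just described concrete, here is a minimal sketch (not from the source document): the top-ranked documents are treated as relevant and the query is expanded with their highest-weighted terms, with TF-IDF used as a stand-in salience measure.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def expand_query(query: str, top_docs: list, n_terms: int = 5) -> str:
    """Pseudo-relevance feedback: append the highest-weighted terms of the
    top-ranked (assumed relevant) documents to the original query."""
    vectoriser = TfidfVectorizer(stop_words="english")
    weights = np.asarray(vectoriser.fit_transform(top_docs).mean(axis=0)).ravel()
    terms = vectoriser.get_feature_names_out()
    best = [terms[i] for i in weights.argsort()[::-1][:n_terms]]
    return query + " " + " ".join(t for t in best if t not in query.split())

expanded = expand_query(
    "cross language retrieval",
    ["bilingual dictionaries translate the query terms into the document language",
     "parallel corpora can be used to estimate translation probabilities"])
```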
This document summarizes an algorithm to detect algorithm names in computer science research papers. It involves converting PDFs to text, performing named entity recognition to extract noun phrases, filtering entities to remove author names and locations, and using a word2vec model trained on computer science papers to classify extracted tokens as true algorithm names or noisy data by comparing their similarity to known positives and negatives. The top similar words are used to label each token as a true or false positive for an algorithm name.
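A hedged sketch of the similarity-based labeling step described above, using gensim; the model file and the seed lists of known positives and negatives are hypothetical placeholders.

```python
from gensim.models import Word2Vec

# Hypothetical: a word2vec model trained on computer-science papers, plus
# small seed sets of known algorithm names and known noise terms.
model = Word2Vec.load("cs_papers_word2vec.model")
positives = ["quicksort", "pagerank", "adaboost"]
negatives = ["figure", "university", "table"]

def is_algorithm_name(token: str) -> bool:
    """Label a candidate token by comparing its summed embedding similarity
    to the known positives against the known negatives."""
    if token not in model.wv:
        return False
    pos = sum(model.wv.similarity(token, p) for p in positives if p in model.wv)
    neg = sum(model.wv.similarity(token, n) for n in negatives if n in model.wv)
    return pos > neg
```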
INDUS: A System for Information Integration and Knowledge Acquisition from Au..., by Jie Bao
INDUS is a system for integrating and acquiring knowledge from autonomous, distributed, and semantically heterogeneous biological data sources. It provides tools for specifying ontologies, schemas, mappings between data sources and user views. Users such as domain experts can register data sources, define views over sources of interest, and formulate queries across multiple sources. Ongoing work focuses on supporting more expressive ontologies and mappings, scalability, and query optimization to ensure semantic integrity when querying distributed heterogeneous data sources.
The document describes cross-language information retrieval (CLIR) and summarizes an English-Chinese information retrieval system called ECIRS. ECIRS allows users to input queries in English and retrieves relevant Chinese documents through translation. It includes dictionaries, document indexes, and a Chinese search engine. Screenshots show the user interface where a user can enter an English keyword, view its Chinese translation, and see search results in Chinese.
The document discusses cross-language information retrieval (CLIR). It notes that while there are over 6,000 languages, 80% of websites are in English, creating a need for CLIR. CLIR aims to retrieve relevant documents in languages different from the query language. It is an important area as it allows for global information exchange and knowledge sharing, with applications in national security, access to foreign patents and medical information. CLIR draws on multiple disciplines including information retrieval, natural language processing and machine translation.
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014), by Riccardo Albertoni
The development of a Spatial Data Infrastructure (SDI) at the European level is strategic for answering the needs of environmental management arising from European, national, and local policies. Several European projects and initiatives aim to share, integrate, and make accessible large amounts of environmental data in order to overcome cross-border, language, and cultural barriers. To this purpose, environmental thesauri are used as shared nomenclatures in metadata compilation and information discovery, and they are increasingly made available on the web. This paper provides a methodological approach for creating a catalogue of the environmental thesauri available on the web and assessing their reusability with respect to domain-independent criteria. It highlights critical issues and provides recommendations for improving thesauri reusability.
Sharing massive data analysis: from provenance to linked experiment reports, by Gaignard Alban
The document discusses scientific workflows, provenance, and linked data. It covers:
1) Scientific workflows can automate data analysis at scale, abstract complex processes, and capture provenance for transparency.
2) Provenance represents the origin and history of data and can be represented using standards like PROV. It allows reasoning about how results were produced.
3) Capturing and publishing provenance as linked open data can help make scientific results more reusable and queryable, but challenges remain around multi-site studies and producing human-readable reports.
The document discusses the history and challenges of software component search and retrieval. It covers early information retrieval systems from the 1940s-1950s and issues with unstructured, semi-structured and structured documents. The document also discusses the goals of software component retrieval, such as exact and approximate reuse. It analyzes different component search tools and approaches developed from the 1990s-2000s, including facet classification, spreading activation techniques, and context-aware recommendation systems.
Searching Heterogenous E Learning Resources, by imranlatif
The document discusses frameworks and projects for improving the discovery and interoperability of e-learning resources across different systems. It describes the e-Learning Framework (ELF) which provides common services and data models. The d+ project aims to develop search services and a toolkit allowing searches across heterogeneous repositories. Metadata, repositories, and service interfaces need to be mapped to ensure interoperability. Examples of using the services for searches from within learning management systems or on mobile devices are also discussed.
The CEDAR Workbench offers web services for creating and submitting biomedical metadata. Members of the AIRR community developed templates for the BioProject, BioSample, and SRA repositories using CEDAR. CEDAR templates precisely define fields to control how users see and fill them out. Metadata entries can be updated over time in CEDAR until ready for submission to repositories like SRA, with CEDAR returning status messages.
Searching Repositories of Web Application Models, by Marco Brambilla
Project repositories are a central asset in software development, as they preserve the technical knowledge gathered in past development activities. However, locating relevant information in a vast project repository is problematic, because it requires manually tagging projects with accurate metadata, an activity which is time consuming and prone to errors and omissions. This paper investigates the use of classical Information Retrieval techniques for easing the discovery of useful information from past projects. Differently from approaches based on textual search over the source code of applications or on querying structured metadata, we propose to index and search the models of applications, which are available in companies applying Model-Driven Engineering practices. We contrast alternative index structures and result presentations, and evaluate a prototype implementation on real-world experimental data.
Bridging formal semantics and social semantics on the web, by Fabien Gandon
The document summarizes research on bridging formal semantics and social semantics on the web. It discusses:
1) The Wimmics research team which studies web-instrumented machine interactions, communities, and semantics using a multidisciplinary approach and typed graphs.
2) The challenge of analyzing, modeling, and formalizing social semantic web applications for communities by combining formal semantics and social semantics.
3) Examples of past work that have structured folksonomies, combined metric spaces for tags, and analyzed sociograms and social networks.
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as coming in different types and as packaging all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, “Why linked data is not enough for scientists,” Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599–611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk, by Saurabh Saxena
Studied the feasibility of applying state-of-the-art deep learning models, such as end-to-end memory networks and neural attention-based models, to the problem of machine comprehension and subsequent question answering in corporate settings with huge amounts of unstructured textual data. Used pre-trained embeddings like word2vec and GloVe to avoid huge training costs.
B.Sc. IT (SYIT) Sem 3 and Sem 4 syllabus as per Mumbai University, by tanujaparihar
This document contains the syllabus for the S.Y.B.Sc. Information Technology program at the University of Mumbai for the academic year 2017-2018. It outlines the courses, course codes, credits, and topics to be covered for semesters 3 and 4. Semester 3 includes courses in Python programming, data structures, computer networks, databases, applied mathematics, and their corresponding practical courses. Semester 4 includes courses in Core Java, embedded systems, computer statistics, software engineering, computer graphics, and their practical courses.
Best C Sharp C# Training Online C# Online Course C# Online Training Best on..., by Evanta Technologies
C# is a multi-paradigm programming language encompassing strong typing and imperative, declarative, functional, generic, object-oriented, and component-oriented programming disciplines. It was developed by Microsoft. C# is one of the programming languages designed for the Common Language Infrastructure. C# is an object-oriented programming language introduced specifically for .NET, and thus has no backward-compatibility issues. C# is a simple, modern language.
This document summarizes a paper on keyword-driven SPARQL query generation using background knowledge. It presents an approach that maps keywords to entities, ranks them, identifies relevant graph patterns, and generates SPARQL queries. The approach was evaluated on accuracy metrics such as recall, fuzzy precision and F-score. Results showed higher accuracy for queries finding instances' characteristics or associations compared to those finding similar instances. Future work to improve the approach is also discussed.
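As an illustration of the final query-generation step described above, here is a sketch under assumed DBpedia-style vocabulary, not the paper's implementation; PREFIX declarations are omitted for brevity.

```python
def build_sparql(patterns):
    """Assemble a SPARQL SELECT from the graph patterns that connect the
    entities the keywords were mapped to (PREFIX declarations omitted)."""
    where = "\n  ".join(f"{s} {p} {o} ." for s, p, o in patterns)
    return f"SELECT DISTINCT ?result WHERE {{\n  {where}\n}}"

# Keywords "films directed by Kubrick", hypothetically mapped to DBpedia terms:
query = build_sparql([("?result", "rdf:type", "dbo:Film"),
                      ("?result", "dbo:director", "dbr:Stanley_Kubrick")])
print(query)
```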
The Structure of Computer Science Knowledge Network, by Pham Cuong
This document discusses using social network analysis methods to map and analyze the structure of the computer science knowledge network using data from the DBLP and CiteSeerX digital libraries. It describes creating knowledge networks based on bibliographic coupling between publications and venues, visualizing and clustering the networks, and analyzing metrics like betweenness centrality and PageRank to identify interdisciplinary venues and high prestige series. The analysis provides insight into the core areas and interconnectivity of fields within computer science.
ICMT 2016: Search-Based Model Transformations with MOMoT, by Martin Fleck
A tool demonstration of the MOMoT framework for the ICMT conference 2016 in Vienna. The demonstration shows the application of search-based model transformations on the Class Responsibility Assignment problem.
The document summarizes various techniques for retrieving reusable software components from a repository, and proposes a combined technique. It discusses keyword search, full-text retrieval, hypertext search, enumerated classification, attribute-value classification, faceted classification, signature matching, and behavioral matching. It notes disadvantages to signature and behavioral matching alone. The proposed technique combines signature and behavioral matching to minimize their individual disadvantages by considering both signatures and behaviors during matching. An example compares component retrieval results using only signature matching, only behavioral matching, and the combined approach.
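A minimal sketch of the signature-matching half of the combined technique just summarized (Python, with hypothetical signatures): two components' signatures match when return types agree and parameter-type multisets agree.

```python
from collections import Counter

def signatures_match(sig_a, sig_b) -> bool:
    """Signature matching: two components match when their return types agree
    and their parameter types agree as multisets (order-insensitive)."""
    (ret_a, params_a), (ret_b, params_b) = sig_a, sig_b
    return ret_a == ret_b and Counter(params_a) == Counter(params_b)

# Behavioural matching would additionally execute both candidates on sample
# inputs and compare outputs; the combined technique applies both filters.
assert signatures_match(("int", ["str", "list"]), ("int", ["list", "str"]))
```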
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka, by Stuart Chalk
This document discusses integrating faculty and scientist profiling data from external sources like VIVO and ScientistsDB into the Eureka Research Workbench system. It presents the motivations for enhancing collaboration features in Eureka. Details are provided on accessing data through the APIs of VIVO, ScientistsDB, and Elasticsearch to profile and search for researchers. The implementation involves developing CakePHP plugins to ingest data from these sources. Future plans include expanding the number of integrated sources to better enable finding collaborators.
This document summarizes Pasquale Lops' presentation on semantics-aware content-based recommender systems. It discusses how content-based recommender systems traditionally rely on keyword profiles but have limitations due to issues like multi-word concepts, synonymy, and polysemy. The presentation proposes leveraging semantic text analytics techniques like word sense disambiguation, explicit semantic analysis using Wikipedia, and linked open data to move beyond keyword profiles and address issues like overspecialization and lack of serendipity in recommendations.
Open Archives Initiative Object Reuse and Exchange, by lagoze
This document discusses infrastructure to support new models of scholarly publication by enabling interoperability across repositories through common data modeling and services. It proposes building blocks like repositories, digital objects, a common data model, serialization formats, and core services. This would allow components like publications and data to move across repositories and workflows, facilitating reuse and new value-added services that expose the scholarly communication process.
The document presents a taxonomy called ProMeTA for classifying program metamodels used in program reverse engineering. ProMeTA defines dimensions for characterizing metamodels such as target language, abstraction level, meta-language, and quality attributes. The taxonomy is used to analyze and classify five popular metamodels (ASTM, KDM, FAMIX, SPOOL, UNIQ). Key findings are that metamodels should support widely used languages and standards for long-term use and provide robust functionality and quality.
Uncovering Library Features from API Usage on Stack Overflow
1. Uncovering Library Features from API Usage on Stack Overflow
The 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022)
Camilo Velázquez-Rodríguez, Eleni Constantinou and Coen De Roover
March 15th, 2022
@cvelazquezr
7. Introduction
[Slide graphic contrasting "API Features", "Features", and "Use of Features" via motivating questions: Persistent or thread-safe? Methods for sorting? Methods for reversing? Filter a collection? Transform collections? Single method call? Group of method calls? Static or instance method calls?]
How do libraries compare feature-wise?
8. Introduction
A feature is "… a set of data structures (i.e., fields and classes) and operations (i.e., functions and methods) participating in the realisation of a (…) functionality." [1] The definition used in this work is "… a set of API calls with a corresponding name." [2]
[1] Antoniol, G. & Guéhéneuc, Y.-G. Feature Identification: A Novel Approach and a Case Study. 21st IEEE International Conference on Software Maintenance (ICSM'05), 1–10 (2005).
[2] Kanda, T., Manabe, Y., Ishio, T., Matsushita, M. & Inoue, K. Semi-Automatically Extracting Features from Source Code of Android Applications. IEICE Transactions on Information and Systems, 2857–2859 (2013).
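Definition [2] suggests a direct data representation: a feature is nothing more than a named set of API calls. A minimal sketch (our illustration, not the authors' code; the example feature is hypothetical):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Feature:
    """A library feature in the sense of [2]: a set of API calls with a name."""
    name: str
    api_calls: frozenset = field(default_factory=frozenset)

# Hypothetical example for Guava's Ordering API:
sorting = Feature("sorting collections",
                  frozenset({"Ordering.natural", "Ordering.sortedCopy"}))
```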
9. Approach
[Pipeline diagram; reconstructed from its labels, the steps are: (1) select a library (via its Maven groupID and artifactID); (2) collect library information, including the public class names of the library; (3) collect answers from Stack Overflow; (4) filter answers with code; (5) filter answers with API usages, yielding the API usage and textual information of each answer; (6) text vectorisation; (7) combine vectors; (8) calculate similarity, producing a similarity matrix; (9) cluster the SO usages; (10) selection of clusters; (11) name clusters, yielding the library's features.]
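Read end to end, the diagram is a linear pipeline. The following Python skeleton is a hedged rendering of the eleven steps; every helper name is ours, standing in for a diagram stage rather than reproducing the authors' code:

```python
# Stubs standing in for the diagram's stages (not the authors' code):
def collect_public_class_names(library): ...
def collect_so_answers(library): ...
def filter_answers_with_code(answers): ...
def filter_answers_with_api_usages(answers, class_names): ...
def vectorise_text(texts): ...
def combine_vectors(text_vectors, usage_info): ...
def calculate_similarity(vectors): ...
def cluster_usages(similarity_matrix): ...
def select_clusters(clusters): ...
def name_clusters(clusters): ...

def uncover_features(group_id, artifact_id):
    """Linear composition of the eleven steps shown in the diagram."""
    library = (group_id, artifact_id)                                         # 1
    class_names = collect_public_class_names(library)                         # 2
    answers = collect_so_answers(library)                                     # 3
    answers = filter_answers_with_code(answers)                               # 4
    usage_info, texts = filter_answers_with_api_usages(answers, class_names)  # 5
    text_vectors = vectorise_text(texts)                                      # 6
    vectors = combine_vectors(text_vectors, usage_info)                       # 7
    similarity = calculate_similarity(vectors)                                # 8
    clusters = cluster_usages(similarity)                                     # 9
    selected = select_clusters(clusters)                                      # 10
    return name_clusters(selected)                                            # 11
```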
10. Approach
[Pipeline diagram repeated; step 1 selects a library from candidates ranging from Guava to Apache BCEL.]
11. Approach
[Pipeline diagram repeated with the same library examples (Guava, …, Apache BCEL).]
12. Approach
[Pipeline diagram repeated.]
13. Approach
[Pipeline diagram repeated.]
14. Approach
[Pipeline diagram repeated, detailing the two inputs to vectorisation.]
Textual Information: post titles, body of the question, body of the answer.
API Usage Information: method names, method names split by camel case, API calls.
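A small sketch of how the two channels of slide 14 could be assembled from one answer (our illustration, with a hypothetical example post; the camel-case split uses a regular expression):

```python
import re

def camel_split(identifier: str) -> list:
    """Split a camel-case identifier into lowercase words,
    e.g. 'sortedCopy' -> ['sorted', 'copy']."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)]

def build_channels(title: str, question: str, answer: str, api_calls: list):
    """Assemble the two channels of slide 14: textual information from the
    post, and API-usage information from the code in the answer."""
    textual = " ".join([title, question, answer])
    methods = [call.split(".")[-1] for call in api_calls]
    usage = methods + [w for m in methods for w in camel_split(m)] + api_calls
    return textual, usage

textual, usage = build_channels(
    "How to sort a list with Guava?", "I have a List<String> ...",
    "Use Ordering.natural().sortedCopy(list).",
    ["Ordering.natural", "Ordering.sortedCopy"])
```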
15. Approach
[Pipeline diagram repeated, detailing the similarity calculation: the textual information is vectorised with a TF-IDF model and compared via cosine similarity, while the API usages are compared via Jaccard similarity; both comparisons yield a similarity matrix over the Stack Overflow usages.]
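A hedged sketch of the similarity computation on slide 15 using scikit-learn: cosine similarity over TF-IDF vectors of the textual channel, Jaccard similarity over API-call sets, and a weighted average to merge the two matrices (the merging scheme is our assumption, not stated on the slide):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of API calls."""
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(texts, api_call_sets, alpha=0.5):
    """Cosine similarity over TF-IDF vectors of the textual channel plus
    Jaccard similarity over API-call sets, merged by a weighted average."""
    cos = cosine_similarity(TfidfVectorizer().fit_transform(texts))
    n = len(api_call_sets)
    jac = np.array([[jaccard(api_call_sets[i], api_call_sets[j])
                     for j in range(n)] for i in range(n)])
    return alpha * cos + (1 - alpha) * jac

sim = combined_similarity(
    ["sort a list with guava", "reverse a list with guava"],
    [{"Ordering.natural", "Ordering.sortedCopy"}, {"Lists.reverse"}])
```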
16. Approach
[Pipeline diagram repeated.]
17. Approach
[Pipeline diagram repeated; the clusters of SO usages are obtained by cutting a hierarchical cluster tree with the Dynamic Tree Cut method [3].]
[3] P. Langfelder, B. Zhang, and S. Horvath, "Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R," Bioinformatics, vol. 24, no. 5, pp. 719–720, 2008.
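The slides cite the Dynamic Tree Cut method [3] for extracting clusters from the dendrogram. The sketch below substitutes SciPy's average-linkage clustering with a fixed-height cut as a simple stand-in, so the cut strategy here is ours, not the authors':

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_usages(similarity: np.ndarray, cut: float = 0.6) -> np.ndarray:
    """Group Stack Overflow usages from a combined similarity matrix.
    The slides rely on Dynamic Tree Cut [3] to extract clusters; a
    fixed-height cut is used here as a simple stand-in."""
    distance = 1.0 - similarity
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)  # square -> condensed form
    dendrogram = linkage(condensed, method="average")
    return fcluster(dendrogram, t=cut, criterion="distance")
```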
18. Approach
[Pipeline diagram repeated, detailing the selection and naming of clusters.] Formed clusters would ideally have frequent API calls characterising the code part of the cluster; outlying usages are identified with the Local Outlier Factor [4], and cluster names are derived with the Stanford CoreNLP toolkit [5].
[4] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. "LOF: identifying density-based local outliers." In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, 2000.
[5] Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., and McClosky, D. "The Stanford CoreNLP natural language processing toolkit." In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60, 2014.
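Reference [4] suggests that outlying usages are filtered with the Local Outlier Factor. A hedged sketch with scikit-learn follows; exactly how LOF is wired into the approach is not spelled out on the slide, so this is one plausible reading:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def drop_outlier_usages(usage_vectors: np.ndarray, n_neighbors: int = 5) -> np.ndarray:
    """Keep only the usages that LOF [4] labels as inliers, so the remaining
    cluster members share frequent, characteristic API calls."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    labels = lof.fit_predict(usage_vectors)  # -1 = outlier, 1 = inlier
    return usage_vectors[labels == 1]
```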
19.–23. Approach
[Slides 19 through 23 repeat the pipeline diagram, the note on formed clusters, and references [4] and [5] from slide 18.]
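The final Name Clusters step is not detailed on these slides, so the following is only a hypothetical heuristic: split the cluster's method names by camel case (one of the listed API usage attributes) and use the most frequent tokens as a candidate feature name.

import re
from collections import Counter

def camel_split(identifier: str) -> list[str]:
    # "getTextFromPage" -> ["get", "Text", "From", "Page"]
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)

def name_cluster(method_names: list[str], top_k: int = 3) -> str:
    tokens = [t.lower() for m in method_names for t in camel_split(m)]
    return " ".join(t for t, _ in Counter(tokens).most_common(top_k))

cluster = ["getText", "extractText", "getTextFromPage", "setStartPage"]
print(name_cluster(cluster))  # e.g. "text get page"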
24. Evaluation
RQ1. Which combination of SO answer attributes produces the most cohesive clusters?
[Chart comparing cluster cohesion for the Text and Code attribute combinations.]
[6] - L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990, vol. 344.
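Reference [6] is a classic source for cluster-quality measures such as the silhouette; whether RQ1 uses exactly that measure is not stated on the slide. The sketch below shows how cohesion could be scored from a precomputed distance matrix so that different attribute combinations can be compared.

import numpy as np
from sklearn.metrics import silhouette_score

labels = np.array([0, 0, 0, 1, 1, 1])   # cluster assignment of six SO answers

# Toy precomputed distance matrix (1 - combined similarity): small distances
# within each cluster, large distances across clusters.
within, across = 0.2, 0.9
d = np.full((6, 6), across)
d[:3, :3] = within
d[3:, 3:] = within
np.fill_diagonal(d, 0.0)

# Higher silhouette = more cohesive clusters for this attribute combination.
print(round(silhouette_score(d, labels, metric="precomputed"), 3))  # ~0.78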
25. Evaluation
RQ1. Which combination of SO answer attributes produces the most cohesive clusters?
The Methods combination from the API usage information attributes produced the most cohesive clusters.
26. Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features?
[Figure: generated features.]
27. Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features?
The majority of the tutorial features are part of our uncovered features.
28. Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features?
When considered against the usages in the SO data, the matching accuracy increases.
29. Evaluation
RQ2. How similar are the automatically uncovered features to documented tutorial features?
On average, our uncovered features are very similar to those in tutorials, as indicated by the relevance score.
30. Evaluation
RQ3. To what extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects?
[Libraries studied: Guava, HttpClient, JFreeChart, …]
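How a feature is matched against client-project usage is not spelled out on the slide; the sketch below shows one plausible check, treating a feature as a set of API calls and counting it as used when a client method covers enough of those calls. The overlap measure and threshold are assumptions, not the authors' definition.

def overlaps(feature_calls: set[str], client_method_calls: set[str],
             threshold: float = 0.5) -> bool:
    # A feature counts as "used" if a client method covers at least
    # `threshold` of its API calls (assumed criterion).
    if not feature_calls:
        return False
    return len(feature_calls & client_method_calls) / len(feature_calls) >= threshold

feature = {"PDDocument.load", "PDFTextStripper.getText", "PDDocument.close"}
client_method = {"PDDocument.load", "PDFTextStripper.getText", "File.delete"}
print(overlaps(feature, client_method))  # True: 2/3 of the feature's calls appear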
31. Evaluation
RQ3. To what extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects?
The majority of our uncovered features are also used in GitHub client projects.
32. Evaluation
RQ3. To what extent do the uncovered features that do not match documented tutorial features correspond to actual API usage in client projects?
The decrease in relevance might be caused by a less-focused usage of the features in GitHub.
33. Examples of uncovered features
[Examples from PDFBox and several other libraries, grouped into good features and bad features.]