This document proposes a methodology for discovering patterns in scientific literature using a case study of digital library evaluation. It involves:
1. Classifying documents to identify relevant papers using naive Bayes classification.
2. Semantically annotating papers with concepts from a Digital Library Evaluation Ontology using the GoNTogle annotation tool. Over 2,600 annotations were generated.
3. Clustering the annotated papers into coherent groups using k-means clustering.
4. Interpreting the clusters with the assistance of the ontology to discover patterns and trends in the literature. Benchmarking tests were performed to evaluate the effectiveness of the methodology (a minimal sketch of steps 1 and 3 follows below).
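A minimal sketch of the classification and clustering steps, assuming scikit-learn and a toy corpus; the labels, features, and number of clusters are illustrative stand-ins, not the study's actual configuration (the study classifies with naive Bayes, annotates with DiLEO concepts via GoNTogle, and clusters the annotated papers with k-means):

```python
# Hypothetical sketch: naive Bayes relevance filtering (step 1) and k-means
# clustering (step 3) on toy text; the real study clusters ontology annotations.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.cluster import KMeans

# Step 1: train a naive Bayes filter on labelled examples (labels invented here).
train_texts = [
    "usability evaluation of a digital library interface",
    "log analysis for assessing digital library services",
    "query optimization in relational database engines",
    "cache coherence protocols for multicore processors",
]
train_labels = [1, 1, 0, 0]  # 1 = relevant to DL evaluation, 0 = out of scope
bow = CountVectorizer()
nb = MultinomialNB().fit(bow.fit_transform(train_texts), train_labels)

candidates = [
    "a framework for evaluating digital library usefulness",
    "hardware transactional memory performance",
]
kept = [t for t, y in zip(candidates, nb.predict(bow.transform(candidates))) if y == 1]

# Step 3: group the retained papers with k-means over TF-IDF vectors.
docs = kept + train_texts[:2]
km = KMeans(n_clusters=2, n_init=10, random_state=0)
print(dict(zip(docs, km.fit_predict(TfidfVectorizer().fit_transform(docs)))))
```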
These are the presentation slides for the joint meeting of the 134th SIG conference on Information Fundamentals and Access Technologies (IFAT) and the 112th SIG conference on Document Communication (DC) of the Information Processing Society of Japan (IPSJ), held on March 22, 2019, at Toyo University, Hakusan Campus.
Cite: Kei Kurakawa, Yuan Sun, and Satoko Ando, Applying a new subject classification scheme for a database by a data-driven correspondence, IPSJ SIG Technical Report, Vol. 2019-IFAT-134/2019-DC-112, No. 7, pp. 1-10 (2019).
More Than Just Black and White: A Case for Grey Literature References in Scie... (Aravind Sesagiri Raamkumar)
This study analyzed the referencing of grey literature in scientific papers from the ACM Digital Library and proposed techniques to boost the retrieval of grey literature in scientific paper information retrieval systems. The study found that grey literature materials were referenced in around 16% of the bibliographic references analyzed. It then proposed a boosting technique that assigns a boosting weight to increase the ranking of documents that are grey literature or reference grey literature frequently. An experiment found the boosting technique improved retrieval of grey literature for literature search queries compared to baseline techniques. The study concluded by discussing limitations and opportunities for future work applying these techniques to recommender systems.
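The paper's exact scoring formula is not given here; the following is a minimal sketch of the general idea under assumed field names and weights: grey literature documents, and documents whose references are frequently grey, receive a multiplicative boost to their retrieval score.

```python
# Hypothetical boosting sketch; alpha, beta, and the document fields are assumptions.
def boosted_score(doc, base_score, alpha=0.3, beta=0.2):
    """Promote documents that are grey literature or cite it frequently."""
    boost = 1.0
    if doc["is_grey"]:                     # the document itself is grey literature
        boost += alpha
    boost += beta * doc["grey_ref_ratio"]  # fraction of its references that are grey
    return base_score * boost

docs = [
    {"id": "p1", "is_grey": False, "grey_ref_ratio": 0.05},
    {"id": "p2", "is_grey": True,  "grey_ref_ratio": 0.40},
]
ranked = sorted(docs, key=lambda d: boosted_score(d, base_score=1.0), reverse=True)
print([d["id"] for d in ranked])  # p2 is promoted above p1 by the boost
```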
This document presents an overview of using machine learning methods for question classification. It discusses past research that has used features like words, parts of speech tags, and named entities with classifiers like SNoW and SVMs. Later works incorporated semantic features from WordNet, like hypernyms of the question's head word. The document outlines a plan to experiment with various feature types and classifiers using existing question classification datasets and open-source NLP tools. The goal is to automatically generate semantic features to improve over prior approaches.
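As a rough illustration of the planned experiments, here is a minimal question classifier using bag-of-words features and a linear SVM via scikit-learn; the training questions and coarse classes are invented, and the WordNet-derived semantic features the document discusses are omitted.

```python
# Toy question classification sketch with lexical features only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

questions = [
    "What is the capital of France ?",
    "Who wrote War and Peace ?",
    "How many moons does Jupiter have ?",
    "Where is the Eiffel Tower located ?",
]
classes = ["LOCATION", "HUMAN", "NUMERIC", "LOCATION"]

# Unigram + bigram counts feed a linear SVM, as in early lexical baselines.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(questions, classes)
print(model.predict(["Who discovered penicillin ?"]))  # expected: HUMAN
```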
Presentation made on December 7th 2016 during ICADL'16
Full text can be found at http://link.springer.com/chapter/10.1007/978-3-319-49304-6_12
Extended version can be found at https://arxiv.org/abs/1609.01415
Comparison of Techniques for Measuring Research Coverage of Scientific Papers... (Aravind Sesagiri Raamkumar)
The document compares techniques for measuring research coverage in scientific papers, including HITS, Topical and Peripheral Coverage (TPC), and Topical Coverage (TC). An experiment on papers in information retrieval found that TPC provided the best results by identifying a diverse set of papers, recent papers, and papers covering various sub-topics. TPC and TC performed better than HITS at identifying seminal papers. Combining TPC and HITS may further improve identification of survey papers. The document concludes that the integrated TPC and HITS technique is best for building initial reading lists for literature review.
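TPC and TC are the paper's own coverage measures and are not reproduced here; as background, the sketch below runs the standard HITS power iteration on a toy citation graph, where a paper's authority score grows when good hubs cite it.

```python
# Standard HITS update on a hypothetical 4-paper citation graph.
import numpy as np

papers = ["A", "B", "C", "D"]
# A[i][j] = 1 if paper i cites paper j (invented links)
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

hubs = np.ones(len(papers))
for _ in range(50):                       # power iteration until convergence
    auths = A.T @ hubs                    # cited by good hubs -> good authority
    auths /= np.linalg.norm(auths)
    hubs = A @ auths                      # citing good authorities -> good hub
    hubs /= np.linalg.norm(hubs)

print(dict(zip(papers, auths.round(3))))  # C, cited by the strongest hubs, ranks top
```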
Using DITA's Subject Scheme Support for Educational Assessment Content (Edwina Lui)
The document discusses Kaplan Test Prep's use of subject schemes to classify educational assessment content. It proposes using multiple subject schemes to define hierarchical classifications for tests, sections, and question types. This includes subject definition documents to specify classification values and labels, as well as relationship tables to define permissible hierarchies. Metadata elements would then apply these classification values to content to drive functionality like search and adaptive testing. The approach aims to standardize classifications while accounting for variations between tests.
A task-based scientific paper recommender system for literature review and ma... (Aravind Sesagiri Raamkumar)
My PhD oral defense presentation (as of Oct 3rd 2017)
The dissertation can be requested at this link https://www.researchgate.net/publication/323308750_A_task-based_scientific_paper_recommender_system_for_literature_review_and_manuscript_preparation
What papers should I cite from my reading list? User evaluation of a manuscri... (Aravind Sesagiri Raamkumar)
Long paper presented during the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016)
The document provides guidance on writing scientific research papers. It discusses the objectives of scientific research which include observing phenomena, developing hypotheses, testing hypotheses through experiments, and explaining results. It also outlines the typical structure of a research paper, including the introduction, literature review, methodology, results, discussion, and conclusion sections. Tips are provided for writing each section effectively, such as stating the research question or hypothesis in the introduction and interpreting findings in the discussion section.
Researcher KnowHow session presented by Ruaraidh Hill PhD MSc FHEA, Lecturer in Evidence Synthesis at the University of Liverpool, and Angela Boland MSc PhD PGCert (LTHE), Director, Liverpool Reviews & Implementation Group.
CDISC is a non-profit organization that establishes clinical research data standards to support data acquisition, exchange, and submission. It has developed several standards including CDASH, which aims to standardize data collection fields across clinical trials to streamline data analysis and reduce errors. CDASH defines a set of common safety domains and variables that can be collected consistently across studies in a standardized way. This helps analyze data more efficiently, reduces training time for sites, and decreases potential errors from inconsistent data collection.
Researcher KnowHow session at the University of Liverpool on 15th March 2021, presented by Ruaraidh Hill, Angela Boland, and Michelle Maden.
The session provided advice on conducting key activities in a systematic review. It can also serve as a 'top-up' to the three-part series of workshops about systematic reviews which ran earlier in the academic session. It is suitable for postgraduates and staff planning or doing a systematic review for the first time, or who wish to brush up on their knowledge.
It focuses on key steps in doing a systematic review, offering brief practical advice, showcasing tools, and sharing top tips for progressing your review.
Researcher KnowHow session presented by Judith Carr, Research Data Manager and co-ordinated by Gary Jeffers, Research Data Officer at University of Liverpool Library.
This document provides information about evidence-based resources available through an e-library. It begins with an overview of key e-library databases like DynaMed, Nursing Reference Center, and STAT!Ref. It then discusses how to effectively search within databases using Boolean operators, truncation, and other search techniques. The document concludes by emphasizing the value of evidence-based resources for supporting high-quality patient care and decision-making.
The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co... (Jon Phillips)
Five years ago, an accessible worldwide digital library or archive existed only in the land of fairytales. With the rise of Free and Open culture, decreased hardware costs, and cheap Internet access in some countries of the world, it became possible to actualize these myths on a grand scale on the Internet. However, the legal hurdles scattered by copyright across jurisdictions around the world erected a major barricade to accessing knowledge. Copyright law has generally increased confusion around how creative works may be used. With the introduction of Creative Commons in 2003, these issues were addressed with clearly explained copyright licenses, a clear public domain dedication, and a brilliant international community spanning 46+ international jurisdictions supporting the commons.
This presentation surveys the major digital archiving initiatives, museums, and digital libraries around the world which use Creative Commons licenses. It also presents Creative Commons' involvement with the Open Library (http://demo.openlibrary.org) to create a site where books and other media are edited collaboratively, wiki-style, by people around the world to help determine the copyright status of these works. The myths of lore are to be debunked.
The document describes a library management system created by Purbanchal University students to systematically manage library records and transactions. The system allows users to add, modify, delete, search, issue, and deposit books. It also tracks member details. The system aims to make the library management process faster and less error-prone compared to a manual system. It uses functions, header files, and other programming elements to manage the database of books and members. Some areas for improvement include tracking whether students have returned all books before deleting records and calculating overdue fines.
This document discusses different library management systems including indigenous, barcode, and RFID systems. The indigenous system uses Excel to manage tasks like member registration, book purchasing, and inventory. The barcode system uses barcodes on books and member cards to automate circulation. RFID uses radio frequency technology to track library assets and automate check-in, search, check-out, and return of materials without human intervention. Both barcode and RFID systems provide benefits like faster transactions and improved security but also have some limitations.
The document provides information about a library management system project for an education institute. It discusses the need to automate the library's processes to make it more efficient. Some key points include:
- The existing manual system has limitations like time consumption, difficulty in searching and maintaining records.
- The new system aims to address these issues and make operations like book searching, issuing and returning faster and easier for students and staff.
- It will also facilitate generating various reports and calculating late fees for overdue books.
This document is a project report submitted by Aaditya Shah for his AISSCE examination in 2013-2014 on a Library Management System created under the guidance of Sanjay Parmar. The report includes a declaration by Aaditya Shah, an acknowledgement thanking those who supported the project, and a certificate signed by the principal and teacher confirming the project fulfillment. The report then provides an introduction to the Library Management System software created, an analysis of the existing manual library system and benefits of the proposed computerized system, a feasibility analysis, hardware and software requirements, descriptions of the system interface and design.
Library management system project SRS documentation.doc (jimmykhan)
The document describes a library management system created in Java. It has four main modules: inserting data into the database, extracting data from the database, generating reports on borrowed and available books, and a search facility. The proposed system automates library processes like adding members and books, searching, borrowing and returning books. This makes transactions faster and reduces errors compared to the manual existing system. The system was implemented using Java, MS Access for the database, and designed to run on Windows operating systems. Testing was done to check functionality and ensure all requirements were met.
Postulate Approach to Library Classification
Normative Principles
Three Planes of Work
Modes of Formation of Subjects
Systems Approach to the Study of Subjects
Depth Classification
Classification in Electronic Environment
Classificatory basis for metadata
Knowledge Organization
This document outlines a course on Knowledge Representation (KR) on the Web. The course aims to expose students to challenges of applying traditional KR techniques to the scale and heterogeneity of data on the Web. Students will learn about representing Web data through formal knowledge graphs and ontologies, integrating and reasoning over distributed datasets, and how characteristics such as volume, variety and veracity impact KR approaches. The course involves lectures, literature reviews, and milestone projects where students publish papers on building semantic systems, modeling Web data, ontology matching, and reasoning over large knowledge graphs.
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly... (Angelo Salatino)
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles, yielding a significant improvement over alternative methods.
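A minimal sketch of the syntactic half of such a classifier: scan n-grams from the paper's metadata for exact matches against ontology concept labels. The four-label "ontology" below is a stand-in; the actual CSO Classifier also performs semantic matching with word embeddings and expands results through the CSO hierarchy.

```python
# Toy ontology-label matching over paper metadata; labels are invented.
ontology_labels = {"semantic web", "information retrieval",
                   "ontology matching", "machine learning"}

def detect_topics(title, abstract, keywords, max_n=3):
    text = " ".join([title, abstract, " ".join(keywords)]).lower()
    tokens = text.split()
    found = set()
    for n in range(1, max_n + 1):          # compare every n-gram to the labels
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in ontology_labels:
                found.add(gram)
    return found

print(detect_topics(
    title="Ontology matching for the semantic web",
    abstract="We apply machine learning to align ontologies.",
    keywords=["information retrieval"]))
```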
Deduplication and Author-Disambiguation of Streaming Records via Supervised M... (Spark Summit)
Here we present a general supervised framework for record deduplication and author disambiguation via Spark. This work differentiates itself in three ways:
- The application of Databricks and AWS makes this a scalable implementation, with compute resources comparably lower than traditional legacy technology using big boxes 24/7. Scalability is crucial, as Elsevier's Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts spanning a few hundred years.
- We create a fingerprint for each piece of content using deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TF-IDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities, such as documents, authors, users, and journals. This standard representation can simplify the recommendation problem into a pairwise similarity search and hence offer a basic recommender for cross-product applications where no dedicated recommender engine is available.
- Traditional author-disambiguation or record-deduplication algorithms are batch processes with little to no training data. However, we have roughly 25 million authorships that are manually curated or corrected upon user feedback, so it is crucial to maintain historical profiles; we have therefore developed a machine learning implementation that handles data streams and processes them in mini-batches or one document at a time.
We will discuss how to measure the accuracy of such a system, how to tune it, and how to process the raw output of the pairwise similarity function into final clusters. Lessons learned from this talk can help any company that wants to integrate its data or deduplicate its user, customer, or product databases.
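A minimal sketch of the fingerprinting step under toy assumptions: each record is reduced to the mean of its word vectors, and candidate duplicates are compared by cosine similarity. The 4-dimensional vectors are invented; the talk trains word2vec or deep encoders on Scopus-scale data.

```python
# Hypothetical word vectors stand in for a trained word2vec model.
import numpy as np

emb = {
    "deep":     np.array([0.9, 0.1, 0.0, 0.2]),
    "learning": np.array([0.8, 0.2, 0.1, 0.1]),
    "neural":   np.array([0.7, 0.3, 0.0, 0.2]),
    "networks": np.array([0.6, 0.2, 0.1, 0.3]),
    "protein":  np.array([0.0, 0.9, 0.8, 0.1]),
    "folding":  np.array([0.1, 0.8, 0.9, 0.0]),
}

def fingerprint(text):
    """Average the known word vectors of a record's text."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a = fingerprint("Deep learning neural networks")
b = fingerprint("Neural networks deep learning")
c = fingerprint("Protein folding")
print(cosine(a, b), cosine(a, c))  # near-duplicates score far higher
```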
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria,
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of "Research Objects" as typed packages of all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (as we release software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded components, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, Future Generation Computer Systems, 29(2), 2013, pp. 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
This document provides an overview of the Next Generation Science Standards. It explains that the standards were developed by Achieve in partnership with other organizations to create science standards focused on big ideas. It describes the Framework for K-12 Science Education on which the standards are based, which outlines three dimensions for each standard. It then explains the organization and structure of the Next Generation Science Standards, comparing them to previous standards.
This document provides an overview of research methodology. It defines research as a systematic, careful investigation to gain new knowledge. The objectives of research include gaining new insights, accurately portraying characteristics of groups, analyzing associations between variables, and examining causal relationships. Research methods are the techniques used, while research methodology is the systematic approach. Good research is systematic, logical, empirical, and replicable. The research process involves defining the problem, reviewing literature, formulating hypotheses, designing the study, collecting and analyzing data, interpreting results, and reporting findings. Defining the research problem clearly is crucial. Literature review helps refine the problem, justify the topic, and identify appropriate methodologies.
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm... (Khirulnizam Abd Rahman)
Application of Ontology in Semantic Information Retrieval
by Prof Shahrul Azman from FSTM, UKM
Presentation for MyREN Seminar 2014
Berjaya Hotel, Kuala Lumpur
27 November 2014
A systematic literature review is a formal methodology to systematically identify and evaluate relevant research on a topic. It involves developing a review protocol and search strategy, screening studies for inclusion, assessing study quality, extracting data, and synthesizing findings. The process is more rigorous than a narrative review and aims to minimize bias by being comprehensive and transparent. Key aspects of the systematic review process include developing review questions, searching literature databases and other sources, selecting studies using inclusion/exclusion criteria, assessing study quality, extracting relevant data, and synthesizing the results.
An OWA-Based Multi-Criteria System For Assigning Reviewers (Dereck Downing)
This paper proposes an automated multi-criteria system for assigning reviewers to papers submitted to conferences based on reviewer expertise profiles. The system aggregates information from multiple public sources to create profiles for reviewers based on 12 variables related to their qualifications and experience. It then uses fuzzy logic techniques to match papers to reviewers based on keywords while avoiding conflicts of interest. The system is evaluated on a dataset from a past conference, demonstrating promising results in utilizing publicly available data to inform the reviewer assignment process.
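For reference, the OWA (Ordered Weighted Averaging) operator named in the title sorts the criterion scores before applying the weights, so each weight attaches to a rank position rather than to a fixed criterion. A minimal sketch with invented criteria and weights:

```python
# OWA aggregation: weights apply to sorted scores, not to specific criteria.
def owa(scores, weights):
    assert abs(sum(weights) - 1.0) < 1e-9 and len(scores) == len(weights)
    return sum(w * s for w, s in zip(weights, sorted(scores, reverse=True)))

# A reviewer's scores on hypothetical criteria: topical match, seniority,
# past reviewing record, citation record.
reviewer = [0.9, 0.4, 0.7, 0.6]
weights = [0.4, 0.3, 0.2, 0.1]   # emphasis on the strongest criteria
print(owa(reviewer, weights))    # 0.4*0.9 + 0.3*0.7 + 0.2*0.6 + 0.1*0.4 = 0.73
```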
RQ1. What are the differences between e-commerce and s-commerce?
RQ2. What are the characteristics of s-commerce?
RQ3. What are the activities of s-commerce?
RQ4. What are the research themes that are addressed in s-commerce studies?
RQ5. What are the limitations and gaps in current research of s-commerce?
The document outlines the procedures for conducting a systematic literature review on social commerce (s-commerce), including developing research questions, defining a search strategy, selecting studies, assessing study quality, and extracting and synthesizing data. The review aims to understand the key concepts of s-commerce, explore common research themes, and identify the limitations and gaps in current s-commerce research.
These are the presentation slides for the BigScholar 2019 workshop, held in conjunction with CIKM 2019 (the ACM International Conference on Information and Knowledge Management) on Nov 7, 2019, at CNCC, Beijing, China.
Citation: Kurakawa K, Sun Y and Ando S (2020) Application of a Novel Subject Classification Scheme for a Bibliographic Database Using a Data-Driven Correspondence. Front. Big Data 2:48. doi: 10.3389/fdata.2019.00048
This document outlines the stages and processes for conducting a systematic literature review to evaluate discovery tools in open access repositories. It discusses the planning stage, which involves identifying the need and developing protocols and research questions. The conducting stage is described as identifying relevant research, selecting primary studies, evaluating quality, extracting data, and summarizing findings. Specific inclusion/exclusion criteria and search terms are provided as examples. The reporting stage involves answering the research questions based on common data extracted from the selected papers.
Systematic Literature Reviews: Concise Overview (youkayaslam)
This document provides an overview of a workshop on systematic approaches to literature reviewing led by Dr. Mark Matthews. The workshop explores elements of the systematic review process and how they can be adapted for thesis literature reviews and keeping up with literature through a PhD. It discusses formulating review questions, systematically searching literature databases and other sources, selecting studies, critically appraising research, analyzing and synthesizing findings, and structuring the writing of literature reviews. Challenges of literature reviews and additional resources are also presented.
The document provides an overview of key concepts in research methodology. It discusses definitions of research, objectives of research such as gaining new insights or testing hypotheses. It covers research design principles like defining variables and controlling for extraneous factors. It also outlines different research designs for exploratory, descriptive and experimental studies. Sample design concepts involving probability and non-probability sampling are presented. Methods of primary data collection like observation, interviews and questionnaires are explained. Finally, it provides guidance on constructing questionnaires and successful interviewing techniques.
This document discusses bibliometrics and their use at Cardiff University. It begins with an introduction to bibliometric measures like citations, impact factors, and altmetrics. It then discusses how bibliometric data is presented in Cardiff's institutional repository and how it was used to provide context for research evaluations in the UK's REF2014 assessment exercise. The document concludes by outlining Cardiff's trial of the SciVal analytics tool and plans for a new research information system to better integrate bibliometric and altmetric data.
Improving Semantic Search Using Query Log Analysis (Stuart Wrigley)
Despite the attention Semantic Search is continuously gaining, several challenges affecting tool performance and user experience remain unsolved. Among these are: matching user terms with the search space, adopting view-based interfaces in the Open Web, and supporting users while building their queries. This paper proposes an approach to move a step forward towards tackling these challenges by creating models of usage of Linked Data concepts and properties, extracted from semantic query logs as a source of collaborative knowledge. We use two sets of query logs from the USEWOD workshops to create our models and show the potential of using them in the mentioned areas.
Using a keyword extraction pipeline to understand concepts in future work sec... (Kai Li)
This document describes a study that uses natural language processing and text mining techniques to identify future work statements in scientific papers and extract keywords from those statements. The researchers developed a multi-step pipeline to first identify the future work section, then select future work sentences within that section. They used rules and algorithms to identify sentences discussing future work. Keywords were then extracted from the selected sentences using the RAKE algorithm. An analysis found that 31.4% of papers contained future work statements, with medical science papers having the highest overlap between future work and title-abstract keywords. The researchers hope this work is a first step toward predicting future research topics.
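A minimal sketch of RAKE-style scoring as used in the pipeline's last step: candidate phrases are obtained by splitting on stopwords and punctuation, and each phrase is scored by the sum of its words' degree-to-frequency ratios. The stopword list is abbreviated and the input sentence invented.

```python
# Simplified RAKE-style keyword extraction; STOP is a tiny stand-in stopword list.
import re
from collections import defaultdict

STOP = {"we", "will", "to", "the", "of", "in", "and", "a", "for", "our", "as"}

def rake_keywords(text):
    words = re.split(r"[^a-z]+", text.lower())
    phrases, cur = [], []
    for w in words:                         # split into stopword-free phrases
        if not w or w in STOP:
            if cur:
                phrases.append(cur)
                cur = []
        else:
            cur.append(w)
    if cur:
        phrases.append(cur)
    freq, degree = defaultdict(int), defaultdict(int)
    for p in phrases:                       # word co-occurrence statistics
        for w in p:
            freq[w] += 1
            degree[w] += len(p)
    ranked = sorted(phrases, reverse=True,
                    key=lambda p: sum(degree[w] / freq[w] for w in p))
    return [" ".join(p) for p in ranked]

print(rake_keywords("In future work we will extend the keyword extraction "
                    "pipeline to multilingual scientific papers"))
```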
Similar to Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology:
Archival Metadata: Standards and Management on the World Wide Web (Giannis Tsakonas)
From the 1970s to the present, the International Council on Archives has been creating and providing the community of archives and archivists with a series of standards for developing archival finding aids and related catalogues and indexes. The aim of the standards is a common understanding, approach, and uniformity in the creation of catalogues and authority records, and in the subject description of archives, their structure, and their content.
The World Wide Web has become one of the most important means of disseminating information, and the explosive growth of application development technologies in its environment has led to its adoption by many communities. In this context, archivists are called upon to encode their metadata and make it interoperable in a global environment for information and knowledge management in which all scholarly communities coexist.
The aim of the seminar is to present (a) the basic standards for managing archival information and (b) the technological background that determines how archival information can be exchanged with, interoperate with, and be linked to the information produced by other organisations and communities with which archival services are directly related.
The seminar is addressed to employees of public and private archival institutions, students and graduates in archival science, librarians, and university and technological institute graduates with related professional and scholarly interests.
The seminar is part of the activities of the Database and Information Systems Group of the Laboratory on Digital Libraries and Electronic Publishing of the Department of Archives, Library Science and Museology of the Ionian University. It is organised in the context of the 21st International Conference on Theory and Practice of Digital Libraries and will take place at the Grand Hotel Palace, Monastiriou 305, Thessaloniki, on Tuesday 19 September 2017, 14.00-17.00.
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation... (Giannis Tsakonas)
Digital library evaluation is characterised as an interdisciplinary and multidisciplinary domain, posing a set of challenges to the research communities that intend to utilise and assess criteria, methods and tools. The amount of scientific production published in the field hinders and disorientates researchers interested in the domain. Researchers need guidance in order to exploit the considerable amount of data and the diversity of methods effectively, as well as to identify new research goals and develop their plans for future work. This paper proposes a methodological pathway to investigate the core topics of the digital library evaluation domain, author communities and their relationships, as well as the researchers who significantly contribute to major topics. The proposed methodology exploits topic modelling algorithms and network analysis on a corpus consisting of the digital library evaluation papers presented at the JCDL, ECDL/TPDL and ICADL conferences in the period 2001-2013.
Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19
Session: Digital Library Evaluation
Time: Thursday, 08/Sep/2016, 9:00am - 10:30am
Chair: Claus-Peter Klas
Location: Blauer Saal, Hannover Congress Centrum
Increasing traceability of physical library items through Koha: the case of S... (Giannis Tsakonas)
Presentation in KohaCon2016, the major event of Koha community, on May 31, 2016. The Library & Information Center, University of Patras, Greece has developed the SELIDA framework, which integrates a set of standardized and widespread library technologies in order to increase the identification and traceability of physical items, such as books. The framework makes use of RFID tags in order to assign unique identification marks, in the form of URIs that can be globally exchanged. The framework has been implemented in the fully translated and customized Koha installation of our Library and its core services support checking in/out of books and browsing of history transactions with geospatial visualization. Its use can support transactions between various libraries or branches of the same library. The proposed presentation will describe the architecture of the framework and how it connects to Koha, as well as the challenges we faced during its development.
We were group no 2: notes for the MLAS2015 workshop (Giannis Tsakonas)
Summary note of the discussion of group no 2 in the IFLA MLAS 2015 workshop in Athens, March 12, 2015, involving librarians (from all around the world), book palaces and … Canadian rock groups.
Libraries & Culture: the obvious and the self-evident (Giannis Tsakonas)
Presentation at the panel "Culture and new technologies: from the tangible to the digital world", organised in the context of the "Open Floor" section of the 2014 Development Forum (Sunday 23 November 2014, 20:30-22:00, Astir Hotel, Hall II).
{Tech}changes: the technological state of Greek Libraries (Giannis Tsakonas)
The document summarizes technological changes in Greek libraries over recent years. While Greek libraries were early adopters of technological changes, penetration of eBooks and sophisticated business models remains limited. However, libraries have increasingly embraced open access, open source, and open data initiatives. Projects like Kallipos provide enhanced academic textbooks online. Funding from the EU and Greece has supported centralized technological solutions and opportunities for public/private cooperation to make technology more affordable and transform literacy programs.
Affective relationships between users & libraries in times of economic stress (Giannis Tsakonas)
This study used the Stimulus-Organism-Response framework to identify the critical parameters that govern the affective relationships between Greek academic libraries and their users during times of economic stress. A survey of 950 library users found that social cues like willingness, kindness, and knowledge had the strongest impact on users' emotions. Emotions like satisfaction, confidence and safety positively correlated with library usage. The findings suggest that creating a welcoming environment and providing friendly service are important for positively influencing users' feelings about the library. Further research is needed to explore how these relationship factors interact and influence social and systemic interactions.
Presentation for the seminar "Library Data in the future digital environment - FRBR and Linked Data".
The seminar was organised by the Libraries of the Law School of the National and Kapodistrian University of Athens and of the University of Piraeus on 18 and 19 June 2012, curated by the Database and Information Systems Group of the Laboratory on Digital Libraries and Electronic Publishing of the Department of Archives and Library Science of the Ionian University.
Instructors:
- Manolis Peponakis (MLIS)
- Dr. Michalis Sfakakis
- Dr. Christos Papatheodorou
Policies for geospatial collections: a research in US and Canadian academic l... (Giannis Tsakonas)
This document summarizes a research study on geospatial collection development policies in US and Canadian academic libraries. It includes the session overview, research framework, definitions, literature review, objectives, methodology, findings, and conclusions. The methodology involved analyzing the websites of 21 academic libraries for their geospatial policies. The findings show variability in policies but many included general information, collection details, and references to open data. The conclusions are that policies lack homogeneity and more research is needed on policies in other countries.
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar... (Giannis Tsakonas)
This document discusses developing a metadata model called ARMOS (Architecture Metadata Object Schema) for describing historic buildings. It covers traditional flat metadata descriptions of architecture, examining relationships between works and images/other works. ARMOS aims to group related buildings logically and connect them to facilitate discovery. It draws from architecture theories on morphology, typology and patterns. The conceptual model identifies entities, relationships and attributes. ARMOS is a harmonization profile combining descriptive, structural, administrative and technical metadata from various sources. Issues around terminology, extensions and interoperability are discussed.
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga... (Giannis Tsakonas)
This document summarizes a workshop on digital information management that took place in April 2012 in Corfu, Greece. It discusses some of the challenges of information retrieval when using natural language queries, including problems of ambiguity, context, and the use of knowledge organization systems and query expansion to help address these challenges. The role of user models and evaluation in understanding real language use is also mentioned.
The document summarizes a path-based approach for storing and querying multidimensional XML (MXML) data in a relational database. MXML extends XML to represent data with different facets under different contexts. The approach stores MXML nodes in separate tables based on their type and uses a path table and Dewey labeling for indexing. It represents contexts using ordered worlds and binary vectors. It also defines Multidimensional XPath (MXPath) to query MXML data using both explicit and inherited contexts.
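As a reference point for the indexing idea named above, here is a minimal sketch of Dewey labeling on a generic node tree: each node's label extends its parent's label with the node's sibling position, so ancestor and document-order relations can be recovered from the labels alone. The MXML-specific context machinery is not modeled here.

```python
# Dewey labeling sketch on a toy tree of (tag, children) tuples.
def dewey_labels(tree, prefix="1"):
    """Yield (label, tag) pairs; a child's label appends its sibling index."""
    tag, children = tree
    yield prefix, tag
    for i, child in enumerate(children, start=1):
        yield from dewey_labels(child, f"{prefix}.{i}")

doc = ("book", [("title", []), ("author", [("name", []), ("email", [])])])
for label, tag in dewey_labels(doc):
    print(label, tag)
# 1 book / 1.1 title / 1.2 author / 1.2.1 name / 1.2.2 email
# "1.2" is a prefix of "1.2.1", so author is an ancestor of name.
```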
Library Data in the future digital environment - FRBR and Linked Data (Giannis Tsakonas)
Presentation for the seminar "Library Data in the future digital environment - FRBR and Linked Data".
The seminar was organised by the Library and Information Center of the University of Patras, on whose premises it took place on Friday 3 February 2012, curated by the Database and Information Systems Group of the Laboratory on Digital Libraries and Electronic Publishing of the Department of Archives and Library Science of the Ionian University.
Instructors:
- Manolis Peponakis (MLIS)
- Dr. Michalis Sfakakis
- Dr. Christos Papatheodorou
This document discusses open bibliographic data and the Open Bibliographic Principles initiative. It provides background on exchanging bibliographic data and reasons for making it open, such as freeing access, facilitating collaboration, and advancing research. Complications discussed include proprietary attitudes and loss of provenance over time. The document also covers topics such as using bibliographic data as linked open data, navigating it, examples like Libris, applicable licenses, and the E-LIS experience in adopting an open license.
Evaluation is a vital research interest in the digital library domain, as exhibited by the growth of the literature in the main conferences and journals. However, navigating this extended corpus is difficult. For these reasons the DiLEO ontology has been developed to assist the exploration of important concepts and the discovery of trends in the evaluation of digital libraries. DiLEO is a domain ontology that conceptualizes the DL evaluation domain by correlating its key entities and providing reasoning paths that support the design of evaluation experiments.
E-LIS, the electronic archive for Librarianship and Information Scie...Giannis Tsakonas
Presentation in the 16th Panhellenic Conference of Academic Libraries.
Frantzi, M., Andreou, A.K., Tsakonas, G., et al. E-LIS, the electronic archive for Librarianship and Information Science: ways of its exploitation by the Greek library community, 2007. In 16th Panhellenic Conference of Academic Libraries, Piraeus (GR), 1-3 October 2007.
Alternative location at:
Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology
1. Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology
Eleni Afiontzi,¹ Giannis Kazadeis,¹ Leonidas Papachristopoulos,² Michalis Sfakakis,² Giannis Tsakonas,² Christos Papatheodorou²
13th ACM/IEEE Joint Conference on Digital Libraries, July 22-26, Indianapolis, IN, USA
1. Department of Informatics, Athens University of Economics & Business
2. Database & Information Systems Group, Department of Archives & Library Science, Ionian University
10. aim & scope of research
• To propose a methodology for discovering patterns in the scientific literature.
• Our case study is performed in the digital library evaluation domain and its conference literature.
• We ask how we can:
- select relevant studies,
- annotate them, and
- discover these patterns
in an effective, machine-operated way that yields reusable and interpretable data.
16. why
• Abundance of scientific information
• Limitations of existing tools (e.g., in reusability)
• Lack of contextualized analytic tools
• Supervised automated processes
26. panorama
1. Document classification to identify relevant papers
- We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011.
2. Semantic annotation processes to mark up important concepts
- We use a schema for semantic annotation, the Digital Library Evaluation Ontology, and a semantic annotation tool, GoNTogle.
3. Clustering to form coherent groups (K=11)
4. Interpretation with the assistance of the ontology schema
• During this process we perform benchmarking tests to qualify specific components to effectively automate the exploration of the literature and the discovery of research patterns.
40. training phase
• The aim was to train a classifier to identify relevant papers.
• Categorization
- two researchers categorized, a third one supervised
- descriptors: title, abstract & author keywords
- raters' agreement: 82.96% for JCDL, 78% for ECDL
- inter-rater agreement: moderate levels of Cohen's Kappa
- 12% positive vs. 88% negative
• Skewness of the data was addressed via resampling (see the sketch below):
- under-sampling (Tomek Links)
- over-sampling (random over-sampling)
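The slides do not name the resampling implementation; a minimal sketch of the Tomek-links under-sampling plus random over-sampling pipeline, assuming the Python imbalanced-learn library, could look as follows.

```python
# Sketch: rebalancing the 12%/88% skewed training set, assuming the Python
# imbalanced-learn library; the slides do not name the actual implementation.
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import TomekLinks

def rebalance(X, y):
    """Remove Tomek links, then randomly over-sample the minority class."""
    X_tl, y_tl = TomekLinks().fit_resample(X, y)               # under-sampling
    return RandomOverSampler(random_state=0).fit_resample(X_tl, y_tl)
```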
50. corpus definition
• Classification algorithm: Naïve Bayes
• Two sub-sets: a development set (75%) and a test set (25%)
• Ten-fold validation: the development set was randomly divided into 10 equal parts; 9/10 used as the training set and 1/10 as the test set (see the sketch below).
[Figure: ROC plot (tp rate vs. fp rate) for the Development and Test sets]
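A minimal sketch of this evaluation setup, assuming scikit-learn (the actual tooling is not stated on the slides); the feature extraction step is illustrative.

```python
# Sketch: 75/25 development/test split with ten-fold cross-validation on the
# development set, assuming scikit-learn; feature extraction is illustrative,
# not taken from the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB

def train_and_validate(texts, labels):
    """texts: raw descriptors (title + abstract + keywords); labels: 0/1."""
    X = CountVectorizer(stop_words="english").fit_transform(texts)
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, labels, test_size=0.25, random_state=0)
    clf = MultinomialNB()
    cv_scores = cross_val_score(clf, X_dev, y_dev, cv=10)  # ten-fold validation
    clf.fit(X_dev, y_dev)
    return cv_scores.mean(), clf.score(X_test, y_test)     # dev CV vs. held-out test
```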
57. the schema - DiLEO
• DiLEO aims to conceptualize the DL evaluation domain by exploring its key entities, their attributes and their relationships.
• A two-layered ontology:
- Strategic level: a set of classes related to the scope and aim of an evaluation.
- Procedural level: classes dealing with practical issues.
64. the instrument - GoNTogle
• We used GoNTogle to generate an RDFS knowledge base (a sketch of such statements follows below).
• GoNTogle uses the weighted k-NN algorithm to support either manual or automated ontology-based annotation.
• http://bit.ly/12nlryh
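GoNTogle's exact output vocabulary is not shown here; the sketch below only illustrates what an RDFS annotation statement could look like, using rdflib with hypothetical namespace and property names.

```python
# Sketch: what an RDFS-style annotation statement might look like, built with
# rdflib; the namespaces and property names below are hypothetical
# placeholders, not GoNTogle's actual output vocabulary.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

DILEO = Namespace("http://example.org/dileo#")       # hypothetical namespace
ANNO = Namespace("http://example.org/annotation#")   # hypothetical namespace

g = Graph()
g.add((ANNO.Annotation, RDF.type, RDFS.Class))
ann = ANNO["a42"]
g.add((ann, RDF.type, ANNO.Annotation))
g.add((ann, ANNO.onDocument, URIRef("http://example.org/papers/42")))
g.add((ann, ANNO.hasClass, DILEO.Metrics))           # an assigned DiLEO subclass
g.add((ann, ANNO.score, Literal(0.61)))
print(g.serialize(format="turtle"))
```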
70. the process - 1/3
• GoNTogle estimates a score for each class/subclass, calculating its presence in the k nearest neighbors (one possible formalization is sketched below).
• We set a score threshold above which a class is assigned to a new instance (optimal score: 0.18).
• The user is presented with a ranked list of the suggested classes/subclasses and their scores, ranging from 0 to 1.
• 2,672 annotations were manually generated.
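One plausible formalization of this scoring step, read from the slide's description rather than from GoNTogle itself: a similarity-weighted k-NN vote with the 0.18 cutoff.

```python
# Sketch: similarity-weighted k-NN class scoring with a score threshold, as
# one plausible reading of the slide's description of GoNTogle (not its code).
import numpy as np

def knn_class_scores(sims, neighbor_labels, k=5):
    """sims: np.ndarray of similarities between the new document and every
    training document; neighbor_labels: per-document sets of DiLEO classes.
    Returns class -> score normalized to [0, 1]."""
    top = np.argsort(sims)[::-1][:k]                 # the k nearest neighbors
    scores = {}
    for i in top:
        for c in neighbor_labels[i]:
            scores[c] = scores.get(c, 0.0) + sims[i]
    total = float(sum(sims[i] for i in top)) or 1.0
    return {c: s / total for c, s in scores.items()}

def assign_classes(scores, threshold=0.18):          # optimal score from the slides
    """Ranked list of classes whose score clears the threshold."""
    return sorted((c for c in scores if scores[c] >= threshold),
                  key=lambda c: -scores[c])
```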
76. the process - 2/3
• RDFS statements were processed to construct a new data set (removal of stopwords and symbols, lowercasing, etc.)
• Experiments ran both with un-stemmed (4,880 features) and stemmed (3,257 features) words.
• Multi-label classification via the ML framework Meka (an analogous scikit-learn sketch follows below).
• Four methods: binary representation, label powersets, RAkEL, ML-kNN
• Four algorithms: Naïve Bayes, Multinomial Naïve Bayes, k-Nearest Neighbors, Support Vector Machines
• Four metrics: Hamming Loss, Accuracy, One-error, F1 macro
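For reference, the binary-representation method with the four metrics (one-error hand-rolled) can be sketched with scikit-learn; the experiments themselves were run in Meka, so this is an analogous setup, not the authors' code.

```python
# Sketch: binary-relevance multi-label classification with the four metrics
# from the slides, assuming scikit-learn; the reported experiments used Meka.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

def one_error(Y_true, Y_scores):
    """Fraction of instances whose top-ranked label is not a true label."""
    top = np.argmax(Y_scores, axis=1)
    return float(np.mean([Y_true[i, top[i]] == 0 for i in range(len(top))]))

def evaluate_binary_relevance(X_train, Y_train, X_test, Y_test):
    """X_*: feature matrices; Y_*: 0/1 label indicator matrices."""
    clf = OneVsRestClassifier(MultinomialNB()).fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    return {
        "hamming_loss": hamming_loss(Y_test, Y_pred),
        "accuracy": accuracy_score(Y_test, Y_pred),   # subset accuracy variant
        "f1_macro": f1_score(Y_test, Y_pred, average="macro"),
        "one_error": one_error(Y_test, clf.predict_proba(X_test)),
    }
```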
82. the process - 3/3
• Performance tests were repeated using GoNTogle.
• GoNTogle's algorithm achieves good results in relation to the tested multi-label classification algorithms.
[Figure: bar chart comparing GoNTogle and Meka on Hamming Loss, Accuracy, One-Error and F1 macro; extracted value labels: 0.44, 0.27, 0.63, 0.02 and 0.39, 0.29, 0.49, 0.02]
91. clustering - 1/3
• The final data set consists of 224 vectors of 53 features
- it represents the assignment of annotations from the DiLEO vocabulary to the document corpus.
• We represent the annotated documents by 2 vector models (a construction sketch follows below):
- binary: fi has the value 1 if the subclass corresponding to fi is assigned to document m, otherwise 0.
- tf-idf: the feature frequency ffi of fi in all vectors is equal to 1 when the respective subclass is annotated to the respective document m; idfi is the inverse document frequency of feature i in the documents M.
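A sketch of the two vector models over a 0/1 annotation matrix, assuming numpy and scikit-learn; note that scikit-learn's TfidfTransformer uses a smoothed idf, close to but not identical to the plain idf described above.

```python
# Sketch: the two vector models over a 0/1 annotation incidence matrix
# (224 documents x 53 features), assuming numpy and scikit-learn.
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

rng = np.random.default_rng(0)
binary = (rng.random((224, 53)) < 0.1).astype(float)  # placeholder annotations

# model 1: the binary vectors themselves
# model 2: tf-idf, where each feature frequency is 0 or 1 and idf reflects how
# rarely the subclass is annotated across the corpus (smoothed in scikit-learn)
tfidf = TfidfTransformer().fit_transform(binary)
```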
96. clustering - 2/3
• We cluster the vector representations of the annotations by applying 2 clustering algorithms:
- K-Means: partitions M data points into K clusters. When the objective function (cost or error) was plotted for various values of K, the rate of decrease peaked for K near 11 (see the sketch below).
- Agglomerative Hierarchical Clustering: a 'bottom-up' built hierarchy of clusters.
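A sketch of the elbow-style inspection behind the K = 11 choice, assuming scikit-learn.

```python
# Sketch: elbow-style inspection of the K-Means objective (inertia) used to
# pick K, assuming scikit-learn.
from sklearn.cluster import KMeans

def objective_curve(X, k_max=20):
    """Objective function value for K = 2..k_max; look for the elbow."""
    return [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, k_max + 1)]
# The slides report the sharpest change in the rate of decrease near K = 11.
```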
103. clustering - 3/3
• We assess each feature of each cluster using the frequency increase metric (one plausible formalization is sketched below).
- it calculates the increase of the frequency of a feature fi in the cluster k (cfi,k) compared to its document frequency dfi in the entire data set.
• We select the threshold a that maximizes the F1-measure, the harmonic mean of Coverage and Dissimilarity mean.
- Coverage: the proportion of features participating in the clusters to the total number of features.
- Dissimilarity mean: the average of the distinctiveness of the clusters, defined in terms of the dissimilarity di,j between all possible pairs of clusters.
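The slide does not give the exact formulas for the frequency increase metric or the dissimilarity di,j; the sketch below assumes one plausible reading (frequency difference for the former, Jaccard distance between cluster feature profiles for the latter).

```python
# Sketch: frequency-increase thresholding with F1-based threshold selection,
# under assumed formulations: frequency increase as cf[i,k] - df[i], and
# cluster dissimilarity d[i,j] as the Jaccard distance between the clusters'
# salient-feature sets. Neither formula is given verbatim on the slides.
from itertools import combinations
import numpy as np

def salient_features(binary, labels, a):
    """binary: 0/1 document x feature matrix; labels: cluster id per document.
    Keep feature i in cluster k when cf[i, k] - df[i] >= a."""
    df = binary.mean(axis=0)                        # document frequency overall
    return {k: set(np.where(binary[labels == k].mean(axis=0) - df >= a)[0])
            for k in np.unique(labels)}

def f1_of_threshold(binary, labels, a):
    feats = salient_features(binary, labels, a)
    coverage = len(set().union(*feats.values())) / binary.shape[1]
    dis = np.mean([1 - len(p & q) / max(len(p | q), 1)
                   for p, q in combinations(feats.values(), 2)])
    return 2 * coverage * dis / max(coverage + dis, 1e-9)

# Pick the threshold a maximizing F1 over a candidate grid, e.g.:
# best_a = max(np.linspace(0, 0.5, 51), key=lambda a: f1_of_threshold(B, L, a))
```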
120. conclusions
• The patterns reflect and - up to a point - confirm the anecdotally evident research practices of DL researchers.
• Patterns have similar properties to a map.
- They can provide the main and the alternative routes one can follow to reach a destination, taking into account several practical parameters that one might not know.
• By exploring previous profiles, one can weigh all the available options.
• This approach can extend other coding methodologies in terms of transparency, standardization and reusability.