There are high expectations for Linked Government Data—the practice of publishing public sector information on the Web using Linked Data formats. This slideset reviews some of the ongoing work in the US, UK, and within W3C, as well as activities within my institute (DERI, National University of Ireland, Galway).
Existing data management approaches assume control over schema, data and data generation, which is not the case in open, de-centralised environments such as the Web. The lack of control means that there are social processes necessary to generate 'ordo ab chao' and hence a new life cycle model is necessary.
Based on our experience with Linked Data publishing and consumption over the past years, we have identified the involved parties and fundamental phases, which allow for a multitude of so-called Linked Data life cycles.
If you want to hear me speak to the slides, you might want to check out the following videos on YouTube:
Part 1: http://www.youtube.com/watch?v=AFJSMKv5s3s
Part 2: http://www.youtube.com/watch?v=G6YJSZdXOsc
Part 3: http://www.youtube.com/watch?v=OagzNpDEPJg
Morning session talk at the second Keystone Training School "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Exploration, visualization and querying of linked open data sources (Laura Po)
afternoon hands-on session talk at the second Keystone Training School "Keyword search in Big Linked Data" held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Linked Data Implementations—Who, What and Why? (OCLC)
Presented at the CNI Spring Membership Meeting in San Antonio, Texas, 4 April 2016. OCLC Research conducted an International Linked Data Survey for Implementers in 2014 and 2015, receiving responses from a total of 90 institutions in 20 countries. In the 2015 survey, 112 projects or services that consumed or published linked data were described (compared to 76 in 2014). This presentation summarizes the 2015 survey results: 1) which institutions have implemented or are implementing linked data; 2) what linked data sources institutions are consuming, and why; 3) what institutions are publishing, and why; 4) barriers and advice from the implementers.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud (Ontotext)
This webinar will remove the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small-scale projects. We will show you how to build Semantic Search & Analytics proofs of concept by using managed services in the Cloud.
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage (Ontotext)
Scholars, book researchers and museum directors who try to find the underlying connections between resources face many challenges. Scholars in particular continuously emphasize the role of digital humanities and the value of linked data in cultural heritage information systems.
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data (Sören Auer)
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, the coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and, generally, the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
Should We Expect a Bang or a Whimper? Will Linked Data Revolutionize Scholar Authoring and Workflow Tools?
Jeff Baer, Senior Director of Product Management, Research Development Services, ProQuest
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5, pp. 4-8, Sept.-Oct. 2014, IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
Usage of Linked Data: Introduction and Application Scenarios (EUCLID project)
This presentation introduces the main principles of Linked Data, the underlying technologies and background standards. It provides basic knowledge for how data can be published over the Web, how it can be queried, and what are the possible use cases and benefits. As an example, we use the development of a music portal (based on the MusicBrainz dataset), which facilitates access to a wide range of information and multimedia resources relating to music.
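The querying side of the presentation can be illustrated with a minimal sketch of triple-pattern matching, the operation underlying SPARQL. This is a toy, stdlib-only Python example; the triples and URI prefixes are invented stand-ins for MusicBrainz-like data, not the actual dataset.

```python
# A tiny in-memory "triple store": (subject, predicate, object) tuples.
# All URIs and property names below are illustrative, not real MusicBrainz terms.
TRIPLES = [
    ("ex:Nirvana", "rdf:type", "mo:MusicArtist"),
    ("ex:Nirvana", "foaf:name", "Nirvana"),
    ("ex:Nevermind", "mo:performer", "ex:Nirvana"),
    ("ex:Nevermind", "rdf:type", "mo:Record"),
]

def match(pattern, triples=TRIPLES):
    """Return all triples matching an (s, p, o) pattern; None acts as a variable."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which records did ex:Nirvana perform on?"
# (in SPARQL: SELECT ?rec WHERE { ?rec mo:performer ex:Nirvana })
records = [t[0] for t in match((None, "mo:performer", "ex:Nirvana"))]
print(records)  # ['ex:Nevermind']
```

A real Linked Data application would evaluate such patterns against a SPARQL endpoint rather than a Python list, but the matching semantics are the same.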
Efficient Practices for Large Scale Text Mining Process (Ontotext)
Text mining is essential when managing large-scale textual collections. It facilitates access to otherwise hard-to-organise, unstructured and heterogeneous documents, allows for the extraction of hidden knowledge, and opens new dimensions in data exploration.
In this webinar, Ivelina Nikolova, PhD, shares best practices and text analysis examples from successful text mining processes in domains such as news, financial and scientific publishing, the pharma industry and cultural heritage.
What Are Links in Linked Open Data? A Characterization and Evaluation of Link... (Armin Haller)
Linked Open Data promises to provide guiding principles to publish interlinked knowledge graphs on the Web in the form of findable, accessible, interoperable, and reusable datasets. In this talk I argue that while as such, Linked Data may be viewed as a basis for instantiating the FAIR principles, there are still a number of open issues that cause significant data quality issues even when knowledge graphs are published as Linked Data. In this talk I will first define the boundaries of what constitutes a single coherent knowledge graph within Linked Data, i.e., present a principled notion of what a dataset is and what links within and between datasets are. I will also define different link types for data in Linked datasets and present the results of our empirical analysis of linkage among the datasets of the Linked Open Data cloud. Recent results from our analysis of Wikidata, which has not been part of the Linked Open Data Cloud, will also be presented.
Research Data Sharing: A Basic Framework (Paul Groth)
Some thoughts on thinking about data sharing. Prepared for the 2016 LERU Doctoral Summer School - Data Stewardship for Scientific Discovery and Innovation.
http://www.dtls.nl/fair-data/fair-data-training/leru-summer-school/
A presentation focusing on the data analysis OCLC Research performed on 900K museum records, plus next steps for the nine project museums who now have the capacity to share standards-based records.
Advance Clustering Technique Based on Markov Chain for Predicting Next User M... (idescitation)
According to surveys, India is one of the leading countries in the world for technical and management education, with student numbers growing at a rate of 45% per annum. Advances in technology have a marked effect on the education system and help in upgrading higher education; some universities and colleges already use these technologies, and web logs are one of them. The main aim of this paper is to represent web logs using a clustering technique for predicting the next user movement and for user behavior analysis. The paper centres on a web log clustering technique based on Markov chain results, presenting an approach to web clustering (clustering web site users) and predicting their behavior on the next visit. Methodology: For generating effective results, web usage data from approximately 14 engineering colleges is used, and an advanced clustering approach is presented after optimizing other clustering approaches. Results: User behavior is predicted with the help of the advanced clustering approach based on FPCM and k-means; the proposed algorithm is used to mine and predict users' preferred paths. Existing approaches have been used to predict user behavior, but they are not sufficient because of their sensitivity to noise. With the help of the advanced clustering method, noise is reduced, providing more accurate results for predicting user behavior. Implementation: The algorithm was implemented in MATLAB, DTRG and Java. The experimental results validate the method's effectiveness in comparison with some previous studies.
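The Markov-chain core of such next-movement prediction can be sketched in a few lines of stdlib Python. This is not the paper's FPCM/k-means pipeline, only the first-order transition model it builds on; the session data is invented for illustration.

```python
from collections import defaultdict, Counter

# Toy web-log sessions: each list is the page sequence of one user session.
sessions = [
    ["home", "courses", "admissions", "fees"],
    ["home", "courses", "fees"],
    ["home", "admissions", "fees"],
    ["home", "courses", "admissions"],
]

# First-order Markov chain: count page-to-page transitions.
transitions = defaultdict(Counter)
for s in sessions:
    for cur, nxt in zip(s, s[1:]):
        transitions[cur][nxt] += 1

def predict_next(page):
    """Predict the most frequent successor of `page` in the training log."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]

print(predict_next("home"))     # 'courses' (3 of 4 sessions go home -> courses)
print(predict_next("courses"))  # 'admissions'
```

Clustering users first (as the paper does) amounts to training one such transition table per user cluster instead of one global table, which reduces the noise a single mixed population introduces.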
Keynote address delivered on 23 March 2011 at the Workshop on Data Mining and Computational Biology in Bioinformatics, sponsored by DBT India and organised by the Unit of Simulation and Informatics, IARI, New Delhi.
I do not claim any originality for either the slides or their content, and in fact acknowledge various web sources.
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Talk about Exploring the Semantic Web, and particularly Linked Data, and the Rhizomer approach. Presented August 14th 2012 at the SRI AIC Seminar Series, Menlo Park, CA
Finding knowledge, data and answers on the Semantic Web (ebiquity)
Web search engines like Google have made us all smarter by providing ready access to the world's knowledge whenever we need to look up a fact, learn about a topic or evaluate opinions. The W3C's Semantic Web effort aims to make such knowledge more accessible to computer programs by publishing it in machine understandable form.
As the volume of Semantic Web data grows software agents will need their own search engines to help them find the relevant and trustworthy knowledge they need to perform their tasks. We will discuss the general issues underlying the indexing and retrieval of RDF based information and describe Swoogle, a crawler based search engine whose index contains information on over a million RDF documents.
We will illustrate its use in several Semantic Web related research projects at UMBC, including a distributed platform for constructing end-to-end use cases that demonstrate the Semantic Web's utility for integrating scientific data. We describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which searches the Semantic Web for data relevant to a given query. ELVIS functionality is exposed as a collection of web services, and all input and output data are expressed in OWL, thereby enabling its integration with Triple Shop and other Semantic Web resources.
Metadata Provenance Tutorial at SWIB 13, Part 1 (Kai Eckert)
The slides of part one of the Metadata Provenance Tutorial (Linked Data Provenance). Part 2 is here: http://de.slideshare.net/MagnusPfeffer/metadata-provenance-tutorial-part-2-modelling-provenance-in-rdf
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29 2013. Discusses Research Objects and the relationship to work my team has been involved in during the past couple of years: OAI-ORE, Open Annotation, Memento.
Project Website: http://www.researchobject.org/
researchobject.org is a community project that has developed an approach to describe and package up all resources used as part of an investigation as Research Objects (ROs).
ROs provide two main features: a manifest, a consistent way to provide a well-typed, structured description of the resources used in an investigation; and a 'bundle', a mechanism for packaging up manifests with resources as a single, publishable unit.
ROs therefore carry the research context of an experiment (data, software, standard operating procedures (SOPs), models, etc.) and gather together the components of an experiment so that they are findable, accessible, interoperable and reproducible (FAIR). ROs combine software and data into an aggregative data structure consisting of well-described, reconstructable parts.
ROs have the potential to address a number of challenges pertinent to open research, including: a) supporting interoperability between infrastructures by using ROs as the primary mechanism for exchange and publication; b) supporting the evolution of research objects as a living collection, enabling provenance tracking; c) providing the ability to pivot around research object components (data, software, models) that are not restricted to the traditional publication.
Here we present work towards the development and adoption of ROs:
(i) A series of specifications and conventions, using community standards, for the RO manifest and RO bundles.
(ii) Implementations of Java, Python and Ruby APIs and tooling against those specifications;
(iii) Examples of representations of the RO models in various languages (e.g. JSON-LD, RDF, HTML).
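The manifest-plus-bundle idea above can be sketched with nothing but the Python standard library: a zip archive whose `.ro/manifest.json` entry describes the aggregated resources. The manifest fields and file names below follow the general shape of the RO bundle convention, but the specific resources and author are invented for this example; the official specifications and APIs are at researchobject.org.

```python
import json
import zipfile

# A hypothetical manifest describing two aggregated resources.
manifest = {
    "@context": "https://w3id.org/bundle/context",
    "id": "/",
    "aggregates": [
        {"uri": "data/results.csv", "mediatype": "text/csv"},
        {"uri": "workflow/analysis.py", "mediatype": "text/x-python"},
    ],
    "createdBy": {"name": "Example Researcher"},
}

# Package manifest and resources together as a single publishable unit.
with zipfile.ZipFile("example.bundle.zip", "w") as z:
    z.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    z.writestr("data/results.csv", "x,y\n1,2\n")
    z.writestr("workflow/analysis.py", "print('analysis')\n")
```

The point of the structured manifest is that a consumer can discover what is in the bundle, and how the parts relate, without executing or unpacking anything.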
Profile-based Dataset Recommendation for RDF Data Linking (Mohamed Ben Ellefi)
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, their schemas and their data dynamicity (respectively schemas and metadata) over time. To this extent, identifying suitable datasets, which meet specific criteria, has become an increasingly important, yet challenging task to support issues such as entity retrieval, semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high amount of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the Semantic Web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identifying target datasets for interlinking.
While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile": a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles and traditional dataset connectivity measures, in order to link LOD datasets into a global dataset-topic-graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud is far from complete enough to be considered as a ground truth and consequently as learning data.
Facing the limits of the current LOD topology (as learning data), our research has led us to break away from the topic-profile representation of the "learn to rank" approach and to adopt a new approach for candidate dataset identification, where the recommendation is based on the intensional profile overlap between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and can be potentially enriched
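The intensional-profile idea can be illustrated with a minimal stdlib-Python sketch: represent each dataset by its set of schema concept labels and rank candidates by set overlap. This is only an illustration of the overlap principle, not the dissertation's actual method; the datasets, labels and the choice of Jaccard similarity are assumptions made for the example.

```python
# Hypothetical schema-concept label sets standing in for intensional profiles.
profiles = {
    "datasetA": {"Person", "Organization", "Place"},
    "datasetB": {"Person", "Place", "Event"},
    "datasetC": {"Gene", "Protein"},
}

def jaccard(a, b):
    """Jaccard similarity between two label sets."""
    return len(a & b) / len(a | b)

def recommend(source, k=2):
    """Rank candidate datasets by profile overlap with `source`."""
    scores = {d: jaccard(profiles[source], p)
              for d, p in profiles.items() if d != source}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("datasetA"))  # ['datasetB', 'datasetC']
```

The appeal of this style of recommendation is exactly the point made above: it needs no existing link topology as training data, only descriptions of the datasets themselves.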
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati... (Mark Wilkinson)
This slide deck accompanies the manuscript "Interoperability and FAIRness through a novel combination of Web technologies", submitted to PeerJ Computer Science: https://doi.org/10.7287/peerj.preprints.2522v1
It describes the output of the "Skunkworks" FAIR implementation group, who were tasked with building a prototype infrastructure that would fulfill the FAIR Principles for scholarly data publishing. We show how a novel combination of the Linked Data Platform, RDF Mapping Language (RML) and Triple Pattern Fragments (TPF) can be combined to create a scholarly publishing infrastructure that is markedly interoperable, at both the metadata and the data level.
This slide deck (or something close) will be presented at the Dutch Techcenter for Life Sciences Partners Workshop, November 4, 2016.
Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R
Tutorial presented at 2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012), January 28-30, 2012. http://sites.google.com/site/web2011ihi/participants/tutorials
This tutorial weaves together three themes and the associated topics:
[1] The role of biomedical ontologies
[2] Key Semantic Web technologies with focus on Semantic provenance and integration
[3] In-practice tools and real world use cases built to serve the needs of sleep medicine researchers, cardiologists involved in clinical practice, and work on vaccine development for human pathogens.
Semantic Similarity and Selection of Resources Published According to Linked ... (Riccardo Albertoni)
The position paper discusses the potential of exploiting linked data best practices to provide metadata documenting domain-specific resources created through verbose acquisition-processing pipelines. It argues that resource selection, namely the process of choosing a set of resources suitable for a given analysis or design purpose, must be supported by a deep comparison of their metadata. The semantic similarity proposed in our previous works is discussed for this purpose, and the main issues in making it scale up to the web of data are introduced. These issues matter beyond the re-engineering of our similarity, since they largely apply to every tool that is going to exploit information made available as linked data. A research plan and an exploratory phase facing the presented issues are described, remarking on the lessons we have learnt so far.
Engaging Information Professionals in the Process of Authoritative Interlinki... (Lucy McKenna)
Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regards to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs.
Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user-interface will be developed which will direct IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on results. We are currently working on developing a minimal viable product so as to allow for further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.
A lecture/conversation focusing on the first 12 years of Semantic Web - delivered on February 21, 2012.
See http://j.mp/SWIntro for more details. More detailed course material is at http://knoesis.org/courses/web3/
Research Objects: more than the sum of the parts (Carole Goble)
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal and dangerous. Proper training and socialization are crucial to preventing aggressive behaviors. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
Knowledge Discovery (Laura Hollink)
1. Knowledge Discovery for the Semantic Web
An Application to Web Usage Mining
&
How to use semantics in the Preprocessing stage
[KDD pipeline figure: Input Data → Data Preprocessing and Transformation → Data Mining → Interpretation and Evaluation → Information / Taking Action.
Preprocessing: data fusion (multiple sources), data cleaning (noise, missing values), feature selection, dimensionality reduction, data normalization.
Evaluation: filtering patterns, visualization, statistical analysis (hypothesis testing, attribute evaluation, comparing learned models, computing confidence intervals).]
Claudia D’Amato - University of Bari, IT.
Laura Hollink - Centrum Wiskunde & Informatica, Amsterdam, NL.
3. An application to Web Usage Mining
Web Usage Mining = discovering patterns in logs of user interaction with Web resources
• logs typically contain an identifier for users (e.g. IP address), their queries and clicks
• What about usage of Linked Open Data?
• Can we use semantics to improve mining of Web usage?
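Usage mining starts from exactly such logs. A minimal sketch of turning raw server-log lines into (user, query) events; the log format, regex, and sample lines below are made up for illustration and do not reflect any real endpoint's logs.

```python
# Minimal sketch: parsing server-log lines into (user, query) events for
# usage mining. The log format and lines are illustrative assumptions.

import re
from urllib.parse import unquote

LOG_LINE = re.compile(r'(?P<ip>\S+) .* "GET /sparql\?query=(?P<q>\S+) HTTP')

raw_logs = [
    '203.0.113.7 - - [01/Jan/2015] "GET /sparql?query=SELECT%20%3Fs HTTP/1.1" 200',
    '198.51.100.2 - - [01/Jan/2015] "GET /sparql?query=ASK%20%7B%7D HTTP/1.1" 200',
]

def parse(lines):
    events = []
    for line in lines:
        m = LOG_LINE.search(line)
        if m:  # skip lines that are not SPARQL requests
            events.append((m.group("ip"), unquote(m.group("q"))))
    return events

events = parse(raw_logs)
```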
6. Mining Usage of Linked Open Data in USEWOD
USEWOD: http://usewod.org/ [B. Berendt, L. Hollink, M. Luczak-Roesch, et al.]
1. USEWOD workshop series @ ESWC / WWW since 2011
2. USEWOD dataset: server logs of DBpedia, BioPortal, LinkedGeoData, etc., and client-side logs from YASGUI.
(example removed)
8. Mining Usage of Linked Open Data in USEWOD
• Results of USEWOD: LOD usage mining for more efficient indexing [1], caching [2], auto-completion [3], etc.
[1] Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. An empirical study of real-world SPARQL queries. USEWOD @ WWW 2011.
[2] Lorey, J., & Naumann, F. Caching and prefetching strategies for SPARQL queries. USEWOD @ ESWC 2013.
[3] Kramer, K., Dividino, R. Q., & Gröner, G. SPACE: SPARQL Index for Efficient Autocompletion. ISWC (Posters & Demos) 2013.
[4] Rietveld, L., & Hoekstra, R. Man vs. Machine: Differences in SPARQL Queries. USEWOD @ ESWC 2014.
[5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015.
• Issues:
• what is the difference between queries by machines and humans? [4]
• what is the meaning of repeated queries by bots/tools?
• a lot of the usage is invisible due to data dump downloads [5]
9. Usage mining example 1: clustering rdf:properties in DBpedia
Instead of listing all DBpedia properties alphabetically, can we display them in a more meaningful way? Can we use query logs for this? [5]
Approach: hierarchical clustering of properties, where the distance between a pair of properties is based on how often they co-occur in a SPARQL query in the USEWOD2015 logs.
Evaluation: run an experiment to measure how quickly and accurately people identify facts when looking at the standard view or the clustered view.
Result: no significant differences ☹
[5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015.
Disclaimer: simplified discussion of this paper!
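The approach can be sketched in a few lines. This is not the paper's code: the toy query log, the distance 1/(1 + co-occurrence count), and plain single-linkage agglomeration are illustrative assumptions.

```python
# Sketch: agglomerative clustering of properties with a distance derived
# from how often two properties co-occur in the same SPARQL query.

from itertools import combinations

# Each entry: the set of properties used together in one query (toy data).
query_logs = [
    {"dbo:birthPlace", "dbo:birthDate"},
    {"dbo:birthPlace", "dbo:birthDate"},
    {"dbo:starring", "dbo:director"},
    {"dbo:starring", "dbo:director"},
    {"dbo:birthPlace", "dbo:starring"},
]

def cooccurrence_distance(logs):
    """Distance between two properties shrinks as they co-occur more often."""
    counts = {}
    for props in logs:
        for a, b in combinations(sorted(props), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return lambda a, b: 1.0 / (1 + counts.get(tuple(sorted((a, b))), 0))

def single_linkage(items, dist, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [{i} for i in items]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] |= clusters.pop(j)
    return clusters

props = ["dbo:birthPlace", "dbo:birthDate", "dbo:starring", "dbo:director"]
clusters = single_linkage(props, cooccurrence_distance(query_logs), k=2)
```

On this toy log, the biography-related and film-related properties end up in separate clusters.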
15. Usage mining example 2: mining semantically enriched query logs
Data: queries and clicks on Yahoo! search engine.
Problem when mining ‘raw’ logs: low support of even the most frequent patterns
[5] Laura Hollink, Peter Mika and Roi Blanco. Web Usage Mining with Semantic Analysis. WWW 2013.
19. Usage mining example 2: mining semantically enriched query logs
Approach:
1. link queries to entities in LOD cloud
2. choose class of entity + selected properties
3. detect modifier words (download, trailer, cast, date, etc.)
1. Link queries to entities in LOD cloud:
• Freebase (has a lot of movie-related info)
• DBpedia (Wikipedia is widely used)
23. Usage mining example 2: mining semantically enriched query logs
• Sequential pattern mining on the class level using PrefixSpan.
• Discover frequent patterns on the class level:
• using the efficient PrefixSpan algorithm to mine all possible subsequence patterns
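A minimal PrefixSpan-style miner over class-level query sequences can illustrate the idea; the toy sessions and the bare-bones recursion below are my own sketch, not the WWW 2013 implementation.

```python
# Sketch of PrefixSpan-style sequential pattern mining: grow frequent
# prefixes and recurse on the projected (suffix) database.

def prefixspan(sequences, min_support, prefix=()):
    """Return frequent sequential patterns mapped to their support counts."""
    patterns = {}
    # Support of each candidate item = number of sequences containing it.
    counts = {}
    for seq in sequences:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    for item, count in counts.items():
        if count < min_support:
            continue
        new_prefix = prefix + (item,)
        patterns[new_prefix] = count
        # Project: keep the suffix after the first occurrence of the item.
        projected = [seq[seq.index(item) + 1:] for seq in sequences if item in seq]
        patterns.update(prefixspan(projected, min_support, new_prefix))
    return patterns

# Toy sessions: each query replaced by the class of its linked entity.
sessions = [
    ["Film", "Actor", "Film"],
    ["Film", "Actor"],
    ["Actor", "Film"],
]
patterns = prefixspan(sessions, min_support=2)
```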
26. Usage mining example 3: semantic patterns of query modification
• Goal: identify frequent query modifications in an image archive
• state of the art = 3 classes: generalization, specification, reformulation
• Approach:
1. link queries to entities in the LOD cloud
2. choose class of entity
3. determine shortest path between consecutive queries Q1 and Q2
4. rank property-paths according to support and confidence.
Conclusions:
• Identified patterns not visible on raw data.
• but “the method is only moderately successful in identifying the most prominent relations for a given query pair”
Hollink, V., Tsikrika, T., & de Vries, A. P. (2011). Semantic search log analysis: a method and a study on professional image search. JASIST 62(4), 691-713.
See also: Huurnink, B., Hollink, L., Van Den Heuvel, W., & De Rijke, M. (2010). Search behavior of media professionals at an audiovisual archive: A transaction log analysis. JASIST, 61(6), 1180-1197.
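Step 4 above can be sketched as follows. The path labels, the toy query pairs, and the exact support/confidence definitions (support = fraction of all pairs showing the path; confidence = fraction among pairs starting from the path's source class) are illustrative assumptions, not the JASIST definitions.

```python
# Sketch: rank property-paths between consecutive queries by support
# and confidence, over a toy set of (class of Q1, property-path) records.

from collections import Counter

query_pairs = [
    ("Person", "dbo:birthPlace"),
    ("Person", "dbo:birthPlace"),
    ("Person", "dbo:spouse"),
    ("Place", "dbo:country"),
]

def rank_paths(pairs):
    path_counts = Counter(p for _, p in pairs)
    class_counts = Counter(c for c, _ in pairs)
    class_of = {p: c for c, p in pairs}  # toy: each path has one start class
    ranked = []
    for path, n in path_counts.items():
        support = n / len(pairs)
        confidence = n / class_counts[class_of[path]]
        ranked.append((path, support, confidence))
    ranked.sort(key=lambda t: (-t[1], -t[2]))
    return ranked

ranking = rank_paths(query_pairs)
```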
29. The feature selection issue when using LOD
[KDD pipeline figure repeated; this section focuses on the Feature Selection step of Data Preprocessing and Transformation.]
30. Feature Selection
• Feature selection = limiting the number of features for faster computation times, more understandable models, better prediction value.
• Using Linked Open Data can lead to a large number of features per data point.
• a DBpedia resource easily has 50 property-value pairs.
• more are easily added using reasoning
• note: these numbers are not large compared to the number of features in DNA strings, or all words in a text corpus!
• Still, many of them are irrelevant or redundant.
31. Feature Selection Example
• Goal: learn a relation R between x and y.
• In this paper, R = ‘occupation’, ‘gender’, ‘instance_of’, ‘acted_in’, ‘genre’, ‘position_played_on_team’
• Approach: given a training set of pairs of x, y, learn a “whitelist” of properties in DBpedia, WikiData, YAGO and WordNet that indicate a relation R between x and y
• Cast as a subset selection problem:
• E = the set of possible properties
• local search over the power set of E (i.e. all subsets) to find the optimal subset.
Learning to Exploit Structured Resources for Lexical Inference. Vered Shwartz, Omer Levy, Ido Dagan and Jacob Goldberger. CoNLL 2015 (to appear).
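Subset selection by local search can be sketched with a simple hill climber; the scoring function, the property names, and the size penalty below are toy assumptions and not the CoNLL 2015 setup.

```python
# Sketch: local search over subsets of properties. Each step flips one
# property in or out of the current subset while the score improves.

def hill_climb(properties, score, max_iters=100):
    """Greedy local search over the power set of `properties`."""
    current = frozenset()
    best = score(current)
    for _ in range(max_iters):
        neighbors = [current ^ {p} for p in properties]  # add or remove one
        candidate = max(neighbors, key=score)
        if score(candidate) <= best:
            return current  # local optimum reached
        current, best = candidate, score(candidate)
    return current

# Toy objective: reward two "relevant" properties, penalize subset size.
relevant = {"dbo:occupation", "dbo:field"}
all_props = ["dbo:occupation", "dbo:field", "dbo:birthPlace", "dbo:height"]
score = lambda s: 2 * len(s & relevant) - len(s)

selected = hill_climb(all_props, score)
```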
32. Data Fusion
[KDD pipeline figure repeated; this section focuses on the Data Fusion step of Data Preprocessing and Transformation.]
38. Methods for Data Fusion (ontology alignment)
[figure: two ontology fragments with labelled classes to be matched]
39. Methods for Data Fusion: structural matchers
[figure: two ontology fragments with labelled classes to be matched]
• E.g. Similarity Flooding: the similarity of a matched pair s1 and t1 propagates to their respective neighbors s2 and t2.
• neighbors can be defined as subclasses, superclasses, instances, domain/ranges, etc.
• Structural measures are in practice never used standalone.
[10] Ngo, Duy Hoa, and Zohra Bellahsene. YAM++ - results for OAEI 2012. OAEI @ ISWC 2012.
[11] Sergey Melnik, Hector Garcia-Molina, and Erhard Rahm. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. ICDE 2002.
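The propagation idea in [11] can be sketched as a fixed-point loop over node pairs; the toy pair graph, initial scores, damping factor, and max-normalization below are simplifying assumptions rather than the algorithm as published.

```python
# Sketch of the similarity-flooding idea: part of each matched pair's
# similarity flows to the pairs formed by its respective neighbors.

def similarity_flooding(pairs, neighbors, init, rounds=10, alpha=0.5):
    """`neighbors` maps a node pair to the pairs of its respective neighbors."""
    sim = dict(init)
    for _ in range(rounds):
        new = {}
        for p in pairs:
            inflow = sum(sim.get(q, 0.0) for q in neighbors.get(p, []))
            new[p] = sim.get(p, 0.0) + alpha * inflow
        top = max(new.values()) or 1.0  # keep scores in [0, 1]
        sim = {p: v / top for p, v in new.items()}
    return sim

# Toy example: s1-t1 is a strong (e.g. string) match; s2 is s1's neighbor,
# and t2 / t3 are t1's neighbor candidates.
pairs = [("s1", "t1"), ("s2", "t2"), ("s2", "t3")]
neighbors = {
    ("s1", "t1"): [("s2", "t2")],
    ("s2", "t2"): [("s1", "t1")],
    ("s2", "t3"): [],  # t3 is not connected to t1
}
init = {("s1", "t1"): 1.0, ("s2", "t2"): 0.2, ("s2", "t3"): 0.2}
sim = similarity_flooding(pairs, neighbors, init)
```

After flooding, the pair that is structurally supported by the strong match ends up scored above the unsupported one.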
41. Methods for Data Fusion: instance based matchers
[figure: two ontology fragments with labelled classes to be matched]
• Match classes based on similarity of their instances
• note: you need a way to assess similarity of the instances!
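An instance-based matcher can be sketched with Jaccard overlap of instance sets; the class names, instance identifiers, and threshold below are toy assumptions, and the sketch sidesteps the slide's caveat by assuming instances are already comparable via shared identifiers.

```python
# Sketch: match classes whose instance sets overlap enough (Jaccard).

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def match_classes(classes_a, classes_b, threshold=0.5):
    """Return class pairs whose instance sets are sufficiently similar."""
    return [
        (ca, cb)
        for ca, ia in classes_a.items()
        for cb, ib in classes_b.items()
        if jaccard(ia, ib) >= threshold
    ]

src = {"Architect": {"rem_koolhaas", "zaha_hadid"}, "Painter": {"vermeer"}}
tgt = {"architects": {"rem_koolhaas", "zaha_hadid", "le_corbusier"},
       "painters": {"vermeer", "monet"}}
matches = match_classes(src, tgt)
```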
44. Methods for Data Fusion: string based
• This is the most important feature in ontology alignment.
• “nearly all [ontology alignment systems] use a string similarity metric” [12]
• stopping and stemming is not helpful! Nor is using WordNet synonyms. [12]
• In [13] we took an even less semantic approach: linking based on URL syntax.
[figure: two ontology fragments with labelled classes to be matched]
[12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.
[13] The debates of the European Parliament as Linked Open Data. Under review. See http://www.talkofeurope.eu/data/ for details.
http://www.dbpedia.org/page/Judith_Sargentini
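One common string metric from the family surveyed in [12] is normalized edit distance; a self-contained sketch (the example labels are mine):

```python
# Sketch of a string-based matcher: normalized Levenshtein similarity
# between two class labels.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def string_similarity(a, b):
    a, b = a.lower(), b.lower()
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)

sim = string_similarity("Architect", "architects")
```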
47. Link types
Equality: SameAs, EquivalentClasses, EquivalentProperties
• “Den Haag” = “The Hague”
• wood-material = wood
• conf:has_the_last_name = edas:hasLastName
Hierarchical: rdfs:subClassOf, rdf:type, rdfs:subPropertyOf
• aat:Artist ⊇ wn:Artist
• tgn:Africa ∈ wn:Continent
Weaker semantics: skos:closeMatch / exactMatch / broadMatch / narrowMatch / relatedMatch
• geonames:Italy skos:closeMatch librarytopics:Italy
Domain-specific links: e.g. born-in, hasStyle, hasPart
• Van Gogh (ULAN) born-in Groot-Zundert (TGN)
50. Representation of links
[figure: a reified link “Link 001” with concept1 = architecten, concept2 = architects, link type = skos:exactMatch, link method = handmatig (manual), author = L. Hollink; contrasted with the plain triple “architecten skos:exactMatch architects”.]
• Open Question: how valid are the patterns we discover in data when the quality of the links is low?
• Even more important to be critical and evaluate the data
• source criticism
• tool criticism (see http://event.cwi.nl/toolcriticism/)
54. Evaluation of Data Fusion / Linking
1. Manually rating (a sample of) mappings
• relatively cheap and easy to interpret
• only precision, no recall
2. Comparison to a reference alignment
• precision and recall
• used in OAEI on the SEALS platform
• more expensive if a reference alignment has to be created (but: crowd sourcing!)
3. End-to-end evaluation (a.k.a. evaluating an application that uses the mappings)
• arguably the best method!
• need to have access to an application + users
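Option 2 reduces to set comparison once mappings are represented as pairs; a minimal sketch with made-up alignments:

```python
# Sketch: precision/recall of a produced alignment against a reference
# alignment, with each mapping encoded as a (source, target) pair.

def precision_recall(found, reference):
    found, reference = set(found), set(reference)
    tp = len(found & reference)  # correctly found mappings
    precision = tp / len(found) if found else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

found = [("a:Architect", "b:architects"), ("a:Painter", "b:sculptors")]
reference = [("a:Architect", "b:architects"), ("a:Painter", "b:painters")]
p, r = precision_recall(found, reference)
```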
58. Evaluation of Data Fusion / Linking
• Comparison to a reference alignment: alternative measures:
• 1. instead of a binary classification into correct/incorrect mappings, take into account how wrong a link is:
• where r(a) is the semantic distance between correspondence a and correspondence a’ in the reference alignment, A is the number of correspondences.
• 2. weight score of mappings based on the frequency of their use
• e.g. from usage logs!
Laura Hollink, Mark van Assem, Shenghui Wang, Antoine Isaac, Guus Schreiber. Two Variations on Ontology Alignment Evaluation: Methodological Issues. ESWC 2008.
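The slide's own formula did not survive extraction. A hedged reconstruction consistent with the description of r(a) and |A| (an illustrative sketch of a distance-weighted precision in this spirit, not necessarily the exact formula from the ESWC 2008 paper) is:

```latex
% Each found correspondence a contributes 1 - r(a) instead of a 0/1 judgement:
P_{\mathrm{rel}} \;=\; \frac{1}{|A|} \sum_{a \in A} \bigl(1 - r(a)\bigr)
```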
63. Discovering links from text
Pointers to what happens in other communities
• Word2Vec: efficient deep learning algorithm to learn vector representations of words
• vector similarity captures semantics between words
• No explicit semantics, but we can’t deny that there is meaning there!
• Success seems to be mostly due to big data
Example: Vec(Madrid) - Vec(Spain) + Vec(France) is closer to Vec(Paris) than to any other vector
Mikolov, Tomas, et al. “Distributed representations of words and phrases and their compositionality.” Advances in neural information processing systems. 2013.
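The analogy arithmetic can be demonstrated with tiny hand-made vectors. Real word2vec embeddings have hundreds of learned dimensions; the 2-d vectors below are contrived so the example works, and only illustrate the mechanics.

```python
# Sketch of Vec(Madrid) - Vec(Spain) + Vec(France) ≈ Vec(Paris) with
# toy 2-d vectors (axes: country identity, "capital-ness").

import math

vectors = {
    "Spain":   (1.0, 0.0), "Madrid": (1.0, 1.0),
    "France":  (2.0, 0.0), "Paris":  (2.0, 1.0),
    "Germany": (3.0, 0.0), "Berlin": (3.0, 1.0),
}

def add(u, v): return tuple(a + b for a, b in zip(u, v))
def sub(u, v): return tuple(a - b for a, b in zip(u, v))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = add(sub(vectors[a], vectors[b]), vectors[c])
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

answer = analogy("Madrid", "Spain", "France")
```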
66. NELL: Never-Ending Language Learning
• several machine learning approaches to discover facts (beliefs) from text on the web
• string features, distribution of context words, html structure, visual image analysis.
• Running since 2010, has so far learned over 80 million beliefs
T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling. Never-Ending Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2015.
68. Research Task Format
Work in 6 groups of 10 students
• 5 people design an approach to association rules with semantics
• 5 people focus on how that approach should be evaluated
The idea is to work together! E.g. which measures are best for this approach? Which versions of the approach should be evaluated? Will this approach score high on these measures? In which cases?
• We would like one presentation per group of 10 people
• of 3 or 4 slides
• of max 4 minutes (less is fine too!)
• Send me the slides in PDF, with your group number in the title, by email to l.hollink@cwi.nl, today before 16:30.
• The presentation should show clearly:
1. the AR method
2. how did you take into account semantics?
3. the evaluation method
• BONUS: argue when and why your approach will score high.
• BONUS: discuss how the newly learned links can be represented and used.
Tips:
• you may pick a dataset that you will use as an example