The document discusses semantic search and how it can improve on traditional keyword-based search. It describes how semantic search can extend and refine search queries using ontologies and semantic metadata. This allows for more precise and complete search results. Semantic search also enables cross-referencing related information, exploratory search through semantic navigation, and reasoning over semantic data to infer implicit facts.
The Brain Imaging Data Structure and its use for fNIRS, by Robert Oostenveld
These slides were prepared for the NIRS toolkit course at the Donders, which due to the Corona crisis has been postponed. The slides present BIDS, explain how fNIRS often involves multiple signals, and relate the two to synchronization and data management.
The document provides an overview of the Donders Repository, which aims to securely store original research data, document the research process, and make data accessible to researchers and the public. It describes the procedural design including different roles, collection types, and states. The technical architecture is based on IRODS software and scalable storage. The repository fits into researchers' workflows and supports the timeline of projects from initiation to data sharing. Standards like BIDS help make neuroimaging data FAIR (Findable, Accessible, Interoperable, Reusable).
The document describes the Semantic Scout, a framework developed by CNR Semantic Technology Lab for searching, presenting, and analyzing entities from CNR data sources using semantic web, linked open data, natural language processing, and information retrieval techniques. It summarizes the goals and architecture of the Semantic Scout, including how it converts CNR data into ontologies and triples, publishes and links the data, and allows users to search and explore the data through a SPARQL endpoint and other interfaces. The document also provides an example of how the Semantic Scout can be used to identify experts on a topic by searching the integrated CNR data cloud.
Structured vs. Unstructured: Extracting Information from Classics Scholarly Texts, by Matteo Romanello
This document outlines a project to develop tools to extract information from classics scholarly texts. It aims to improve information retrieval for classics researchers by automatically identifying mentions of realia (people, places, sources) and extracting canonical references to primary sources from unstructured texts. The methodology involves building corpora of classics articles, creating a knowledge base from existing structured classics data sources, and developing natural language processing tools trained on the knowledge base to extract entities and references from the text corpora. The expected results are improved access points to information for researchers through enriched full-text search and links to relevant primary sources.
Data Integration at the Ontology Engineering Group, by Oscar Corcho
Presentation on the Data Integration work at OEG-UPM (http://www.oeg-upm.net/), given at the CredIBLE workshop in Sophia-Antipolis (October 15th, 2012).
A Non-Technical, Example-Driven Introduction to Linked Data, by kjanowicz
How Linked Data and Semantic Web Technologies Foster the Publication, Retrieval, Reuse, and Integration of Data. A Non-Technical, Example-Driven Introduction to Linked Data for the UCSB Library.
This document discusses the process of ontology design 101. It explains that the key steps are to determine the domain and focus of the ontology, consider reusing existing ontologies, enumerate the important terms and concepts, and define the classes, properties, and constraints. An example ontology about wine and food pairings is used to illustrate these steps. The document emphasizes that ontology design is an iterative process without a single correct approach.
This document provides an overview of text mining, including its history and definitions. Key points:
- Text mining aims to extract useful information and discover new knowledge from large amounts of unstructured text data without having to read it all.
- Don Swanson is considered a pioneer in text mining for discovering new biomedical relationships through analyzing complementary sets of literature.
- There is no single agreed-upon definition, but text mining generally involves retrieving relevant texts, representing their content, and analyzing the representation to find patterns or associations.
- Current text mining systems are still fairly primitive and rely heavily on human input, but the goal is more automated analysis of large text collections to extract meaningful patterns rather than just
The document proposes a browsing-oriented approach to semantic faceted search to better support fuzzy information needs. It introduces browsing-oriented facet and facet value spaces, including an extended facet tree using clustering. It also proposes a browsing-oriented facet ranking approach based on notions of small steps, uniform steps, and comprehensible result segments. An evaluation finds the extended facet tree and browsing-oriented ranking improve effectiveness and efficiency for complex, fuzzy tasks compared to conventional faceted search approaches.
The document summarizes statistics about quantifying RDF data sets. It provides counts of triples, literals, URIs, subjects, predicates, objects, and other metrics for several RDF datasets, including LinkedCT, BioGrid, RxNorm, SUNY Reach, and DrugBank. For each dataset, it lists the top 5 subjects, objects, and predicates as well as overall statistics like literalness, uniqueness, coverage, and type frequencies.
This document presents SPLENDID, a system for federated querying across linked data sources. It uses Vocabulary of Interlinked Datasets (VoiD) descriptions to select relevant sources and optimize query planning and execution. The system applies techniques from distributed database systems to federated SPARQL querying, including dynamic programming for join ordering and statistics-based cost estimation. An evaluation using the FedBench suite found it efficiently selects sources and executes queries, outperforming state-of-the-art federated querying systems by leveraging VoiD descriptions and statistics. Future work includes integrating it with other systems and improving its cost models.
Amit Sheth, "Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data,"
WSU & AFRL Window-on-Science Seminar on Data Mining, August 05, 2009.
http://wiki.knoesis.org/index.php/Seminar_on_Data_Mining#Semantics_empowered_Understanding.2C_Analysis_and_Mining_of_Nontraditional_and_Unstructured_Data
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility, the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used, or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references required. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
The document discusses named entity recognition, which involves locating and classifying atomic elements like names, organizations, locations, quantities, and times into predefined categories. It provides examples of entity mapping candidates for the string "Armstrong" and discusses how context, ambiguity, and accuracy are used to select the correct entity. It also discusses using semantic graphs and linked data to analyze entities and help with the selection process.
This document discusses ontology design and development. It describes the ontology development process, which includes pre-development, development, and post-development activities. Development activities involve specification, conceptualization, formalization, and implementation. The document also outlines methodologies for ontology design, which guide the construction of consistent ontologies through management, development-oriented, and support activities. These activities work together to efficiently develop complex ontologies.
The document discusses exploratory semantic search using linked open data. It describes how a user could browse related entities in a knowledge graph starting from a book, following links to the author, other authors influenced by or influencing the first author, and their notable works. This allows the user to serendipitously discover related information without having to formulate a precise search query. The document also provides examples of exploring topics like space flights and accidents. Finally, it mentions exploratory search tools that augment video search using linked open data.
The document describes the key components and processes involved in building a data warehousing and business intelligence capability. It involves extracting, transforming, and loading data from operational systems into data repositories and a data warehouse. From there, data is organized into data marts and analytics are performed on the data through online analytical processing, data mining, reporting, and visualization to provide insights. A meta-data repository tracks and manages the movement and transformation of data throughout the process.
This document discusses applications of linked data and semantic web technologies. It describes the linked open data cloud and prominent datasets like DBpedia. It provides statistics about the size and connectivity of linked open data. It also discusses ontologies, browsers, and search engines that facilitate working with linked data. Finally, it outlines the components needed to build linked data driven web applications and access linked data through SPARQL endpoints and libraries.
This document discusses linked data and its applications in the Web of Data. It describes the four principles of linked data: (1) using URIs to identify things, (2) using HTTP URIs so that these things can be looked up, (3) returning useful RDF information about each URI, and (4) including links to other URIs. Following these principles leads to interlinking data across the web and creating a Web of Data. The document outlines the increasing growth of the Web of Data since its inception in 2007.
1) The document discusses challenges in using machine learning and data analytics for materials science research. Specifically, most materials are irrelevant for a given purpose, so models need to identify statistically exceptional subgroups rather than averaging all data.
2) Two potential methods for identifying promising subgroups are discussed: focusing on materials with small oxygen-carbon-oxygen angles or large carbon-oxygen bond lengths for catalysis applications.
3) The concept of a model's domain of applicability is introduced, wherein models perform best when applied only to similar data they were trained on, rather than all data globally. Identifying these reliable domains is important.
The document discusses the history and development of ontologies. It begins with definitions of key terms like ontology, vocabulary, and taxonomy. It then provides a brief history of ontologies dating back to ancient Greek philosophers. The document also discusses how ontologies are used in computer science to formally represent domain knowledge. It provides examples of ontologies in fields like medicine, commerce, and the semantic web. Finally, it discusses best practices for building ontologies, such as reusing existing terms and collaborating with domain experts and end users.
This document proposes using Wikipedia concepts to improve topic modeling. It discusses using n-grams like bigrams and phrases from Wikipedia categories rather than only individual words. The goal is to develop a topic model that associates words and Wikipedia concepts with mixtures of topics. Important steps include collecting a dataset, preprocessing it, performing topic modeling using an LDA model that incorporates Wikipedia concepts, and evaluating the results against models using only unigrams or n-grams. Key benefits noted are the ability to represent topics with more representative concepts and reduce ambiguity compared to models using only individual words.
The document describes a summer institute on discovering big data held in San Diego from August 5-9, 2013. It discusses several topics related to big data in neuroscience including available resources, how to find and connect relevant information, challenges around data integration from disparate sources, and using ontologies and machine learning for tasks like data tagging.
Mid-Ontology Learning from Linked Data @JIST2011, by Lihua Zhao
This document describes a mid-ontology learning approach for integrating ontology schemas from different linked data sources. It collects data from linked instances using owl:sameAs links. Predicates are grouped by exact matching of objects and pruning using string and knowledge-based similarity measures. The approach aims to automatically learn a simple ontology that can represent data from diverse domains in linked open data.
Kno.e.sis Center collaborates with AFRL on various projects aimed at improving human performance, including using ontologies to discover new knowledge in fields like cognition, developing collaborative visualization technologies, and determining exposure to toxic agents through easily deployable tests using samples like blood and urine. The collaborations also involve developing interface technologies that revolutionize human and machine collaboration as well as decision support systems to maximize information processing and decision making effectiveness. The overall goal is to better understand fundamental cognitive processes, develop new biomolecular and nanotechnology solutions, and improve human performance for military applications.
Search, Signals & Sense: An Analytics Fueled Vision, by Seth Grimes
The document discusses how text analytics can fuel semantic search and sensemaking by extracting features from documents, analyzing relationships between entities, and integrating search with other data sources. It outlines trends toward more unified search platforms that incorporate user context and infer intent to provide categorized, clustered results rather than just hit lists. The goal is for search to be the starting point for iterative sensemaking through analysis and synthesis of information.
Open hpi semweb-06-part7
1.
Semantic Web Technologies
Lecture 6: Applications in the Web of Data
07: Semantic Search
Dr. Harald Sack
Hasso Plattner Institute for IT Systems Engineering
University of Potsdam
Spring 2013
This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)
2.
Lecture 6: Applications in the Web of Data
Open HPI - Course: Semantic Web Technologies
Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
3.
07 - Semantic Search
Open HPI - Course: Semantic Web Technologies - Lecture 6: Applications in the Web of Data
4.
Meaning
[Figure: the semiotic triangle after Ogden & Richards, "The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism" (1923): a Concept symbolizes a Symbol and refers to an Object; the Symbol stands for the Object. The sender's Experience and the receiver's Experience and Context shape the interpretation (Pragmatics). Example symbol: "Armstrong". Image: http://commons.wikimedia.org/wiki/User:McSmit]
5.
Armstrong
6.
[Figure: knowledge-graph view of the entity http://dbpedia.org/resource/Neil_Armstrong. "Neil Armstrong" is an Astronaut and a Person; Kosmonaut is the same as Astronaut; Astronaut is a subClassOf Occupation, but is NOT a Science; a Person has an Employment. Entities are disambiguated and organized by ontologies.]
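The relations in this slide's graph can be written as subject-predicate-object triples. A minimal sketch (a hypothetical in-memory triple store, not DBpedia itself, with an illustrative triple set) of how a subClassOf hierarchy lets a reasoner infer that the entity is also a Person:

```python
# Minimal in-memory triple store; the triples loosely mirror the slide's
# example (this specific set is an illustrative assumption, not DBpedia data).
triples = {
    ("NeilArmstrong", "isA", "Astronaut"),
    ("Astronaut", "subClassOf", "Person"),
    ("Kosmonaut", "sameAs", "Astronaut"),
}

def infer_types(entity, triples):
    """Return all classes of `entity`, following subClassOf transitively."""
    types = {o for (s, p, o) in triples if s == entity and p == "isA"}
    frontier = set(types)
    while frontier:
        parents = {o for (s, p, o) in triples
                   if p == "subClassOf" and s in frontier}
        frontier = parents - types
        types |= parents
    return types

print(sorted(infer_types("NeilArmstrong", triples)))  # ['Astronaut', 'Person']
```

The inference step is exactly what slide 13 later calls "Reasoning": the fact that Neil Armstrong is a Person is never stated explicitly, but follows from the class hierarchy.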
7.-9.
Classical Information Retrieval
[Diagram, built up over three slides, acc. to Salton, G., McGill, M. J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York 1983: a Set of Queries (information requests) and a Set of Documents (files of records); queries pass through Query Formulation and documents through indexing into a common indexing language, and retrieval matches queries against documents by similarity.]
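The classical model sketched above can be illustrated with a toy inverted index; the document texts and the overlap-count similarity below are illustrative assumptions, not part of the lecture:

```python
from collections import defaultdict

docs = {  # toy document collection (illustrative)
    1: "how to search the web",
    2: "classical information retrieval",
    3: "search engines index documents for retrieval",
}

# Indexing: map each term of the indexing language to the documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def retrieve(query):
    """Rank documents by a simple similarity: number of shared query terms."""
    terms = query.split()
    scores = {d: sum(d in index[t] for t in terms) for d in docs}
    return [d for d, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]

print(retrieve("search retrieval"))  # [3, 1, 2]: doc 3 matches both terms
```

Real systems replace the overlap count with weighted similarities (e.g. tf-idf with cosine similarity), but the query-index-similarity pipeline is the same.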
10.
Classical Information Retrieval (simplified version)
[Figure: search term(s) / keywords are formulated into a search query (here: "search"), which is looked up in a search index built over the set of documents. The example hit is a scanned German dictionary entry for the verb "suchen" ('to search'), attested in all Germanic languages: Gothic sokjan, Old English sēcan, Old Saxon sokian, etc.]
11.
Evaluation of Information Retrieval Systems
Recall = |R ∩ P| / |R|
Precision = |R ∩ P| / |P|
Fα = ((1 + α) ⋅ Recall ⋅ Precision) / (α ⋅ Precision + Recall)
where R is the set of relevant documents, P the set of retrieved documents, and R ∩ P the relevant documents that have been retrieved.
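These measures are straightforward to compute over sets of document IDs. A short sketch, assuming the common convention for the Fα-measure in which α = 1 yields the harmonic mean of precision and recall (the example sets are illustrative):

```python
def recall(relevant, retrieved):
    # fraction of the relevant documents that were actually retrieved
    return len(relevant & retrieved) / len(relevant)

def precision(relevant, retrieved):
    # fraction of the retrieved documents that are actually relevant
    return len(relevant & retrieved) / len(retrieved)

def f_measure(relevant, retrieved, alpha=1.0):
    r, p = recall(relevant, retrieved), precision(relevant, retrieved)
    return (1 + alpha) * r * p / (alpha * p + r)

R = {1, 2, 3, 4}   # relevant documents (illustrative)
P = {3, 4, 5}      # retrieved documents (illustrative)
print(recall(R, P))               # 0.5
print(round(f_measure(R, P), 4))  # 0.5714 (harmonic mean at alpha = 1)
```

Note the trade-off the next slides exploit: adding query terms with OR tends to grow P and raise recall, while adding terms with AND tends to shrink P and raise precision.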
12. Semantic Search
(One of many Definitions...)
• Annotation of (text-based) metadata with semantic entities
• Entity-based Information Retrieval
• Make use of semantic relations, e.g. content-based similarities and relationships
• Interoperable metadata via semantic annotations
  • for content-based description
  • for structural / technical description (Multimedia Ontologies)
Overall Goal:
Quantitative and qualitative improvement of Information Retrieval
13. Semantic Search
Semantic metadata enable improvements over traditional keyword-based retrieval by
(1) Query String Extension/Refinement
    enables more precise or more complete search results
(2) Cross Referencing
    enables complementing search results with additional associated or similar information
(3) Exploratory Search
    enables visualization of and navigation through the search space
(4) Reasoning
    enables complementing search results with implicitly given information
14. Semantic Search
Query String Extension
• Keyword-based search does not deliver all search results that are relevant for a query, because synonyms and metaphors might describe the queried content.
• Extension of the original query string (Query Extension)
  • from dictionaries and thesauri
    • extend query with synonyms, hyponyms, etc.
  • from domain ontologies
    • extend query with meronyms, related concepts, etc.
Original query string: Bank
possible extensions: Bank ∨ depository financial institution ∨ credit union ∨ acquirer ∨ federal reserve ∨ ...
Effect: increases recall
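As a sketch, query extension is a lookup of related terms followed by a disjunction; the hand-built THESAURUS table and the extend_query helper below are illustrative assumptions, not a real WordNet or ontology API:

```python
# Sketch of query string extension: replace a single keyword by a
# disjunction of the keyword and its thesaurus-related terms.

# Hypothetical thesaurus mapping a term to synonyms/related concepts
THESAURUS = {
    "bank": ["depository financial institution", "credit union",
             "federal reserve"],
}

def extend_query(term: str) -> str:
    """Extend a keyword query into a disjunction of related terms."""
    related = THESAURUS.get(term.lower(), [])
    return " OR ".join([term] + related)

print(extend_query("Bank"))
# Bank OR depository financial institution OR credit union OR federal reserve
```

A term without thesaurus entries is returned unchanged, so the extension degrades gracefully to plain keyword search.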
15. Semantic Search
Query String Refinement
• Keyword-based search also delivers search results that are not relevant for a query, because query terms and document terms might be ambiguous.
• Refinement of the original query string (Query Refinement)
  • from dictionaries and thesauri
    • disambiguate polysemous terms with hypernyms
  • from domain ontologies
    • disambiguate polysemous terms with holonyms
Original query string: Bank
possible refinements: (1) Bank ∧ financial institution
(2) Bank ∧ incline ∧ slope ∧ side
(3) Bank ∧ container
(4) Bank ∧ deposit ∧ repository
Effect: increases precision
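Refinement can be sketched as conjoining the ambiguous term with discriminating terms for the intended sense; the SENSES table and refine_query helper are illustrative (WordNet-style) assumptions:

```python
# Sketch of query string refinement: disambiguate a polysemous keyword
# by conjoining it with hypernyms/discriminators of the chosen sense.

# Hypothetical sense inventory for the ambiguous term "bank"
SENSES = {
    "bank": {
        "financial": ["financial institution"],
        "river": ["incline", "slope", "side"],
    },
}

def refine_query(term: str, sense: str) -> str:
    """Refine a keyword query into a conjunction with sense markers."""
    discriminators = SENSES[term.lower()][sense]
    return " AND ".join([term] + discriminators)

print(refine_query("Bank", "financial"))  # Bank AND financial institution
print(refine_query("Bank", "river"))      # Bank AND incline AND slope AND side
```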
16. Semantic Search
Cross Referencing
• Provide search results that do not literally contain the query string but are closely related to the query by content
• Apply domain ontologies for determining related concepts
• Apply statistical analysis of large (text) document corpora
[Diagram: the query string "Neil Armstrong" is mapped via Named Entity Recognition (NER) to dbpedia:Neil_Armstrong; dbprop:mission links dbpedia:Neil_Armstrong, dbpedia:Buzz_Aldrin, and dbpedia:Michael_Collins to dbpedia:Apollo_11]
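The cross-referencing step can be sketched over a toy in-memory triple list that mirrors the DBpedia example; the cross_references helper is a hypothetical stand-in for a SPARQL query against a real endpoint:

```python
# Sketch of cross referencing: given the entity recognized in the query,
# return other entities that share a relation (here dbprop:mission).

TRIPLES = [
    ("dbpedia:Neil_Armstrong",  "dbprop:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Buzz_Aldrin",     "dbprop:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Michael_Collins", "dbprop:mission", "dbpedia:Apollo_11"),
]

def cross_references(entity: str, predicate: str):
    """Entities linked via `predicate` to the same objects as `entity`."""
    objects = {o for s, p, o in TRIPLES if s == entity and p == predicate}
    return sorted({s for s, p, o in TRIPLES
                   if p == predicate and o in objects and s != entity})

print(cross_references("dbpedia:Neil_Armstrong", "dbprop:mission"))
# ['dbpedia:Buzz_Aldrin', 'dbpedia:Michael_Collins']
```

None of the returned entities need to contain the literal query string "Neil Armstrong"; they are found purely through the shared semantic relation.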
17. Semantic Search
Exploratory Search
• Provide additional search results that do not necessarily contain the query string but are related to the query by content, or that are related to the search results achieved by the direct query
• Apply domain ontologies and heuristics to determine the relevance of facts
[Diagram: dbpedia:Neil_Armstrong is linked via dbpedia-owl:mission to dbpedia:Apollo_11; dbpedia:Apollo_11 and dbpedia:Apollo_13 share dcterms:subject category:Apollo_program; dbpedia:Apollo_13 and dbpedia:Space_Shuttle_Challenger share rdf:type yago:Space_accidents_and_incidents]
18. Semantic Search
Reasoning
• Provide additional search results (and information) that do not necessarily contain the query string but are related to the query by content, whereby the relation may not be a direct one, but can be derived via entailment.
• Apply domain ontologies, reasoning algorithms and heuristics to find new facts and determine the relevance of facts
19. Semantic Search
Reasoning
Example: query string = Neil Armstrong
(Hard) questions to solve via reasoning:
• Will the Moon, or documents about the Moon, appear in the search results?
• How is Neil Armstrong related to the Moon? (is he?)
• Was Neil Armstrong (really) on the Moon?
• ...
[Diagram: dbpedia:Neil_Armstrong is linked via dbpedia-owl:mission to dbpedia:Apollo_11, which has dcterms:subject category:Missions_to_the_Moon; skos:broader links connect the Moon-related categories (category:Exploration_of_the_Moon, category:Spaceflight, category:Animals_in_Space), to which dbpedia:Moon and category:Moon are attached via dcterms:subject; reasoning over this chain connects Neil Armstrong to the Moon]
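A minimal sketch of this kind of entailment, assuming a hand-built triple list that approximates the DBpedia excerpt above; a breadth-first search over the graph stands in for a real reasoner:

```python
# Sketch of entailment by graph traversal: infer that the query entity
# is related to dbpedia:Moon by following a chain of triples.

from collections import deque

TRIPLES = [
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Apollo_11", "dcterms:subject", "category:Missions_to_the_Moon"),
    ("category:Missions_to_the_Moon", "skos:broader",
     "category:Exploration_of_the_Moon"),
    ("dbpedia:Moon", "dcterms:subject", "category:Exploration_of_the_Moon"),
]

def related(start: str, goal: str) -> bool:
    """Breadth-first search treating triples as undirected edges."""
    neighbours = {}
    for s, _, o in TRIPLES:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(related("dbpedia:Neil_Armstrong", "dbpedia:Moon"))  # True
```

A production system would additionally rank such inferred connections by heuristics, since not every reachable entity is relevant to the query.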
20. 08 - Exploratory Semantic Search
Open HPI - Course: Semantic Web Technologies - Lecture 6: Applications in the Web of Data
Semantic Web Technologies, Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam