1. The document outlines an agenda for a seminar on the semantic web and machine learning including introductions, foundations of the semantic web, ontology learning, ontology mapping, semantic annotation, using ontologies, and applications.
2. It compares different data models including taxonomies, thesauruses, topic maps, and ontologies, noting that ontologies provide a formal specification of a shared conceptualization of a domain with defined relationships between concepts.
3. The seminar will cover how machine learning can be used to learn ontologies and ontology mappings from text to help integrate and converge information across devices and contexts on the semantic web.
This tutorial, given at CALIBER 2009 at Pondicherry University, aims to show the potential of a select set of Web-based applications for libraries and information centers from a real-world perspective by trying out open source solutions. It attempts to demystify the features and functionality of some popular library management, digital library, and OA archive/harvester applications such as KOHA, Greenstone, DSpace, OAI Harvester, and Drupal.
Social network architecture - Part 3. Big data - Machine learning Phu Luong Trong
This document provides an overview of big data architecture and machine learning. It discusses:
1. The core components of a social network architecture including user data storage, activity systems, notifications, and external integration.
2. Definitions of big data from various sources focusing on the volume, velocity, and variety of large and complex data sets.
3. How machine learning is used for data analysis, applications like weather forecasting and search, and algorithms like supervised learning and decision trees.
This document discusses generating personalized web pages for tutoring systems using knowledge-based approaches. It covers key topics like ontologies, student modeling, cognitive psychology, and hypertext. Personalized web pages can be adapted based on a student's knowledge, learning style, goals, preferences and other factors inferred from their interactions. The document argues that web pages should be designed following principles of cognitive ergonomics and rhetoric to facilitate understanding and avoid issues like high cognitive load.
The document covers information architecture and search patterns: it gives definitions and principles of information architecture, examples of search patterns and design patterns, and approaches to search such as faceted navigation, auto-complete, and structured results. It also covers emerging areas like conversational search and augmented reality. The document is authored by Peter Morville and ends with his contact information and links to relevant resources.
Interaction Synchronicity in Web based Collaborative Learning Systems Ari Bader-Natal
Slides from "Interaction Synchronicity in Web-based Collaborative Learning Systems" talk at E-Learn 2009. Full paper available at http://aribadernatal.com/docs/badernatal_elearn2009_updated.pdf or http://www.editlib.org/p/32603
Modern learning models require linking experiences in training environments with experiences in the real world. However, data about real-world experiences is notoriously hard to collect. Social spaces bring new opportunities to tackle this challenge, supplying digital traces where people talk about their real-world experiences. These traces can become a valuable resource, especially in ill-defined domains that embed multiple interpretations. The paper presents an approach to aggregate content from social spaces into a semantically enriched data browser to facilitate informal learning in ill-defined domains. This work pioneers a new way to exploit digital traces about real-world experiences as authentic examples in informal learning contexts. An exploratory study is used to determine both strengths and areas needing attention. The results suggest that semantics can be successfully used in social spaces for informal learning, especially when combined with carefully designed nudges.
General Framework for the Rapid Development of Interactive Paper Applications Beat Signer
Presentation given at CoPADD 2006, 1st International Workshop on Collaborating over Paper and Digital Documents, Banff, Canada, November 2006
ABSTRACT: We present a component-based framework that supports the rapid development of a wide variety of interactive paper applications. The framework includes authoring and publishing tools as well as a server that supports the linking of active areas on paper to a wide range of different media types and services.
Thesis Defense: Building a Semantic Web of Comic Book Metadata Sean Petiya
Building a Semantic Web of Comic Book Metadata: User Application Profiles for Publishing Linked Data in HTML/RDFa
Kent State University - November 11, 2014
The objective of this research was to present a case study for developing a domain ontology, and explore methodologies for improving the usability and potential usage of that vocabulary through the development of interoperable metadata application profiles designed for specific groups of users within a community. This objective was realized by the development of a metadata vocabulary for comic books and comic book collections, and a series of metadata application profiles designed for publishing Linked Data in the content of existing information systems using HTML/RDFa. Semantic Web standards and technologies represent an opportunity for connecting data about comic books and graphic novels in LOD datasets with detailed, community-created data on the open Web. Recognizing the potential for an open exchange of data about comic books and graphic novels, a case study was designed to gain a comprehensive understanding of the domain and develop an effective data model. The initial phase of the study involved a review of information and reference resources, acquisition of example materials, and practical experience gained indexing comics in a collaborative Web database. A metamodel for comics was then developed and realized as an XML schema, with those elements mapped as properties to classes in an OWL ontology. In order to align the ontology with the wider Web environment and validate the model, the final phase of the case study explored external sources through a review of existing information systems and an analysis of their content. Results were then summarized as skeleton, data-driven user persona documents, which were used to guide the design of a series of metadata application profiles representing the functional requirements identified. The profiles build upon a core schema and incorporate elements from other Web vocabularies as necessary, focusing on publishing Linked Data in existing information systems using HTML/RDFa. 
Examples were explored and validated for their ability to link to LOD resources and produce meaningful, valid RDF data consistent with the ontology. The final result is a flexible and extensible semantic model for comics. The Comic Book Ontology (CBO) as an RDFS/OWL vocabulary is compatible with a variety of other systems, including next-generation library catalogs, where it can potentially be used in a collaborative exchange of data to describe relationships between comics material and content not previously available. This study demonstrates how an ontology can be applied to existing collaborative projects, databases, content, or research to enhance the visibility, reference, and utilization of those endeavors through their publication as Linked Data.
Ontology learning tools aim to automate the process of building ontologies from various data sources using machine learning and other AI techniques. Most current tools are semi-automatic and require human validation and input. They can learn from text alone using natural language processing, from text combined with existing ontologies, or from structured knowledge bases and ontologies. However, ontology learning remains a challenging task and current tools have limitations such as requiring large amounts of high-quality input data and rules specified by experts.
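As a rough illustration of learning from text alone, the sketch below pulls candidate "is-a" pairs out of a sentence with a single lexical pattern ("X such as A, B and C"). This is a deliberately simple stand-in chosen for the example, not the method of any tool summarized here; the function name `extract_is_a` and the sample sentence are hypothetical. It also shows why human validation is still needed: the pattern accepts whatever words happen to follow "such as".

```python
import re

# One hand-written lexical pattern: "<parent> such as <child>, <child> and <child>".
# Assumption: single-word terms only; real tools use far richer NLP pipelines.
SEPARATORS = r", and |, or |, | and | or "
PATTERN = re.compile(r"(\w+) such as (\w+(?:(?:" + SEPARATORS + r")\w+)*)")


def extract_is_a(sentence: str) -> list[tuple[str, str]]:
    """Return (child, parent) candidate pairs found in the sentence."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        parent = match.group(1)
        children = re.split(SEPARATORS, match.group(2))
        pairs.extend((child, parent) for child in children)
    return pairs


# Hypothetical input sentence.
candidates = extract_is_a(
    "The library runs applications such as Koha, Greenstone and DSpace."
)
for child, parent in candidates:
    print(f"{child} is-a {parent}")
```

The output is a list of candidates for a person to accept or reject, which matches the semi-automatic workflow described above.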
This document outlines steps towards building an ontology-based learning environment. It begins by defining an ontology as a way to capture shared knowledge that can be reused across applications and groups. It then presents an ontology model that maps educational concepts like curriculum development, learning technology selection, and competency mapping. The document describes how an ontology could enable adaptive testing, context-based learning, and human resource processes like recruitment based on competencies. Overall, the document argues that an ontology-based approach could provide a formal way to structure and share educational knowledge.
An Ontology for Learning Services on the Shop Floor Carsten Ullrich
An ontology expresses a common understanding of a domain that serves as a basis of communication between people or systems, and enables knowledge sharing, reuse of domain knowledge, reasoning and thus problem solving. In Technology-Enhanced Learning, especially in Intelligent Tutoring Systems and Adaptive Learning Environments, ontologies serve as the basis of adaptivity and personalization. For mathematics learning and similarly structured domains, ontologies and their usage for adaptive learning are well understood and established. This contribution presents an ontology for the industrial shop floor (the area of a factory where operatives assemble products) and illustrates its usage in several learning services.
“Semantic Technologies for Smart Services” diannepatricia
Presentation “Semantic Technologies for Smart Services” by Rudi Studer, Full Professor in Applied Informatics at the Karlsruhe Institute of Technology (KIT), Institute AIFB, given as part of the Cognitive Systems Institute Speaker Series, December 15, 2016.
This document presents a road safety education project called "Movilidad Segura" ("Safe Mobility"), developed by the Institución Educativa María Mancilla Sánchez. The project seeks to implement strategies to improve the safety of student pedestrians through training sessions, a school traffic patrol, and peer multipliers. The project is based on national decrees and laws on road safety education and social service. Its objective is to train students in safe behaviors for traffic…
An updated "what is happening on the Semantic Web" presentation for 2010 - includes business use, government use, and some speculation on the current areas of excitement and development. A very accessible talk, not aimed solely at a technical audience.
This document provides an overview of ontologies, semantic web technologies, and their applications. It discusses syntactic web limitations and the need to add semantics. Key concepts covered include ontology, RDF, RDFS, OWL, Protege, and how these technologies enable a global linked database by semantically connecting data on the web.
Practical Semantic Web and Why You Should Care - DrupalCon DC 2009 Boris Mann
Presented at Drupalcon DC 2009 - http://dc2009.drupalcon.org/session/practical-semantic-web-and-why-you-should-care
An overview of Semantic Web concepts and RDF. Exploration of RDFa. How open data fits. Examples of modules and functionality in Drupal today, and a plan for Drupal 7.
Ontology Web services for Semantic Applications Trish Whetzel
The document describes ontology web services created by the National Center for Biomedical Ontology to facilitate the application of ontologies in biomedical science. The services provide access to ontologies and related functions like searching, term details, hierarchies and mappings. Additional services allow the creation of ontology-based annotations using tools like the annotator and ontology recommender. All services are accessible via RESTful web APIs.
The Semantic Web (and what it can deliver for your business) Knud Möller
3-hour talk I gave on behalf of Social Bits and the Irish Internet Association (IIA). Contains an introduction to the general idea of the Semantic Web and Linked Data, its relevance and opportunities for businesses, and a look under the hood - how does it all work?
We present Fresnel Forms, a plugin we developed for Protégé, an editor for Semantic Web ontologies. The Fresnel Forms plugin processes the currently active ontology in a Protégé session to export a semantic wiki for that ontology. This export uses Semantic MediaWiki’s XML-based export format for import into an existing wiki. Fresnel Forms also provides a GUI editor to let the user fine-tune the generated interface before exporting it to a wiki.
Fresnel Forms exports use features from Semantic MediaWiki and Semantic Forms to provide an annotate-and-browse data system interface. Each wiki Fresnel Forms generates provides forms for entering data for classes and fields that conform to the original ontology. Templates provide displays of pages created with these forms. Finally, the wiki’s ExportRDF feature creates Semantic Web triples for the data entered that use URIs from the original ontology. Fresnel Forms thus provides an efficient way to create a wiki for populating a given Semantic Web ontology.
Fresnel Forms can be downloaded and installed on Protégé from http://is.cs.ou.nl/OWF/index.php5/Fresnel_Forms
The document introduces the Semantic Web and how it allows for the integration and merging of disparate datasets. It provides an example of merging two bookstore datasets that have similar information but are structured differently. By exporting the datasets as RDF triples, mapping identical resources, and adding a few statements to link equivalent terms, the datasets can be merged. This allows for new queries to be answered by combining information from both original datasets. The Semantic Web provides technologies to automate this kind of data integration and enable more powerful queries across multiple sources of data.
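The merge described above can be sketched in a few lines of plain Python, with triples modeled as tuples rather than through an RDF library. Everything in the sketch is an assumption made for illustration: the `a:` and `b:` prefixes, the property names, the sample records, and the helper names `merge` and `price_of_books_by` do not come from the summarized document.

```python
Triple = tuple[str, str, str]

# Two hypothetical bookstore exports with different vocabularies.
store_a: set[Triple] = {
    ("a:book1", "a:title", "Example Novel"),
    ("a:book1", "a:author", "a:person7"),
    ("a:person7", "a:name", "Jane Doe"),
}
store_b: set[Triple] = {
    ("b:item42", "b:heading", "Example Novel"),
    ("b:item42", "b:price", "12.50"),
}

# The "few extra statements": which resources and terms are equivalent.
same_resource = {"b:item42": "a:book1"}
same_term = {"b:heading": "a:title"}


def merge(a: set[Triple], b: set[Triple]) -> set[Triple]:
    """Union both graphs after rewriting b's identifiers to their equivalents."""
    merged = set(a)
    for s, p, o in b:
        merged.add((
            same_resource.get(s, s),
            same_term.get(p, p),
            same_resource.get(o, o),
        ))
    return merged


def price_of_books_by(triples: set[Triple], author_name: str) -> dict[str, str]:
    """A query that needs store A's author data and store B's price data."""
    people = {s for s, p, o in triples if p == "a:name" and o == author_name}
    books = {s for s, p, o in triples if p == "a:author" and o in people}
    return {s: o for s, p, o in triples if p == "b:price" and s in books}


merged = merge(store_a, store_b)
print(price_of_books_by(merged, "Jane Doe"))
```

Neither original dataset can answer the price-by-author question alone; the merged graph can, because the book is now a single node carrying properties from both sources.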
This document discusses adding semantic structure to real-time social data from Twitter through Twitter Annotations. It describes how Annotations can be mapped to existing Semantic Web vocabularies and linked to datasets to enable real-time semantic search over social and linked data. A system called TwitLogic is presented that captures Twitter data, converts it to RDF, and publishes it as linked streams to allow for continuous querying and integration with the live Semantic Web.
The document discusses the evolution of the World Wide Web towards a Semantic Web, where computers will be able to understand the meaning, context and relationships between data on web pages. It provides an example of how Semantic Web coding could link together different web pages about a professor by relating her faculty page, research, blog and staff listing. This creates a richer experience for users by making more information accessible in an interconnected way. The document then outlines some methods for implementing Semantic Web coding, such as using RDF triples or microformats, and provides examples of microformats being used on web pages.
The document discusses using NoSQL techniques like MapReduce to perform sentiment analysis on blog data by:
1) Accessing blog documents in parallel using MapReduce;
2) Parsing the documents into word lists using natural language processing techniques in MapReduce;
3) Creating histograms of word frequencies to construct feature vectors representing each document.
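The three steps above can be sketched as a single-process imitation of the map and reduce phases. This is only an illustration of the data flow: a real MapReduce job would distribute the phases across workers, and the sample posts, the regular-expression tokenizer, and the function names are assumptions made for the example rather than details from the summarized document.

```python
import re
from collections import Counter

# Hypothetical blog posts keyed by document id.
docs = {
    "post1": "Great phone, great battery",
    "post2": "Terrible battery, terrible support",
}


def tokenize(text: str) -> list[str]:
    """Very rough stand-in for the NLP parsing step: lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def map_phase(doc_id: str, text: str) -> list[tuple[tuple[str, str], int]]:
    """Emit ((doc_id, word), 1) for every word occurrence in one document."""
    return [((doc_id, word), 1) for word in tokenize(text)]


def reduce_phase(pairs) -> dict[str, Counter]:
    """Sum the emitted counts into one word histogram per document."""
    histograms: dict[str, Counter] = {}
    for (doc_id, word), count in pairs:
        histograms.setdefault(doc_id, Counter())[word] += count
    return histograms


# Each document could be mapped independently (in parallel on a real cluster).
pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
histograms = reduce_phase(pairs)

# A shared, sorted vocabulary fixes the meaning of each vector position.
vocabulary = sorted({word for hist in histograms.values() for word in hist})
vectors = {d: [hist[w] for w in vocabulary] for d, hist in histograms.items()}

print(vocabulary)
print(vectors)
```

The resulting vectors are what a sentiment classifier would consume as input; the classification step itself is outside this sketch.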
Open Source Web Content Management Technologies for Libraries Anil Mishra
This document provides an agenda and overview for an open source web content management technologies pre-conference tutorial focused on libraries. The agenda covers topics including the current information landscape, open source overview, categories of open source software for libraries, and several specific open source digital library systems and content management platforms. An overview of each topic is provided along with considerations around selecting and implementing open source solutions for libraries.
This document discusses the intersection of machine learning and search-based software engineering (ML & SBSE). It provides examples of how data miners can find signals in software engineering artifacts using machine learning techniques. It then discusses how better algorithms do not necessarily lead to better mining yet and emphasizes the importance of sharing data, models, and analysis methods. Finally, it outlines a vision for "discussion mining" to guide teams in walking across the space of local models, with the goal of building a science of localism in ML and SBSE.
Semantics empowered Physical-Cyber-Social Systems for EarthCube (Amit Sheth)
Presentation at the EarthCube Face-to-Face Workshop of the Semantics & Ontologies Workgroup: April 30-May 1, 2012, Ballston, VA.
Workshop site: http://earthcube.ning.com/group/semantics-and-ontologies/page/workshops
For more recent material on this topic, see: http://wiki.knoesis.org/index.php/PCS
The document discusses search based applications for navigating large amounts of data in the cloud. It outlines challenges around performance, usability, and dealing with big data. It then describes how semantic processing can be used to analyze, transform and retrieve information from various data types like text, images, audio and video. Examples are provided of applications that enrich navigation context, handle multimedia, and link related data. The document argues that search based approaches can help address issues of usability, agility and scalable performance when accessing large distributed datasets in the cloud.
Metadata and Taxonomies for More Flexible Information Architecture (jrhowe)
The document discusses the foundations of information architecture (IA) and how principles of library and information science can be applied to design corporate websites and intranets. It defines key concepts of IA like organization systems, navigation systems, labeling systems, and searching systems. The presentation also outlines why IA is important to reduce costs for users and organizations as well as improve learning across information systems.
The document discusses why technical communicators should care about metadata. It notes that metadata helps users find the right information through search and filtering. When structured properly through topics and relationships, metadata can help manage content, create conditional publications, and interface with machines. The presentation provides tips for technical communicators such as defining style guides and processes for metadata, structuring information, and letting publishing engines utilize metadata to their full potential.
Semantic Web research anno 2006: main streams, popular falacies, current statu... (Frank van Harmelen)
This keynote at the Cooperative Intelligent Agents Workshop was a good opportunity to give my view on the current state of Semantic Web research: what is it about, what is it not about, what has been achieved, what remains to be done. (Includes the now infamous slide "What's it like to be a machine")
This document provides an overview of the Demystifying OWL tutorial. The tutorial will explain description logics and the OWL family of ontology languages. It will cover the makeup of description logics, including the TBox (terminology) and ABox (assertions). The tutorial will also discuss OWL 1 and OWL 2, the open versus closed world assumption, the unique name assumption, and available tools and resources. The goal is to help attendees fully understand the application of semantic web and ontology technologies in model-driven software development.
This document discusses developments in text analytics from 2011-2012. It covers major acquisitions in the field, the rise of big data and APIs/platforms, and progress on tasks like information extraction, knowledge integration, and semantic search. It also outlines future directions, including sentiment analysis beyond polarity, identity resolution, speech analytics, and augmented reality interfaces. The overall focus is on generating signals from text to power applications and provide richer information and experiences.
The document discusses semantic web technologies including linked data and SPARQL. It describes how the semantic web allows sharing and connecting data across applications through common data formats and languages to describe relationships between data and real world objects. Linked data follows principles like using URIs to identify resources and HTTP URIs to look up related data through dereferenceable links, enabling exploration of a web of connected data.
EclipseConEurope2012 SOA - Models As Operational Documentation (Marc Dutoo)
At EclipseCon Europe 2012, in the SOA Symposium track, JWT's EMF model export to structure and information in Document Management Systems is explained and demonstrated for the case of the EasySOA service documentation registry, with JWT workflows producing a basis for SOA operational documentation.
An Introduction to NOSQL, Graph Databases and Neo4j (Debanjan Mahata)
Neo4j is a graph database that stores data in nodes and relationships. It allows for efficient querying of connected data through graph traversals. Key aspects include nodes that can contain properties, relationships that connect nodes and also contain properties, and the ability to navigate the graph through traversals. Neo4j provides APIs for common graph operations like creating and removing nodes/relationships, running traversals, and managing transactions. It is well suited for domains that involve connected, semi-structured data like social networks.
Django and Neo4j - Domain modeling that kicks ass (Tobias Lindaaker)
Presentation about using Neo4j from Django presented at OSCON 2010, Portland OR.
Sample code is available at: https://svn.neo4j.org/components/neo4j.py/trunk/src/examples/python/djangosites/blog/
1. The document discusses using ontologies in intelligent tutoring systems to generate personalized web pages for students based on their knowledge level, learning style, and other attributes in their student model.
2. Information is extracted from the web and annotated with metadata. Relevant concepts, facts, and metaphors are identified and dynamically included in generated web pages structured around the domain ontology.
3. This allows the system to continuously update based on new information from the web while ensuring coherence and understanding by reflecting the conceptual structure of the domain for the learner.
The Live OWL Documentation Environment: a tool for the automatic generation o... (University of Bologna)
The document discusses the need for improved user interfaces and tools to help non-technical people interact with and understand semantic models and ontologies. It notes that current tools have limitations and outlines key human interactions with ontologies, including understanding existing models, developing new models, and adding and modifying data according to models. The Live OWL Documentation Environment (LODE) is introduced as a tool aiming to automatically generate ontology documentation to help people better understand ontologies with minimal effort.
An Introduction to Semantic Web Technology (Ankur Biswas)
The document provides an overview of the semantic web and some of its key challenges. It discusses:
1) The evolution of the world wide web from a web of documents to a web of linked data through technologies like RDF, OWL, and SPARQL that add semantic meaning.
2) The vision for the semantic web is to publish machine-readable data using common formats so that information can be automatically processed by agents and integrated across sources.
3) Some challenges in realizing this vision include dealing with implicit knowledge, heterogeneous data distributions, and maintaining links and correctness over time as data changes.
The document discusses challenges with information architecture (IA) projects that lack proper semantic structures. It presents a three-layer architecture model with a middle semantic structure layer to address this. This layer is best implemented using semantic web standards like Topic Maps or RDF/OWL to define relationships between information categories. When implemented correctly in content management systems (CMS), strong semantic structures improve search capabilities by making relationships between articles explicit.
1. Semantic Web and Machine Learning Tutorial
Steffen Staab, ISWeb – Information Systems and Semantic Web, University of Koblenz, Germany
Andreas Hotho, Knowledge and Data Engineering Group, University of Kassel, Germany
Agenda
• Introduction
• Foundations of the Semantic Web
• Ontology Learning
• Learning Ontology Mapping
• Semantic Annotation
• Using Ontologies
• Applications
Syntax is not enough
[Figure: a contact entry for "Andreas" with the fields Tel and E-Mail]
Information Convergence
• Convergence not just in devices, also in “information”
  – Your personal information (phone, PDA, …)
    Calendar, photo, home page, files…
  – Your “professional” life (laptop, desktop, … Grid)
    Web site, publications, files, databases, …
  – Your “community” contexts (Web)
    Hobbies, blogs, fanfic, social networks…
• The Web teaches us that people will work to share
  – How do we CREATE, SEARCH, and BROWSE in the non-text based parts of our lives?
2. Meaning of Information (or: what it means to be a computer)
What a person reads: CV, name, education, work, private
The same labels as markup: <CV>, <name>, <education>, <work>, <private>
What a computer sees: <Χς>, <ναµε>, <εδυχατιον>, <ωορκ>, <πριϖατε>
XML ≠ Meaning, XML = Structure
Source of Problems
XML is unspecific:
• No predetermined vocabulary
• No semantics for relationships
⇒ must be specified upfront
Only possible in close cooperations
  – Small, reasonably stable group
  – Common interests or authorities
Not possible in the Web or on a broad scale in general!
(One) Layer Model of the Semantic Web
[Figure: layer model of the Semantic Web]
3. Some Principal Ideas
• URI – uniform resource identifiers
• XML – common syntax
• Interlinked
• Layers of semantics – from database to knowledge base to proofs
(Tim Berners-Lee, Weaving the Web)
Design principles of WWW applied to Semantics!!
What is an Ontology?
Gruber 93: An Ontology is a
  formal specification ⇒ Executable
  of a shared ⇒ Group of persons
  conceptualization ⇒ About concepts
  of a domain of interest ⇒ Between application and „unique truth“
Taxonomy
[Figure: tree, levels from the top: Object; Person, Topic, Document; Student, Researcher, Semantics; Doctoral Student, PhD Student, F-Logic, Ontology]
Taxonomy := Segmentation, classification and ordering of elements into a classification system according to their relationships between each other
Thesaurus
[Figure: the same tree with additional "synonym" and "similar" links between terms]
• Terminology for specific domain
• Graph with primitives, 2 fixed relationships (similar, synonym)
• originate from bibliography
4. Topic Map
[Figure: graph over Object, Person, Topic, Document, Student, Researcher, Semantics, Doctoral Student, PhD Student, F-Logic, Ontology. Links: knows, described_in, writes, subTopicOf, similar, synonym. Attributes: Tel, Affiliation.]
• Topics (nodes), relationships and occurrences (to documents)
• ISO-Standard
• typically for navigation and visualisation
Ontology (in our sense)
[Figure: is_a hierarchy over Object, Person, Topic, Document, Student, Researcher, Semantics, PhD Student, F-Logic, Ontology, with the relations knows, described_in, writes, subTopicOf, similar and the attributes Tel, Affiliation. Instance: York Sure, instance_of PhD Student, Tel +49 721 608 6592, Affiliation AIFB.]
Rules:
  T described_in D ⇒ T is_about D
  P writes D ∧ D is_about T ⇒ P knows T
• Representation Language: Predicate Logic (F-Logic)
• Standards: RDF(S); coming up standard: OWL
The Semantic Web
[Figure: three layers, Ontology, Metadata and Web page, connected by rdf:type links. Ontology layer: Person, with Employee as rdfs:subClass of Person, and PostDoc and Professor as rdfs:subClass of Employee; the property cooperatesWith has rdfs:Domain Person and rdfs:Range Person.]
Metadata:
  <swrc:PostDoc rdf:ID="person_sha">
    <swrc:name>Siegfried Handschuh</swrc:name>
    <swrc:cooperatesWith rdf:resource="http://www.uni-koblenz.de/~staab#person_sst"/>
    ...
  </swrc:PostDoc>
  <swrc:Professor rdf:ID="person_sst">
    <swrc:name>Steffen Staab</swrc:name>
    ...
  </swrc:Professor>
Web page URLs: http://www.aifb.uni-karlsruhe.de/WBS/sha and http://www.aifb.uni-karlsruhe.de/WBS/sst
What’s in a link? Formally
W3C recommendations
• RDF: an edge in a graph
• OWL: consistency (+subsumption+classif. + …)
Currently under discussion
• Rules: a deductive database
Currently under intense research
• Proof: worked-out proofs
• Trust: signature & everything working together
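The "edge in a graph" reading of RDF can be sketched in a few lines of Python. This is a minimal illustration, not an RDF implementation: triples are plain tuples of strings, no RDF library is involved, and the helper names `superclasses` and `types_of` are made up for this sketch. The class hierarchy follows the figure; the standard RDFS property name `rdfs:subClassOf` is used where the figure abbreviates it as rdfs:subClass.

```python
# Triples (subject, predicate, object) mirroring the slide's example.
# Everything is a plain string; this is an illustration, not an RDF store.
TRIPLES = {
    ("swrc:Employee", "rdfs:subClassOf", "swrc:Person"),
    ("swrc:PostDoc", "rdfs:subClassOf", "swrc:Employee"),
    ("swrc:Professor", "rdfs:subClassOf", "swrc:Employee"),
    ("person_sha", "rdf:type", "swrc:PostDoc"),
    ("person_sst", "rdf:type", "swrc:Professor"),
    ("person_sha", "swrc:name", "Siegfried Handschuh"),
    ("person_sst", "swrc:name", "Steffen Staab"),
    ("person_sha", "swrc:cooperatesWith", "person_sst"),
}


def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf edges."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(resource, triples):
    """Asserted rdf:type classes plus all of their superclasses."""
    direct = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


print(sorted(types_of("person_sha", TRIPLES)))
```

The metadata only states that person_sha is a PostDoc; that he is also an Employee and a Person follows from the ontology layer, which is the kind of subsumption inference the slide attributes to the schema languages.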
5. What’s in a link? Informally
• RDF: pointing to shared data
• OWL: shared terminology
• Rules: if-then-else conditions
• Proof: proof already shown
• Trust: reliability
Ontologies and their Relatives (I)
• There are many relatives around:
  – Controlled vocabularies, thesauri and classification systems available in the WWW, see http://www.lub.lu.se/metadata/subject-help.html
    • Classification Systems (e.g. UNSPSC, Library Science, etc.)
    • Thesauri (e.g. Art & Architecture, Agrovoc, etc.)
    • DMOZ Open Directory http://www.dmoz.org
  – Lexical Semantic Nets
    • WordNet, see http://www.cogsci.princeton.edu/~wn/
    • EuroWordNet, see http://www.hum.uva.nl/~ewn/
  – Topic Maps, http://www.topicmaps.org (e.g. used within knowledge management applications)
• In general it is difficult to find the border line!
Ontologies and their Relatives (II)
[Figure: spectrum from lightweight to formal: Catalog / ID; Terms / Glossary; Thesauri; Informal Is-a; Formal Is-a; Formal Instance; Frames; Value Restrictions; General logical constraints; Axioms: Disjoint, Inverse, Relations, ...]
Ontologies - Some Examples
• General purpose ontologies:
  – WordNet / EuroWordNet, http://www.cogsci.princeton.edu/~wn
  – The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html
  – IEEE Standard Upper Ontology, http://suo.ieee.org/
• Domain and application-specific ontologies:
  – RDF Site Summary RSS, http://groups.yahoo.com/group/rss-dev/files/schema.rdf
  – UMLS, http://www.nlm.nih.gov/research/umls/
  – GALEN
  – SWRC – Semantic Web Research Community: http://ontoware.org/projects/swrc/
  – RETSINA Calendaring Agent, http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf
  – Dublin Core, http://dublincore.org/
• Web Services Ontologies
  – Core ontology of services http://cos.ontoware.org
  – Web Service Modeling ontology http://www.wsmo.org
  – DAML-S
• Meta-Ontologies
  – Semantic Translation, http://www.ecimf.org/contrib/onto/ST/index.html
  – RDFT, http://www.cs.vu.nl/~borys/RDFT/0.27/RDFT.rdfs
  – Evolution Ontology, http://kaon.semanticweb.org/examples/Evolution.rdfs
• Ontologies in a wider sense
  – Agrovoc, http://www.fao.org/agrovoc/
  – Art and Architecture, http://www.getty.edu/research/tools/vocabulary/aat/
  – UNSPSC, http://eccma.org/unspsc/
  – DTD standardizations, e.g. HR-XML, http://www.hr-xml.org/
6. Tools for markup...
PhotoStuff Demo
Not tied to specific domains
[Screenshot: M-OntoMat annotation tool. Labelled parts: VDE plug-in launch, Shape selection, Shape erasure, Save selection, Visual Descriptor, Shape Descriptor, Color Descriptor, Prototype Instances, extraction, Domain Ontology Browser, Selected region, Draw panel]
M-OntoMat is publicly available
http://acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html
Shared Workspace (Xarop + Screenshot)
7. Coming sooner than you may think…
Social networks: e.g. Friend of a Friend (FOAF)
• Say stuff about yourself (or others) in OWL files, link to who you “know”
Estimates of the number of FOAF users range from 2M-5M
Using FOAF in other contexts
Jennifer Golbeck, http://trust.mindswap.org
Get a B&N price (In Euros)
8. Of a particular book
In its German edition?
The Semantic Wave
[Figure: the Semantic Wave, shown twice with a "YOU ARE HERE" marker, once for 2003 and once for 2005] (Berners-Lee, 03)
9. Now.
• RDF, RDFS and OWL are ready for prime time
  – Designs are stable, implementations maturing
• Major Research investment translating into application development and commercial spinoffs
  – Adobe 6.0 embraces RDF
  – IBM releases tools, data and partnering
  – HP extending Jena to OWL
  – OWL Engines by Ontoprise GmbH, Network Inference, Racer GmbH
  – Proprietary OWL ontologies for vertical markets
    • c.f. pharmacology, HMO/health care, ... Soft drinks
  – Several new starts in SW space
The semantic web and machine learning
What can machine learning do for the Semantic Web?
1. Learning Ontologies (even if not fully automatic)
2. Learning to map between ontologies
3. Deep Annotation: Reconciling databases and ontologies
4. Annotation by Information Extraction
5. Duplicate recognition
What can the Semantic Web do for Machine Learning?
1. Lots and lots of tools to describe and exchange data for later use by machine learning methods in a canonical way!
2. Using ontological structures to improve the machine learning task
3. Provide background knowledge to guide machine learning
Foundations of the Semantic Web: References
• Semantic Web Activity at W3C http://www.w3.org/2001/sw/
• www.semanticweb.org (currently relaunched)
• Journal of Web Semantics
• D. Fensel et al.: Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential, MIT Press 2003
• G. Antoniou, F. van Harmelen. A Semantic Web Primer, MIT Press 2004.
• S. Staab, R. Studer (eds.). Handbook on Ontologies. Springer Verlag, 2004.
• S. Handschuh, S. Staab (eds.). Annotation for the Semantic Web. IOS Press, 2003.
• International Semantic Web Conference series, yearly since 2002, LNCS
• World Wide Web Conference series, ACM Press, first Semantic Web papers since 1999
• York Sure, Pascal Hitzler, Andreas Eberhart, Rudi Studer, The Semantic Web in One Day, IEEE Intelligent Systems, http://www.aifb.uni-karlsruhe.de/WBS/phi/pub/sw_inoneday.pdf
• Some slides have been stolen from various places, from Jim Hendler and Frank van Harmelen, in particular.
Agenda
• Introduction
• Foundations of the Semantic Web
• Ontology Learning
• Learning Ontology Mapping
• Semantic Annotation
• Using Ontologies
• Applications
10. The OL Layer Cake
Layers from top to bottom:
• Rules: $\forall x, y\ (married(x, y) \rightarrow love(x, y))$
• Relations: cure(dom:DOCTOR, range:DISEASE)
• Concept Hierarchies: is_a(DOCTOR, PERSON)
• Concepts: DISEASE := <I,E,L>
• Synonyms: {disease, illness}
• Terms: disease, illness, hospital
How do people acquire taxonomic knowledge?
• I have no idea!
• But people apply taxonomic reasoning!
  – „Never do harm to any animal!“ => „Don‘t do harm to the cat!“
• More difficult questions:
  – representation
  – reasoning patterns
• But let‘s speculate a bit! ;-)
How do people acquire taxonomic knowledge?
What is liver cirrhosis?
• Mr. Smith died from liver cirrhosis.
• Mr. Jagger suffers from liver cirrhosis.
• Alcohol abuse can lead to liver cirrhosis.
How do people acquire taxonomic knowledge?
What is liver cirrhosis?
• Diseases such as liver cirrhosis are difficult to cure. (New York Times)
=> prob(isa(liver cirrhosis, disease))
11. How do people acquire taxonomic knowledge?
What is liver cirrhosis?
Cirrhosis: noun [uncountable] serious disease of the liver, often caused by drinking too much alcohol
liver cirrhosis ≈ cirrhosis ∧ isa(cirrhosis, disease) → prob(isa(liver cirrhosis, disease))
Evaluation of Ontology Learning
The a priori approach is based on a gold standard ontology:
  – Given an ontology modeled by an expert -> the so-called gold standard
  – Compare the learned ontology with the gold standard
• Which methods exist:
  – learning accuracy/precision/recall/f-measure
  – Count edges in the “ontology graph”
    • Counting of direct relation only (Reinberger et al. 2005)
    • Least common superconcept
    • Semantic cotopy
    • …
  – Evaluation via application (cf. section using ontologies)
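The Semantic Cotopy
$SC(c, O) = \{\, c' \mid c' \le_O c \lor c \le_O c' \,\}$ [Maedche & Staab 02]
Example for SC
[Figure: two taxonomies, written as parent: children.
O1: bookable: rentable, joinable. rentable: driveable, apartment. joinable: excursion, trip. driveable: rideable, car. rideable: bike.
O2: root: thing, activity. thing: vehicle, apartment. activity: excursion, trip. vehicle: TWV, car. TWV: bike.]
In O1: SC(bike) = {bike, rideable, driveable, rentable, bookable}
In O2: SC(bike) = {bike, TWV, vehicle, thing, root}
=> TO(bike, O1, O2) = 1/9!!!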
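The bike example can be recomputed with a short Python sketch. It assumes each taxonomy is stored as a child-to-parent dictionary (a representation chosen here for brevity) and that the two trees are shaped as read off the figure; the function names are made up for this illustration. The taxonomic overlap is taken as the size of the intersection of the two cotopies divided by the size of their union, which is what yields the 1/9 on the slide.

```python
# Each taxonomy is a child -> parent map; the top concept has parent None.
# The tree shapes are an assumption read off the slide's figure.
O1 = {"bookable": None, "rentable": "bookable", "joinable": "bookable",
      "driveable": "rentable", "apartment": "rentable",
      "excursion": "joinable", "trip": "joinable",
      "rideable": "driveable", "car": "driveable", "bike": "rideable"}
O2 = {"root": None, "thing": "root", "activity": "root",
      "vehicle": "thing", "apartment": "thing",
      "excursion": "activity", "trip": "activity",
      "TWV": "vehicle", "car": "vehicle", "bike": "TWV"}


def ancestors(c, parent):
    """All super-concepts of c, found by walking up to the top concept."""
    out = set()
    while parent[c] is not None:
        c = parent[c]
        out.add(c)
    return out


def semantic_cotopy(c, parent):
    """SC(c, O): c itself, its super-concepts and its sub-concepts."""
    descendants = {d for d in parent if c in ancestors(d, parent)}
    return {c} | ancestors(c, parent) | descendants


def taxonomic_overlap(c, parent1, parent2):
    """|SC(c,O1) & SC(c,O2)| / |SC(c,O1) | SC(c,O2)| for a shared concept c."""
    sc1, sc2 = semantic_cotopy(c, parent1), semantic_cotopy(c, parent2)
    return len(sc1 & sc2) / len(sc1 | sc2)


print(sorted(semantic_cotopy("bike", O1)))
print(sorted(semantic_cotopy("bike", O2)))
print(taxonomic_overlap("bike", O1, O2))
```

The two cotopies share only bike itself and contain nine distinct concepts in total, so the overlap is 1/9 even though both taxonomies place bike at the bottom of structurally identical chains. The low score comes purely from the differing labels of the inner concepts.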
12. Common Semantic Cotopy
$SC'(c, O_1, O_2) = \{\, c' \mid c' \in C_1 \cap C_2 \land (c' \le_{O_1} c \lor c \le_{O_1} c') \,\}$
Example for SC‘
[Figure: the same two taxonomies O1 (top concept bookable) and O2 (top concept root) as in the example for SC]
In O1: SC‘(driveable) = {bike, car}
In O2: SC‘(vehicle) = {bike, car}
=> TO(driveable, O1, O2) = 1
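A small Python sketch shows the effect of restricting the cotopy to concepts that occur in both taxonomies. As before, the child-to-parent dictionaries and the tree shapes are assumptions read off the figure, and `common_cotopy` and `overlap` are illustrative names rather than an established API.

```python
# Child -> parent maps for the two example taxonomies (assumed shapes).
O1 = {"bookable": None, "rentable": "bookable", "joinable": "bookable",
      "driveable": "rentable", "apartment": "rentable",
      "excursion": "joinable", "trip": "joinable",
      "rideable": "driveable", "car": "driveable", "bike": "rideable"}
O2 = {"root": None, "thing": "root", "activity": "root",
      "vehicle": "thing", "apartment": "thing",
      "excursion": "activity", "trip": "activity",
      "TWV": "vehicle", "car": "vehicle", "bike": "TWV"}


def ancestors(c, parent):
    """All super-concepts of c in the given taxonomy."""
    out = set()
    while parent[c] is not None:
        c = parent[c]
        out.add(c)
    return out


def common_cotopy(c, own, other):
    """SC'(c, own, other): the cotopy of c in `own`, restricted to the
    concepts that appear in both taxonomies."""
    common = set(own) & set(other)
    below = {d for d in own if c in ancestors(d, own)}
    return ({c} | ancestors(c, own) | below) & common


def overlap(set_a, set_b):
    """Intersection over union; 0.0 if both sets are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


sc_driveable = common_cotopy("driveable", O1, O2)
sc_vehicle = common_cotopy("vehicle", O2, O1)
print(sorted(sc_driveable), sorted(sc_vehicle), overlap(sc_driveable, sc_vehicle))
```

Because driveable and vehicle themselves, and their differently named neighbours, are not shared between the taxonomies, only bike and car remain in both sets, and the two concepts are judged to match perfectly.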
One more Example
[Figure: O1 is flat, root: car, bike, apartment, excursion, trip. O2 is the reference taxonomy, thing: vehicle, apartment; activity: excursion, trip; vehicle: TWV, car; TWV: bike.]
In O1: SC‘(car) = {car}
In O2: SC‘(vehicle) = {bike, car}
=> TO(car, O1, O2) = 1/2
Semantic Cotopy Revisited (Once More)
$SC''(c, O_1, O_2) = \{\, c' \mid c' \in C_1 \cap C_2 \land (c' <_{O_1} c \lor c <_{O_1} c') \,\}$
$\overline{TO}(O_1, O_2) = \frac{1}{|C_1|} \sum_{c \in C_1,\, c \notin C_2} TO(c, O_1, O_2)$
13. Example for Precision/Recall
[Figures: four learned taxonomies O1, each compared with the reference taxonomy O2 (root: thing, activity; thing: vehicle, apartment; activity: excursion, trip; vehicle: TWV, car; TWV: bike)]
• O1 with the same shape as O2 (bookable, rentable, joinable, driveable, rideable as inner concepts): P=100%, R=100%, F=100%
• O1 without rideable (bike and car directly under driveable): P=100%, R=87.5%, F=93.33%
• O1 with an additional concept planable between joinable and excursion: P=90%, R=100%, F=94.74%
• Another Example, flat O1 (root: car, bike, apartment, excursion, trip): P=100%, R=40%, F=57.14%
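The F values in these examples follow from the usual harmonic mean of precision and recall, F = 2·P·R / (P + R). The sketch below only redoes that arithmetic from the P and R values quoted on the slides; it does not recompute P and R from the taxonomies, and the function name is chosen for this illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs as quoted in the four examples above.
examples = [(1.00, 1.000), (1.00, 0.875), (0.90, 1.000), (1.00, 0.400)]
for p, r in examples:
    print(f"P={p:.2%}  R={r:.2%}  F={f_measure(p, r):.2%}")
```

The last example shows how strongly the harmonic mean penalises an imbalance: perfect precision combined with 40% recall gives an F of only about 57%.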
14. Evaluation Methodology
$\overline{TO}(O_1, O_2) = \frac{1}{|C_1|} \sum_{c \in C_1} TO(c, O_1, O_2)$
$TO(c, O_1, O_2) = \begin{cases} TO'(c, O_1, O_2) & \text{if } c \in C_2 \\ TO''(c, O_1, O_2) & \text{if } c \notin C_2 \end{cases}$
$TO'(c, O_1, O_2) := \frac{|SC(c, O_1, O_2) \cap SC(c, O_2, O_1)|}{|SC(c, O_1, O_2) \cup SC(c, O_2, O_1)|}$
$TO''(c, O_1, O_2) := \max_{c' \in C_2} \frac{|SC(c, O_1, O_2) \cap SC(c', O_2, O_1)|}{|SC(c, O_1, O_2) \cup SC(c', O_2, O_1)|}$
$P(O_1, O_2) = \overline{TO}(O_1, O_2)$
$R(O_1, O_2) = \overline{TO}(O_2, O_1)$
$F(O_1, O_2) = \frac{2 \cdot P(O_1, O_2) \cdot R(O_1, O_2)}{P(O_1, O_2) + R(O_1, O_2)}$
Lexical Recall and F‘
$LR(O_1, O_2) = \frac{|C_{O_1} \cap C_{O_2}|}{|C_{O_2}|}$
$F'(O_1, O_2) = \frac{2 \cdot F(O_1, O_2) \cdot LR(O_1, O_2)}{F(O_1, O_2) + LR(O_1, O_2)}$
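Lexical recall and F' are simple enough to try out directly. The sketch below assumes the concept sets of the two tourism example taxonomies used earlier (learned O1 with top concept bookable, reference O2 with top concept root); the taxonomic F value of 1.0 plugged in at the end is the value quoted for the first precision/recall example, not something this code derives.

```python
def lexical_recall(concepts_learned, concepts_gold):
    """LR(O1, O2): share of the gold-standard concepts that were learned."""
    return len(concepts_learned & concepts_gold) / len(concepts_gold)


def f_prime(f, lr):
    """F'(O1, O2): harmonic mean of the taxonomic F-measure and LR."""
    return 0.0 if f + lr == 0 else 2 * f * lr / (f + lr)


# Concept sets of the example taxonomies (assumed from the earlier figures).
C1 = {"bookable", "rentable", "joinable", "driveable", "rideable",
      "apartment", "excursion", "trip", "car", "bike"}
C2 = {"root", "thing", "activity", "vehicle", "TWV",
      "apartment", "excursion", "trip", "car", "bike"}

lr = lexical_recall(C1, C2)
print(lr)                # share of reference concepts found in O1
print(f_prime(1.0, lr))  # F' for a taxonomic F of 1.0
```

Here only five of the ten reference concepts appear in the learned taxonomy, so LR is 0.5, and even a perfect taxonomic F is pulled down to two thirds. F' thus rewards getting the vocabulary right as well as the structure.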
Evaluation of Ontology Learning
• The a posteriori approach:
  – ask domain expert for a per concept evaluation of the learned ontology
  – Count three categories of concepts:
    • Correct: both in learned and the gold ontology
    • New: only in learned ontology, but relevant and should be in gold standard as well
    • Spurious: useless
  – Compute precision = (correct + new) / (correct + new + spurious)
• As the result:
  The a priori evaluations are awful – BUT a posteriori evaluations by domain experts still show very good results, very helpful for domain expert!
Sabou M., Wroe C., Goble C. and Mishne G., Learning Domain Ontologies for Web Service Descriptions: an Experiment in Bioinformatics, In Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, 10-14 May, 2005.
Starting Point in OL from text
• Context-based approaches:
  – Distributional Hypothesis [Harris 85]: „Words are (semantically) similar to the extent to which they appear in similar (syntactic) contexts“
  – leads to creation of groups
• Looking for explicit information:
  – Texts
  – WWW
  – Thesauri
15. Looking for explicit information
There are two sources:
• Looking for patterns in texts:
– 'is-a' patterns [Hearst 92,98], [Poesio et al. 02], [Ahmid et al. 03]
– 'part-of' patterns [Charniak et al. 99]
– 'causation' patterns [Girju 02/03]
• Using the Web:
– [Etzioni et al. 04]
– [Cimiano et al. 04]
Pattern based approaches (Hearst Patterns)
• Match patterns in corpus:
• NP0 such as NP1 ... NPn-1 (and|or) NPn
• such NP0 as NP1 ... NPn-1 (and|or) NPn
• NP1 ... NPn (and|or) other NP0
• NP0, (including,especially) NP1 ... NPn-1 (and|or) NPn
for all NPi, 1 ≤ i ≤ n: isaHearst(head(NPi), head(NP0))
$$isa_{Hearst}(t_1, t_2) = \frac{\#HearstPatterns(t_1, t_2)}{\#HearstPatterns(t_1, *)}$$
• isaHearst(conference,event)=0.44
• isaHearst(conference,body)=0.22
• isaHearst(conference,meeting)=0.11
• isaHearst(conference,course)=0.11
• isaHearst(conference,activity)=0.11
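The isaHearst score is a relative frequency: the number of pattern matches linking t1 to t2, divided by all pattern matches for t1. A minimal sketch follows. The raw match counts are hypothetical, chosen only so that the ratios reproduce the scores listed for "conference"; the slides give the scores, not the counts.

```python
from collections import Counter

# Hypothetical numbers of Hearst-pattern matches (t1, t2) in a corpus.
pattern_matches = Counter({
    ("conference", "event"): 4,
    ("conference", "body"): 2,
    ("conference", "meeting"): 1,
    ("conference", "course"): 1,
    ("conference", "activity"): 1,
})


def isa_hearst(matches, t1, t2):
    """#HearstPatterns(t1, t2) / #HearstPatterns(t1, *)."""
    total_for_t1 = sum(n for (term, _), n in matches.items() if term == t1)
    if total_for_t1 == 0:
        return 0.0  # t1 never matched any pattern
    return matches[(t1, t2)] / total_for_t1


scores = {t2: round(isa_hearst(pattern_matches, "conference", t2), 2)
          for t2 in ("event", "body", "meeting", "course", "activity")}
print(scores)
```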
WWW Patterns
Generate patterns:
• <t1>s such as <t2>
• such <t1>s as <t2>
• <t1>s, especially <t2>
• <t1>s, including <t2>
• <t2> and other <t1>s
• <t2> or other <t1>s
and query the Web using the Google API:
$$isa_{WWW}(t_1, t_2) = \frac{\#Patterns(t_1, t_2)}{\#Patterns(t_1, *)}$$
The Vector-Space Model
• Idea: collect context information based on the distributional hypothesis and represent it as a vector:
            die_from  suffer_from  enjoy  eat
disease        X          X
cirrhosis      X          X
• compute similarity among vectors w.r.t. some measure
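One common choice for "some measure" is the cosine of the angle between two context vectors. The slides do not name a measure, so cosine is an assumption here. The sketch also assumes that both crosses for disease and cirrhosis sit under die_from and suffer_from, and it adds a hypothetical third term, "cake", purely for contrast.

```python
import math

# Context vectors as sparse dicts: attribute -> frequency.
vectors = {
    "disease": {"die_from": 1, "suffer_from": 1},
    "cirrhosis": {"die_from": 1, "suffer_from": 1},
    "cake": {"enjoy": 1, "eat": 1},  # hypothetical, not on the slide
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0 if either is all zeros)."""
    dot = sum(weight * v.get(attr, 0) for attr, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


sim_same_contexts = cosine(vectors["disease"], vectors["cirrhosis"])
sim_disjoint_contexts = cosine(vectors["disease"], vectors["cake"])
print(sim_same_contexts, sim_disjoint_contexts)
```

Terms with identical contexts get a similarity close to 1, terms with disjoint contexts get 0.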
16. Clustering Concept Hierarchies from Text
• Observation: ontology engineers need information about the effectiveness, efficiency and trade-offs of different approaches
• Similarity-based
– agglomerative/bottom-up
– divisive/top-down: Bi-Section-KMeans
• Set-theoretical
– set operations (inclusion)
– FCA, based on Galois lattices
[Cimiano et al. 03-04]
Context Extraction
• extract syntactic dependencies from text
⇒ verb/object, verb/subject, verb/PP relations
⇒ car: drive_obj, crash_subj, sit_in, …
• LoPar, a trainable statistical left-corner parser:
[Figure: processing pipeline with the components Parser, tgrep, Lemmatizer, Smoothing, Weighting, FCA, Lattice Compaction, Pruning]
Example
• People book hotels. The man drove the bike along the beach.
Extracted dependencies:
book_subj(people)
book_obj(hotels)
drove_subj(man)
drove_obj(bike)
drove_along(beach)
After lemmatization:
book_subj(people)
book_obj(hotel)
drive_subj(man)
drive_obj(bike)
drive_along(beach)
Weighting (threshold t)
• Conditional: $P(n \mid v_{arg})$
• Hindle: $P(n \mid v_{arg}) \cdot \log\left(\frac{P(n \mid v_{arg})}{P(n)}\right)$
• Resnik: $S_R(v_{arg}) \cdot P(n \mid v_{arg}) \cdot \log\left(\frac{P(n \mid v_{arg})}{P(n)}\right)$
$$S_R(v_{arg}) = \sum_{n'} P(n' \mid v_{arg}) \cdot \log\left(\frac{P(n' \mid v_{arg})}{P(n')}\right)$$
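The three weighting measures can be computed directly from co-occurrence counts of (verb argument, noun) pairs. The sketch below makes several assumptions that are not on the slides: probabilities are estimated as relative frequencies, the logarithm is the natural one, pairs are kept when their weight is at least t, and the counts themselves are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical co-occurrence counts f(v_arg, n).
COUNTS = {
    ("drive_obj", "car"): 3,
    ("drive_obj", "bike"): 1,
    ("book_obj", "hotel"): 4,
}


def _totals(counts):
    """Marginal counts per verb argument and per noun, plus the grand total."""
    by_context, by_noun = defaultdict(int), defaultdict(int)
    for (context, noun), freq in counts.items():
        by_context[context] += freq
        by_noun[noun] += freq
    return by_context, by_noun, sum(counts.values())


def conditional(counts, noun, context):
    """P(n | v_arg) as a relative frequency."""
    by_context, _, _ = _totals(counts)
    if by_context[context] == 0:
        return 0.0
    return counts.get((context, noun), 0) / by_context[context]


def hindle(counts, noun, context):
    """P(n | v_arg) * log(P(n | v_arg) / P(n))."""
    _, by_noun, total = _totals(counts)
    p_cond = conditional(counts, noun, context)
    if p_cond == 0:
        return 0.0
    return p_cond * math.log(p_cond / (by_noun[noun] / total))


def selectional_strength(counts, context):
    """S_R(v_arg): sum of the Hindle terms over all nouns seen with v_arg."""
    nouns = {n for (c, n) in counts if c == context}
    return sum(hindle(counts, n, context) for n in nouns)


def resnik(counts, noun, context):
    """S_R(v_arg) * P(n | v_arg) * log(P(n | v_arg) / P(n))."""
    return selectional_strength(counts, context) * hindle(counts, noun, context)


def keep_above(counts, measure, t):
    """Pairs whose weight under `measure` reaches the threshold t."""
    return {(c, n) for (c, n) in counts if measure(counts, n, c) >= t}


kept = keep_above(COUNTS, hindle, 0.5)
print(sorted(kept))
```

With these counts and t = 0.5 under the Hindle measure, the pair (drive_obj, bike) is dropped while the other two pairs stay.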
17. Tourism Formal Context
             bookable  rentable  driveable  rideable  joinable
apartment       X         X
car             X         X         X
motor-bike      X         X         X         X
excursion       X                                        X
trip            X                                        X
Tourism Lattice
[Figure: concept lattice of the tourism formal context]
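The formal concepts behind such a lattice can be enumerated by brute force for a context this small. In the sketch below the placement of the crosses is an assumption recovered from the concept hierarchy shown on the next slide (the transcript only preserves how many crosses each row has), and the exhaustive subset enumeration is chosen for clarity, not efficiency.

```python
from itertools import combinations

# Formal context: object -> set of attributes (cross placement assumed).
CONTEXT = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by all given objects (all attributes if none given)."""
    attrs = set(ALL_ATTRIBUTES)
    for obj in objects:
        attrs &= CONTEXT[obj]
    return attrs


def objects_having(attributes):
    """Objects that carry every one of the given attributes."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """All (extent, intent) pairs, found by closing every subset of objects."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts()
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by extent size, as the loop does, lists the most general concept (everything is bookable) first and the empty-extent bottom concept last.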
Concept Hierarchy
bookable
  rentable
    driveable
      rideable
        bike
      car
    apartment
  joinable
    excursion
    trip
Agglomerative/Bottom-Up Clustering
[Figure: dendrogram over car, bus, apartment, excursion, trip]
18. Linkage Strategies
• Complete-Linkage:
– consider the two most dissimilar elements of each of the clusters
=> O(n^2 log(n))
• Average-Linkage:
– consider the average similarity of the elements in the clusters
=> O(n^2 log(n))
• Single-Linkage:
– consider the two most similar elements of each of the clusters
=> O(n^2)
Bi-Section-KMeans
[Figure: top-down splitting of the set car, apartment, bus, trip, excursion into successively smaller clusters]
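The three linkage strategies differ only in how the pairwise similarities between two clusters are summarised: minimum (complete), mean (average) or maximum (single). A naive sketch of bottom-up clustering with a pluggable linkage follows. The attribute sets, the Jaccard similarity and all function names are hypothetical. The loop recomputes every cluster pair on each pass, so it does not reach the complexities quoted on the slide.

```python
# Hypothetical attribute sets for the five terms of the dendrogram slide.
FEATURES = {
    "car": {"driveable", "rentable", "bookable"},
    "bus": {"driveable", "bookable"},
    "apartment": {"rentable", "bookable"},
    "excursion": {"joinable", "bookable"},
    "trip": {"joinable", "bookable"},
}


def jaccard(a, b):
    """Similarity of two terms: shared attributes over all attributes."""
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa & fb) / len(fa | fb)


def single(sims):
    return max(sims)  # two most similar elements


def complete(sims):
    return min(sims)  # two most dissimilar elements


def average(sims):
    return sum(sims) / len(sims)


def agglomerative(items, sim, linkage):
    """Merge the two most similar clusters until one is left; return the merges."""
    clusters = [[x] for x in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [sim(a, b) for a in clusters[i] for b in clusters[j]]
                score = linkage(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((tuple(sorted(merged)), score))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


for name, linkage in (("single", single), ("complete", complete), ("average", average)):
    print(name, agglomerative(list(FEATURES), jaccard, linkage))
```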
Data Sets
• Tourism (118 million tokens):
– http://www.all-in-all.de/english
– http://www.lonelyplanet.com
– British National Corpus (BNC)
– handcrafted tourism ontology (289 concepts)
• Finance (185 million tokens):
– Reuters news from 1987
– GETESS finance ontology (1178 concepts)
Results Tourism Domain
[Figure: results for the tourism domain]
22. Mapping Methods
• Heuristic and Rule-based methods
• Graph analysis
• Probabilistic approaches
• Reasoning, theorem proving
• Machine-learning
Example
[Figure: two vehicle ontologies with mapping candidates. Labels: Thing, Vehicle, Automobile, hasSpecification, Speed, Marc's Porsche, fast; Object, Vehicle, hasOwner, Boat, Owner, Car, hasSpeed, Speed, Marc, Porsche KA-123, 250 km/h. Similarity values on the mapping edges: 1.0, 0.7, 0.9, 0.9]
simLabel = 0.0
simSuper = 1.0
simInstance = 0.9
simRelation = 0.9
simAggregation = 0.7
Mapping Methods
• Heuristic and Rule-based methods
• Graph analysis
• Probabilistic approaches
• Reasoning, theorem proving
• Machine-learning
GLUE: Defining Similarity
[Figure: hypothetical common marked-up domain, partitioned into the regions (A,S), (¬A,S), (A,¬S), (¬A,¬S) for the concepts Assoc. Prof (A) and Snr. Lecturer (S)]
$$Sim(\text{Assoc. Prof.}, \text{Snr. Lect.}) = \frac{P(A \cap S)}{P(A \cup S)} = \frac{P(A,S)}{P(A,\neg S) + P(A,S) + P(\neg A,S)}$$
[Jaccard, 1908]
Joint Probability Distribution: P(A,S), P(¬A,S), P(A,¬S), P(¬A,¬S)
Multiple similarity measures in terms of the JPD
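23. GLUE: No common data instances
In practice, not easy to find data tagged with both ontologies!
[Figure: instances of the United States and Australia taxonomies, each split into the regions (A,S), (A,¬S), (¬A,S), (¬A,¬S)]
Solution: Use Machine Learning
Machine Learning for computing similarities
[Figure: the classifiers CLA (A vs. ¬A) and CLS (S vs. ¬S) are applied to the United States and Australia instances, which splits them into the partitions (A,S), (A,¬S), (¬A,S), (¬A,¬S)]
JPD estimated by counting the sizes of the partitions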
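Estimating the joint distribution by counting and plugging it into the Jaccard similarity takes only a few lines. In this sketch the four partition sizes are hypothetical, and the function and key names are illustrative, not GLUE's own.

```python
def jpd_from_counts(n_a_s, n_a_not_s, n_not_a_s, n_neither):
    """Relative sizes of the four partitions (A,S), (A,¬S), (¬A,S), (¬A,¬S)."""
    total = n_a_s + n_a_not_s + n_not_a_s + n_neither
    return {
        "A,S": n_a_s / total,
        "A,not S": n_a_not_s / total,
        "not A,S": n_not_a_s / total,
        "not A,not S": n_neither / total,
    }


def jaccard_sim(jpd):
    """P(A,S) / (P(A,¬S) + P(A,S) + P(¬A,S))."""
    denominator = jpd["A,not S"] + jpd["A,S"] + jpd["not A,S"]
    return jpd["A,S"] / denominator if denominator else 0.0


# Hypothetical partition sizes after classifying all instances.
jpd = jpd_from_counts(n_a_s=30, n_a_not_s=10, n_not_a_s=20, n_neither=40)
sim = jaccard_sim(jpd)
print(jpd, sim)
```

Instances that belong to neither concept enter the JPD but not the similarity, which depends only on the three partitions that touch A or S.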
GLUE: Improve Predictive Accuracy – Use Multi-Strategy Learning
Single classifier cannot exploit all available information
Combine the predictions of multiple classifiers
[Figure: the classifiers CLA1 … CLAN each predict A or ¬A; a Meta-Learner combines their predictions]
Content Learner
Frequencies on different words in the text in the data instances
Name Learner
Words used in the names of concepts in the taxonomy
Others …
GLUE Next Step: Exploit Constraints
• Constraints due to the taxonomy structure
[Figure: two taxonomies with parents and children; labels People, Staff, Fac, Acad, Tech, Prof, Assoc. Prof, Asst. Prof, Snr. Lect., Lect.]
• Domain specific constraints
– Department-Chair can only map to a unique concept
• Numerous constraints of different types
Extended Relaxation Labeling to ontology matching
26. CREAM – Creating Metadata [K-CAP 2001; WWW 2002]
[Figure: generate class instance, attribute instance, relationship instance]
Annotation by Markup [K-CAP 2001]
[Figure: generate class instance, attribute instance, relationship instance]
Download of markup-only version of OntoMat from http://annotation.semanticweb.org
DAML OntoAgents
Annotation by Authoring [WWW 2003] Annotation vs. Deep Annotation
Input Annotation Output Ontology
[WWW 2002]
Create Text and Ontology
based-
if possible Links Metadata
out of a Class
Instance
Input Deep Annotation Output
Attribute Instance
Relationship
Ontology
Instance
generates
simple text
Mapping
Rules
103 DB Database 104
DB
28. Current State-of-the-art
• ML-based IE (e.g. Amilcare@{OntoMat,MnM})
– start with hand-annotated training corpus
– rule induction
• Standard IE (MUC)
– handcrafted rules
– Wrappers
• Large-scale IE [SemTag&Seeker@WWW'03]
– Large scale system
– disambiguation with TAP
• (C-)Pankow (Cimiano et al. WWW'04, WWW'05)
• KnowItAll (Etzioni et al. WWW'04)
Semi-automatic Annotation [EKAW 2002]
[Figure: semi-automatic annotation; EU IST Dot-Kom]
Comparison of CREAM and S-CREAM
Core processes: Input, Output
– (M) Manual Annotation (OntoMat): Relational Metadata
– (A1) Information Extraction (Amilcare): XML annotated Document
[Figure: a document with the XML tags <hotel>Zwei Linden</hotel> and <city>Dobbertin</city> produced by Amilcare (IE tool, A1), next to an ontology in OntoMat-Annotizer (M) with Thing, region, accommodation, Hotel, City and Located_at; a question mark marks the link between the two]
Different Results
<hotel>Zwei Linden</hotel>                 Zwei Linden InstOf Hotel
                                           Zwei Linden Located_At Dobbertin
<city>Dobbertin</city>                     Dobbertin InstOf City
                                           Zwei Linden Has_Room single_room1
<singleroom>Single room</singleroom>       single_room1 InstOf Single_Room
                                           single_room1 Has_Rate rate1
                                           rate1 InstOf Rate
<price>25,66</price>                       rate1 Price 25,66
<currency>EUR</currency>                   rate1 Currency EUR
                                           Zwei Linden Has_Room double_room1
<doubleroom>Double room</doubleroom>       double_room1 InstOf Double_Room
                                           double_room1 Has_Rate rate2
                                           rate2 InstOf Rate
<price>43,66</price>                       rate2 Price 43,46
<currency>EUR</currency>                   rate2 Currency EUR
29. Comparison of CREAM and S-CREAM
Core processes: Input, Output
– (M) Manual Annotation (OntoMat): Relational Metadata
– (A1) Information extraction (Amilcare): XML annotated Document
[Figure: the document with <hotel>Zwei Linden</hotel> and <city>Dobbertin</city> passes through the steps A1, A2, A3 and DR and ends in the relation Zwei Linden Located_at Dobbertin between a Hotel and a City]
Currently: Simple Centering Model
Future: Learn Coherency Rules
IE and Wrapper Learning
• Boosted wrapper induction
• Exploiting linguistic constraints
• Hidden Markov models
• Data mining and IE
• Bootstrapping
• First-order learning
Wrapper
No tutorial about IE and wrapper learning but…
• IE often focuses on a small number of classes
• Is not easily adaptable to new domains
• Needs a lot of training examples
Needed
• It would be great if IE would scale to a large number of classes (concepts) on a large amount of unlabeled data
SemTag
• The goal is to add semantic tags to the existing HTML body of the web.
• SemTag uses TAP, where TAP is a public, broad, shallow knowledge base.
• TAP contains lexical and taxonomical information about popular objects like music, movies, sports, etc.
Example:
"The Chicago Bulls announced that Michael Jordan will…"
Will be:
The <resource ref = http://tap.stanford.edu/BasketballTeam_Bulls>Chicago Bulls</resource> announced yesterday that <resource ref = "http://tap.stanford.edu/AthleteJordan_Michael"> Michael Jordan</resource> will...
Dill et al, SemTag and Seeker. WWW'03
30. SemTag
• Lookup of all instances from the ontology (TAP) – 65K instances
• Disambiguate the occurrences as:
– One of those in the taxonomy
– Not present in the taxonomy
• Placing labels in the taxonomy is hard
• Use bag-of-words approach for disambiguation
• 3 people evaluated 200 labels in context – agreed on only 68.5% - metonymy
• Applied on 264 million pages
• Produced 550 million labels and 434 spots
• Accuracy 82%
Dill et al, SemTag and Seeker. WWW'03
The Self-Annotating Web
• There is a huge amount of non-formalized knowledge in the Web
• Use statistics to interpret this non-formalized knowledge and propose formal annotations:
semantics ≈ syntax + statistics?
• Annotation by maximal statistical evidence
PANKOW: Pattern-based ANnotation through Knowledge On the Web
• HEARST1: <CONCEPT>s such as <INSTANCE>
• HEARST2: such <CONCEPT>s as <INSTANCE>
• HEARST3: <CONCEPT>s, (especially/including) <INSTANCE>
• HEARST4: <INSTANCE> (and/or) other <CONCEPT>s
• Examples:
– countries such as Niger
– such countries as Niger
– countries, especially Niger
– countries, including Niger
– Niger and other countries
– Niger or other countries
instanceOf(Niger,country)
Patterns (Cont'd)
• DEFINITE1: the <INSTANCE> <CONCEPT>
• DEFINITE2: the <CONCEPT> <INSTANCE>
• APPOSITION: <INSTANCE>, a <CONCEPT>
• COPULA: <INSTANCE> is a <CONCEPT>
• Examples:
• the Niger country
• the country Niger
• Niger, a country in Africa
• Niger is a country in Africa
instanceOf(Niger,country)
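The patterns above can be instantiated mechanically for an instance and a set of candidate concepts, and the concept with the most evidence wins ("annotation by maximal statistical evidence"). The sketch below assumes a caller-supplied `hit_count` function in place of a real search API. The hit counts, the naive pluralisation rule and all names are hypothetical.

```python
PATTERNS = [
    "{concepts} such as {instance}",       # HEARST1
    "such {concepts} as {instance}",       # HEARST2
    "{concepts}, especially {instance}",   # HEARST3
    "{concepts}, including {instance}",    # HEARST3
    "{instance} and other {concepts}",     # HEARST4
    "{instance} or other {concepts}",      # HEARST4
    "the {instance} {concept}",            # DEFINITE1
    "the {concept} {instance}",            # DEFINITE2
    "{instance}, a {concept}",             # APPOSITION
    "{instance} is a {concept}",           # COPULA
]


def pluralize(noun):
    """Naive English plural: country -> countries, river -> rivers."""
    if noun.endswith("y") and noun[-2:-1] not in ("a", "e", "i", "o", "u"):
        return noun[:-1] + "ies"
    return noun + "s"


def queries(instance, concept):
    """All pattern instantiations for one (instance, concept) pair."""
    return [p.format(instance=instance, concept=concept, concepts=pluralize(concept))
            for p in PATTERNS]


def annotate(instance, concepts, hit_count):
    """Pick the concept whose patterns collect the most hits."""
    evidence = {c: sum(hit_count(q) for q in queries(instance, c)) for c in concepts}
    return max(evidence, key=evidence.get), evidence


# Hypothetical hit counts standing in for web search results.
FAKE_HITS = {
    "countries such as Niger": 120,
    "Niger is a country": 80,
    "rivers such as Niger": 30,
    "the Niger river": 90,
}

best_concept, evidence = annotate("Niger", ["country", "river"],
                                  lambda q: FAKE_HITS.get(q, 0))
print(best_concept, evidence)
```

Passing the hit counter in as a function keeps the sketch independent of any particular search service. A real deployment would also have to deal with quoting, rate limits and ambiguous instances such as Niger the river.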
31. PANKOW Process
[Figure: PANKOW process diagram]
Gimme' The Context: C-PANKOW
• Contextualize the pattern-matching by taking into account the similarity of the Google-abstract in which the pattern was matched and the one to be annotated
• Download a fixed number n of Google-abstracts matching so-called clues and analyze them linguistically, matching the patterns offline:
– match more complex structures
– more efficient as the number of Google-queries only depends on n
– more offline processing, reducing network traffic
Comparison
System          #      Recall/Accuracy   Learning Accuracy
[MUC-7]         3      >> 90%            n.a.
[Fleischman02]  8      70.4%             n.a.
PANKOW          59     24.9%             58.91%
[Hahn98]-TH     325    21%               67%
[Hahn98]-CB     325    26%               73%
[Hahn98]-CB     325    31%               76%
C-PANKOW        682    29.35%            74.37%
[Alfonseca02]   1200   17.39%            44% (strict)
LA based on least common superconcept lcs of two concepts (Hahn et al. 98)
Web-scale information extraction
KnowItAll Idea:
– Web is the largest knowledge base
– The goal is to find all instances corresponding to a given concept in the web and extract them
The System is:
– Domain-Independent
– Use Bootstrap technique
– Based on Linguistic Patterns
KnowItAll vs (C-)Pankow
- Pankow starts from a Web page and annotates a given term on the page using the Web
- KnowItAll starts from a concept and aims at finding all instances on the Web
O. Etzioni, 2004.