This document discusses semantic analysis and natural language processing. It describes 7 typical tasks: 1) topic detection, 2) named entity recognition, 3) co-reference resolution and word sense disambiguation, 4) relation extraction, 5) sentiment analysis, 6) social annotation, and 7) text summarization. For each task it provides details on the goal, common techniques used, and examples. The document also discusses publishing content as linked data using semantic vocabularies and ontologies to make it machine-readable and processable.
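To make one of the seven tasks above concrete, the sketch below is a toy lexicon-based sentiment analyzer. The word lists and example sentence are invented for illustration; real systems use much larger dictionaries (as the document notes for tools like LIWC) or trained classifiers.

```python
# Tiny hand-made polarity lexicons (illustrative only).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentiment(text):
    """Return a crude polarity score: (#positive words) - (#negative words)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("The service was great but the food was terrible"))
```

A score above zero suggests positive sentiment, below zero negative; the mixed sentence above scores zero, which is exactly the kind of case that motivates the more sophisticated techniques the document surveys.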
The document discusses using semantics to improve search engines and information retrieval. It describes some current limitations like recall issues, results being dependent on vocabulary, and content not being machine-readable. It then outlines several key aspects of using semantics: semantic analysis to extract facts from text, using semantic vocabularies as channels to publish linked data, using ontologies for semantic content modeling, and semantic matchmaking for automatic distribution of content. The goal is to move from isolated data silos to a global web of data where objects are linked with typed relationships and explicit semantics.
"Mass Surveillance" through Distant Reading - Shalin Hai-Jew
Distant reading refers to the use of computers to “read” texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise surfacing latent or hidden insights. This work draws on research on “mass surveillance” across five text sets: academic writing, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture, indirectly, some insights about the collective social discussions occurring around this issue. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Some computational linguistic analysis tools also enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
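The word-counting step of distant reading can be sketched in a few lines of plain Python; the two sample sentences below are invented stand-ins for the text sets described above.

```python
from collections import Counter
import re

def word_frequencies(texts):
    """Count word occurrences across a list of documents (lowercased, punctuation stripped)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Invented sample texts standing in for a corpus
corpus = [
    "Mass surveillance raises privacy concerns.",
    "Privacy advocates criticize mass surveillance programs.",
]
freqs = word_frequencies(corpus)
print(freqs.most_common(3))
```

Counts like these feed the downstream visualizations the presentation shows (treemaps, dendrograms, bar charts), and the approach scales to machine-sized corpora because it is a single linear pass over the text.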
Auto Mapping Texts for Human-Machine Analysis and Sensemaking - Shalin Hai-Jew
AutoMap is a text-mining tool that enables the extraction of concepts, relationships, and networks from text corpora. It allows users to create semantic networks and meta-networks through both automated and manual coding of texts. The tool generates visualizations of textual networks and statistical analyses of network structures that can provide insights into themes, knowledge structures, and dynamics within texts. While computational models have limitations, validating results against human analysis of sample texts and domain expertise can help improve models and lead to new research insights.
The document discusses the emergence of the semantic web, which aims to make data on the web more interconnected and machine-readable. It describes Tim Berners-Lee's vision of a "Giant Global Graph" that connects all web documents based on what they are about rather than just linking documents. This would allow user data and profiles to be seamlessly shared across different sites without having to re-enter the same information. The semantic web uses standards like RDF, RDFS and OWL to represent relationships between data in a graph structure and enable automated reasoning. Several companies are working to build applications that take advantage of this interconnected semantic data.
Text REtrieval Conference (TREC) Dynamic Domain Track 2015 - Grace Hui Yang
The document discusses the motivation and goals for the TREC 2015 Dynamic Domain Track. It aims to encourage new research on domain-specific search strategies and systems that can discover, organize, and present relevant information from a subset of the internet related to a user's interests. The track will explore three domains: counterfeit pharmaceuticals, the Ebola crisis, and local politics in the Pacific Northwest. Systems will perform iterative search runs over these domains and be evaluated on how quickly and thoroughly they can return relevant results.
Influence of Timeline and Named-entity Components on User Engagement - Roi Blanco
Nowadays, successful applications are those whose features captivate and engage users. Using an interactive news retrieval system as a use case, in this paper we study the effect of timeline and named-entity components on user engagement. This contrasts with previous studies, where the importance of these components was studied from a retrieval-effectiveness point of view. Our experimental results show significant improvements in user engagement when named-entity and timeline components were installed. Further, we investigate whether we can predict user-centred metrics from users' interaction with the system. Results show that we can successfully learn a model that predicts all dimensions of user engagement and whether users will like the system or not. These findings might steer systems toward a more personalised user experience, tailored to the user's preferences.
Forms part of a training course in ontology given in Buffalo in 2009. For details and accompanying video see http://ontology.buffalo.edu/smith/IntroOntology_Course.html
Global Media Monitoring, presented through several systems for collecting, extracting, and enriching data, and for forming and exploring events across languages in real time, resulting in the system Event Registry (http://eventregistry.org/).
The document discusses the Semantic Web and ontologies. It begins by explaining that the current web was built for humans, not machines, and the Semantic Web aims to make data machine-readable to allow computers to work on our behalf. It then discusses ontologies as a formal specification of a shared conceptualization of a domain that introduces vocabulary and specifies meaning. Finally, it provides examples of how the Semantic Web could enable applications and intelligent agents to interact with distributed data sources.
This document provides an overview of hypertext and intertext in reading and writing. It defines hypertext as non-linear text that uses links to allow readers to navigate between related pieces of information and create their own understanding. Hypertext is made possible by technologies like the World Wide Web and allows for multimedia integration. Intertext refers to the relationships between texts and how a text's meaning depends on its context. Reading and writing involves understanding intertextual connections and how authors develop arguments using evidence from other sources.
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...
At the HrTAL2016 conference I presented the talk "Language as a Social Sensor to Operate with Knowledge". The talk included a section on language as an interface between physical nature and the world of the human mind and human society. The role of language as a 'sensor' has several consequences in the uncertainties and inexactness of language evolution as we know it. The talk was accompanied by several live demonstrations of systems for semantic annotation (wikifier.org) and media monitoring (eventregistry.org).
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence - Marina Santini
This document summarizes an exploratory study on mining query logs for actionable intelligence. It discusses query logs as a genre and how they can be analyzed to extract useful metadata, such as frequently searched terms, which can then be used to automatically tag and categorize documents. By analyzing the top queries from an enterprise search log, keywords were identified that could help structure the search experience and improve search quality and user satisfaction, providing actionable intelligence for business decisions.
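The top-query extraction step described above can be approximated with a simple frequency count over normalized log lines. A minimal sketch, with invented log entries (real enterprise logs would also carry timestamps, user IDs, and click data):

```python
from collections import Counter

def top_queries(log_lines, n=3):
    """Return the n most frequent queries, case-folded and whitespace-trimmed."""
    queries = (line.strip().lower() for line in log_lines if line.strip())
    return Counter(queries).most_common(n)

# Hypothetical enterprise search log entries
log = ["expense report", "Expense Report", "vpn setup", "holiday calendar", "vpn setup"]
print(top_queries(log, n=2))
```

Normalization matters: without the case fold, "expense report" and "Expense Report" would be counted as distinct queries and the ranking would understate real demand.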
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP - ijnlc
Web pages are loaded with vast and varied information about real-world entities. Information retrieval from the Web, i.e., getting accurately queried data within the desired time frame, is of great significance today and becomes more difficult with each passing day. There is a need to develop a system that solves entity-extraction problems from the Web better than traditional systems. A clear understanding of both the natural-language sentences in a web page and the structure of the page is critical for retrieving correct information quickly. Here we propose an information retrieval technique for the Web using NLP, in which Hierarchical Conditional Random Fields (HCRF) and extended Semi-Markov Conditional Random Fields (Semi-CRF), along with Visual Page Segmentation, are used to obtain accurate results. Parallel processing is also used to achieve results within the desired time frame. The approach further improves decision making between HCRF and Semi-CRF by using a bidirectional rather than a top-down approach, enabling a better understanding of content and page structure.
An Introduction to Information Retrieval and Applications - sathish sak
The score you get depends on the functions, difficulty, and quality of your project.
For system development:
- System functions and correctness
For academic paper presentation:
- Quality of the paper and your presentation
- Major methods/experimental results *must* be presented
- Papers from top conferences are strongly suggested, e.g. SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
Proposals are *required* for each team and will be counted in the score.
Enterprise Search SharePoint 2009 Best Practices Final - Marianne Sweeny
This presentation examines features and benefits of Microsoft Office SharePoint Server (MOSS) 2007 enterprise search. It contains configuration guidance, code snippets, and tips and tricks.
The document discusses the Semantic Web, ontologies, and ontology learning. It defines the Semantic Web as an extension of the current web that gives information well-defined meaning. Ontologies are formal specifications of concepts and relations that provide shared meanings between machines and humans. Ontology learning is the automatic or semi-automatic process of extracting ontological concepts and relations from text to build or enrich ontologies. The document outlines methods for ontology learning and its applications.
This document provides guidance on how to conduct research. It outlines the key steps in the research process including planning, locating and gathering information, selecting and recording relevant information, checking back that all requirements have been met, and presenting and evaluating the research. It emphasizes planning the research before beginning, using available search tools and databases effectively, developing an awareness of information quality, and properly citing sources to avoid plagiarism.
It is often observed that when people use retrieval systems, they are not searching for documents or text passages in the first place, but for information contained inside them relating to entities such as persons, organizations, locations, events, or times. The goal is to find various kinds of valuable semantic information about real-world entities embedded in different web pages and databases. But it is difficult to find specific or exact information about entities with present search engines. So we need search engines that can interpret our queries across different domains and extract structured information about entities.
This document describes how Datanizing uses natural language processing and machine learning techniques to derive data-driven insights from large amounts of user-generated content. They automatically gather and clean relevant content, rank it by relevance, and calculate key insights. This includes detecting topics, creating data-driven personas, understanding semantic context, and predicting changing interests in real-time to help customers better align marketing messages. The process involves text analysis, word embeddings, topic modeling, and classification techniques.
How to model digital objects within the semantic web - Angelica Lo Duca
These slides describe the general concepts of the Semantic Web and Linked Data, then illustrate the concept of a digital object. Finally, they present a use case.
From Text To Reasoning - Marko Grobelnik - SWANK Workshop Stanford - 16 Apr 2014
The document discusses the goal of natural language processing moving beyond "processing" text to truly understanding text through the use of world models. It provides an example of a top-down world model called Cyc and how such a model can be used to disambiguate text and allow for reasoning beyond the information directly stated. Micro-reading is discussed as understanding individual documents through extracting logical assertions and allowing for inference, with examples provided. Finally, some of the key challenges for artificial intelligence in developing comprehensive world models and achieving true understanding are outlined.
Neno/Fhat: Semantic Network Programming Language and Virtual Machine Specific... - Marko Rodriguez
• The Semantic Web is a distributed, flexible modeling framework.
• The Semantic Web is primarily descriptive in nature. The Semantic Web is used to describe web-pages, services, systems, etc.
• Neno is an object-oriented language that was designed specifically for the Semantic Web.
• Fhat is a virtual machine represented in the Semantic Web.
• With Neno/Fhat the Semantic Web now has a procedural component. The Semantic Web now includes object methods, algorithms, and computing machines.
• The Semantic Web can be made to behave like a distributed, general-purpose computer, not just an information repository.
This document provides an overview of Linked Open Data and examples of applications that use LOD. It defines key concepts like Linked Data and the 5 star system. It discusses where to find LOD datasets and how to access and work with remote LOD through HTTP requests, SPARQL queries, or downloading datasets. Examples are given of LOD browsers, mashups, and intelligent web applications that have been built using LOD. The presenter's own projects involving cultural heritage data, history texts, and international aid data are described. Overall the document aims to illustrate the benefits and potential of LOD for developers and provide inspiration for new applications.
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCValentina Presutti
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key for providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible not so far future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches, and human computing.
This document discusses various techniques for optimizing search, including:
- Using autocomplete and disambiguation to improve destination search results at Booking.com
- Scoring documents based on term frequency, inverse document frequency, record popularity, language matches, and other factors
- Normalizing scores to the range 1 to 1.4 to better combine different scoring components
- Storing additional field information in payloads to simulate dismax queries over a single field
- Different approaches for combining query terms matched in multiple fields to find documents where terms are near each other
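The term-frequency/inverse-document-frequency scoring and score normalization from the bullets above can be sketched as follows. This is a minimal illustration, not Booking.com's actual implementation: the 1.0 to 1.4 target range comes from the bullets, while the `max_score` cap and the toy documents are invented for the example.

```python
import math

def tfidf_score(term, doc, docs):
    """Score one document for a term: term frequency times inverse document frequency."""
    tf = doc.split().count(term)
    df = sum(1 for d in docs if term in d.split())
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

def normalize(score, lo=1.0, hi=1.4, max_score=10.0):
    """Map a raw score into [lo, hi] so different scoring components combine on a common scale."""
    return lo + (hi - lo) * min(score, max_score) / max_score

docs = ["cheap hotel paris", "hotel berlin", "paris city guide"]
raw = tfidf_score("paris", docs[0], docs)
print(normalize(raw))
```

Normalizing each component into a narrow shared range is what lets term relevance, record popularity, and language matches be multiplied or summed without any one factor dominating the final ranking.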
This document provides a discount directory for categories such as automotive, home, and family in Mexicali, Baja California. It includes more than 50 businesses with their names, activities, addresses, telephone numbers, and the discount percentages they offer, which range from 10% to 50% depending on the business and service.
The pH Conductivity Meters are multi-parameter, microprocessor-based, highly accurate pH and conductivity meters. Meters of this type are best when more than one parameter of a given solution needs to be checked and recorded. For more information please visit http://goo.gl/VEvLke
This document describes applications for creating and distributing online content, such as web pages, blogs, and wikis. These tools make it possible to host, manage, and share content, and they enable two-way communication. They are defined chiefly by their content creation and editing capabilities, their relationship with the World Wide Web, and the degree of participation they allow.
Global Media Monitoring presented through several systems for collecting, extracting and enriching data, forming and exploring events across languages in real-time - ...resulting in the system Event Registry (http://eventregistry.org/)
The document discusses the Semantic Web and ontologies. It begins by explaining that the current web was built for humans, not machines, and the Semantic Web aims to make data machine-readable to allow computers to work on our behalf. It then discusses ontologies as a formal specification of a shared conceptualization of a domain that introduces vocabulary and specifies meaning. Finally, it provides examples of how the Semantic Web could enable applications and intelligent agents to interact with distributed data sources.
This document provides an overview of hypertext and intertext in reading and writing. It defines hypertext as non-linear text that uses links to allow readers to navigate between related pieces of information and create their own understanding. Hypertext is made possible by technologies like the World Wide Web and allows for multimedia integration. Intertext refers to the relationships between texts and how a text's meaning depends on its context. Reading and writing involves understanding intertextual connections and how authors develop arguments using evidence from other sources.
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...Marko Grobelnik
At the HrTAL2016 conference I presented the talk on "Language as a Social Sensor to operate with Knowledge". The talk included a section on language as an interface between physical nature and the world of human mind and human society. The role of language as a 'sensor'has several consequences in uncertainties and inexactness of the language evolution, as we know it. The talk was accompanies with several live demonstrations of the systems on semantic annotation (wikifier.org) and media monitoring (eventregistry.org).
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence Marina Santini
This document summarizes an exploratory study on mining query logs for actionable intelligence. It discusses query logs as a genre and how they can be analyzed to extract useful metadata like frequently searched terms that can then be used to automatically tag and categorize documents. Analyzing the top queries from an enterprise search log, keywords were identified that could help structure the search experience and improve search quality and user satisfaction, providing actionable intelligence for business decisions.
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLPijnlc
Webpages are loaded with vast and different kinds of information about the entities in the real-world.
Information retrieval from the Web is of a greater significance today to get the accurate queried data
within the desired time frame which is increasingly becoming difficult with each passing day. Need to
develop a system to solve entity extraction problems from the web as compared to the traditional system.
It’s critical to have a clear understanding of natural language sentence processing in the web page along
with the structure of the web page to get the correct information with speed. Here we have proposed
approach for information retrieval technique for Web using NLP where techniques Hierarchical
Conditional Random Fields (i.e. HCRF) and extended Semi-Markov Conditional Random Fields (i.e. Semi-
CRF) along with Visual Page Segmentation is used to get the accurate results. Also parallel processing is
used to achieve the results in desired time frame. It further improves the decision making between HCRF
and Semi-CRF by using bidirectional approach rather than top-down approach. It enables better
understanding of the content and page structure.
An Introduction to Information Retrieval and Applicationssathish sak
An Introduction to Information Retrieval and Applications The score you get depends on the functions, difficulty and quality of your project
For system development:
System functions and correctness
For academic paper presentation
Quality and your presentation of the paper
Major methods/experimental results *must* be presented
Papers from top conferences are strongly suggested
E.g. SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
Proposals are *required* for each team, and will be counted in the score
Enterprise Search Share Point2009 Best Practices FinalMarianne Sweeny
This presentation examines features and benefits in Microsoft Office SharePoint Server (MOSS) 2007 enteprise search. It contains configuration guidance, code snippets, tips and tricks.
(1) The document discusses the Semantic Web, ontologies, and ontology learning. It defines the Semantic Web as an extension of the current web that gives information well-defined meaning. (2) Ontologies are formal specifications of concepts and relations that provide shared meanings between machines and humans. (3) Ontology learning is the automatic or semi-automatic process of extracting ontological concepts and relations from text to build or enrich ontologies. The document outlines methods for ontology learning and its applications.
This document provides guidance on how to conduct research. It outlines the key steps in the research process including planning, locating and gathering information, selecting and recording relevant information, checking back that all requirements have been met, and presenting and evaluating the research. It emphasizes planning the research before beginning, using available search tools and databases effectively, developing an awareness of information quality, and properly citing sources to avoid plagiarism.
It is quite often observed that when people use retrieval systems, they do not just search documents or text passages in the first place, but for some information contained inside, which is related to some entities, for instance, person, organization, location, events, time, etc. The goal is to find out various kinds of valuable semantic information about real-world entites embedded in different web pages and databases. But It is a difficult task for us to find out specific or exact information about entities from present search engines. So we need search engines, which will identify our queries across different domains and extract structured information about entities.
This document describes how Datanizing uses natural language processing and machine learning techniques to derive data-driven insights from large amounts of user-generated content. They automatically gather and clean relevant content, rank it by relevance, and calculate key insights. This includes detecting topics, creating data-driven personas, understanding semantic context, and predicting changing interests in real-time to help customers better align marketing messages. The process involves text analysis, word embeddings, topic modeling, and classification techniques.
How to model digital objects within the semantic webAngelica Lo Duca
These slides describe the general concept of semantic Web and Linked Data, then they illustrate the concept of digital object. Finally they give a use case.
From Text To Reasoning - Marko Grobelnik - SWANK Workshop Stanford - 16 Apr 2014Marko Grobelnik
The document discusses the goal of natural language processing moving beyond "processing" text to truly understanding text through the use of world models. It provides an example of a top-down world model called Cyc and how such a model can be used to disambiguate text and allow for reasoning beyond the information directly stated. Micro-reading is discussed as understanding individual documents through extracting logical assertions and allowing for inference, with examples provided. Finally, some of the key challenges for artificial intelligence in developing comprehensive world models and achieving true understanding are outlined.
Neno/Fhat: Semantic Network Programming Language and Virtual Machine Specific...Marko Rodriguez
• The Semantic Web is a distributed, flexible modeling framework.
• The Semantic Web is primarily descriptive in nature. The Semantic Web is used to describe web-pages, services, systems, etc.
• Neno is an object-oriented language that was designed specifically for the Semantic Web.
• Fhat is a virtual machine represented in the Semantic Web.
• With Neno/Fhat the Semantic Web now has a procedural component. The Semantic Web now includes object methods, algorithms, and computing machines.
• The Semantic Web can be made to behave like a distributed, general-purpose computer. Not just an information repository.
This document provides an overview of Linked Open Data and examples of applications that use LOD. It defines key concepts like Linked Data and the 5 star system. It discusses where to find LOD datasets and how to access and work with remote LOD through HTTP requests, SPARQL queries, or downloading datasets. Examples are given of LOD browsers, mashups, and intelligent web applications that have been built using LOD. The presenter's own projects involving cultural heritage data, history texts, and international aid data are described. Overall the document aims to illustrate the benefits and potential of LOD for developers and provide inspiration for new applications.
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCValentina Presutti
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key for providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible not so far future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches, and human computing.
This document discusses various techniques for optimizing search, including:
- Using autocomplete and disambiguation to improve destination search results at Booking.com
- Scoring documents based on term frequency, inverse document frequency, record popularity, language matches, and other factors
- Normalizing scores to the range 1 to 1.4 to better combine different scoring components
- Storing additional field information in payloads to simulate dismax queries over a single field
- Different approaches for combining query terms matched in multiple fields to find documents where terms are near each other
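The scoring factors listed above can be illustrated with a small sketch: a toy TF-IDF score per document, then a normalization into the 1 to 1.4 band mentioned in the talk. The corpus, weights, and band mapping here are illustrative assumptions, not the talk's actual implementation:

```python
import math
from collections import Counter

# Toy corpus: each document is a list of terms (an assumption for illustration).
docs = [
    ["cheap", "hotel", "paris"],
    ["hotel", "booking", "paris", "paris"],
    ["flight", "booking", "rome"],
]

def tf_idf_score(query_terms, doc, corpus):
    """Sum of TF-IDF contributions of each query term in `doc`."""
    counts = Counter(doc)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        tf = counts[term] / len(doc)                   # term frequency
        df = sum(1 for d in corpus if term in d)       # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1    # smoothed inverse document frequency
        score += tf * idf
    return score

def normalize(score, lo=1.0, hi=1.4, max_score=2.0):
    """Map a raw score into the [1.0, 1.4] band so it can be safely
    multiplied with other scoring components (popularity, language match)."""
    clipped = min(score, max_score)
    return lo + (hi - lo) * clipped / max_score

scores = [tf_idf_score(["hotel", "paris"], d, docs) for d in docs]
normalized = [normalize(s) for s in scores]
```

Because every component lands in a narrow band above 1.0, multiplying several of them together cannot zero out a document that matches well on only some factors.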
This document provides a discount directory for categories such as automotive, home, and family in Mexicali, Baja California. It includes more than 50 businesses with their names, activities, addresses, phone numbers, and the discount percentages they offer, which range from 10% to 50% depending on the business and service.
The pH Conductivity Meters are multi-parameter, microprocessor-based, highly accurate pH and conductivity meters. These meters are best suited for cases where more than one parameter in a given solution needs to be checked and recorded. For more information, please visit http://goo.gl/VEvLke
This document describes applications for creating and sharing online content, such as web pages, blogs, and wikis. These tools make it possible to host, manage, and share content and enable two-way communication. They are defined mainly by their content creation and editing capabilities, their relationship to the World Wide Web, and the degree of participation they allow.
The document defines cultural tourism as tourism related to heritage and traditional cultures, such as monuments and historic-artistic sites. It also notes that tourists now demand higher-quality services and choose destinations that preserve the environment and local culture. In addition, it describes urban tourism in Buenos Aires, highlighting its historical heritage, cultural diversity, and status as the cultural capital of Latin America.
This document deals with industrial maintenance. It explains the different types of maintenance, such as corrective, preventive, predictive, and total productive maintenance. It also describes key concepts such as reliability, maintainability, and availability. Finally, it covers topics such as industrial maintenance planning and condition-monitoring methods, including vibration and temperature monitoring for fault detection.
The document is a newsletter from the IWA Specialist Group on Wetland Systems for Water Pollution Control dated June 2014. It provides information on:
- Recent membership trends, which show membership between 450 and 550, well distributed globally.
- Updates to the group's constitution to standardize the conference proposal process and implement electronic voting for future conference sites.
- An upcoming election for new Regional Coordinators according to the new constitution.
- A proposal for a new IWA Task Group on "Mainstreaming the use of treatment wetlands".
- Upcoming events for the group including their biennial conference in Shanghai in October 2014.
Collective housing in Lusophone Africa – analysis of two case studies (Lara Plácido)
1) The document discusses modern architecture and collective housing in Lusophone Africa, focusing on Angola and Mozambique.
2) Modern architecture began to develop in these regions in the 1940s-1970s, driven by colonial development and the ideas of the Modern Movement. Architects such as Vasco Vieira da Costa designed housing adapted to the hot, humid climate.
3) Two case studies are analysed: a building
The document describes the infrastructure and services that CEDIA offers for connectivity, repositories, projects, training, collaboration, and events. These include the CEDIA Advanced Network, Internet, Eduroam, federations, VoIP exchange, a virtual office, a cluster, cloud services, event recording, and a virtual campus, among others. The goal is to support research, teaching, and collaboration among member institutions.
This curriculum vitae summarizes the career and qualifications of Dr. Hava Karsenty Avraham. It lists her education, including a B.S. in Biology from Hebrew University in 1977 and a Ph.D. in General & Tumor Immunology from Hebrew University in 1982. It then details her various faculty appointments at institutions including Harvard Medical School and Northeastern University. The CV also outlines her grant funding, publications, teaching experience, and honors received throughout her career in immunology and cancer research.
Google Analytics Konferenz 2016: Dashboards mit Google Analytics und Excel me... (e-dialog GmbH)
How dashboards can be set up with Google Analytics' built-in tools and, beyond that, automated (and enhanced) in MS Excel through plugins.
The document discusses the recent protests by teachers of the National Coordinator of Education Workers (CNTE) against the education reform in Mexico. The teachers blocked access to Congress to prevent the approval of the reform's secondary laws. There were acts of violence during the protests, although the teachers deny responsibility and claim they were provoked by infiltrators. As of today, the teachers continue to block the entrances to Congress.
The curriculum vitae presents the personal data and education of Johanna Lizbeth Sánchez Samayoa, who studied accounting at the Instituto Nacional Texistepeque and the Escuela José Alejandro Cabrera. She completed internships as an accounting assistant at an accounting firm and a furniture store, where she obtained employment references from her supervisors. It also provides personal references.
This document presents a list of 8 statistical categories for Spanish provinces: 1) the most aged, 2) the youngest, 3) those with the most unemployed, 4) those that export the most, 5) those where the most mortgages are granted, 6) those with the most Social Security affiliates, 7) those that spend the most money, and 8) those with the highest income. For each category, the data source and the year of the statistical information are indicated.
This document classifies three main types of animals: 1) vertebrates, which have bones, 2) carnivores, which eat meat, and 3) invertebrates, which have no bones and crawl.
The document discusses Charles Darwin and his theory of the evolution of species. It explains that Darwin concluded that species originate from an ancestral species through a process of natural selection. Although his revolutionary evolutionary ideas were not well received at first, today they form the basis of neo-Darwinism, the most widely accepted theory explaining the origin and evolution of species. The document also announces an exhibition about Darwin to be held in
Indo-Japan Trade and Investment Highlights:
India and Japan to work IT together
Japanese firm Looking to Strengthen Aviation Defence Presence in India
Omori to Acquire Majority Stake in Multi Pack Systems
Return of Mazda in India
Currency Swap to Control Falling Rupee and Improving Financial Ties with Japan
Suzuki to make India Hub for Export of Left Wheel Drive Swift Models to other Emerging Markets
India and Japan Launch Joint Research Programs in Applied Science
The Anime Bond
Knowledge Center: Overview of Indian Labour Law
This document provides an overview of the elementary curriculum for grades 3-5 at Yeil Christian International School in South Korea. It outlines the core subjects of reading, English/language arts, math, science, and social studies. For third grade, the curriculum focuses on developing reading fluency and comprehension, grammar, spelling, and writing skills. Students will read from the Not So Very Long Ago textbook series and the English 3 textbook. The curriculum integrates fundamentalist Christian views and aims to encourage biblical worldviews, though these views may not be shared by all students, faculty, or administration.
The eleventh international world wide web conference will be held in Honolulu, Hawaii, USA from May 7-11. Participants from over 20 countries will register to attend the prestigious event, which will see Tim Berners-Lee and Ian Foster speak.
Digital Humanities is a term that elicits both excitement and scorn in scholarly circles, and there is still a great deal of discussion as to whether it is a field of inquiry, a set of research methods, or simply a new perspective on arts and humanities research. This workshop will provide a brief survey of how the evolving theory and practice of using contemporary technology and technology-assisted research methods are impacting scholarship in the arts and humanities.
This document discusses text mining and its applications. It begins by defining text mining as sifting through vast collections of unstructured data beyond what traditional data mining tools can handle. It then covers various text mining tasks like information extraction, topic detection, summarization, and opinion mining. Applications discussed include tech mining, expert finders, question answering systems, and sentiment analysis of online reviews and social media. The document concludes by noting the increasing importance and opportunities of text mining as more data is generated.
Slides of the course on big data by C. Levallois from EMLYON Business School.
For business students. Check the online video connected with these slides.
-> Definition of text mining, the main categories of tools available (such as topic categorization or sentiment analysis) and their use for business.
This document summarizes a PhD case study on arguments about deleting Wikipedia articles. The study had the following key findings:
1) Article creators often misunderstand deletion policies and express strong emotions when their articles are deleted. This can demotivate newcomers from continuing to contribute.
2) Novices make arguments for keeping articles that are more problematic than experts' arguments, such as appeals to personal preference rather than notability.
3) Discussions without a clear consensus are more time-consuming, difficult to judge, and likely to cause emotional upset if the decision is later overturned.
4) Identifying clear criteria for article deletion, such as notability, sources, bias, and maintenance, could help
This document provides a SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis of open-source software (OSS). It notes that while OSS has strengths like low costs and a large community, its biggest challenge is supporting software without strict governance rules. OSS also faces threats from established institutions resistant to change and past perceptions that liken it to outdated systems. The document advocates for "next generation" library catalogs that go beyond just finding information to helping users understand content through services like analyzing word frequencies, phrases and numeric metadata.
This document provides guidance for students analyzing data and writing an academic report on learning a new literacy through producing a digital media artifact. It discusses options for emergent or theory-guided data analysis and coding. It also offers tips for structuring the report, discussing findings, and integrating literature. Students are asked to examine how patterns in their data relate to concepts like affinity spaces, participatory culture, and distributed expertise.
Beyond document retrieval using semantic annotations (Roi Blanco)
Traditional information retrieval approaches deal with retrieving full-text documents as a response to a user's query. However, applications that go beyond the "ten blue links" and make use of additional information to display and interact with search results are becoming increasingly popular and adopted by all major search engines. In addition, recent advances in text extraction allow for inferring semantic information over particular items present in textual documents. This talk presents how enhancing a document with structures derived from shallow parsing can convey a different user experience in search and browsing scenarios, and what challenges we face as a consequence.
This document discusses opinion mining for social media. It provides an introduction to opinion mining and sentiment analysis, and discusses some of the challenges involved in performing opinion mining on social media data, including short sentences, incorrect language, and topic divergence. The document then outlines the Arcomem research project, which aims to perform opinion mining on social media to analyze opinions about events over time. It describes the project's entity, topic and opinion extraction workflow and some of the main research directions.
From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction (STLab)
The vision of the Semantic Web is to populate the web with machine understandable information so that artificial intelligences (AI) can use it as background knowledge for assisting humans in performing a significant number of their daily tasks. Research in this field produced a standardised knowledge representation format (namely, linked data) and a huge amount of machine-readable data available on the web (namely, the web of data), mostly derived from structured data (typically databases) or semi-structured data (e.g. Wikipedia infoboxes). However, most of the web consists of natural language text containing valuable knowledge for enriching the web of data. Hence, a main challenge is to extract as much relevant knowledge as possible from this content and publish it in the form of linked data. Open Information Extraction (OIE) has been developed recently as an approach to extract information from unstructured data, mostly of a textual nature.
However, the information extracted is typically in the form of triples of strings (subject, relational phrase, object). OIE approaches are useful but insufficient alone for populating the web with machine readable information as their results are not directly linkable to, and immediately reusable from, other linked data sources. In this seminar, after giving a brief introduction to background concepts and notions, I will describe a work that proposes a novel Open Knowledge Extraction approach that performs unsupervised, open domain, and abstractive knowledge extraction from text for producing directly usable machine readable information. In particular I will discuss an approach based on the hypothesis that hyperlinks (either created by humans or knowledge extraction tools) provide a pragmatic trace of such semantic relations between two entities, and that such semantic relations, their subjects and objects, can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as linked data and ontology axioms. Experimental evaluations conducted with the help of crowdsourcing confirm this hypothesis showing very high performances. A demo of Open Knowledge Extraction at http://wit.istc.cnr.it/stlab-tools/legalo.
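The core idea, that the sentence embedding a hyperlink is a linguistic trace of a relation between two entities, can be sketched in a few lines. Real systems such as Legalo use linguistic parsing and produce formal linked data; this pattern-based toy (including the example sentence and the regexes) is purely an illustrative assumption and only yields OIE-style string triples:

```python
import re

# Toy sentence: two hyperlinked entity mentions with a relation phrase between them.
sentence = ('<a href="/Tim_Berners-Lee">Tim Berners-Lee</a> invented the '
            '<a href="/World_Wide_Web">World Wide Web</a>.')

def extract_triple(html_sentence):
    """Extract a (subject, relation phrase, object) string triple from a
    sentence containing exactly two hyperlinked entity mentions."""
    links = re.findall(r'<a href="[^"]*">([^<]+)</a>', html_sentence)
    if len(links) != 2:
        return None  # the heuristic only handles the two-entity case
    subject, obj = links
    # Treat the text between the two anchors as the relation phrase.
    between = re.search(r'</a>(.*?)<a ', html_sentence, re.S)
    relation = re.sub(r'<[^>]+>', '', between.group(1)).strip(' .')
    return (subject, relation, obj)

triple = extract_triple(sentence)
```

The gap between this output and the web of data is exactly the one the seminar addresses: the triple's elements are plain strings, not entities and properties linked to existing datasets.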
Search, Signals & Sense: An Analytics Fueled Vision (Seth Grimes)
The document discusses how text analytics can fuel semantic search and sensemaking by extracting features from documents, analyzing relationships between entities, and integrating search with other data sources. It outlines trends toward more unified search platforms that incorporate user context and infer intent to provide categorized, clustered results rather than just hit lists. The goal is for search to be the starting point for iterative sensemaking through analysis and synthesis of information.
folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, wordle, context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology
Finding Knowledge: Assessing Knowledge in the Age of Search (Simon Knight)
This document discusses how epistemology relates to allowing internet access during exams. It notes that Denmark allows internet access during exams to test problem-solving and analysis skills. While no communication sites are allowed, the policy is based on epistemological claims about the nature of knowledge. It also discusses the risks of injustice and content holes when relying on internet searches due to filter bubbles, bias, and a lack of opposing views. Throughout, it emphasizes that both search tools and students need to consider multiple perspectives and the assumptions underlying information to gain an accurate understanding.
This document provides step-by-step instructions for conducting academic library research. It outlines choosing a topic and keywords, constructing a search strategy, choosing appropriate research tools like books, articles, primary sources, and datasets, running searches and evaluating results. Key tips include using synonyms, limiting or expanding search terms, combining terms with "and" or "or", trying different databases and subject headings, and getting full text or requesting items through interlibrary loan when not available locally.
This document provides an overview of how to analyze content using NVivo software. It discusses uploading documents to NVivo, coding the content into categories or nodes, and using tools like word frequency queries, text searches, and visualization to analyze relationships within and across texts. The goal of content analysis is to make inferences about messages, authors, audiences, and cultural contexts by systematically coding and examining texts.
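Outside a tool like NVivo, the core of a word frequency query can be sketched in a few lines (a toy illustration of the concept, not NVivo's actual implementation; the sample texts and stoplist are assumptions):

```python
import re
from collections import Counter

# Hypothetical sample texts standing in for documents uploaded as sources.
texts = [
    "Surveillance programs collect data at scale.",
    "Mass surveillance raises privacy questions about data collection.",
]

STOPWORDS = {"at", "about", "the", "a", "an", "of"}  # a minimal stoplist

def word_frequencies(documents, top_n=5):
    """Tokenize all documents, drop stopwords, and count word occurrences,
    similar in spirit to a word frequency query over a set of sources."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

top = word_frequencies(texts)
```

A full content analysis adds stemming or lemmatization and, crucially, the coding step: attaching passages to nodes so frequencies can be compared across categories rather than only across raw texts.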
Discovery and the Age of Insight: Walmart EIM Open House 2013 (Joe Lamantia)
Discovery is the most important business capability in the emerging Age of Insight - it's the missing ingredient that makes Big Data a source of value for businesses and people.
The Language of Discovery is an essential tool for providing discovery capability, whether at the scale of designing a single discovery application, determining the value proposition of a new product or service, or managing a strategic portfolio of technology and business initiatives.
This presentation outlines the Age of Insight, and suggests deep structural and historic precedents visible in the Age of Reason, especially in the central parallels between Natural Philosophy and the emerging discipline of Data Science. We then review the language of discovery, and consider widely visible examples of products and services that demonstrate the language.
We review our own usage of the framework as an analytical and generative toolkit for providing discovery capability, and share best practices for employing this perspective across a variety of levels of need.
This document provides guidance for teachers on producing research by examining their experiences learning a new literacy practice. It outlines approaches to analyzing collected data, including emergent and theory-guided analysis. Coding data and relating findings back to literature on key concepts are discussed. Tips are provided on writing up results, including structuring the report, using quotes, and maintaining an academic style.
HyperMembrane Structures for Open Source Cognitive Computing (Jack Park)
Open source "cognitive computing" systems, specifically OpenSherlock; describes a HyperMembrane structure, a kind of information fabric, for machine reading, literature-based discovery, deep question answering. Platform is open source, uses ElasticSearch, topic maps, JSON, link-grammar parsing, and qualitative process models.
This document provides guidance for students on analyzing data and writing an academic report on producing a digital media artifact related to new literacy concepts. It discusses emergent and theory-guided approaches to data analysis, coding data, relating findings to literature, and structuring the report. Writing tips are provided, such as integrating quotes, using academic language, and focusing the presentation on interesting dimensions rather than retelling the whole study.
Unister is an e-business company founded in 2002 based in Leipzig, Germany. It operates several online travel portals and has over 1670 employees with more than 450 in IT. Some of its major travel portals include Ab-in-den-urlaub.de, Travel24.com, Fluege.de, Reisen.de, and Urlaubtours.de, which allow users to book flights, hotels, vacation packages and more. However, its Fluege.de portal has been controversial due to a lack of transparency in displaying flight prices and bundling additional services into packages without clear disclosure.
Twoo.com is a social networking website launched in 2011 that helps users meet new people. It has over 12 million monthly active users and is available in 38 languages across 200 countries. The website allows users to create a profile, view profiles of compatible users nearby, and like or pass on other profiles. If two users like each other mutually, they are connected. While the basic features are free, premium subscriptions provide additional visibility and perks. The website generates revenue from ads and premium subscriptions for credits and unlimited access.
Twibes allows users to organize tweets by topic into interest groups called Twibes. Users can join up to 10 existing Twibes or create 3 of their own. To use Twibes, one searches for and joins a relevant Twibe by tweeting about its topic or keywords. Twibes collects tweets from its members on similar topics and adds them to the user's Twitter lists for easier viewing, but the tool has some functionality and search issues.
TweetDeck is a free web, desktop, and Chrome app that allows users to manage multiple Twitter and Facebook communication streams. It features include URL shortening, photo uploading, content filtering, unlimited account management, and posting to multiple accounts simultaneously. The app organizes social media content into customizable columns for timelines, searches, direct messages, trends, and scheduled posts. While it does not provide statistics or advanced posting capabilities, TweetDeck offers a well-designed interface for viewing multiple social media streams at once.
The document presents an ontology for modeling information from the Tourismusverband Innsbruck's (TVb) online channels. It describes the analysis conducted of TVb's website and Facebook page to identify relevant concepts to include in the ontology. The ontology models places, events, organizations, trips, creative works, and people. It provides subclasses and examples for each concept. Specifically, it details the event concept and subclasses such as business, children's, comedy, and dance events. The ontology aims to semantically represent TVb's online content to facilitate search and dissemination across different channels.
This document discusses mapping content from the feratel media technologies company to schema.org for annotations on the Tourismusverband Innsbruck (TVb) website. It identifies relevant schema.org terms for concepts like hotels, restaurants, and events. It describes implementing the mappings by modifying the HTML template used to integrate feratel content, but notes limitations from not being able to distinguish content types. The document concludes that a cleaner solution would be to have annotations inserted at the source in feratel through a new feratel plugin.
1. The document evaluates the impact of adding semantic annotations to part of the Tourismusverband Innsbruck (TVb) website.
2. Key metrics like total visitors, new vs returning visitors, and visitors from search engines like Google and Bing were analyzed for periods before and after annotations were deployed on January 20, 2014.
3. The results show an 8.63% increase in total visitors, and increases in visitors from top search engines, after annotations, indicating that annotations may help increase website visibility. However, longer-term analysis is needed to draw definitive conclusions.
This document outlines publication rules for disseminating content from the Tourismusverband Innsbruck (TVb) website to various social media channels. It analyzes the types of content published on the TVb Facebook, Twitter, YouTube, and blog pages. Based on this analysis, it defines mappings of content categories from the TVb website to different channels. It also describes workflows between channels and necessary content transformations for each category and channel. The goal is to automatically publish relevant content from the TVb website to social media channels based on these defined rules.
This document presents an implementation of content and ontology mappings for a tourism organization using object models. It describes 10 main object models that were created based on Schema.org vocabulary. Relationships between the objects are explained through figures. The current implementation extracts RDF triples from HTML and stores them in a triple store. Objects are then instantiated from the triples, though complex associations are ignored. Using an RDF framework called RDFBeans is proposed to fully represent the RDF graph through instantiated objects. Discussion points include keeping Schema.org structures where possible and a new idea to use an RDF reasoner instead of rules.
The document summarizes mappings between content from the Tourismusverband Innsbruck's (TVB) social media channels (Facebook, Twitter, YouTube) and the TVB ontology. It analyzed each channel over the past 6 months to identify common content types (e.g. hotels, events, places of interest), which were then mapped to classes and properties in the TVB ontology using schema.org vocabulary. Tables in sections 2-4 provide examples of mappings for different content types from each individual social channel.
The document summarizes the website ttr.tirol.at, an information portal on tourism in Tirol, Austria. The portal provides statistical, business, and strategic information on tourism to interested parties. It is managed by Tirol Werbung, the Tirol promotion agency, and MCI III. Visitors can access key tourism facts and statistics for Tirol and Austria, data on important visitor markets, and information on marketing, research, and innovation in tourism. While registration is required, the information may not be used without permission of the site operators. The site was launched in 2009 and relaunched in 2012.
TrustYou is a data and technology company that collects, analyzes, and provides access to social media content like online reviews and posts about hotels, accommodations, and restaurants. It offers tools for social media monitoring and semantic search across multiple languages and industries. Specifically, it provides widgets and software to monitor brand reputation, analyze customer sentiment, and integrate user reviews on websites.
This document provides information about the website www.sti-innsbruck.at, a travel guide website founded in 2008 in Vienna, Austria. It has 15 employees and provides crowd-sourced travel information from its community as well as curated guides. It earns money through advertising and selling premium guides available on its mobile apps. The website also features augmented reality and allows users to access travel information offline.
Tripbirds is a hotel booking site that aggregates data from social networks like Facebook and Instagram. It shows users if any of their Facebook friends have checked into a place they are searching for. The site also displays photos from Instagram and reviews from booking.com for each hotel. Tripbirds gets a 10% commission from booking.com for any bookings made through the site. It is based in Stockholm and was founded in 2011, receiving $750,000 in venture funding.
Traveltainment provides software solutions for the travel industry to make booking leisure travel easier. It has offices in several European countries and is part of the Amadeus Leisure Group. Its solutions are integrated with major travel portals and agencies in over 30 countries, allowing booking of hotels, reviews of hotels, revenue forecasts, and price comparisons. It offers booking engines, review tools, and other services to internet portals, travel agencies, call centers, and tour operators.
Travel Audience is a travel marketing company founded in 2011 that provides context-sensitive advertisements on popular travel portals in Germany, Austria and Switzerland, reaching over 14 million people per month. They work with over 50 advertiser customers and high-quality publisher partners to place targeted ads for products like hotels, flights, and activities based on location, theme, and customer type. Advertisers pay only for clicks with no setup fees. The company aims to distribute ads across multiple travel platforms but does not explicitly mention social media dissemination.
This presentation by Juraj Čorba, Chair of OECD Working Party on Artificial Intelligence Governance (AIGO), was made during the discussion “Artificial Intelligence, Data and Competition” held at the 143rd meeting of the OECD Competition Committee on 12 June 2024. More papers and presentations on the topic can be found at oe.cd/aicomp.
This presentation was uploaded with the author’s consent.
This presentation by OECD, OECD Secretariat, was made during the discussion “Pro-competitive Industrial Policy” held at the 143rd meeting of the OECD Competition Committee on 12 June 2024. More papers and presentations on the topic can be found at oe.cd/pcip.
This presentation by Katharine Kemp, Associate Professor at the Faculty of Law & Justice at UNSW Sydney, was made during the discussion “The Intersection between Competition and Data Privacy” held at the 143rd meeting of the OECD Competition Committee on 13 June 2024. More papers and presentations on the topic can be found at oe.cd/ibcdp.
This presentation by OECD, OECD Secretariat, was made during the discussion “The Intersection between Competition and Data Privacy” held at the 143rd meeting of the OECD Competition Committee on 13 June 2024. More papers and presentations on the topic can be found at oe.cd/ibcdp.
This presentation by Yong Lim, Professor of Economic Law at Seoul National University School of Law, was made during the discussion “Artificial Intelligence, Data and Competition” held at the 143rd meeting of the OECD Competition Committee on 12 June 2024. More papers and presentations on the topic can be found at oe.cd/aicomp.
This presentation by Tim Capel, Director of the UK Information Commissioner’s Office Legal Service, was made during the discussion “The Intersection between Competition and Data Privacy” held at the 143rd meeting of the OECD Competition Committee on 13 June 2024. More papers and presentations on the topic can be found at oe.cd/ibcdp.
Career goals and their importance in real life (artemacademy2)
Career goals serve as a roadmap for individuals, guiding them toward achieving long-term professional aspirations and personal fulfillment. Establishing clear career goals enables professionals to focus their efforts on developing specific skills, gaining relevant experience, and making strategic decisions that align with their desired career trajectory. By setting both short-term and long-term objectives, individuals can systematically track their progress, make necessary adjustments, and stay motivated. Short-term goals often include acquiring new qualifications, mastering particular competencies, or securing a specific role, while long-term goals might encompass reaching executive positions, becoming industry experts, or launching entrepreneurial ventures.
Moreover, having well-defined career goals fosters a sense of purpose and direction, enhancing job satisfaction and overall productivity. It encourages continuous learning and adaptation, as professionals remain attuned to industry trends and evolving job market demands. Career goals also facilitate better time management and resource allocation, as individuals prioritize tasks and opportunities that advance their professional growth. In addition, articulating career goals can aid in networking and mentorship, as it allows individuals to communicate their aspirations clearly to potential mentors, colleagues, and employers, thereby opening doors to valuable guidance and support. Ultimately, career goals are integral to personal and professional development, driving individuals toward sustained success and fulfillment in their chosen fields.
This presentation by Thibault Schrepel, Associate Professor of Law at Vrije Universiteit Amsterdam University, was made during the discussion “Artificial Intelligence, Data and Competition” held at the 143rd meeting of the OECD Competition Committee on 12 June 2024. More papers and presentations on the topic can be found at oe.cd/aicomp.
This presentation was uploaded with the author’s consent.
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...Suzanne Lagerweij
This is a workshop about communication and collaboration. We will experience how we can analyze the reasons for resistance to change (exercise 1) and practice how to improve our conversation style and be more in control and effective in the way we communicate (exercise 2).
This session will use Dave Gray’s Empathy Mapping, Argyris’ Ladder of Inference and The Four Rs from Agile Conversations (Squirrel and Fredrick).
Abstract:
Let’s talk about powerful conversations! We all know how to lead a constructive conversation, right? Then why is it so difficult to have those conversations with people at work, especially those in powerful positions that show resistance to change?
Learning to control and direct conversations takes understanding and practice.
We can combine our innate empathy with our analytical skills to gain a deeper understanding of complex situations at work. Join this session to learn how to prepare for difficult conversations and how to improve our agile conversations in order to be more influential without power. We will use Dave Gray’s Empathy Mapping, Argyris’ Ladder of Inference and The Four Rs from Agile Conversations (Squirrel and Fredrick).
In the session you will experience how preparing and reflecting on your conversation can help you be more influential at work. You will learn how to communicate more effectively with the people needed to achieve positive change. You will leave with a self-revised version of a difficult conversation and a practical model to use when you get back to work.
Come learn more on how to become a real influencer!
This presentation by OECD, OECD Secretariat, was made during the discussion “Artificial Intelligence, Data and Competition” held at the 143rd meeting of the OECD Competition Committee on 12 June 2024. More papers and presentations on the topic can be found at oe.cd/aicomp.
This presentation was uploaded with the author’s consent.
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie WellsRosie Wells
Insight: In a landscape where traditional narrative structures are giving way to fragmented and non-linear forms of storytelling, there lies immense potential for creativity and exploration.
'Collapsing Narratives: Exploring Non-Linearity' is a micro report from Rosie Wells.
Rosie Wells is an Arts & Cultural Strategist uniquely positioned at the intersection of grassroots and mainstream storytelling.
Their work is focused on developing meaningful and lasting connections that can drive social change.
Please download this presentation to enjoy the hyperlinks!
This presentation by OECD, OECD Secretariat, was made during the discussion “Competition and Regulation in Professions and Occupations” held at the 77th meeting of the OECD Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found at oe.cd/crps.
This presentation was uploaded with the author’s consent.
This presentation by Professor Alex Robson, Deputy Chair of Australia’s Productivity Commission, was made during the discussion “Competition and Regulation in Professions and Occupations” held at the 77th meeting of the OECD Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found at oe.cd/crps.
This presentation was uploaded with the author’s consent.
This presentation by Nathaniel Lane, Associate Professor in Economics at Oxford University, was made during the discussion “Pro-competitive Industrial Policy” held at the 143rd meeting of the OECD Competition Committee on 12 June 2024. More papers and presentations on the topic can be found at oe.cd/pcip.
This presentation was uploaded with the author’s consent.
2. www.sti-innsbruck.at
Semantic Engagement
Why use semantics?
• Problems with current-day search engines:
– Recall issues
– Results depend on the vocabulary used
– Results are single Web pages
– Human involvement is necessary to interpret results
– Results of Web searches are not readily accessible to other software tools
• Content is not machine-readable:
– It is difficult to distinguish between:
“I am a professor of computer science.”
and
“You may think I am a professor of computer science.
Well, actually. . .”
3. www.sti-innsbruck.at
Outline
1. Overview
2. Semantic Analysis (= Natural Language Processing)
3. Semantics as a Channel (= Semantic vocabularies)
4. Semantic Content Modelling (= Ontologies)
5. Semantic Matchmaking (= Automatic distribution)
6. Summary
5. www.sti-innsbruck.at
Semantic Analysis
What is Semantic Analysis?
• Discovering facts in texts
• Deriving additional facts from them
• Somewhere in the Web the text fragment “Dieter is married to Anna” occurs
(extracted statement)
• Named Entity Recognition tells us that Dieter is a (German) male given name, and
Anna is a female given name (enriched with background knowledge)
• We can infer that Dieter and Anna are persons and
– Dieter is male
– Anna is female
– Anna is married to Dieter
– What about “Anna-Marie is married with Dieter”?
(derive new facts)
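The extraction-and-derivation step above can be sketched in a few lines of Python. This is a toy illustration, not part of the original slides: the pattern and the name-gender gazetteer are invented stand-ins for real background knowledge.

```python
import re

# Toy background knowledge: given names with gender (an assumption for illustration)
NAME_GENDER = {"Dieter": "male", "Anna": "female"}

def extract_facts(sentence):
    """Extract a 'X is married to Y' statement and derive additional facts."""
    m = re.match(r"(\w+) is married to (\w+)", sentence)
    if not m:
        return set()
    x, y = m.groups()
    # Marriage is symmetric, so both directions hold
    facts = {(x, "married-to", y), (y, "married-to", x)}
    for person in (x, y):
        facts.add((person, "is-a", "Person"))      # only persons marry
        if person in NAME_GENDER:                  # enrich with gazetteer knowledge
            facts.add((person, "gender", NAME_GENDER[person]))
    return facts

facts = extract_facts("Dieter is married to Anna")
```

Note that the rigid pattern already fails on the “Anna-Marie is married with Dieter” variant, which is exactly the point of the quiz question above.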
8. www.sti-innsbruck.at
Semantics as a Channel
• Publishing Linked Data (data represented in accordance with the Semantic Web
paradigms) can take various forms:
– serialized graph (e.g. an RDF/XML file)
– embedded in the markup of the text
– accessible through open graph databases (triple stores)
• Publishing Linked Data also involves publishing the format used:
– There are many different formats
– Formats are often used in combination with one another
10. www.sti-innsbruck.at
Semantic Content Modelling
• An ontology is a
– “formal (mathematical),
– explicit specification (e.g. male and female are disjoint)
– of a shared conceptualisation”
• (... of a domain)
[Gruber, 1993]
13. www.sti-innsbruck.at
Semantic Matchmaking
• The number of digital publishing channels has increased exponentially in
the past decade
• Content production has risen tremendously in the past century
• Everybody has to put a lot of content into a lot of channels
• Manual effort is becoming futile
• The answer: automatic review and adjustment of content, and automatic
dissemination to channels
14. www.sti-innsbruck.at
Outline
1. Overview
2. Semantic Analysis (= Natural Language Processing)
3. Semantics as a Channel (= Semantic vocabularies)
4. Semantic Content Modelling (= Ontologies)
5. Semantic Matchmaking (= Automatic distribution)
6. Summary
15. www.sti-innsbruck.at
Semantic Analysis
Text mining:
“Text mining, sometimes alternately referred to as text data mining, roughly
equivalent to text analytics, refers to the process of deriving high-quality
information from text.“
Source: Wikipedia
How do we get “high-quality information”?
16. www.sti-innsbruck.at
Semantic Analysis
“I saw the man on the hill with a telescope”
Example:
Quiz questions:
• Do I have a telescope?
• Does the hill have a telescope?
• Does the man have a telescope?
• Is the man on the hill?
• Am I on the hill?
• Is the telescope on the hill?
18. www.sti-innsbruck.at
Semantic Analysis
Seven typical tasks of Information Extraction from Natural Language:
1. Topic detection
2. Named entity recognition
3. Co-reference and Disambiguation
4. Relation Extraction
5. Sentiment detection and Opinion mining
6. Social annotation
7. Text summarization
19. www.sti-innsbruck.at
Semantic Analysis
• Topic detection: detect the topics of a document (e.g. chat logs, tweets) on a meta
level.
• Catalogues for topics: Open Directory Project (dmoz), Wikipedia, ...
• Techniques:
– Classification
– Clustering
– other Machine Learning techniques
“In linguistics, the topic (or theme) of a sentence is often defined as what is being
talked about, and the comment (rheme or focus) is what is being said about the
topic.”
Source: Wikipedia
1. Topic detection
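A minimal classification-style sketch of topic detection (not from the original slides): each topic is represented by a profile of characteristic terms, and a document is assigned the topic whose terms it mentions most. The topic catalogue and terms here are invented for illustration; real systems learn such profiles with machine-learning techniques.

```python
from collections import Counter

# Hypothetical topic catalogue: topic -> characteristic terms (an assumption)
TOPICS = {
    "sports":  {"match", "goal", "team", "player", "season"},
    "finance": {"market", "stock", "bank", "shares", "profit"},
}

def detect_topic(document):
    """Score each topic by how often its characteristic terms occur."""
    words = Counter(document.lower().split())
    scores = {t: sum(words[w] for w in terms) for t, terms in TOPICS.items()}
    return max(scores, key=scores.get)

detect_topic("the team scored a late goal to win the match")  # -> "sports"
```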
21. www.sti-innsbruck.at
Semantic Analysis
• NER involves identification of proper names in texts, and classification into a set of
predefined categories of interest.
• Three universally accepted categories: person, location and organisation
• Often also: measures (percent, money, weight etc), email addresses, recognition of
date/time expressions etc.
• Domain-specific entities: names of hotels, medical conditions, names of ships,
bibliographic references etc.
• “Named entity recognition: recognition of known entity names (for people and
organizations), place names, temporal expressions, and certain types of numerical
expressions, employing existing knowledge of the domain or information extracted
from other sentences.
• For example, in processing the sentence "M. Smith likes fishing", named entity
detection would denote detecting that the phrase "M. Smith" does refer to a person,
but without necessarily having (or using) any knowledge about a certain M. Smith
who is (/or, "might be") the specific person whom that sentence is talking about. “
Source: Wikipedia
2. Named Entity Recognition (NER)
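As a sketch (not part of the original slides), NER can be approximated with gazetteers for the universally accepted categories plus a surface pattern for person names like "M. Smith". The gazetteers below are tiny invented stand-ins for real background knowledge.

```python
import re

# Tiny gazetteers standing in for real background knowledge (assumptions)
LOCATIONS = {"Innsbruck", "Vienna"}
ORGANISATIONS = {"W3C", "OECD"}

def ner(text):
    """Tag entities using gazetteers and an 'initial + surname' pattern."""
    entities = []
    for token in LOCATIONS:
        if token in text:
            entities.append((token, "LOCATION"))
    for token in ORGANISATIONS:
        if token in text:
            entities.append((token, "ORGANISATION"))
    # 'M. Smith'-style pattern: capital initial, dot, capitalised surname
    for m in re.finditer(r"\b[A-Z]\.\s[A-Z][a-z]+", text):
        entities.append((m.group(), "PERSON"))
    return entities

ner("M. Smith of the W3C likes fishing near Innsbruck")
```

As in the Wikipedia example above, the pattern detects that "M. Smith" refers to a person without knowing which specific M. Smith is meant.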
23. www.sti-innsbruck.at
Semantic Analysis
“In linguistics, co-reference occurs when multiple expressions in a sentence or document
refer to the same thing; or in linguistic jargon, they have the same ‘referent’.”
Source: Wikipedia
“In computational linguistics, word-sense disambiguation (WSD) is an open problem of
natural language processing, which governs the process of identifying which sense of a
word (i.e. meaning) is used in a sentence, when the word has multiple meanings.”
Source: Wikipedia
• Co-reference resolution is used to connect information, e.g.
“I bought a new car. It is a green Mercedes convertible with 200 horsepower”
• Disambiguation means knowing who or what is meant:
“Former president George Bush started the war in Iraq, code-named Operation Desert
Storm.” (George W. Bush or George H. W. Bush?)
3. Co-reference and Disambiguation
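A deliberately crude co-reference sketch (an invented illustration, not a real resolver): resolve a sentence-initial "It" to the most recently mentioned candidate noun phrase, as in the car example above.

```python
def resolve_it(sentences, antecedent_candidates):
    """Replace a leading 'It' with the most recently seen candidate phrase."""
    resolved = []
    last_seen = None
    for s in sentences:
        # Resolve before scanning, so a phrase can't be its own antecedent
        if s.startswith("It ") and last_seen:
            s = s.replace("It ", last_seen + " ", 1)
        for cand in antecedent_candidates:
            if cand in s:
                last_seen = cand
        resolved.append(s)
    return resolved

resolve_it(["I bought a new car.", "It is a green Mercedes convertible."],
           ["a new car"])
```

Real co-reference resolution must also handle gender, number, and syntactic constraints; recency alone fails quickly.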
24. www.sti-innsbruck.at
Semantic Analysis
“A relationship extraction task requires the detection and classification of semantic
relationship mentions within a set of artifacts, typically from text or XML documents. The
task is very similar to that of information extraction (IE), but IE additionally requires the
removal of repeated relations (disambiguation) and generally refers to the extraction of
many different relationships.”
Source: Wikipedia
Example: “Dieter is married to Anna”
[Dieter Fensel] —Relation→ [Anna Fensel]
4. Relation Extraction
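The marriage example can be sketched as pattern-based relation extraction (a toy, not part of the original slides): each surface pattern maps matched entity pairs to a typed relation. The `worksFor` pattern is an invented extra to show the general shape.

```python
import re

# Each pattern maps a surface form to a typed relation
PATTERNS = [
    (re.compile(r"(\w+) is married to (\w+)"), "marriedTo"),
    (re.compile(r"(\w+) works for (\w+)"), "worksFor"),  # hypothetical extra pattern
]

def extract_relations(text):
    """Return (subject, relation, object) triples found in the text."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(text):
            triples.append((m.group(1), relation, m.group(2)))
    return triples

extract_relations("Dieter is married to Anna")  # -> [("Dieter", "marriedTo", "Anna")]
```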
25. www.sti-innsbruck.at
Semantic Analysis
• “Sentiment analysis and opinion mining refer to the application of natural language
processing, computational linguistics, and text analytics to identify and extract
subjective information in source materials.
• Sentiment analysis aims to determine the attitude of a speaker or a writer with
respect to some topic or the overall contextual polarity of a document.
• The attitude may be his or her judgment or evaluation, affective state, or the intended
emotional communication.” Source: Wikipedia
• A sentiment is a thought, view, or attitude, especially one based mainly on emotion
rather than reason
• Must consider features such as:
– Subtlety of sentiment expression e.g. irony
– Domain/context dependence
– Effect of syntax on semantics
5. Sentiment and Opinion Detection
27. www.sti-innsbruck.at
Semantic Analysis
• Extraction of opinions and their meaning from text.
• A very difficult and not yet solved task.
• Example:
1. This is a great hotel.
2. A great amount of money was spent for promoting this hotel.
3. One might think this is a great hotel.
5. Sentiment and Opinion Detection
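The hotel examples can be sketched with a naive lexicon-based scorer (an invented toy, not a real sentiment system): positive words add to the score, and hedge cues such as "might" neutralise the claim, as in example 3.

```python
POSITIVE = {"great", "excellent", "wonderful"}
HEDGES = {"might", "allegedly", "supposedly"}  # cues that weaken a sentiment claim

def sentiment(sentence):
    """Naive lexicon score; hedged sentences are treated as neutral."""
    words = sentence.lower().replace(".", "").split()
    if any(w in HEDGES for w in words):
        return 0  # e.g. "One might think this is a great hotel."
    return sum(1 for w in words if w in POSITIVE)

sentiment("This is a great hotel.")                  # -> 1
sentiment("One might think this is a great hotel.")  # -> 0
```

Example 2 above ("A great amount of money was spent ...") would still be mis-scored as positive, which illustrates why context and syntax make the task so hard.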
29. www.sti-innsbruck.at
Semantic Analysis
• Developed for web users to organize and share their favorite web pages online by
social annotations
• Emergent, useful information that has been explored for folksonomies, visualization,
the Semantic Web, etc.
• Examples: Delicious, BibSonomy, Last.fm, ...
6. Social Annotation
31. www.sti-innsbruck.at
Semantic Analysis
• “Automatic summarization involves reducing a text document or a larger corpus of
multiple documents into a short set of words or paragraph that conveys the main
meaning of the text.
– Extractive methods work by selecting a subset of existing words, phrases, or
sentences in the original text to form the summary.
– Abstractive methods build an internal semantic representation and then use
natural language generation techniques to create a summary that is closer to
what a human might generate.
• The state-of-the-art abstractive methods are still quite weak, so most research has
focused on extractive methods.
• Two particular types of summarization often addressed in the literature are
– keyphrase extraction, where the goal is to select individual words or phrases to
"tag" a document,
– and document summarization, where the goal is to select whole sentences to
create a short paragraph summary.” Source: Wikipedia
7. Text Summarization
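An extractive method can be sketched in a few lines (a toy, not from the original slides): score each sentence by the corpus-wide frequency of its words and select the top-ranked sentences as the summary.

```python
from collections import Counter
import re

def summarize(text, n=1):
    """Extractive summary: the n sentences with the highest word-frequency score."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        # A sentence full of frequent words is assumed to be central
        return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))

    return sorted(sentences, key=score, reverse=True)[:n]

summarize("RDF describes resources. RDF uses triples to describe resources. Cats sleep.")
```

Keyphrase extraction works analogously, scoring individual words or phrases instead of whole sentences.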
33. www.sti-innsbruck.at
Outline
1. Overview
2. Semantic Analysis (= Natural Language Processing)
3. Semantics as a Channel (= Semantic vocabularies)
4. Semantic Content Modelling (= Ontologies)
5. Semantic Matchmaking (= Automatic distribution)
6. Summary
34. www.sti-innsbruck.at
Semantics as a Channel
The Semantic Web Approach
• Represent Web content in a form that is more easily machine-processable.
• Use intelligent techniques to take advantage of these representations.
• Knowledge will be organized in conceptual spaces according to its meaning.
• Automated tools for maintenance and knowledge discovery
• Semantic query answering
• Query answering over several documents
• Defining who may view certain parts of information (even parts of documents) will be
possible.
• The Semantic Web does not rely on text-based manipulation, but rather on
machine-processable metadata
35. www.sti-innsbruck.at
Semantics as a Channel
• Scope: Add machine-processable semantics to the information
⟹ Search and aggregation engines can provide much
better service in finding and retrieving information
• Search Engine Optimization
– Are potential customers finding your web site?
– Is it possible that potential customers might not be aware that your site exists?
– Do your targeted search terms have high search engine rankings?
– Does your website attract a large number of daily visitors?
• Search engines are driven by keywords:
– search engine optimization is concerned with improving the visibility of a website or web page in
search engines' unpaid search results
– the more prominently a site appears in the search results list, the more visitors it will receive
• Semantic Search:
– semantic search tries to understand the searcher's intent and the meaning of the query instead of
parsing the keywords like a dictionary
– semantic search dives into the relationships between the query words and how they are connected,
in order to understand what they mean
36. www.sti-innsbruck.at
Semantics as a Channel
Implementations – Rich Snippets
• Implementation: the realization of an application, plan, idea, model, or design.
• Snippets—the few lines of text that appear under every search result—are designed
to give users a sense of what's on the page and why it's relevant to their query.
• If Google understands the content on your pages, it can create rich snippets—
detailed information intended to help users with specific queries.
40. www.sti-innsbruck.at
Evolution of the Web: Web of Data
[Diagram: the evolution from hypertext (“As We May Think”, 1945) and hypermedia to the Web, and via semantic annotations to the Semantic Web and the Web of Data; pictures from [3] and [4]]
41. www.sti-innsbruck.at
Motivation: From a Web of Documents to a Web of Data
• Web of Documents – fundamental elements:
1. Names (URIs)
2. Documents (resources) described by HTML, XML, etc.
3. Interactions via HTTP
4. (Hyper)links between documents or anchors in these documents
• Shortcomings:
– Untyped links
– Web search engines fail on complex queries
[Diagram: “documents” connected by hyperlinks]
42. www.sti-innsbruck.at
Motivation: From a Web of Documents to a Web of Data
[Diagram: the Web of Documents (“documents” connected by hyperlinks) contrasted with the Web of Data (“things” connected by typed links)]
43. www.sti-innsbruck.at
Motivation: From a Web of Documents to a Web of Data
• Web of Data – characteristics:
– Links between arbitrary things (e.g., persons, locations, events, buildings)
– Structure of data on Web pages is made explicit
– Things described on Web pages are named and get URIs
– Links between things are made explicit and are typed
[Diagram: “things” connected by typed links]
44. www.sti-innsbruck.at
Vision of the Web of Data
• The Web today
– Consists of data silos which can be accessed via specialized search engines
in an isolated fashion.
– One site (data silo) has movies, another reviews, and yet another actors.
– Many common things are represented in multiple data sets
– Linking identifiers link these data sets
• The Web of Data is envisioned as a global database
– consisting of objects and their descriptions
– in which objects are linked with each other
– with a high degree of object structure
– with explicit semantics for links and content
– which is designed for humans and machines
Content on this slide by Chris Bizer, Tom Heath and Tim Berners-Lee
46. www.sti-innsbruck.at
The three dimensions
• Format: an explicit set of requirements to be satisfied by a material, product, or
service. The best-known examples are RDF and OWL.
• Vocabulary: a (Semantic Web) vocabulary can be considered a special form of
(usually light-weight) ontology, or sometimes merely a collection of URIs with a
(usually informally) described meaning*.
URI = uniform resource identifier
Semantic vocabularies include: FOAF, Dublin Core, GoodRelations, etc.
• Implementation: the realization of an application, plan, idea, model, or design.
Example: OWLIM – a family of semantic repositories (RDF database management systems)
* http://semanticweb.org/wiki/Ontology
47. www.sti-innsbruck.at
Semantic Formats
Format:
• an explicit set of requirements to be satisfied by a material, product, or service
• an encoded form for converting a specific type of data into displayable information
48. www.sti-innsbruck.at
Semantic Formats
Methods of describing Web content:
[Timeline, 1998–2011: HTML Meta Elements, RDFs, RDF, RDFa, Microformats, OWL, SPARQL, OWL 2, RIF, Microdata]
49. www.sti-innsbruck.at
Semantic Formats
Format – HTML Meta Elements
• HTML or XHTML elements which provide structured metadata about a Web page
• Represented using the <meta...> element
• Can be used to specify page description, keywords and any other metadata not
provided through the other head elements and attributes
• Example:
<meta http-equiv="Content-Type" content="text/html" >
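Such metadata can be read programmatically. A sketch using Python's standard-library `html.parser` (not part of the original slides; the page snippet is shortened from the Wikipedia example on the next slide):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> elements in a page head."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # <meta> is a void element, so it always arrives as a start tag
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

parser = MetaExtractor()
parser.feed('<meta name="description" content="Wikipedia, the free encyclopedia.">'
            '<meta name="robots" content="index, follow">')
parser.meta  # maps each meta name to its content string
```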
50. www.sti-innsbruck.at
Semantic Formats
Format – HTML Meta Elements
• Search engine optimization attributes: keywords, description, language, robots
– keywords attribute - although popular in the 90s, search engine providers realized that
information stored in meta elements (especially the keywords attribute) was often unreliable
and misleading, or created to draw users towards spam sites
– description attribute - provides a concise explanation of a Web page's content
– the language attribute - tells search engines what natural language the website is written in
– the robots attribute - controls whether or not search engine spiders are allowed to index a
page, and whether or not they should follow links from a page
51. www.sti-innsbruck.at
Semantic Formats
Format – HTML Meta Elements
• Example - metadata contained by www.wikipedia.org:
<meta charset="utf-8">
<meta name="title" content="Wikipedia">
<meta name="description" content="Wikipedia, the free encyclopedia that anyone can edit.">
<meta name="author" content="Wikimedia Foundation">
<meta name="copyright" content="Creative Commons Attribution-Share Alike 3.0 and GNU
Free Documentation License">
<meta name="publisher" content="Wikimedia Foundation">
<meta name="language" content="Many">
<meta name="robots" content="index, follow">
<!--[if lt IE 7]>
<meta http-equiv="imagetoolbar" content="no">
<![endif]-->
<meta name="viewport" content="initial-scale=1.0, user-scalable=yes">
52. www.sti-innsbruck.at
Semantic Formats
HTML Meta Elements – there was soon a need for:
• controlled vocabularies for this metadata
• richer formats to add this metadata and define its properties
54. www.sti-innsbruck.at
RDF
Format – RDF
• The Resource Description Framework (RDF) is a language for representing
information about resources in the World Wide Web.
• RDF provides a common framework for expressing information so it can be
exchanged between applications without loss of meaning.
• It is based on the idea of identifying things using Web identifiers (called Uniform
Resource Identifiers, or URIs) and describing resources in terms of simple properties
and property values
• Thus, RDF can represent simple statements about resources as a graph of nodes
and arcs representing the resources, and their properties and values.
• It specifically supports the evolution of schemas over time without requiring all the
data consumers to be changed
Source: http://www.iis.sinica.edu.tw/~trc/public/courses/Fall2008/week15/slide-w15.html#%287%29
55. www.sti-innsbruck.at
RDF
Format – RDF
• Based on triples <subject, predicate, object>
• An RDF triple contains three components:
– the subject, which is an RDF URI reference or a blank node
– the predicate, which is an RDF URI reference
– the object, which is an RDF URI reference, a literal or a blank node
– An RDF triple is conventionally written in the order subject, predicate, object.
– The predicate is also known as the property of the triple.
• Triple data model:
<subject, predicate, object>
– Subject: Resource or blank node
– Predicate: Property
– Object: Resource (or collection of resources), literal or blank node
• Example:
<ex:john, ex:father-of, ex:bill>
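The triple data model can be sketched as a minimal in-memory triple store (an illustration, not a real RDF library): triples are tuples, and a query pattern uses wildcards for unbound positions, much like a SPARQL basic graph pattern. The extra `ex:age` triples are invented for the example.

```python
# Minimal in-memory store of <subject, predicate, object> triples
triples = {
    ("ex:john", "ex:father-of", "ex:bill"),
    ("ex:john", "ex:age", "52"),   # invented example data
    ("ex:bill", "ex:age", "21"),
}

def query(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)}

query("ex:john", "ex:father-of")  # -> {("ex:john", "ex:father-of", "ex:bill")}
```

Real triple stores add indexes over all triple positions so such pattern queries stay fast at scale.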
56. www.sti-innsbruck.at
RDF
Format – RDF
• An RDF graph is a set of RDF triples.
• The set of nodes of an RDF graph is the set of subjects and objects of triples in the
graph.
• Person ages (:age) and favorite friends (:fav)
Properties encoded as XML elements:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:example="http://fake.host.edu/example-schema#">
<example:Person>
<example:name>Smith</example:name>
<example:age>21</example:age>
<example:fav>Jones</example:fav>
</example:Person>
</rdf:RDF>
57. www.sti-innsbruck.at
Resources
• A resource may be:
– Web page (e.g. http://www.w3.org)
– A person (e.g. http://www.fensel.com)
– A book (e.g. urn:isbn:0-345-33971-1)
– Anything denoted with a URI!
• A URI is an identifier and not a location on the Web
• RDF allows making statements about resources:
– http://www.w3.org has the format text/html
– http://www.fensel.com has first name Dieter
– urn:isbn:0-345-33971-1 has author Tolkien
58. www.sti-innsbruck.at
URI, URN, URL
• A Uniform Resource Identifier (URI) is a string of characters used to identify a name
or a resource on the Internet
• A URI can be a URL or a URN
• A Uniform Resource Name (URN) defines an item's identity
– the URN urn:isbn:0-395-36341-1 is a URI that specifies the identifier system, i.e.
International Standard Book Number (ISBN), as well as the unique reference within that
system and allows one to talk about a book, but doesn't suggest where and how to obtain an
actual copy of it
• A Uniform Resource Locator (URL) provides a method for finding it
– the URL http://www.sti-innsbruck.at/ identifies a resource (STI's home page) and implies that
a representation of that resource (such as the home page's current HTML code, as encoded
characters) is obtainable via HTTP from a network host named www.sti-innsbruck.at
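The URN/URL distinction is visible directly in Python's standard-library `urllib.parse` (a small illustration, not part of the original slides): both are URIs, but only the URL carries a network location to fetch from.

```python
from urllib.parse import urlparse

# A URN names a thing; a URL additionally locates it
urn = urlparse("urn:isbn:0-395-36341-1")
url = urlparse("http://www.sti-innsbruck.at/")

urn.scheme, urn.path    # ('urn', 'isbn:0-395-36341-1') — an identity, no location
url.scheme, url.netloc  # ('http', 'www.sti-innsbruck.at') — where to fetch from
```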
59. www.sti-innsbruck.at
RDF-XML
• RDF-XML serialization of a university canteen (note the different vocabularies):
<rdf:Description rdf:about="http://lom.sti2.at/mensen/8">
<rdf:type rdf:resource="http://purl.org/goodrelations/v1#Location"/>
<gr:name>Musik Penzing</gr:name>
<foaf:page rdf:resource="http://menu.mensen.at/index/index/locid/8"/>
<vcard:adr rdf:resource="http://lom.sti2.at/mensen/8/adr"/>
<vcard:tel rdf:resource="http://lom.sti2.at/mensen/8/tel"/>
<vcard:geo rdf:resource="http://lom.sti2.at/mensen/8/geo"/>
</rdf:Description>
<rdf:Description rdf:about="http://lom.sti2.at/mensen/8/adr">
<rdf:type rdf:resource="http://www.w3.org/2006/vcard/ns#Work"/>
<vcard:street-address>Penzinger Straße 7</vcard:street-address>
<vcard:postal-code>1140</vcard:postal-code>
<vcard:locality> Wien</vcard:locality>
<vcard:country>Austria</vcard:country>
</rdf:Description>
<rdf:Description rdf:about="http://lom.sti2.at/mensen/8/tel">
<rdf:type rdf:resource="http://www.w3.org/2006/vcard/ns#Work"/>
<rdf:value>+43 1 89 42 146</rdf:value>
</rdf:Description>
<rdf:Description rdf:about="http://lom.sti2.at/mensen/8/geo">
<vcard:latitude rdf:datatype="http://www.w3.org/2001/XMLSchema#double">48.1897501</vcard:latitude>
<vcard:longitude rdf:datatype="http://www.w3.org/2001/XMLSchema#double">16.3134461</vcard:longitude>
</rdf:Description>
61. www.sti-innsbruck.at
RDFS Principles
• Resource
– All resources are implicitly instances of rdfs:Resource
• Class
– Describe sets of resources
– Classes are resources themselves - e.g. Webpages, people, document types
• Class hierarchy can be defined through rdfs:subClassOf
• Every class is a member of rdfs:Class
• Property
– Subset of RDFS Resources that are properties
• Domain: class associated with property: rdfs:domain
• Range: type of the property values: rdfs:range
• Property hierarchy defined through: rdfs:subPropertyOf
64. www.sti-innsbruck.at
OWL
Format – OWL
• Family of knowledge representation languages for
authoring ontologies
• The W3C Web Ontology Working Group (WebOnt) developed the OWL language
• OWL is based on the earlier languages OIL and DAML+OIL
• Characterized by formal semantics and RDF/XML-
based serializations for the Semantic Web
• Endorsed by the World Wide Web Consortium
(W3C)
Source: McGuinness, COGNA October 3, 2003
65. www.sti-innsbruck.at
Design Goals for OWL
• Shareable
– Ontologies should be publicly available and different data sources should be able
to commit to the same ontology for shared meaning. Also, ontologies should be
able to extend other ontologies in order to provide additional definitions.
• Changing over time
– An ontology may change during its lifetime. A data source should specify the
version of an ontology to which it commits.
• Interoperability
– Different ontologies may model the same concepts in different ways. The
language should provide primitives for relating different representations, thus
allowing data to be converted to different ontologies and enabling a "web of
ontologies."
66. www.sti-innsbruck.at
Design Goals for OWL
• Inconsistency detection
– Different ontologies or data sources may be contradictory. It should be possible
to detect these inconsistencies.
• Balancing expressivity and complexity
– The language should be able to express a wide variety of knowledge, but should
also provide for efficient means to reason with it. Since these two requirements
are typically at odds, the goal of the web ontology language is to find a balance
that supports the ability to express the most important kinds of knowledge.
• Ease of use
– The language should provide a low learning barrier and have clear concepts and
meaning. The concepts should be independent from syntax.
67. www.sti-innsbruck.at
Design Goals for OWL
• Compatible with existing standards
– The language should be compatible with other commonly used Web and industry
standards. In particular, this includes XML and related standards (such as XML
Schema and RDF), and possibly other modeling standards such as UML.
• Internationalization
– The language should support the development of multilingual ontologies, and
potentially provide different views of ontologies that are appropriate for different
cultures.
68. www.sti-innsbruck.at
OWL
OWL Sublanguages
• The W3C-endorsed OWL specification includes the definition of three variants of
OWL, with different levels of expressiveness (ordered by increasing expressiveness):
– OWL Lite - originally intended to support those users primarily
needing a classification hierarchy and simple constraints
– OWL DL - was designed to provide the maximum expressiveness
possible while retaining computational completeness, decidability,
and the availability of practical reasoning algorithms.
– OWL Full - designed to preserve some compatibility with RDF
Schema
• The following relations hold. Their inverses do not.
– Every legal OWL Lite ontology is a legal OWL DL ontology.
– Every legal OWL DL ontology is a legal OWL Full ontology.
– Every valid OWL Lite conclusion is a valid OWL DL conclusion.
– Every valid OWL DL conclusion is a valid OWL Full conclusion.
• Development of OWL Lite tools has proven almost as difficult as development of
tools for OWL DL, and OWL Lite is not widely used
• Each of these sublanguages is a syntactic extension of its simpler predecessor.
Source: McGuinness, COGNA October 3, 2003
69. www.sti-innsbruck.at
OWL
Format – OWL
• Class Axioms
– oneOf (enumerated classes)
– disjointWith
– sameClassAs applied to class expressions
– rdfs:subClassOf applied to class expressions
• Boolean Combinations of Class Expressions
– unionOf
– intersectionOf
– complementOf
• Arbitrary Cardinality
– minCardinality
– maxCardinality
– cardinality
• Filler Information
– hasValue Descriptions can include specific value information
Source: McGuinness, COGNA October 3, 2003
71. www.sti-innsbruck.at
Experience with OWL
• OWL playing key role in increasing number & range of applications
– eScience, eCommerce, geography, engineering, defence, …
– E.g., OWL tools used to identify and repair errors in a medical ontology: “would
have led to missed test results if not corrected”
• Experience of OWL in use has identified restrictions:
– on expressivity
– on scalability
• These restrictions are problematic in some applications
• Research has now shown how some restrictions can be overcome
• W3C OWL WG has updated OWL accordingly
– Result is called OWL 2
• OWL 2 is now a Proposed Recommendation
72. www.sti-innsbruck.at
OWL 2 in a Nutshell
• Extends OWL with a small but useful set of features
– That are needed in applications
– For which semantics and reasoning techniques are well understood
– That tool builders are willing and able to support
• Adds profiles
– Language subsets with useful computational properties (EL, RL, QL)
• Is fully backwards compatible with OWL:
– Every OWL ontology is a valid OWL 2 ontology
– Every OWL 2 ontology not using new features is a valid OWL ontology
• Already supported by popular OWL tools & infrastructure:
– Protégé, HermiT, Pellet, FaCT++, OWL API
73. www.sti-innsbruck.at
Format – OWL 2
• Inherits OWL 1 language features
• Makes some patterns easier to write
• Does not change expressiveness, semantics and complexity
• Provides more efficient processing in implementations
• Syntactic sugar:
– DisjointUnion - Union of a set of classes; all the classes are pairwise disjoint
– DisjointClasses - A set of classes; all the classes are pairwise disjoint
– NegativeObjectPropertyAssertion - Two individuals; a property does not hold between them
– NegativeDataPropertyAssertion - An individual; a literal; a property does not hold between
them
• OWL 2 allows the same identifiers (URIs) to denote individuals, classes, and
properties
• Interpretation depends on context
• A very simple form of meta-modelling
OWL2
Source: McGuinness, COGNA October 3, 2003
73
74. www.sti-innsbruck.at
OWL2
• Qualified cardinality restrictions
– e.g., persons having two friends who are republicans
• Property chains
– e.g., the brother of your parent is your uncle
• Local reflexivity restrictions
– e.g., narcissists love themselves
• Reflexive, irreflexive, and asymmetric properties
– e.g., nothing can be a proper part of itself (irreflexive)
• Disjoint properties
– e.g., you can’t be both the parent of and child of the same person
• Keys
– e.g., country + license plate constitute a unique identifier for vehicles
74
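The property-chain feature above can be sketched as a simple join over a toy fact set; the names and facts below are invented for illustration, not taken from any standard example.

```python
# Sketch of the OWL 2 "property chain" idea: hasParent o hasBrother -> hasUncle.
# Facts are (subject, object) pairs; the people here are made up.
has_parent = {("anna", "bob")}           # anna's parent is bob
has_brother = {("bob", "carl")}          # bob's brother is carl

# The chain axiom says: if x hasParent y and y hasBrother z, infer x hasUncle z.
has_uncle = {(x, z)
             for (x, y1) in has_parent
             for (y2, z) in has_brother
             if y1 == y2}

print(has_uncle)  # {('anna', 'carl')}
```

A real reasoner iterates such joins to a fixpoint and interleaves them with the other axioms; this only shows the single inference step the axiom licenses.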
75. www.sti-innsbruck.at
Format – OWL 2
• An OWL 2 profile (commonly called a fragment or a sublanguage in computational
logic) is a trimmed down version of OWL 2 that trades some expressive power for the
efficiency of reasoning.
• OWL 2 profiles
– OWL 2 EL is particularly useful in applications employing ontologies that contain very large
numbers of properties and/or classes.
– OWL 2 QL is aimed at applications that use very large volumes of instance data,
and where query answering is the most important reasoning task
– OWL 2 RL is aimed at applications that require scalable reasoning without
sacrificing too much expressive power.
• OWL 2 profiles are defined by placing restrictions on the structure of OWL 2
ontologies.
OWL2
Source: http://semwebprogramming.org/?p=175
75
77. www.sti-innsbruck.at
Format – RIF
• A collection of dialects (rigorously defined
rule languages)
• Intended to facilitate rule sharing and
exchange
• RIF framework is a set of rigorous
guidelines for constructing RIF dialects in a
consistent manner
• The RIF framework includes several
aspects:
– Syntactic framework
– Semantic framework
– XML framework
• RIF can be used to map between
vocabularies (one of the proposed use
cases)
Rule Interchange Format (RIF)
Source: Michael Kifer State University of New York at Stony Brook
77
78. www.sti-innsbruck.at
Rule Interchange Format (RIF)
• Exchange of Rules
– The primary goal of RIF is to facilitate the exchange of rules
• Consistency with W3C specifications
– A W3C specification that builds on and develops the existing range of
specifications that have been developed by the W3C
– Existing W3C technologies should fit well with RIF
• Wide-scale adoption
– Rule interchange becomes more effective the wider its adoption ("network
effect")
Goals:
78
79. www.sti-innsbruck.at
Format – RIF
• The standard RIF dialects are:
– Core - the fundamental RIF language. It is designed to be the common subset of most rule
engines. (It provides "safe" positive datalog with builtins.)
– BLD (Basic Logic Dialect) - adds a few things that Core doesn't have: logic functions,
equality in the then-part, and named arguments. (This is positive Horn logic, with equality
and builtins.)
– PRD (Production Rules Dialect) - adds a notion of forward-chaining rules, where a rule fires
and then performs some action, such as adding more information to the store or retracting
some information.
• Although RIF dialects were designed primarily for interchange, each dialect is a
standard rule language and can be used even when portability and interchange are
not required.
• The XML syntax is the only one defined as a standard for interchange. Various
presentation syntaxes are used in the specification, but they are not recommended
for sending between different systems.
Rule Interchange Format (RIF)
Source: http://www.w3.org/2005/rules/wiki/RIF_FAQ#What_is_RIF-BLD.3F__.28and_RIF-Core.2C_PRD.2C_FLD.29
79
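The PRD idea of rules that fire and then change the fact base can be sketched as a minimal forward-chaining loop; the rule and facts below are invented for illustration and are not RIF syntax.

```python
# Minimal forward-chaining loop in the spirit of RIF-PRD: each rule inspects the
# working memory of facts and, when it fires, asserts (or retracts) facts.
facts = {("temperature", "high")}

def cooling_rule(wm):
    """If the temperature is high and the fan is off, switch the fan on."""
    if ("temperature", "high") in wm and ("fan", "on") not in wm:
        return {("fan", "on")}, set()        # (assertions, retractions)
    return set(), set()

rules = [cooling_rule]

changed = True
while changed:                               # fire rules until a fixpoint
    changed = False
    for rule in rules:
        add, remove = rule(facts)
        if add - facts or remove & facts:
            facts = (facts | add) - remove
            changed = True

print(sorted(facts))  # [('fan', 'on'), ('temperature', 'high')]
```

Note the contrast with the logic dialects: a BLD rule could only add the derived fact, whereas a PRD rule may also retract facts, which is why the two need separate semantics.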
80. www.sti-innsbruck.at
• Compliance model
– Clear conformance criteria, defining what is or is not conformant to RIF
• Different semantics
– RIF must cover rule languages having different semantics
• Limited number of dialects
– RIF must have a standard core and a limited number of standard dialects based
upon that core
• OWL data
– RIF must cover OWL knowledge bases as data where compatible with RIF
semantics
[http://www.w3.org/TR/rif-ucr/]
Requirements
Rule Interchange Format (RIF)
80
81. www.sti-innsbruck.at
• RDF data
– RIF must cover RDF triples as data where compatible with RIF semantics
• Dialect identification
– The semantics of a RIF document must be uniquely determined by the content of
the document, without out-of-band data
• XML syntax
– RIF must have an XML syntax as its primary normative syntax
• Merge rule sets
– RIF must support the ability to merge rule sets
• Identify rule sets
– RIF must support the identification of rule sets
Requirements
Rule Interchange Format (RIF)
[http://www.w3.org/TR/rif-ucr/]
81
82. www.sti-innsbruck.at
• RIF aims to cover both rules in logic dialects and rules used by production rule
systems (e.g. active databases)
• Logic rules only add knowledge
• Production rules change the facts!
• Logic rules + Production Rules?
– Define a logic-based core and a separate production-rule core
– If there is an intersection, define the common core
Basic Principle: a Modular Architecture
Rule Interchange Format (RIF)
82
83. www.sti-innsbruck.at
Format – RIF
• A simplified example of RIF-Core rules combined with OWL to capture anatomical
knowledge that can be used to help label brain cortex structures in MRI images.
Rule Interchange Format (RIF)
Source: http://www.w3.org/2005/rules/wiki/Modeling_Brain_Anatomy
83
85. www.sti-innsbruck.at
SPARQL
Format – SPARQL
• A recursive acronym for SPARQL Protocol and RDF Query Language
• On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation
• Query language based on RDQL
• Used to retrieve and manipulate data stored in RDF format
• Uses SQL-like syntax
85
87. www.sti-innsbruck.at
PREFIX uni: <http://example.org/uni/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name
FROM <http://example.org/personal>
WHERE { ?s uni:name ?name. ?s rdf:type uni:lecturer }
• PREFIX
– Prefix mechanism for abbreviating URIs
• SELECT
– Identifies the variables to be returned in the query answer
– SELECT DISTINCT
– SELECT REDUCED
• FROM
– Name of the graph to be queried
– FROM NAMED
• WHERE
– Query pattern as a list of triple patterns
• LIMIT
• OFFSET
• ORDER BY
SPARQL Queries
87
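As a sketch of what a query engine does with the WHERE clause above, here is a minimal pure-Python triple-pattern matcher; the graph contents are an invented stand-in for http://example.org/personal, with CURIEs kept as plain strings.

```python
# Evaluate the pattern { ?s uni:name ?name . ?s rdf:type uni:lecturer }
# against a toy graph of (subject, predicate, object) triples.
graph = [
    ("ex:john", "uni:name", "John Smith"),
    ("ex:john", "rdf:type", "uni:lecturer"),
    ("ex:mary", "uni:name", "Mary Smith"),
    ("ex:mary", "rdf:type", "uni:student"),
]

def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):                 # a variable: bind it or check it
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:                          # a constant: must match exactly
            return None
    return binding

# Join the two triple patterns: solutions must agree on the shared ?s binding.
solutions = []
for t1 in graph:
    b1 = match(("?s", "uni:name", "?name"), t1, {})
    if b1 is None:
        continue
    for t2 in graph:
        b2 = match(("?s", "rdf:type", "uni:lecturer"), t2, b1)
        if b2 is not None:
            solutions.append(b2["?name"])     # SELECT ?name

print(solutions)  # ['John Smith']
```

Mary Smith is filtered out because her ?s binding cannot also satisfy the rdf:type pattern, which is exactly the join semantics of a SPARQL basic graph pattern.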
88. www.sti-innsbruck.at
SPARQL
Format – SPARQL
• Example SPARQL Query:
– “Return the full names of all people in the graph”
– Results:
fullName
=================
"John Smith"
"Mary Smith"
88
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?fullName
WHERE {?x vCard:FN ?fullName}
@prefix ex: <http://example.org/#> .
@prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
ex:john
vcard:FN "John Smith" ;
vcard:N [
vcard:Given "John" ;
vcard:Family "Smith" ] ;
ex:hasAge 32 ;
ex:marriedTo ex:mary .
ex:mary
vcard:FN "Mary Smith" ;
vcard:N [
vcard:Given "Mary" ;
vcard:Family "Smith" ] ;
ex:hasAge 29 .
89. www.sti-innsbruck.at
• PREFIX: based on namespaces
• DISTINCT: The DISTINCT solution modifier eliminates duplicate solutions.
Specifically, each solution that binds the same variables to the same RDF
terms as another solution is eliminated from the solution set.
• REDUCED: While the DISTINCT modifier ensures that duplicate solutions
are eliminated from the solution set, REDUCED simply permits them to be
eliminated. The cardinality of any set of variable bindings in a REDUCED
solution set is at least one and not more than the cardinality of the solution
set with no DISTINCT or REDUCED modifier.
• LIMIT: The LIMIT clause puts an upper bound on the number of solutions
returned. If the number of actual solutions is greater than the limit, then at
most the limit number of solutions will be returned.
SPARQL Query keywords
89
90. www.sti-innsbruck.at
• OFFSET: OFFSET causes the solutions generated to start after the
specified number of solutions. An OFFSET of zero has no effect.
• ORDER BY: The ORDER BY clause establishes the order of a solution
sequence.
• Following the ORDER BY clause is a sequence of order comparators,
composed of an expression and an optional order modifier (either ASC() or
DESC()). Each ordering comparator is either ascending (indicated by the
ASC() modifier or by no modifier) or descending (indicated by the DESC()
modifier).
SPARQL Query keywords
90
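How these solution modifiers compose over a solution sequence can be sketched in a few lines; the bindings below are invented, each dict standing for one solution row.

```python
# Sketch of SPARQL solution modifiers over a toy solution sequence.
solutions = [{"name": "Mary"}, {"name": "John"}, {"name": "Mary"}, {"name": "Anna"}]

# DISTINCT: eliminate duplicate solutions (keeping the first occurrence here).
distinct = []
for s in solutions:
    if s not in distinct:
        distinct.append(s)

# ORDER BY ASC(?name), then OFFSET 1 and LIMIT 2 on the ordered sequence.
ordered = sorted(distinct, key=lambda s: s["name"])
page = ordered[1:1 + 2]                    # skip 1 solution, return at most 2

print(page)  # [{'name': 'John'}, {'name': 'Mary'}]
```

REDUCED would permit, but not require, the deduplication step, which is why its result cardinality is only bounded, not fixed.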
91. www.sti-innsbruck.at
Microdata
Format – Microdata
• Uses new HTML5 attributes to embed semantic descriptions in web documents,
aiming to replace RDFa and Microformats.
• Introduces new tag attributes to include semantic data in HTML
• Unless you know that your target consumer only accepts RDFa, you are probably
best off going with microdata.
• While many RDFa-consuming services (such as the semantic search engine Sindice)
also accept microdata, microdata-consuming services are less likely to accept RDFa.
91
92. www.sti-innsbruck.at
• Search engines, web crawlers, and browsers can extract and process
Microdata from a web page and use it to provide a richer browsing
experience for users.
• Microdata uses a supporting vocabulary to describe an item and name-
value pairs to assign values to its properties
• Microdata helps technologies such as search engines and web crawlers
better understand what information is contained in a web page, providing
better search results.
Two important vocabularies:
http://www.data-vocabulary.org/
http://schema.org/
92
Microdata
93. www.sti-innsbruck.at
Microdata Global Attributes
• itemscope – Creates the Item and indicates that descendants of this element contain
information about it.
• itemtype – A valid URL of a vocabulary that describes the item and its property
context.
• itemid – Indicates a unique identifier of the item.
• itemprop – Indicates that its containing tag holds the value of the specified item
property. The property's name and value context are described by the item's
vocabulary. Property values are usually strings, but can also be URLs, given via
the a element and its href attribute, the img element and its src attribute, or
other elements that link to or embed external resources.
• itemref – Properties that are not descendants of the element with the itemscope
attribute can be associated with the item using this attribute, which provides a list of
ids of elements holding additional properties elsewhere in the document.
93
Microdata
94. www.sti-innsbruck.at
Microdata
94
<div itemscope itemtype="http://data-vocabulary.org/Person">
My name is <span itemprop="name">Bob Smith</span>
but people call me <span itemprop="nickname">Smithy</span>.
Here is my home page:
<a href="http://www.example.com" itemprop="url">www.example.com</a>
I live in Albuquerque, NM and work as an <span itemprop="title">engineer</span>
at <span itemprop="affiliation">ACME Corp</span>.
</div>
<div>
My name is Bob Smith but people call me Smithy. Here is my home page:
<a href="http://www.example.com">www.example.com</a>
I live in Albuquerque, NM and work as an engineer at ACME Corp.
</div>
Enriched with microdata:
Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=176035
Example: Microdata in use
95. www.sti-innsbruck.at
Microdata
Format – Microdata
• Example without microdata:
95
<section>
Hello, my name is John Doe, I am a graduate research assistant at the University
of Dreams. My friends call me Johnny. You can visit my homepage at
<a href="http://www.JohnnyD.com">www.JohnnyD.com</a>
. I live at 1234 Peach Drive Warner Robins, Georgia.
</section>
96. www.sti-innsbruck.at
Microdata
Format – Microdata
• Example using microdata:
96
<section itemscope itemtype="http://data-vocabulary.org/Person">
Hello, my name is
<span itemprop="name">John Doe</span>
, I am a
<span itemprop="title">graduate research assistant</span>
at the
<span itemprop="affiliation">University of Dreams</span>.
My friends call me
<span itemprop="nickname">Johnny</span>
. You can visit my homepage at
<a href="http://www.JohnnyD.com" itemprop="url">www.JohnnyD.com</a>.
<section itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
I live at
<span itemprop="street-address"> 1234 Peach Drive</span>
<span itemprop="locality">Warner Robins</span> ,
<span itemprop="region">Georgia</span>.
</section>
</section>
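A microdata consumer can pull such itemprop name/value pairs out of the markup with nothing more than the standard library HTML parser; the snippet below is a simplified sketch (it ignores itemref, nested itemscopes, and multi-valued properties) over an abbreviated version of the Bob Smith example.

```python
# Sketch: collect itemprop name/value pairs from microdata-annotated HTML.
from html.parser import HTMLParser

class ItempropCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current = None          # itemprop whose text content we are inside
        self.items = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:   # URL-valued properties use @href
            self.items[prop] = attrs["href"]
        else:
            self.current = prop              # text-valued: take the element text

    def handle_data(self, data):
        if self.current:
            self.items[self.current] = data.strip()
            self.current = None

html = ('<div itemscope itemtype="http://data-vocabulary.org/Person">'
        'My name is <span itemprop="name">Bob Smith</span> and I work as an '
        '<span itemprop="title">engineer</span> at '
        '<a href="http://www.example.com" itemprop="url">www.example.com</a>.'
        '</div>')

parser = ItempropCollector()
parser.feed(html)
print(parser.items)
# {'name': 'Bob Smith', 'title': 'engineer', 'url': 'http://www.example.com'}
```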
97. www.sti-innsbruck.at 97
Microformats
• An approach to add meaning to HTML elements and to make data
structures in HTML pages explicit.
• “Designed for humans first and machines second, microformats are a set of
simple, open data formats built upon existing and widely adopted standards.
Instead of throwing away what works today, microformats intend to solve
simpler problems first by adapting to current behaviours and usage patterns
(e.g. XHTML, blogging).” [6]
What are Microformats?
98. www.sti-innsbruck.at 98
Microformats
• Are highly correlated with semantic (X)HTML / “Real world semantics” / “Lowercase
Semantic Web” [9].
• Real world semantics (or the Lowercase Semantic Web) is based on three notions:
– Adding of simple semantics with microformats (small pieces)
– Adding semantics to the today’s Web instead of creating a new one (evolutionary not
revolutionary)
– Design for humans first and machines second (user centric design)
• A way to combine human with machine-readable information.
• Provide means to embed structured data in HTML pages.
• Build upon existing standards.
• Solve a single, specific problem (e.g. representation of geographical information,
calendaring information, etc.).
• Provide an “API” for your website.
• Build on existing (X)HTML and reuse existing elements.
• Work in current browsers.
• Follow the DRY principle (“Don’t Repeat Yourself”).
• Compatible with the idea of the Web as a single information space.
What are Microformats?
99. www.sti-innsbruck.at
Microformats
Format – Microformats
• Directly reuse existing (X)HTML attributes (mainly class) to embed semantic
information in web documents;
• Microformats were developed as a competing approach that reuses existing
HTML markup to include metadata in HTML documents
• As of 2010, microformats allow the encoding and extraction of events, contact
information, social relationships and so on
• Advantages:
– you can publish a single, human readable version of your information in HTML and then
make it machine readable with the addition of a few standard class names
– No need to learn another language
– Easy to add
• However: they overload the class attribute, which causes problems for some
parsers, as it makes semantic information and styling markup hard to differentiate
99
101. www.sti-innsbruck.at
Microformats
101
<div class="vcard"><img class="photo" src="www.example.com/bobsmith.jpg" />
<strong class="fn">Bob Smith</strong>
<span class="title">Senior editor</span> at <span class="org">
ACME Reviews</span><span class="adr">
<span class="street-address">200 Main St</span>
<span class="locality">Desertville</span>, <span class="region">AZ</span>
<span class="postal-code">12345</span>
</span></div>
<div>
<img src="www.example.com/bobsmith.jpg" />
<strong>Bob Smith</strong>
Senior editor at ACME Reviews
200 Main St
Desertville, AZ 12345
</div>
Enriched with microformat:
Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146897
Example: Microformats in use
102. www.sti-innsbruck.at 102
RDFa
• RDFa is a W3C recommendation.
• RDFa is a serialization syntax for embedding an RDF graph into XHTML.
• Goals: Bringing the Web of Documents and the Web of Data closer
together.
• Overcomes some of the drawbacks of microformats
• Both for human and machine consumption.
• Follows the DRY (“Don’t Repeat Yourself”) – principles.
• RDFa is domain-independent. In contrast to the domain-dedicated
microformats, RDFa can be used for custom data and multiple schemas.
• Benefits inherited from RDF: Independence, modularity, evolvability, and
reusability.
• Easy to transform RDFa into RDF data.
• Tools for RDFa publishing and consumption exist.
103. www.sti-innsbruck.at
RDFa
• Relevant XHTML attributes: @rel, @rev, @content, @href, and @src (examples and
explanations on the following slides)
• New RDFa-specific attributes: @about, @property, @resource, @datatype, and
@typeof (examples and explanations on the following slides)
103
Syntax: How to use RDFa in XHTML
104. www.sti-innsbruck.at
RDFa
Format – RDFa
• Is a W3C Recommendation that adds a set of attribute-level extensions to XHTML for
embedding rich metadata within Web documents.
• Adds a set of attribute-level extensions to XHTML enabling the embedding of RDF
triples;
• Integrates best with the W3C meta data stack built on top of RDF
• Benefits [Wikipedia RDFa, n.d.]:
– Publisher independence: each website can use its own standards;
– Data reuse: data is not duplicated - separate XML/HTML sections are not required for the
same content;
– Self containment: HTML and RDF are separated;
– Schema modularity: attributes are reusable;
– Evolvability: additional fields can be added and XML transforms can extract the semantics
of the data from an XHTML file;
– Web accessibility: more information is available to assistive technology.
• Disadvantage: the uptake of the technology is hampered by webmasters' lack of
familiarity with this technology stack
104
105. www.sti-innsbruck.at
RDFa
Format – RDFa
• RDFa Attributes:
– about and src – a URI or CURIE specifying the resource the metadata is about
– rel and rev – specifying a relationship or reverse-relationship with another resource
– href and resource – specifying the partner resource
– property – specifying a property for the content of an element
– content – optional attribute that overrides the content of the element when using the
property attribute
– datatype – optional attribute that specifies the datatype of text specified for use with the
property attribute
– typeof – optional attribute that specifies the RDF type(s) of the subject (the resource that the
metadata is about).
105
106. www.sti-innsbruck.at
• @rel: a whitespace separated list of CURIEs (Compact URIs), used for
expressing relationships between two resources ('predicates’);
• All content on this site is licensed under <a rel="license"
href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons
License </a>.
106
RDFa
Syntax: How to use RDFa in XHTML
107. www.sti-innsbruck.at
RDFa
107
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
My name is <span property="v:name">Bob Smith</span>,
but people call me <span property="v:nickname">Smithy</span>.
Here is my homepage:
<a href="http://www.example.com" rel="v:url">www.example.com</a>.
I live in Albuquerque, NM and work as an <span property="v:title">engineer</span>
at <span property="v:affiliation">ACME Corp</span>.
</div>
<div>
My name is Bob Smith but people call me Smithy. Here is my home page:
<a href="http://www.example.com">www.example.com</a>.
I live in Albuquerque, NM and work as an engineer at ACME Corp.
</div>
Enriched with RDFa:
Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146898
Example: RDFa in use
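The triples an RDFa processor extracts from such markup can be sketched with the standard library parser; this is a deliberate simplification (no CURIE expansion, no @about or @typeof handling), and the blank-node subject "_:bob" is an assumption, since the div carries no @about attribute.

```python
# Sketch: each @property yields a (subject, predicate, literal) triple and each
# @rel/@href pair a (subject, predicate, resource) triple.
from html.parser import HTMLParser

class RDFaSketch(HTMLParser):
    def __init__(self, subject):
        super().__init__()
        self.subject = subject
        self.pending = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "rel" in attrs and "href" in attrs:      # @rel links two resources
            self.triples.append((self.subject, attrs["rel"], attrs["href"]))
        elif "property" in attrs:                   # @property takes element text
            self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending:
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

html = ('<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">'
        'My name is <span property="v:name">Bob Smith</span>. My homepage: '
        '<a href="http://www.example.com" rel="v:url">www.example.com</a>.'
        '</div>')

p = RDFaSketch("_:bob")                             # assumed blank-node subject
p.feed(html)
print(p.triples)
# [('_:bob', 'v:name', 'Bob Smith'), ('_:bob', 'v:url', 'http://www.example.com')]
```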
108. www.sti-innsbruck.at
Microformats, RDFa, Microdata
• RDFa adds a set of attribute-level extensions to XHTML enabling the
embedding of RDF triples
• Microformats directly reuse existing (X)HTML attributes (mainly class) to embed
semantic information in web documents
• Microdata uses new HTML5 attributes to include semantic descriptions in web
documents, aiming to replace RDFa and Microformats.
• For the moment, we have three competing proposals that should be
supported in parallel until one of them can take a dominant role on the web.
108
109. www.sti-innsbruck.at
Microformats, RDFa, Microdata
• RDFa integrates best with the W3C meta data stack built on top of RDF.
• However, this also seems to hamper the uptake of this technology by many
webmasters that are not familiar with this technology stack.
• Microformats were developed as a competing approach directly using some
existing HTML tags to include meta data in HTML documents.
• Actually, they overload the class attribute, which causes problems for some
parsers, as it makes semantic information and styling markup hard to
differentiate.
• Microdata instead introduce new tag attributes to include semantic data into
HTML.
109
111. www.sti-innsbruck.at
• All three are intended to let machines, e.g. search engines, interpret content
• Major search engines support Microdata in combination with the
Schema.org vocabulary.
• If your chosen format is not supported by Google, Bing, and Yahoo!, it is of
little practical use.
111
Microformats, RDFa, Microdata
113. www.sti-innsbruck.at
Vocabularies:
• A (Semantic Web) vocabulary can be considered a special form of (usually
lightweight) ontology, or sometimes merely a collection of URIs with a (usually
informally) described meaning.
• Recap “what ontologies are”: “An ontology is a formal specification of a shared
conceptualization”
• http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
113
Semantic Channels: Vocabularies
115. www.sti-innsbruck.at
Semantic Channels: Vocabularies
115
• Vocabulary to describe content in social channels
• Enables...
– semantic links between online communities
– to fully describe the content and structure of community sites
– the integration of online community information
– browsing of connected Semantic Web items
– to add a social aspect to the Semantic Web
SIOC
116. www.sti-innsbruck.at 116
A SIOC document, unlike a traditional Web page, can be combined with
other SIOC and RDF documents to create a unified database of information.
Semantic Channels: Vocabularies
SIOC
117. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Vocabulary – DublinCore
• Early Dublin Core workshops popularized the idea of "core metadata" for simple and
generic resource descriptions.
• Metadata terms are a set of vocabulary terms which can be used to describe
resources for the purposes of discovery.
• The terms can be used to describe a full range of web resources: video, images, web
pages etc. and physical resources such as books and objects like artworks
• The Dublin Core standard includes two levels:
– Simple Dublin Core comprises 15 elements;
– Qualified Dublin Core includes three additional elements (Audience, Provenance, and
RightsHolder), as well as a group of element refinements, also called qualifiers, that refine
the semantics of the elements in ways that may be useful in resource discovery.
117
Source: http://dublincore.org (tutorials)
118. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Vocabulary – DublinCore
• Characteristics of DublinCore:
– All elements are optional
– All elements are repeatable
– Elements may be displayed in any order
– Extensible
– International in scope
• The fifteen core elements are usable with or without qualifiers
• Qualifiers make elements more specific:
– Element Refinements narrow meanings, never extend
– Encoding Schemes give context to element values
• If your software encounters an unfamiliar qualifier, look it up – or just
ignore it!
118
Source: http://dublincore.org (tutorials)
119. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Vocabulary – DublinCore
• Expressing Dublin Core in HTML/XHTML meta and link elements:
119
...
<head profile="http://dublincore.org/documents/dcq-html/">
<title>Expressing Dublin Core in HTML/XHTML meta and link elements</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.title" lang="en" content="Expressing Dublin Core
in HTML/XHTML meta and link elements" />
<meta name="DC.creator" content="Andy Powell, UKOLN, University of Bath" />
<meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF" content="2003-11-01" />
<meta name="DC.identifier" scheme="DCTERMS.URI"
content="http://dublincore.org/documents/dcq-html/" />
<link rel="DCTERMS.replaces" hreflang="en"
href="http://dublincore.org/documents/2000/08/15/dcq-html/" />
<meta name="DCTERMS.abstract" content="This document describes how
qualified Dublin Core metadata can be encoded
in HTML/XHTML <meta> elements" />
<meta name="DC.format" scheme="DCTERMS.IMT" content="text/html" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
</head>
...
120. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Vocabulary – FOAF
• Friend of a Friend
• Uses RDF to describe the relationship people have to other “things” around them
• FOAF permits intelligent agents to make sense of the thousands of connections
people have with each other, their jobs and the items important to their lives;
• Because the connections are so vast in number, human interpretation of the
information may not be the best way of analyzing them.
• FOAF is an example of how the Semantic Web attempts to make use of the
relationships within a social context.
120
121. www.sti-innsbruck.at
• Vocabulary to link people and information using the Web.
• Enables...
– to describe Agents, Documents, Groups, Organizations, Projects and their
relationships.
– a distributed social network that can be crawled.
121
Semantic Channels: Vocabularies
Friend of a Friend (FOAF)
122. www.sti-innsbruck.at
• Friend of a Friend is a project that aims at providing simple ways to describe
people and relations among them
• FOAF adopts RDF and RDFS
• Full specification available on: http://xmlns.com/foaf/spec/
• Tools based on FOAF:
– FOAF search (http://foaf.qdos.com/)
– FOAF builder (http://foafbuilder.qdos.com/)
– FOAF-a-matic (http://www.ldodds.com/foaf/foaf-a-matic)
– FOAF.vix (http://foaf-visualizer.org/)
Semantic Channels: Vocabularies
Friend of a Friend (FOAF)
122
124. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Vocabulary – FOAF
• Example
• Which says "there is a Person called Dan Brickley who has an email address whose
sha1 hash is..."
124
<foaf:Person>
<foaf:name>Dan Brickley</foaf:name>
<foaf:mbox_sha1sum>
748934f32135cfcf6f8c06e253c53442721e15e7
</foaf:mbox_sha1sum>
</foaf:Person>
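The foaf:mbox_sha1sum value is, per the FOAF specification, the SHA-1 hex digest of the full mailto: URI, used so a person can be identified by mailbox without publishing the address itself. The address below is invented and will not reproduce the digest on the slide.

```python
# Compute a foaf:mbox_sha1sum-style value for a (hypothetical) mailbox.
import hashlib

mbox = "mailto:someone@example.org"        # assumed address, for illustration
digest = hashlib.sha1(mbox.encode("ascii")).hexdigest()

print(digest)   # a 40-character lowercase hex string
```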
126. www.sti-innsbruck.at
• GoodRelations is a standardized vocabulary (also known as "schema",
"data dictionary", or "ontology") for product, price, store, and company data
that can ...
1. be embedded into existing static and dynamic Web pages and that
2. can be processed by other computers. This increases the visibility of your
products and services in the latest generation of search engines, recommender
systems, and other novel applications.
126
Semantic Channels: Vocabularies
GoodRelations
127. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Vocabulary – GoodRelations
• A lightweight ontology for annotating offerings and other aspects of e-commerce on
the Web.
• The only OWL DL ontology officially supported by both Google and Yahoo.
• It provides a standard vocabulary for expressing things like
– that a particular Web site describes an offer to sell cellphones of a certain make and model at
a certain price,
– that a pianohouse offers maintenance for pianos that weigh less than 150 kg,
– or that a car rental company leases out cars of a certain make and model from a particular
set of branches across the country.
• Also, most if not all commercial and functional details of e-commerce scenarios can
be expressed, e.g. eligible countries, payment and delivery options, quantity
discounts, opening hours, etc.
127
http://semanticweb.org/wiki/GoodRelations
128. www.sti-innsbruck.at
“Keep simple things simple and make
complex things possible”
• Cater for LOD and OWL DL worlds
• Academically sound
• Industry-strength engineering
• Practically relevant
128
http://purl.org/goodrelations
[Diagram: a spectrum from the Lightweight Web of Data (LOD, "RDF + a little bit")
to the Heavyweight Web of Data (OWL DL)]
Design Principles:
Semantic Channels: Vocabularies
GoodRelations
130. www.sti-innsbruck.at
• A microdata vocabulary.
• Schema.org is a collection of schemas that webmasters can use to mark up their
pages in ways recognized by major search providers.
• Search engines including Bing, Google, Yahoo! and Yandex rely on this
markup to improve the display of search results, making it easier for people
to find the right web pages.
130
Semantic Channels: Vocabularies
Schema.org
132. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Vocabulary – schema.org
• Example*:
– Imagine you have a page about the movie Avatar—a page with a link to a movie trailer,
information about the director, and so on. Your HTML code might look something like this:
132
<div>
<h1>Avatar</h1>
<span>Director: James Cameron (born August 16, 1954)</span>
<span>Science fiction</span>
<a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>
* http://schema.org/docs/gs.html
133. www.sti-innsbruck.at
• In-page structured data for search
• Do not ask “so, how do we describe hotels?”, but “how can we improve
markup on existing pages that describe hotels?” (or Cars, Software, ...)
• Simplify publisher/webmaster experience
• Record agreements between search engines
• Central use case: augmented search results
133
Semantic Channels: Vocabularies
Schema.org
135. www.sti-innsbruck.at
Semantic Channels: Vocabularies
Implementations – Rich Snippets
• Three steps to rich snippets
1. Pick a markup format.
Google suggests using microdata, but any of the three formats below are acceptable.
• Microdata (recommended)
• Microformats
• RDFa
2. Mark up your content.
Google supports rich snippets for these content types:
• Reviews
• People
• Products
• Businesses and organizations
• Recipes
• Events
• Music
• Google also recognizes markup for video content and uses it to improve its search results.
135
137. www.sti-innsbruck.at
Semantic Channels: Implementations
Implementations – OWLIM
• OWLIM is a high-performance OWL repository
• Storage and Inference Layer (SAIL) for Sesame RDF database
• OWLIM performs OWL DLP reasoning
• It uses the TRREE (Triple Reasoning and Rule Entailment Engine) for forward-chaining
and "total materialization"
• In-memory reasoning and query evaluation
• OWLIM provides reliable persistence based on RDF N-Triples
• OWLIM can manage millions of statements on desktop hardware
• Extremely fast upload and query evaluation even for huge ontologies and knowledge
bases
• OWLIM is developed by Ontotext
137
138. www.sti-innsbruck.at
Semantic Channels: Implementations
Implementations – OWLIM
• OWLIM is available as a Storage and Inference Layer (SAIL) for Sesame RDF.
• Benefits:
– Sesame’s infrastructure, documentation, user community, etc.
– Support for multiple query languages (RQL, RDQL, SeRQL)
– Support for import and export formats (RDF/XML, N-Triples, N3)
138
139. www.sti-innsbruck.at
Semantic Channels: Implementations
Implementations – Jena
• Apache Jena™ is a Java framework for building Semantic Web applications.
• Jena provides a collection of tools and Java libraries to help you to develop semantic
web and linked-data apps, tools and servers.
• The Jena Framework includes:
– an API for reading, processing and writing RDF data in XML, N-triples and Turtle formats;
– an ontology API for handling OWL and RDFS ontologies;
– a rule-based inference engine for reasoning with RDF and OWL data sources;
– stores to allow large numbers of RDF triples to be efficiently stored on disk;
– a query engine compliant with the latest SPARQL specification
– servers to allow RDF data to be published to other applications using a variety of protocols,
including SPARQL
139
141. www.sti-innsbruck.at
Outline
1. Overview
2. Semantic Analysis (= Natural Language Processing)
3. Semantics as a channel (= Semantic vocabularies)
4. Semantic Content Modelling (=Ontologies)
5. Semantic Match Making (= Automatic distribution)
6. Summary
141
142. www.sti-innsbruck.at
Social Media: upcoming challenges
• Social Networks are often
compared to “Walled Gardens” [1]
• It becomes tremendously hard to
share the right content with the
right audience.
• Consumption is limited to a few active
channels, but the image of a
brand/company/individual is shaped
by the community – and the
community does not happen in a
single channel.
142
[1] http://www.ibiblio.org/hhalpin/homepage/presentations/digitalidentity2009/#%283%29
143. www.sti-innsbruck.at
Semantics as End User Enabler
• Solve these obstacles by mechanizing important aspects of these tasks,
thereby offering a scalable, cost-effective online dissemination solution.
• Introduce a layer on top of the various internet-based communication
channels that is domain-specific rather than channel-specific.
• Information model
– defines the types of information items in the domain
• Channel model
– describes the various channels, their interaction patterns, and their target groups
• Weaver
– maps information items to channels
143
145. www.sti-innsbruck.at
Separating Symbol and Knowledge Level
Analogy 1 (for senior people in the audience)
“I am about to propose the existence of something called the knowledge level,
within which knowledge is to be defined.” [Newell, 1982]
• Knowledge is intimately linked with
rationality. Systems of which
rationality can be posited can be
said to have knowledge.
• At the knowledge level, knowledge
is described functionally in terms
of goals and rationality.
• At the symbol level, knowledge is described operationally in terms of
achieving the goals through a certain sequence of activities.
• Obviously, there are various ways to encode knowledge at the symbol level.
145
146. www.sti-innsbruck.at
Separating Content and Rendering
• Analogy 2 (for the juniors in the audience):
– Content may be presented differently in different contexts.
– Therefore, it should be modeled independently of a specific representation
– Stylesheets connect content with a specific presentation
• Content:
146
<html><head>
<link rel="stylesheet" type="text/css" href="/tryit.css" /></head>
<body>
<div itemscope itemtype="http://schema.org/Person">
<img src="http://www.fensel.com/dieter.jpg" itemprop="image" />
<span id="property">Title: <span itemprop="jobTitle">Prof. Dr.</span></span>
<span id="property">Name: <span itemprop="name">Dieter Fensel</span></span>
<span id="property">Nationality: <span itemprop="nationality">German</span></span>
<span id="property">Birthdate: <span itemprop="birthDate">October 1960</span></span>
<span id="property">Address: <span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">Technikerstr. 21a</span>,
<span itemprop="postalCode">6020</span>
<span itemprop="addressLocality">Innsbruck</span>,
<span itemprop="addressRegion">Tirol</span>
</span></span>
<span id="property">Tel.: <span itemprop="telephone">+43 512 507 6485</span></span>
<span id="property">E-Mail: <a href="mailto:dieter.fensel@sti2.at" itemprop="email">dieter.fensel@sti2.at</a></span>
<span id="property">WWW: <a href="http://www.fensel.com/" itemprop="url">http://www.fensel.com/</a></span>
</div></body></html>
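Markup like the example above can also be consumed programmatically. A minimal sketch using only the Python standard library (simplified assumptions: nested itemscope boundaries are ignored and tags are balanced; a full microdata parser would walk the DOM tree):

```python
# Extract schema.org itemprop values from microdata-annotated HTML.
from html.parser import HTMLParser

class MicrodataExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.props = {}   # itemprop name -> collected text
        self._stack = []  # itemprop name per open tag, or None

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Credit the text to every itemprop currently open.
            for prop in self._stack:
                if prop:
                    self.props[prop] = (self.props.get(prop, "") + " " + text).strip()

markup = """<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Dieter Fensel</span>
  <span itemprop="jobTitle">Prof. Dr.</span>
</div>"""
extractor = MicrodataExtractor()
extractor.feed(markup)
# extractor.props now maps "name" -> "Dieter Fensel", "jobTitle" -> "Prof. Dr."
```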
154. www.sti-innsbruck.at
Channel model
The channel model describes the different channels, their interaction patterns
and the target groups.
• Channels can be
– online or offline
– for broadcasting or sharing
– for group communication or collaboration
– of static or dynamic information
• The number of different channels is growing constantly
• The target groups are very different from channel to channel
154
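The channel dimensions listed above can be captured in a simple data model; the field names and sample channels below are illustrative assumptions, not part of any standard:

```python
# A minimal encoding of the channel model's dimensions.
from dataclasses import dataclass, field

@dataclass
class Channel:
    name: str
    online: bool                  # online vs. offline
    mode: str                     # "broadcast", "sharing", "group", "collaboration"
    dynamic: bool                 # dynamic vs. static information
    target_groups: list = field(default_factory=list)

channels = [
    Channel("telephone", online=False, mode="group", dynamic=True,
            target_groups=["existing customers"]),
    Channel("hotel website", online=True, mode="broadcast", dynamic=False,
            target_groups=["prospective guests"]),
    Channel("social network", online=True, mode="sharing", dynamic=True,
            target_groups=["community"]),
]
# Target groups differ from channel to channel, so selection is per channel:
online_broadcast = [c.name for c in channels if c.online and c.mode == "broadcast"]
```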
155. www.sti-innsbruck.at
Offline and Online Channels
• Walk-in customer
• Telephone
• Email
• Fax
• Hotel website
• Review sites
• Booking sites
• Social network sites
• Blogs
• Fora & destination sites
• Chat
• Video & photo sharing
155
157. www.sti-innsbruck.at
Weaver
The weaver is responsible for mapping information items to the appropriate
channels.
• Separation of content and communication channels
• Reuse of the same content for various dissemination means
• Explicit alignment of information items and channels
157
158. www.sti-innsbruck.at
Weaver component
Elements 1 to 3 are about the content. They define the actual categories, the agent
responsible for them, and the process of interacting with this agent.
1. Information item
It defines an information category that should be disseminated through
various channels.
2. Editor
The editor defines the agent that is responsible for providing the content of an
information item.
3. Interaction protocol
This defines the interaction protocol governing how an editor collects the
content.
158
The details of the Weaver
159. www.sti-innsbruck.at
Weaver
159
Elements 4 to 9 are about the dissemination of these items.
4. Information type
An instance of a concept, a set of instances of a concept (i.e., an
extensional definition of the concept), or a concept description (i.e., an
intensional definition of a concept).
5. Processing rule
These rules govern how the content is processed to fit a channel. Often
only a subset of the overall information item fits a certain channel.
6. Channel
The media that is used to disseminate the information.
The details of the Weaver
160. www.sti-innsbruck.at
Weaver
160
7. Scheduling information
Information on how often and in which intervals the dissemination will be
performed, including temporal constraints over multi-channel
disseminations.
8. Executor
It determines which agent or process is performing the update of a
channel. Such an agent can be a human or a software solution.
9. Executor interaction protocol
It defines the interaction protocol governing how an executor receives its
content.
The details of the Weaver
161. www.sti-innsbruck.at
• The organization STI International has regular general assemblies
• A general assembly is described by the president
• The description refers to the general assembly as an event
• The description of a general assembly must refer to a future event
• The information is published on the content management system Drupal
161
Weaver
Example: Weaver Storyline
162. www.sti-innsbruck.at
• Item: sti2:general_assembly
• Editor: President
• Type: Concept Description
• Channel: homepage/event/past-event
• Schedule Constraint: date > current date
• Executor: Drupal
• Executor Interaction Protocol: none
162
This process forms a part of the weaver component:
Weaver
Example: Weaver Storyline
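The storyline can be encoded as data plus a schedule-constraint check; all names and structure below are illustrative assumptions, not the actual system:

```python
# Encoding the weaver storyline: a mapping entry plus its schedule constraint.
from dataclasses import dataclass
from datetime import date

@dataclass
class WeaverEntry:
    item: str
    editor: str
    info_type: str
    channel: str
    executor: str

def schedule_ok(event_date: date, today: date) -> bool:
    # Schedule constraint from the example: date > current date
    return event_date > today

entry = WeaverEntry(
    item="sti2:general_assembly",
    editor="President",
    info_type="Concept Description",
    channel="homepage/event/past-event",
    executor="Drupal",
)
# Only future general assemblies satisfy the constraint:
ok = schedule_ok(date(2031, 6, 1), today=date(2030, 1, 1))   # True
```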
163. www.sti-innsbruck.at
Outline
1. Overview
2. Semantic Analysis (= Natural Language Processing)
3. Semantics as a channel (= Semantic vocabularies)
4. Semantic Content Modelling (=Ontologies)
5. Semantic Match Making (= Automatic distribution)
6. Summary
163
166. www.sti-innsbruck.at
Semantic Match Making
• Scientific aspects:
– Channels and content are matched automatically
– Texts are shortened or expanded to a length that fits the blog post/tweet
– Pictures/Videos/Slides are attached where needed
166
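The “texts are shortened to fit the channel” step can be sketched as trimming content to a channel-specific length limit at a word boundary. The limits below are assumed values, not part of any specification:

```python
# Trim text to a channel's length limit, cutting at the last full word.
CHANNEL_LIMITS = {"tweet": 140, "blogpost": 5000}   # assumed limits

def fit_text(text: str, channel: str) -> str:
    limit = CHANNEL_LIMITS[channel]
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis, then cut at a word boundary.
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut + "…"

long_text = ("word " * 100).strip()       # 499 characters
short = fit_text(long_text, "tweet")      # fits in 140 characters
```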
167. www.sti-innsbruck.at
Related Systems
• A Semantic Publish/Subscribe System for Selective Dissemination of
the RSS Documents [1]
• Semantic Email Addressing: Sending Email to People, Not Strings [2]
167
[1] J. Ma, G. Xu, J. L. Wang, and T. Huang: A semantic publish/subscribe system for selective
dissemination of the RSS documents. In Proceedings of the Fifth International Conference on
Grid and Cooperative Computing (GCC'06), pp. 432-439, 2006.
[2] Michael Kassoff, Charles Petrie, Lee-Ming Zen and Michael Genesereth. Semantic Email
Addressing: The Semantic Web Killer App? IEEE Internet Computing 13(1): 48-55 (2009)
168. www.sti-innsbruck.at
A Semantic Publish/Subscribe System for
Selective Dissemination of the RSS Documents
How do we relate to channels and content:
• Our approach, in contrast, matches information items (rather than events)
to channels (rather than users).
• Instead of graph matching, we use predefined weavers for channel
selection.
168
The presented system:
• The system is based on the Semantic Web RDF and OWL standards and RDF
Site Summaries (RSS) in order to introduce an efficient publish/subscribe
mechanism that includes an event matching algorithm based on graph
matching.
Source: J. Ma, G. Xu, J. L. Wang, and T. Huang: A semantic publish/subscribe system for
selective dissemination of the RSS documents. In Proceedings of the Fifth International
Conference on Grid and Cooperative Computing (GCC'06), pp. 432-439, 2006.
169. www.sti-innsbruck.at 169
Source: J. Ma, G. Xu, J. L. Wang, and T. Huang: A semantic publish/subscribe system for
selective dissemination of the RSS documents. In Proceedings of the Fifth International
Conference on Grid and Cooperative Computing (GCC'06), pp. 432-439, 2006.
A Semantic Publish/Subscribe System for
Selective Dissemination of the RSS Documents
170. www.sti-innsbruck.at
Semantic Email Addressing: Sending Email to
People, Not Strings
• SEA: Semantic E-Mail Addressing
• Main idea: send emails to concepts that dynamically change over time
• People describe their interests via FOAF.
• Senders send emails to interest groups.
170
Source: Michael Kassoff, Charles Petrie, Lee-Ming Zen and Michael Genesereth.
Semantic Email Addressing: The Semantic Web Killer App? IEEE Internet
Computing 13(1): 48-55 (2009)
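The SEA idea can be sketched as resolving an interest-based address against FOAF-like profiles at send time; the profile format below is an illustrative assumption, not actual FOAF:

```python
# Semantic email addressing sketch: an address is a query over interest
# profiles, resolved to concrete recipients only when a mail is sent.

profiles = {
    "alice@example.org": {"interests": {"semantic web", "ontologies"}},
    "bob@example.org": {"interests": {"databases"}},
    "carol@example.org": {"interests": {"semantic web"}},
}

def resolve(interest: str) -> list:
    """Return the current recipients whose profile matches the interest.
    Re-running this later may yield different people as profiles change."""
    return sorted(addr for addr, p in profiles.items()
                  if interest in p["interests"])

recipients = resolve("semantic web")
# recipients == ["alice@example.org", "carol@example.org"]
```

The key point is that the recipient set is dynamic: the “concept” stays fixed while its extension changes over time.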
171. www.sti-innsbruck.at 171
Semantic Email Addressing: Sending Email to
People, Not Strings
Source: Michael Kassoff, Charles Petrie, Lee-Ming Zen and Michael Genesereth.
Semantic Email Addressing: The Semantic Web Killer App? IEEE Internet
Computing 13(1): 48-55 (2009)
172. www.sti-innsbruck.at
Outline
1. Overview
2. Semantic Analysis (= Natural Language Processing)
3. Semantics as a channel (= Semantic vocabularies)
4. Semantic Content Modelling (=Ontologies)
5. Semantic Match Making (= Automatic distribution)
6. Summary
172
174. www.sti-innsbruck.at
Semantic Analysis
174
• Enables computers to interpret natural language
• A lot of different tasks (Topic detection, Named entity recognition, Sentiment
detection, Opinion mining, etc.)
• Key task in analyzing content and automatically making decisions about it.
175. www.sti-innsbruck.at
Semantic Channels
175
• Distinguish between
– languages (such as SPARQL, RDF, OWL, RDFa, Microformats, etc.)
– vocabularies (such as GoodRelations, Schema.org, FOAF, SIOC, etc.)
– Implementations (such as OWLIM,…)
• Dissemination through the right channels increases visibility (see Google’s rich
snippets)
176. www.sti-innsbruck.at
Semantic Content Modelling
176
• Content modelling has to be tailored to individual domains.
• Content modelling only has to be done once per domain.
• Content is therefore reusable.
• A weaver brings channels and content together
177. www.sti-innsbruck.at
Semantic Match Making
177
• Channels and content are matched automatically
• Texts are shortened or expanded to a length that fits the blog post/tweet
• Pictures/Videos/Slides are attached where needed
[Diagram: a matcher uses branch-specific concepts to distribute content across channels (Web/Blog, Social Web, Web 3.0/Mobile/Other) and to collect feedback and statistics.]