1. The document discusses challenges in semantic annotation and summarization, including helping users understand semantic annotations, extracting annotations from contexts, and ensuring quality of service for semantics-enabled services.
2. It proposes using one-word summaries generated from a knowledge base to help users understand the intended meaning of annotations.
3. Evaluation results show the meaning summarization approach achieves up to 63% precision in determining if two words have the same meaning in a given context.
The document discusses using NoSQL techniques like MapReduce to perform sentiment analysis on blog data by (see the sketch after this list):
1) Accessing blog documents in parallel using MapReduce;
2) Parsing the documents into word lists using natural language processing techniques in MapReduce;
3) Creating histograms of word frequencies to construct feature vectors representing each document.
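A minimal sketch of these three steps in plain Python, standing in for a real MapReduce runtime; the blog texts, tokenizer, and vocabulary are illustrative only:

```python
from collections import Counter
import re

blogs = {
    "post-1": "I love this phone, the battery life is great",
    "post-2": "Terrible service, I hate waiting",
}

def map_parse(doc):
    """Map step: parse a document into a lowercase word list."""
    doc_id, text = doc
    return doc_id, re.findall(r"[a-z']+", text.lower())

def reduce_histogram(doc_id, words, vocabulary):
    """Reduce step: turn word counts into a fixed-length feature vector."""
    counts = Counter(words)
    return doc_id, [counts.get(term, 0) for term in vocabulary]

parsed = dict(map(map_parse, blogs.items()))   # parallel in real MapReduce
vocabulary = sorted({w for ws in parsed.values() for w in ws})
vectors = dict(reduce_histogram(d, ws, vocabulary) for d, ws in parsed.items())
print(vectors["post-1"])
```

The resulting vectors are what a downstream sentiment classifier would consume.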
This document is a resume for Shubhi Jain. It summarizes her contact information, education history including a Master's degree in Computer Science from University at Buffalo and a Bachelor's degree in Electronics and Communication from Rajiv Gandhi Technical University. It also outlines her professional experience as a Software Engineer III at Walmart Labs and previously as a Software Developer at Tata Consultancy Services. It lists some relevant academic projects as well.
Cross Lingual Information Retrieval Using Search Engine and Data Mining (IDES Editor)
With the explosive growth of international users, distributed information, and the number of linguistic resources accessible throughout the World Wide Web, information retrieval has become crucial for users to find, retrieve, and understand relevant information in any language and form. Cross-Language Information Retrieval (CLIR) is a subfield of Information Retrieval in which a query is posed in one language and document collections in one or many languages are searched; in its narrower sense, the term refers to retrieval over a multilingual document collection. In the present research, we focus on query translation, disambiguation of multiple translation candidates, and query expansion, in various combinations, in order to improve the effectiveness of retrieval. Extracting, selecting, and adding terms that emphasize query concepts are performed using expansion techniques such as pseudo-relevance feedback, domain-based feedback, and thesaurus-based expansion. This paper presents a method for information retrieval for a query expressed in a native language, using insights from data mining and intelligent search to formulate the query and parse the results.
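As an illustration of one of the expansion techniques named above, here is a hedged Python sketch of pseudo-relevance feedback, under the simple assumption that the top-ranked documents are relevant; the toy corpus and parameters are placeholders:

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, ranked_docs, k=3, n_terms=2):
    """Expand query_terms with the n_terms most frequent new terms
    found in the top-k documents of the initial ranking."""
    pool = Counter()
    for doc in ranked_docs[:k]:
        pool.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [t for t, _ in pool.most_common(n_terms)]

docs = ["jaguar speed on land", "jaguar big cat habitat", "jaguar cars uk"]
print(pseudo_relevance_feedback({"jaguar"}, docs))
```

In a CLIR setting the same loop would run after query translation, so the added terms come from documents in the target language.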
The document summarizes trends in semantic technology and semantic search. It discusses how information technology is increasingly digital and communication between humans and computers is common. It notes that semantics is a missing piece to fully integrate digital content, business processes, devices, and the internet. It then provides an overview of semantic technologies like ontologies, semantic search, and linked data.
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at... (HSA Foundation)
HSA is a new computing platform architecture being standardized by the HSA Foundation, whose founding members are AMD, ARM, Imagination, TI, MediaTek, Samsung, and Qualcomm. HSA is intended to make heterogeneous programming widespread by making purpose-built architectures as easy to program as modern CPUs. We start by doing this with the GPU, the most widely deployed companion processor to the CPU and one that especially complements the CPU in low-power and performance workloads. This requires some hardware architecture changes that we have been working on for some time (in particular those that enable user-mode scheduling, a unified address space, unified shared memory, compute context switching, etc.) and which we have encapsulated in the spec currently under review by the HSA Foundation.
In short, HSA codifies the hardware architecture changes needed to let mainstream programmers develop heterogeneous applications with the same facility as CPU-only applications, by seamlessly integrating the sequential programming capability of the CPU with the parallel compute capability of the GPU. We describe the software stacks needed for HSA, the benefits that accrue to both developers and end users, and our vision of how HSA will help unify the ecosystems of the smartphone and tablet platforms as well as bring them closer to that of the traditional PC market. We provide analysis of several examples which arise in applications and present data to validate the performance-per-watt benefit of HSA.
Floe is a project of the Inclusive Design Research Centre at OCAD University, funded by a grant from The William and Flora Hewlett Foundation. Floe is creating tools and techniques for enabling inclusive, flexible learning for open education.
The document proposes a framework for performing joint inference over multiple tasks like attribute prediction, link prediction, and entity resolution on noisy information networks. The key aspects are (see the sketch after this list):
1) A declarative language like Datalog is used to specify the domain, features, predictions, and iteration over predictions. This allows users to declaratively specify the prediction tasks and how they are combined.
2) A unifying framework is presented where the domain, features, predictions, and applying predictions are specified generically to support the different tasks.
3) An implementation of the framework uses the declarative language to efficiently handle complex prediction functions and arbitrary interleaving of the prediction tasks.
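A rough Python analogue of this idea; the predicates, rules, and fixed-point loop below are hypothetical stand-ins for the paper's declarative language, not its actual syntax:

```python
def attribute_rule(facts):
    # "colleagues usually share an affiliation" (attribute prediction)
    links = [(a, b) for (p, a, b) in facts if p == "link"]
    affils = {a: org for (p, a, org) in facts if p == "affil"}
    return {("affil", b, affils[a]) for (a, b) in links if a in affils}

def link_rule(facts):
    # "people sharing an affiliation may be linked" (link prediction)
    affils = [(a, org) for (p, a, org) in facts if p == "affil"]
    return {("link", a, b) for (a, o1) in affils for (b, o2) in affils
            if o1 == o2 and a != b}

tasks = [("attribute", attribute_rule), ("link", link_rule)]
facts = {("link", "ann", "bob"), ("affil", "ann", "acme")}

changed = True
while changed:                      # iterate predictions to a fixed point
    changed = False
    for _name, rule in tasks:       # arbitrary interleaving of the tasks
        new = rule(facts) - facts
        if new:
            facts |= new
            changed = True
print(sorted(facts))
```

The point of the declarative style is that new tasks slot in as additional rules; the iteration machinery stays unchanged.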
This document is an excerpt from a catalog for women's fastpitch softball cleats and accessories from the company RINGOR. It summarizes that RINGOR is solely focused on fastpitch softball and provides details on several of their metal and non-metal cleat models as well as other gear like bats, bags, clothing, and accessories. The cleat models highlighted include the Diamond Gem spike and spike PTT, Diamond Star spike and spike PTT, and Diamond Star spike mid and spike mid PTT.
The document outlines the dissemination and exploitation plans for the INSEMTIVES project. It describes the project website, blog, publications, events, workshops, and collaboration activities to disseminate results to target communities. It also discusses exploitation of results through spin-off companies, releasing open source technology, and partner companies applying methods in document management systems, content platforms, and search engines to involve users in semantic annotation.
WP8 Okenterprise Use Case - Applying Insemtives to Corporate Portals (INSEMTIVES project)
This document discusses adding semantic annotations to an existing corporate portal to improve knowledge management. It describes challenges with current information overload and proposes using semantics to better organize content. A mockup will be developed to demonstrate personalized recommendations and contextual links. Requirements will be refined by involving stakeholders and aligning with the technical platform. The goal is to integrate these semantic annotation capabilities directly into the portal to benefit all employees.
The document discusses using crowdsourcing to annotate dynamic web content on the seekda web services portal. It describes setting up participatory design workshops with users to prototype and design an online dashboard for crowdsourcing annotations. The first workshop cycles involved users voting on features and providing feedback over 6 weeks. Results showed many user suggestions were implemented, improving the portal. Later challenges involved using Mechanical Turk for initial annotations, a mashups challenge, and a long-term points-based competition to motivate long-term user contributions.
The document summarizes Semantic Games, a project that uses online games to crowdsource semantic tasks like ontology matching and image annotation. It describes several games developed as part of the project, including SpotTheLink (ontology matching between DBPedia and Proton concepts), SEAFISH (image annotation using DBPedia images), and Tubelink (annotating YouTube videos with Linked Data concepts). It discusses lessons learned from the games as well as a gaming API developed to support similar semantic games. Evaluation of the games found high consensus rates and valid alignments from SpotTheLink and positive user feedback for SEAFISH.
This document discusses combining human and computational intelligence to support the generation and consumption of semantic annotations. It addresses four main problems: 1) helping users understand semantic annotations, 2) extracting annotations from user resources, 3) quality of service for semantics-enabled services, and 4) semi-automatic annotation of existing annotations. The document proposes solutions such as meaning summarization algorithms, developing a "semantic folksonomy" evaluation platform, and studying the effects of semantics on social tagging systems. It discusses developing and evaluating a knowledge base enrichment algorithm and building a gold standard dataset for evaluation.
The document discusses creating and using ontologies. It defines an ontology as a representation of things in a domain, their characteristics and relationships. Ontologies are used to share a common understanding of a domain among people and machines. They make domain assumptions and knowledge explicit and separate domain knowledge from operational knowledge. The document provides an overview of the ontology development process including requirements analysis, conceptualization, and implementation. It discusses finding existing ontologies and provides examples of competency questions for requirements analysis.
The Semantic Travel Concierge - a vision of the potential of semantic technologies for the travel industry. Deborah L. McGuinness Keynote at the Opentravel Alliance Advisory Forum - Miami, Fla, April 11, 2012.
Integrating digital traces into a semantic enriched data (Dhaval Thakker)
The document discusses integrating digital traces from social media into a semantic-enriched data cloud for informal learning. It outlines a processing pipeline that collects digital traces, semantically augments them using ontologies, and allows browsing and interaction through a semantic query service. An exploratory study on job interviews found that authentic examples from digital traces were useful learning stimuli but could be mistaken as norms without context. Semantic technologies provide opportunities to organize digital traces for informal learning but further work is needed to fully realize this potential.
Modern learning models require linking experiences in training environments with experiences in the real world. However, data about real-world experiences is notoriously hard to collect. Social spaces bring new opportunities to tackle this challenge, supplying digital traces where people talk about their real-world experiences. These traces can become a valuable resource, especially in ill-defined domains that embed multiple interpretations. The paper presents a unique approach to aggregate content from social spaces into a semantic-enriched data browser to facilitate informal learning in ill-defined domains. This work pioneers a new way to exploit digital traces about real-world experiences as authentic examples in informal learning contexts. An exploratory study is used to determine both strengths and areas needing attention. The results suggest that semantics can be successfully used in social spaces for informal learning – especially when combined with carefully designed nudges.
Content is King - ECM in SharePoint 2010 - SharePoint Saturday Denver (Chris McNulty)
The document discusses enterprise content management (ECM) features in SharePoint 2010. It provides an overview of managed metadata and how it can be used to classify and organize content. Key ECM capabilities covered include versioning and approvals, content routing with drop-off libraries and content organizers, records management both in-place and with records centers, document holds, disposition and information lifecycle management. The presenter emphasizes how these tools help users effectively aggregate, organize and manage large volumes of content and documents.
This document discusses semantic web technologies and how they can be used to integrate information from multiple sources. It describes how RDF can be used to represent data and metadata from different applications in a common format. By merging RDF representations from different sources, it becomes possible to query across the sources as if the data came from a single source and discover new relationships not evident from any one source alone. Ontologies and semantic models are used to formally represent domain knowledge to enable reasoning over the integrated data.
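A minimal sketch of the merging idea using Python's rdflib (an assumed library choice; the document names no implementation). Two sources each contribute part of the picture, and one SPARQL query then spans both:

```python
from rdflib import Graph

hr_data = """
@prefix ex: <http://example.org/> .
ex:alice ex:worksOn ex:projectX .
"""
crm_data = """
@prefix ex: <http://example.org/> .
ex:projectX ex:customer ex:acme .
"""

g = Graph()
g.parse(data=hr_data, format="turtle")
g.parse(data=crm_data, format="turtle")  # merging is just loading into one graph

# A relationship not evident from either source alone:
q = """
PREFIX ex: <http://example.org/>
SELECT ?person ?customer WHERE {
  ?person ex:worksOn ?proj .
  ?proj ex:customer ?customer .
}"""
for row in g.query(q):
    print(row.person, "indirectly serves", row.customer)
```

Because both sources share URIs for the same resources, the join falls out of the merge with no schema mapping step.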
Grant Ingersoll discussed using open source projects like Lucene for building an open search lab (OSL). Lucene is part of a large ecosystem of open source projects including Solr, Hadoop, Mahout, and others. It provides functionality for indexing, searching, and analyzing large amounts of data. The OSL could use a service-oriented architecture with Lucene and related projects to build a distributed, scalable system for content acquisition, storage, search and machine learning. Lucene is well-suited for information retrieval and data structure research.
This document describes a tutorial on using semantic metadata with Grid services. The tutorial will cover:
1. Setting up a Globus container and deploying various semantic services and an operation provider for sticky notes.
2. Attaching RDF semantic bindings to sticky notes to represent their metadata.
3. Querying the semantic bindings of sticky notes using SPARQL or other query languages. The queries can exploit relationships defined in an ontology.
The hands-on exercises will guide participants in building a semantically-aware Grid service by completing the various setup and configuration steps. Participants will learn how to attach, query, and infer over semantic metadata for Grid resources.
This document describes a tutorial on using semantic metadata with Grid services. The tutorial will cover:
1. Setting up a Globus container and deploying various semantic services and operation providers to enable semantic capabilities for Grid resources like sticky notes.
2. Attaching RDF metadata to sticky note resources using semantic bindings.
3. Querying the semantic bindings of resources using SPARQL or other query languages and making inferences over the metadata by using an ontology.
The hands-on exercises will guide participants in deploying the necessary software components, adding semantic description and querying capabilities to a sticky note service, and executing queries that leverage an ontology to infer additional information from the semantic metadata.
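For illustration only, a compressed sketch of steps 2-3 in Python with rdflib and owlrl (assumptions on my part; the tutorial itself targets Globus/Grid tooling), showing RDF metadata attached to a sticky-note resource, RDFS inference over a tiny ontology, and a SPARQL query that exploits the inferred triple:

```python
from rdflib import Graph
import owlrl

g = Graph()
g.parse(data="""
@prefix ex:   <http://example.org/notes#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:UrgentNote rdfs:subClassOf ex:Note .      # tiny ontology
ex:note42 a ex:UrgentNote ;                  # semantic binding for a note
          ex:author "alice" .
""", format="turtle")

owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)  # infers: note42 a ex:Note

q = """
PREFIX ex: <http://example.org/notes#>
SELECT ?note WHERE { ?note a ex:Note . }
"""
for row in g.query(q):                       # matches via the subclass axiom
    print(row.note)
```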
Crowd-Sourced Intelligence Built into Search over Hadoop (DataWorks Summit)
Search is increasingly being used to gather intelligence on multi-structured data, leveraging distributed platforms such as Hadoop in the background. This session will provide details on how search engines can be abused to index not text, but mathematically derived tokens, to build models that implement reflected intelligence. The session will describe how to integrate Apache Solr/Lucene with Hadoop. Then we will show how crowd-sourced search behavior can be looped back into analysis and how constantly self-correcting models can be created and deployed. Finally, we will show how these models can respond with intelligent behavior in real time.
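A hedged sketch of the token-indexing idea using pysolr (an assumed client; the session prescribes no specific library, and the token scheme below is hypothetical):

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/reflected", timeout=10)

# Tokens such as "c12_t7" might encode cluster or model IDs derived
# offline in Hadoop from user-behavior logs (a hypothetical scheme).
solr.add([
    {"id": "doc-1", "title": "Budget phones", "behavior_tokens": "c12_t7 c3_t1"},
    {"id": "doc-2", "title": "Flagship phones", "behavior_tokens": "c12_t9"},
])
solr.commit()

# Query by derived token: documents co-clicked by similar users surface
# even when their text shares no terms with the user's query.
for doc in solr.search("behavior_tokens:c12_t7"):
    print(doc["id"], doc["title"])
```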
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence (Ted Dunning)
This document discusses how search and big data technologies are evolving to enable reflected intelligence capabilities. It provides backgrounds of Ted Dunning from MapR and Ivan Provalov from LucidWorks. The document outlines various use cases that combine search, analytics and discovery on big data to gain insights from user interactions. It argues that the combination of MapR's data platform and LucidWorks' search technologies provides an integrated solution for building next generation search and discovery applications.
PRISSMA: Towards Mobile Adaptive Presentation of the Web of Data (Luca Costabello)
The Mobile Web is evolving fast and mobile access to the Web of Data is gaining momentum. Interlinked RDF resources consumed from portable devices need proper adaptation to the context in which the action is performed. This paper introduces PRISSMA (Presentation of Resources for Interoperable Semantic and Shareable Mobile Adaptability), a domain-independent vocabulary for displaying Web of Data resources in mobile environments. The vocabulary is the first step towards a declarative framework aimed at sharing and re-using presentation information for context-adaptable user interfaces over RDF data.
The namespace vocabulary can be found at http://ns.inria.fr/prissma
This document summarizes a lightning talk about developing a corpus interface called Word Tree. It discusses problems with existing small corpora being difficult to extract data from and available tools being designed for specialists. It proposes extending an existing word tree visualization tool to allow interactive filtering, comparisons, tagging, saving, and distributing states using a larger corpus to address these issues. It outlines a visualisation workflow and plans to test the interface on a corpus to gather feedback.
The document discusses using data and analytics in online education. It notes that online learning is increasingly popular as people can learn on their own schedule and from anywhere. However, online education faces challenges in keeping each unique learner engaged. The document proposes addressing this by collecting detailed interaction data and using algorithms to provide personalized recommendations and guidance to students and teachers. It outlines an architecture that would log student and faculty data, process the logs in Hadoop, and power data-driven applications to improve instruction and learning outcomes. Examples discussed include faculty dashboards providing insights and an adaptive math tutor enhancing activities based on student performance data.
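An illustrative reduction over such interaction logs (field names and the threshold are hypothetical; at scale the same grouping would run as a Hadoop job rather than in-process Python):

```python
from collections import defaultdict

events = [
    {"student": "s1", "action": "video_play", "minutes": 12},
    {"student": "s1", "action": "quiz_submit", "minutes": 5},
    {"student": "s2", "action": "video_play", "minutes": 2},
]

# Aggregate per-student activity that a faculty dashboard or adaptive
# tutor could consume.
engagement = defaultdict(lambda: {"events": 0, "minutes": 0})
for e in events:
    engagement[e["student"]]["events"] += 1
    engagement[e["student"]]["minutes"] += e["minutes"]

# A dashboard might flag students whose minutes fall below a threshold.
at_risk = [s for s, m in engagement.items() if m["minutes"] < 10]
print(dict(engagement), at_risk)
```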
1. The document discusses the architecture of Search Computing, which aims to support complex, multi-domain queries over distributed data sources on the web.
2. The architecture includes components for high-level query analysis, mapping queries to sub-queries over individual domains, planning and executing queries, and merging results.
3. It uses an incremental prototyping approach, starting with core query execution functionality and gradually adding capabilities like planning, mapping to domains, and result presentation.
1) The document presents a new ontology-based question answering method using query templates for the dining domain.
2) A dining ontology is developed to represent concepts like cuisine, facilities, meals, and their relationships.
3) Query templates are generated from the dining ontology and stored to enable faster retrieval of answers from the ontology than constructing SPARQL queries from scratch. This improves reusability (see the sketch below).
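A minimal Python sketch of the template idea, with illustrative question patterns and ontology terms rather than the paper's actual ones: the question is matched against stored patterns, and each pattern carries a pre-built SPARQL template.

```python
import re

TEMPLATES = [
    (re.compile(r"which restaurants serve (\w+)", re.I),
     """PREFIX d: <http://example.org/dining#>
        SELECT ?r WHERE {{ ?r d:servesCuisine d:{0} . }}"""),
    (re.compile(r"what meals does (\w+) offer", re.I),
     """PREFIX d: <http://example.org/dining#>
        SELECT ?m WHERE {{ d:{0} d:offersMeal ?m . }}"""),
]

def question_to_sparql(question):
    """Return a filled SPARQL template for the question, or None."""
    for pattern, template in TEMPLATES:
        m = pattern.search(question)
        if m:
            return template.format(m.group(1).capitalize())
    return None  # fall back to composing a query from scratch

print(question_to_sparql("Which restaurants serve Thai food?"))
```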
MeshLabs is a pure-play developer of text analytics software. Our core product is a hybrid text analytics engine that combines linguistic (NLP), statistical, and semantic approaches to process large volumes of unstructured and structured content. Built to enterprise performance standards, the engine offers flexible integration capabilities including content connectors and APIs. We are a team of information retrieval professionals who are passionate about solving complex unstructured data processing problems for a variety of industries. Our product is deployed at large enterprises globally. We specialize in developing products using emerging content processing technologies to solve complex customer experience management problems. I can discuss with you specific ideas, best practices, and case studies.
Developing a digital literacy framework in your school (Eduwebinar)
Presented by June Wall and hosted by KB Enterprises (Aust) Pty Ltd. Provides information literacy, ICT literacy and critical literacy models and processes for a whole school approach to digital literacy.
This document discusses distributed database systems and distributed query processing. It begins with an introduction that notes the differences between distributed and centralized query processing, including considering the physical data distribution and communication costs during query optimization in distributed systems. The document then provides an overview of its contents, which include discussions of centralized query processing, the basics of distributed query processing, global query optimization, and a summary. It also gives examples of motivations for distributed query processing like low response times, high throughput, and efficient hardware usage.
Similar to UAB 2011 - Combining human and computational intelligence
SemTech 2012 - Making your semantic app addictive: Incentivizing Users (INSEMTIVES project)
The document discusses incentivizing users to contribute semantic content through gamification. It describes two case studies: (1) motivating employees at Telefonica R&D to annotate knowledge on their intranet portal, and (2) designing a mobile app for restaurant reviews that uses gamification like badges to encourage detailed, semantic annotations from users. Experimental results showed that social incentives like competition and viewing neighbors' performance can be as effective as monetary rewards at motivating contributions. The document advocates using iterative design methods like prototyping and field experiments to develop incentive-compatible semantic applications.
This document summarizes a research study that investigated motivational values in two case studies of knowledge management systems. The researchers conducted interviews and focus groups to understand what motivates user participation and annotation. They found common motivations like reputation, self-development and community, but also differences between the cases. The researchers propose design features to support these motivational values and encourage contribution and annotation.
This document discusses using crowdsourcing to annotate web services for a search engine. It describes crawling web pages to identify APIs, but notes that human confirmation is still needed. An annotation wizard was created for Amazon Mechanical Turk workers to categorize and tag pages. Initial results showed low quality annotations, but limiting tasks and increasing pay improved accuracy rates to about 80%. Crowdsourcing was found to be an effective way to quickly generate high-quality annotations at low cost.
The document discusses incentivizing employees to undertake semantic annotation tasks within an enterprise. It outlines how game mechanics like feedback, clear goals, compelling narratives, and skill-based progression can motivate employees. Specific incentive ideas for semantic annotation include: (1) using reputation systems to highlight experts, (2) holding competitions with prizes to encourage participation, (3) introducing fun elements to routine tasks, and (4) emphasizing that contributions benefit the public good of the organization. The document also provides an example of semantic annotation tools and metrics from Telefonica.
The document provides guidelines for designing incentivized technology and discusses issues related to gamification of semantic tasks. It addresses 10 design guidelines for incentivized apps, including making apps usable, enjoyable, visible, sociable, valuable, and explorable. For each guideline, it outlines specific design considerations and requirements drawn from literature on human-computer interaction and motivation. Examples are given of how design has evolved over time to better meet these guidelines.
The document discusses using citizen science projects like Galaxy Zoo and Moon Zoo to engage children in annotating and classifying astronomical images through games on the Tiny Planets educational website. It describes the Moon Explorer activity, which has children annotate images from the NASA Lunar Reconnaissance Orbiter to estimate crater sizes and compare "boulderiness". It also outlines the development of The Universe Game, an intermediate-level strategy game intended to incorporate annotation of astronomical images as part of gameplay.
This document discusses ways to motivate users to contribute semantic content through incentives and gamification. It provides examples of semantic authoring tasks that can be crowdsourced or turned into games. Guidelines are presented for designing incentive structures and game mechanics to encourage ongoing user engagement with semantic tasks. The goal is to help semantic technologies reach critical mass by involving millions of end users in content creation.
The document discusses using incentives to increase annotations on blogs within the Telefonica company intranet portal. It proposes a two-level incentive system: 1) A competition with prizes to make tasks more fun; and 2) Developing an expert reputation network where user annotations reveal expertise and help match workers to projects. Mechanism design is applied to analyze the principal-agent and public goods dilemmas and adjust incentive rules and parameters to motivate annotations as a public good. The document advocates careful incentive design to solve real user problems and assure project success.
This document describes the Seekda web services portal and its efforts to improve the annotation of web APIs in its index. It notes that the portal currently indexes over 28,500 web services but that many lack annotations and descriptions. It outlines a plan to involve users in helping to validate, annotate and "catch" more web APIs. The plan includes using lightweight annotations, ontologies to aid search, and exploring more formal semantic web service descriptions. It also describes conducting field studies, usability testing and workshops to gather requirements and prototype the participatory annotation system.
- The L!NKS project aims to annotate images on Facebook semi-automatically using bootstrapping methods or manually using an annotation tool. It also allows for image search and challenges with incentives. Annotations can be exported as Linked Open Data.
- It targets communities with common interests to encourage social participation and visibility through Facebook. A limited set of carefully chosen images are used.
- Rewards for participation include wall notifications, rankings with friends, and challenges. Actions include creating challenges, annotating images, uploading images, and searching images.
The document discusses several semantic games including SeaFish, TubeLink, and The Universe Game. SeaFish was evaluated through an online survey where players provided over 14,000 answers over 900 rounds to generate annotations for 3 different data sets. Lessons from SeaFish included the importance of traceability and carefully selecting data sources. TubeLink and The Universe Game are presented as additional semantic games, along with discussion of using games on websites like SemanticGames.org, Kongregate, and Facebook for advertising.
The document discusses L!NKS, a system for incentivizing the annotation of images through gamification and social factors. It allows users to annotate images manually or semi-automatically, search annotated images, and earn rewards visible to their social networks for their annotations. The goal is to motivate communities to annotate images for tasks like semantic web research through challenges, rankings, and notifications.
The document discusses the development of semantic annotation tools within the INSEMTIVES project. The tools include games to motivate annotation, bootstrapping tools to automate annotation, and tools for annotating web services and media. The tools will be supported by a backend platform providing storage, semantic search and navigation. A work plan outlines development of a generic gaming toolkit, human-driven annotation tools, and bootstrapping tools over a 36 month period by various partners. Future work includes improving the platform and developing an "incentive layer" to integrate tools.
INSEMTIVES year 2 - Dissemination and Community Building (INSEMTIVES project)
The document discusses the dissemination and community building efforts of the INSEMTIVES project, including maintaining a website and blog to share current topics, generating press coverage in publications, presenting at various events, publishing conference and journal papers, holding an annual game challenge co-located with semantic web conferences to generate semantic content, expanding the user advisory board with experts, and having an impact within relevant communities. It also outlines plans for the 2011 game challenge and second user advisory board meeting.
This document summarizes work from the INSEMTIVES project on developing models and methods for creating and using lightweight, structured knowledge on the semantic web. It discusses problems with current web annotations like synonymy, polysemy, and specificity gaps. The project aims to address these by developing models to enrich web annotations with semantics and associated services. Key challenges include determining the right level of semantic complexity for users and algorithms for bootstrapping annotations, reaching consensus on vocabularies, and evolving annotations over time.
This document discusses a project called INSEMTIVES that aims to increase user motivation for semantic content creation. The project will analyze semantic content authoring tasks, identify where human input is most valuable, define incentive models, and develop a methodology for semantic content creation that incorporates incentives. The work plan involves tasks analyzing content processes, developing the methodology, and defining incentive models over 36 months. Research methods will include literature reviews, usability testing, interviews and workshops. Findings from case studies on existing tools will also inform the models and guidelines.
This document discusses incentivizing user participation in semantic content authoring. It covers how human contribution is needed for tasks like annotation, ontology evaluation, and alignment. Games and virtual worlds can provide incentives like reputation, competition, and intrinsic motivation. Examples of semantic games include OntoPronto, OntoTube, and Massacre. Virtual worlds like Tiny Planets show how annotation games in virtual environments can reward user contributions. The document concludes that turning semantic tasks into games has potential to generate large amounts of human-produced semantic data, but faces challenges in task design, knowledge resources, and ensuring participation.
Monitoring and Managing Anomaly Detection on OpenShift (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system (see the sketch after this list).
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
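As a companion to item 8 above, a small sketch using the prometheus_client package to expose detector metrics for scraping; the metric names and anomaly threshold are illustrative, not from the tutorial:

```python
import random
import time

from prometheus_client import Counter, start_http_server

EVENTS = Counter("events_scored_total", "Events scored by the detector")
ANOMALIES = Counter("anomalies_detected_total", "Events flagged anomalous")

def score(event):
    return random.random()          # stand-in for the real model

if __name__ == "__main__":
    start_http_server(8000)         # metrics served at :8000/metrics
    while True:
        EVENTS.inc()
        if score(None) > 0.95:      # hypothetical anomaly threshold
            ANOMALIES.inc()
        time.sleep(0.1)
```

Prometheus would scrape the :8000/metrics endpoint on a schedule, and alerting rules could fire on the anomaly rate.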
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leverage this data for RAG and other GenAI use cases, and finally chart your course to production.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
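A condensed sketch of that pipeline, assuming PySpark and pymilvus; the embedding function and the pre-created collection are placeholders, not the talk's actual code:

```python
from pymilvus import MilvusClient
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("to-milvus").getOrCreate()

def embed(text):
    # Stand-in for a real embedding model; returns a fixed-size vector.
    return [float(ord(c) % 7) for c in text[:8].ljust(8)]

rows = spark.createDataFrame(
    [("doc-1", "vector databases"), ("doc-2", "search serving")],
    ["id", "text"],
).collect()  # collect() only for this toy example; batch writes at scale

client = MilvusClient(uri="http://localhost:19530")
client.insert(
    collection_name="docs",  # assumed to exist with an 8-dim vector field
    data=[{"id": r["id"], "vector": embed(r["text"])} for r in rows],
)
```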
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We will explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep track of what is going on. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Skybuffer SAM4U tool for SAP license adoption (Tatiana Kojar)
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly identified vulnerabilities.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments in SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. Best of all, everything is managed through our intuitive no-code Action Server interface, which requires no extensive coding knowledge and makes advanced AI accessible to more users.
2. Semantic annotation lifecycle
[Lifecycle diagram: the user, semantic search, reasoning and other services, connected through annotations and their context; semantic annotation = structure and/or meaning. What if users could use semantic annotations to leverage semantic technology services?]
• Problem 1: help the user find and understand the meaning of semantic annotations
• Problem 2: extract (semantic) annotations from the contexts of user resources at publishing
• Problem 3: QoS of semantics-enabled services
• Problem 4: semi-automatic semantification of existing free-text annotations
3. Index: meaning summarization
[Index diagram: lifecycle overview highlighting Problem 1, help the user find and understand the meaning of semantic annotations]
4. Meaning summarization: why?
• The right meaning of the words used for an annotation is in the minds of the people using them
• E.g., "Java" can mean:
  – island: an island in Indonesia south of Borneo; one of the world's most densely populated regions
  – beverage: a beverage consisting of an infusion of ground coffee beans; "he ordered a cup of coffee"
  – programming language: a simple platform-independent object-oriented programming language used for writing applets that are downloaded from the World Wide Web by a client and run on the client's machine
• Such descriptions are too long for the user to grasp the meaning immediately, which is too high a barrier to start generating semantic annotations
5. Meaning summarization: an example
One-word summaries are generated from the relations in the knowledge base, sense definitions, synonyms, and hypernym terms (a sketch follows).
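The slides do not spell out the selection procedure, but the idea can be illustrated with a small sketch over NLTK's WordNet interface. The preference order used here (a single-word synonym first, then a single-word hypernym term) is an assumption for illustration, not the authors' exact algorithm.

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def one_word_summary(synset, word):
    # Prefer a single-word synonym other than the annotated word itself.
    for lemma in synset.lemmas():
        name = lemma.name()
        if "_" not in name and name.lower() != word.lower():
            return name
    # Otherwise walk up the is-a hierarchy to a single-word hypernym term.
    for hypernym in synset.closure(lambda s: s.hypernyms()):
        for lemma in hypernym.lemmas():
            if "_" not in lemma.name():
                return lemma.name()
    return None

for sense in wn.synsets("java"):
    print(sense.name(), "->", one_word_summary(sense, "java"))
# With a current WordNet this prints labels such as "island" for the island
# sense and "coffee" for the beverage sense of "java".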
6. Meaning summarization: evaluation results
Evaluation question: if we talk about "java", does the word "coffee" mean the same as "island"?
Best precision: 63%
Discriminating power: 76.4%
7. Index: gold standard dataset
[Index diagram: lifecycle overview highlighting Problem 4, semi-automatic semantification of existing annotations]
In order to evaluate the performance of the algorithms, a gold standard dataset is needed.
8. Proposed Approach
Create a gold standard of a folksonomy with senses: each tag is turned into tokens (preprocessing) and then into senses (disambiguation).
• # of annotations: 4,296
• Unique tags: 857
• Unique URLs: 644
• Unique users: 1,194
Annotator agreement: 80% for preprocessing and 59% for disambiguation; accuracy: 81%.
Example: the tag "javaisland" is tokenized into candidates such as "Java island" or "Java is land" (see the sketch below), then disambiguated to senses: Java, an island in Indonesia to the south of Borneo; island, a land mass that is surrounded by water.
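As a minimal sketch of how such token candidates can be enumerated, here is a dictionary-based segmentation over a toy vocabulary; the pipeline's actual tokenizer is not described at this level of detail, so treat this as an assumed illustration.

# Toy vocabulary standing in for a real dictionary or WordNet lemma list.
WORDS = {"java", "is", "island", "land"}

def segmentations(tag, start=0):
    """Yield every way to split tag[start:] into dictionary words."""
    if start == len(tag):
        yield []
        return
    for end in range(start + 1, len(tag) + 1):
        word = tag[start:end]
        if word in WORDS:
            for rest in segmentations(tag, end):
                yield [word] + rest

print(list(segmentations("javaisland")))
# [['java', 'is', 'land'], ['java', 'island']] -- the candidates that the
# annotators (or a disambiguation step) then choose between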
9. A Platform for Gold Standards of Semantic Annotation Systems
• Manual validation
• RDF export
• Evaluation of:
  – Preprocessing
  – WSD
  – BoW search
  – Convergence
• Open source: 7 modules, 25K lines of code, 26% comments
  http://sourceforge.net/projects/tags2con/
11. Index: QoS for semantic search
[Index diagram: lifecycle overview highlighting Problem 3, QoS of semantics-enabled services]
12. Semantic search: why?
• With free-text search, the following problems may reduce precision and recall:
  – synonymy: searching for "images" should return resources annotated with "picture"
  – polysemy: searching for "java" (the island) should not return resources annotated with "java" (the coffee beverage)
  – specificity gap: searching for "animals" should also return resources annotated with "dogs"
• Semantic, meaning-based search can address the problems listed above (see the sketch below)
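As a concrete illustration of how a meaning-based match sidesteps all three problems, here is a minimal sketch that assumes queries and annotations have already been disambiguated to WordNet synsets. This assumption is for illustration only; the paper's own search algorithm is evaluated against the gold standard and is not reproduced here.

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def matches(query_synset, annotation_synset):
    """True if the annotation denotes the query concept or a more specific one."""
    if annotation_synset == query_synset:
        return True  # same concept: synonyms share a synset, senses do not
    # Bridge the specificity gap: dog IS-A ... IS-A animal.
    return query_synset in annotation_synset.closure(lambda s: s.hypernyms())

# Sense numbers follow WordNet 3.0's ordering for each lemma.
animal = wn.synset("animal.n.01")
dog = wn.synset("dog.n.01")
java_island = wn.synset("java.n.01")   # the island sense
java_coffee = wn.synset("java.n.02")   # the beverage sense

print(matches(animal, dog))               # True: specificity gap bridged
print(matches(java_island, java_coffee))  # False: different senses of "java"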
13. Semantics vs Folksonomy
[Diagram: the raw tag "javaisland" is used to build "raw" queries; the tokenized form "java island" is used to build BoW queries (the baseline); the disambiguated form Java(island) island(land) is used to build semantic queries, which yield complete and correct results.]
[Diagram: specificity gap (SG) between the query concept and the annotation concept in the is-a hierarchy, e.g. a query for "vehicle" against resources annotated with "car" (SG=1) or "taxi" (SG=2); recall goes down as the specificity gap increases.]
14. Index: semantic convergence
[Index diagram: lifecycle overview highlighting Problem 4, semi-automatic semantification of existing annotations]
15. Semantic convergence: Why?
[Two pie charts: how Delicious tags map to WordNet (WN) senses in two samples]
• Random sample (programming and web domain): with a WN sense 49%, missing sense 36%, abbreviation 5%, cannot decide 6%, I don't know 3%, other 1%
• "General" domains (cooking, travel, education): with a WN sense 71%, missing sense 15%, abbreviation 2%, cannot decide 5%, I don't know 4%, other 3%
• Examples of tags with a missing sense: Ajax, Mac, Apple, CSS, …
16. Semantic convergence: proposed solution
• Find new senses of terms (sketched below)
  – Find different senses of the same term (word senses)
  – Find synonyms of a term (synonym sets, i.e. synsets)
• Place the new synset in the vocabulary's is-a hierarchy
• What we improve on over the state of the art:
  – Better use of machine learning techniques
  – The polysemy issue is not considered in the state of the art
  – Missing or "subjective" evaluations in the state of the art
• Evaluation using the Delicious dataset
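To make the sense-finding idea concrete, here is a minimal sketch of sense induction by tag collocation on toy data, assuming a simple connected-components clustering over the co-occurrence graph; the authors' actual method and parameters are not specified here.

import networkx as nx  # pip install networkx

# Toy bookmarks: each is the set of tags one user put on one URL.
bookmarks = [
    {"java", "coffee", "espresso"},
    {"java", "programming", "jvm"},
    {"coffee", "espresso"},
    {"programming", "jvm"},
]

# Build the co-occurrence graph of the target tag's neighbours.
target = "java"
g = nx.Graph()
for tags in bookmarks:
    if target in tags:
        others = sorted(tags - {target})
        for i, a in enumerate(others):
            for b in others[i + 1:]:
                g.add_edge(a, b)

# Each connected component is a candidate sense (synset) of the target tag.
for component in nx.connected_components(g):
    print(sorted(component))
# ['coffee', 'espresso'] and ['jvm', 'programming'] -> two induced senses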
17. Convergence Evaluation: Finding Senses
[Diagrams: sense induction by tag collocation (tags t1–t5 co-occurring on bookmarks B1–B4) and by user collocation (the same tags grouped by users U1 and U2)]
• Tag collocation: precision 56%, recall 73%
• User collocation: precision 42%, recall 29%
• Random baseline: precision 57%, recall 68%
18. Semantic annotation lifecycle
[Lifecycle diagram revisited for the conclusions: combining human and computational intelligence across the four problems]
• Problem 1: help the user understand the meaning of semantic annotations
• Problem 2: extract (semantic) annotations from the contexts of user resources at publishing
• Problem 3: QoS of semantics-enabled services
• Problem 4: semi-automatic semantification of existing free-text annotations
19. Conclusions
• We developed and evaluated a meaning summarization algorithm
• We developed a "semantic folksonomy" evaluation platform
• We studied the effect of semantics on social tagging systems:
  – how much can semantics help?
  – how much does the user need to be involved?
  – how can human and computer intelligence be combined in the generation and consumption of semantic annotations?
• We developed and evaluated a knowledge base enrichment algorithm
• We built and used a gold standard dataset for evaluating:
  – Word Sense Disambiguation
  – Tag Preprocessing
  – Semantic Search
  – Semantic Convergence
21. Publications
• Semantic Disambiguation in Folksonomy: a Case Study. Pierre Andrews, Juan Pane, and Ilya Zaihrayeu; Advanced Language Technologies for Digital Libraries, Springer LNCS.
• Semantic Annotation of Images on Flickr. Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu; ESWC 2011.
• A Classification of Semantic Annotation Systems. Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu; Semantic Web Journal (second review phase).
• Sense Induction in Folksonomies. Pierre Andrews, Juan Pane, and Ilya Zaihrayeu; IJCAI-LHD 2011 (under review).
• Evaluating the Quality of Service in Semantic Annotation Systems. Ilya Zaihrayeu, Pierre Andrews, and Juan Pane; in preparation.
22. WP 2 TIMELINE AND DELIVERABLES
[Gantt chart: tasks and deliverables over months 0–36]
• Task 2.1, Designing models (UIBK):
  – D2.1.1: State of the art and requirements from the use case partners
  – D2.1.2: Specification of the model
• Task 2.2, Designing methods (UNITN):
  – D2.2.1: Report on bootstrapping semantic annotations and on reaching consensus in the use of semantics
  – D2.2.2+D2.2.3: Report on linking semantic annotations to external sources and on keeping them up-to-date when the underlying semantic model changes
  – D2.4: Report on the refinement of the proposed models, methods and semantic search
• Task 2.3, Research on Information Retrieval (IR) methods for semantic content (ONTO):
  – D2.3.1: Requirements for semantics-aware IR methods
  – D2.3.2: Specification for semantics-aware IR methods
• Task 2.4, Models and methods for automatic visual annotation (UTC):
  – D2.5: Report on the state of the art and proposed suitable models and methods for automatic visual annotation
Editor's Notes
• Say how it's different from the Tagora dataset: we have gold standard preprocessing and disambiguation, with agreement between at least two annotators.
• The first platform for building gold standards for the evaluation of concept-based search algorithms, vocabulary convergence algorithms, etc. in folksonomies. The first gold standard dataset produced and published. The first evaluation of a keywords-based search algorithm w.r.t. the gold standard semantic search in a folksonomy. Tag preprocessing algorithm, WSD algorithm, concept-based search algorithm.