Cross-Language Information Retrieval (CLIR) systems extend classic information retrieval mechanisms to allow users to query across languages, i.e., to retrieve documents written in a language different from the one used to formulate the query.
In this paper, we present a CLIR system that exploits multilingual ontologies to enrich document representations with multilingual semantic information during the indexing phase and to map query fragments to concepts during the retrieval phase.
The system was applied to a domain-specific document collection, and the contribution of the ontologies to the CLIR system was evaluated in conjunction with both the Microsoft Bing and Google Translate translation services.
Results demonstrate that the use of domain-specific resources leads to a significant improvement in CLIR system performance.
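The abstract does not show how the indexing-time enrichment works in code, but the idea can be sketched as follows. This is a hypothetical illustration: the toy ontology, concept IDs, and function names are invented for the example, not taken from the paper.

```python
# Toy "multilingual ontology": concept id -> labels per language.
ONTOLOGY = {
    "c:water": {"en": "water", "it": "acqua", "de": "Wasser"},
    "c:soil":  {"en": "soil",  "it": "suolo", "de": "Boden"},
}

def enrich_terms(tokens):
    """Map surface tokens to ontology concepts and add every
    multilingual label of each matched concept to the index terms."""
    enriched = list(tokens)
    for token in tokens:
        for concept, labels in ONTOLOGY.items():
            if token.lower() in {lbl.lower() for lbl in labels.values()}:
                enriched.append(concept)          # the concept id itself
                enriched.extend(labels.values())  # labels in every language
    return sorted(set(enriched))

print(enrich_terms(["Water", "quality"]))
```

Because the index now contains language-independent concept IDs and labels in several languages, an Italian query containing "acqua" can match an English document about water.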
Environmental Thesauri Under the Lens of Reusability (EGOVIS 2014) – Riccardo Albertoni
The development of a Spatial Data Infrastructure (SDI) at the European level is strategic to meeting the environmental management needs set by European, national and local policies. Several European projects and initiatives aim to share, integrate and make accessible large amounts of environmental data in order to overcome cross-border, language and cultural barriers. To this purpose, environmental thesauri are used as shared nomenclatures in metadata compilation and information discovery, and they are increasingly made available on the web.
This paper provides a methodological approach for creating a catalogue of the environmental thesauri available on the web and for assessing their reusability with respect to domain-independent criteria. It highlights critical issues and provides recommendations for improving thesauri reusability.
Developing an Arabic Plagiarism Detection Corpus – csandit
A corpus is a collection of documents. It is a valuable resource in linguistics research for performing statistical analysis and testing hypotheses about different linguistic rules. An annotated corpus consists of documents or entities annotated with task-related labels such as part-of-speech tags, sentiment, etc. One such task is plagiarism detection, which seeks to identify whether a given document is plagiarized. This paper describes our efforts to build a plagiarism detection corpus for Arabic. The corpus consists of about 350 plagiarized–source document pairs and more than 250 documents in which no plagiarism was found. The plagiarized documents consist of assignments submitted by students. For each plagiarized document, the source document was located on the Web and downloaded for further investigation. We report corpus statistics, including the number of documents, sentences and tokens for each of the plagiarized and source categories.
Presentation made in the context of the FAO AIMS Webinar titled “Knowledge Organization Systems (KOS): Management of Classification Systems in the case of Organic.Edunet” (http://aims.fao.org/community/blogs/new-webinaraims-knowledge-organization-systems-kos-management-classification-systems)
21/2/2014
Phrase linguistic classification and generalization for improving statistical... – Hiroshi Matsumoto
De Gispert, Adrià. "Phrase linguistic classification and generalization for improving statistical machine translation." Proceedings of the ACL Student Research Workshop. Association for Computational Linguistics, 2005.
Collaborative Modeling of Processes and Ontologies with MoKi – Mauro Dragoni
The objective of this framework is to sustain and encourage collaboration between different kinds of experts in modeling domains and in providing a semantic representation of them. Examples of such experts are Domain Experts (i.e., those who know the domain but usually lack modelling skills) and Knowledge Engineers (those who have the skills but lack a clear understanding of the domain). During this talk, I will present the latest version of MoKi, the wiki-based tool designed to support such a framework, and I will show how this tool has been customized and extended in several projects to face the different challenges raised by the use of semantic representations in different domains.
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR – cscpconf
Recent and continuing advances in online information systems are creating many opportunities, as well as new problems, in information retrieval. Gathering information in different natural languages is a difficult task that often requires huge resources. Cross-language information retrieval (CLIR) is the retrieval of information for a query written in the user's native language. This paper deals with various classification techniques that can be used for solving the problems encountered in CLIR.
Ontology Based Approach for Semantic Information Retrieval System – IJTET Journal
Abstract—Information retrieval plays an important role in current search engines, which perform searches based on keywords; this returns an enormous amount of data to the user, from which the user cannot easily pick out the essential and most important information. This limitation may be overcome by a new web architecture known as the semantic web, whose conceptual (semantic) search technique overcomes the limitations of keyword-based search. Natural language processing techniques are typically implemented in a QA system to handle users' questions, and several steps are followed to convert a question into query form for retrieving an exact answer. In conceptual search, the search engine interprets the meaning of the user's query and the relations among the concepts a document contains with respect to a particular domain, producing specific answers instead of lists of results. In this paper, we propose an ontology-based semantic information retrieval system built on the Jena semantic web framework: the user enters an input query, which is parsed by the Stanford Parser, and then a triplet extraction algorithm is applied. For each input query, a SPARQL query is formed and fired against the knowledge base (ontology), which finds the matching RDF triples and retrieves the relevant information using the Jena framework.
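The pipeline described (parse → triplet extraction → SPARQL) can be sketched at its final step: turning an extracted (subject, predicate, object) triplet into a SPARQL query. The paper uses Jena (a Java framework); this is an illustrative stand-in in Python, with an invented `ex:` namespace and property names.

```python
def triplet_to_sparql(subj, pred, obj):
    """Build a SPARQL query from an extracted triplet.
    If the object slot is unknown (None), ask for it with SELECT;
    otherwise verify the full triple with ASK."""
    prefix = "PREFIX ex: <http://example.org/onto#>\n"
    if obj is None:
        return prefix + f"SELECT ?o WHERE {{ ex:{subj} ex:{pred} ?o . }}"
    return prefix + f"ASK {{ ex:{subj} ex:{pred} ex:{obj} . }}"

# e.g. the question "Where was Einstein born?" yields the triplet
# (Einstein, bornIn, ?) with an unknown object:
print(triplet_to_sparql("Einstein", "bornIn", None))
```

The resulting query string would then be executed against the ontology by the underlying framework (Jena in the paper's case).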
Class Diagram Extraction from Textual Requirements Using NLP Techniques – iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
The objective of this webinar is to provide a brief overview of Knowledge Organization Systems (KOS) and the tools used for managing them. The presentation focuses on the management of the multilingual Organic.Edunet ontology as a case study. In this context it presents aspects such as collaborative work, multilinguality needs, and the updating of concepts using an online KOS management tool (MoKi).
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES – ijcseit
As information access across languages increases, the importance of a system that supports query-based searching in a multilingual setting also grows. Gathering information in different natural languages is a difficult task that requires huge resources such as databases and digital libraries. Cross-language information retrieval (CLIR) enables searching multilingual document collections using the native language, which can be supported by different data mining techniques. This paper deals with various data mining techniques that can be used for solving the problems encountered in CLIR.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of engineering and technology.
The paper presents a model for developing intelligent query processing in Malayalam. For this, the investigator has selected time enquiry in the Malayalam language as the domain. This work discusses issues involved in Natural Language Processing. NLQPS is a restricted-domain system that deals with natural language queries on time enquiry for different modes of transportation. The system performs a shallow syntactic and semantic analysis of the input query. After the knowledge-level understanding of the query, the system triggers a reasoning process to determine the type of query and the result slots that are required. The investigator tries to extract the hidden intent behind a natural language query submitted by a user.
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse – vty
This presentation is about support for external controlled vocabularies (CVs) in Dataverse, an open-source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as the basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
Wreck a nice beach: adventures in speech recognition – Stephen Marquard
Introduction to speech recognition and a description of a project to integrate CMU Sphinx into the Opencast Matterhorn lecture capture system, focusing on language model adaptation using Wikipedia as a corpus.
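Language model adaptation of the kind mentioned above rests on estimating word-sequence probabilities from a text corpus. A minimal sketch of the underlying idea, counting bigrams over a stand-in corpus string (the real project used Wikipedia text and CMU Sphinx's own LM tooling, not this code):

```python
from collections import Counter

# Stand-in for a corpus of adaptation text (e.g. Wikipedia articles).
corpus = "the lecture covers speech recognition and the lecture ends".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
unigrams = Counter(corpus)                  # counts of single words

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev); no smoothing."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

# In this tiny corpus, "the" is always followed by "lecture".
print(p("lecture", "the"))
```

A real adaptation pipeline would add smoothing and merge these counts with a background model, but the counting step is the core of building a domain-adapted LM.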
Keynote given at ISWC 2019 Semantic Management for Healthcare Workshop – Mauro Dragoni
Automatically monitoring and supporting a healthy lifestyle is a recent research trend, fostered by the availability of low-cost monitoring devices, and it can significantly contribute to the prevention of chronic diseases deriving from an incorrect diet and a lack of physical activity. In this talk I will present the HORUS.AI platform: an AI-based platform built upon the integration of semantic web technologies and persuasive techniques for motivating people to adopt a healthy lifestyle or for supporting them in coping with the self-management of chronic diseases. The platform collects data from users' devices, explicit user inputs, or the external environment (e.g., facts of the world) and interacts with users through a goal-based metaphor. Interactive dialogues are used to propose sets of challenges to users who, through a mobile application, can provide the required information and receive contextual motivational messages helping them achieve the proposed goals. HORUS.AI consists of two main layers: the Knowledge Layer and the Dialog-Based Persuasive Layer. The Knowledge Layer contains the knowledge bases modeling the specific domains for which users are monitored (e.g., diet), the rules provided by domain experts, and the RDF-based reasoner that combines the modeled knowledge with the users' generated data. The results produced by the reasoning operations are encoded into motivational strategies and messages by the Dialog-Based Persuasive Layer, which creates and manages dialogues and generates motivational messages based on the information provided by the Knowledge Layer and learned from previous user behavior. This way, messages are tailored to specific users. These two layers are supported by an Input/Output Layer used to communicate directly with users (i.e., via a dedicated mobile application or social media channels) by providing summaries of the acquired data, the chat containing the interactions between the users and the system, and graphical items showing the users' status with respect to their goals. HORUS.AI has been validated in the context of different territorial labs and projects, and the observed results demonstrated its suitability in real-world scenarios.
Translating Ontologies in Real-World Settings – Mauro Dragoni
To enable knowledge access across languages, ontologies, which are often represented only in English, need to be translated into different languages. The main challenge in translating ontologies is to find the right term with respect to the domain modeled by the ontology itself. Machine translation services may help in this task; however, a crucial requirement is to have translations validated by experts before the ontologies are deployed. Real-world applications must implement a support system addressing this task to relieve experts of the work of validating all translations. In this paper, we present ESSOT, an Expert Supporting System for Ontology Translation. The peculiarity of this system is that it exploits semantic information from a concept's context to improve the quality of label translations. The system has been tested both within the Organic.Lingua project, by translating the modeled ontology into three languages, and on other multilingual ontologies, in order to evaluate its effectiveness in other contexts. The results have been compared with the translations provided by the Microsoft Translator API, and the improvements demonstrate the viability of the proposed approach.
Keystone Summer School 2015: Ontologies For Information Retrieval – Mauro Dragoni
The presentation provides an overview of what an ontology is and how it can be used for representing information and retrieving data, with a particular focus on the linguistic resources available for supporting this kind of task. It surveys semantic-based retrieval approaches, highlighting the pros and cons of semantic approaches with respect to classic ones. Use cases are presented and discussed.
Exploiting Multilinguality For Creating Mappings Between Thesauri – Mauro Dragoni
The definition of mappings between multilingual thesauri is a recent research topic concerning the application of traditional schema mapping algorithms in conjunction with the use of multilingual resources. In this paper, we present a multilingual mapping approach aimed at defining matches between terms belonging to multilingual thesauri. The paper presents the approach as a variant of the schema mapping problem and discusses its evaluation on (i) domain-specific use cases and (ii) a standard benchmark, namely the MultiFarm benchmark, used for measuring the effectiveness of multilingual ontology mapping systems.
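One simple way to exploit multilinguality for term matching, sketched here as a hedged illustration (the paper's actual matching algorithm is not given in the abstract), is to score candidate term pairs by the overlap of their multilingual label sets:

```python
def jaccard(a, b):
    """Jaccard similarity of two label collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two thesaurus terms with labels in several languages (toy data).
term_a = {"en": "forest", "fr": "forêt",   "de": "Wald"}
term_b = {"en": "forest", "it": "foresta", "de": "Wald"}

# 2 shared labels ("forest", "Wald") out of 4 distinct -> 0.5
score = jaccard(term_a.values(), term_b.values())
print(round(score, 2))
```

Terms whose score exceeds a threshold would be proposed as mapping candidates; real systems refine this with translation services and string-similarity measures.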
The widespread adoption of Information Technology systems, and their capability to trace data about process executions, has made IT data available for the analysis of process executions. Meanwhile, at the business level, static and procedural knowledge that can be exploited to analyze and reason on data is often available. In this paper we aim at providing an approach that, by combining static and procedural aspects and business and data levels, and by exploiting semantic-based techniques, allows business analysts to infer knowledge and use it to analyze system executions. The proposed solution has been implemented using current scalable Semantic Web technologies, which offer the possibility to keep the advantages of semantic-based reasoning with non-trivial quantities of data.
Authoring OWL 2 ontologies with the TEX-OWL syntax – Mauro Dragoni
This work describes a new syntax that can be used to write OWL 2 ontologies. The syntax, known as TEX-OWL, was developed to address the need for an easy-to-read and easy-to-write plain-text syntax. TEX-OWL is inspired by LaTeX syntax and covers all constructs of OWL 2. We designed TEX-OWL to be less verbose than the other OWL syntaxes and easy to use, especially for quickly developing small ontologies with just a text editor. The important features of the syntax are discussed in this work, and a reference implementation of a Java-based parser and writer is described.
A Fuzzy Approach For Multi-Domain Sentiment Analysis – Mauro Dragoni
An emerging field within Sentiment Analysis concerns the investigation of how sentiment polarities towards concepts have to be adapted with respect to the different domains in which they are used. In this paper, we explore the use of fuzzy logic for modeling concept polarities, and the uncertainty associated with them, with respect to different domains. The approach is based on the use of a knowledge graph built by combining two linguistic resources, namely WordNet and SenticNet. This knowledge graph is then exploited by a graph-propagation algorithm that propagates sentiment information learned from labeled datasets. The system implementing the proposed approach has been evaluated on the Blitzer dataset, demonstrating its viability in real-world cases.
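The graph-propagation step can be sketched in miniature. This toy example is invented for illustration: the real system combines WordNet and SenticNet into its graph, and its propagation and fuzzy-membership functions are not given in the abstract.

```python
# Adjacency list of related concepts (toy stand-in for the knowledge graph).
graph = {
    "good":   ["nice", "cheap"],
    "nice":   ["good"],
    "cheap":  ["good", "flimsy"],
    "flimsy": ["cheap"],
}
seeds = {"good": 1.0}  # polarity learned from a labeled dataset

def propagate(graph, scores, damping=0.5, iterations=3):
    """Spread each node's sentiment score to its neighbours,
    decayed by `damping` at each hop; keep the strongest signal."""
    for _ in range(iterations):
        updated = dict(scores)
        for node, value in scores.items():
            for nb in graph.get(node, []):
                updated[nb] = max(updated.get(nb, 0.0), value * damping)
        scores = updated
    return scores

result = propagate(graph, seeds)
print(result)  # concepts nearer the seed receive stronger polarity
```

After propagation, concepts one hop from the seed ("nice", "cheap") carry half its polarity and two-hop concepts ("flimsy") a quarter, which is the basic mechanism by which labeled sentiment spreads to unlabeled concepts.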
Multilingual Knowledge Organization Systems Management: Best Practices – Mauro Dragoni
This presentation addresses the best-known challenges in managing multilingual knowledge organization systems. These challenges are presented, along with a discussion of how they have been addressed through the implementation of a collaborative tool called MoKi.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf – 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, the aspects they look for in a new TV, and their TV buying preferences.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality – Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it requires vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud-native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need in order to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Essentials of Automations: Optimizing FME Workflows with Parameters
Using Semantic and Domain-based Information in CLIR Systems
1. Using Semantic and Domain-based
Information in CLIR Systems
Alessio Bosca2, Matteo Casu2, Chiara Di Francescomarino1,
Mauro Dragoni1
(1) Fondazione Bruno Kessler (FBK), Shape and Evolve Living Knowledge Unit (SHELL)
(2) CELI s.r.l.
https://shell.fbk.eu/index.php/Mauro_Dragoni - dragoni@fbk.eu
11th Extended Semantic Web Conference 2014 – May, 27th 2014
2. Background – CLIR: 3 Scenarios…
The document collection is monolingual, but users can formulate
queries in more than one language.
The document collection contains documents in multiple
languages and users can query the entire collection in one or
more languages.
The document collection contains documents with mixed-
language content and users can query the entire collection in one
or more languages.
3. Background – … and 2 strategies
Model dependent
Translation and retrieval are integrated in a uniform framework
Model independent
Translation and retrieval are treated as separate processes
4. Background - Challenges
Out-of-Vocabulary issue
improve the corpora used for training the machine translation model.
usage of domain information for increasing the coverage of the
dictionaries.
Usage of semantic artifacts for structuring the representation of
(multilingual) documents.
GOAL
to integrate domain-specific semantic knowledge within a
CLIR system and evaluate its effectiveness
5. Our Scenario
Use case: the agricultural domain
Knowledge resources: Agrovoc and Organic.Lingua ontologies
3 components used in the proposed approach:
Annotator
Indexer
Retriever
7. Annotation Process – Step 2
[Diagram: per-language label indexes — en, es, it, de, fr, …]
Document content is used as the query.
Among the candidate results, only “exact matches” are
considered.
9. Approach - Index
Given a document:
Text and annotations are extracted.
The context of each concept is retrieved from the ontologies.
Each contextual concept is indexed with a weight proportional
to its semantic distance from the semantic annotation.
Structure of each index record:
10. Approach - Retriever
Three retrieval configurations available:
Only translations: query terms are translated by using machine
translation services.
Semantic expansion by exploiting the domain ontology: query terms
are matched with ontology concepts; if an exact match exists, query
is expanded by using the URI of the concept and the URIs of the
contextual ones.
Ontology matching only: terms not having an exact match with
ontology concepts are discarded.
11. Evaluation - Setup
Collection of 13,000 multilingual documents.
48 queries, originally provided in English and manually translated
into 12 languages under the supervision of both domain and
language experts.
Gold standard manually built by the domain experts.
MAP, Prec@5, Prec@10, Prec@20, Recall have been used.
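The metrics above can be computed as follows. This is the standard formulation of precision@k and (mean) average precision, sketched here for reference; it is not the authors' actual evaluation code.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked[:k]
    return sum(1 for d in top if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision@rank over the ranks where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

MAP then summarizes the whole 48-query run in a single number, while Prec@5/10/20 show how clean the top of each ranking is.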
14. Conclusions
The use of domain-specific ontologies leads to an improvement in CLIR
system effectiveness.
Find the right trade-off between the effort of manually annotating
documents and the system effectiveness
Future work:
Improve the automatic annotation component
Move to a more complex semantic representation of information in order
to answer more complex queries.
References:
www.organic-edunet.eu: the portal
www.organic-lingua.eu/en/outcomes/deliverables: the data
Model-independent approaches treat translation and retrieval as two separate processes. The queries or the documents are first translated into the corresponding language of the documents or the queries. Monolingual IR models are then applied directly. A typical and also broadly used approach of this type is the machine translation (MT) approach which employs MT systems to translate the queries or documents before the monolingual retrieval process.
Model dependent methods integrate the translation and retrieval processes in a uniform framework. These methods, developed in the context of language models, have the advantage of accounting better for the uncertainty of translation during retrieval.
The main difference is that in model-independent approaches the result of the translation process is taken as it is, while in model-dependent approaches the uncertainty associated with each translation is considered during the retrieval phase, so the final ranking takes it into account.
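The contrast between the two strategies can be sketched in a few lines. The toy term-frequency scoring and all function names here are illustrative assumptions, not the system described in the slides; the point is only where translation uncertainty enters.

```python
def monolingual_score(term: str, doc: dict) -> float:
    """Toy term-frequency score for a single (already translated) term."""
    return doc.get(term, 0)

def model_independent(query_terms, translate, doc):
    # Translation happens first and its output is taken as it is:
    # each source term is replaced by its single best translation.
    translated = [translate(t) for t in query_terms]
    return sum(monolingual_score(t, doc) for t in translated)

def model_dependent(query_terms, translation_probs, doc):
    # Translation uncertainty survives into retrieval: every candidate
    # translation contributes, weighted by its probability, so the
    # final ranking accounts for it.
    score = 0.0
    for t in query_terms:
        for trans, p in translation_probs(t):
            score += p * monolingual_score(trans, doc)
    return score
```

In the model-dependent case a secondary translation with low probability can still pull in documents that the single best translation would miss.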
In this work we do not focus on developing a new statistical machine translation component to be integrated into CLIR systems, but on how to exploit the multilingual information available in domain-specific semantic knowledge for enriching the document representation and improving retrieval effectiveness.
Naturally, we also evaluate such effectiveness.
In the preliminary step, all labels are extracted from the ontologies and indexed separately by language.
In the second step, the content of each document is used as a query over the different indexes, producing a set of ranked results for each language present in the document (not for all the languages for which labels exist).
The language of the document may be determined in two different ways: (1) the textual content is tagged with a language code, or (2) a Language Identifier component is used.
Among the results, each label is searched for within the document content, and only exact matches are accepted as annotations.
The URIs of each accepted concept are added to the document representation and indexed together with the document content.
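A minimal sketch of this two-step annotation process, assuming a simple in-memory label index in place of a real search engine; the function names and the toy ontology entries are illustrative, not part of the actual system.

```python
def build_label_index(ontology_labels):
    """Step 1: index ontology labels separately per language.
    ontology_labels: iterable of (uri, language, label) triples."""
    index = {}
    for uri, lang, label in ontology_labels:
        index.setdefault(lang, {})[label.lower()] = uri
    return index

def annotate(document_text, language, index):
    """Step 2: use the document content as a query against the index for
    its language, and keep only labels occurring verbatim (exact match).
    Returns the set of concept URIs accepted as annotations."""
    text = document_text.lower()
    return {uri for label, uri in index.get(language, {}).items()
            if label in text}
```

A real deployment would run the match through the search engine's ranked candidates, but the exact-match filter at the end is the same.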
Here are some statistics about the automatic and manual annotations.
Note the different sizes of the ontologies, as well as the number of manual annotations.
For each annotation, the “context” of the concept, consisting of its parents and children, is extracted from the ontology, and these contextual concepts are indexed with a weight proportional to their semantic distance from the annotated concept.
In our case, the weight decreases …
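One way to realize a weight that decreases with semantic distance is a geometric decay over the concept's neighbourhood. The decay factor 0.5 and the depth limit below are assumptions for the example, not the values used in the paper.

```python
def context_weights(concept, neighbours, max_depth=2, decay=0.5):
    """Breadth-first walk over an ontology neighbourhood.
    neighbours: uri -> list of parent/child uris.
    Returns uri -> weight, decaying geometrically with distance
    from the annotated concept (which keeps weight 1.0)."""
    weights = {concept: 1.0}
    frontier = [concept]
    for depth in range(1, max_depth + 1):
        nxt = []
        for uri in frontier:
            for n in neighbours.get(uri, []):
                if n not in weights:  # keep the shortest-distance weight
                    weights[n] = decay ** depth
                    nxt.append(n)
        frontier = nxt
    return weights
```

Each index record would then store the concept URI together with its weight, so contextual matches contribute less to the score than direct annotation matches.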
The system implements three different configurations:
Queries are translated by using available MT services (Google Translate and Microsoft Bing). The rationale is the high coverage of their dictionaries (information about the coverage statistics).
Queries are expanded with URIs coming from the ontology: for each term, the system looks for an exact match in the ontology (in the same way as in the indexing process) and, if one is found, the query is expanded with the concept label. The match with the ontology is done by considering the query terms in their original language.
Queries are transformed into their semantic representation by considering only terms that have a match with the ontology. The goal of this configuration is to verify the sole impact of the ontology on the effectiveness of the system.
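The three configurations can be sketched as a single query-building step, assuming an exact-match lookup from terms to concept URIs; `translate` stands in for an external MT service, and all names here are illustrative rather than the system's actual API.

```python
def build_query(terms, config, translate, term_to_uri, context_of):
    """Build the query terms for one of the three retrieval configurations."""
    if config == "translations-only":
        # Configuration 1: query terms are machine-translated, nothing else.
        return [translate(t) for t in terms]
    if config == "semantic-expansion":
        # Configuration 2: terms with an exact ontology match are expanded
        # with the concept URI and the URIs of its contextual concepts.
        query = list(terms)
        for t in terms:
            uri = term_to_uri.get(t)
            if uri is not None:
                query.append(uri)
                query.extend(context_of(uri))
        return query
    if config == "ontology-only":
        # Configuration 3: terms without an ontology match are discarded.
        return [term_to_uri[t] for t in terms if t in term_to_uri]
    raise ValueError(f"unknown configuration: {config}")
```

In the third configuration a query may end up empty, which is exactly the limitation discussed in the evaluation notes below.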
Queries were created starting from the analysis of query logs, selecting them so as to avoid similar queries and to cover as many topics as possible.
Each query has been manually translated into each available language, and each translation has been validated by language experts.
(The paper says 8 languages: is that an error, or is there another reason?)
http://googletranslate.blogspot.it/2012/04/breaking-down-language-barriersix-years.html
http://research.microsoft.com/en-us/projects/mt/
- The baseline shows a high average recall => the use of the freely available translation services does not negatively affect the retrieval of relevant documents (the OOV-terms challenge)
- Agrovoc has broader coverage of the domain, which is why it yields a larger improvement in system effectiveness
- The use of the manual annotations significantly boosts the system effectiveness
- The use of a double weight does not lead to further improvements of the system
Goal: verify the impact of the ontology on the documents retrieval
Observe the recall: the use of ontologies by itself does not allow the retrieval of a significant number of relevant documents
Not all queries can be transformed into their semantic representation
In general, performance is quite poor, which means that the overlap between the document collection, the queries, and the ontologies is limited