Distributional approaches are based on a simple hypothesis: the meaning of a word can be inferred from its usage. Applying that idea to the vector space model makes it possible to build a WordSpace in which words are represented as mathematical points in a geometric space, with similar words lying close to one another. The definition of "word usage" depends on the definition of context used to build the space, which can be the whole document, the sentence in which the word occurs, a fixed window of words, or a specific syntactic context. In its original formulation, however, WordSpace can take into account only one definition of context at a time. We propose an approach based on vector permutation and Random Indexing to encode several syntactic contexts in a single WordSpace. Moreover, we propose some operations in this space and report the results of an evaluation performed on the GEMS 2011 Shared Evaluation data.
Encoding syntactic dependencies by vector permutation
1. Encoding syntactic dependencies by vector permutation
Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro
Department of Computer Science, University of Bari "Aldo Moro" (Italy)
GEMS 2011: GEometrical Models of Natural Language Semantics
Edinburgh, Scotland - July 31st, 2011
2. Motivation
• meaning is its use
• the meaning of a word is determined by the set of textual contexts in which it appears
• one definition of context at a time
3. Building Blocks
• Random Indexing
• Dependency Parser
• Vector Permutation
4. Random Indexing
• assign a context vector to each context element (e.g. document, passage, term, …)
• term vector is the sum of the context vectors in which the term occurs
– sometimes the context vector could be boosted by a score (e.g. term frequency, PMI, …)
5. Context Vector
(0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, -1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1)
• sparse
• high dimensional
• ternary {-1, 0, +1}
• small number of randomly distributed non-zero elements
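To make this concrete, here is a minimal Python sketch (not from the slides; the dimensionality and the number of non-zero seeds are illustrative choices) of how such sparse ternary random vectors can be generated with NumPy:

```python
import numpy as np

def random_index_vector(dim=4000, seeds=10, rng=np.random.default_rng()):
    """Build a sparse ternary random vector: mostly zeros, with a few
    randomly placed +1/-1 entries (`dim` and `seeds` are illustrative)."""
    v = np.zeros(dim)
    positions = rng.choice(dim, size=seeds, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=seeds)
    return v

# one random context vector per vocabulary term
vocab = ["john", "eat", "red", "apple"]
R = {term: random_index_vector() for term in vocab}
```

Because such vectors are nearly orthogonal with high probability, each term gets a quasi-unique fingerprint.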
6. Random Indexing (formal)
B_{n,k} = A_{n,m} R_{m,k},  with k << m
B preserves the distance between points (Johnson-Lindenstrauss lemma):
d_r = c · d
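As a quick sanity check of the distance-preservation claim, the following sketch (toy data; all dimensions are illustrative, not the system's settings) projects random points through a sparse ternary matrix R and compares pairwise distances before and after:

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, k = 20, 10000, 300            # n points, original dim m, reduced dim k << m

A = rng.normal(size=(n, m))                                  # original space
R = rng.choice([-1.0, 0.0, 1.0], size=(m, k), p=[0.01, 0.98, 0.01])
B = A @ R                                                    # B_{n,k} = A_{n,m} R_{m,k}

# the ratio d_r / d is roughly the same constant c for every pair of points
for i, j in [(0, 1), (2, 7), (5, 19)]:
    d = np.linalg.norm(A[i] - A[j])
    d_r = np.linalg.norm(B[i] - B[j])
    print(f"pair ({i},{j}): d_r/d = {d_r / d:.3f}")
```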
7. Dependency parser
John eats a red apple.
subject: eats → John
object: eats → apple
modifier: apple → red
(head → dependent)
8. Vector permutation
• using permutation of elements in the random vector to encode several contexts
– right shift of n elements to encode dependents (permutation)
– left shift of n elements to encode heads (inverse permutation)
• choose a different n for each kind of dependency
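In code, the shift Πn and its inverse can be modelled with np.roll; a minimal sketch (the shift amounts are the ones used in the slides' later example, everything else is illustrative):

```python
import numpy as np

def perm(v, n):       # Π_n: right shift by n elements (encodes dependents)
    return np.roll(v, n)

def inv_perm(v, n):   # Π_{-n}: left shift by n elements (encodes heads)
    return np.roll(v, -n)

v = np.array([1.0, 0, 0, 0, -1.0, 0, 0, 0, 0, 0])
assert np.array_equal(inv_perm(perm(v, 3), 3), v)  # the shifts are exact inverses

SHIFTS = {"mod": 3, "obj": 7}  # one distinct shift per dependency type
```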
9. Method
• assign a context vector to each term
• assign a shift function (Πn) to each kind of dependency
• each term is represented by a vector which is
– the sum of the permuted vectors of all the dependent terms
– the sum of the inverse permuted vectors of all the head terms
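A minimal sketch of the whole construction (function and variable names are mine, not from the paper; dependency triples are assumed to arrive from a parser in (head, relation, dependent) form):

```python
from collections import defaultdict
import numpy as np

def build_term_vectors(dependencies, R, shifts):
    """Sum, for each term, the permuted random vectors of its dependents
    and the inverse-permuted random vectors of its heads."""
    dim = len(next(iter(R.values())))
    B = defaultdict(lambda: np.zeros(dim))
    for head, rel, dep in dependencies:
        n = shifts[rel]
        B[head] += np.roll(R[dep], n)     # Π_n(dependent)
        B[dep] += np.roll(R[head], -n)    # Π_{-n}(head)
    return dict(B)
```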
10. Example
John eats a red apple
John → (0, 0, 0, 0, 0, 0, 1, 0, -1, 0)
eat → (1, 0, 0, 0, -1, 0, 0, 0, 0, 0)
red → (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)
apple → (1, 0, 0, 0, 0, 0, 0, -1, 0, 0)
mod → Π3; obj → Π7
(apple) = Π3(red) + Π-7(eat) = …
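The slide's computation can be replayed directly, using its 10-dimensional vectors and the shifts mod → Π3 and obj → Π7:

```python
import numpy as np

red = np.array([0, 0, 0, 1, 0, 0, 0, -1, 0, 0], dtype=float)
eat = np.array([1, 0, 0, 0, -1, 0, 0, 0, 0, 0], dtype=float)

# "apple" governs "red" via mod and depends on "eat" via obj:
apple = np.roll(red, 3) + np.roll(eat, -7)   # Π3(red) + Π-7(eat)
print(apple)
```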
12. Output
R: vector space of random context vectors
B: vector space of terms
13. Query 1/4
• similarity between terms
– cosine similarity between term vectors in B
– terms are similar if they occur in similar syntactic contexts
14. Query 2/4
Words similar to "provide":
offer 0.855
supply 0.819
deliver 0.801
give 0.787
contain 0.784
require 0.782
present 0.778
15. Query 3/4
• similarity between terms exploiting dependencies: what are the objects of the word "provide"?
1. get the term vector for "provide" in B
2. compute the similarity with all permuted vectors in R, using the permutation assigned to the "obj" relation
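A sketch of this two-step query (the helper names are mine; B and R stand for the term and random spaces built above):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def objects_of(verb, B, R, shifts, topn=5):
    """Step 1: take the verb's term vector in B. Step 2: rank every term t
    by the similarity between that vector and Π_obj(R[t])."""
    target = B[verb]
    n = shifts["obj"]
    scored = [(t, cosine(target, np.roll(rv, n))) for t, rv in R.items()]
    return sorted(scored, key=lambda s: -s[1])[:topn]
```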
16. Query 4/4
What are the objects of the word "provide"?
information 0.344
food 0.208
support 0.143
energy 0.143
job 0.142
17. Compositional semantics 1/2
• words are represented in isolation
• representing a complex structure (phrase or sentence) is a challenging task
– IR, QA, IE, Text Entailment, …
• how to combine words
– tensor product of words
– Clark and Pulman suggest taking into account symbolic features (syntactic dependencies)
18. Compositional semantics 2/2
man reads magazine (Clark and Pulman):
man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
19. Similarity between structures
man reads magazine vs. woman browses newspaper
man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
woman ⊗ subj ⊗ browse ⊗ obj ⊗ newspaper
20. …a bit of math
(w1 ⊗ w2) · (w3 ⊗ w4) = (w1 · w3) × (w2 · w4)
(man · woman) × (read · browse) × (magazine · newspaper)
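The identity is easy to verify numerically, and it is what makes the similarity between structures computable without ever materializing the tensor products (the vector dimension below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w1, w2, w3, w4 = (rng.normal(size=50) for _ in range(4))

lhs = np.outer(w1, w2).ravel() @ np.outer(w3, w4).ravel()  # (w1 ⊗ w2) · (w3 ⊗ w4)
rhs = (w1 @ w3) * (w2 @ w4)                                # (w1 · w3) × (w2 · w4)
assert np.isclose(lhs, rhs)
# hence sim(man reads magazine, woman browses newspaper)
#   = (man·woman) × (read·browse) × (magazine·newspaper)
```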
21. System setup
• Implemented in JAVA
• Two corpora
– TASA: 800K sentences and 9M dependencies
– a portion of ukWaC: 7M sentences and 127M dependencies
– 40,000 most frequent words
• Dependency parser
– MINIPAR
22. Evaluation
• GEMS 2011 Shared Task for compositional semantics
– lists of two pairs of word combinations
• rated by humans
• 5,833 ratings
• encoded dependencies: subj, obj, mod, nn
– GOAL: compare the system performance against human scores
• Spearman correlation
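The comparison against human judgments boils down to a rank correlation; a sketch with SciPy (the score arrays are hypothetical placeholders, not the task data):

```python
from scipy.stats import spearmanr

human_ratings = [6.2, 1.5, 4.8, 3.0, 5.9]       # hypothetical human scores per pair
system_scores = [0.81, 0.12, 0.55, 0.40, 0.77]  # hypothetical system similarities

rho, pvalue = spearmanr(human_ratings, system_scores)
print(f"Spearman rho = {rho:.3f} (p = {pvalue:.3f})")
```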
25. Conclusion and Future Work
• Conclusion
– encode syntactic dependencies using vector permutations and Random Indexing
– early attempt at semantic composition
• Future Work
– deeper evaluation (in vivo)
– more formal study of semantic composition
– tackle the scalability problem
– try to encode other kinds of context