In information retrieval there is a long history of learning vector representations for words. In recent times, neural word embeddings have gained significant popularity for many natural language processing tasks, such as word analogy and machine translation. The goal of this talk is to introduce the basic intuitions behind these simple but elegant models of text representation. We will start our discussion with classic vector space models and then make our way to recently proposed neural word embeddings. We will see how these models can be useful for analogical reasoning as well as how they can be applied to many information retrieval tasks.
Continuous representations of words and documents, now commonly referred to as word embeddings, have recently demonstrated significant advances in many natural language processing tasks.
In this presentation we will provide an introduction to the most common methods of learning these representations, as well as earlier methods used before the recent advances in deep learning, such as dimensionality reduction of the word co-occurrence matrix.
Moreover, we will present the continuous bag-of-words (CBOW) model, one of the most successful models for word embeddings and one of the core models in word2vec, and briefly glance at other models for building representations for other tasks, such as knowledge base embeddings.
Finally, we will motivate the potential of using such embeddings for many tasks that could be of importance to the group, such as semantic similarity, document clustering, and retrieval.
Word embedding, Vector space model, Language modelling, Neural language model, Word2Vec, GloVe, fastText, ELMo, BERT, DistilBERT, RoBERTa, SBERT, Transformer, Attention
An introduction to the Transformers architecture and BERT (Suman Debnath)
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures, mostly used for natural language processing (NLP) tasks. Ever since the advent of the transformer, it has replaced RNNs and LSTMs for various tasks. The transformer created a major breakthrough in the field of NLP and paved the way for new revolutionary architectures such as BERT.
General background and conceptual explanation of word embeddings (word2vec in particular). Mostly aimed at linguists, but also understandable for non-linguists.
Leiden University, 23 March 2018
The slides cover a few state-of-the-art word embedding models and a deep explanation of algorithms for approximating the softmax function in language models.
A brief introduction to the attention mechanism and its application in neural machine translation (NMT), especially in the transformer, where attention was used to remove RNNs completely from NMT.
Introduction to seq2seq (sequence to sequence) and RNN (Hye-min Ahn)
These are my slides introducing the sequence-to-sequence model and the Recurrent Neural Network (RNN) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
Suneel Marthi - Deep Learning with Apache Flink and DL4J (Flink Forward)
http://flink-forward.org/kb_sessions/deep-learning-with-apache-flink-and-dl4j/
Deep Learning has become very popular over the last few years in areas such as image recognition, fraud detection, machine translation, etc. Deep Learning has proved to be very useful in handling unstructured data and extracting value from it. A big challenge in building deep learning models has been the high cost of training them. With the recent advent of distributed frameworks like Apache Flink and Apache Spark, it's faster to train deep learning models in parallel on modern platform architectures. In this talk, we'll show how to use Apache Flink Streaming with the open source deep learning framework DeepLearning4j to perform large-scale deep learning model training. We will show a demo of a recurrent neural net trained for language modeling and have it generate text.
Technological Unemployment and the Robo-Economy (Melanie Swan)
Technological unemployment (jobs outsourced to technology) is coming, and the challenge is to steward an orderly and beneficial transition to more intense human-technology collaboration.
State of Blockchain 2017: Smartnetworks and the Blockchain Economy (Melanie Swan)
Blockchain is a fundamental IT for secure value transfer over networks. For any asset registered in a cryptographic ledger, the whole Internet is a VPN for its confirmation, assurance, and transfer. Blockchain reinvents economics and governance for the digital age. The long-tail structure of digital networks allows personalized economic and governance services. Smartnetworks are a new form of automated global infrastructure for large-scale next-generation projects.
Exploring Session Context using Distributed Representations of Queries and Re... (Bhaskar Mitra)
Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "san francisco" → "san francisco 49ers" is semantically similar to "detroit" →"detroit lions". Likewise, "london"→"things to do in london" and "new york"→"new york tourist attractions" can also be considered similar transitions in intent. The reformulation "movies" → "new movies" and "york" → "new york", however, are clearly different despite the lexical similarities in the two reformulations. In this paper, we study the distributed representation of queries learnt by deep neural network models, such as the Convolutional Latent Semantic Model, and show that they can be used to represent query reformulations as vectors. These reformulation vectors exhibit favourable properties such as mapping semantically and syntactically similar query changes closer in the embedding space. Our work is motivated by the success of continuous space language models in capturing relationships between words and their meanings using offset vectors. We demonstrate a way to extend the same intuition to represent query reformulations.
Furthermore, we show that the distributed representations of queries and reformulations are both useful for modelling session context for query prediction tasks, such as query auto-completion (QAC) ranking. Our empirical study demonstrates that short-term (session) history context features based on these two representations improve the mean reciprocal rank (MRR) for the QAC ranking task by more than 10% over a supervised ranker baseline. Our results also show that by using features based on both these representations together we achieve better performance than either of them individually.
Paper: http://research.microsoft.com/apps/pubs/default.aspx?id=244728
One weekend software hack called "Movie Hack Attack".
Video content is played and analyzed in real time for sentiment, emotions, and more.
Sentiment is shown in the chart below; emotions and objects of attention appear on the left, with a random picture grabbed from Google search; persons/locations/organizations appear on the right, also with a random picture grabbed from Google search.
Deep Learning in practice: Speech recognition and beyond - Meetup (LINAGORA)
Here is the presentation from our Meetup of 27 September 2017, given by our colleague Abdelwahab HEBA: Deep Learning in practice: Speech recognition and beyond.
Vectorland: Brief Notes from Using Text Embeddings for Search (Bhaskar Mitra)
(Invited talk at Search Solutions 2015)
A lot of recent work in neural models and “Deep Learning” is focused on learning vector representations for text, image, speech, entities, and other nuggets of information. From word analogies to automatically generating human-level descriptions of images, the use of text embeddings has become a key ingredient in many natural language processing (NLP) and information retrieval (IR) tasks.
In this talk, I will present some personal learnings from working on (neural and non-neural) text embeddings for IR, as well as highlight a few key recent insights from the broader academic community. I will talk about the affinity of certain embeddings for certain kinds of tasks, and how the notion of relatedness in an embedding space depends on how the vector representations are trained. The goal of this talk is to encourage everyone to start thinking about text embeddings beyond just as an output of a “black box” machine learning model, and to highlight that the relationships between different embedding spaces are about as interesting as the relationships between items within an embedding space.
Talk at KTH on 14 May 2014 about matrix factorization, different latent and neighborhood models, graphs, and energy diffusion for recommender systems, as well as what makes good/bad recommendations.
The next phase of Smart Network Convergence could be putting Deep Learning systems on the Internet. Deep Learning and Blockchain Technology might be combined in the smart networks of the future for automated identification (deep learning) and automated transaction (blockchain). Large scale future-class problems might be addressed with Blockchain Deep Learning nets as an advanced computational infrastructure, challenges such as million-member genome banks, energy storage markets, global financial risk assessment, real-time voting, and asteroid mining.
Blockchain Deep Learning nets, and Smart Networks more generally, are computing networks with intelligence built in, such that identification and transfer are performed by the network itself through sophisticated protocols that automatically identify (deep learning) and validate, confirm, and route transactions (blockchain) within the network.
Writing NodeJS applications is an easy task for JavaScript developers. However, what is happening under the hood of NodeJS may be intimidating, and understanding it is vital for web developers.
Indeed, when you try to learn NodeJS, most tutorials are about the NodeJS ecosystem, like Express, Socket.IO, or PassportJS. It is really rare to see tutorials about the NodeJS runtime itself.
In this meetup, I want to shine a light on some advanced NodeJS topics to help developers answer the questions an experienced NodeJS developer is expected to answer. Understanding these topics is essential to making you a much more desirable developer. I will explore several topics, including the famous event loop, NodeJS module patterns, and how dependencies actually work in NodeJS.
I hope this meetup will help you become more comfortable understanding advanced code written in NodeJS.
As the volume of content continues to grow exponentially, helping search engines understand the context and topical themes within your site is increasingly important. This session covers some of the key concepts, as well as ways to utilise them in your marketing strategy.
5 Lessons Learned from Designing Neural Models for Information Retrieval (Bhaskar Mitra)
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large scale training data before they can be deployed. Traditional learning to rank models employ supervised ML techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field and research publications in the area have been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles are emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrate the application of neural approaches to IR that go beyond text matching.
Deploying Semantic Technologies for Digital Publishing: A Case Study from Log... (sboisen)
Presented May 24, 2007 at the Semantic Technology Conference
This talk describes an effort at Logos Research Systems to build a semantic knowledgebase encompassing general background information about entities and relationships from the Bible (one of the world's most popular collections of information). The scope includes people, places, belief systems, ethnic attributes, social roles, as well as family and other inter-personal relationships, places visited, etc. This Bible Knowledgebase (BK) will be used to support knowledge discovery and visualization in both desktop and web-server configurations for Logos' products. It will also provide an integration framework for Logos' substantial digital library (more than 7000 titles from over 100 different publishers). The project is a good example of what it takes to move a real-world, knowledge-intensive application into a Semantic Web framework.
Continuous bag of words cbow word2vec word embedding work .pdf (devangmittal4)
The way the continuous bag of words (CBOW) word2vec embedding works is that it predicts the probability of a word given a context. A context may be a single word or a group of words, but for simplicity, I will take a single context word and try to predict a single target word.
The purpose of this question is to create a word embedding for the given data set.
Data set text:
In linguistics, word embeddings were discussed in the research area of distributional semantics. It aims to quantify and categorize semantic similarities between linguistic items based on their distributional properties in large samples of language data. The underlying idea that "a word is characterized by the company it keeps" was popularized by Firth.
The technique of representing words as vectors has roots in the 1960s with the development of the vector space model for information retrieval. Reducing the number of dimensions using singular value decomposition then led to the introduction of latent semantic analysis in the late 1980s. In 2000, Bengio et al. provided in a series of papers the "Neural probabilistic language models" to reduce the high dimensionality of word representations in contexts by "learning a distributed representation for words" (Bengio et al., 2003). Word embeddings come in two different styles: one in which words are expressed as vectors of co-occurring words, and another in which words are expressed as vectors of the linguistic contexts in which the words occur; these different styles are studied in (Lavelli et al., 2004). Roweis and Saul published in Science how to use "locally linear embedding" (LLE) to discover representations of high-dimensional data structures. The area developed gradually and really took off after 2010, partly because important advances had been made since then on the quality of vectors and the training speed of the model.
There are many branches and many research groups working on word embeddings. In 2013, a team at Google led by Tomas Mikolov created word2vec, a word embedding toolkit which can train vector space models faster than the previous approaches. Most new word embedding techniques rely on a neural network architecture instead of more traditional n-gram models and unsupervised learning.
Limitations
One of the main limitations of word embeddings (word vector space models in general) is that possible meanings of a word are conflated into a single representation (a single vector in the semantic space). Sense embeddings are a solution to this problem: individual meanings of words are represented as distinct vectors in the space.
For biological sequences: BioVectors
Word embeddings for n-grams in biological sequences (e.g. DNA, RNA, and proteins) for bioinformatics applications have been proposed by Asgari and Mofrad. Named bio-vectors (BioVec) to refer to biological sequences in general, with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representa.
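To make the CBOW description above concrete, here is a minimal NumPy sketch, not the actual word2vec implementation: the toy corpus, embedding dimension, learning rate, and all variable names are invented for illustration. Context embeddings are averaged into a hidden layer, and a softmax over the vocabulary predicts the target word.

```python
# Minimal CBOW sketch: predict a target word from the mean of its context embeddings.
import numpy as np

corpus = "a word is characterized by the company it keeps".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                          # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))     # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))    # output (target) projections

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Training pairs: predict each word from its +/-1 neighbours.
pairs = [([corpus[i - 1], corpus[i + 1]], corpus[i]) for i in range(1, len(corpus) - 1)]

lr = 0.1
for epoch in range(200):
    for context, target in pairs:
        ctx_ids = [word_to_id[w] for w in context]
        t = word_to_id[target]
        h = W_in[ctx_ids].mean(axis=0)        # hidden layer = mean of context embeddings
        p = softmax(h @ W_out)                # predicted distribution over the vocabulary
        dscores = p.copy()                    # gradient of cross-entropy w.r.t. scores
        dscores[t] -= 1.0
        dh = W_out @ dscores
        W_out -= lr * np.outer(h, dscores)
        for i in ctx_ids:                     # context words share the hidden gradient
            W_in[i] -= lr * dh / len(ctx_ids)

h = W_in[[word_to_id["a"], word_to_id["is"]]].mean(axis=0)
print(vocab[np.argmax(softmax(h @ W_out))])   # with enough epochs, should print "word"
```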
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks (Leonardo Di Donato)
Experimental work regarding the use of topic modeling for the implementation and improvement of some common tasks in Information Retrieval and Word Sense Disambiguation.
First, it describes the scenario, the pre-processing pipeline, and the framework used. We then discuss the investigation of several different hyperparameter configurations for the LDA algorithm.
The work continues with the retrieval of relevant documents, mainly through two different approaches: inferring the topic distribution of the held-out document (or query) and comparing it to retrieve similar documents from the collection, or through an approach driven by probabilistic querying. The last part of the work is devoted to the investigation of the word sense disambiguation task.
Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favourable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or ‘duet’ performs significantly better than either neural network individually on a Web page ranking task, and significantly outperforms traditional baselines and other recently proposed models based on neural networks.
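As a rough illustration of the duet idea described above, here is a toy PyTorch sketch of two subnetworks whose scores are summed and trained jointly: one operating on a local exact-match interaction matrix, one on learned distributed representations. All layer sizes, input shapes, and names are invented and far smaller than the model in the paper.

```python
# Toy "duet"-style ranker: local + distributed subnetworks with a joint score.
import torch
import torch.nn as nn

class ToyDuet(nn.Module):
    def __init__(self, num_terms=1000, max_q=10, max_d=100, dim=32):
        super().__init__()
        # Local subnetwork: consumes a query-term x doc-position exact-match matrix.
        self.local = nn.Sequential(
            nn.Flatten(), nn.Linear(max_q * max_d, dim), nn.ReLU(), nn.Linear(dim, 1))
        # Distributed subnetwork: consumes learned dense term representations.
        self.embed = nn.EmbeddingBag(num_terms, dim)      # mean-pools term embeddings
        self.distributed = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, match_matrix, q_terms, d_terms):
        s_local = self.local(match_matrix)
        q, d = self.embed(q_terms), self.embed(d_terms)
        s_dist = self.distributed(torch.cat([q, d], dim=-1))
        return s_local + s_dist                           # joint score; trained together

model = ToyDuet()
score = model(torch.rand(2, 10, 100),                     # batch of match matrices
              torch.randint(0, 1000, (2, 10)),            # query term ids
              torch.randint(0, 1000, (2, 100)))           # document term ids
print(score.shape)                                        # torch.Size([2, 1])
```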
Latent Semantic Analysis (LSA) is a mathematical technique for computationally modeling the meaning of words and larger units of texts. LSA works by applying a mathematical technique called Singular Value Decomposition (SVD) to a term*document matrix containing frequency counts for all words found in the corpus in all of the documents or passages in the corpus. After this SVD application, the meaning of a word is represented as a vector in a multidimensional semantic space, which makes it possible to compare word meanings, for instance by computing the cosine between two word vectors.
LSA has been successfully used in a large variety of language related applications from automatic grading of student essays to predicting click trails in website navigation. In Coh-Metrix (Graesser et al. 2004), a computational tool that produces indices of the linguistic and discourse representations of a text, LSA was used as a measure of text cohesion by assuming that cohesion increases as a function of higher cosine scores between adjacent sentences.
Besides being interesting as a technique for building programs that need to deal with semantics, LSA is also interesting as a model of human cognition. LSA can match human performance on word association tasks and vocabulary tests. In this talk, Fridolin will focus on LSA as a tool in modeling language acquisition. After framing the area of the talk by sketching the key concepts of learning, information, and competence acquisition, and after outlining presuppositions, an introduction to meaningful interaction analysis (MIA) is given. MIA is a means to inspect learning with the support of language analysis that is geometrical in nature. MIA is a fusion of latent semantic analysis (LSA) combined with network analysis (NA/SNA). LSA, NA/SNA, and MIA are illustrated by several examples.
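A minimal sketch of the LSA pipeline described above, assuming scikit-learn is available; the toy corpus and the number of SVD components are invented for illustration.

```python
# LSA sketch: count matrix -> truncated SVD -> cosine similarity between word vectors.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "seattle seahawks jerseys",
    "seattle seahawks highlights",
    "denver broncos jerseys",
    "denver broncos highlights",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)            # documents x terms
terms = list(vectorizer.get_feature_names_out())

# LSA operates on the term x document matrix, so transpose before the SVD.
term_vectors = TruncatedSVD(n_components=2, random_state=0).fit_transform(X.T)
sims = cosine_similarity(term_vectors)
print(sims[terms.index("seattle"), terms.index("seahawks")])  # high: co-occurring terms
```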
This talk introduces computational social science as a new research discipline, gives a brief introduction to natural language processing and explains how word vector representations are computed and how to use them in Python.
Word vector representations like word2vec encode semantic relationships like gender and "is the capital city of". This makes it easy to find similar words and compare them visually. In this talk, I'm comparing Wikipedia article revisions using word2vec.
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Joint Multisided Exposure Fairness for Search and Recommendation (Bhaskar Mitra)
(Slides from my talk at SEA: Search Engines Amsterdam)
Online information access systems, like recommender systems and search, mediate what information gets exposure and thereby influence their consumption at scale. There is a growing body of evidence that information retrieval (IR) algorithms that narrowly focus on maximizing ranking utility of retrieved items may disparately expose items of similar relevance from the collection. Such disparities in exposure outcome raise concerns of algorithmic fairness and bias of moral import, and may contribute to both representational harms—by reinforcing negative stereotypes and perpetuating inequities in representation of women and other historically marginalized peoples—and allocative harms, from disparate exposure to economic opportunities. In this talk, we present a framework of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in retrieval.
What’s next for deep learning for Search? (Bhaskar Mitra)
In this talk, I will share some of my personal reflections on the progress in the field of neural IR and some of the ongoing and future research directions that I am personally excited about. This talk will be informed by my own research in this area as well as my experience both as a developer/organizer of the MS MARCO benchmark and the TREC Deep Learning Track and as an applied researcher previously working on web scale search systems at Bing. My goal in this talk would be to move the conversation beyond neural reranking models towards a richer and bolder vision of search powered by deep learning.
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm... (Bhaskar Mitra)
In this talk, I share some of my personal reflections and learnings on benchmark development and community building for making robust scientific progress. This talk is informed by my experience as a developer of the MS MARCO benchmark and as an organizer of the TREC Deep Learning Track. My goal in this talk is to situate the act of releasing a dataset in the context of broader research visions and to draw due attention to considerations of scientific and social outcomes that are invariably salient in the acts of dataset creation and distribution.
Efficient Machine Learning and Machine Learning for Efficiency in Information... (Bhaskar Mitra)
Emerging machine learning approaches, including deep learning methods, for information retrieval (IR) have recently demonstrated significant improvements in accuracy of relevance estimation at the cost of increasing model complexity and corresponding rise in computational and environmental costs of training and inference. In web search, these costs are further compounded by the necessity to train on large-scale datasets, consume long documents as inputs, and retrieve relevant documents from web-scale collections within milliseconds in response to high volume query traffic. A typical playbook for developing deep learning models for IR involves largely ignoring efficiency concerns during model development and then later scaling these methods by either finding faster approximations of the same models or employing heuristics to reduce the input space over which these models operate. Domain knowledge about the specific IR task and deeper understanding of system design and data structures in whose context these models are deployed can significantly help with not only model simplification but also to inform data-structure specific machine learning model design. Alternatively, predictive machine learning can also be employed specifically to improve efficiency in large scale IR settings. In this talk, I will cover several case studies for both improving efficiency of machine learning models for IR as well as direct application of machine learning to improve retrieval efficiency, and conclude with a brief discussion on potential future directions for efficiency-sensitive benchmarking of machine learning models for IR.
Multisided Exposure Fairness for Search and Recommendation (Bhaskar Mitra)
Online information access systems, like recommender systems and search, mediate what information gets exposure and thereby influence their consumption at scale. There is a growing body of evidence that information retrieval (IR) algorithms that narrowly focus on maximizing ranking utility of retrieved items may disparately expose items of similar relevance from the collection. Such disparities in exposure outcome raise concerns of algorithmic fairness and bias of moral import, and may contribute to both representational harms—by reinforcing negative stereotypes and perpetuating inequities in representation of women and other historically marginalized peoples—and allocative harms, from disparate exposure to economic opportunities. In this talk, we present a framework of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in retrieval. The development of expected exposure based metrics also opens up new opportunities and challenges for model optimization. We demonstrate how stochastic ranking policies can be optimized towards target expected exposure and highlight the trade-offs that may exist in optimizing for different fairness dimensions.
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as webpages, in response to a user's need, which may be expressed as a query. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this lecture will be on the fundamentals of neural networks and their applications to learning to rank.
Neural Information Retrieval: In search of meaningful progress (Bhaskar Mitra)
The emergence of deep learning based methods for search poses several challenges and opportunities not just for modeling, but also for benchmarking and measuring progress in the field. Some of these challenges are new, while others have evolved from existing challenges in IR benchmarking exacerbated by the scale at which deep learning models operate. Evaluation efforts such as the TREC Deep Learning track and the MS MARCO public leaderboard are intended to encourage research and track our progress, addressing big questions in our field. The goal is not simply to identify which run is "best" but to move the field forward by developing new robust techniques, that work in many different settings, and are adopted in research and practice. This entails a wider conversation in the IR community about what constitutes meaningful progress, how benchmark design can encourage or discourage certain outcomes, and about the validity of our findings. In this talk, I will present a brief overview of what we have learned from our work on MS MARCO and the TREC Deep Learning track--and reflect on the state of the field and the road ahead.
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track (Bhaskar Mitra)
We benchmark Conformer-Kernel models under the strict blind evaluation setting of the TREC 2020 Deep Learning track. In particular, we study the impact of incorporating: (i) Explicit term matching to complement matching based on learned representations (i.e., the “Duet principle”), (ii) query term independence (i.e., the “QTI assumption”) to scale the model to the full retrieval setting, and (iii) the ORCAS click data as an additional document description field. We find evidence which supports that all three aforementioned strategies can lead to improved retrieval quality.
Lecture slides presented at Northeastern University (December, 2020).
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as webpages, in response to a user's need, which may be expressed as a query. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this lecture will be on the fundamentals of neural networks and their applications to learning to rank.
This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019. For the document retrieval task, we adapt the Duet model to ingest a "multiple field" view of documents—we refer to the new architecture as Duet with Multiple Fields (DuetMF). A second submission combines the DuetMF model with other neural and traditional relevance estimators in a learning-to-rank framework and achieves improved performance over the DuetMF baseline. For the passage retrieval task, we submit a single run based on an ensemble of eight Duet models.
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond (Bhaskar Mitra)
The emergence of deep learning-based methods for information retrieval (IR) poses several challenges and opportunities for benchmarking. Some of these are new, while others have evolved from existing challenges in IR exacerbated by the scale at which deep learning models operate. In this talk, I will present a brief overview of what we have learned from our work on MS MARCO and the TREC Deep Learning track, and reflect on the road ahead.
Deep neural methods have recently demonstrated significant performance improvements in several IR tasks. In this lecture, we will present a brief overview of deep models for ranking and retrieval.
This is a follow-up lecture to "Neural Learning to Rank" (https://www.slideshare.net/BhaskarMitra3/neural-learning-to-rank-231759858)
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as items to be recommended, in response to a user's need. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this tutorial will be on the fundamentals of neural networks and their applications to learning to rank.
Tutorial presented at ACM SIGIR/SIGKDD Africa Summer School on Machine Learning for Data Mining and Search (AFIRM 2020) conference in Cape Town, South Africa.
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.
We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.
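A minimal sketch of the dual embedding space scoring idea described above, in NumPy. Here `W_in` and `W_out` stand in for the retained input and output projections of a trained model (random matrices below, purely for illustration); a document is scored by the average cosine between each query IN-vector and the centroid of the document's OUT-vectors, then mixed with a word-counting feature.

```python
# DESM-style scoring sketch with a linear mixture against a term-count feature.
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(100, 50))    # stand-in IN embeddings (rows: term ids)
W_out = rng.normal(size=(100, 50))   # stand-in OUT embeddings

def unit(v):
    return v / (np.linalg.norm(v) + 1e-9)

def desm_score(query_ids, doc_ids):
    # Centroid of the normalized OUT-vectors of the document terms.
    centroid = unit(np.mean([unit(W_out[d]) for d in doc_ids], axis=0))
    # Average cosine between each query IN-vector and the document centroid.
    return float(np.mean([unit(W_in[q]) @ centroid for q in query_ids]))

def mixed_score(query_ids, doc_ids, tf_score, alpha=0.5):
    # Linear mixture with a word-counting feature (e.g., a TF-IDF score),
    # mitigating the false positives noted above.
    return alpha * desm_score(query_ids, doc_ids) + (1 - alpha) * tf_score

print(mixed_score(query_ids=[3, 7], doc_ids=[3, 7, 42, 99], tf_score=0.8))
```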
Adversarial and reinforcement learning-based approaches to information retrieval (Bhaskar Mitra)
Traditionally, machine learning based approaches to information retrieval have taken the form of supervised learning-to-rank models. Recently, other machine learning approaches—such as adversarial learning and reinforcement learning—have started to find interesting applications in retrieval systems. At Bing, we have been exploring some of these methods in the context of web search. In this talk, I will share a couple of our recent works in this area that we presented at SIGIR 2018.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We finished with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
5. Vector Space Models
Represent an item (e.g., a word) as a vector of numbers. The vector can correspond to the documents in which the word occurs.
[Figure: a sparse document-occurrence vector for "banana" (0 1 0 1 0 0 2 0 1 0 1 0), with non-zero counts for Doc2, Doc4, Doc7, Doc9, and Doc11]
6. Vector Space Models
Represent an item (e.g., a word) as a vector of numbers. The vector can correspond to neighboring word contexts, e.g., “yellow banana grows on trees in africa”.
[Figure: a sparse context vector for "banana" with non-zero counts for the contexts (yellow, -1), (grows, +1), (on, +2), (tree, +3), and (africa, +5)]
7. Vector Space Models
Represent an item (e.g., a word) as a vector of numbers. The vector can correspond to the character trigrams in the word.
[Figure: a sparse character-trigram vector for "banana" with non-zero counts for the trigrams #ba, ban, ana, nan, na#]
8. Notions of Relatedness
Comparing two vectors (e.g., using cosine similarity) estimates how similar the two words are. However, the notion of relatedness depends on what vector representation you have chosen for the words.
Is seattle similar to denver (because they are both cities)? Or is seattle similar to seahawks (because "Seattle Seahawks". Go Seahawks!)?
Important note: In previous slides I showed raw counts. They should either be normalized (e.g., using pointwise mutual information) or (matrix) factorized. More on this later.
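For reference, cosine similarity over two such count vectors is a few lines of NumPy; the "mango" counts below are made up for this sketch.

```python
# Cosine similarity between two count (or PPMI-weighted) context vectors.
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

banana = np.array([0, 1, 0, 1, 0, 0, 2, 0, 1, 0, 1, 0])  # counts from slide 5
mango = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0])   # invented counts
print(cosine(banana, mango))                              # ~0.71 for these vectors
```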
9. Let’s consider the following example…
We have four (tiny) documents:
Document 1: “seattle seahawks jerseys”
Document 2: “seattle seahawks highlights”
Document 3: “denver broncos jerseys”
Document 4: “denver broncos highlights”
10. If we use document occurrence vectors…
            Doc 1   Doc 2   Doc 3   Doc 4
seattle       1       1       0       0
seahawks      1       1       0       0
denver        0       0       1       1
broncos       0       0       1       1
seattle ~ seahawks and denver ~ broncos (identical document vectors).
In the rest of this talk, we refer to this notion of relatedness as Topical similarity.
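These document-occurrence vectors and their similarities can be reproduced in a few lines, assuming scikit-learn is available:

```python
# Build word-by-document vectors for the four tiny documents and compare them.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["seattle seahawks jerseys", "seattle seahawks highlights",
        "denver broncos jerseys", "denver broncos highlights"]
vec = CountVectorizer()
X = vec.fit_transform(docs)                   # documents x terms
word_vectors = X.T.toarray()                  # each row: a word's document vector
terms = list(vec.get_feature_names_out())
sims = cosine_similarity(word_vectors)
print(sims[terms.index("seattle"), terms.index("seahawks")])  # 1.0 (topically similar)
print(sims[terms.index("seattle"), terms.index("denver")])    # 0.0
```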
11. If we use word context vectors…
            (seattle,-1)  (seahawks,+1)  (denver,-1)  (broncos,+1)  (jerseys,+1)  (jerseys,+2)  (highlights,+1)  (highlights,+2)
seattle          0              2             0            0             0             1              0                1
seahawks         2              0             0            0             1             0              1                0
denver           0              0             0            2             0             1              0                1
broncos          0              0             2            0             1             0              1                0
seattle ~ denver and seahawks ~ broncos (matching context patterns).
In the rest of this talk, we refer to this notion of relatedness as Typical (by-type) similarity.
12. If we use character trigram vectors…
This notion of relatedness is similar to string edit-distance.
            #se  sea  set  eat  ett  att  ttl  tle  le#
seattle      1    1    0    1    0    1    1    1    1
settle       1    0    1    0    1    0    1    1    1
seattle ~ settle (most trigrams are shared).
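Extracting boundary-marked character trigrams is straightforward; a small sketch:

```python
# Boundary-marked character trigrams and a cosine over their binary vectors.
def trigrams(word):
    padded = "#" + word + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

a, b = trigrams("seattle"), trigrams("settle")
print(sorted(a & b))                               # ['#se', 'le#', 'tle', 'ttl']
overlap = len(a & b) / (len(a) * len(b)) ** 0.5    # cosine over binary vectors
print(round(overlap, 2))                           # 0.62
```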
13. DIY: Learning Word Types
Take a sentence or query corpus and extract Word-Context pairs, where the Context is the <neighbouring word, distance> tuple.
Compute the (Positive) Pointwise Mutual Information for every Word-Context pair.
Compute the cosine similarity between the context score vectors to estimate word similarity by type.
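A compact sketch of this DIY recipe, run on the four tiny documents from slide 9: count (word, <neighbour, distance>) pairs, weight them with positive PMI, and compare words by cosine over the resulting sparse vectors.

```python
# DIY word-type similarity: word-context counts -> PPMI -> cosine.
import math
from collections import Counter

sentences = [["seattle", "seahawks", "jerseys"],
             ["seattle", "seahawks", "highlights"],
             ["denver", "broncos", "jerseys"],
             ["denver", "broncos", "highlights"]]

pair_counts, word_counts, ctx_counts = Counter(), Counter(), Counter()
for s in sentences:
    for i, w in enumerate(s):
        for j, c in enumerate(s):
            if i != j:
                ctx = (c, j - i)                  # <neighbouring word, distance>
                pair_counts[(w, ctx)] += 1
                word_counts[w] += 1
                ctx_counts[ctx] += 1

total = sum(pair_counts.values())
ppmi = {(w, ctx): max(0.0, math.log((n / total) /
                                    ((word_counts[w] / total) * (ctx_counts[ctx] / total))))
        for (w, ctx), n in pair_counts.items()}

def vector(w):
    return {ctx: v for (ww, ctx), v in ppmi.items() if ww == w}

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine(vector("seattle"), vector("denver")))    # > 0: shared contexts (by type)
print(cosine(vector("seattle"), vector("seahawks")))  # 0.0: no shared contexts
```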
14. Word Analogy Task
man is to woman as king is to ?
good is to best as smart is to ?
china is to beijing as russia is to ?
It turns out the word-context based vector model we just learnt is good for such analogy tasks:
[king] – [man] + [woman] ≈ [queen]
Levy and Goldberg, Linguistic Regularities in Sparse and Explicit Word Representations, CoNLL, 2014.
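If you want to try this yourself, gensim exposes exactly this vector arithmetic; the sketch below assumes you have downloaded a pre-trained model such as the GoogleNews word2vec vectors (the file path is a placeholder).

```python
# Word analogy via vector offsets with gensim's KeyedVectors.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected to return 'queen' near the top, i.e. [king] - [man] + [woman] ≈ [queen].
```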
15. Embeddings
The vectors we have been discussing so far are very high-dimensional (thousands, or even millions, of dimensions) and sparse. But there are techniques to learn lower-dimensional dense vectors for words using the same intuitions. These dense vectors are called embeddings.
16. Learning Dense Embeddings
Matrix Factorization: factorize the word-context matrix. E.g., LSA (Word-Document), GloVe (Word-NeighboringWord).
Neural Networks: a neural network with a bottleneck, with word and context as input and output respectively. E.g., word2vec (Word-NeighboringWord).
[Figure: a Word × Context matrix with rows Word1 … Wordn and columns Context1 … Contextk]
Deerwester, Dumais, Landauer, Furnas, and Harshman, Indexing by Latent Semantic Analysis, JASIS, 1990.
Pennington, Socher, and Manning, GloVe: Global Vectors for Word Representation, EMNLP, 2014.
Mikolov, Sutskever, Chen, Corrado, and Dean, Distributed Representations of Words and Phrases and their Compositionality, NIPS, 2013.
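Training a small word2vec model of your own takes a few lines with gensim; the toy sentences below are stand-ins for a real corpus, and `sg=0` selects the CBOW variant.

```python
# Train a tiny word2vec (CBOW) model with gensim and query nearest neighbours.
from gensim.models import Word2Vec

sentences = [["seattle", "seahawks", "jerseys"],
             ["seattle", "seahawks", "highlights"],
             ["denver", "broncos", "jerseys"],
             ["denver", "broncos", "highlights"]]
model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=0)  # sg=0: CBOW
print(model.wv.most_similar("seattle", topn=2))
```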
17. Exercise
Both word2vec and GloVe define context as the neighboring word only, without considering the distance from the current word. How does this change the relationship that is learnt by the embedding space?
18. How do word analogies work?
Visually, the vector {china → beijing} turns out to be almost parallel to the vector {russia → moscow}. But if you aren’t queasy about reading a lot of equations, read the following paper…
Arora, et al., RAND-WALK: A Latent Variable Model Approach to Word Embeddings, 2015.
Mikolov, Sutskever, Chen, Corrado, and Dean, Distributed Representations of Words and Phrases and their Compositionality, NIPS, 2013.
19. Word embeddings for Document Ranking
Traditional IR uses term matching → # of times the doc says Albuquerque.
We can use word embeddings to compare all pairs of query-document terms → # of terms in the doc that relate to Albuquerque.
[Figure: two example passages, one about Albuquerque and one not about Albuquerque]
Nalisnick, Mitra, Craswell, and Caruana, Improving Document Ranking with Dual Word Embeddings, in WWW, 2016.
Mitra, Nalisnick, Craswell, and Caruana, A Dual Embedding Space Model for Document Ranking, arXiv:1602.01137, 2016.
20. Beyond words…
The Deep Semantic Similarity Model (DSSM) trains on multi-word short texts. As with word embeddings, you can train it to capture either Typical or Topical relationships.
Huang, Po-Sen, et al., Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM, 2013.
Mitra and Craswell, Query Auto-Completion for Rare Prefixes, in CIKM, 2015.
21. What’s next?
Train your own or use a pre-trained embedding
Word2vec
Word2vec trained on queries
GloVe
DSSM
Get your hands dirty and try to build some fun demos!
22. Remember these are exciting times…
Fang et al., From Captions to Visual Concepts and Back, CVPR, 2015.
Vinyals et al., A Neural Conversational Model, ICML, 2015.
24. Neu-IR 2016
The SIGIR 2016 Workshop on Neural Information Retrieval
July 21st, 2016, Pisa, Tuscany, Italy
http://research.microsoft.com/neuir2016
https://twitter.com/neuir2016
(Call for Participation)
Organizers:
W. Bruce Croft, University of Massachusetts Amherst, US
Jiafeng Guo, Chinese Academy of Sciences, Beijing, China
Maarten de Rijke, University of Amsterdam, Amsterdam, The Netherlands
Bhaskar Mitra, Bing, Microsoft, Cambridge, UK
Nick Craswell, Bing, Microsoft, Bellevue, US