Image search interfaces either prompt the searcher to provide a search image (image-to-image search) or a text description of the desired image (text-to-image search). Image-to-image search is generally implemented as a nearest-neighbor search in a dense image embedding space, where the embedding is derived from neural networks pre-trained on a large image corpus such as ImageNet. Text-to-image search can be implemented via traditional (TF-IDF or BM25 based) text search against image captions or image tags.
In this presentation, we describe how we fine-tuned the OpenAI CLIP model (available from Hugging Face) to learn a joint image/text embedding representation from naturally occurring image-caption pairs in literature, using contrastive learning. We then show this model in action against a dataset of medical image-caption pairs, using the Vespa search engine to support text based (BM25), vector based (ANN) and hybrid text-to-image and image-to-image search.
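The contrastive objective used in this kind of fine-tuning can be sketched in a few lines of numpy. This is an illustrative InfoNCE-style loss in the spirit of CLIP, not the actual training code (which would use PyTorch models from Hugging Face), and the temperature value is an assumption:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each side comes from
    the same image-caption pair, so the diagonal holds the positives.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix

    def xent(l):
        # cross-entropy against the diagonal (matching-pair) targets
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average of image->text and text->image directions, as in CLIP
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls matching image/caption pairs together on the diagonal while pushing mismatched pairs apart, which is what yields the joint embedding space.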
Evolving a Medical Image Similarity Search - Sujit Pal
Slides for talk at Haystack Conference 2018. Covers evolution of an Image Similarity Search Proof of Concept built to identify similar medical images. Discusses various image vectorizing techniques that were considered in order to convert images into searchable entities, an evaluation strategy to rank these techniques, as well as various indexing strategies to allow searching for similar images at scale.
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017 - StampedeCon
Words are no longer sufficient for delivering the search results users are looking for, particularly in image search. Text and language pose many challenges in describing visual details and providing the necessary context for optimal results. Machine Learning technology opens a new world of search innovation that has yet to be applied by businesses.
In this session, Mike Ranzinger of Shutterstock will share a technical presentation detailing his research on composition aware search. He will also demonstrate how the research led to the launch of AI technology allowing users to more precisely find the image they need within Shutterstock’s collection of more than 150 million images. While the company released a number of AI search enabled tools in 2016, this new technology allows users to search for items in an image and specify where they should be located within the image. The research identifies the networks that localize and describe regions of an image as well as the relationships between things. The goal of this research was to improve the future of search using visual data, contextual search functions, and AI. A combination of multiple machine learning technologies led to this breakthrough.
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object Detection Model - JacobSilbiger1
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object Detection Model
By: Nissim Cantor, Avi Radinsky, Jacob Silbiger
Github: https://github.com/ndcantor/tensorflow-street-classifier
Demo: https://www.youtube.com/watch?v=ItXdPJ3okMo
Searching Images: Recent research at Southampton - Jonathon Hare
Knowledge Media Institute seminar series. The Open University. 23rd March 2011.
Southampton has a long history of research in the areas of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around the University of Glasgow's Terrier text retrieval software. The talk will also cover some of our recent work on image classification and image search result diversification.
Searching Images: Recent research at Southampton - Jonathon Hare
Intelligence, Agents, Multimedia Seminar series. University of Southampton. 7th March 2011.
Southampton has a long history of research in the areas of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around the University of Glasgow's Terrier text retrieval software. The talk will also cover some of our recent work on image classification and image search result diversification.
Searching Images: Recent research at Southampton - Jonathon Hare
Information Retrieval group seminar series. The University of Glasgow. 21st February 2011.
Southampton has a long history of research in the areas of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around Glasgow's Terrier software. The talk will also cover some of our recent work on image classification and image search result diversification.
Minor Project Report on Denoising Diffusion Probabilistic Models - oxigoh238
Denoising Diffusion Probabilistic Model. Takes contrastive models like CLIP as a key inspiration, and demonstrates robust image representations capturing both semantics and style.
Project objectives: a two-stage model is proposed:
» A prior that generates a CLIP image embedding from a given text.
» A decoder that generates an image based on these CLIP image embeddings.
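For context, the forward (noising) process that a DDPM learns to invert can be written in closed form. A minimal numpy sketch, assuming the linear beta schedule from the original DDPM paper:

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form for a DDPM.

    x0: clean data, betas: per-step noise schedule, t: 0-based timestep.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]           # cumulative product \bar{alpha}_t
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)           # linear schedule from the DDPM paper
x0 = rng.standard_normal(10_000)
xT = ddpm_forward(x0, 999, betas, rng)          # by t=999 the signal is nearly all noise
```

By the final timestep the data is almost entirely replaced by Gaussian noise; the learned reverse process (and, in the two-stage design above, the decoder conditioned on a CLIP image embedding) undoes this corruption step by step.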
Topic Modelling: Tutorial on Usage and Applications - Ayush Jain
This is a tutorial on topic modelling techniques - that informs the reader about the basic ingredients of all topic models, and allows them to develop a new model in the end.
NIT Silchar ML Hackathon 2019 Session on Computer Vision with Deep Learning.
Targeted audience and prerequisites: basic knowledge of Machine Learning and Deep Learning.
How Machine Learning Helps Organizations to Work More Efficiently? - Tuan Yang
Data volumes are increasing day by day, and with them the cost of data storage and handling. However, by understanding the concepts of machine learning, one can handle this excess data and process it in an affordable manner.
The process involves building models using several kinds of algorithms. If a model is built precisely for a given task, organizations stand a much better chance of making use of profitable opportunities and avoiding the risks lurking behind the scenes.
Learn more about:
» Understanding Machine Learning Objectives.
» Data dimensions in Machine Learning.
» Fundamentals of Algorithms and Mapping from Input/Output.
» Parametric and Non-parametric Machine Learning Algorithms.
» Supervised, Unsupervised and Semi-Supervised Learning.
» Estimating Over-fitting and Under-fitting.
» Use Cases.
Supporting Concept Search using a Clinical Healthcare Knowledge Graph - Sujit Pal
We describe our Dictionary based Named Entity Recognizer and Semantic Matcher that enables us to leverage our Knowledge Graph to provide Concept Search. We also describe our Named Entity Linking based Concept Recommender to support manual curation of our Knowledge Graph.
Youtube URL for talk: https://youtu.be/5UWrS_j8dDg
Google AI Hackathon: LLM based Evaluator for RAG - Sujit Pal
Slides accompanying the project submission video for the Google AI Hackathon. Describes an LCEL and DSPy based evaluation framework inspired by the RAGAS project.
Accompanying video URL: https://youtu.be/yOIU65chc98
Similar to Learning a Joint Embedding Representation for Image Search using Self-supervised means
Building Learning to Rank (LTR) search reranking models using Large Language ... - Sujit Pal
Search engineers have many tools to address relevance. Older tools are typically unsupervised (statistical, rule based) and require large investments in manual tuning effort. Newer ones involve training or fine-tuning machine learning models and vector search, which require large investments in labeling documents with their relevance to queries.
Learning to Rank (LTR) models are in the latter category. However, their popularity has traditionally been limited to domains where user data can be harnessed to generate labels that are cheap and plentiful, such as e-commerce sites. In domains where this is not true, labeling often involves human experts, and results in labels that are neither cheap nor plentiful. This effectively becomes a roadblock to adoption of LTR models in these domains, in spite of their effectiveness in general.
Generative Large Language Models (LLMs) with parameters in the 70B+ range have been found to perform well at tasks that require mimicking human preferences. Labeling query-document pairs with relevance judgements for training LTR models is one such task. Using LLMs for this task opens up the possibility of obtaining a potentially unlimited number of query judgment labels, and makes LTR models a viable approach to improving the site’s search relevancy.
In this presentation, we describe work that was done to train and evaluate four LTR based re-rankers against lexical, vector, and heuristic search baselines. The models were a mix of pointwise, pairwise and listwise, and required different strategies to generate labels for them. All four models outperformed the lexical baseline, and one of the four models outperformed the vector search baseline as well. None of the models beat the heuristics baseline, although two came close – however, it is important to note that the heuristics were built up over months of trial and error and required familiarity with the search domain, whereas the LTR models were built in days and required much less familiarity.
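To make the pointwise/pairwise distinction concrete, the following sketch turns per-query graded relevance judgments (such as those produced by an LLM judge) into pairwise preference examples. The function and data names here are hypothetical, not from the talk:

```python
from itertools import combinations

def pairwise_examples(query_judgments):
    """Turn per-query graded relevance labels (e.g. from an LLM judge)
    into pairwise preference examples for a pairwise LTR model.

    query_judgments: {query: [(doc_id, grade), ...]}
    Returns (query, preferred_doc, other_doc) triples.
    """
    pairs = []
    for query, judged in query_judgments.items():
        for (d1, g1), (d2, g2) in combinations(judged, 2):
            if g1 == g2:
                continue                      # ties carry no preference signal
            hi, lo = (d1, d2) if g1 > g2 else (d2, d1)
            pairs.append((query, hi, lo))
    return pairs
```

A pointwise model would instead consume the (doc, grade) tuples directly, and a listwise model the full ranked list per query, which is why each model family needs its own labeling strategy.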
The ability to handle long question style queries is often de rigueur for modern search engines. Search giants such as Bing and Google are addressing this by building Large Language Models (LLMs) into their search pipelines. Unfortunately, this approach requires large investments in infrastructure and involves high operational costs. It can also lead to loss of confidence when the LLM hallucinates non-factual answers.
A best practice for designing search pipelines is to make the search layer as cheap and fast as possible, and move heavyweight operations into the indexing layer. With that in mind, we present an approach that combines the use of LLMs during indexing to generate questions from passages, and matching them to incoming questions during search, using either text based or vector based matching. We believe this approach can provide good quality question answering capabilities for search applications and address the cost and confidence issues mentioned above.
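A minimal sketch of the index-time/search-time split described above, with a stand-in hook where the LLM question generation would go, and simple token overlap standing in for BM25 or vector matching; all names here are illustrative:

```python
def build_index(passages, generate_questions):
    """Index-time (heavyweight) step: attach generated questions to each
    passage. `generate_questions` stands in for an LLM call."""
    return [(pid, text, generate_questions(text)) for pid, text in passages]

def search(index, query):
    """Search-time (cheap) step: score passages by overlap between the
    incoming question and the questions generated at indexing time."""
    q_tokens = set(query.lower().split())
    def score(entry):
        _, _, questions = entry
        return max(len(q_tokens & set(q.lower().split())) for q in questions)
    best = max(index, key=score)
    return best[0]
```

The expensive LLM call happens once per passage at indexing time; the per-query work is just matching, which keeps the search layer cheap and fast as the text recommends.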
Vector search goes far beyond just text, and, in this interactive workshop, you will learn how to use it for multimodal search through an in-depth look at CLIP, a vision and language model, developed by OpenAI. Sujit Pal, technology research director at Elsevier, and Raphael Pisoni, senior computer vision engineer at Partium.io, will walk you through two applications of image search and then have a panel discussion with our staff developer advocate, James, on how to use CLIP for image and text search.
The power of community: training a Transformer Language Model on a shoestring - Sujit Pal
I recently participated in a community event to train an ALBERT language model for the Bengali language. The event was organized by Neuropark, Hugging Face, and Yandex Research. The training was done collaboratively in a distributed manner using free GPU resources provided by Colab and Kaggle. Volunteers were recruited on Twitter and project coordination happened on Discord. At its peak, there were approximately 50 volunteers from all over the world simultaneously engaged in training the model. The distributed training was done on the Hivemind platform from Yandex Research, and the software to train the model in a data-parallel manner was developed by Hugging Face. In this talk I provide my perspective of the project as a somewhat curious participant. I will describe the Hivemind platform, the training regimen, and the evaluation of the language model on downstream tasks. I will also cover some challenges we encountered that were peculiar to the Bengali language (and Indic languages in general).
Accelerating NLP with Dask and Saturn Cloud - Sujit Pal
Slides for talk delivered at NY NLP Meetup. Abstract -- Python has a great ecosystem of tools for natural language processing (NLP) pipelines, but challenges arise when data sizes and computational complexity grows. Best case, a pipeline is left to run overnight or even over several days. Worst case, certain analyses or computations are just not possible. Dask is a Python-native parallel processing tool that enables Python users to easily scale their code across a cluster of machines. This talk presents an example of an NLP entity extraction pipeline using SciSpacy with Dask for parallelization. This pipeline extracts named entities from the CORD-19 dataset, using trained models from the SciSpaCy project, and makes them available for downstream tasks in the form of structured Parquet files. The pipeline was built and executed on Saturn Cloud, a platform that makes it easy to launch and manage Dask clusters. The talk will present an introduction to Dask and explain how users can easily accelerate Python and NLP code across clusters of machines.
Accelerating NLP with Dask on Saturn Cloud: A case study with CORD-19 - Sujit Pal
Python has a great ecosystem of tools for natural language processing (NLP) pipelines, but challenges arise when data sizes and computational complexity grows. Best case, a pipeline is left to run overnight or even over several days. Worst case, certain analyses or computations are just not possible. Dask is a Python-native parallel processing tool that enables Python users to easily scale their code across a cluster of machines.
This talk presents an example of an NLP entity extraction pipeline using SciSpacy with Dask for parallelization, which was built and executed on Saturn Cloud. Saturn Cloud is an end-to-end data science and machine learning platform that provides an easy interface for Python environments and Dask clusters, removing many barriers to accessing parallel computing. This pipeline extracts named entities from the CORD-19 dataset, using trained models from the SciSpaCy project, and makes them available for downstream tasks in the form of structured Parquet files. We will provide an introduction to Dask and Saturn Cloud, then walk through the NLP code.
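The core pattern here is a data-parallel map over documents. The sketch below uses Python's standard-library ThreadPoolExecutor as a local stand-in for a Dask cluster, with a toy function in place of the SciSpaCy model; Dask's map/compute API generalizes this same shape across many machines:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_entities(doc):
    """Stand-in for a SciSpaCy NER call: pull out capitalized tokens.
    (Illustrative only; the real pipeline runs trained spaCy models.)"""
    return [tok for tok in doc.split() if tok[:1].isupper()]

def run_pipeline(docs, workers=4):
    """Data-parallel map over documents. Dask distributes exactly this
    kind of embarrassingly parallel work across a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_entities, docs))
```

Because each document is processed independently, the same code shape scales from local threads to a Dask cluster with little change, which is the point the talk makes.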
Leslie Smith's Papers discussion for DL Journal Club - Sujit Pal
This deck discusses two papers by Dr Leslie Smith. The first paper discusses empirical findings around learning rate (LR) and other regularization parameters for neural networks, and leads to the idea of Cyclic Learning Rates (CLR). The second paper discusses CLR in depth, as well as how to estimate its parameters. The slides also covers LR Finder, a tool first introduced in the Fast.AI library to find optimal parameters for CLR, including how to run it and interpret its outputs.
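The basic triangular cyclic learning rate schedule from Smith's papers can be computed in a few lines; in a real run the base_lr/max_lr bounds would come from an LR Finder sweep rather than being hard-coded:

```python
def triangular_clr(step, base_lr, max_lr, step_size):
    """Triangular cyclic learning rate: the LR ramps linearly from
    base_lr up to max_lr and back down over 2 * step_size steps."""
    cycle_pos = step % (2 * step_size)
    if cycle_pos < step_size:
        frac = cycle_pos / step_size          # ascending half of the cycle
    else:
        frac = 2.0 - cycle_pos / step_size    # descending half of the cycle
    return base_lr + (max_lr - base_lr) * frac
```

With base_lr=0.001, max_lr=0.006 and step_size=100, the LR climbs to 0.006 at step 100, returns to 0.001 at step 200, and then the cycle repeats.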
Using Graph and Transformer Embeddings for Vector Based Retrieval - Sujit Pal
For the longest time, term-based vector representations based on whole-document statistics, such as TF-IDF, have been the staple of efficient and effective information retrieval. The popularity of Deep Learning over the past decade has resulted in the development of many interesting embedding schemes. Like term-based vector representations, these embeddings depend on structure implicit in language and user behavior. Unlike them, they leverage the distributional hypothesis, which states that the meaning of a word is determined by the context in which it appears. These embeddings have been found to better encode the semantics of the word, compared to term-based representations. Despite this, it has only recently become practical to use embeddings in Information Retrieval at scale.
In this presentation, we will describe how we applied two new embedding schemes to Scopus, Elsevier’s broad coverage database of scientific, technical, and medical literature. Both schemes are based on the distributional hypothesis but come from very different backgrounds. The first embedding is a graph embedding called node2vec, that encodes papers using citation relationships between them as specified by their authors. The second embedding leverages Transformers, a recent innovation in the area of Deep Learning, that are essentially language models trained on large bodies of text. These two embeddings exploit the signal implicit in these data sources and produce semantically rich user and content-based vector representations respectively. We will evaluate these embedding schemes and describe how we used the Vespa search engine to search these embeddings for similar documents within the Scopus dataset. Finally, we will describe how RELX staff can access these embeddings for their own data science needs, independent of the search application.
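Underlying the similar-document search is cosine similarity between embedding vectors. This exact brute-force version is what an ANN index (such as the one Vespa provides) approximates at scale; it is a sketch, not Vespa code:

```python
import numpy as np

def top_k_similar(doc_embeddings, query_embedding, k=3):
    """Exact cosine-similarity search over a matrix of document
    embeddings; an ANN index trades a little accuracy for speed here."""
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = docs @ q                          # cosine similarity per document
    top = np.argsort(-scores)[:k]              # indices of the k best matches
    return list(zip(top.tolist(), scores[top].tolist()))
```

The same routine works whether the rows are node2vec graph embeddings or Transformer text embeddings, which is why both schemes can share one retrieval stack.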
Transformer Mods for Document Length Inputs - Sujit Pal
The Transformer architecture is responsible for many state of the art results in Natural Language Processing. A central feature behind its superior performance over Recurrent Neural Networks is its multi-headed self-attention mechanism. However, the superior performance comes at a cost, an O(n²) time and memory complexity, where n is the size of the input sequence. Because of this, it is computationally infeasible to feed large documents to the standard transformer. To overcome this limitation, a number of approaches have been proposed, which involve modifying the self-attention mechanism in interesting ways.
In this presentation, I will describe the transformer architecture, and specifically the self-attention mechanism, and then describe some of the approaches proposed to address the O(n²) complexity. Some of these approaches have also been implemented in the HuggingFace transformers library, and I will demonstrate some code for doing document level operations using one of these approaches.
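One family of such modifications restricts each token's attention to a sliding window of neighbors (as in Longformer). The sketch below computes this naively with a full masked matrix for clarity; real implementations never materialize the n-by-n score matrix, which is where the savings come from:

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Self-attention restricted to a sliding window of neighbors,
    the core idea behind O(n * w) sparse attention variants."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # mask out positions farther than `window` tokens away
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9                         # effectively -inf before softmax
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

Each token attends to at most 2 * window + 1 positions, so memory grows linearly in sequence length instead of quadratically.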
Question Answering as Search - the Anserini Pipeline and Other Stories - Sujit Pal
In the last couple of years, we have seen enormous breakthroughs in automated Open Domain Restricted Context Question Answering, also known as Reading Comprehension, where the task is to find an answer to a question from a single document or paragraph. A potentially more useful task is to find an answer for a question from a corpus representing an entire body of knowledge, also known as Open Domain Open Context Question Answering.
To do this, we adapted the BERTSerini architecture (Yang, et al., 2019), using it to answer questions about clinical content from our corpus of 5000+ medical textbooks. The BERTSerini pipeline consists of two components -- a BERT model fine-tuned for Question Answering, and an Anserini (Yang, Fang, and Lin, 2017) IR pipeline for Passage Retrieval. Anserini, in turn, consists of pluggable components for different kinds of query expansion and result reranking. Given a question, Anserini retrieves candidate passages, which the BERT model uses to retrieve the answer from. The best answer is determined using a combination of passage retrieval and answer scores.
Evaluating this system using a locally developed dataset of medical passages, questions, and answers, we adapted the BERT Question Answering component to our content using a combination of fine-tuning with third party SQuAD data, and pre-training the model using our medical content. However, when we replaced the canned passages with passages retrieved using the Anserini pipeline, performance dropped significantly, indicating that the relevance of the retrieved passages was a limiting factor.
The presentation will describe the actions taken to improve the relevance of passages returned by the Anserini pipeline.
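The combination of passage retrieval and answer scores mentioned above can be sketched as a linear interpolation, in the style of the BERTserini paper; the weight value used here is illustrative, not the tuned one:

```python
def combined_answer_score(retrieval_score, answer_score, mu=0.5):
    """Interpolate the passage retrieval score (Anserini) with the
    answer span score (BERT), BERTserini-style."""
    return (1.0 - mu) * retrieval_score + mu * answer_score

def best_answer(candidates, mu=0.5):
    """candidates: [(answer_text, retrieval_score, answer_score), ...]
    Picks the answer with the highest combined score."""
    return max(candidates, key=lambda c: combined_answer_score(c[1], c[2], mu))[0]
```

Setting mu toward 0 trusts the retriever; setting it toward 1 trusts the reader. The observation in the text that passage relevance was the limiting factor corresponds to the retrieval term dominating final answer quality.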
Building Named Entity Recognition Models Efficiently using NERDS - Sujit Pal
Named Entity Recognition (NER) is foundational for many downstream NLP tasks such as Information Retrieval, Relation Extraction, Question Answering, and Knowledge Base Construction. While many high-quality pre-trained NER models exist, they usually cover a small subset of popular entities such as people, organizations, and locations. But what if we need to recognize domain specific entities such as proteins, chemical names, diseases, etc? The Open Source Named Entity Recognition for Data Scientists (NERDS) toolkit, from the Elsevier Data Science team, was built to address this need.
NERDS aims to speed up development and evaluation of NER models by providing a set of NER algorithms that are callable through the familiar scikit-learn style API. The uniform interface allows reuse of code for data ingestion and evaluation, resulting in cleaner and more maintainable NER pipelines. In addition, customizing NERDS by adding new and more advanced NER models is also very easy, just a matter of implementing a standard NER Model class.
Our presentation will describe the main features of NERDS, then walk through a demonstration of developing and evaluating NER models that recognize biomedical entities. We will then describe a Neural Network based NER algorithm (a Bi-LSTM seq2seq model written in Pytorch) that we will then integrate into the NERDS NER pipeline.
We believe NERDS addresses a real need for building domain specific NER models quickly and efficiently. NER is an active field of research, and the hope is that this presentation will spark interest and contributions of new NER algorithms and Data Adapters from the community that can in turn help to move the field forward.
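To make the scikit-learn style API idea concrete, here is a toy dictionary-based NER model exposing the fit/predict contract. This illustrates the uniform-interface concept only; it is not actual NERDS code:

```python
class DictionaryNER:
    """Toy NER model with a scikit-learn style fit/predict interface,
    illustrating the uniform API idea behind NERDS."""

    def fit(self, token_lists, label_lists):
        # memorize each non-O token -> its label (last one seen wins)
        self.lexicon_ = {}
        for tokens, labels in zip(token_lists, label_lists):
            for tok, lab in zip(tokens, labels):
                if lab != "O":
                    self.lexicon_[tok.lower()] = lab
        return self

    def predict(self, token_lists):
        # look each token up in the memorized lexicon, default to "O"
        return [[self.lexicon_.get(t.lower(), "O") for t in tokens]
                for tokens in token_lists]
```

Because every model, from a dictionary matcher to a Bi-LSTM, exposes the same fit/predict shape, the surrounding ingestion and evaluation code can be reused unchanged, which is the maintainability win the text describes.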
Graph Techniques for Natural Language Processing - Sujit Pal
Natural Language embodies the human ability to make “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965). A relatively small number of words can be combined using a grammar in myriad different ways to convey all kinds of information. Languages model inter-relationships between their words, just like graphs model inter-relationships between their vertices. It is not surprising then, that graphs are a natural tool to study Natural Language and glean useful information from it, automatically, and at scale. This presentation will focus on NLP techniques to convert raw text to graphs, and present Graph Theory based solutions to some common NLP problems. Solutions presented will use Apache Spark or Neo4j depending on problem size and scale. Examples of Graph Theory solutions presented include PageRank for Document Summarization, Link Prediction from raw text for Knowledge Graph enhancement, Label Propagation for entity classification, and Random Walk techniques to find similar documents.
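As an example of one of these techniques, PageRank over a sentence-similarity graph (the heart of TextRank-style extractive summarization) can be sketched with plain power iteration; a production version would run on Spark or Neo4j as the text notes:

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency matrix, e.g. a graph
    whose nodes are sentences and whose edges are similarity links."""
    n = adj.shape[0]
    # column-normalize so each node distributes its score among neighbors
    out = adj.sum(axis=0, keepdims=True)
    out[out == 0] = 1.0                         # avoid division by zero for sinks
    transition = adj / out
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * transition @ scores
    return scores
```

For summarization, the highest-scoring nodes are the most central sentences, and those are extracted as the summary.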
Learning to Rank Presentation (v2) at LexisNexis Search Guild - Sujit Pal
An introduction to Learning to Rank, with case studies using RankLib with and without plugins provided by Solr and Elasticsearch. RankLib is a library of learning to rank algorithms, which includes some popular LTR algorithms such as LambdaMART, RankBoost, RankNet, etc.
Learning to Rank (LTR) presentation at RELX Search Summit 2018. Contains information about history of LTR, taxonomy of LTR algorithms, popular algorithms, and case studies of applying LTR using the TMDB dataset using Solr, Elasticsearch and without index support.
Search summit-2018-content-engineering-slides - Sujit Pal
Slides accompanying content engineering tutorial presented at RELX Search Summit 2018. Contains techniques for keyword extraction using various statistical, rule based and machine learning methods, keyword de-duplication using SimHash and Dedupe, and dimensionality reduction techniques such as Topic Modeling, NMF, Word vectors, etc.
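A minimal SimHash sketch, illustrating the de-duplication idea: near-duplicate texts produce fingerprints with small Hamming distance. The token hashing scheme here is illustrative, not necessarily the one used in the tutorial:

```python
import hashlib

def simhash(text, bits=64):
    """64-bit SimHash over a bag of tokens: each token's hash votes
    +1/-1 per bit position, and the sign of each tally sets that bit."""
    vec = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            vec[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if vec[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Because the fingerprint is built from a bag of token hashes, reordering words leaves it unchanged, and small edits flip only a few bits, which is what makes near-duplicate detection cheap.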
SoDA v2 - Named Entity Recognition from streaming text - Sujit Pal
Covers the services supported by SoDA v2. Includes some background on Named Entity Recognition and Resolution, popular approaches to Named Entity Recognition, hybrid approaches, scaling SoDA using Spark and Spark streaming, deployment strategies, etc.
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas... - Sujit Pal
Slides for talk at PyData Seattle 2017 about Matthew Honnibal's 4-step recipe for Deep Learning NLP pipelines. Description of the stages in pipeline as well as 3 examples of document classification, document similarity and sentence similarity. Examples include Keras custom layers for different types of attention.
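The four steps of the recipe can be sketched end to end in numpy. The encode step is an identity here (a BiLSTM or similar in practice), and all shapes and weights are illustrative rather than from the talk:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def embed_encode_attend_predict(token_ids, emb_table, att_w, out_w):
    """Minimal sketch of the 4-step recipe for text classification."""
    embedded = emb_table[token_ids]             # 1. embed: token ids -> vectors
    encoded = embedded                          # 2. encode: identity here; BiLSTM in practice
    weights = softmax(encoded @ att_w)          # 3. attend: one weight per token
    pooled = weights @ encoded                  #    reduce: matrix -> single vector
    return softmax(pooled @ out_w)              # 4. predict: class probabilities
```

The attend step is the interesting one: it learns which tokens matter when collapsing a variable-length matrix into a single fixed-size vector for the classifier.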
Developing Distributed High-performance Computing Capabilities of an Open Sci... - Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community have forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Enterprise Resource Planning System includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which drives towards enhancing productivity. Here are a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden,India, Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Prosigns: Transforming Business with Tailored Technology Solutions
Learning a Joint Embedding Representation for Image Search using Self-supervised means
1. April 2022
Sujit Pal (ORCID ID: https://orcid.org/0000-0002-6225-110X)
Learning a Joint Embedding Representation for Image Search using Self-supervised Means
Haystack Conference 2022
2. Who am I?
• Work for Elsevier Labs
• Technology Research Director
• Areas of interest: Search, NLP, ML
• Interested in improving search through Machine Learning
5. • Image Search
• Existing Approaches
− Text search on image captions
− Searching on image tags
• Embedding based search
• Unsupervised or self-supervised
Motivation
The Scream is the popular name given to a composition created by Norwegian Expressionist artist Edvard Munch in 1893. The agonised face in the painting has become one of the most iconic images of art, seen as symbolising the anxiety of the human condition.
6. • Distributional Hypothesis – words that are used or occur in similar contexts tend to purport similar meanings [Wikipedia]
• Word2Vec – a neural network learns weights that project words from a sparse high-dimensional vocabulary space to a dense low-dimensional (300-D) space.
• Words that have similar meaning tend to cluster together in this space.
Embeddings: Origin Story
Image Credits: Explainable Artificial Intelligence (Part II) – Model Interpretation Strategies (Sarkar, 2018)
7. • Same clustering effect is observed with images as well
• Unsupervised: MNIST images encoded using Sparse AutoEncoders and projected to 2D show clustering behavior.
• Supervised: Images encoded using pre-trained ImageNet networks, and encodings used as input to downstream classifiers. Needs (relatively) small amounts of training data for the task.
Embeddings for Image Search
Image Credits: Deep Learning of Part-based Representation of Data using Sparse Autoencoders with Non-negativity Constraints (Hosseini et al, 2015), Learning a multi-view weighted majority vote classifier (Goyal, 2018)
8. • Image Search == Image Similarity, i.e., a ranking problem
• Relevant results must appear above less-relevant results
• Classification Loss (cross-entropy) is better suited to learning hyperplanes that separate one class of images from another.
• Contrastive Loss is better suited for ranking images by similarity
• Pairwise loss – the embedding pushes positive pairs close together and negative pairs far apart.
The need for Contrastive Learning
Image Credit: Understanding Ranking Loss, Contrastive Loss, Margin Loss, Hinge Loss and all those confusing names (blog post by Raul Gomez)
9. • Uses Contrastive Loss instead of Cross-Entropy
• Given a pair of items x0 and x1 and a label y (1 if x0 and x1 are similar, 0 otherwise), contrastive loss is defined as:
L(x0, x1, y) = y · d(x0, x1)² + (1 − y) · max(0, m − d(x0, x1))²
where d(x0, x1) is the Euclidean distance between the embeddings.
• m is the margin for negative pairs. If dissimilar pairs are already separated by at least the margin, no additional effort is wasted trying to push them further apart.
Contrastive Learning
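The pairwise loss described on this slide can be illustrated with a short NumPy sketch (illustrative only – the function name is hypothetical and this is not the talk's training code); y = 1 marks a similar pair, and m is the margin:

```python
import numpy as np

def contrastive_loss(x0, x1, y, margin=1.0):
    """Pairwise contrastive loss: pull similar pairs together,
    push dissimilar pairs apart until they clear the margin."""
    d = np.linalg.norm(x0 - x1)                 # Euclidean distance between the pair
    pos = y * d ** 2                            # penalty for similar pairs being far apart
    neg = (1 - y) * max(0.0, margin - d) ** 2   # penalty for dissimilar pairs inside the margin
    return pos + neg

# Similar pair close together -> small loss
loss_pos = contrastive_loss(np.array([0.0, 0.0]), np.array([0.1, 0.0]), y=1)

# Dissimilar pair already beyond the margin -> zero loss (no wasted effort)
loss_neg = contrastive_loss(np.array([0.0, 0.0]), np.array([2.0, 0.0]), y=0)
```

Note how the `max(0, margin - d)` term goes to zero once a dissimilar pair is farther apart than the margin, which is exactly the behavior the slide describes.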
10. • No explicit labels
• Model learns from pairs of similar or dissimilar items
• Only similar pairs need to be provided; dissimilar pairs can generally be inferred
• Pairs can be multi-modal – for example, text + audio, text + video, and text + images.
• In our case: images + captions
Image Credit: Improving self-supervised representation learning by synthesizing challenging negatives (Kalantidis et al, 2020)
11. • My inspiration for this work.
• Trained on medical images and captions (self-supervised)
• Siamese network
− Image Encoder: ResNet50
− Text Encoder: BERT
• Minimizes bi-directional loss between positive image-text pairs
• ConVIRT image encoders uniformly did better on downstream tasks compared to previous approaches.
Previous Work: ConVIRT
Image Credit: Contrastive Learning of Medical Visual Representations from Paired Images and Texts (Zhang et al, 2020)
12. • Part of HuggingFace JAX/Flax community week event
• Part of a team of “non-work colleagues” from TWIML
• Meant to train on medical images, but another team was quicker, so trained on satellite images instead
• Placed third
• That’s how I learned about CLIP
• Blog post has more details
Fine tuning CLIP for a different domain
14. • Contrastive Language Image Pretraining
• Pretrained model, trained in self-supervised manner on 400M image-caption pairs from the Internet
• Trained on 256 GPUs for 2 weeks
• Matches original ResNet50 performance on ImageNet
• Good zero-shot performance on “natural images”
• Architecture:
− Text Encoder: Text Transformer
− Image Encoder: Vision Transformer
• Available on HuggingFace
Our Base Model: OpenAI CLIP
Image Credit: Learning Transferable Visual Models from Natural Language Supervision (Radford et al, 2021)
15. • Fine-tuning == extending the contrastive pretraining with additional image + caption pairs from the target domain
• For each batch of size N:
− N positive pairs
− N² − N negative pairs
• Trained using Contrastive Objective against positive pairs and sampling from in-batch negative pairs.
Fine-tuning CLIP
Image Credit: Learning Transferable Visual Models from Natural Language Supervision (Radford et al, 2021)
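The in-batch objective described on this slide can be sketched in NumPy (an illustrative sketch with hypothetical function names, not the actual fine-tuning code, which uses the HuggingFace CLIP model): for a batch of N aligned image/text embeddings, the diagonal of the N×N similarity matrix holds the N positives and the off-diagonal entries supply the N² − N negatives.

```python
import numpy as np

def clip_batch_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric in-batch contrastive loss over a batch of N
    aligned (image, caption) embedding pairs."""
    # L2-normalize so that dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (N, N): diagonal = positives
    labels = np.arange(len(logits))         # each image matches its own caption

    def xent(lg):
        # row-wise softmax cross-entropy against the diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # bi-directional: average image->text and text->image losses
    return (xent(logits) + xent(logits.T)) / 2

# Perfectly aligned batch -> near-zero loss
loss = clip_batch_loss(np.eye(4), np.eye(4))
```

The symmetric (bi-directional) averaging mirrors the bi-directional loss mentioned on the ConVIRT slide.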
16. • ImageCLEF 2017 Caption Prediction Task
− 164,614 training images and captions
− 10,000 validation images and captions
− 10,000 test images (without captions)
• Repurposed training and validation sets for fine-tuning on medical images + captions.
Dataset
17. • Ignore provided test split for training
• Split training dataset 90:10 into training and validation
• Make the validation split the new test split
• Final data split
− 148,153 training images + captions
− 16,461 validation images + captions
− 10,000 test images + captions
Dataset Preprocessing
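The 90:10 split described above can be sketched in plain Python (a minimal illustration with a hypothetical helper name; `pairs` stands in for the actual image/caption records, and the seed is arbitrary):

```python
import random

def split_train_valid(pairs, valid_frac=0.1, seed=42):
    """Shuffle image/caption pairs and split them into
    train and validation subsets (default 90:10)."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # deterministic shuffle for reproducibility
    n_valid = int(len(pairs) * valid_frac)
    return pairs[n_valid:], pairs[:n_valid]

# Applied to the 164,614 ImageCLEF training pairs this yields
# the 148,153 / 16,461 split quoted on the slide.
train, valid = split_train_valid(range(164614))
```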
18. • Criteria: Mean Reciprocal Rank (MRR) @k
• Baseline: evaluated pre-trained CLIP model OOB (before fine-tuning)
• Correct caption for image returned 43% of the time at the first position, and 57% of the time within the first 10 positions.
Evaluation
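MRR@k as used above can be computed with a few lines of plain Python (a minimal sketch, not the author's evaluation code): for each query, score 1/rank of the correct caption if it appears in the top k, else 0, and average over queries.

```python
def mrr_at_k(ranks, k=10):
    """Mean Reciprocal Rank at k.
    ranks: 1-based position of the correct caption per query,
    or None if the caption was not retrieved at all."""
    total = 0.0
    for rank in ranks:
        if rank is not None and rank <= k:
            total += 1.0 / rank  # reciprocal rank; 0 contribution if outside top k
    return total / len(ranks)

# correct caption at positions 1 and 3, and missing for a third query
score = mrr_at_k([1, 3, None], k=10)
```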
19. • Best training hyperparameters
− Batch size 64
− Optimizer ADAM
− Learning Rate 5e-5
− Number of epochs 10
− Number of training samples 50,000
• Regularization achieved by sampling a random subset of 50k samples from the 148k training dataset
• No image or text augmentation
• Training loss still falling, but validation loss flattening out – maybe still some room for improvement
Model Training
20. • Evaluated for multiple runs against the (new) test set.
• Results show significant improvement over baseline.
Training Results
22. • Encode images into L2-normalized vectors using the fine-tuned model
• Store encoded image vectors in a vector store
• At query time, the query text or image is encoded using the same fine-tuned model
• Vector store is searched for encodings that are “close” to the query vector
• Images and captions corresponding to the top K closest vectors are returned in query results
Image Search Architecture
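The flow above amounts to normalize-then-dot-product. A minimal exhaustive-search sketch in NumPy (hypothetical helper names; a production system would use an ANN index instead, and the encoders themselves are omitted):

```python
import numpy as np

def build_index(image_embeddings):
    """L2-normalize stored image vectors; with unit vectors,
    a dot product equals cosine similarity."""
    norms = np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    return image_embeddings / norms

def search(index, query_vec, top_k=5):
    """Exhaustive O(N) nearest-neighbor search: the query (text or image)
    is encoded with the same fine-tuned model, then ranked by similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                    # cosine similarity to every stored vector
    top = np.argsort(-scores)[:top_k]     # indices of the top_k closest vectors
    return top, scores[top]

# Toy index of three image vectors; query matches the first one exactly
index = build_index(np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]))
ids, scores = search(index, np.array([1.0, 0.0]), top_k=2)
```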
23. • Exhaustive Search – match the query vector to every vector in the database (O(N))
• Approximate Nearest Neighbors (ANN) – trade off accuracy for speed.
− Space partitioning (see blog post by Eyal Trabelsi)
o Using Forests (ANNOY)
o Using LSH
o Pre-clustering and matching centroids
o Quantization / Product Quantization
− Hierarchical Navigable Small World (HNSW) graphs – a hierarchy of probabilistic skip lists (see blog post by James Briggs)
Vector Search
24. • We chose Vespa as our vector store
• Hybrid search platform supports both text (BM25) and vector (HNSW) search, allowing easy comparison of legacy and new behavior
• Our demo application allows:
− Text to image search – using caption text, caption vector, and hybrid
− Image to image search – using image vector, and image vector + caption
Demo Application
26. • Ranking Profiles
− caption-search – bm25(caption_text)
− image-search – closeness(field, clip_vector)
− combined-search – bm25(caption_text) + closeness(field, clip_vector)
• Query Vector template needs to be defined in s/m/a/search/query-profiles/types/root.xml
• Managing Vespa – I used homegrown scripts (sujitpal/vespa-poc), but it is probably better to use the official vespa-cli tooling.
Vespa Configuration (cont’d)
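The three ranking profiles listed above could be expressed in the document schema roughly as follows (a hedged sketch based on the slide, not the exact configuration used in the demo; the field name `caption_text` and tensor field `clip_vector` are taken from the slide):

```
rank-profile caption-search {
    first-phase {
        expression: bm25(caption_text)
    }
}

rank-profile image-search {
    first-phase {
        expression: closeness(field, clip_vector)
    }
}

rank-profile combined-search {
    first-phase {
        expression: bm25(caption_text) + closeness(field, clip_vector)
    }
}
```

In Vespa, `closeness(field, clip_vector)` measures how close the query tensor is to the document's `clip_vector` field, which is what makes the hybrid `combined-search` profile a simple sum of the text and vector signals.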
28. • Demo link for clip-imageclef
− internal, need to be in Elsevier VPN
− http://10.169.23.142/clip-demo
• Demo link for clip-rsicd
− publicly available on HuggingFace spaces
− https://huggingface.co/spaces/sujitpal/clip-rsicd-demo
Demo Links
29. • Code for fine-tuning CLIP with ImageCLEF data and image search using Vespa – elsevierlabs-os/clip-image-search
Resources