Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
AI & BigData Online Day 2021
Website - https://aiconf.com.ua/
Youtube - https://www.youtube.com/startuplviv
FB - https://www.facebook.com/aiconf
Fine tune and deploy Hugging Face NLP models - OVHcloud
This webinar discusses fine-tuning and deploying Hugging Face NLP models. The agenda includes an overview of Hugging Face and NLP, a demonstration of fine-tuning a model, a demonstration of deploying a model in production, and a summary. Hugging Face is presented as the most popular open source NLP library with over 4,000 models. Fine-tuning models allows them to be adapted for specific tasks and domains and is more data efficient than training from scratch. OVHcloud is highlighted as providing tools for full AI workflows from storage and processing to training and deployment.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent... - Po-Chuan Chen
This paper proposes LLaMA-Adapter, a lightweight method to efficiently fine-tune the LLaMA language model into an instruction-following model. It uses learnable adaption prompts prepended to word tokens in higher transformer layers. Additionally, it introduces zero-initialized attention with a gating mechanism that incorporates instructional signals while preserving pre-trained knowledge. Experiments show LLaMA-Adapter can generate high-quality responses comparable to fully fine-tuned models, and it can be extended to multi-modal reasoning tasks.
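As a rough illustration of the zero-initialized gating idea (a sketch, not the authors' code; module and tensor names are hypothetical), the adapter's contribution is scaled by a learnable gate that starts at zero, so the frozen model's behavior is untouched at initialization:

    import torch
    import torch.nn as nn

    class ZeroInitGatedAdapter(nn.Module):
        """Sketch of LLaMA-Adapter-style gating: attention computed over
        learnable prompts is blended into the frozen attention output
        through a zero-initialized, learnable gate."""
        def __init__(self, num_prompts: int, dim: int):
            super().__init__()
            # The learnable adaption prompts (attended over upstream to
            # produce prompt_attn_out).
            self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
            self.gate = nn.Parameter(torch.zeros(1))  # gate starts closed

        def forward(self, frozen_attn_out, prompt_attn_out):
            # tanh(0) == 0, so at initialization the adapter adds nothing and
            # pre-trained knowledge is fully preserved; training opens the gate.
            return frozen_attn_out + torch.tanh(self.gate) * prompt_attn_out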
This document discusses techniques for fine-tuning large pre-trained language models without access to a supercomputer. It describes the history of transformer models and how transfer learning works. It then outlines several techniques for reducing memory usage during fine-tuning, including reducing batch size, gradient accumulation, gradient checkpointing, mixed precision training, and distributed data parallelism approaches like ZeRO and pipelined parallelism. Resources for implementing these techniques are also provided.
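Gradient accumulation, one of the techniques listed above, fits in a few lines of PyTorch; a minimal self-contained sketch with a toy model and data:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                       # stand-in for a large model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    # Toy loader: 32 micro-batches of size 4.
    loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(32)]

    accum_steps = 8                                # effective batch size = 8 * 4
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average
        loss.backward()                            # gradients accumulate in .grad
        if (step + 1) % accum_steps == 0:
            optimizer.step()                       # one update per accum_steps
            optimizer.zero_grad()

Only a micro-batch ever sits in memory, yet each weight update sees the gradient of the full effective batch.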
The document discusses recent developments in pre-trained language models including ELMO, ULMFiT, BERT, and GPT-2. It provides overviews of the core structures and implementations of each model, noting that they have achieved great performance on natural language tasks without requiring labeled data for pre-training, similar to how pre-training helps in computer vision tasks. The document also includes a comparison chart of the types of natural language tasks each model can perform.
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face" - Fwdays
In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from the combination of Transfer Learning schemes and Transformer architectures. The second part of the talk will be dedicated to an introduction of the open-source tools released by Hugging Face, in particular our transformers, tokenizers, and NLP libraries as well as our distilled and pruned models.
Reinventing Deep Learning with Hugging Face Transformers - Julien SIMON
The document discusses how transformers have become a general-purpose architecture for machine learning, with various transformer models like BERT and GPT-3 seeing widespread adoption. It introduces Hugging Face as a company working to make transformers more accessible through tools and libraries. Hugging Face has seen rapid growth, with its hub hosting over 73,000 models and 10,000 datasets that are downloaded over 1 million times daily. The document outlines Hugging Face's vision of facilitating the entire machine learning process from data to production through tools that support tasks like transfer learning, hardware acceleration, and collaborative model development.
An introduction to computer vision with Hugging Face - Julien SIMON
In this code-level talk, Julien will show you how to quickly build and deploy computer vision applications based on Transformer models. Along the way, you'll learn about the portfolio of open source and commercial Hugging Face solutions, and how they can help you deliver high-quality solutions faster than ever before.
An introduction to the Transformers architecture and BERT - Suman Debnath
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures and is mostly used for natural language processing (NLP) tasks. Ever since its advent, the transformer has replaced RNNs and LSTMs for various tasks. The transformer also created a major breakthrough in the field of NLP and paved the way for new revolutionary architectures such as BERT.
PR-231: A Simple Framework for Contrastive Learning of Visual Representations - Jinwon Lee
The document presents SimCLR, a framework for contrastive learning of visual representations using simple data augmentation. Key aspects of SimCLR include using random cropping and color distortions to generate positive sample pairs for the contrastive loss, a nonlinear projection head to learn representations, and large batch sizes. Evaluation shows SimCLR learns representations that outperform supervised pretraining on downstream tasks and achieves state-of-the-art results with only view augmentation and contrastive loss.
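The core of SimCLR is the NT-Xent contrastive loss over a batch of paired augmented views; a minimal sketch, assuming z1 and z2 are the (N, d) projection-head outputs of the two views of the same N images:

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, tau=0.5):
        """Normalized-temperature cross-entropy over the 2N augmented views:
        for each view, its partner is the positive, all others negatives."""
        z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, d) unit vectors
        sim = z @ z.t() / tau                         # cosine similarity / temp
        sim.fill_diagonal_(float('-inf'))             # exclude self-similarity
        n = z1.size(0)
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)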
1) Transformers use self-attention to address RNN weaknesses such as vanishing gradients and the lack of parallelization, combining ideas from CNNs with attention (a minimal self-attention sketch follows this list).
2) Transformers have encoder and decoder blocks. The encoder models the input and the decoder models the output. Variations drop the encoder (GPT) or the decoder (BERT) for language modeling.
3) GPT-3 is a large Transformer with 175B parameters that can perform many NLP tasks but still has safety and bias issues.
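A minimal single-head scaled dot-product self-attention sketch (illustrative shapes only), showing the mechanism named in point 1:

    import torch
    import torch.nn.functional as F

    def self_attention(x, wq, wk, wv):
        """x: (seq_len, d_model); wq/wk/wv: (d_model, d_k) projections."""
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = q @ k.t() / (k.size(-1) ** 0.5)  # (seq_len, seq_len)
        weights = F.softmax(scores, dim=-1)       # every token attends to all
        return weights @ v                        # no recurrence: parallelizable

    x = torch.randn(5, 16)                        # 5 tokens, d_model = 16
    wq, wk, wv = (torch.randn(16, 8) for _ in range(3))
    out = self_attention(x, wq, wk, wv)           # (5, 8)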
Building NLP applications with TransformersJulien SIMON
The document discusses how transformer models and transfer learning (Deep Learning 2.0) have improved natural language processing by allowing researchers to easily apply pre-trained models to new tasks with limited data. It presents examples of how HuggingFace has used transformer models for tasks like translation and part-of-speech tagging. The document also discusses tools from HuggingFace that make it easier to train models on hardware accelerators and deploy them to production.
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec - 👋 Christopher Moody
This document summarizes the lda2vec model, which combines aspects of word2vec and LDA. Word2vec learns word embeddings based on local context, while LDA learns document-level topic mixtures. Lda2vec models words based on both their local context and global document topic mixtures to leverage both approaches. It represents documents as mixtures over sparse topic vectors similar to LDA to maintain interpretability. This allows it to predict words based on local context and global document content.
Neural Language Generation Head to Toe - Hady Elsahar
This is a gentle, intuitive introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and sequence probabilities, to recurrent neural networks, to the large Transformer models you have seen in the news, such as GPT-2/GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
The release of TensorFlow 2.0 comes with a significant number of improvements over its 1.x version, all with a focus on ease of usability and a better user experience. We will give an overview of what TensorFlow 2.0 is and discuss how to get started building models from scratch using TensorFlow 2.0's high-level API, Keras. We will walk through an example step-by-step in Python of how to build an image classifier. We will then showcase how to leverage transfer learning to make building a model even easier! With transfer learning, we can leverage models pretrained on datasets such as ImageNet to drastically reduce the training time of our model. TensorFlow 2.0 makes this incredibly simple to do.
OpenAI's GPT-3 Language Model - guest Steve Omohundro (Numenta)
In this research meeting, guest Stephen Omohundro gave a fascinating talk on GPT-3, the new massive OpenAI Natural Language Processing model. He reviewed the network architecture, training process, and results in the context of past work. There was extensive discussion on the implications for NLP and for Machine Intelligence / AGI.
Link to GPT-3 paper: https://arxiv.org/abs/2005.14165
Link to YouTube recording of Steve's talk: https://youtu.be/0ZVOmBp29E0
This document is a slide presentation on recent advances in deep learning. It discusses self-supervised learning, which involves using unlabeled data to learn representations by predicting structural information within the data. The presentation covers pretext tasks, invariance-based approaches, and generation-based approaches for self-supervised learning in computer vision and natural language processing. It provides examples of specific self-supervised methods like predicting image rotations, clustering representations to generate pseudo-labels, and masked language modeling.
The document summarizes the Batch Normalization technique presented in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". Batch Normalization aims to address the issue of internal covariate shift in deep neural networks by normalizing layer inputs to have zero mean and unit variance. It works by computing normalization statistics for each mini-batch and applying them to the inputs. This helps in faster and more stable training of deep networks by reducing the distribution shift across layers. The paper presented ablation studies on MNIST and ImageNet datasets showing Batch Normalization improves training speed and accuracy compared to prior techniques.
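In code, the training-time transform described above is a per-feature normalization over the mini-batch followed by a learnable scale and shift; a minimal sketch (at inference, running statistics replace the batch statistics):

    import torch

    def batch_norm(x, gamma, beta, eps=1e-5):
        """x: (batch, features). Normalize each feature over the mini-batch
        to zero mean and unit variance, then scale (gamma) and shift (beta)."""
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        return gamma * x_hat + beta

    x = torch.randn(32, 8)                        # mini-batch of 32, 8 features
    y = batch_norm(x, torch.ones(8), torch.zeros(8))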
The document describes the sequence-to-sequence (seq2seq) model with an encoder-decoder architecture. It explains that the seq2seq model uses two recurrent neural networks - an encoder RNN that processes the input sequence into a fixed-length context vector, and a decoder RNN that generates the output sequence from the context vector. It provides details on how the encoder, decoder, and training process work in the seq2seq model.
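A toy PyTorch sketch of that encoder-decoder structure (hypothetical sizes, teacher-forced decoding): the encoder's final hidden state is the fixed-length context that conditions the decoder.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, hidden=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, hidden)
            self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
            self.encoder = nn.GRU(hidden, hidden, batch_first=True)
            self.decoder = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)

        def forward(self, src, tgt):
            _, context = self.encoder(self.src_emb(src))  # fixed-length context
            dec_out, _ = self.decoder(self.tgt_emb(tgt), context)
            return self.out(dec_out)                      # next-token logits

    model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
    logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1200, (2, 5)))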
The document discusses the BERT model for natural language processing. It begins with an introduction to BERT and how it achieved state-of-the-art results on 11 NLP tasks in 2018. The document then covers related work on language representation models including ELMo and GPT. It describes the key aspects of the BERT model, including its bidirectional Transformer architecture, pre-training using masked language modeling and next sentence prediction, and fine-tuning for downstream tasks. Experimental results are presented showing BERT outperforming previous models on the GLUE benchmark, SQuAD 1.1, SQuAD 2.0, and SWAG. Ablation studies examine the importance of the pre-training tasks and the effect of model size.
Interpretability beyond feature attribution: quantitative testing with concept... - MLconf
TCAV is a method for interpreting machine learning models by quantitatively measuring the importance of user-chosen concepts for a model's predictions, even if those concepts were not part of the model's training data or input features. It does this by learning concept activation vectors (CAVs) that represent concepts and using the CAVs to calculate a model's sensitivity or importance to each concept via directional derivatives. TCAV was shown to validate ground truths from sanity check experiments, uncover geographical biases in widely used models, and match domain expert concepts for diabetic retinopathy versus those a model may use, helping ensure models' values and knowledge are properly aligned and reflected.
Introduction to Transformers for NLP - Olga Petrova (Alexey Grigorev)
Olga Petrova gives an introduction to transformers for natural language processing (NLP). She begins with an overview of representing words using tokenization, word embeddings, and one-hot encodings. Recurrent neural networks (RNNs) are discussed as they are important for modeling sequential data like text, but they struggle with long-term dependencies. Attention mechanisms were developed to address this by allowing the model to focus on relevant parts of the input. Transformers use self-attention and have achieved state-of-the-art results in many NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) provides contextualized word embeddings trained on large corpora.
Yurii Pashchenko: Tips and tricks for building your own automated visual data... - Lviv Startup Club
The document discusses tips and tricks for building automated visual data annotation systems. It covers key topics like what data annotation is, why it is important, challenges in the annotation process, popular computer vision libraries, and techniques like zero-shot classification, few-shot semantic segmentation, and open-vocabulary object detection that can help build automated annotation systems. The document provides examples of how language models like CLIP can be used for tasks like zero-shot classification and relevant image search to help reduce the need for large labeled datasets.
Multimodal foundation models are a revolutionary class of AI models that provide impressive abilities to generate multimedia content, and do so via interactive prompts in a seemingly creative manner. These foundation models are often self-supervised transformer-based models pre-trained on large volumes of data, typically collected from the web. They already form the basis of all state-of-the-art systems in computer vision and natural language processing across a wide range of tasks and have shown impressive transfer learning abilities. Despite their immense potential, these foundation models face challenges in fundamental perception tasks such as spatial grounding and temporal reasoning, have difficulty operating in low-resource scenarios, and neglect human alignment for ethical, legal, and societal acceptance. In this talk I will highlight recent work from my lab that identifies several of these challenges as well as ways to update foundation models to address these challenges, and to do so in a sustainable way, without the need to retrain from scratch.
Filip Panjevic is a Co-Founder and CTO at ydrive.ai, a startup working on self-driving cars, and one of the founders of the Petnica Machine Learning School.
Filip's talk will focus on the story of the Petnica School: how it started, what has changed since the beginning, what the school's concept looks like now, and why that concept is good at producing new data scientists. This talk is perfect for people who are considering starting a career in the data science field!
Learning visual representation without human label - Kai-Wen Zhao
Self-supervised learning (SSL) is one of the fastest-growing research topics of recent years. SSL provides algorithms that learn visual representations directly from the data itself rather than from manual human labels. From a theoretical point of view, SSL explores information theory and the nature of large-scale datasets.
Deep learning applications in e-commerce search: Dynamic talks Chicago 3/14/2019 - Grid Dynamics
In this talk, we will discuss how recent advancements in artificial intelligence are transforming traditional search technologies. Deep learning-based image analysis and natural language processing open exciting new horizons for search and recommendation applications across the industries. We’ll talk about how deep learning models can help conventional search engines to achieve better relevance. We will share our experience implementing innovative solutions for online retailers, finance and high tech customers.
The document summarizes Assaf Mushinsky's presentation at CVPR 2017. Some key points:
- He discussed state-of-the-art research in object detection, segmentation, pose estimation, and network architectures from papers presented at CVPR 2017.
- Papers presented efficient object detection methods that improved speed and accuracy trade-offs like YOLO9000 and Feature Pyramid Networks. Mask R-CNN was discussed for instance segmentation and pose estimation.
- New network architectures like Densely Connected Networks, Xception, and ResNeXt were covered that improved accuracy and efficiency over ResNet and Inception.
- The presentation highlighted recent advances in computer vision from the CVPR conference but did not cover older
This is an intensive meetup at Samsung Next IL covering the most interesting papers presented at CVPR 2017 last month. It is a good opportunity to get an overview of recent advancements in the field of deep learning with applications to computer vision.
The following topics are covered:
• Object detection
• Pose estimation
• Efficient networks
Vector search goes far beyond just text, and in this interactive workshop you will learn how to use it for multimodal search through an in-depth look at CLIP, a vision-and-language model developed by OpenAI. Sujit Pal, technology research director at Elsevier, and Raphael Pisoni, senior computer vision engineer at Partium.io, will walk you through two applications of image search and then have a panel discussion with our staff developer advocate, James, on how to use CLIP for image and text search.
Lessons learned from building practical deep learning systems - Xavier Amatriain
1. There are many lessons to be learned from building practical deep learning systems, including choosing the right evaluation metrics, being thoughtful about your data and potential biases, and understanding dependencies between data, models, and systems.
2. It is important to optimize only what matters and beware of biases in your data. Simple models are often better than complex ones, and feature engineering is crucial.
3. Both supervised and unsupervised learning are important, and ensembles often perform best. Your AI infrastructure needs to support both experimentation and production.
One of the most popular buzzwords nowadays in the technology world is "Machine Learning (ML)." Most economists and business experts foresee machine learning changing every aspect of our lives in the next 10 years by automating and optimizing processes. This is leading many organizations to seek experts who can implement machine learning in their businesses.
The paper is written for statistical programmers who want to explore a machine learning career, add machine learning skills to their experience, or enter the machine learning field. It discusses my personal journey from statistical programmer to Machine Learning Engineer: what motivated me to start a machine learning career, how I started it, and what I have learned and done to become a Machine Learning Engineer. In addition, the paper discusses the future of machine learning in the pharmaceutical industry, especially in biometrics departments.
This document summarizes Kevin McGuinness' presentation on deep learning for computer vision. It discusses visual attention models and their ability to predict eye gaze, applications in image cropping, retrieval and classification. It also covers medical image analysis using deep learning for knee osteoarthritis grading and neonatal brain segmentation. Deep crowd analysis is examined for crowd counting. Finally, interactive deep vision for image segmentation using user interactions is presented.
Deep Representation: Building a Semantic Image Search Engine - C4Media
This document discusses building a semantic image search engine through deep representation learning. It describes using a pre-trained convolutional neural network to extract image embeddings, which are then used to build an index for fast proximity search of similar images. The model is further trained to predict word embeddings for images, allowing image search to be guided by text queries beyond the initial training classes. This provides a way to build a joint image-text embedding space and enable generalized image search using only a small dataset for training.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-warden
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Pete Warden, Google research engineer and the tech lead of the TensorFlow Mobile and Embedded team, presents the "Solving Vision Tasks Using Deep Learning: An Introduction" tutorial at the May 2018 Embedded Vision Summit.
This talk introduces deep learning for vision tasks. It provides an overview of deep learning, explores its weaknesses and strengths, and highlights best approaches to applying deep learning to solving vision problems. The audience will learn to think about vision problems from a different perspective, understand what questions to ask, and discover where to find the answers to these questions. The talk will conclude with insights on the challenges of deploying deep learning solutions on mobile devices.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-brailovskiy
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Ilya Brailovskiy, Principal Engineer at Amazon Lab126, presents the "How Image Sensor and Video Compression Parameters Impact Vision Algorithms" tutorial at the May 2017 Embedded Vision Summit.
Recent advances in deep learning algorithms have brought automated object detection and recognition to human accuracy levels on various test datasets. But algorithms that work well on an engineer’s PC often fail when deployed as part of a complete embedded system. In this talk, Brailovskiy examines some of the key embedded vision system elements that can degrade the performance of vision algorithms.
For example, in many systems video is compressed, transmitted, and then decompressed before being presented to vision algorithms. Not surprisingly, video encoding parameters, such as bit rate, can have a significant impact on vision algorithm accuracy. Similarly, image sensor parameters can have a profound effect on the nature of the images captured, and therefore on the performance of vision algorithms. He explores how image sensor and video compression parameters impact vision algorithm performance, and discusses methods for selecting the best parameters to aid vision algorithm accuracy.
How to use transfer learning to bootstrap image classification and question a... - Wee Hyong Tok
1. The presentation discusses how to use transfer learning to bootstrap image classification and question answering tasks. Transfer learning allows leveraging knowledge from existing models trained on large datasets and applying it to new tasks with less data.
2. For image classification, the presentation recommends using features from pretrained convolutional neural networks on ImageNet as general purpose image features. Fine-tuning the top layers of these networks on smaller datasets can achieve good accuracy.
3. For natural language processing tasks, transfer learning techniques like using pretrained word embeddings, language models like ULMFiT and ELMo, and models trained on question answering datasets can help bootstrap tasks with less text data.
This document discusses applying GRASP (General Responsibility Assignment Software Patterns) principles to object-oriented design. It introduces five GRASP patterns: Creator, Information Expert, Controller, Low Coupling and High Cohesion. These patterns provide guidance on assigning responsibilities to classes in a way that promotes qualities like low coupling, high cohesion, and encapsulation. The document uses a board game example to illustrate applying the Creator and Information Expert patterns in UML class diagrams and sequence diagrams.
Product matching is the challenge of examining two different representations of retail products (think items that you see on e-commerce websites) and determining whether they both refer to the same product. Tackling this problem requires a mix of NLP (to deal with text data), computer vision (to deal with product images), ontology management and more (to ingest a host of other signals on offer).
I’ve been working on this problem in various capacities for a few years now at Semantics3. During this period, I’ve made a fair number of mistakes which in turn have taught me useful lessons about applying deep/machine learning in an industry setting.
During this talk, I’d like to walk you through 5 specific scenarios in which I attempted to achieve a specific goal in the context of product matching, but ran into an unexpected problem that threw a spanner in the works. I’ll then talk about the root cause that sprouted the problem in the first place and the lesson I learned having made this discovery. Where relevant, I’ll bring in examples from outside the retail domain to broaden the perspective offered.
The goal of the talk isn’t to provide a guidebook for solving the product matching problem — the goal is to give you insight into the ups and downs of working through a specific data-science problem, and in the process, delivering packaged lessons that you could potentially draw on in your own field of work.
Similar to Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI (20)
Artem Bykovets: Why don't people suddenly become cross-functional, even though we have Agile? (UA) - Lviv Startup Club
Artem Bykovets: Why don't people suddenly become cross-functional, even though we have Agile? (UA)
Kyiv PMDay 2024 Summer
Website – www.pmday.org
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/pmdayconference
Natalia Renska & Roman Astafiev: Narcissists and psychopaths in organizations. How this affects product development and the implementation of innovative solutions (UA) - Lviv Startup Club
Natalia Renska & Roman Astafiev: Narcissists and psychopaths in organizations. How this affects product development and the implementation of innovative solutions (UA)
Kyiv PMDay 2024 Summer
Website – www.pmday.org
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/pmdayconference
Igor Protsenko: Difference between outsourcing and product companies for product managers and related challenges (UA) - Lviv Startup Club
Igor Protsenko: Difference between outsourcing and product companies for product managers and related challenges (UA)
Kyiv PMDay 2024 Summer
Website – www.pmday.org
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/pmdayconference
Kseniya Leshchenko: Shared development support service model as the way to make small projects with small budgets profitable for the company (UA) - Lviv Startup Club
Kseniya Leshchenko: Shared development support service model as the way to make small projects with small budgets profitable for the company (UA)
Kyiv PMDay 2024 Summer
Website – www.pmday.org
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/pmdayconference
Anna Kompanets: Problems of project implementation you would never have thought of (UA) - Lviv Startup Club
Anna Kompanets: Problems of project implementation you would never have thought of (UA)
Kyiv PMDay 2024 Summer
Website – www.pmday.org
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/pmdayconference
Anton Hlazkov: Is implementing change a process or a project? Why it is important to understand the difference and how it affects the result (UA) - Lviv Startup Club
Anton Hlazkov: Is implementing change a process or a project? Why it is important to understand the difference and how it affects the result (UA)
Kyiv PMDay 2024 Summer
Website – www.pmday.org
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/pmdayconference
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake - Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
State of Artificial Intelligence Report 2023 - kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
1. Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko, AI&BigData Online Day 2021
Yurii Pashchenko
Sr ML Engineer at Depositphotos
2. About me
❏ Yurii Pashchenko
❏ Sr Machine Learning Engineer at Depositphotos
❏ Over 8 years of research and commercial experience in applying Deep Learning models
❏ Object Detection Specialist
❏ Knowledge Sharing Master at Transformer (* at least, I want to become one)
3. Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
4. What is Zero-Shot Learning
Understanding Zero-Shot Learning — Making ML More Human
5. Motivation for CLIP from OpenAI
● Costly datasets
● Narrow
● Poor real-world performance
CLIP: Connecting Text and Images
6. CLIP: Contrastive Language-Image Pre-training
Learning Transferable Visual Models From Natural Language Supervision
● 400 million (image, text) pairs collected from the Internet
● Trained modifications of ResNet-50 and ViT-B
● Batch size of 32,768 for 32 epochs
● The largest ResNet model, RN50x64, took 18 days to train on 592 V100 GPUs, while the largest Vision Transformer took 12 days on 256 V100 GPUs
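The objective behind these training runs is a symmetric contrastive loss over the in-batch image-text similarity matrix; a minimal sketch (the real model learns the temperature rather than fixing it):

    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
        """N (image, text) pairs: the diagonal holds positives, the rest
        of the N x N similarity matrix serves as in-batch negatives."""
        image_emb = F.normalize(image_emb, dim=1)
        text_emb = F.normalize(text_emb, dim=1)
        logits = image_emb @ text_emb.t() / temperature   # (N, N)
        targets = torch.arange(logits.size(0))
        loss_i = F.cross_entropy(logits, targets)         # image -> text
        loss_t = F.cross_entropy(logits.t(), targets)     # text -> image
        return (loss_i + loss_t) / 2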
7. Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
8. CLIP for Zero-Shot Classification
Learning Transferable Visual Models From Natural Language Supervision
Ensembling around 80 prompts improves ImageNet accuracy by almost 5%
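A minimal sketch of zero-shot classification with prompt ensembling, using OpenAI's open-source clip package; the label set, templates, and image path below are made-up examples (the paper's full ensemble uses roughly 80 templates):

    import torch
    import clip  # pip install git+https://github.com/openai/CLIP.git
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    classes = ["dog", "cat", "car"]
    templates = ["a photo of a {}.", "a blurry photo of a {}.", "a drawing of a {}."]

    with torch.no_grad():
        # Average text embeddings over templates: one classifier vector per class.
        weights = []
        for c in classes:
            tokens = clip.tokenize([t.format(c) for t in templates]).to(device)
            emb = model.encode_text(tokens)
            emb = emb / emb.norm(dim=-1, keepdim=True)
            weights.append(emb.mean(dim=0))
        weights = torch.stack(weights)
        weights = weights / weights.norm(dim=-1, keepdim=True)

        image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
        img_emb = model.encode_image(image)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_emb @ weights.t()).softmax(dim=-1)

    print(dict(zip(classes, probs[0].tolist())))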
11. CLIP Zero-Shot vs Few-Shot
Learning Transferable Visual Models From Natural Language Supervision
12. CLIP on FairFace
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation
CLIP has a top-1 accuracy of 59.2% for "in the wild" celebrity image classification when choosing from 100 candidates, and a top-1 accuracy of 43.3% when choosing from 1,000 possible choices
13. Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
14. CLIP for Image Ranking
DALL·E: Creating Images from Text
“an armchair in the shape of an avocado”
"a living room with two white armchairs and a painting of the colosseum. the painting is mounted above a modern fireplace"
15. CLIP for Image Search
Text-to-Image
Unsplash Image Search
16. CLIP for Image Search
Image-to-Image
Unsplash Image Search
17. CLIP for Image Search
Text+Text-to-Image
Unsplash Image Search
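All three search modes reduce to nearest-neighbor lookup in CLIP's shared embedding space; a minimal text-to-image sketch (file paths and the query are made up), where image-to-image or text+text search simply swaps in, or averages, other embeddings as the query vector:

    import torch
    import clip
    from PIL import Image

    model, preprocess = clip.load("ViT-B/32", device="cpu")  # CPU for simplicity
    paths = ["img1.jpg", "img2.jpg", "img3.jpg"]             # hypothetical corpus

    with torch.no_grad():
        corpus = torch.cat([model.encode_image(preprocess(Image.open(p)).unsqueeze(0))
                            for p in paths])
        corpus = corpus / corpus.norm(dim=-1, keepdim=True)  # precompute once

        query = model.encode_text(clip.tokenize(["two dogs running on the beach"]))
        query = query / query.norm(dim=-1, keepdim=True)

    ranking = (query @ corpus.t()).squeeze(0).argsort(descending=True)
    print([paths[i] for i in ranking])                       # best match first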
19. Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
20. CLIP limitations
Learning Transferable Visual Models From Natural Language Supervision
● poor generalization to images not covered in its pre-training dataset (MNIST)
21. CLIP limitations
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Attention-visualization examples (from this Colab) for the text prompts "an elephant", "a zebra", "a lake"
22. CLIP limitations
Learning Transferable Visual Models From Natural Language Supervision
● poor generalization to images not covered in its pre-training dataset (MNIST)
● counting the number of objects in an image
● predicting how close the nearest object is in a photo
● CLIP's zero-shot classifiers can be sensitive to wording or phrasing and sometimes require trial-and-error "prompt engineering" to perform well
23. Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
24. You can't just make an Object Detector from a Classifier… without fine-tuning
25. Assembling Object Detector with CLIP
Rich feature hierarchies for accurate object detection and semantic segmentation
(diagram: cropped region proposals go to the CLIP image encoder, while class names such as "person" go to the text encoder)
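A hedged sketch of this assembly: take class-agnostic boxes from any proposal method (the boxes below are placeholders), crop them, and let CLIP zero-shot classify each crop:

    import torch
    import clip
    from PIL import Image

    model, preprocess = clip.load("ViT-B/32", device="cpu")
    labels = ["person", "dog", "background"]
    image = Image.open("street.jpg")                          # hypothetical input
    proposals = [(10, 20, 200, 300), (150, 40, 400, 380)]     # (l, t, r, b) boxes

    with torch.no_grad():
        text = model.encode_text(clip.tokenize([f"a photo of a {l}" for l in labels]))
        text = text / text.norm(dim=-1, keepdim=True)
        for box in proposals:
            crop = preprocess(image.crop(box)).unsqueeze(0)   # classify each region
            emb = model.encode_image(crop)
            emb = emb / emb.norm(dim=-1, keepdim=True)
            probs = (100.0 * emb @ text.t()).softmax(dim=-1)
            print(box, labels[probs.argmax()], round(probs.max().item(), 3))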
26. Region proposals alternatives
Salient Object Detection Techniques in Computer Vision—A Survey
Salient object detection (SOD) is an important computer vision task aimed at precise detection and segmentation of visually distinctive image regions from the perspective of the human visual system.
27. Region proposals alternatives
Open-World Entity Segmentation
Entity Segmentation is a segmentation task with the aim to segment everything in an image into semantically-meaningful regions, without considering any category labels.
28. Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
29. What is knowledge distillation?
Knowledge Distillation : Simplified
Knowledge distillation refers to the idea of model compression by teaching a smaller network, step by step, exactly what to do using a bigger, already-trained network.
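The classic (Hinton-style) distillation objective can be sketched as a temperature-softened KL term against the teacher plus the usual hard-label loss:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Match the teacher's softened distribution (weight alpha) and the
        ground-truth labels (weight 1 - alpha); T*T rescales soft gradients."""
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard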
37. Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
43. Thank you for your attention!
Yurii Pashchenko AI&BigData Online Day 2021
Yurii Pashchenko
Sr ML Engineer at Depositphotos
yurii_pas
george.pashchenko@gmail.com