The document presents SimCLR, a framework for contrastive learning of visual representations using simple data augmentation. Key aspects of SimCLR include using random cropping and color distortions to generate positive sample pairs for the contrastive loss, a nonlinear projection head to learn representations, and large batch sizes. Evaluation shows SimCLR learns representations that outperform supervised pretraining on downstream tasks and achieves state-of-the-art results with only view augmentation and contrastive loss.
A Simple Framework for Contrastive Learning of Visual RepresentationsSeunghyun Hwang
Review : A Simple Framework for Contrastive Learning of Visual Representat
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
안녕하세요 TensorFlow Korea 논문 읽기 모임 PR-12의 330번째 논문 리뷰입니다.
오늘은 무려 5만개의 학습된 ViT model을 제공하는 구글스러운 논문을 리뷰해보았습니다. ViT가 CNN을 조금씩 대체해가고 있는데요, ViT는 CNN과 달리 inductive bias가 적은 관계로
좋은 성능을 위해서는 굉장히 많은 data가 필요하거나, augmentation과 regularization을 많이 써줘야 합니다.
그런데 이렇게 다양한 경우 즉 다양한 data, 다양한 model size, 다양한 augmentation 방법, 다양한 regularization, 다양한 data size 등등에 따른 ViT의 성능과 속도 등의 비교 분석 실험이 지금까지는 없었죠.
이 논문에서는 그 어려운 걸(?) 해냈습니다. 그리고 수많은 ViT를 이용해 실험을 하면서 몇가지 중요한 finding들을 찾았습니다.
요약하면 다음과 같습니다.
1. augmentation과 regularization을 잘 쓰면 1/10의 data로도 전체 data 다 쓴거랑 대부분 비슷한 성능을 낼 수 있다. 그런데 항상 그런건 아니다.
반대로 말하면 data가 10배 있으면 augmentation이나 regularization안 쓰고도 좋은 성능을 낼 수 있다.
2. downstream task 학습할 때 scratch부터 학습하는거랑 large dataset으로 pre-trained한 걸 이용해서 transfer learning하는 건 후자가 좋다.
3. transfer learning 할 때도 pre-trained model 중에 data 많이 써서 학습한게 더 좋다.
4. augmentation/regularization은 data가 많으면 별 도움이 안되고 둘 중에는 augmenation이 더 좋다.
5. pre-trained model이 많을 때 model을 고르는 방법은 그냥 upstream에서 제일 잘됐던 걸 고르면 얼추 잘된다.
6. 속도를 빠르게 하고 싶을 때는 model을 작은거 쓰지말고 patch size를 키워라. 그래야 성능이 별로 안떨어진다.
입니다.
흥미로운 결과들이 많으니 자세한 내용은 아래 영상을 참고해주세요!
감사합니다!
영상링크: https://youtu.be/A3RrAIx-KCc
논문링크: https://arxiv.org/abs/2106.10270
Comparing Incremental Learning Strategies for Convolutional Neural NetworksVincenzo Lomonaco
In the last decade, Convolutional Neural Networks (CNNs) have shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameters tuning, CNNs have been scarcely studied in the context of incremental learning where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN based architectures, targeting real-word applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Knowledge distillation aims at transferring “knowledge” acquired in one model (teacher) to another model (student) that is typically smaller.
Previous approaches can be expressed as a form of training the student with output activations of data examples represented by the teacher.
We introduce a novel approach, dubbed relational knowledge distillation (Relational KD), that transfers relations among data examples represented by the teacher.
As concrete realizations of Relational KD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations.
Experiments conducted on different benchmark tasks show that the Relational KD improves the performance of the educated student networks with a significant margin, and even outperforms the teacher’s performance.
A Simple Framework for Contrastive Learning of Visual RepresentationsSeunghyun Hwang
Review : A Simple Framework for Contrastive Learning of Visual Representat
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
안녕하세요 TensorFlow Korea 논문 읽기 모임 PR-12의 330번째 논문 리뷰입니다.
오늘은 무려 5만개의 학습된 ViT model을 제공하는 구글스러운 논문을 리뷰해보았습니다. ViT가 CNN을 조금씩 대체해가고 있는데요, ViT는 CNN과 달리 inductive bias가 적은 관계로
좋은 성능을 위해서는 굉장히 많은 data가 필요하거나, augmentation과 regularization을 많이 써줘야 합니다.
그런데 이렇게 다양한 경우 즉 다양한 data, 다양한 model size, 다양한 augmentation 방법, 다양한 regularization, 다양한 data size 등등에 따른 ViT의 성능과 속도 등의 비교 분석 실험이 지금까지는 없었죠.
이 논문에서는 그 어려운 걸(?) 해냈습니다. 그리고 수많은 ViT를 이용해 실험을 하면서 몇가지 중요한 finding들을 찾았습니다.
요약하면 다음과 같습니다.
1. augmentation과 regularization을 잘 쓰면 1/10의 data로도 전체 data 다 쓴거랑 대부분 비슷한 성능을 낼 수 있다. 그런데 항상 그런건 아니다.
반대로 말하면 data가 10배 있으면 augmentation이나 regularization안 쓰고도 좋은 성능을 낼 수 있다.
2. downstream task 학습할 때 scratch부터 학습하는거랑 large dataset으로 pre-trained한 걸 이용해서 transfer learning하는 건 후자가 좋다.
3. transfer learning 할 때도 pre-trained model 중에 data 많이 써서 학습한게 더 좋다.
4. augmentation/regularization은 data가 많으면 별 도움이 안되고 둘 중에는 augmenation이 더 좋다.
5. pre-trained model이 많을 때 model을 고르는 방법은 그냥 upstream에서 제일 잘됐던 걸 고르면 얼추 잘된다.
6. 속도를 빠르게 하고 싶을 때는 model을 작은거 쓰지말고 patch size를 키워라. 그래야 성능이 별로 안떨어진다.
입니다.
흥미로운 결과들이 많으니 자세한 내용은 아래 영상을 참고해주세요!
감사합니다!
영상링크: https://youtu.be/A3RrAIx-KCc
논문링크: https://arxiv.org/abs/2106.10270
Comparing Incremental Learning Strategies for Convolutional Neural NetworksVincenzo Lomonaco
In the last decade, Convolutional Neural Networks (CNNs) have shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameters tuning, CNNs have been scarcely studied in the context of incremental learning where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN based architectures, targeting real-word applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Knowledge distillation aims at transferring “knowledge” acquired in one model (teacher) to another model (student) that is typically smaller.
Previous approaches can be expressed as a form of training the student with output activations of data examples represented by the teacher.
We introduce a novel approach, dubbed relational knowledge distillation (Relational KD), that transfers relations among data examples represented by the teacher.
As concrete realizations of Relational KD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations.
Experiments conducted on different benchmark tasks show that the Relational KD improves the performance of the educated student networks with a significant margin, and even outperforms the teacher’s performance.
Continual Learning with Deep Architectures - Tutorial ICML 2021Vincenzo Lomonaco
Humans have the extraordinary ability to learn continually from experience. Not only we can apply previously learned knowledge and skills to new situations, we can also use these as the foundation for later learning. One of the grand goals of Artificial Intelligence (AI) is building an artificial “continual learning” agent that constructs a sophisticated understanding of the world from its own experience through the autonomous incremental development of ever more complex knowledge and skills (Parisi, 2019). However, despite early speculations and few pioneering works (Ring, 1998; Thrun, 1998; Carlson, 2010), very little research and effort has been devoted to address this vision. Current AI systems greatly suffer from the exposure to new data or environments which even slightly differ from the ones for which they have been trained for (Goodfellow, 2013). Moreover, the learning process is usually constrained on fixed datasets within narrow and isolated tasks which may hardly lead to the emergence of more complex and autonomous intelligent behaviors. In essence, continual learning and adaptation capabilities, while more than often thought as fundamental pillars of every intelligent agent, have been mostly left out of the main AI research focus.
In this tutorial, we propose to summarize the application of these ideas in light of the more recent advances in machine learning research and in the context of deep architectures for AI (Lomonaco, 2019). Starting from a motivation and a brief history, we link recent Continual Learning advances to previous research endeavours on related topics and we summarize the state-of-the-art in terms of major approaches, benchmarks and key results. In the second part of the tutorial we plan to cover more exploratory studies about Continual Learning with low supervised signals and the relationships with other paradigms such as Unsupervised, Semi-Supervised and Reinforcement Learning. We will also highlight the impact of recent Neuroscience discoveries in the design of original continual learning algorithms as well as their deployment in real-world applications. Finally, we will underline the notion of continual learning as a key technological enabler for Sustainable Machine Learning and its societal impact, as well as recap interesting research questions and directions worth addressing in the future.
Authors: Vincenzo Lomonaco, Irina Rish
Official Website: https://sites.google.com/view/cltutorial-icml2021
Super resolution in deep learning era - Jaejun YooJaeJun Yoo
Abstract (Eng/Kor):
Image restoration (IR) is one of the fundamental problems, which includes denoising, deblurring, super-resolution, etc. Among those, in today's talk, I will more focus on the super-resolution task. There are two main streams in the super-resolution studies; a traditional model-based optimization and a discriminative learning method. I will present the pros and cons of both methods and their recent developments in the research field. Finally, I will provide a mathematical view that explains both methods in a single holistic framework, while achieving the best of both worlds. The last slide summarizes the remaining problems that are yet to be solved in the field.
영상 복원(Image restoration, IR)은 low-level vision에서 매우 중요하게 다루는 근본적인 문제 중 하나로서 denoising, deblurring, super-resolution 등의 다양한 영상 처리 문제를 포괄합니다. 오늘 발표에서는 영상 복원 분야 중에서도 super-resolution 문제에 대해 집중적으로 다루겠습니다. 전통적인 model-based optimization 방식과 deep learning을 적용하여 문제를 푸는 방식에 대해, 각각의 장단점과 최신 연구 발전 흐름을 소개하겠습니다. 마지막으로는 이 둘을 하나로 잇는 통일된 관점을 제시하고 관련 연구들 살펴본 후, super-resolution 분야에서 아직 남아있는 문제점들을 정리하겠습니다.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Luba Elliott
This talk by Emily Denton from New York University on "Unsupervised Learning of Disentangled Representations from Video" was presented at the Learning Image Representations event on 30th August at Twitter as part of the Creative AI meetup.
Presentation for the Berlin Computer Vision Group, December 2020 on deep learning methods for image segmentation: Instance segmentation, semantic segmentation, and panoptic segmentation.
Deep learning (also known as deep structured learning or hierarchical learning) is the application of artificial neural networks (ANNs) to learning tasks that contain more than one hidden layer. Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, partially supervised or unsupervised.
Optimization as a model for few shot learningKaty Lee
paper presentation of "Optimization as a model for few shot learning" at ICLR 2017 by Sachin Ravi and Hugo Larochelle
highly related to "learning to learn by gradient descent by gradient descent"
How useful is self-supervised pretraining for Visual tasks?Seunghyun Hwang
Review : How useful is self-supervised pretraining for Visual tasks?
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
Continual Learning with Deep Architectures - Tutorial ICML 2021Vincenzo Lomonaco
Humans have the extraordinary ability to learn continually from experience. Not only we can apply previously learned knowledge and skills to new situations, we can also use these as the foundation for later learning. One of the grand goals of Artificial Intelligence (AI) is building an artificial “continual learning” agent that constructs a sophisticated understanding of the world from its own experience through the autonomous incremental development of ever more complex knowledge and skills (Parisi, 2019). However, despite early speculations and few pioneering works (Ring, 1998; Thrun, 1998; Carlson, 2010), very little research and effort has been devoted to address this vision. Current AI systems greatly suffer from the exposure to new data or environments which even slightly differ from the ones for which they have been trained for (Goodfellow, 2013). Moreover, the learning process is usually constrained on fixed datasets within narrow and isolated tasks which may hardly lead to the emergence of more complex and autonomous intelligent behaviors. In essence, continual learning and adaptation capabilities, while more than often thought as fundamental pillars of every intelligent agent, have been mostly left out of the main AI research focus.
In this tutorial, we propose to summarize the application of these ideas in light of the more recent advances in machine learning research and in the context of deep architectures for AI (Lomonaco, 2019). Starting from a motivation and a brief history, we link recent Continual Learning advances to previous research endeavours on related topics and we summarize the state-of-the-art in terms of major approaches, benchmarks and key results. In the second part of the tutorial we plan to cover more exploratory studies about Continual Learning with low supervised signals and the relationships with other paradigms such as Unsupervised, Semi-Supervised and Reinforcement Learning. We will also highlight the impact of recent Neuroscience discoveries in the design of original continual learning algorithms as well as their deployment in real-world applications. Finally, we will underline the notion of continual learning as a key technological enabler for Sustainable Machine Learning and its societal impact, as well as recap interesting research questions and directions worth addressing in the future.
Authors: Vincenzo Lomonaco, Irina Rish
Official Website: https://sites.google.com/view/cltutorial-icml2021
Super resolution in deep learning era - Jaejun YooJaeJun Yoo
Abstract (Eng/Kor):
Image restoration (IR) is one of the fundamental problems, which includes denoising, deblurring, super-resolution, etc. Among those, in today's talk, I will more focus on the super-resolution task. There are two main streams in the super-resolution studies; a traditional model-based optimization and a discriminative learning method. I will present the pros and cons of both methods and their recent developments in the research field. Finally, I will provide a mathematical view that explains both methods in a single holistic framework, while achieving the best of both worlds. The last slide summarizes the remaining problems that are yet to be solved in the field.
영상 복원(Image restoration, IR)은 low-level vision에서 매우 중요하게 다루는 근본적인 문제 중 하나로서 denoising, deblurring, super-resolution 등의 다양한 영상 처리 문제를 포괄합니다. 오늘 발표에서는 영상 복원 분야 중에서도 super-resolution 문제에 대해 집중적으로 다루겠습니다. 전통적인 model-based optimization 방식과 deep learning을 적용하여 문제를 푸는 방식에 대해, 각각의 장단점과 최신 연구 발전 흐름을 소개하겠습니다. 마지막으로는 이 둘을 하나로 잇는 통일된 관점을 제시하고 관련 연구들 살펴본 후, super-resolution 분야에서 아직 남아있는 문제점들을 정리하겠습니다.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Luba Elliott
This talk by Emily Denton from New York University on "Unsupervised Learning of Disentangled Representations from Video" was presented at the Learning Image Representations event on 30th August at Twitter as part of the Creative AI meetup.
Presentation for the Berlin Computer Vision Group, December 2020 on deep learning methods for image segmentation: Instance segmentation, semantic segmentation, and panoptic segmentation.
Deep learning (also known as deep structured learning or hierarchical learning) is the application of artificial neural networks (ANNs) to learning tasks that contain more than one hidden layer. Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, partially supervised or unsupervised.
Optimization as a model for few shot learningKaty Lee
paper presentation of "Optimization as a model for few shot learning" at ICLR 2017 by Sachin Ravi and Hugo Larochelle
highly related to "learning to learn by gradient descent by gradient descent"
How useful is self-supervised pretraining for Visual tasks?Seunghyun Hwang
Review : How useful is self-supervised pretraining for Visual tasks?
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
Presentation in Vietnam Japan AI Community in 2019-05-26.
The presentation summarizes what I've learned about Regularization in Deep Learning.
Disclaimer: The presentation is given in a community event, so it wasn't thoroughly reviewed or revised.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Learning visual representation without human labelKai-Wen Zhao
Self supervised learning (SSL) is one of the most fast-growing research topic in recent years. SSL provides algorithm that directly learn visual representation from data itself rather than human manual labels. From theoretical point of view, SSL explores information theory & the nature of large scale dataset.
Recommender Systems from A to Z – Model TrainingCrossing Minds
This second meetup will be about training different models for our recommender system. We will review the simple models we can build as a baseline. After that, we will present the recommender system as an optimization problem and discuss different training losses. We will mention linear models and matrix factorization techniques. We will end the presentation with a simple introduction to non-linear models and deep learning.
Large Scale GAN Training for High Fidelity Natural Image SynthesisSeunghyun Hwang
Review : Large Scale GAN Training for High Fidelity Natural Image Synthesis
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
A presentation on the "no new UNet" model, which attempts to automate hyper-parameter selection for medical image segmentation. The paper was accepted to Nature Methods.
Machine learning lets you make better business decisions by uncovering patterns in your consumer behavior data that is hard for the human eye to spot. You can also use it to automate routine, expensive human tasks that were previously not doable by computers. In the business to business space (B2B), if your competitors can make wiser business decisions based on data and automate more business operations but you still base your decisions on guesswork and lack automation, you will lose out on business productivity. In this introduction to machine learning tech talk, you will learn how to use machine learning even if you do not have deep technical expertise on this technology.
Topics covered:
1.What is machine learning
2.What is a typical ML application architecture
3.How to start ML development with free resource links
4.Key decision factors in ML technology selection depending on use case scenarios
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksJinwon Lee
Tensorflow-KR 논문읽기모임 95번째 발표영상입니다
Modularity Matters라는 제목으로 visual relational reasoning 문제를 풀 수 있는 방법을 제시한 논문입니다. 기존 CNN들이 이런 문제이 취약함을 보여주고 이를 해결하기 위한 방법을 제시합니다. 관심있는 주제이기도 하고 Bengio 교수님 팀에서 쓴 논문이라서 review 해보았습니다
발표영상: https://youtu.be/dAGI3mlOmfw
논문링크: https://arxiv.org/abs/1806.06765
Computer Vision abbreviated as CV aims to teach computers to achieve human level vision capabilities. Applications of CV in self driving cars, robotics, healthcare, education and the multitude of apps that allow customers to use the smartphone cameras to convey information has made it one of the most popular fields in Artificial Intelligence. The recent advances in Deep Learning, data storage and computing capabilities has lead to the huge success of CV. There are several tasks in computer vision, such as classification, object detection, image segmentation, optical character recognition, scene reconstruction and many others.
In this presentation I will talk about applying Transfer Learning, Image classification, object detection and the metrics required to measure them on still images. The increase in accuracy over of CV tasks over the past decade is due to Convolutional Neural Networks (CNN), CNN is the base used in architectures such as RESNET or VGGNET. I will go through how to use these pre-trained models for image classification and feature extraction. One of the break throughs in object detection has come with one-shot learning, where the bounding box and the class of the object is predicted simultaneously. This leads to low latency during inference (155 frames per second) and high accuracy. This is the framework behind object detection using YOLO , I will explain how to use yolo for specific use cases.
#PR12 #PR366
안녕하세요 논문 읽기 모임 PR-12의 366번째 논문리뷰입니다.
올해가 AlexNet이 나온지 10주년이 되는 해네요.
AlexNet이 2012년에 혜성처럼 등장한 이후, Solve computer vision problem = Use CNN이 공식처럼 사용되던 2010년대가 가고
2020년대 들어서 ViT의 등장을 시작으로 Transformer 기반의 network들이 CNN의 자리를 위협하고 상당부분 이미 뺏어간 상황입니다.
2020년대에 CNN의 가야할 길은 어디일까요?
Inductive bias가 적은 Transformer가 대용량의 데이터로 학습하면 항상 CNN보다 더 낫다는 건 진실일까요?
이 논문에서는 2020년대를 위한 CNN이라는 제목으로 ConvNeXt라는 새로운(?) architecture를 제안합니다.
사실 새로운 건 없고 그동안 있었던 것들과 Transformer에서 적용한 것들을 copy해와서 CNN에 적용해보았는데요,
Transformer보다 성능도 좋고 속도도 빠른 결과가 나왔다고 합니다.
결과에 대해서 약간의 논란이 twitter 상에서 나오고 있는데 이 부분 포함해서 자세한 내용은 영상을 통해서 보실 수 있습니다.
늘 재밌게 봐주시고 좋아요 댓글 구독 해주시는 분들께 감사드립니다 :)
논문링크: https://arxiv.org/abs/2201.03545
영상링크: https://youtu.be/Mw7IhO2uBGc
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
#PR12 #PR355
안녕하세요 논문 읽기 모임 PR-12의 355번째 논문리뷰입니다.
Computer Vision 분야에는 왜 BERT나 GPT 같은 model이 없을까요?
Self-supervised learning을 이용하여 pretraining 한 후, downstream task에서 supervised learning보다 성능이 잘나오는 model을 언제쯤 보게 될까요?
어쩌면 그 model 이 논문에 있을 수도 있습니다.
이 논문에서는 ViT 기반의 Autoencoder를 활용하여 ImageNet-1K training set을 이용하여 self-supervised pretraining으로 SOTA(ImageNet-1K only)를 달성하였습니다.
image를 patch로 만들고 75%의 patch를 masking한 후 25%의 patch만으로 masking된 75%의 pixel 값을 직접 예측하는 형태를 사용하였고,
다른 model들에 비하여 연산량과 memory 사용량이 적어서 big model로의 확장도 용이합니다.
재미있는 아이디어와 다양한 실험결과가 있으니 자세한 내용은 발표 영상을 참고해주세요!
영상링크: https://youtu.be/mtUa3AAxPNQ
논문링크: https://arxiv.org/abs/2111.06377v1
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
#PR12 #PR344
안녕하세요 TensorFlow Korea 논문 읽기 모임 PR-12의 344번째 논문 리뷰입니다.
오늘은 중국과기대와 MSRA에서 나온 A Battle of Network Structures라는 강렬한 제목을 가진 논문입니다.
부제에서 잘 나와있듯이 이 논문은 computer vision에서 CNN, Transformer, MLP에 대해서 같은 환경에서 비교를 통해 어떤 특징들이 있는지를 알아본 논문입니다.
우선 같은 조건에서 실험하기 위하여 SPACH라는 unified framework을 만들고 그 안에 CNN, Transformer, MLP를 넣어서 실험을 합니다.
셋 모두 조건이 잘 갖춰지면 비슷한 성능을 내지만, MLP는 model size가 커지면 overfitting이 발생하고
CNN은 Transformer에 비해서 적은 data에서도 좋은 성능이 나오는 generalization capability가 좋고,
Transformer는 model capacity가 커서 data가 충분하고 연산량도 큰 환경에서 잘한다는 것이 실험의 한가지 결과입니다.
또하나는 global receptive field를 갖는 transformer나 MLP의 경우에도 local한 연산을 하는 local model을 같이 써줄때에 성능이 좋아진다는 것입니다.
이런 insight들을 통해서 이 논문에서는 CNN과 Transformer를 결합한 형태의 Hybrid model을 제안하여 SOTA 성능을 낼 수 있음을 보여줍니다.
개인적으로 놀랄만한 insight를 발견한 것은 아니었지만 세가지 network의 특징과 장단점에 대해서 정리해볼 수 있는 그런 논문이라고 평하고 싶습니다.
자세한 내용은 영상을 참고해주세요! 감사합니다
영상링크: https://youtu.be/NVLMZZglx14
논문링크: https://arxiv.org/abs/2108.13002
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
Computer Vision 분야에서 CNN은 과연 살아남을 수 있을까요?
안녕하세요 TensorFlow Korea 논문 읽기 모임 PR-12의 317번째 논문 리뷰입니다.
이번에는 Google Research, Brain Team의 MLP-Mixer: An all-MLP Architecture for Vision을 리뷰해보았습니다.
Attention의 공격도 버거운데 이번에는 MLP(Multi-Layer Perceptron)의 공격입니다.
MLP만을 사용해서 Image Classification을 하는데 성능도 좋고 속도도 빠르고....
구조를 간단히 소개해드리면 ViT(Vision Transformer)의 self-attention 부분을 MLP로 변경하였습니다.
MLP block 2개를 사용하여 하나는 patch(token)들 간의 연산을 하는데 사용하고, 하나는 patch 내부 연산을 하는데 사용합니다.
사실 MLP를 사용하긴 했지만 논문에도 언급되어 있듯이, 이 부분을 일종의 convolution이라고 볼 수 있는데요...
그래도 transformer 기반의 network이 가질 수밖에 없는 quadratic complexity를 linear로 낮춰주고
convolution의 inductive bias 거의 없이 아주아주 simple한 구조를 활용하여 이렇게 좋은 성능을 보여준 점이 멋집니다.
반면에 역시나 data를 많이 써야 한다거나, MLP의 한계인 fixed length의 input만 받을 수 있다는 점은 단점이라고 생각하는데요,
이 연구를 시작으로 MLP도 다시한번 조명받는 계기가 되면 좋을 것 같네요
비슷한 시점에 나온 비슷한 연구들도 마지막에 간략하게 소개하였습니다.
재미있게 봐주세요. 감사합니다!
논문링크: https://arxiv.org/abs/2105.01601
영상링크: https://youtu.be/KQmZlxdnnuY
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
안녕하세요 TensorFlow Korea 논문 읽기 모임 PR-12의 297번째 리뷰입니다
어느덧 PR-12 시즌 3의 끝까지 논문 3편밖에 남지 않았네요.
시즌 3가 끝나면 바로 시즌 4의 새 멤버 모집이 시작될 예정입니다. 많은 관심과 지원 부탁드립니다~~
(멤버 모집 공지는 Facebook TensorFlow Korea 그룹에 올라올 예정입니다)
오늘 제가 리뷰한 논문은 Facebook의 Training data-efficient image transformers & distillation through attention 입니다.
Google에서 나왔던 ViT논문 이후에 convolution을 전혀 사용하지 않고 오직 attention만을 이용한 computer vision algorithm에 어느때보다 관심이 높아지고 있는데요
이 논문에서 제안한 DeiT 모델은 ViT와 같은 architecture를 사용하면서 ViT가 ImageNet data만으로는 성능이 잘 안나왔던 것에 비해서
Training 방법 개선과 새로운 Knowledge Distillation 방법을 사용하여 mageNet data 만으로 EfficientNet보다 뛰어난 성능을 보여주는 결과를 얻었습니다.
정말 CNN은 이제 서서히 사라지게 되는 것일까요? Attention이 computer vision도 정복하게 될 것인지....
개인적으로는 당분간은 attention 기반의 CV 논문이 쏟아질 거라고 확신하고, 또 여기에서 놀라운 일들이 일어날 수 있을 거라고 생각하고 있습니다
CNN은 10년간 많은 연구를 통해서 발전해왔지만, transformer는 이제 CV에 적용된 지 얼마 안된 시점이라서 더 기대가 크구요,
attention이 inductive bias가 가장 적은 형태의 모델이기 때문에 더 놀라운 이들을 만들 수 있을거라고 생각합니다
얼마 전에 나온 open AI의 DALL-E도 그 대표적인 예라고 할 수 있을 것 같습니다. Transformer의 또하나의 transformation이 궁금하신 분들은 아래 영상을 참고해주세요
영상링크: https://youtu.be/DjEvzeiWBTo
논문링크: https://arxiv.org/abs/2012.12877
PR-284: End-to-End Object Detection with Transformers(DETR)Jinwon Lee
TensorFlow Korea 논문읽기모임 PR12 284번째 논문 review입니다.
이번 논문은 Facebook에서 나온 DETR(DEtection with TRansformer) 입니다.
arxiv-sanity에 top recent/last year에서 가장 상위에 자리하고 있는 논문이기도 합니다(http://www.arxiv-sanity.com/top?timefilter=year&vfilter=all)
최근에 ICLR 2021에 submit된 ViT로 인해서 이제 Transformer가 CNN을 대체하는 것 아닌가 하는 얘기들이 많이 나오고 있는데요, 올 해 ECCV에 발표된 논문이고 feature extraction 부분은 CNN을 사용하긴 했지만 transformer를 활용하여 효과적으로 Object Detection을 수행하는 방법을 제안한 중요한 논문이라고 생각합니다. 이 논문에서는 detection 문제에서 anchor box나 NMS(Non Maximum Supression)와 같은 heuristic 하고 미분 불가능한 방법들이 많이 사용되고, 이로 인해서 유독 object detection 문제는 딥러닝의 철학인 end-to-end 방식으로 해결되지 못하고 있음을 지적하고 있습니다. 그 해결책으로 bounding box를 예측하는 문제를 set prediction problem(중복을 허용하지 않고, 순서에 무관함)으로 보고 transformer를 활용한 end-to-end 방식의 알고리즘을 제안하였습니다. anchor box도 필요없고 NMS도 필요없는 DETR 알고리즘의 자세한 내용이 알고싶으시면 영상을 참고해주세요!
영상링크: https://youtu.be/lXpBcW_I54U
논문링크: https://arxiv.org/abs/2005.12872
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorJinwon Lee
TensorFlow Korea 논문읽기모임 PR12 270번째 논문 review입니다.
이번 논문은 Baidu에서 나온 PP-YOLO: An Effective and Efficient Implementation of Object Detector입니다. YOLOv3에 다양한 방법을 적용하여 매우 높은 성능과 함께 매우 빠른 속도 두마리 토끼를 다 잡아버린(?) 그런 논문입니다. 논문에서 사용한 다양한 trick들에 대해서 좀 더 깊이있게 살펴보았습니다. Object detection에 사용된 기법 들 중에 Deformable convolution, Exponential Moving Average, DropBlock, IoU aware prediction, Grid sensitivity elimination, MatrixNMS, CoordConv, 등의 방법에 관심이 있으시거나 알고 싶으신 분들은 영상과 발표자료를 참고하시면 좋을 것 같습니다!
논문링크: https://arxiv.org/abs/2007.12099
영상링크: https://youtu.be/7v34cCE5H4k
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...Jinwon Lee
TensorFlow Korea 논문읽기모임 PR12 258번째 논문 review입니다.
이번 논문은 MIT에서 나온 From ImageNet to Image Classification: Contextualizing Progress on Benchmarks입니다.
Deep Learning 하시는 분들이면 ImageNet 모르시는 분들이 없을텐데요, 이 논문은 ImageNet의 labeling 방법의 한계와 문제점에 대해서 얘기하고 top-1 accuracy 기반의 평가 방법에도 문제가 있을 수 있음을 지적하고 있습니다.
ImageNet data의 20% 이상이 multi object를 포함하고 있지만 그 중에 하나만 정답으로 인정되는 문제가 있고, annotation 방법의 한계로 인하여 실제로 사람이 생각하는 것과 다른 class가 정답으로 labeling되어 있는 경우도 많았습니다. 또한 terrier만 20종이 넘는 등 전문가가 아니면 판단하기 어려운 label도 많다는 문제도 있었구요. 이 밖에도 다양한 실험을 통해서 정량적인 분석과 함께 human-in-the-loop을 이용한 평가로 현재 model들의 성능이 어디까지 와있는지, 그리고 앞으로 더 높은 성능을 내기 위해서 data labeling 측면에서 해결해야할 과제는 무엇인지에 대해서 이야기하고 있습니다. 논문이 양이 좀 많긴 하지만 기술적인 내용이 별로 없어서 쉽게 읽으실 수 있는데요, 자세한 내용이 궁금하신 분들은 영상을 참고해주세요!
논문링크: https://arxiv.org/abs/2005.11295
발표영상링크: https://youtu.be/CPMgX5ikL_8
TensorFlow Korea 논문읽기모임 PR12 243째 논문 review입니다
이번 논문은 RegNet으로 알려진 Facebook AI Research의 Designing Network Design Spaces 입니다.
CNN을 디자인할 때, bottleneck layer는 정말 좋을까요? layer 수는 많을 수록 높은 성능을 낼까요? activation map의 width, height를 절반으로 줄일 때(stride 2 혹은 pooling), channel을 2배로 늘려주는데 이게 최선일까요? 혹시 bottleneck layer가 없는 게 더 좋지는 않은지, 최고 성능을 내는 layer 수에 magic number가 있는 건 아닐지, activation이 절반으로 줄어들 때 channel을 2배가 아니라 3배로 늘리는 게 더 좋은건 아닌지?
이 논문에서는 하나의 neural network을 잘 design하는 것이 아니라 Auto ML과 같은 기술로 좋은 neural network을 찾을 수 있는 즉 좋은 neural network들이 살고 있는 좋은 design space를 design하는 방법에 대해서 얘기하고 있습니다. constraint이 거의 없는 design space에서 human-in-the-loop을 통해 좋은 design space로 그 공간을 좁혀나가는 방법을 제안하였는데요, EfficientNet보다 더 좋은 성능을 보여주는 RegNet은 어떤 design space에서 탄생하였는지 그리고 그 과정에서 우리가 당연하게 여기고 있었던 design choice들이 잘못된 부분은 없었는지 아래 동영상에서 확인하실 수 있습니다~
영상링크: https://youtu.be/bnbKQRae_u4
논문링크: https://arxiv.org/abs/2003.13678
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
TensorFlow Korea 논문읽기모임 PR12 217번째 논문 review입니다
이번 논문은 GoogleBrain에서 쓴 EfficientDet입니다. EfficientNet의 후속작으로 accuracy와 efficiency를 둘 다 잡기 위한 object detection 방법을 제안한 논문입니다. 이를 위하여 weighted bidirectional feature pyramid network(BiFPN)과 EfficientNet과 유사한 방법의 detection용 compound scaling 방법을 제안하고 있는데요, 자세한 내용은 영상을 참고해주세요
논문링크: https://arxiv.org/abs/1911.09070
영상링크: https://youtu.be/11jDC8uZL0E
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
TensorFlow Korea 논문읽기모임 PR12 207번째 논문 review입니다
이번 논문은 YOLO v3입니다.
매우 유명한 논문이라서 크게 부연설명이 필요없을 것 같은데요, Object Detection algorithm들 중에 YOLO는 굉장히 특색있는 one-stage algorithm입니다. 이 논문에서는 YOLO v2(YOLO9000) 이후에 성능 향상을 위하여 어떤 것들을 적용하였는지 하나씩 설명해주고 있습니다. 또한 MS COCO의 metric인 average mAP에 대해서 비판하면서 mAP를 평가하는 방법에 대해서도 얘기를 하고 있는데요, 자세한 내용은 영상을 참고해주세요~
논문링크: https://arxiv.org/abs/1804.02767
영상링크: https://youtu.be/HMgcvgRrDcA
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...Jinwon Lee
TensorFlow Korea 논문읽기모임 PR12 197번째 논문 review입니다
(2기 목표 200편까지 이제 3편이 남았습니다)
이번에 제가 발표한 논문은 FAIR(Facebook AI Research)에서 나온 One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers 입니다
한 장의 ticket으로 모든 복권에서 1등을 할 수 있다면 얼마나 좋을까요?
일반적인 network pruning 방법은 pruning 하기 이전에 학습된 network weight를 그대로 사용하면서 fine tuning하는 방법을 사용해왔습니다
pruning한 이후에 network에 weight를 random intialization한 후 학습하면 성능이 잘 나오지 않는 문제가 있었는데요
작년 MIT에서 나온 Lottery ticket hypothesis라는 논문에서는 이렇게 pruning된 이후의 network를 어떻게 random intialization하면 높은 성능을 낼 수 있는지
이 intialization 방법을 공개하며 lottery ticket의 winning ticket이라고 이름붙였습니다.
그런데 이 winning ticket이 혹시 다른 dataset이나 다른 optimizer를 사용하는 경우에도 잘 동작할 수 있을까요?
예를 들어 CIFAR10에서 찾은 winning ticket이 ImageNet에서도 winning ticket의 성능을 나타낼 수 있을까요?
이 논문은 이러한 질문에 대한 답을 실험을 통해서 확인하였고, initialization에 대한 여러가지 insight를 담고 있습니다.
자세한 내용은 발표 영상을 참고해주세요~!
영상링크: https://youtu.be/YmTNpF2OOjA
발표자료링크: https://www.slideshare.net/JinwonLee9/pr197-one-ticket-to-win-them-all-generalizing-lottery-ticket-initializations-across-datasets-and-optimizers
논문링크: https://arxiv.org/abs/1906.02773
PR-183: MixNet: Mixed Depthwise Convolutional KernelsJinwon Lee
TensorFlow-KR 논문읽기모임 PR12(12PR) 183번째 논문 review입니다.
이번에 살펴볼 논문은 Google Brain에서 발표한 MixNet입니다. Efficiency를 추구하는 CNN에서 depthwise convolution이 많이 사용되는데, 이 때 depthwise convolution filter의 size를 다양하게 해서 성능도 높이고 efficiency도 높이는 방법을 제안한 논문입니다. 자세한 내용은 영상을 참고해주세요
논문링크 : https://arxiv.org/abs/1907.09595
발표영상 : https://youtu.be/252YxqpHzsg
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksJinwon Lee
TensorFlow-KR 논문읽기모임 PR12 169번째 논문 review입니다.
이번에 살펴본 논문은 Google에서 발표한 EfficientNet입니다. efficient neural network은 보통 mobile과 같은 제한된 computing power를 가진 edge device를 위한 작은 network 위주로 연구되어왔는데, 이 논문은 성능을 높이기 위해서 일반적으로 network를 점점 더 키워나가는 경우가 많은데, 이 때 어떻게 하면 더 효율적인 방법으로 network을 키울 수 있을지에 대해서 연구한 논문입니다. 자세한 내용은 영상을 참고해주세요
논문링크: https://arxiv.org/abs/1905.11946
영상링크: https://youtu.be/Vhz0quyvR7I
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionJinwon Lee
TensorFlow-KR 논문읽기모임 PR12 155번째 논문 review 입니다.
이번에는 Facebook AI Research에서 최근에 나온(4/2) Exploring Randomly Wired Neural Networks for Image Recognition을 review해 보았습니다. random하게 generation된 network이 그동안 사람들이 온갖 노력을 들여서 만든 network 이상의 성능을 나타낸다는 결과로 많은 사람들에게 충격을 준 논문인데요, 자세한 내용은 자료와 영상을 참고해주세요
논문링크: https://arxiv.org/abs/1904.01569
영상링크: https://youtu.be/NrmLteQ5BC4
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
Tensorfkow-KR 논문읽기모임 PR12 144번째 논문 review입니다.
이번에는 Efficient CNN의 대표 중 하나인 SqueezeNext를 review해보았습니다. SqueezeNext의 전신인 SqueezeNet도 같이 review하였고, CNN을 평가하는 metric에 대한 논문인 NetScore에서 SqueezeNext가 1등을 하여 NetScore도 같이 review하였습니다.
논문링크:
SqueezeNext - https://arxiv.org/abs/1803.10615
SqueezeNet - https://arxiv.org/abs/1602.07360
NetScore - https://arxiv.org/abs/1806.05512
영상링크: https://youtu.be/WReWeADJ3Pw
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Neuro-symbolic is not enough, we need neuro-*semantic*
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
1. A Simple Framework for Contrastive
Learning ofVisual Representations
Ting Chen, et al., “A Simple Framework for Contrastive Learning of Visual Representations”
8th March, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
2. References
• The Illustrated SimCLR Framework
https://amitness.com/2020/03/illustrated-simclr/
• Exploring SimCLR: A Simple Framework for Contrastive Learning of
Visual Representations
https://towardsdatascience.com/exploring-simclr-a-simple-framework-for-
contrastive-learning-of-visual-representations-158c30601e7e
• SimCLR: Contrastive Learning ofVisual Representations
https://medium.com/@nainaakash012/simclr-contrastive-learning-of-visual-
representations-52ecf1ac11fa
3. Introduction
• Learning effective visual representations without human supervision
is a long-standing problem.
• Most mainstream approaches fall into one of two classes: generative
or discriminative.
Generative approaches – pixel level generation is computationally expensive
and may not be necessary for representation learning.
Discriminative approaches learn representations using objective function like
supervised learning but pretext tasks have relied on somewhat ad-hoc
heuristics, which limits the generality of learn representations.
5. Contrastive Learning
• Contrastive methods aim to learn representations by enforcing
similar elements to be equal and dissimilar elements to be different.
6. Contrastive Learning – Data
• Example pairs of images which are similar and images which are
different are required for training a model
Images from “The Illustrated SimCLR Framework”
10. Contrastive Learning – Noise Contrastive
Estimator Loss
• x+ is a positive example and x- is a negative example
• sim(.) is a similarity function
• Note that each positive pair (x,x+) we have a set of K negatives
12. SimCLR – Overview
• A stochastic data augmentation module that transforms any given
data example randomly resulting in two correlated views of the same
example.
Random crop and resize(with random flip), color distortions, and Gaussian
blur
• ResNet50 is adopted as a encoder, and the output vector is from GAP
layer. (2048-dimension)
• Two layer MLP is used in projection head. (128-dimensional latent
space)
• No explicit negative sampling. 2(N-1) augmented examples within a
minibatch are used for negative samples. (N is a batch size)
• Cosine similarity function is a used similarity metric.
• Normalized temperature-scaled cross entropy(NT-Xent) loss is used.
13. SimCLR - Overview
• Training
Batch size : 256~8192
A batch size of 8192 gives 16382 negative examples per positive pair from
both augmentation views.
To stabilize the training, LARS optimizer is used.
Aggregating BN mean and variance over all devices during training.
With 128TPU v3 cores, it takes ~1.5 hours to train ResNet-50 with a batch
size of 4096 for 100 epochs
• Dataset – ImageNet 2012 dataset
• To evaluate the learned representations, linear evaluation protocol is
used.
14. Step by Step Example
Images from “The Illustrated SimCLR Framework”
15. Step by Step Example
Images from “The Illustrated SimCLR Framework”
16. Step by Step Example
Images from “The Illustrated SimCLR Framework”
17. Step by Step Example
Images from “The Illustrated SimCLR Framework”
18. Step by Step Example
Images from “The Illustrated SimCLR Framework”
19. Step by Step Example
Images from “The Illustrated SimCLR Framework”
20. Step by Step Example
Images from “The Illustrated SimCLR Framework”
21. Step by Step Example
Images from “The Illustrated SimCLR Framework”
22. Step by Step Example
Images from “The Illustrated SimCLR Framework”
23. Step by Step Example
Images from “The Illustrated SimCLR Framework”
24. Data Augmentation for Contrastive
Representation Learning
• Many existing approaches define contrastive prediction tasks by
changing architecture.
• The authors use only simple data augmentation methods, this simple
design choice conveniently decouples the predictive task from other
components such as the NN architecture.
25. Data Augmentation for Contrastive
Representation Learning
Augmentations in the red boxes are used
27. Evaluation of Data Augmentation
• Asymmetric data transformation method
is used.
Only one branch of the frame work is applied
the target transformation(s).
• No single transformation suffices to learn
good representations, even though the
model can almost perfectly identify the
positive pairs.
• Random cropping and random color
distortion stands out
When using only random cropping as data
augmentation is that most patches from an
image share a similar color distortion
28. Contrastive Learning Needs Stronger Data
Augmentation
• Stronger color augmentation substantially improves the linear
evaluation of the learned unsupervised models.
• A sophisticated augmentation policy(such as AutoAugment) does not
work better than simple cropping + (stronger) color distortion
• Unsupervised contrastive learning benefits from stronger (color) data
augmentation than supervised learning.
• Data augmentation that does not yield accuracy benefits for supervised
learning can still help considerably with contrastive learning.
29. Unsupervised Contrastive Learning Benefits from
Bigger Models
• Unsupervised learning benefits more from bigger models than its
supervised counter part.
30. Nonlinear Projection Head
• Nonlinear projection is better than a linear projection(+3%) and
much better than no projection(>10%)
31. Nonlinear Projection Head
• The hidden layer before the projection head is a better
representation than the layer after.
• The importance of using the representation before the nonlinear
projection is due to loss of information induced by the contrastive
loss. In particular z = g(h) is trained to be invariant to data
transformation.
32. Loss Function
• l2 normalization along with temperature effectively weights
different examples, and an appropriate temperature can help the
model learn from hard negatives.
• Unlike cross-entropy, other objective functions do not weigh the
negatives by their relative hardness.
33. Larger Batch Size and LongerTraining
• When the training epochs is small, larger batch size have a significant
advantage.
• Larger batch sizes provide more negative examples, facilitating
convergence, and training longer also provides more negative examples,
improving the results.
38. Appendix – Effects of LongerTraining for
Supervised Learning
• There is no significant benefit from training supervised models
longer on ImageNet.
• Stronger data augmentation slightly improves the accuracy of
ResNet-50 (4x) but does not help on ResNet-50.
40. Conclusion
• SimCLR differs from standard supervised learning on ImageNet only
in the choice of data augmentation, the use of nonlinear head at the
end of the network, and the loss function.
• Composition of data augmentations plays a critical role in defining
effective predictive tasks.
• Nonlinear transformation between the representation and the
contrastive loss substantially improves the quality of the learned
representations.
• Contrastive learning benefits from larger batch sizes and more
training steps compared to supervised learning.
• SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative
improvement over previous state-of-the-art, matching the
performance of a supervised ResNet-50.