Presentation for the Berlin Computer Vision Group, December 2020 on deep learning methods for image segmentation: Instance segmentation, semantic segmentation, and panoptic segmentation.
Presenter: Yunjey Choi (Master's student, Korea University)
Yunjey Choi majored in Computer Science at Korea University and is currently a Master's student studying Machine Learning. He enjoys coding and sharing what he has learned with others. He studied Deep Learning with TensorFlow for a year and now studies Generative Adversarial Networks with PyTorch. He has implemented several papers in TensorFlow and published a PyTorch tutorial on GitHub.
Overview:
The Generative Adversarial Network (GAN), first proposed by Ian Goodfellow in 2014, is a generative model that estimates the distribution of real data through adversarial training. GANs have recently emerged as one of the most popular research areas, with numerous related papers appearing every day.
Finding it hard to read all the GAN papers pouring out? That's fine. Once you thoroughly understand the basic GAN, newly published papers become easy to follow as well.
In this talk I aim to share everything I know about GANs. It should suit those who are completely new to GANs, those curious about the theory behind them, and those wondering how GANs can be applied.
Presentation video: https://youtu.be/odpjk7_tGY0
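To make the adversarial training idea concrete, here is a minimal PyTorch sketch of a GAN training loop on a toy 1-D distribution; the network sizes, learning rates, and data are illustrative assumptions, not the presenter's code.

```python
import torch
import torch.nn as nn

# Toy GAN: G maps noise to 1-D samples, D tries to tell real from generated.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0           # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))                    # generated samples

    # discriminator step: push real toward 1, generated toward 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: fool the discriminator (generated toward 1)
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```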
GTC 2021: Counterfactual Learning to Rank in E-commerce (GrubhubTech)
Many e-commerce companies have extensive logs of user behavior such as clicks and conversions. However, if supervised learning is applied naively, systems can suffer from poor performance due to bias and feedback loops. Using techniques from counterfactual learning, we can leverage log data in a principled manner to model user behavior and build personalized recommender systems. At Grubhub, a user journey begins with recommendations, and the vast majority of conversions are powered by recommendations. Our recommender policies can drive user behavior to increase orders and/or profit. Accordingly, the ability to rapidly iterate and experiment is very important. Because of our powerful GPU workflows, we can iterate 200% more rapidly than with counterpart CPU workflows. Developers iterate on ideas with notebooks powered by GPUs. Hyperparameter spaces are explored up to 8x faster with multi-GPU Ray clusters. Solutions are shipped from notebooks to production in half the time with nbdev. With our accelerated DS workflows and deep learning on GPUs, we were able to deliver a +12.6% conversion boost in just a few months. In this talk we present modern techniques for industrial recommender systems powered by GPU workflows: first a short background on counterfactual learning techniques, followed by practical information and data from our industrial application.
By Alex Egg, accepted to Nvidia GTC 2021 Conference
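For readers unfamiliar with the counterfactual-learning building blocks mentioned above, here is a minimal NumPy sketch of off-policy evaluation with inverse propensity scoring (IPS); the data, propensities, and policies are simulated for illustration and are not Grubhub's production system.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

propensity = rng.uniform(0.05, 0.95, n)                  # P(show item) under the logging policy
shown = rng.random(n) < propensity                       # logged action
reward = (shown & (rng.random(n) < 0.1)).astype(float)   # conversion observed only if shown
new_shows = rng.random(n) < 0.5                          # action a candidate policy would take

# IPS: keep log entries where the new policy agrees with the logged action,
# reweighted by 1 / probability of that logged action under the logging policy.
agrees = (new_shows == shown).astype(float)
logged_prob = np.where(shown, propensity, 1.0 - propensity)
ips_value = np.mean(agrees * reward / logged_prob)
print(f"estimated reward per user under the candidate policy: {ips_value:.4f}")
```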
Recommender Systems represent one of the most widespread and impactful applications of predictive machine learning models.
Amazon, YouTube, Netflix, Facebook and many other companies generate an important fraction of their revenues thanks to their ability to model and accurately predict users' ratings and preferences.
In this presentation we cover the following points:
→ introduction to recommender systems
→ working with explicit vs implicit feedback
→ content-based vs collaborative filtering approaches
→ user-based and item-item methods (a minimal sketch follows after this list)
→ machine learning and deep learning models
→ pros & cons of the methods: scalability, accuracy, explainability
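To make the item-item idea concrete, here is a minimal NumPy sketch on a toy implicit-feedback matrix; the data and dimensions are illustrative, not from the presentation.

```python
import numpy as np

# rows = users, columns = items, 1 = interaction (implicit feedback)
interactions = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
], dtype=float)

# cosine similarity between item columns
norms = np.linalg.norm(interactions, axis=0, keepdims=True)
item_sim = (interactions.T @ interactions) / (norms.T @ norms + 1e-9)

# score items for user 0 by summing similarities to the items they interacted with
user = interactions[0]
scores = item_sim @ user
scores[user > 0] = -np.inf            # do not recommend already-seen items
print("recommended item for user 0:", int(scores.argmax()))
```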
This work addresses the challenge of hate speech detection in Internet memes and, to the best of our knowledge, is the first to use visual information to detect hate speech automatically. Memes are pixel-based multimedia documents that combine photos or illustrations with phrases which, together, usually adopt a funny meaning. However, hate memes are also used to spread hate through social networks, so their automatic detection would help reduce their harmful societal impact. In our experiments, we built a dataset of 5,020 memes to train and evaluate a multi-layer perceptron over the visual and language representations, both independently and fused. Our results indicate that the model can learn to detect some of the memes, but that the task is far from solved with this simple architecture. While previous work focuses on linguistic hate speech, our experiments indicate that the visual modality can be much more informative for hate speech detection in memes than the linguistic one.
https://github.com/imatge-upc/hate-speech-detection
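As a rough illustration of the fusion idea described above, here is a minimal PyTorch sketch of an MLP over concatenated visual and language embeddings; the embedding sizes and layer widths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FusedMLP(nn.Module):
    """Classify a meme as hateful or not from a visual and a text embedding."""
    def __init__(self, img_dim=2048, txt_dim=768, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, 1),          # single logit: hateful vs. not hateful
        )

    def forward(self, img_emb, txt_emb):
        return self.net(torch.cat([img_emb, txt_emb], dim=1))

model = FusedMLP()
img_emb = torch.rand(16, 2048)   # e.g. features from a pretrained image encoder
txt_emb = torch.rand(16, 768)    # e.g. features from a pretrained text encoder
logits = model(img_emb, txt_emb)                                      # (16, 1)
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (16, 1)).float())
```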
Dimensionality reduction is the process of converting a data set with a large number of dimensions into one with fewer dimensions, while ensuring that it still conveys similar information concisely.
Concept
R code
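The slides use R, but the same idea can be sketched in a few lines of NumPy (PCA via the SVD of centered data); the data here are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # 200 samples, 50 dimensions

X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)   # rows of Vt = principal directions

k = 5
X_reduced = X_centered @ Vt[:k].T              # (200, 5): same data, far fewer dimensions
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(f"variance retained by {k} components: {explained:.1%}")
```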
This presentation was prepared as part of the curriculum studies for CSCI-659 Topics in Artificial Intelligence Course - Machine Learning in Computational Linguistics.
It was prepared under guidance of Prof. Sandra Kubler.
Slides explaining the distinction between bagging and boosting in light of the bias-variance trade-off, followed by some lesser-known aspects of supervised learning: the effect of the tree-split metric on feature importance, the effect of the decision threshold on classification accuracy, and how to adjust the model's classification threshold.
Note: the limitations of the accuracy metric (baseline accuracy), alternative metrics, their use cases, and their advantages and limitations are briefly discussed.
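A minimal sketch of the threshold-adjustment point, using synthetic scores (not the slides' data): the same model scores, cut at different thresholds, trade precision against recall.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
scores = np.clip(y_true * 0.3 + rng.normal(0.4, 0.2, 1000), 0, 1)   # toy predicted probabilities

for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```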
Explainable deep learning with applications in Healthcare, by Sunil Kumar Vupp... (Analytics India Magazine)
We have started relying on the decisions made by deep learning models, yet why and how they work remain big questions for most of us. We will try to open the black box of deep learning, which is essential for building the trust needed for widespread adoption. The speaker addresses the importance of feature visualization and localization in deep learning models, especially convolutional neural networks, and shares the results of applying methods such as activation maps, deconvolution, and Grad-CAM in healthcare.
Bias: It is the amount by which Machine Learning (ML) model predictions differ from the actual value of the target.
Variance: It is the amount by which the ML model prediction would change if we estimate it using different training datasets.
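A minimal sketch of how these two quantities can be estimated empirically, by retraining a deliberately simple model on many resampled training sets and inspecting its predictions at one test point (synthetic data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = np.sin
x_test, y_test = 1.0, np.sin(1.0)

preds = []
for _ in range(200):
    x = rng.uniform(0, 2 * np.pi, 30)              # a fresh training set each time
    y = true_f(x) + rng.normal(0, 0.3, 30)         # noisy targets
    coeffs = np.polyfit(x, y, deg=1)               # an (intentionally) too-simple model
    preds.append(np.polyval(coeffs, x_test))

preds = np.array(preds)
bias = preds.mean() - y_test    # how far the average prediction is from the target
variance = preds.var()          # how much predictions change across training sets
print(f"bias = {bias:.3f}, variance = {variance:.3f}")
```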
Reinforcement learning: policy gradient (part 1), by Bean Yen
The policy gradient theorem is from "Reinforcement Learning: An Introduction". DPG and DDPG are from the original papers.
original link https://docs.google.com/presentation/d/1I3QqfY6h2Pb0a-KEIbKy6v5NuZtnTMLN16Fl-IuNtUo/edit?usp=sharing
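For reference, the policy gradient theorem from Sutton and Barto can be written in its expectation form as (standard textbook notation, not copied from the slides):

$$\nabla_\theta J(\theta) \;\propto\; \mathbb{E}_\pi\!\left[\, q_\pi(S_t, A_t)\, \nabla_\theta \ln \pi(A_t \mid S_t, \theta) \,\right]$$

where $\pi(a \mid s, \theta)$ is the parameterized policy and $q_\pi$ its action-value function.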
Computer Vision (CV) aims to teach computers to achieve human-level vision capabilities. Applications of CV in self-driving cars, robotics, healthcare, education, and the multitude of apps that let customers convey information with their smartphone cameras have made it one of the most popular fields in Artificial Intelligence. Recent advances in deep learning, data storage, and computing capability have led to the huge success of CV. Computer vision comprises several tasks, such as classification, object detection, image segmentation, optical character recognition, scene reconstruction, and many others.
In this presentation I will talk about applying transfer learning, image classification, and object detection, and the metrics required to measure them on still images. The increase in accuracy on CV tasks over the past decade is due to Convolutional Neural Networks (CNNs), the base used in architectures such as ResNet and VGGNet. I will go through how to use these pre-trained models for image classification and feature extraction. One of the breakthroughs in object detection came with single-shot detection, where the bounding box and the class of the object are predicted simultaneously. This leads to low latency during inference (155 frames per second) and high accuracy. This is the framework behind object detection with YOLO, and I will explain how to use YOLO for specific use cases.
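As a hint of what "using pre-trained models" looks like in practice, here is a minimal PyTorch/torchvision transfer-learning sketch: a pretrained ResNet is frozen as a feature extractor and only a new classification head is trained. The model choice, class count, and batch are illustrative assumptions, not necessarily what the talk uses.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                        # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)     # new head for a 10-class task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.rand(8, 3, 224, 224)                # stand-in for a real image batch
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```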
Seeing what a GAN cannot generate: paper review (QuantUniversity)
Paper review: Bau, David, et al. "Seeing What a GAN Cannot Generate." 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 4501-4510.
Presenter: Hwalsuk Lee (Naver Clova)
Date: November 2017
(Current) NAVER Clova Vision
(Current) TensorFlow KR (TFKR) organizing team
Overview:
The focus of recent deep learning research is shifting rapidly from supervised to unsupervised learning. In computer vision in particular, the research trend is moving from recognition techniques (supervised learning that finds the information present in an image) to generative techniques (unsupervised learning that synthesizes an image containing specified information).
This seminar briefly reviews how the two pillars of generative modeling, the VAE (variational autoencoder) and the GAN (generative adversarial network), work, and shares results from the major related papers.
The lecture is organized so that even attendees without a deep learning background can understand the concepts behind VAE and GAN, the two methodologies for training generative models, and grasp the current state of the art.
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017 (StampedeCon)
This technical session provides a hands-on introduction to TensorFlow using Keras in the Python programming language. TensorFlow is Google's scalable, distributed, GPU-powered compute-graph engine that machine learning practitioners use for deep learning. Keras provides a Python-based API that makes it easy to create well-known types of neural networks in TensorFlow. Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to train neural networks of much greater complexity. Deep learning allows a model to learn hierarchies of information in a way that is similar to the function of the human brain.
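A minimal example of the kind of model such a session builds with the Keras API (a small dense network on MNIST; illustrative, not the session's exact code):

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```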
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sept-2016-member-meeting-mit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Vivienne Sze, Assistant Professor at MIT, delivers the presentation "Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural Networks" at the September 2016 Embedded Vision Alliance Member Meeting. Sze describes the results of her team's recent research on optimized hardware for deep learning.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
This talk was presented at Startup Master Class 2017 (http://aaiitkblr.org/smc/) at Christ College, Bangalore, hosted by the IIT Kanpur Alumni Association and co-presented by the IIT KGP Alumni Association, IITACB, PanIIT, IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and Navin Manaswi was a contributor.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
Decomposing image generation into layout prediction and conditional synthesis (Naeem Shehzad)
In this presentation you can learn how to decompose image generation into layout prediction and conditional synthesis. I present all the material in a convenient way, and I hope you find it easy to follow.
Thank you.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V... (Wasswaderrick3)
In this book, we use conservation-of-energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity, and from this we derive the Poiseuille flow equation, the transition flow equation, and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our energy-conservation techniques to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium, and at the general equation of terminal velocity.
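For orientation, the familiar energy-per-unit-weight form of Bernoulli's equation extended with a viscous head-loss term, which reduces to the classical equation when losses vanish, is (standard textbook form, not a quotation from the book):

$$\frac{P_1}{\rho g} + \frac{v_1^2}{2g} + z_1 \;=\; \frac{P_2}{\rho g} + \frac{v_2^2}{2g} + z_2 + h_f$$

where $h_f$ is the head loss due to viscous (friction) effects; setting $h_f = 0$ recovers the classical Bernoulli equation.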
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol..." (Studia Poinsotiana)
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Seminar on U.V. Spectroscopy, by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
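The quantitative basis of UV-Vis absorption measurements is the Beer-Lambert law (standard form, added here for reference):

$$A \;=\; \log_{10}\frac{I_0}{I} \;=\; \varepsilon\, c\, l$$

where $A$ is the absorbance, $I_0$ and $I$ are the incident and transmitted intensities, $\varepsilon$ the molar absorptivity, $c$ the analyte concentration, and $l$ the path length.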
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R_1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
What are greenhouse gases, and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
Phenomics-assisted breeding in crop improvement (IshaGoswami9)
As the global population increases and approaches about 9 billion by 2050, and with climate change, it is difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics controlled by multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can be linked to genomic information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition (tutorial)
1. GaitSet
Regarding Gait as a Set for Cross-View Gait Recognition
2. Gait Recognition: Identify persons by their gait
Long distance: gait vs. face, fingerprint, iris…
No need for cooperation: gait vs. fingerprint, iris…
Robust to appearance change: gait vs. person re-id
Has broad applications in crime prevention, forensic identification and social security.
Classical approach: build a gait template (segmentation and alignment, remove color & texture, extract temporal information through pixel-level operations).
3. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
4. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
The neural network ought to have access to each frame.
Then… → a gait sequence?
5. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
The neural network ought to have access to each frame.
Then… → a gait sequence?
But order in a sequence causes a bunch of issues: how to unify the frame rate? Unify walking speed? Align the first frame?
6. A template aggregates information at the pixel level.
That is not a sound design: obviously, it loses temporal information.
The neural network ought to have access to each frame.
Then… → a gait sequence?
But order in a sequence causes a bunch of issues: how to unify the frame rate? Unify walking speed? Align the first frame?
Let's get rid of order! → a SET
7. Regarding Gait as a Set
                          Single Image   Sequence of Images   Set of Images
Permutation invariance         √                 ×                  √
Views                        Single           Multiple           Multiple
Walking Condition            Single            Single            Multiple
8. GaitSet: Set Pooling (SP)
Use a CNN (convolution & pooling) to extract a feature map from each silhouette in the input set.
Then what?
9. GaitSet: Set Pooling (SP)
Use a CNN (convolution & pooling) to extract a feature map from each silhouette in the input set.
Set of feature maps → feature map of the set.
10. GaitSet: Set Pooling (SP)
• A permutation-invariant function: permuting the elements of the input set should not influence the output, formulated as
  G({v_j | j = 1, 2, …, n}) = G({v_π(j) | j = 1, 2, …, n}),
  where π is any permutation.
• Able to take a set with arbitrary cardinality, to ensure the flexibility of the model: in real-life scenarios the number of a person's gait silhouettes can be arbitrary.
11. GaitSet: Set Pooling (SP)
• Statistical functions: max(·), mean(·), median(·)
• Joint functions:
  max(·) + mean(·) + median(·)
  1_1C(cat(max(·), mean(·), median(·))), where 1_1C is a 1×1 convolutional layer and cat is concatenation
• Attention: compute global statistics (cat of max, mean, median followed by a 1×1 convolution) over the set, copy them n times to match the n frame-level feature maps (n, c, h, w), use them to learn an attention map that refines each frame-level feature map, and finally take the max over frames to obtain the set-level feature z(c, h, w).
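As a concrete illustration of the max-based Set Pooling described above, here is a minimal PyTorch sketch; the toy CNN, silhouette size, and tensor shapes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SetPooling(nn.Module):
    """Aggregate frame-level feature maps (n, c, h, w) into a single
    set-level feature map (c, h, w) with a permutation-invariant max."""
    def forward(self, frame_feats):
        return frame_feats.max(dim=0).values       # max over the frame axis

# toy frame-level CNN (illustrative, not the paper's architecture)
frame_cnn = nn.Sequential(
    nn.Conv2d(1, 32, 5, padding=2), nn.LeakyReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.LeakyReLU(), nn.MaxPool2d(2),
)

silhouettes = torch.rand(30, 1, 64, 44)            # a set of 30 silhouettes
frame_feats = frame_cnn(silhouettes)               # (30, 64, 16, 11)
set_feat = SetPooling()(frame_feats)               # (64, 16, 11), order-independent

# permutation invariance: shuffling the set leaves the output unchanged
perm = torch.randperm(silhouettes.size(0))
assert torch.allclose(set_feat, SetPooling()(frame_cnn(silhouettes[perm])))
```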
12. GaitSet: Horizontal Pyramid Mapping (HPM)
• Strips help the network focus on features at different scales.
• Different discriminative information has different scales: movement of hands, feet, head, shoulders…; movement of arms, legs…; movement of the whole body.
• Horizontal strips are commonly used in person re-id.
13. GaitSet: Horizontal Pyramid Mapping (HPM)
• Strips help the network focus on features at different scales.
• To be more efficient, we split the set-level feature map (c × h × w) instead of the original silhouettes.
• The map is split along the height dimension at S scales, giving n = Σ_{s=1}^{S} 2^{s-1} strips z_{s,t} in total.
• Each strip is pooled into a vector f' by GAP + GMP (GAP: Global Average Pooling, GMP: Global Max Pooling) and mapped to the final feature f by its own fully connected layer fc_{s,t}.
• Parameters of the FCs are not shared: different strips represent features with different scales at different positions.
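A minimal PyTorch sketch of the HPM idea follows; the number of scales, channel count, and output dimension are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HorizontalPyramidMapping(nn.Module):
    """Split the set-level feature map into horizontal strips at several
    scales, pool each strip with GAP + GMP, and map each strip through
    its own (non-shared) fully connected layer."""
    def __init__(self, in_channels=64, out_dim=256, scales=4):
        super().__init__()
        self.scales = scales
        n_strips = sum(2 ** s for s in range(scales))        # n = sum_{s=1}^{S} 2^{s-1}
        self.fcs = nn.ModuleList([nn.Linear(in_channels, out_dim) for _ in range(n_strips)])

    def forward(self, x):                                    # x: (batch, c, h, w)
        feats = []
        for s in range(self.scales):
            for strip in x.chunk(2 ** s, dim=2):             # split along the height axis
                feats.append(strip.mean(dim=(2, 3)) + strip.amax(dim=(2, 3)))  # GAP + GMP
        return torch.stack([fc(f) for fc, f in zip(self.fcs, feats)], dim=1)

hpm = HorizontalPyramidMapping()
set_feat = torch.rand(8, 64, 16, 11)       # a batch of set-level feature maps
reps = hpm(set_feat)                       # (8, 15, 256): one feature vector per strip
```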
14. GaitSet: Main Pipeline
CNN: extract information from each silhouette
→ SP: aggregate frame-level information into set-level information
→ HPM: get a discriminative representation
• Loss: a triplet loss for each strip
• Test: concatenate all strips to get the representation of the input silhouette set
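The per-strip triplet loss can be sketched as a batch-all variant as follows; the margin, feature shapes, and batch composition are illustrative assumptions, not the authors' training code.

```python
import torch
import torch.nn.functional as F

def strip_triplet_loss(reps, labels, margin=0.2):
    """reps: (batch, n_strips, dim) strip features; labels: (batch,) identity ids."""
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)              # (B, B)
    pos_mask = same_id & ~torch.eye(len(labels), dtype=torch.bool)    # anchor != positive
    neg_mask = ~same_id                                               # different identity
    valid = pos_mask.unsqueeze(2) & neg_mask.unsqueeze(1)             # (B, B, B) over (a, p, n)

    total = 0.0
    for s in range(reps.size(1)):                                     # one triplet loss per strip
        f = F.normalize(reps[:, s], dim=1)
        dist = torch.cdist(f, f)                                      # pairwise distances
        hinge = F.relu(dist.unsqueeze(2) - dist.unsqueeze(1) + margin)  # d(a,p) - d(a,n) + margin
        total = total + hinge[valid].mean()
    return total / reps.size(1)

reps = torch.rand(8, 15, 256, requires_grad=True)   # e.g. HPM output: 15 strips of 256-d features
labels = torch.randint(0, 4, (8,))                  # 4 identities in the batch
loss = strip_triplet_loss(reps, labels)
loss.backward()
```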
15. GaitSet: Pipeline
• Multi-layer Global Pipeline (MGP):
Shallow layers focus on local, fine-grained information; deep layers focus on global, coarse-grained information.
Use MGP to collect set-level information at various levels.
16. Ablation (on CASIA-B)
• Set vs. GEI (lines 1~2): with an identical network, the set representation exceeds GEI by over 10%.
• HPM: shared vs. non-shared FCs (lines 2~3): non-shared exceeds shared by around 2~3%.
• MGP (last 2 lines): exceeds by around 1~3%.
• Different SP functions (lines 3~8): the best choice differs per condition (CL: max(·); BG: 1_1C(cat(max, mean, median)); NM: attention); max(·) is chosen for its simplicity.
CASIA-B: 124 subjects × 11 views × 10 videos per view = 13,640 videos, covering 3 walking conditions (NM: normal; BG: carrying a bag; CL: wearing a coat).
19. Fast
GaitSet directly learns a representation instead of measuring the similarity between a pair of gaits.
• Learn a representation (DATA → Network → Representation): √ linear cost, (n + m) × network complexity, for n probe samples and m gallery samples.
• Measure similarity between pairs of samples (DATA1 → Network, DATA2 → Network → same ID?): × quadratic cost, n × m × network complexity.
The average computational complexity for one sample on CASIA-B is 8.6 GFLOPs.
20. Flexible: Limited Silhouettes
[Plot: rank-1 accuracy (%) vs. number of randomly selected silhouettes (0 to 100), compared with using all images (95.0%); accuracy rises from 25.0% with very few silhouettes through 82.5% at 7 silhouettes toward roughly 94.8%.]
• Robustness: reaches an 82.5% accuracy with only 7 silhouettes.
• Our method does learn the motional gait information.
• The accuracy rises monotonically with the number of silhouettes.
• One gait period contains around 25 silhouettes; more silhouettes will NOT bring much more motional information. Consistently, the accuracy is close to the best performance at this point.
21. Flexible: Multiple Views & Walking Conditions
GaitSet makes full use of each silhouette: an input set can contain any number of non-consecutive silhouettes filmed from different viewpoints under different walking conditions.
Set contains two views:
• Combines both parallel and vertical information.
• Generally, the larger the difference between the two views, the better the results.
Set contains two walking conditions:
• The accuracies rise as the number of silhouettes increases.
• BG & CL carry complementary information.