The document provides an introduction to computer vision concepts including neural network structures, activation functions, convolution operators, pooling layers, and batch normalization. It then discusses image classification, including popular datasets, classification networks from LeNet to DLA, and experiments on car brand classification. Finally, it covers object detection, comparing region-based methods like R-CNN, Fast R-CNN, Faster R-CNN, and R-FCN to region-free methods like YOLO.
Self-Driving Cars With Convolutional Neural Networks (CNN).pptx (ssuserf79e761)
Self-driving cars aim to revolutionize car travel by making it safe and efficient. In this article, we outlined some of the key components, such as LiDAR, RADAR, cameras, and, most importantly, the algorithms that make self-driving cars possible.
A few things still need to be addressed:
The algorithms used are not yet good enough to perceive roads and lanes, because some roads lack markings and other signs.
The optimal sensing modality for localization, mapping, and perception still lacks accuracy and efficiency.
Vehicle-to-vehicle communication is still a dream, but work is being done in this area as well.
The field of human-machine interaction is not explored enough, with many open, unsolved problems.
Q-learning is one of the most commonly used DRL algorithms for self-driving cars. It falls under the category of model-free learning, in which the agent tries to approximate the optimal state-action values. The policy still determines which action-value pairs (Q-values) are visited and updated (see the equation below). The goal is to find an optimal policy by interacting with the environment and correcting the policy when the agent makes an error.
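The update rule described above can be sketched as a tabular Q-learning loop. Below is a minimal, hypothetical example on a toy chain environment; the environment, hyperparameters, and function names are illustrative, not from the article:

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: move left/right, reward 1 at the last state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action], actions: 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy behavior policy chooses which Q-values get visited
            a = rng.randrange(2) if rng.random() < epsilon else max((0, 1), key=lambda x: Q[s][x])
            s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning_chain()
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
print(policy)  # the greedy policy should move right everywhere
```

Because the update bootstraps from max over next-state actions rather than from the action the behavior policy actually takes, this is off-policy, which is what makes Q-learning model-free learning in the sense described above.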
Robust Feature Learning with Deep Neural Networks
http://snu-primo.hosted.exlibrisgroup.com/primo_library/libweb/action/display.do?tabs=viewOnlineTab&doc=82SNU_INST21557911060002591
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck (SlideTeam)
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck is loaded with easy-to-follow content, and intuitive design. Introduce the types and levels of artificial intelligence using the highly-effective visuals featured in this PPT slide deck. Showcase the AI-subfield of machine learning, as well as deep learning through our comprehensive PowerPoint theme. Represent the differences, and interrelationship between AI, ML, and DL. Elaborate on the scope and use case of machine intelligence in healthcare, HR, banking, supply chain, or any other industry. Take advantage of the infographic-style layout to describe why AI is flourishing in today's day and age. Elucidate AI trends such as robotic process automation, advanced cybersecurity, AI-powered chatbots, and more. Cover all the essentials of machine learning and deep learning with the help of this PPT slideshow. Outline the application, algorithms, use cases, significance, and selection criteria for machine learning. Highlight the deep learning process, types, limitations, and significance. Describe reinforcement training, neural network classifications, and a lot more. Hit download and begin personalization. Our AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck are topically designed to provide an attractive backdrop to any subject. Use them to look like a presentation pro. https://bit.ly/3ngJCKf
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
In this project, we propose methods for semantic segmentation with state-of-the-art deep learning models. Moreover, we want to filter the segmentation down to the specific objects of a given application: instead of concentrating on unnecessary objects, we can focus on the relevant ones and make the pipeline more specialized and efficient for special purposes. Furthermore, we leverage models that are suitable for face segmentation: Mask R-CNN and DeepLabv3. The experimental results clearly indicate how the illustrated approach is efficient and robust in the segmentation task compared to previous work in the field. The models reach 74.4 and 86.6 mean Intersection over Union, respectively. Visual results of the models are shown in the Appendix.
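The mean Intersection over Union figure reported above is computed per class and then averaged. A minimal sketch in plain Python; the toy masks below are illustrative, not from the project:

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in prediction or target.

    pred and target are flat lists of per-pixel class ids.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)

# toy 2-class example: background (0) vs face (1)
pred   = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, 2))  # → 0.5
```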
Machine Learning - Breast Cancer Diagnosis (Pramod Sharma)
Machine learning is helping in making smart decisions faster. In this presentation, measurements carried out on FNAC samples were analysed. The results were validated using 20 percent of the data. The data used for the POC is from the UCI Repository.
Talk @ ACM SF Bay Area Chapter on Deep Learning for the medical imaging space.
The talk covers use cases, special challenges and solutions for Deep Learning for Medical Image Analysis using Tensorflow+Keras. You will learn about:
- Use cases for Deep Learning in Medical Image Analysis
- Different DNN architectures used for Medical Image Analysis
- Special purpose compute / accelerators for Deep Learning (in the Cloud / On-prem)
- How to parallelize your models for faster training and serving for inference.
- Optimization techniques to get the best performance from your cluster (like Kubernetes/ Apache Mesos / Spark)
- How to build an efficient Data Pipeline for Medical Image Analysis using Deep Learning
- Resources to jump start your journey - like public data sets, common models used in Medical Image Analysis
Face detection based on image processing, using segmentation methods to detect various types of faces. It is helpful for many different fields and easy to apply.
Learn the fundamentals of Deep Learning, Machine Learning, and AI, how they've impacted everyday technology, and what's coming next in Artificial Intelligence technology.
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI (Lviv Startup Club)
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
AI & BigData Online Day 2021
Website - https://aiconf.com.ua/
Youtube - https://www.youtube.com/startuplviv
FB - https://www.facebook.com/aiconf
Efficient and accurate object detection has been an important topic in the advancement of computer vision systems.
Our project aims to detect objects with the goal of achieving high accuracy and real-time performance.
In this project, we use a completely deep learning based approach to solve the problem of object detection.
The input to the system will be a real time image, and the output will be a bounding box corresponding to all the objects in the image, along with the class of object in each box.
Objective -
Develop an application that detects objects. It can be used for vehicle counting: when the detected object is a vehicle such as a bicycle or car, it can count how many vehicles have passed through a particular area or road, and it can recognize human activity too.
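One simple way to turn per-frame detections into a vehicle count, as in the objective above, is to track box centroids and count crossings of a virtual counting line. A hedged sketch; the track format and line-crossing rule are assumptions for illustration, not part of the project:

```python
def count_line_crossings(tracks, line_y):
    """Count objects whose centroid crosses a horizontal counting line top-to-bottom.

    tracks: dict track_id -> list of (x, y) centroids over successive frames.
    """
    count = 0
    for positions in tracks.values():
        for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
            if y0 < line_y <= y1:  # crossed the line moving downward
                count += 1
                break  # count each track at most once
    return count

tracks = {
    "car_1":  [(100, 40), (102, 60), (105, 90)],   # crosses y=50 between frames 1 and 2
    "bike_2": [(200, 10), (201, 20), (202, 30)],   # never reaches the line
}
print(count_line_crossings(tracks, line_y=50))  # → 1
```

A real system would feed this from a detector plus a tracker that associates boxes across frames; the counting logic itself stays this simple.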
It's long ago, approximately 30 years, since AI was not only a topic for science-fiction writers but also a major research field surrounded by huge hopes and investments. But the over-inflated expectations ended in a crash, followed by a period of absent funding and interest: the so-called AI winter. However, the last 3 years changed everything, again. Deep learning, a machine learning technique inspired by the human brain, successfully crushed one benchmark after another, and tech companies like Google, Facebook and Microsoft started to invest billions in AI research. "The pace of progress in artificial general intelligence is incredibly fast" (Elon Musk, CEO of Tesla & SpaceX), leading to an AI that "would be either the best or the worst thing ever to happen to humanity" (Stephen Hawking, physicist).
What sparked this new hype? How is deep learning different from previous approaches? Are the advancing AI technologies really a threat to humanity? Let's look behind the curtain and unravel the reality. This talk will explore why Sundar Pichai (CEO of Google) recently announced that "machine learning is a core transformative way by which Google is rethinking everything they are doing" and explain why "Deep Learning is probably one of the most exciting things that is happening in the computer industry" (Jen-Hsun Huang, CEO of NVIDIA).
Either a new AI "winter is coming" (Ned Stark, House Stark) or this new wave of innovation might turn out to be the "last invention humans ever need to make" (Nick Bostrom, AI philosopher). Or maybe it's just another great technology helping humans to achieve more.
Object extraction from satellite imagery using deep learning (Aly Abdelkareem)
Presentation on extracting objects from satellite imagery using deep learning techniques. You will find a comparison between state-of-the-art approaches in computer vision.
Computer Vision, abbreviated as CV, aims to teach computers to achieve human-level vision capabilities. Applications of CV in self-driving cars, robotics, healthcare, education and the multitude of apps that allow customers to use their smartphone cameras to convey information have made it one of the most popular fields in Artificial Intelligence. The recent advances in deep learning, data storage and computing capabilities have led to the huge success of CV. There are several tasks in computer vision, such as classification, object detection, image segmentation, optical character recognition, scene reconstruction and many others.
In this presentation I will talk about applying transfer learning, image classification, object detection, and the metrics required to measure them on still images. The increase in accuracy of CV tasks over the past decade is due to Convolutional Neural Networks (CNNs); the CNN is the base used in architectures such as ResNet or VGGNet. I will go through how to use these pre-trained models for image classification and feature extraction. One of the breakthroughs in object detection has come with single-shot detection, where the bounding box and the class of the object are predicted simultaneously. This leads to low latency during inference (155 frames per second) and high accuracy. This is the framework behind object detection using YOLO; I will explain how to use YOLO for specific use cases.
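As a rough illustration of how a single-shot detector like YOLO emits a box per grid cell, here is a hedged sketch of decoding one cell's prediction into image coordinates. The parameterization (center offsets within the cell, width/height as image fractions) is a simplified assumption, not the exact YOLO formulation:

```python
def decode_cell(pred, row, col, grid_size, img_w, img_h):
    """Decode one YOLO-style cell prediction to an image-space box.

    pred = (tx, ty, tw, th): center offsets within the cell (0-1) and
    box width/height as fractions of the whole image.
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    tx, ty, tw, th = pred
    cell_w, cell_h = img_w / grid_size, img_h / grid_size
    cx = (col + tx) * cell_w          # box center in pixels
    cy = (row + ty) * cell_h
    w, h = tw * img_w, th * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# center of cell (3, 3) on a 7x7 grid over a 448x448 image, box = quarter of the image
box = decode_cell((0.5, 0.5, 0.25, 0.25), row=3, col=3, grid_size=7, img_w=448, img_h=448)
print(box)  # → (168.0, 168.0, 280.0, 280.0)
```

Predicting all such boxes and class scores in a single forward pass is what gives single-shot detectors their low inference latency.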
How can you handle defects? If you are in a factory, production can produce objects with defects. Or values from sensors can tell you over time that some values are not "normal". What can you do as a developer (not a data scientist) with .NET or Azure to detect these anomalies? Let's see how in this session.
Yinyin Liu presents at SD Robotics Meetup on November 8th, 2016. Deep learning has made great success in image understanding, speech, text recognition and natural language processing. Deep Learning also has tremendous potential to tackle the challenges in robotic vision, and sensorimotor learning in a robotic learning environment. In this talk, we will talk about how current and future deep learning technologies can be applied for robotic applications.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit-baidu
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Dr. Ren Wu, former distinguished scientist at Baidu's Institute of Deep Learning (IDL), presents the keynote talk, "Enabling Ubiquitous Visual Intelligence Through Deep Learning," at the May 2015 Embedded Vision Summit.
Deep learning techniques have been making headlines lately in computer vision research. Using techniques inspired by the human brain, deep learning employs massive replication of simple algorithms which learn to distinguish objects through training on vast numbers of examples. Neural networks trained in this way are gaining the ability to recognize objects as accurately as humans.
Some experts believe that deep learning will transform the field of vision, enabling the widespread deployment of visual intelligence in many types of systems and applications. But there are many practical problems to be solved before this goal can be reached. For example, how can we create the massive sets of real-world images required to train neural networks? And given their massive computational requirements, how can we deploy neural networks into applications like mobile and wearable devices with tight cost and power consumption constraints?
In this talk, Ren shares an insiderâs perspective on these and other critical questions related to the practical use of neural networks for vision, based on the pioneering work being conducted by his former team at Baidu.
Note 1: Regarding the ImageNet results included in this presentation, the organizers of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) have said: "Because of the violation of the regulations of the test server, these results may not be directly comparable to results obtained and reported by other teams." (http://www.image-net.org/challenges/LSVRC/announcement-June-2-2015)
Note 2: The presenter, Ren Wu, has told the Embedded Vision Alliance that "There was some ambiguity with the rules. According to the 'official' interpretation of the rules, there should be no more than 52 submissions within a half year. For us, we achieved the reported results after 200 tests total within a half year. We believe there is no way to obtain any measurable gains, nor did we try to obtain any gains, from an 'extra' hundred tests as our networks have billions of parameters and are trained by tens of billions of training samples."
A guide to designing a TensorFlow wrapper for building models easily.
All the code above is available on my GitHub.
https://github.com/NySunShine/fusion-net
Performance evaluation of GANs in a semisupervised OCR use case (Florian Wilhelm)
Even in the age of big data, labeled data is a scarce resource in many machine learning use cases. Florian Wilhelm evaluates generative adversarial networks (GANs) when used to extract information from vehicle registrations under a varying amount of labeled data, compares the performance with supervised learning techniques, and demonstrates a significant improvement when using unlabeled data.
Performance evaluation of GANs in a semisupervised OCR use case (inovex GmbH)
Online vehicle marketplaces are embracing artificial intelligence to ease the process of selling a vehicle on their platform. The tedious work of copying information from the vehicle registration document into some web form can be automated with the help of smart text-spotting systems, in which the seller takes a picture of the document, and the necessary information is extracted automatically.
Florian Wilhelm details the components of a text-spotting system, including the subtasks of object detection and optical character recognition (OCR). Florian elaborates on the challenges of OCR in documents with various distortions and artifacts, which rule out off-the-shelf products for this task. After offering an overview of semisupervised learning based on generative adversarial networks (GANs), Florian evaluates the performance gains of this method compared to supervised learning. More specifically, for a varying amount of labeled data, he compares the accuracy of a convolutional neural network (CNN) to a GAN that uses additional unlabeled data during the training phase, showing that GANs significantly outperform classical CNNs in use cases with a lack of labeled data.
What you'll learn:
Understand how semisupervised learning with GANs works
Explore beneficial semisupervised methods based on GANs for use cases with a limited amount of labeled data
Gain insight into an interesting OCR use case of an online vehicle marketplace
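One common formulation of a semisupervised GAN discriminator (K real classes plus an implicit fake class, with the fake logit pinned at zero) derives the probability that a sample is real directly from the class logits. This is a minimal sketch under that assumption, not necessarily the exact model evaluated in the talk:

```python
import math

def real_probability(class_logits):
    """Probability that a sample is real under the K-class + 'fake' formulation.

    With the fake-class logit fixed at 0, p(real) = Z / (Z + 1),
    where Z = sum of exp(logit) over the K real classes.
    """
    z = sum(math.exp(l) for l in class_logits)
    return z / (z + 1.0)

confident = real_probability([5.0, -1.0, -1.0])   # strong evidence for one real class
uncertain = real_probability([-3.0, -3.0, -3.0])  # weak evidence for any real class
print(confident > 0.99, uncertain < 0.2)  # → True True
```

The appeal for OCR with scarce labels is that unlabeled images still train the real-vs-fake decision, which sharpens the shared features the K-class head relies on.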
Event: O'Reilly Artificial Intelligence Conference, London, 11.10.2018
Speaker: Dr. Florian Wilhelm
More tech talks: www.inovex.de/vortraege
More tech articles: www.inovex.de/blog
Object detection is a central problem in computer vision and underpins many applications from medical image analysis to autonomous driving. In this talk, we will review the basics of object detection from fundamental concepts to practical techniques. Then, we will dive into cutting-edge methods that use transformers to drastically simplify the object detection pipeline while maintaining predictive performance. Finally, we will show how to train these models at scale using Determined's integrated deep learning platform and then serve the models using MLflow.
What you will learn:
Basics of object detection including main concepts and techniques
Main ideas from the DETR and Deformable DETR approaches to object detection
Overview of the core capabilities of Determined's deep learning platform, with a focus on its support for effortless distributed training
How to serve models trained in Determined using MLflow
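DETR trains with a set-based loss that first matches each prediction to a ground-truth object by minimizing a pairwise cost. Here is a brute-force sketch of that assignment step; DETR itself uses the Hungarian algorithm, and the cost values below are made up for illustration:

```python
from itertools import permutations

def match_predictions(cost):
    """Find the prediction-to-ground-truth assignment with minimal total cost.

    cost[i][j] is the matching cost between prediction i and ground truth j
    (in DETR, a mix of class probability and box-overlap terms).
    Brute force over permutations; fine for tiny examples only.
    """
    n = len(cost)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best, best_cost = list(perm), total
    return best, best_cost

cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.9, 0.3],
]
assignment, total = match_predictions(cost)
print(assignment)  # → [0, 1, 2]
```

Because each ground-truth object is matched to exactly one prediction, this step replaces the anchor assignment and non-maximum suppression of classical pipelines.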
Similar to Computer vision for transportation (20)
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning (Wanjin Yu)
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning. Part 4: retinex model based low light enhancement
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning (Wanjin Yu)
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning. Part 3: prior embedding deep super resolution
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning (Wanjin Yu)
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning. Part 2: text centric image style transfer
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning (Wanjin Yu)
ICME2019 Tutorial: Intelligent Image Enhancement and Restoration - From Prior Driven Model to Advanced Deep Learning. Part 1: prior embedding deep rain removal
Computer vision for transportation
1. Haifeng SHEN
DiDi AI Labs
Zhengping CHE
DiDi AI Labs
Guangyu LI
DiDi AI Labs
Yuhong GUO
DiDi AI Labs
Carleton University
Jieping YE
DiDi AI Labs
Univ. of Michigan, Ann Arbor
17. LeNet
LeNet-5 (1998)
• A neural network architecture for handwritten and
machine-printed character recognition in 1990s
• Consists of seven layers including
• Convolution operations
• Pooling operations
• Full connections
Yann LeCun, et al., Gradient-Based Learning Applied to Document Recognition, 1998
Bottom-right: https://engmrk.com/lenet-5-a-classic-cnn-architecture/
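The spatial sizes of LeNet-5's feature maps follow from the standard convolution output formula. A quick sketch tracing the 32x32 input through the conv and pooling stages (5x5 convolutions with stride 1, 2x2 pooling with stride 2, no padding, as in the 1998 paper):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# LeNet-5 on a 32x32 input: conv5x5 -> pool2x2 -> conv5x5 -> pool2x2
s = 32
for name, k, stride in [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1), ("S4", 2, 2)]:
    s = conv_out(s, k, stride)
    print(name, s)  # 28, 14, 10, 5
```

The resulting 5x5 maps are what the final full connections consume.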
18. AlexNet
AlexNet (2012)
• ILSVRC 2012 winner (16.4% top-5 error)
• 60 million parameters and 650,000 neurons
• 8 learned layers: 5 convolutional and 3 fully-connected layers
• A 1000-way softmax layer after the last fully-connected layer
• Dropout and ReLU
• Trained in parallel on 2 GPUs
Alex Krizhevsky, et al., ImageNet Classification with Deep Convolutional Neural Networks, 2012
Bottom-right: Nitish Srivastava, et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014
19. VGGNet
⢠Six versions with 5 group convolutions of 11 - 19 layers
⢠VGG16 (138 million parameters) and VGG19
⢠Only 3x3 conv and 2x2 max-pooling layers before FC layers
⢠Results @ ILSVRC 2014
⢠1st in localization task
⢠2nd in classification task (7.3% top-5 error)
VGGNet (2014)
Karen Simonyan, et al., Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014
20. GoogLeNet
⢠ILSVRC 2014 winner (6.7% top-5 error)
⢠22 layers with only 5 million model parameters
⢠Inception concept
⢠Multiple conv kernels including 1x1, 3x3, and 5x5
⢠1x1 kernel for dimension reduction
⢠Better representational power + fewer network parameters
⢠More advanced Inception modules (V2, V3, and V4) Inception-V1 Module
GoogLeNet (2014)
Christian Szegedy, et al., Going Deeper with Convolutions, 2015
21. ResNet
⢠1st place on the ILSVRC 2015 classification task (3.6% top-5 error)
⢠Deeper model with fewer filters and lower complexity
⢠34-layer baseline
⢠3.6 billion FLOPs
⢠only 18% of VGG-19 (19.6 billion FLOPs)
⢠Up to 152 layers!
⢠Initialization, batchnorm, residual blockâŚ
ResNet Block
ResNet (2015, top)
Kaiming He, et al., Deep Residual Learning for Image Recognition, 2016
http://kaiminghe.com/icml16tutorial/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf
22. DenseNet
DenseNet (2016)
• L(L+1)/2 direct connections for L layers
• Fewer parameters and less computation
DenseNet Block
x_l = H_l([x_0, x_1, …, x_(l−1)])
Gao Huang, et al., Densely Connected Convolutional Networks, 2016
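The L(L+1)/2 count follows directly from dense connectivity: layer l receives the feature maps of all l earlier layers (including the block input). A quick sketch in plain Python:

```python
def dense_connections(num_layers: int) -> int:
    """Direct connections in a dense block: layer l (l = 1..L) has l
    incoming connections, so the total is 1 + 2 + ... + L = L*(L+1)/2."""
    return num_layers * (num_layers + 1) // 2

# A 5-layer dense block has 15 direct connections, versus 5 in a plain chain.
```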
23. SENet
SENet (2017)
• ILSVRC 2017 winner (2.251% top-5 error)
• Squeeze-and-excitation block
• Squeeze: Global average pooling
• Excitation: Channel association
• Scale: Channel attention
• Integration with modern architectures
Squeeze-and-Excitation Block
Jie Hu, et al., Squeeze-and-Excitation Networks, 2018
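The squeeze/excitation/scale steps above can be sketched in a few lines of plain Python. This is a deliberately simplified stand-in: the real excitation step is a two-layer bottleneck MLP (ReLU then sigmoid), collapsed here into a single per-channel sigmoid gate, and `gate_weights` is a hypothetical stand-in for the learned parameters.

```python
import math

def se_block(feature_maps, gate_weights):
    """Simplified squeeze-and-excitation.
    feature_maps: list of C channels, each an H x W list of lists.
    gate_weights: C scalars standing in for the learned excitation MLP.
    Returns the channel-rescaled feature maps."""
    # Squeeze: global average pooling per channel -> C descriptors
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: per-channel sigmoid gate (real SENet: FC-ReLU-FC-sigmoid)
    scales = [1.0 / (1.0 + math.exp(-w * s))
              for w, s in zip(gate_weights, squeezed)]
    # Scale: reweight every value in each channel by its attention scalar
    return [[[v * s for v in row] for row in ch]
            for ch, s in zip(feature_maps, scales)]
```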
24. DLA: Deep Layer Aggregation
DLA (2018)
• Layer aggregation to better fuse information
• Iterative deep aggregation (IDA)
• Semantic fusion
• Resolutions and scales
• Hierarchical deep aggregation (HDA)
• Spatial fusion
• Channels and depths (modules)
Fisher Yu, et al., Deep Layer Aggregation, 2018
25. Classification Experiments
Classification Accuracy

Method | Car Brand Classification (66 classes) | Car Brand Classification (2506 classes)
ResNet | 94.60% | -
SENet  | 92.30% | -
DLA    | 96.02% | 93.75%

• Dataset-1: 193,186 images of 66 classes, collected offline
• Dataset-2: 549,169 images of 2,506 classes, collected offline + online; similar settings to the Stanford Cars dataset
30. R-CNN: Regions with CNN Features
R-CNN (2014)
• Selective Search + CNN + SVM
• Started using CNN features instead of traditional hand-crafted features
• ~2k bottom-up region proposals from selective search
• Time-consuming: features are extracted for every proposal separately
Ross Girshick, et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014
Bottom-Right: https://dl.dropboxusercontent.com/s/vlyrkgd8nz8gy5l/fast-rcnn.pdf
31. Fast R-CNN
Fast R-CNN (2015)
• One image + multiple RoIs + a fully convolutional network
• RoI pooling: generates a fixed-size feature vector for each proposal
• Outputs: softmax probabilities + bounding-box regression offsets
• End-to-end training with a multi-task loss
Right: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Ross Girshick, Fast R-CNN, 2015
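RoI pooling's key property, a fixed-size output regardless of proposal size, can be illustrated with a minimal 2-D max-pooling sketch. This uses integer bin boundaries on a single-channel feature map; the real layer operates on multi-channel maps with sub-pixel RoIs.

```python
def roi_pool(feature, y0, x0, y1, x1, out_size=2):
    """Max-pool the region feature[y0:y1][x0:x1] into an out_size x out_size
    grid. The output shape is fixed no matter how large the RoI is."""
    h, w = y1 - y0, x1 - x0
    out = []
    for i in range(out_size):
        ya = y0 + (i * h) // out_size          # top of bin row i
        yb = y0 + ((i + 1) * h) // out_size    # bottom of bin row i
        row = []
        for j in range(out_size):
            xa = x0 + (j * w) // out_size
            xb = x0 + ((j + 1) * w) // out_size
            row.append(max(feature[y][x]
                           for y in range(ya, yb)
                           for x in range(xa, xb)))
        out.append(row)
    return out
```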
32. Faster R-CNN
Faster R-CNN (2015)
• Region proposal network (RPN) + Fast R-CNN
• RPN & detection network share full-image convolutional features
• Anchors with multiple scales and aspect ratios
Bottom-Left: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Shaoqing Ren, et al., Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015
Region Proposal Network
33. R-FCN: Region-based Fully Convolutional Networks
R-FCN (2016)
• Position-sensitive score map before RoI pooling
• 9 positions: top/middle/bottom x left/center/right
• Position-sensitive RoI pooling instead of standard RoI pooling
• Fully convolutional detection network instead of the fully-connected detection network in Faster R-CNN
Jifeng Dai, et al., R-FCN: Object Detection via Region-based Fully Convolutional Networks, 2016
Position-Sensitive Score Map
34. Light-Head R-CNN
Light-Head R-CNN (2017)
• Heavy head
• E.g., Faster R-CNN & R-FCN
• Intensive computations around RoI warping
• Light-Head R-CNN
• Thin feature maps from large separable convolution layers
• Cheap R-CNN subnet with 1 FC-layer
Zeming Li, et al., Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017
"Heavy"-Head Detectors
Large Separable Convolution
35. FPN: Feature Pyramid Networks
• Bottom-up pathway
• Top-down pathway
• Lateral connection
• Feature pyramid: combination of
• Low-resolution, semantically strong features
• High-resolution, semantically weak features
Tsung-Yi Lin, et al., Feature Pyramid Networks for Object Detection, 2017
Different Feature Maps / FPN Block
36. Cascade R-CNN
Cascade R-CNN (2018)
• Multi-stage extension of R-CNN
• Trained sequentially using the output of the previous stage
• Cascaded bbox regression
• f(x, b) = f_T ∘ f_(T−1) ∘ ⋯ ∘ f_1(x, b)
• Cascaded detection
• A sequence of detectors trained with increasing IoU thresholds
Zhaowei Cai, et al., Cascade R-CNN: Delving into High Quality Object Detection, 2018
37. SNIP: Scale Normalization for Image Pyramids
• CNNs are not robust to changes in scale
• Multi-scale image pyramids for objects with different scales
• Detections from each scale are rescaled and combined using NMS
• Small objects from the high-resolution image
• Large objects from the low-resolution image
Bharat Singh, Scale Invariance in Object Detection - SNIP, 2018
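The NMS step used to merge per-scale detections can be sketched as a greedy loop over score-sorted boxes. This is a minimal single-class version with no batching:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    remaining box and drop boxes overlapping it by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```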
38. YOLOv3 (2018)
YOLO: You Only Look Once
YOLO (2016)
• End-to-end one-stage method
• Directly uses full images to predict each bounding box
• Extremely fast, real-time speed
• YOLOv2
• Darknet19 backbone
• Anchor mechanism
• YOLOv3
• Multi-scale features
• Darknet53 backbone
Joseph Redmon, et al., You Only Look Once: Unified, Real-Time Object Detection, 2016
Joseph Redmon, et al., YOLO9000: Better, Faster, Stronger, 2017
Joseph Redmon, et al., YOLOv3: An Incremental Improvement, 2018
Top-Left: https://docs.google.com/presentation/d/1kAa7NOamBt4calBU9iHgT8a86RRHz9Yz2oh4-GTdX6M/
Bottom-Left: https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b/
39. SSD: Single Shot Detector
SSD (2016)
• Multiple feature maps with different resolutions and scales
• Improved speed/accuracy trade-off
Wei Liu, et al., SSD: Single Shot MultiBox Detector, 2016
YOLOv1
40. DSSD: Deconvolutional SSD
DSSD (2017)
• Encoder-decoder hourglass structure
• Wide → Narrow → Wide
• Convolution and deconvolution modules
• Deconvolution: introduces additional large-scale context for object detection
• Two prediction modules
• Each with one residual block
Cheng-Yang Fu, et al., DSSD: Deconvolutional Single Shot Detector, 2017
SSD
Selected Prediction Module
41. RetinaNet
• Focal loss instead of the cross-entropy function
• Focus training on a sparse set of hard samples
• FL(p_t) = −(1 − p_t)^γ log(p_t)
Tsung-Yi Lin, Focal Loss for Dense Object Detection, 2017
RetinaNet (2017)
42. RefineDet
Shifeng Zhang, et al., Single-Shot Refinement Neural Network for Object Detection, 2018
RefineDet (2018)
• Anchor refinement module
• Filters out easy negatives
• Coarsely adjusts anchors
• Object detection module
• Further improves regression
• Multi-class prediction
Transfer Connection Block
43. CornerNet
CornerNet (2018)
• Object as a pair of bounding-box corners
• No need for anchor boxes
• Regression problem → corner prediction problem
• Corner pooling
• To better localize corners of the bounding box
Hei Law, et al., CornerNet: Detecting Objects as Paired Keypoints, 2018
Corner Pooling
44. MRFSWSnet: Multiple Receptive Fields and Small-Object-Focusing Weakly-Supervised Segmentation Net
• Multiple Receptive Field block (MRF): multiple receptive fields and more features for prediction
• Auxiliary Semantic Segmentation block (ASM): auxiliary semantic segmentation focusing on small objects
• Object Detection block (ODM): combines MRF and ASM with parallel training
• Loss function:
Siyang Sun, et al., Multiple Receptive Fields and Small-Object-Focusing Weakly-Supervised Segmentation Network for Fast Object Detection, 2019
45. Experiments on MRFSWSnet

Method | Recall | Precision | F1 Score
Faster R-CNN | 97.57 | 96.47 | 97.01
RetinaNet | 97.80 | 97.80 | 97.80
Light-Head R-CNN | 97.71 | 95.13 | 96.40
YOLOv3 | 98.57 | 97.32 | 97.94
MRFSWSnet | 98.71 | 97.32 | 98.01

• Images collected by dash camera
• Detection of cellphone usage during driving
• 1000 testing images
Siyang Sun, et al., Multiple Receptive Fields and Small-Object-Focusing Weakly-Supervised Segmentation Network for Fast Object Detection, 2019
46. Challenges
• Depend on large amounts of labeled data, inducing expensive annotation costs
• Difficult to apply directly in new operation environments
• Computation intensive, highly demanding in computational resources
• Complicated models are time/memory consuming, which prevents usage in real-time operation systems (e.g., DMS)
47. Part II: Advanced Topics
Yuhong GUO, DiDi AI Labs & Carleton University
50. Domain Adaptation/Transfer Learning
• Definition [Pan et al., IJCAI 13]:
Ability of a system to recognize and apply knowledge and skills learned in previous domains/tasks to novel domains/tasks
S. Pan, Q. Yang and W. Fan. Tutorial: Transfer Learning with Applications, IJCAI 2013.
Tan, Chuanqi, et al. "A Survey on Deep Transfer Learning." International Conference on Artificial Neural Networks. Springer, Cham, 2018.
51. Why Domain Adaptation
§ Successful application of ML in industry depends on learning from large amounts of labeled data
– Expensive and time-consuming to collect labels
– Difficult or dangerous to collect data in certain scenarios, e.g., autonomous driving
§ Domain Adaptation/Transfer Learning provides the essential ability of
✓ Reusing existing labeled resources
✓ Adapting to changing environments
✓ Learning from simulations
52. Transfer Learning vs Traditional ML
Transfer Learning/Domain Adaptation:
Training domain/task A → Test domain/task B
Traditional ML ((semi-)supervised learning):
Training domain/task A → Test domain/task B
55. Adapting to New Domains
§ Reuse existing datasets, and hence the annotation information
– Object Recognition
– Object Detection
– Person Re-Identification
– Image Segmentation
– Image Classification …
56. Learning from Simulations
§ Gathering data and training models can be too expensive, too time-consuming, or too dangerous
§ Solution: create data and learn from simulations
– OpenAI's Universe will potentially allow us to train a self-driving car using GTA 5 or other video games
– Training models on real robotics is too slow and expensive
http://ruder.io/transfer-learning/index.html
57. Common Datasets
§ Object recognition:
– Office-31
– ImageCLEF-DA
§ Visual domain adaptation challenge dataset VisDA-2017
§ Digits: MNIST, SVHN, USPS
§ Syn2Real dataset: a new dataset for object recognition [Peng et al., 2018]
60. Categories of DA Methods
Three main classes:
§ Reweighting/Instance-based Methods
§ Feature-based/Representation Learning Methods
§ Parameter/Model-based Methods
71. Theoretical Connection
§ A-distance, a measure of distance between probability distributions
§ Bound on the target domain error
Ben-David et al. "Analysis of Representations for Domain Adaptation", NIPS 06
Kifer et al. Detecting Change in Data Streams. In Very Large Databases (VLDB), 2004.
73. Domain Adversarial Neural Network (DANN)
§ DANN: adversarial training is implemented via a GRL (gradient reversal layer)
74. Model Sharing and Adversarial Adaptation
§ Adversarial Discriminative Domain Adaptation (ADDA)
The source CNN is trained without sacrificing any discriminativity
75. Reweighted Adversarial Adaptation [Chen et al., CVPR 18]
§ Re-weight the source domain label distribution to help reduce domain discrepancy and adapt the classifier
§ Reweighted adversarial loss (RAAN)
76. Alternative Adversarial Terms
§ Maximum Classifier Discrepancy (MCD):
Train both classifiers and the generator to classify the source samples correctly
§ Adversarial loss: target domain prediction discrepancy
K. Saito, et al. "Maximum Classifier Discrepancy for Unsupervised Domain Adaptation", CVPR 18
81. Multi-Level Adversarial Adaptation
Object detection: DA-Faster-R-CNN
§ Adversarial loss via GRL at both image level and instance level
§ Consistency regularization at the two levels
Chen et al., CVPR 18
86. Cycle-Consistent Adversarial DA
§ Limitations of domain alignment techniques
§ CyCADA
Hoffman et al. "CyCADA: Cycle-Consistent Adversarial Domain Adaptation", ICML 18
87. Cycle-Consistent Adversarial DA
Image-level GAN loss (green), feature-level GAN loss (orange), source and target semantic consistency losses (black), source cycle loss (red), and source task loss (purple).
Hoffman et al. "CyCADA: Cycle-Consistent Adversarial Domain Adaptation", ICML 18
90. Pseudo-Label based Methods
Some positive applications in domain adaptation:
– Progressive domain adaptation for object detection
– For recognition:
Zhang et al. "Collaborative and Adversarial Network for Unsupervised Domain Adaptation", CVPR 18
Inoue et al. "Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation", CVPR 18
91. Summary
• Unsupervised domain adaptation has received a lot of attention
• Open domain learning remains challenging, but is starting to draw attention
• Most studies have focused on classification problems
• Much less effort has been made on more complex tasks such as object detection
93. Basics
Number of multiplications for one standard convolutional layer:
Input: D_F x D_F x M    Output: D_G x D_G x N
D_K: kernel size
M: number of input channels
N: number of output channels
D_G: output dimension
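With these symbols, the multiplication count is D_K · D_K · M · N · D_G · D_G: each of the D_G x D_G x N output values requires a D_K x D_K x M dot product. A quick check in Python:

```python
def conv_mults(d_k, m, n, d_g):
    """Multiplications in one standard conv layer.
    d_k: kernel size, m: input channels, n: output channels,
    d_g: output spatial dimension."""
    per_output_value = d_k * d_k * m      # one dot product per output value
    num_output_values = n * d_g * d_g
    return per_output_value * num_output_values
```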
94. Basics
• Architecture design → lightweight models
– Use two 3x3 conv layers to replace one 5x5 conv layer: (3x3 + 3x3)/(5x5)
– Use two sequential 1xn and nx1 conv layers to replace one nxn conv layer: (1xn + nx1)/(nxn)
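Both replacement ratios are easy to verify numerically: the stacked 3x3 pair costs 72% of a 5x5, and the 1xn/nx1 pair costs 2/n of an nxn kernel.

```python
def stack_ratio():
    """Two 3x3 convs in place of one 5x5: parameter/multiplication ratio."""
    return (3 * 3 + 3 * 3) / (5 * 5)

def asym_ratio(n):
    """A 1xn conv followed by an nx1 conv in place of one nxn conv."""
    return (1 * n + n * 1) / (n * n)
```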
97. Inception Module
Inception module with dimension reduction: V1 block (from GoogLeNet) vs. a traditional 3x3 convolution block
Input: 28 x 28 x 192; Output: 28 x 28 x 256
#Model parameters:
• Traditional 3x3 convolution: 3 x 3 x 192 x 256 = 442k
• Inception V1 (previous layer → branches → output layer):
1 x 1 x 192 x 64 + (1 x 1 x 192 x 96 + 3 x 3 x 96 x 128) + (1 x 1 x 192 x 16 + 5 x 5 x 16 x 32) + (0 (max-pooling) + 1 x 1 x 192 x 32) = 163k
Szegedy et al. Going Deeper with Convolutions, https://arxiv.org/abs/1409.4842. 2014.
98. Inception V1, V2, V3
• Use two 3x3 convs to replace the 5x5 conv
Szegedy et al. Going Deeper with Convolutions, https://arxiv.org/abs/1409.4842. 2014.
Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, http://arxiv.org/abs/1502.03167. 2015.
Rethinking the Inception Architecture for Computer Vision, http://arxiv.org/abs/1512.00567. 2015.
99. Xception
• Depthwise separable convolution
François Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. https://arxiv.org/abs/1610.02357. 2016-2017.
100. SqueezeNet
Input: F x F x M
Squeeze:
• 1x1 convs; output: F x F x S (S < M)
Expand:
• 1x1 convs; output: F x F x e1
• 3x3 convs; output: F x F x e2
Concat: F x F x (e1 + e2)
Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. https://arxiv.org/abs/1602.07360. 2016
101. MobileNet V1
• Standard convolution
• Depthwise separable convolution:
(1) depthwise conv: 1 filter takes 1 input channel
(2) pointwise conv: 1x1 convs
• Computation reduction: 1/N + 1/(D_K x D_K)
Andrew G. Howard et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. https://arxiv.org/abs/1704.04861?context=cs. 2017.
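Using the notation of the earlier Basics slide (kernel size D_K, N output channels), the depthwise-separable cost ratio can be checked numerically: for a 3x3 kernel it is close to 1/9 regardless of N, i.e. an 8-9x reduction.

```python
def depthwise_separable_ratio(d_k, n):
    """Cost(depthwise + pointwise) / Cost(standard conv) = 1/n + 1/d_k^2.
    d_k: kernel size, n: number of output channels."""
    return 1.0 / n + 1.0 / (d_k * d_k)
```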
103. MobileNet V1
• Use conv with stride=2 to replace pooling
• Adds two hyperparameters: width multiplier α and resolution multiplier ρ
• α = 1.0, 0.75, 0.5, 0.25
• Standard MobileNet when α = 1
104. MobileNet V2
• Inverted residual block: increase dimension, then reduce dimension (MobileNetV1 vs. MobileNetV2: increase # channels)
• Linear bottlenecks: removed the nonlinear activation in the low-dimensional layers
Mark Sandler et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. https://arxiv.org/abs/1801.04381. 2018.
105. ShuffleNet V1
• Pointwise group convolution (1x1 conv, #g groups)
• Channel shuffle: helps information flow across feature channels
• Concat operation to concatenate two different channels
Xiangyu Zhang et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. https://arxiv.org/abs/1707.01083. 2017.
106. ShuffleNet V1
Xiangyu Zhang et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. https://arxiv.org/abs/1707.01083. 2017.
108. ShuffleNet V2
Reduce memory access cost:
• Channel split (2g)
• Remove group convolution
• Put the channel shuffle module after channel concatenation
Ningning Ma et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. https://arxiv.org/abs/1807.11164. 2018.
109. Experiments - Classification

Model | mAP (%) | Precision (%) | Recall (%) | Size (MB) | Speed (ms/photo)
Server-based + Yolov2 | 99.62 | 99.60 | 99.65 | N/A | N/A
1.00x ShuffleNet V2 + Yolov2 | 96.43 | 97.16 | 96.83 | 5.20 | 80.00
0.50x ShuffleNet V2 + Yolov2 | 95.86 | 97.28 | 96.28 | 1.70 | 40.00
0.50x ShuffleNet V2 + SSD | 97.73 | 90.61 | 97.98 | 7.90 | 65.00
0.25x ShuffleNet V2 + SSD | 97.25 | 90.46 | 97.59 | 5.00 | 45.00

Category | Abbreviation
Front page of ID card | id_card_f
Back page of ID card | id_card_b
Front page of driver license | driver_license_f
Back page of driver license | driver_license_b
Front of main page in car license | car_license_f
Back of main page in car license | car_license_b
Supplementary page in car license | vehicle_license
Real car photo (whole car) | whole car
Real car photo (car plate) | plate
115. Application - Pay by smiling
• In Sep. 2017, Alibaba's Ant Financial affiliate and KFC China announced facial-recognition payment available for customers in the fast-food restaurant chain's new KPRO store in Hangzhou.
• The "Smile to Pay" facial recognition payment solution at KFC enables customers to pay without their wallets.
https://www.jrzj.com/194328.html
116. Application - Check-in at station
Beijing West railway station, Taiyuan South railway station, Shanghai metro station
https://baijiahao.baidu.com/s?id=1552314447507461&wfr=spider&for=pc
http://www.sohu.com/a/220124437_99966914
http://dy.163.com/v2/article/detail/D5U3QH2P0525KG01.html
126. Overview - Detection & landmark dataset

Face detection dataset | Available | # faces | # images | Website | Remarks
FDDB | Public | 5,171 | 2,845 | http://vis-www.cs.umass.edu/fddb/ | Unconstrained faces
WiderFace | Public | 393,703 | 32,203 | http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace | Easy, Medium, Hard sets; a high degree of variability in scale, pose and occlusion
MALF | Public | 11,931 | 5,250 | http://www.cbsr.ia.ac.cn/faceevaluation/ | Bounding boxes, Multi-Attribute Labelled Faces, pose and facial attributes
Caltech 10,000 Web Faces | Public | - | 10,524 | http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/ | Collected from Google image search; 4 landmarks (two eyes, nose and mouth)
PUB | Public | - | 9,971 | http://biometrics.put.poznan.pl/put-face-database/ | 30 landmarks, 194 contour points
AFLW | Public | 25,993 | - | https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/ | Collected from Flickr, 21 landmarks
127. Overview - Detection - MTCNN
Kaipeng Zhang et al. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. https://arxiv.org/abs/1604.02878v1. 2016.
• Proposes a deep cascaded multi-task framework with three stages: P-Net, R-Net and O-Net; each is a shallow network
• P-Net: proposal network; quickly produces candidate windows through a shallow CNN
• R-Net: refine network; refines the candidates to reject a large number of non-face windows through a more complex CNN
• O-Net: output network; uses a more powerful CNN to refine the result and output facial landmark positions
128. Overview - Detection - Face R-FCN
Yitong Wang et al. Detecting Faces Using Region-based Fully Convolutional Networks. https://arxiv.org/abs/1709.05256. 2017.
• The framework is based on R-FCN
• Proposes a region-based face detector applying deep networks in a fully convolutional fashion
• Introduces additional smaller anchors and shrinks the position-sensitive RoI pooling to a smaller size to suit the detection of tiny faces
• Proposes position-sensitive average pooling instead of normal average pooling for the final feature voting in R-FCN
• Uses a multi-scale training strategy and Online Hard Example Mining (OHEM)
129. Overview - Detection - PyramidBox
Xu Tang et al. PyramidBox: A Context-assisted Single Shot Face Detector. https://arxiv.org/abs/1803.07737?context=cs. 2018.
• Baidu proposes PyramidBox
• Extended VGG16 backbone generating feature maps at different levels
• Generates a series of anchors corresponding to larger regions related to a face that contain more contextual information, such as head, shoulder and body
130. Overview - Recognition - Dataset

Dataset | Available | # People | # images | Website | Remarks
LFW | Public | 5K | 13K | http://vis-www.cs.umass.edu/lfw/#views | Labeled Faces in the Wild
YFD | Public | 1.5K | 3.4K (video) | https://www.cs.tau.ac.il/~wolf/ytfaces/ | YouTube Faces Database
CelebA (CelebFaces Attributes Dataset) | Public | 10K | 202K | http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html | Multimedia Lab, The Chinese University of Hong Kong
CASIA-WebFace | Public | 10K | 500K | http://www.cbsr.ia.ac.cn/english/CASIA-WebFace/CASIA-WebFace_Agreements.pdf |
MS-Celeb-1M | Public | 100K | 10M | https://www.msceleb.org |
VGGFace2 | Public | 9K | 3.3M | http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/ | Downloaded from Google Image Search; large variations in pose, age, illumination, ethnicity and profession
Facebook | Private | 4K | 4,400K | N/A |
Google | Private | 8000K | 100-200M | N/A |
132. Overview - Recognition - Results

Time | Method | Training size | Method description | LFW | Comments
1991 | Eigenfaces | < 10k | Principal component analysis (PCA) | 60.02% |
2006 | LBP+CSML | < 10k | Local binary pattern (LBP) + metric learning | 85.57% |
2013 | High-dim LBP | 0.1m | High-dim LBP + Joint Bayesian | 95.17% |
2014 | DeepFace | 4m | CNN + 3D face alignment | 97.35% | Facebook
2014 | DeepID | 0.2m | CNN + Softmax | 97.45% | CUHK
2015 | VGGFace | 2.6m | VGG + Softmax | 98.95% | Oxford
2015 | FaceNet | 200m | Inception + Triplet loss | 99.63% | Google
2015 | Ensemble face | 1.2m | CNN + Multi-patch + Deep metric | 99.77% | Baidu
2016 | Effective face | 2.5m | CNN + Augmentation | 98.06% | Pose + Shape + Expression
2017 | SphereFace | 0.5m | CNN + Angular-Softmax | 99.42% | Multiplicative angular margin: cos(mθ)
2018 | ArcFace | 6.8m | CNN + Additive angular margin | 99.83% | Additive angular margin: cos(θ + m)
2019 | Combined loss | N/A | cos(m1·θ + m2) − m3 | |
133. Overview - Recognition - DeepFace
Yaniv Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. https://ieeexplore.ieee.org/document/6909616. CVPR 2014.
• CNN + DNN structure
• L4-L6 are locally connected layers without weight sharing, rather than standard convolutional layers
• The last two layers, F7 and F8, are fully connected
• Employs 3D face modeling to apply an affine transformation for 3D face alignment and obtain the frontal face
• More than 120 million parameters
• Trained on four million facial images belonging to more than 4,000 identities
134. Overview - Recognition - DeepID
Yi Sun, Xiaogang Wang, Xiaoou Tang. Deep Learning Face Representation from Predicting 10,000 Classes. https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Sun_Deep_Learning_Face_2014_CVPR_paper.pdf. CVPR 2014.
• Face patch method; each patch uses one ConvNet
• Each ConvNet has 4 layers
• 60 face patches with ten regions, three scales, and RGB or gray channels
• 60 ConvNets x two 160-dimensional vectors (patch and its flipped counterpart), a 19,200-dimensional vector in total for face verification
• Achieves 97.45% face verification accuracy on LFW
• Based on DeepID1, the Chinese University of Hong Kong provides DeepID2 and DeepID3
135. Overview - Recognition - FaceNet
Florian Schroff et al. FaceNet: A Unified Embedding for Face Recognition and Clustering. https://arxiv.org/abs/1503.03832. CVPR 2015.
• Proposed by Google
• Directly uses a deep convolutional network
• Triplet loss for training: minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity
• Euclidean distance measures face similarity for verification
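The triplet loss on embedding vectors can be sketched in plain Python (squared Euclidean distances; the margin value below is illustrative, not FaceNet's actual setting):

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss on embedding vectors:
    max(0, ||a - p||^2 - ||a - n||^2 + margin).
    Zero when the negative is already farther than the positive by `margin`."""
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_ap - d_an + margin)
```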
136. Overview - Recognition - Ensemble Face
Jingtuo Liu et al. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding. https://arxiv.org/pdf/1506.07310. 2015.
• Multi-patch feature extraction: 9 image patches, each centered at a different landmark on the face region
• Each patch: 9 convolution layers with a softmax layer at the end
• The last convolution layers of all networks are concatenated to build a high-dimensional feature for the face representation
• Metric learning with triplet loss reduces the feature to 128/256 dimensions
• Achieves 99.77% accuracy on LFW under the 6000-pair evaluation protocol
137. Overview - Recognition - Effective Face
Iacopo Masi et al. Do We Really Need to Collect Millions of Faces for Effective Face Recognition. https://arxiv.org/abs/1603.07057. CVPR 2016.
• A single VGGNet with 19 layers
• Training on both real and augmented data
• Uses the CASIA-WebFace collection and generates artificial data by introducing pose, shape and expression variations
138. Overview - Recognition - Combined loss
Jiankang Deng et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. https://arxiv.org/abs/1801.07698. 2019.
• Multiplicative angular margin: cos(mθ)
• Additive angular margin: cos(θ + m)
• Additive cosine margin: cos(θ) − m
• Combined loss: cos(m1·θ + m2) − m3
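The three margin variants can be compared directly: each one shrinks the target-class logit relative to plain cos θ, which forces a larger angular separation between classes during training. A minimal sketch (margin values are illustrative):

```python
import math

def margin_logit(theta, kind, m):
    """Target-class logit under different angular-margin losses.
    theta: angle between the feature and the class weight, in radians."""
    if kind == "sphereface":   # multiplicative angular margin
        return math.cos(m * theta)
    if kind == "arcface":      # additive angular margin
        return math.cos(theta + m)
    if kind == "cosface":      # additive cosine margin
        return math.cos(theta) - m
    raise ValueError(kind)
```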
139. Experiments - Combined loss

Test set | Feature | Softmax | SphereFace | CosFace | ArcFace | Combined loss
LFW | public | 98.75 | 99.52 | 99.50 | 99.55 | 99.60
7k | private | 93.60 | 95.45 | 95.90 | 96.72 | 97.13
50k | private | 93.28 | 95.93 | 95.50 | 97.08 | 96.90
zc | private | 99.18 | 99.37 | 99.45 | 99.57 | 99.52
avg | | 96.20 | 97.57 | 97.59 | 98.23 | 98.29

• 7k/50k: test sets extracted from the registered driver photo database; 3k positive pairs and 3k negative pairs are randomly selected from 7k/50k drivers, respectively
• zc: test set randomly extracted from the premier driver photo database; 3k positive pairs and 3k negative pairs are randomly selected for testing
140. Experiments - Virtual learning
Drastically improves performance over the softmax baseline on both the LFW and SLLFW datasets, e.g., from 99.10% to 99.46% and from 94.59% to 95.85%, respectively.
Binghui Chen, Weihong Deng, Haifeng Shen. Virtual Class Enhanced Discriminative Embedding Learning. https://arxiv.org/abs/1811.12611. 2018
142. Experiments - Face detection
• WIDER FACE is a face detection benchmark dataset collected from the publicly available WIDER dataset
• 32,203 images are chosen and 393,703 faces labeled, with a high degree of variability in scale, pose and occlusion as depicted in the sample images
• Proposes the DFS method: uses semantically fused feature maps as contextual cues and constructs a semantic segmentation branch for training supervision, to learn better representations
• Won 5 rank-1 results in April 2019
Widerface: http://shuoyang1213.me/WIDERFACE/index.html
Wanxin Tian, Zixuan Wang, Haifeng Shen, Weihong Deng, et al. Learning Better Features for Face Detection with Feature Fusion and Segmentation Supervision. https://arxiv.org/abs/1811.08557. 2018-2019.
144. What can we learn from a Driving Scenario?
• What is in a driving scenario? (Vision Perception)
• How far are objects from the ego-vehicle? (3D Reconstruction)
• How does a human driver interact with the environment? (Behavior Analysis)
145. Driving Scenarios vs. General Computer Vision
Data
• Multi-modal (multiple sensors including camera, LiDAR, GPS, IMU, etc.)
• Collected from 3D open areas (not indoor/lab environments)
• Ego-centric / first person
Requirements
Opportunities
146. Main Components
Vision Perception in Driving Scenario
What does Vision Perception do:
Detect, segment, track and classify objects-of-interest in driving scenarios
• Pedestrian
• Vehicle
• Road
• Traffic Sign / Light
149. Vision Perception - Pedestrian Detection
Pedestrian detection at 100 FPS
• Uses cascades
• Fast features
• Not a CNN-based model
Benenson et al. '12, "VeryFast"
100+ FPS detector, no CNNs
150. Vision Perception - Pedestrian Detection
Real-time pedestrian detection with CNNs
• Uses cascades
• Uses fast non-CNN features
• Uses CNNs for maximum accuracy with minimum speed sacrifice
Angelova et al. '15, "DeepCascades"
Real-time (15 FPS) with CNNs
151. Vision Perception - Pedestrian Detection
Occlusion-aware pedestrian detection
• Aggregation loss (enforces proposals to be close and compactly located)
• Occlusion-aware region of interest (PORoI): integrates prior structural information of the human body to handle occlusion
• Based on Faster R-CNN
Zhang et al. '18, "OR-CNN"
State of the art (as of April 2019)
153. Vision Perception - Vehicle Detection
Vehicle detection in 3D from images
• Directly from 2D images
• Proposal generation as energy minimization
• Orientation estimation network
Chen et al. '16, "3D Bounding Box"
Breakthrough for 3D detection with a mono image
154. Vision Perception - Vehicle Detection
Multi-View 3D Object Detection
• Multi-sensor fusion
Chen et al. '17, "MV3D"
Impressive accuracy gain from multi-sensor fusion
155. Vision Perception - Vehicle Detection
Multi-level fusion based 3D object detection from mono images
• Simultaneously proposes a 2D RPN and predicts 3D location, orientation, and dimensions
Xu et al. '18, "Multi-level Fusion"
State of the art for 3D detection from mono camera images
156. Vision Perception - Road Segmentation
Joint semantic prediction
• KITTI Road Detection top performance 2017
• Multi-task framework
• Real-time
• Uses RGB images only
Teichmann et al. '17, "MultiNet"
Speed + accuracy with RGB images only
157. Vision Perception - Road Segmentation
LiDAR-camera fusion
• KITTI Road Detection top performance 2018
• Cross-fusion mechanism with an FCN
Caltagirone et al. '18, "LidCamNet"
LiDAR-camera fusion rules
158. Vision Perception - Road Segmentation
LiDAR-camera fusion with LiDAR adaptation
• KITTI Road Detection current top performance
• Progressive LiDAR adaptation
Chen et al. '19, "PLARD"
State-of-the-art performance
160. Vision Perception - Traffic Sign Detection
IJCNN 2011 Traffic Sign Recognition Competition
• Ciresan et al. '11: 0.56% error
• Human: 1.16% error
• Non-CNN: 3.86% error
Ciresan et al. '11, "Traffic Sign Recognition"
Traffic sign recognition is easy (super-human performance)
161. Vision Perception - Traffic Sign Detection
Detecting small signs in large images
• Breaks a large image into small patches
• Small-Object-Sensitive CNN (SOS-CNN)
• Based on SSD
Meng et al. '17, "SOS-CNN"
Handles small objects
162. What can we learn from a Driving Scenario?
• What is in a driving scenario? (Vision Perception)
• How far are objects from the ego-vehicle? (3D Reconstruction)
• How does a human driver interact with the environment? (Behavior Analysis)
163. Main Components
3D Reconstruction in Driving Scenario
What does 3D Reconstruction do:
Recover real-world location and pose of driving-scenario objects (2D to 3D)
5 minutes of theoretical background (a little math)
169. 3D Reconstruction - Semantic Reconstruction
Kundu et al. '14, "Joint semantic and 3D reconstruction from monocular video"
Semantic + 3D reconstruction from a mono camera
170. 3D Reconstruction - Semantic Reconstruction
Cherabier et al. '16, "Multi-label semantic 3D reconstruction using voxel blocks"
Efficient dense semantic + 3D reconstruction
171. What can we learn from a Driving Scenario?
• What is in a driving scenario? (Vision Perception)
• How far are objects from the ego-vehicle? (3D Reconstruction)
• How does a human driver interact with the environment? (Behavior Analysis)
172. Driving Scenario Understanding
Honda Research Institute Driving Dataset
• 104 hours of real human driving records
• Driving behavior and causal reasoning annotations
Ramanishka et al. '18, "HDD"
First dataset towards driving scenario understanding
173. Driving Scenario Understanding
Driving attention prediction from video
• Focus on the driver's attention
• In-car vs. in-lab test
Xia et al. '18, "Predicting Driver Attention"
Introduces attention heat maps
175. GAIA Open Dataset
• Dataset: D²-City Dataset
• D²-City is a large-scale driving video dataset that provides more than 10k videos recorded in 720p HD or 1080p FHD from front-facing dashcams, with annotations for object detection and tracking.
• 1k videos: annotations of the bounding boxes and tracking IDs of road objects in 12 different categories
• 9k videos: annotations of the bounding boxes in key frames