Modern perception pipelines in autonomous driving (AD) systems are built on Deep Neural Networks (DNNs), whose performance depends on many hyper-parameter configurations and training strategies. Data augmentation is now a well-established training strategy for improving the generalization of DNNs, especially in the low-data regime, and self-supervised and semi-supervised methods depend heavily on it. In this study we view the generalization gains from data augmentation as arising because augmentations implicitly model the geometric, viewpoint-dependent transformations that images and pointclouds undergo due to noise, perspective, and the motion of the ego-vehicle. We briefly review current data augmentation strategies for perception tasks in AD, along with recent work on understanding their effects on model generalization.
In the talk we shall review data augmentation strategies through two case studies:
- Improving the performance of a monocular 3D object detection model using geometry-preserving data augmentations on images
- Understanding the role of data augmentation in reducing data redundancy and improving label efficiency within an active learning pipeline
- How to tackle an object detection competition
- Schwert's 6th-place solution on Open Images Challenge 2019
- presented at the lunch workshop of the 26th Symposium on Sensing via Image Information (2020).
Title: Deep Learning based Segmentation Pipeline for Label-Free Phase-Contrast Microscopy Images
THE 28th IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS
5 - 7 October 2020
Video Link: https://youtu.be/b5tGt6GMN9E
https://telecombcn-dl.github.io/2018-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
A Novel Approach for Image Watermarking Using DCT and JND Techniques (ijiert bestjournal)
Today's world is a digital world: in every field there is enormous use of digital content, and information handled on the internet and multimedia network systems is in digital form. Copying digital content without quality loss is not difficult, so there are many opportunities to copy such digital information, and a great need to prohibit illegal copying of digital media. Digital watermarking is a powerful solution to this problem: various types of information are embedded in the digital content we want to protect from illegal copying, and this embedded information is the watermark. This paper introduces two techniques for image watermarking, based on DCT and JND. The DCT-based approach embeds watermarks in the DC, low-, mid- and high-frequency DCT coefficients. The JND-based approach gives a robust and transparent watermarking scheme that exploits the human visual system's sensitivity to local image characteristics obtained from the spatial domain, improving upon content-based image watermarking schemes.
Uncertainty-quantification tasks are often "many query" in nature, as they require repeated evaluations of a model that often corresponds to a parameterized system of nonlinear equations (e.g., arising from the spatial discretization of a PDE). To make this task tractable for large-scale models, low-fidelity models (e.g., reduced-order models, coarse-mesh solutions) must be employed. However, such approximations introduce additional error, which may be treated as a source of epistemic uncertainty that must be quantified to ensure rigor in the ultimate UQ result. We present a new approach to quantify the error (i.e., epistemic uncertainty) introduced by these low-fidelity model approximations. The approach (1) engineers features that are informative of the error using concepts related to dual-weighted residuals and rigorous error bounds, and (2) applies machine learning regression techniques (e.g., artificial neural networks, random forests, support vector machines) to construct a statistical model of the error from these features. We consider both (signed) errors in quantities of interest, as well as global state-space error norms. We present several examples to demonstrate the effectiveness of the proposed approach compared to more conventional feature and regression choices. In each of the examples, the predicted errors have a coefficient of determination value of at least 0.998.
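Step (2) of the abstract's approach can be sketched in a few lines. The following is a minimal, illustrative version using synthetic data and plain least-squares in place of the paper's feature engineering and regressors; the feature and error arrays are stand-ins, not the paper's actual quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for engineered features (e.g., dual-weighted-residual
# terms, error-bound quantities) and the signed low-fidelity model error.
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=n_samples)   # signed QoI error

# Linear least-squares regression as the statistical error model.
A = np.hstack([X, np.ones((n_samples, 1))])          # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# Coefficient of determination (R^2) of the error predictions.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 = {r2:.4f}")
```

The same recipe carries over directly when the regressor is a random forest or neural network: only the fit/predict step changes, while the R^2 evaluation stays the same.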
Image Compression using Combined Approach of EZW and LZW (IJERA Editor)
Image processing is a popular and widely used field. In this paper we propose a technique for image compression in which the user can recover the original image from the processed image without any loss. In the proposed technique, test images are processed through a compression algorithm that combines EZW and LZW. The technique is applied to different types of images (.bmp, .tiff, .dcm medical images, and binary images), and its performance is evaluated against LBG-based techniques using PSNR, compression ratio, and MSE.
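The LZW half of such a combined scheme is a dictionary coder, and its lossless round trip is easy to demonstrate. Below is a textbook LZW implementation in plain Python (a sketch of the general algorithm, not this paper's specific EZW+LZW pipeline):

```python
def lzw_compress(data: bytes) -> list[int]:
    """Plain LZW: grow a dictionary of byte sequences, emit integer codes."""
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for b in data:
        wb = w + bytes([b])
        if wb in table:
            w = wb
        else:
            out.append(table[w])
            table[wb] = len(table)   # register the new sequence
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the dictionary on the fly; handles the cScSc edge case."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for c in codes[1:]:
        entry = table[c] if c in table else w + w[:1]
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

raw = b"ABABABABABABAB" * 10
codes = lzw_compress(raw)
assert lzw_decompress(codes) == raw      # lossless round trip
print(len(raw), "bytes ->", len(codes), "codes")
```

On highly repetitive input like the example, the code stream is much shorter than the input, which is exactly the redundancy the combined scheme exploits after the EZW stage.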
Journal club done with Vid Stojevic for PointNet:
https://arxiv.org/abs/1612.00593
https://github.com/charlesq34/pointnet
http://stanford.edu/~rqi/pointnet/
Deep learning for indoor pointcloud processing. PointNet provides a unified architecture operating directly on unordered pointclouds, without voxelisation, for applications ranging from object classification and part segmentation to scene semantic parsing.
Alternative download link:
https://www.dropbox.com/s/ziyhgi627vg9lyi/3D_v2017_initReport.pdf?dl=0
Shadow Detection and Removal using Tricolor Attenuation Model Based on Feature Descriptor (ijtsrd)
We present TAM-FD, a novel extension of the tricolor attenuation model tailored to the difficult problem of shadow detection in images. Previous shadow-detection methods focus on learning the local appearance of shadow regions, using only limited local context reasoning such as pairwise potentials in a Conditional Random Field. In contrast, the proposed approach can model higher-level relations and global scene characteristics. We train a shadow detector that corresponds to the generator of a conditional TAM, and improve its shadow accuracy by combining the typical TAM loss with a data loss term based on feature descriptors. Shadows occur when objects block direct light from a source of illumination, usually the sun. Shadows can be divided into cast shadows and self shadows: a cast shadow is the projection of an object away from the light source, while a self shadow is the part of the object that is not illuminated. For a cast shadow, the part where direct light is completely blocked is the umbra, and the part where it is partially blocked is the penumbra. Because of the penumbra, there is no sharp boundary between shadowed and non-shadowed regions; shadows cause partial or total loss of radiometric information in the affected areas, making tasks such as image interpretation, object detection and recognition, and change detection more difficult or even impossible. The SDI index improves by 1.76, the color-component index for preserving color difference during shadow removal improves by 9.75, and the normalized saturation value detection index (NSVDI) for identifying shadow pixels improves by 1.89.
Rakesh Dangi | Anjana Nigam, "Shadow Detection and Removal using Tricolor Attenuation Model Based on Feature Descriptor", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume 3, Issue 4, June 2019. URL: https://www.ijtsrd.com/papers/ijtsrd25127.pdf
Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/25127/shadow-detection-and-removal-using-tricolor-attenuation-model-based-on-feature-descriptor/rakesh-dangi
This presentation is an analysis of the paper "SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing".
Noise Removal in Traffic Sign Detection Systems (CSEIJJournal)
The application of traffic sign detection and recognition is growing in driver-assistance and automatic driving systems, helping drivers and automated systems to detect and recognize traffic signs effectively. However, these systems may struggle in challenging environments such as rain, haze, and hue shifts. To help detection systems perform better in conditions like rain and haze, we propose the use of a deep learning technique based on a Convolutional Neural Network to process the visual data before it is used for detection. Our architecture uses the NoiseNet model [11], a noise-reduction network trained to enhance images in patches rather than as a whole. Training uses the Challenging Unreal and Real Environment - Traffic Sign Detection dataset (CURE-TSD), which contains videos of different roads in various challenging situations.
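Patch-wise enhancement as described above can be sketched generically: tile the image, apply a per-patch operation, and stitch the results back. The per-patch operation below is a placeholder standardization, standing in for the actual denoising network:

```python
import numpy as np

def enhance_patchwise(img, patch=32, op=lambda p: p):
    """Process an image in non-overlapping patches and stitch them back."""
    h, w = img.shape[:2]
    out = np.empty_like(img, dtype=np.float64)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            tile = img[y:y + patch, x:x + patch].astype(np.float64)
            out[y:y + patch, x:x + patch] = op(tile)
    return out

def local_contrast(p):
    # Placeholder for the denoising network: per-patch standardization.
    return (p - p.mean()) / (p.std() + 1e-8)

rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, size=(128, 128))
enhanced = enhance_patchwise(frame, patch=32, op=local_contrast)
print(enhanced.shape)
```

Working per patch lets the enhancement adapt to local noise statistics (a rainy corner vs. a clear sky region), which is the stated motivation for training NoiseNet on patches.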
Semantic Segmentation on Satellite Imagery (RAHUL BHOJWANI)
This is an Image Semantic Segmentation project targeted on Satellite Imagery. The goal was to detect the pixel-wise segmentation map for various objects in Satellite Imagery including buildings, water bodies, roads etc. The data for this was taken from the Kaggle competition <https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection>.
We implemented FCN, U-Net and Segnet Deep learning architectures for this task.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
JMeter webinar: integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
2. NAVYA INTRODUCTION
Ambition: level 4 autonomy for all our platforms
- Passengers transport: Autonom® Shuttle, Autonom® Shuttle Evo
- Goods transport: Autonom® Tract AT135
- Custom self-driving solution driven by Navya
3. NAVYA ML TEAM
Deep learning modules on camera and LiDAR
The ML team at Navya principally works on:
Camera:
- 2D object detection and drivable zone segmentation (2D-OD, MTL)
- Traffic light detection and relevancy (TLDR)
- 3D monocular object detection (3D-MOD)
LiDAR:
- Large-scale semantic segmentation on pointclouds
- Instance segmentation on pointclouds
- Semantic Navya Dataset
Team: Alexandre Almin, Thomas Gauthier, Hao Liu, Leo Lemarie, Anh Duong, B Ravi Kiran
4. OVERVIEW
What is a data augmentation (DA)?
- Categories of data augmentation (classification, detection, segmentation)
- Theoretical frameworks for DA
Geometry-preserving 2D-DA for 3D monocular detection
- DA for 3D monocular object detection
- Self-supervision pretext tasks for 3D monocular detection
- Evaluation on the KITTI 3D detection dataset
DA for data redundancy in an Active Learning (AL) pipeline
- Semantic segmentation on pointclouds
- Building an AL pipeline for mining informative samples
- Evaluation on the Semantic-KITTI dataset
5. DATA AUGMENTATION: A BRIEF REVIEW
Data augmentation:
- Generates augmented I/O pairs
- Performs model regularization and reduces the effect of overfitting in the low-data regime
- Increases the diversity of small datasets
- Fundamentally models invariance/equivariance to the real-world transformations that generate samples of a dataset
Image vs pointcloud augmentations
Representations:
- Images have an inherent matrix representation
- Pointclouds are sets with arbitrary input domain size
Instance transforms:
- Objects in pointclouds are separable from their background, enabling easy 3D transformations (rotation, translation)
- Though this does require the transformation to account for the change in pointcloud density with distance/orientation
References:
https://blog.waymo.com/2020/04/using-automated-data-augmentation-to.html
https://albumentations.ai/docs/
https://imgaug.readthedocs.io/en/latest/
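The instance-transform point above can be made concrete. A minimal sketch, assuming the object's points have already been separated from the background: rotate them about the vertical axis around the box center and update the box yaw to match (function names and the yaw convention are illustrative):

```python
import numpy as np

def rotate_instance(points, box_center, box_yaw, angle):
    """Rotate one object's points (and its 3D box yaw) about the z-axis.

    points: (N, 3) points belonging to a single object, already separated
    from the background. angle is in radians.
    """
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    centered = points - box_center           # rotate around the box center
    rotated = centered @ R.T + box_center
    return rotated, (box_yaw + angle) % (2 * np.pi)

pts = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
new_pts, new_yaw = rotate_instance(pts, np.zeros(3), 0.0, np.pi / 2)
print(np.round(new_pts, 6), new_yaw)
```

A density-aware version would additionally resample or drop points as the object moves relative to the sensor, which is the caveat raised in the slide.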
6. WHY DATA AUGMENTATIONS WORK: REVIEWING THEORETICAL UNDERSTANDING
- DA models invariances to represent data (Anselmi et al., 2016): translation invariance is already baked into CNNs due to convolutions; rotations, scaling, color transformations, etc. are not.
- DA connects with kernel theory (Dao et al., 2019): data augmentations are seen as a Markov process with transitions defined per sample.
- DA represented within a group G (Chen, Dobriban and Lee, 2020): the augmented sample distribution gX is approximately invariant under the action of group elements g, i.e. X ≈ gX for g in G, so the probability of an augmented sample is approximately equal to that of the original sample.
References:
- Anselmi, et al. "On invariance and selectivity in representation learning." Information and Inference: A Journal of the IMA, 2016.
- Dao, Tri, et al. "A kernel theory of modern data augmentation." International Conference on Machine Learning, PMLR, 2019.
- Chen, Shuxiao, Edgar Dobriban, and Jane H. Lee. "A group-theoretic framework for data augmentation." JMLR 21.245 (2020): 1-71.
7. CASE STUDY 1: 3D MONOCULAR OBJECT DETECTION (3D-MOD)
Exploring 2D Data Augmentation (DA) for 3D Monocular Object Detection
3D-MOD is a key component of the obstacle detection pipeline:
- Redundancy and fusion with the LiDAR 3D detection pipeline
- Stereo depth estimation pipelines are progressively being replaced with monocular depth estimation
Motivation: DA for 3D-MOD
- Datasets for 3D-MOD are costly to create
- Data augmentations for 2D object detectors change the image geometry
- View synthesis methods are robust, but heavy
- How can existing annotations be reused as a self-supervised task?
Joint work: Sugirtha T, Sridevi M, Khailash Santhakumar, Hao Liu, B Ravi Kiran, Thomas Gauthier, Senthil Yogamani. Accepted at the ICCV Workshop on Self-supervised Learning for Autonomous Driving (SSLAD 2021).
Problem formulation, data augmentation: what transformations or augmentations could be performed on the (image, 2D-BB) pair that do not change the depth, orientation or scale of the bounding box?
Problem formulation, pretext SSL task: what scalable auxiliary task, along with a methodology to generate annotations, could be added to the 3D-MOD detector's primary task?
8. DATA AUGMENTATIONS FOR OBJECT DETECTION
(Taxonomy diagram from the slide, reconstructed as a list.)
Label-independent image transforms:
- Color transforms, blur, cutout, mixup
- Weather-based: shadow, rain, snow, fog, flare, gravel, autumn
Bounding box / instance based (2D boxes):
- Mosaic, flip, affine, cutmix
- Box mixup, box cutpaste, apply X to box (blur, cutout mask, cutpaste)
Monocular 3D boxes:
- Geometry preserving: mosaic-tile, affine
- Geometry aware*: view synthesis, camera position, focal length, receptive field, differentiable neural rendering
Learning based:
- RandAugment, FastAugment, RL-search
GAN based:
- Style transfer
Self-supervised & semi-supervised augmentations:
- Random window classifier, jigsaw, AugMix, adversarial, rotation prediction, contrastive losses
*Closest work: Lian, Qing, et al. "Geometry-aware data augmentation for monocular 3D object detection." arXiv preprint arXiv:2104.05858 (2021).
9. GEOMETRY PRESERVING 2D DATA AUGMENTATIONS
For monocular 3D object detection
New data augmentations: Box-Mixup, Box-Cutpaste and Mosaic-Tile
Geometry-preserving augmentation: these transformations do not change the camera viewpoint or the 3D orientation of the objects in the scene.
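One such augmentation can be sketched in a few lines. A minimal, hypothetical box-mixup: blend the pixels inside an annotated 2D box with a same-size patch taken from another image, leaving the camera viewpoint and all 3D labels untouched. The function name and fixed blending weight are illustrative, not the exact implementation from the paper:

```python
import numpy as np

def box_mixup(img, box, donor, alpha=0.5):
    """Blend the pixels inside a 2D box with a donor patch of the same size.

    img:   (H, W, 3) image; box: (x1, y1, x2, y2) in pixels;
    donor: (y2-y1, x2-x1, 3) patch cropped from another image.
    The camera viewpoint and the 3D box labels are untouched.
    """
    x1, y1, x2, y2 = box
    out = img.astype(np.float64).copy()
    out[y1:y2, x1:x2] = alpha * out[y1:y2, x1:x2] + (1 - alpha) * donor
    return out

img = np.zeros((8, 8, 3))
donor = np.full((4, 4, 3), 200.0)
mixed = box_mixup(img, (2, 2, 6, 6), donor, alpha=0.5)
print(mixed[3, 3, 0], mixed[0, 0, 0])   # blended inside the box, unchanged outside
```

Because only pixel intensities inside the box change, the depth, orientation and scale associated with the box remain valid, which is what "geometry preserving" means here.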
10. SELF-SUPERVISED LEARNING
For monocular 3D object detection
Self-supervised learning aims at adding auxiliary/pretext tasks, where the labels are either generated automatically by another sensor (LiDAR) or come from a correlated task. Because the pretext task is correlated with the primary task, training on the pretext task provides better performance on the primary task.
Multi-object labeling (MOL), an established pretext task for 2D object detection:
- Generate random windows covering existing foreground bounding boxes
- Create a soft label giving the proportion of the areas of different classes in the random window
Implementation from: Lee, Wonhee, Joonil Na, and Gunhee Kim. "Multi-task self-supervised object detection via recycling of bounding box annotations." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
Soft label over a random window.
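The soft-label computation can be sketched directly from the description above: for a random window, the label is the fraction of the window's area covered by each class's boxes, plus a background entry. This is a minimal illustrative version (the function name and background convention are assumptions, not the paper's exact code):

```python
import numpy as np

def window_soft_label(window, boxes, labels, n_classes):
    """Soft label: per-class fraction of the window area covered by boxes,
    plus a trailing background entry. Boxes as (x1, y1, x2, y2) in pixels.
    Overlapping boxes are rasterized so areas are not double-counted."""
    wx1, wy1, wx2, wy2 = window
    h, w = wy2 - wy1, wx2 - wx1
    mask = np.zeros((n_classes, h, w), dtype=bool)
    for (x1, y1, x2, y2), c in zip(boxes, labels):
        ix1, iy1 = max(x1, wx1), max(y1, wy1)     # clip box to window
        ix2, iy2 = min(x2, wx2), min(y2, wy2)
        if ix1 < ix2 and iy1 < iy2:
            mask[c, iy1 - wy1:iy2 - wy1, ix1 - wx1:ix2 - wx1] = True
    per_class = mask.sum(axis=(1, 2)) / float(h * w)
    background = 1.0 - mask.any(axis=0).mean()
    return np.append(per_class, background)

# 10x10 px window, one 'car' box (class 0) covering its left half.
soft = window_soft_label((0, 0, 10, 10), [(0, 0, 5, 10)], [0], n_classes=2)
print(soft)
```

Here the window is half car, no second-class pixels, half background, so the network is asked to regress these proportions rather than a hard class label.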
11. SELF-SUPERVISED LEARNING
For monocular 3D object detection
Li, Peixuan, et al. "RTM3D: Real-time monocular 3D detection from object keypoints for autonomous driving." Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part III. Springer International Publishing, 2020.
12. EVALUATION METRICS
Evaluating 3D object detection
- We evaluate performance using the mean Average Precision (mAP)
- We propose a new weighted ICFW mAP, using the Inverse of the Class Frequency as Weights, to evaluate gains on non-majority classes (especially in KITTI)
- We use the KITTI 3D object detection metrics, i.e. Average Precision (AP) per class:
- mAP 2D and ICFW mAP 2D
- mAP 3D and ICFW mAP 3D
- mAP BEV and ICFW mAP BEV
- mAP AOS* and ICFW mAP AOS
*AOS: Average Orientation Similarity; BEV mAP: Bird's Eye View 2D box mAP
(Slide figure: normalized class frequency on the validation set of KITTI 3D; ICFW mAP is the newly proposed metric.)
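One natural reading of the proposed ICFW mAP is a per-class AP average weighted proportionally to inverse class frequency. A minimal sketch under that assumption (the AP values and class counts below are illustrative, not the slide's results):

```python
import numpy as np

def icfw_map(ap_per_class, class_counts):
    """Inverse-class-frequency-weighted mAP.

    Weights are proportional to 1 / class frequency and normalized to sum
    to 1, so rare classes (pedestrians, cyclists in KITTI) count more.
    """
    freq = np.asarray(class_counts, dtype=np.float64)
    freq /= freq.sum()
    inv = 1.0 / freq
    w = inv / inv.sum()
    return float(np.dot(w, ap_per_class))

ap = [0.80, 0.40, 0.30]        # illustrative AP: car, pedestrian, cyclist
counts = [28742, 4487, 1627]   # illustrative KITTI-like class counts
print(f"mAP      = {np.mean(ap):.3f}")
print(f"ICFW mAP = {icfw_map(ap, counts):.3f}")
```

With these numbers the plain mAP is 0.5 while the ICFW mAP is pulled down toward the rare classes' lower AP, which is exactly the sensitivity to non-majority classes the metric is meant to provide.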
13. RESULTS: SELF-SUPERVISED LEARNING WITH DA
(All numbers at IoU=0.5; rows below the baseline are deltas with respect to it.)

IoU=0.5                       mAP2D   mAPBEV  mAP3D   ICFW mAP2D  ICFW mAPBEV  ICFW mAP3D
Baseline (B)                  41.44   21.17   19.12   33.00       15.10        14.65
Self-Supervised Learning (SSL) with MOL:
B + 8W                         0.85    0.53    0.46    0.83        0.70         0.54
B + 16W                        0.59   -0.75   -0.59    0.57       -1.88        -1.73
B + 32W                        1.40    0.29    0.12    1.75        0.12        -0.17
Data Augmentation (DA):
B + Cutout4                   -0.91    0.11   -0.71   -2.79        0.15        -0.54
B + BoxMixup                   0.39    0.29    0.21    0.53        0.12         0.04
B + Cutpaste                   1.63    1.10    0.34    3.22        1.91         0.49
B + Mosaic                    -2.61   -1.43   -0.26   -2.96       -0.17         0.09
SSL-MOL + DA:
B + 16W + Cutout               1.54    1.27    0.43    2.17        2.81         1.02
B + 16W + BoxMixup             1.20    1.67    1.66    1.42        2.57         2.59
B + 16W + BoxMixup + Cutout    3.51    1.84    1.01    5.57        2.53         1.02
B + 16W + Cutpaste + Cutout    2.87    1.38    2.26    5.00        1.13         1.19
B + 16W + Cutpaste             0.98    0.67    0.72    1.61        0.65         0.73

The number-of-windows hyper-parameter, together with the composition of data augmentations, has to be optimized and requires either a grid search or a DA-search.
14. RESULTS: EXAMPLES
(Slide figures: baseline detections on top, MOL-SSL with cutout & cutpaste DA on the bottom; baseline BEV on the left, MOL-SSL cutout & cutpaste DA BEV on the right.)
SSL with data augmentations performs better at detecting pedestrians/cyclists, as well as cars at larger distances.
15. CONCLUSION, CASE STUDY 1
MTL vs SSL (see https://ruder.io/multi-task/): in MTL a shared backbone feeds Task 1 and Task 2; in the SSL setup the backbone feeds the main task and a pretext task.
2D DA for 3D-MOD:
- DA stand-alone improves the performance of the main task
- Box mixup / cut-paste augmentations perform well
MOL-SSL task:
- Enables the RTM3D network to better classify foreground regions with different class and background proportions
- This is correlated with the box localisation task
DA-SSL synergy:
- Data augmentation also helps the main task by providing representations that generalize for both the main and augmented pretext tasks
- SSL pretext tasks and their augmentations are both good regularizers (inductive bias) and can be combined fruitfully
- The SSL pretext task provides a soft label and makes training with DA generalize better (hypothesis)
- Cons: separating augmentations between the main and pretext tasks is not possible.
16. CASE STUDY 2: DATA REDUNDANCY ON LARGE DATASETS
Studying data augmentation under an Active Learning setup to reduce data redundancy
Large-scale pointcloud semantic segmentation is a fundamental building block in modern AD perception stacks:
- Semantic map layer in modern HD maps
- Drivable zone extraction and path planning
- Semantic re-localization, and others
This work was granted access to the HPC resources of [TGCC/CINES/IDRIS] under allocation 2021-[AD011012836] made by GENCI (Grand Equipement National de Calcul Intensif).
"Compressing Semantic-KITTI: Reducing data redundancy on pointclouds by Active learning", Anh Duong, Alexandre Almin, Leo Lemarie, B Ravi Kiran, in submission, 2021.
17. Pointcloud semantic segmentation datasets have large amounts of redundant information:
Similar scans due to temporal correlation
Similar scans due to similar urban environments
Similar scans due to symmetries
Motivation:
Data augmentations on the large dataset yielded little gain
How do we reduce redundant or similar samples (pointcloud, GT)? By selecting:
Full Dataset = Core subset + Augmentations
Approach:
Study the effect of data augmentations on active learning sampling
CASE STUDY 2: DATA REDUNDANCY ON LARGE DATASETS
Studying data augmentation under an active learning setup to reduce data redundancy
Large dataset (Core-subset, Aug)
Intuition: augmentations here model equivariances and should enable us to compress the dataset
18. Active learning (AL) is an interactive learning procedure
that greedily samples the most informative sample(s) to maximize the performance of the model.
Multiple ways to decide the most informative subset to label: likelihood P(x|M), uncertainty P(y|x)
AL Components:
Training subset S
Query subset Q
Heuristic func. h (random, entropy, BALD)
Aggregation func. (aggregates pixel scores to a scalar)
Redundant/large dataset D
Test set T
Goals:
AL goal: reduce annotation requests to the Oracle
• Reduces labeling cost
Our goal: reduce redundancy in large datasets
• Reduces training time
ACTIVE LEARNING
Background
[Diagram — AL loop over the large redundant dataset D (w/o T), with a common test set T held out from D:
0. Select initial subset S from D
1. Train model (with DA / no DA)
2. Evaluate & save model
3. Query samples Q
4. Get labels on Q (Annotator or Oracle)
5. Update subset S with Q]
19. Large-scale pointcloud sequences with semantic labels per point
Annotations include the semantic class along with instance ID information
Panoptic nuScenes provides panoptic tracklet-level labels which are temporally consistent across pointcloud scans
Established architectures: RangeNet++, SalsaNext, Cylinder3D
POINTCLOUD SEMANTIC SEGMENTATION DATASETS
[Figure: example scans from Semantic KITTI and Panoptic nuScenes]
20. DATASET SIZE
Semantic segmentation on pointclouds

| Dataset | Cities | Sequences (or points) | #classes | Annotation | Sequential |
|---|---|---|---|---|---|
| Semantic KITTI | 1x Germany | 22 (long) | 28 | Point, Instance | Yes |
| Panoptic nuScenes | Boston, Singapore | 1000 (40K scans) | 32 | Point, Box, Instance | Yes |
| PandaSet | 2x USA | 100 | 37 | Point, Box | Yes |
| Semantic Navya (ours*) | 22x cities in 10 countries (France, Switzerland, US, Denmark, Japan, Germany, Australia, Israel, Norway, New Zealand) | 22 (long, 50K scans) | 24 | Point, Instance | Yes |

*under construction
21. SEMANTIC NAVYA
Large-scale semantic segmentation dataset
22. RANGE IMAGE REPRESENTATION
Pointcloud representation
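A minimal sketch of the spherical projection behind such a range image, in the style of SqueezeSeg/RangeNet++. The image size and vertical field-of-view values are illustrative assumptions (HDL-64E-like), not the exact sensor parameters used in the talk.

```python
import numpy as np

def pointcloud_to_range_image(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Spherical projection of an (N, 3) pointcloud into an (H, W) range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                      # range per point
    yaw = np.arctan2(y, x)                                  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1, 1))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up_r - fov_down_r
    # Normalize angles to pixel coordinates
    u = 0.5 * (1.0 - yaw / np.pi) * W                       # column from azimuth
    v = (1.0 - (pitch - fov_down_r) / fov) * H              # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(int)
    v = np.clip(np.floor(v), 0, H - 1).astype(int)
    img = np.zeros((H, W), dtype=np.float32)
    # Keep the closest return when several points fall into the same pixel
    order = np.argsort(-r)
    img[v[order], u[order]] = r[order]
    return img

pts = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, -1.0]])
img = pointcloud_to_range_image(pts)
print(img.shape)  # (64, 1024)
```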
23. Entropy heuristic:
Choose the samples with the largest entropy
Samples that are most uncertain
BALD:
Measures the information gain between model predictions and perturbed* model predictions
Selects samples that maximize the information gain from the model parameters
HEURISTIC FUNCTION
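The two heuristics can be sketched with numpy. For BALD, the standard formulation scores the mutual information over T stochastic forward passes (e.g. MC-dropout): I = H[mean_t p_t] − mean_t H[p_t]. The probability vectors below are synthetic examples, not model outputs.

```python
import numpy as np

def entropy(probs):
    """Predictive entropy H[y|x] of a (..., C) probability vector."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def bald(mc_probs):
    """BALD score: mutual information between predictions and model parameters.
    mc_probs: (T, C) softmax outputs from T stochastic (e.g. MC-dropout) passes.
    I = H[mean_t p_t] - mean_t H[p_t]."""
    mean_p = mc_probs.mean(axis=0)
    return entropy(mean_p) - entropy(mc_probs).mean()

# A confident, stable prediction scores ~0 on BALD...
stable = np.tile([0.97, 0.01, 0.01, 0.01], (10, 1))
# ...while predictions that disagree across dropout passes score high.
disagree = np.array([[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]] * 5)

print(abs(bald(stable)) < 1e-6)       # True: no disagreement between passes
print(bald(disagree) > bald(stable))  # True: disagreement => information gain
```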
24. DATA AUGMENTATION IN POINTCLOUDS
Using the range image representation
25. DATA AUGMENTATION IN POINTCLOUDS
Using the range image representation
26. EXPERIMENTAL SETUP ON SEMANTIC KITTI
[Diagram — AL loop on a subsampled Semantic KITTI dataset D (6000 scans), with a test set T of 2000 Semantic KITTI samples:
0. Select an initial subset of 100 samples
1. Train SqueezeSegV2 (with no-DA OR DA)
2. Evaluate & save model
3. Query set Q (240 samples)
4. Get labels on Q (Annotator or Oracle)
5. Update subset S with Q]
Data augmentation setup:
Uniform sampling on 5 DA
- Random Dropout
- Cutout/Frustum Dropout
- Gaussian noise (depth)
- Gaussian noise (intensity)
- Instance Cutpaste
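Three of the five augmentations above can be sketched directly on the range image; all parameters (dropout rate, window size, noise sigma) are illustrative assumptions, not the values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_dropout(range_img, p=0.05):
    """Zero out a random fraction of points, simulating missing returns."""
    mask = rng.random(range_img.shape) >= p
    return range_img * mask

def cutout(range_img, h=8, w=64):
    """Frustum-style dropout: zero a contiguous window of the range image,
    which corresponds to a frustum of the scene in 3D."""
    out = range_img.copy()
    H, W = out.shape
    top = rng.integers(0, H - h)
    left = rng.integers(0, W - w)
    out[top:top + h, left:left + w] = 0.0
    return out

def gaussian_depth_noise(range_img, sigma=0.03):
    """Perturb depth only where a return exists (zeros stay zero)."""
    noise = rng.normal(0.0, sigma, size=range_img.shape)
    return np.where(range_img > 0, range_img + noise, range_img)

img = np.full((64, 1024), 10.0, dtype=np.float32)
aug = gaussian_depth_noise(cutout(random_dropout(img)))
print(aug.shape)  # geometry unchanged: (64, 1024)
```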
AL training setup:
AL steps: 25; Budget: 240 samples
Full dataset D: 6000 samples [random 1/5th subset of Semantic KITTI]
Model: SqueezeSegV2
Pointcloud representation: spherical range image
Test set T: 2000 samples
Heuristics: Random, Entropy, BALD
Data augmentation: with and without
27. RESULTS
Label Efficiency
DA provides gains in performance at each AL step on the full dataset of 6000 samples (Semantic KITTI subset)
28. LABEL EFFICIENCY
With and without data augmentation
BALD with DA has the best performance on Semantic KITTI.
29. RANKED HARD EXAMPLES BY THE HEURISTIC
BALD vs Random
• Larger diversity in samples from the BALD heuristic.
• No heuristic scores are available for the Random heuristic since samples are drawn uniformly
[Figure: three ranked examples, each showing Ground truth, Prediction, and the Heuristic function map]
30. EFFECT OF DATA AUGMENTATION FOR CLASSIFICATION
Beck, Nathan, et al. "Effective Evaluation of Deep Active Learning on Image Classification Tasks." arXiv preprint arXiv:2106.15324 (2021).
31. Conclusions:
DA enables better accuracies at each AL loop step
DA provides better label efficiency by sampling pointclouds that are different from dataset+DA samples
A heuristic's performance with DA applied depends on the task:
- Classification: Entropy with DA was better than a sophisticated heuristic function such as BADGE
- Semantic segmentation: BALD with DA was better than Random, which performed better than Entropy
Recent work applies semi-supervised learning to the AL framework:
Similar effect of data augmentations when working with unsupervised DA (consistency loss)
CONCLUSIONS & FUTURE WORK CASE STUDY 2
Future Work:
Complete benchmark on the full Semantic KITTI and Semantic Navya datasets
Aggregation maps uncertainty scores across a whole PC/image into a scalar:
- Requires a way to sample regions/volumes of PCs
- Heuristic functions are scalars and confound multiple regions of the image
Find the most informative set of data augmentations for a dataset to reduce redundancy
32. 1. Shorten, Connor, and Taghi M. Khoshgoftaar. "A survey on image data augmentation for deep learning." Journal
of Big Data 6.1 (2019): 1-48.
2. Li, Peixuan, et al. "Rtm3d: Real-time monocular 3d detection from object keypoints for autonomous
driving." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020,
Proceedings, Part III 16. Springer International Publishing, 2020.
3. Hu, Hou-Ning, et al. "Monocular Quasi-Dense 3D Object Tracking." arXiv preprint arXiv:2103.07351 (2021).
4. Santhakumar, Khailash, et al. "Exploring 2D data augmentation for 3D monocular object detection." arXiv preprint
arXiv:2104.10786 (2021).
5. Lian, Qing, et al. "Geometry-aware data augmentation for monocular 3D object detection." arXiv preprint
arXiv:2104.05858 (2021).
6. Ning, Guanghan, et al. "Data Augmentation for Object Detection via Differentiable Neural Rendering." arXiv
preprint arXiv:2103.02852 (2021).
7. Chen, Shuxiao, Edgar Dobriban, and Jane H. Lee. "A group-theoretic framework for data augmentation." Journal
of Machine Learning Research 21.245 (2020): 1-71.
8. Beck, Nathan, et al. "Effective Evaluation of Deep Active Learning on Image Classification Tasks." arXiv preprint
arXiv:2106.15324 (2021).
REFERENCES I
Data augmentations for monocular 3D detection
33. 1. Behley, Jens, et al. "Semantickitti: A dataset for semantic scene understanding of lidar
sequences." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
2. Fong, Whye Kit, et al. "Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation
and Tracking." arXiv preprint arXiv:2109.03805 (2021).
3. Milioto, Andres, et al. "Rangenet++: Fast and accurate lidar semantic segmentation." 2019 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019.
4. Cortinhal, Tiago, George Tzelepis, and Eren Erdal Aksoy. "SalsaNext: fast, uncertainty-aware semantic
segmentation of LiDAR point clouds for autonomous driving." arXiv preprint arXiv:2003.03653 (2020).
5. Zhou, Hui, et al. "Cylinder3d: An effective 3d framework for driving-scene lidar semantic segmentation."
arXiv preprint arXiv:2008.01550 (2020).
6. Hahner, M., Dai, D., Liniger, A., & Van Gool, L. (2020). Quantifying data augmentation for lidar based 3d
object detection. arXiv preprint arXiv:2004.01643.
7. Ash, Jordan T., et al. "Deep batch active learning by diverse, uncertain gradient lower bounds." arXiv
preprint arXiv:1906.03671 (2019), ICLR 2020. BADGE
8. https://baal.readthedocs.io/en/latest/
9. https://decile-team-distil.readthedocs.io/en/latest/index.html
10. Gao, Mingfei, et al. "Consistency-based semi-supervised active learning: Towards minimizing labeling cost."
European Conference on Computer Vision. Springer, Cham, 2020.
REFERENCES II
Data augmentations within Active Learning for data redundancy