31.10.2018
NOVI SAD APPLIED INTELLIGENCE
COMMUNITY
City.AI Novi Sad is shaping up around a community whose goals are:
• To help local actors efficiently develop the Serbian AI branch internationally
• To work on applied AI challenges with local & global ecosystem actors
• To democratize AI innovation and close the gap between technology and society
• To train and challenge the local community
LEVERAGING THE POTENTIAL OF AI IN 50+ CITIES
AFRICA
Accra - Lagos
ASIA
Bangalore - Bangkok - Beirut - Chiang Mai - Hanoi - Hong Kong -
Jakarta - Johor Bahru - Karachi - Lahore - Manila - Pune - Seoul -
Singapore - Taipei
AUSTRALASIA
Wellington
EUROPE
Amsterdam - Berlin - Bratislava - Bristol - Brussels - Bucharest -
Budapest - Cambridge - Cluj - Cologne - Copenhagen - Hamburg -
Iasi - Krakow - Kyiv - London - Madrid - Munich - Novi Sad - Oxford
- Paris - Sofia - Stockholm - Stuttgart - Tallinn - Tirana - Valencia -
Valletta - Vienna - Vilnius
NORTH AMERICA
Austin - LA - New York - San Diego - San Francisco
SOUTH AMERICA
Bogota - La Paz - Sao Paulo
NOVI SAD AI TEAM
Jovan Stojanovic
Ambassador of Novi Sad AI and
Belgrade AI
Marko Jocic
Co-Ambassador of Novi Sad-AI
Jovana Miletic
Co-Ambassador of Novi Sad-AI
Jovana Vukicevic
Co-Ambassador of Belgrade-AI
Jovan Stojanovic
Co-Ambassador of Nis-AI
LESSONS LEARNED BY
PhD Velibor Ilic, RT-RK, Senior research & development engineer, on
Semantic segmentation of images using deep convolutional neural networks (pixel-level segmentation)
Aleksa Corovic, Machine learning engineer, on
The Usage of the YOLO Algorithm for Traffic Participants Detection
SEMANTIC SEGMENTATION OF IMAGES USING DEEP
CONVOLUTIONAL NEURAL NETWORKS
(pixel level segmentation)
Novi Sad, October 2018
dr Velibor Ilić
RT-RK Automotive
[Novi Sad AI] #3.0 - Deep Learning in Automotive Industry
Overview of the presentation
1. Convolutional neural networks
2. What is semantic segmentation?
3. Architecture of neural networks for image segmentation
4. Examples
• Applying semantic segmentation to geological data: detecting salt in the soil
• Detecting traffic participants in photos and videos
AI & Machine Learning in Automotive Industry
• Smart routing and point-of-interest optimization
• In-vehicle intelligence
• Predictive decisions
• Computer vision
• Predictive vehicle maintenance
Convolutional neural networks
• A convolutional neural network (CNN or ConvNet) is one of the most popular algorithms for deep learning.
• The model learns to perform classification tasks directly from images, video, text, or sound.
• CNNs are useful for finding patterns in images to recognize objects, faces, and scenes.
[Figure: CNN for classification. Input 224x224x3 → convolution + ReLU and max-pooling stages (224x224x64, 112x112x128, 56x56x256, 28x28x512, 14x14x512, 7x7x512) → flatten → fully connected + ReLU (4096) → softmax (1000) → classes: vehicle, bus, truck, bicycle, …, pedestrian]
[Figure: example 2x2 convolution filters, [1 0; 0 1] and [0 1; 1 0], sliding over the inputs to produce the output]
Convolutional neural networks
[Figure: the same CNN architecture (224x224x3 input → … → flatten → 4096 → 1000) classifying into arbitrary classes (name 1, name 2, …, name x); example images of faces, vehicles, animals, and chairs]
Convolutional neural networks
Three of the most common layers are convolution, activation (ReLU), and pooling.
Convolution puts the input images through a set of convolutional filters, each of which activates certain features from the images.
Rectified linear unit (ReLU) allows for faster and more effective training by mapping negative values to zero and maintaining positive values. This is sometimes referred to as activation, because only the activated features are carried forward into the next layer.
Pooling simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn.
Machine learning since 1999
• Three-layer neural networks with the backpropagation training algorithm (http://solair.eunet.rs/~ilicv/NeuroVCL.html)
OCR: recognition of Cyrillic letters
• Input layer: 12x12 = 144
• Hidden layer: 35
• Output layer (number of outputs): 30
• Number of examples: 1590 (30 letters in multiple variants)
Pattern recognition
• Input layer: 5x5 = 25
• Hidden layer: 20
• Output layer (number of outputs): 12
• Learning coefficient: 0.25
• Number of examples: 12x9 = 108
Position detection
• Input layer: 3x3 = 9
• Hidden layer: 6
• Output layer (number of outputs): 8
• Learning coefficient: 0.25
• Number of examples: 8
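A three-layer network trained with backpropagation fits in a few dozen lines of plain Python. The sketch below is a minimal illustration, not the original NeuroVCL code: it uses sigmoid units, squared error, and online updates with the learning coefficient 0.25 listed above. The class name `ThreeLayerNet` and the "position detection" encoding (one lit border pixel of a 3x3 grid mapped to one of 8 outputs) are hypothetical choices that only borrow the 9-6-8 layer sizes and the 8 examples from the slide.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerNet:
    """Input layer -> sigmoid hidden layer -> sigmoid output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Small random weights; the last entry of every row is the bias weight
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.25):
        """One backpropagation update on a single example; returns its squared error."""
        h, y = self.forward(x)
        # Output-layer deltas for squared error with sigmoid units
        d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
        # Hidden-layer deltas, back-propagated through the (not yet updated) w2
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        # Gradient-descent updates with learning coefficient lr
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


# Hypothetical "position detection" data: one lit pixel on the border of a 3x3 grid
# (the center is skipped) must be mapped to one of 8 outputs.
positions = [i for i in range(9) if i != 4]
data = []
for k, p in enumerate(positions):
    x = [1.0 if i == p else 0.0 for i in range(9)]
    t = [1.0 if j == k else 0.0 for j in range(8)]
    data.append((x, t))

net = ThreeLayerNet(n_in=9, n_hidden=6, n_out=8)
first_epoch_error = sum(net.train_step(x, t) for x, t in data)
for _ in range(2000):
    last_epoch_error = sum(net.train_step(x, t) for x, t in data)
print(first_epoch_error, last_epoch_error)
```

The total error per epoch should fall well below its first-epoch value; the exact numbers depend on the random seed.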
• Local receptive field
• The convolution layer slides a filter matrix over the array of image pixels and performs a convolution operation to obtain a convolved feature map.
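To make the convolution step concrete, here is a minimal pure-Python sketch of a single "valid" 2D convolution (no padding, stride 1 by default). As in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation. The function name `conv2d`, the 4x4 image, and the diagonal-detecting kernel are illustrative assumptions, not material from the talk.

```python
def conv2d(image, kernel, stride=1):
    """'Valid' 2D convolution as used in CNN layers (kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the kernel element-wise with the image patch under it and sum
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
kernel = [[1, 0],
          [0, 1]]
print(conv2d(image, kernel))  # [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
```

The kernel responds strongly (value 2) wherever the image contains the diagonal pattern it encodes, which is what "activating certain features" means in practice.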
[Figure: performance vs. amount of data for a traditional ML algorithm and for small, medium, and large neural networks]
Difference between a typical and a convolutional neural network
Image analysis using convolutional neural networks
• Classification: what is in the picture? (vehicle, truck, bus, bicycle, pedestrian)
• Localization: where is the object? (object class, position x, position y, width, height)
• Detection: what is in the picture and where? (for each object, e.g. a vehicle and a person: object class, position x, position y, width, height)
• Semantic segmentation: determining the regions that belong to different objects (person, vehicle, background)
What is Semantic Segmentation?
Semantic segmentation is an image processing task in which the class of every individual pixel is determined.
Coloring the regions of different classes with different colors in the processed image delimits the different objects.
Image segmentation is typically used to locate objects and boundaries.
Examples:
• Autonomous driving
• Industrial inspection
• Classification of terrain in satellite imagery
• Medical imaging analysis
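A segmentation network outputs one score per class for every pixel; the pixel's class is then the highest-scoring one. The sketch below assumes a hypothetical `class_scores[c][y][x]` layout and uses the three classes from the earlier figure (background, person, vehicle) with made-up scores.

```python
def segment(class_scores, class_names):
    """Assign every pixel the class with the highest score.

    class_scores[c][y][x] is the network's score for class c at pixel (y, x).
    """
    n_classes = len(class_scores)
    height, width = len(class_scores[0]), len(class_scores[0][0])
    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(range(n_classes), key=lambda c: class_scores[c][y][x])
            row.append(class_names[best])
        label_map.append(row)
    return label_map


names = ["background", "person", "vehicle"]
scores = [
    [[0.7, 0.1], [0.2, 0.1]],  # background
    [[0.2, 0.8], [0.1, 0.3]],  # person
    [[0.1, 0.1], [0.7, 0.6]],  # vehicle
]
print(segment(scores, names))  # [['background', 'person'], ['vehicle', 'vehicle']]
```

Coloring each label differently gives exactly the kind of region map described above.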
Architecture of Neural Network for Image Segmentation
Fully convolutional network for segmentation
[Figure: input image → four stages of 3x3 conv + ReLU and 2x2 pooling → 1x1 conv → segmentation result]
• A fully convolutional network (FCN) is a normal CNN in which the last fully connected layer is replaced by another convolution layer with a large receptive field. (The receptive field is how much of the input tensor a particular convolution window "sees".)
• Loss function: multi-class cross-entropy
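The receptive field of the FCN stack (four 3x3 conv + ReLU / 2x2 pool stages followed by a 1x1 conv) can be computed layer by layer. This is a minimal sketch of the standard recurrence; it assumes stride 1 for the convolutions and stride 2 for the pooling layers, which the figure implies but does not state.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit after a stack of layers.

    Each layer is (kernel_size, stride). Standard recurrence:
    r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where j is the distance,
    in input pixels, between neighbouring units of the current layer.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r


# 4 x (3x3 conv, 2x2 pool), then a 1x1 conv (assumed strides: conv 1, pool 2)
fcn = [(3, 1), (2, 2)] * 4 + [(1, 1)]
print(receptive_field(fcn))  # 46
```

Each output unit of this stack therefore "sees" a 46x46-pixel window of the input. The 1x1 convolution adds nothing to the receptive field; it only mixes channels into per-class scores.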
Encoder-decoder network architecture
[Figure: encoder (downsampling): four stages of 3x3 conv + ReLU followed by 2x2 pooling; decoder (upsampling): four stages of 2x2 unpooling followed by 3x3 conv + ReLU]
[Figure: encoder (convolution network) and decoder (deconvolution network) built from convolution + batch normalization + ReLU blocks with 3x3 kernels and 16, 16, 32, 32, 64, 64, 128, 128 output channels, with max-pooling layers in the encoder, upsample layers in the decoder, and residual connections]
Batch normalization is a technique for improving the performance and stability of artificial neural networks. It provides any layer in a neural network with inputs that have zero mean and unit variance (followed by a learnable scale and shift).
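A minimal sketch of the batch-normalization forward pass (training mode) for a single feature over one mini-batch. `gamma` and `beta` are the learnable scale and shift, and `eps` guards against division by zero. Inference-time running averages are deliberately left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch to zero mean / unit variance,
    then apply the learnable scale gamma and shift beta."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


out = batch_norm([2.0, 4.0, 6.0, 8.0])
print(out)  # roughly [-1.342, -0.447, 0.447, 1.342]
```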
ReLU activation function
The function returns 0 for any negative input and returns any positive value x unchanged, so it can be written as f(x) = max(0, x).
Example (input feature map → output after ReLU):
5  2 -3  6      5 2 0 6
4 -7  2 -1      4 0 2 0
8  4  1  2      8 4 1 2
3  7  5 -3      3 7 5 0
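The ReLU example is easy to reproduce. This minimal sketch applies f(x) = max(0, x) element-wise to the 4x4 feature map from the slide, stored as a plain list of lists (an illustrative representation).

```python
def relu(matrix):
    """Apply f(x) = max(0, x) element-wise to a 2D list."""
    return [[max(0, v) for v in row] for row in matrix]


feature_map = [
    [5, 2, -3, 6],
    [4, -7, 2, -1],
    [8, 4, 1, 2],
    [3, 7, 5, -3],
]
print(relu(feature_map))
# [[5, 2, 0, 6], [4, 0, 2, 0], [8, 4, 1, 2], [3, 7, 5, 0]]
```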
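Max pooling and unpooling (kernel = 2x2, stride = 2,2)
Max pooling keeps the largest value in each 2x2 window and records its index position inside the window (1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right):
Input:      Max pooling:   Index positions:
5 2 3 6     7 6            4 2
4 7 2 1     8 5            1 3
8 4 1 2
3 7 5 3
Unpooling (kernel = 2x2, stride = 2,2) places each value of a 2x2 map, e.g.
5 2
4 7
back at the recorded index position and fills the remaining cells with zeros:
0 0 0 2
0 5 0 0
4 0 0 0
0 0 7 0
The stride parameter indicates the number of pixels by which the window moves at each step.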
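The pooling and unpooling example can be reproduced with a short sketch. The function names are hypothetical, and `max_unpool` assumes the stride equals the kernel size (2x2 windows with stride 2, as on the slide). Index positions are numbered 1 to 4 row by row, matching the slide.

```python
def max_pool_with_indices(x, k=2, s=2):
    """Max pooling that also returns where each maximum came from (1..4, row by row)."""
    pooled, indices = [], []
    for r in range(0, len(x) - k + 1, s):
        prow, irow = [], []
        for c in range(0, len(x[0]) - k + 1, s):
            window = [x[r + i][c + j] for i in range(k) for j in range(k)]
            best = max(range(len(window)), key=window.__getitem__)
            prow.append(window[best])
            irow.append(best + 1)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(values, indices, k=2):
    """Put each value back at its recorded position; every other cell becomes 0."""
    out = [[0] * (len(values[0]) * k) for _ in range(len(values) * k)]
    for r, row in enumerate(values):
        for c, v in enumerate(row):
            i, j = divmod(indices[r][c] - 1, k)
            out[r * k + i][c * k + j] = v
    return out


x = [[5, 2, 3, 6],
     [4, 7, 2, 1],
     [8, 4, 1, 2],
     [3, 7, 5, 3]]
pooled, idx = max_pool_with_indices(x)
print(pooled, idx)                     # [[7, 6], [8, 5]] [[4, 2], [1, 3]]
print(max_unpool([[5, 2], [4, 7]], idx))
# [[0, 0, 0, 2], [0, 5, 0, 0], [4, 0, 0, 0], [0, 0, 7, 0]]
```

Reusing the encoder's indices in the decoder lets upsampled values land at the same spatial positions the encoder selected.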
Types of training convolutional neural networks
[Figure: training from scratch (+++, +++), transfer learning (++, ++), feature extraction (+, +)]
Examples of semantic segmentation of images
• Applying semantic segmentation to geological data: detecting salt in the soil
• Detecting traffic participants in pictures and videos of traffic situations
• Aleksa Ćorović
• Siniša Đurić
• Mihajlo Jovanović
• Marko Gostović
• dr Mališa Marjan
• dr Velibor Ilić
TGS Salt Identification Challenge
(Kaggle competition)
Applying semantic segmentation to geological data: detecting salt in the soil
Training data
• Seismic images, 101x101 pixels (4,000 images)
• Depth (50 to 959 m)
• Test data (18,000 images)
[Figure: examples of input images and their masks]
https://www.kaggle.com/c/tgs-salt-identification-challenge
Processing pipeline:
• Data loading
• Train/validation split
• Preprocessing
• Data augmentation
• Build model
• Training the model
• Find optimal threshold
• Prediction
• Create output images
Normalization and standardization
• Input data can be expressed in different units, e.g. variables a, b, c, and d with ranges (0 .. 10), (10 .. 10000), (-51 .. 23), and (0.02 .. 1.24).
• Normalization converts the existing data to the range 0 .. 1.
• Data in individual variables may be unevenly distributed; standardization (rescaling to mean 0 and standard deviation 1) reduces the importance of extreme values.
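Both rescalings take only a few lines. This is a minimal sketch using the standard library. It assumes each variable has at least two distinct values (otherwise both functions would divide by zero). The sample values are an illustrative column spanning the 10 .. 10000 range mentioned above.

```python
import statistics


def normalize(values):
    """Min-max normalization: map values linearly onto the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization (z-score): rescale to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


b = [10, 100, 1000, 10000]
print(normalize(b))    # [0.0, 0.009..., 0.099..., 1.0]
print(standardize(b))
```

Note how min-max normalization squeezes the three smaller values close to 0 because of the single large value. This is the kind of uneven distribution the slide refers to.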
Data augmentation
• random shifting
• rotation
• flipping
• scaling
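In segmentation, every geometric augmentation must be applied identically to the image and its mask. This is a minimal sketch covering flipping and 90-degree rotation only. Random shifting and scaling are omitted, and the helper names and 50% probabilities are assumptions, not the settings used in the competition entry.

```python
import random


def hflip(img):
    return [row[::-1] for row in img]


def vflip(img):
    return img[::-1]


def rot90(img):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(image, mask, rng):
    """Apply the same random flips and rotation to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:
        image, mask = vflip(image), vflip(mask)
    for _ in range(rng.randrange(4)):
        image, mask = rot90(image), rot90(mask)
    return image, mask


rng = random.Random(1)
img = [[1, 2], [3, 4]]
aug_img, aug_mask = augment(img, img, rng)
print(aug_img, aug_mask)  # identical, because the same transforms were applied to both
```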
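Model summary:
Total params: 5,119,857
Trainable params: 5,112,497
Non-trainable params: 7,360
Feature-map sizes along the network:
# 101 -> 50
# 50 -> 25
# 25 -> 12
# 12 -> 6
# Middle 6 -> 3
# 6 -> 12
# 12 -> 25
# 25 -> 50
# 50 -> 101
from keras.layers import Conv2D, Conv2DTranspose, MaxPooling2D, Dropout, concatenate

# Encoder, first block (101 -> 50). input_layer, start_neurons, DropoutRatio and
# residual_block are defined elsewhere in the model code.
conv1 = Conv2D(start_neurons * 1, (3, 3), activation=None, padding="same")(input_layer)
conv1 = residual_block(conv1, start_neurons * 1)
conv1 = residual_block(conv1, start_neurons * 1, True)
pool1 = MaxPooling2D((2, 2))(conv1)
pool1 = Dropout(DropoutRatio / 2)(pool1)

# Decoder block (12 -> 25): transposed convolution ("deconvolution") of the previous
# decoder output uconv4, concatenated with the matching encoder features conv3.
deconv3 = Conv2DTranspose(start_neurons * 4, (3, 3), strides=(2, 2), padding="valid")(uconv4)
uconv3 = concatenate([deconv3, conv3])
uconv3 = Dropout(DropoutRatio)(uconv3)
uconv3 = Conv2D(start_neurons * 4, (3, 3), activation=None, padding="same")(uconv3)
uconv3 = residual_block(uconv3, start_neurons * 4)
uconv3 = residual_block(uconv3, start_neurons * 4, True)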
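The odd sizes 101 -> 50 -> 25 -> 12 -> 6 and back up to 101 follow from simple size arithmetic. This minimal sketch assumes 2x2 max pooling with floor division, and transposed convolutions without output padding that follow the usual Keras-style rule: n * stride for "same" padding and n * stride + max(kernel - stride, 0) for "valid". Only the 12 -> 25 layer's "valid" padding appears in the code above. The same/valid/same/valid pattern for the other decoder layers is an assumption chosen to land on 101.

```python
def pool_out(n, pool=2):
    """Output size of 2x2 max pooling (stride 2, 'valid' padding)."""
    return n // pool


def conv_transpose_out(n, kernel=3, stride=2, padding="same"):
    """Output size of a transposed convolution without output padding."""
    if padding == "same":
        return n * stride
    return n * stride + max(kernel - stride, 0)  # "valid"


down = [101]
for _ in range(4):
    down.append(pool_out(down[-1]))
print(down)  # [101, 50, 25, 12, 6]

up = [6]
for pad in ("same", "valid", "same", "valid"):
    up.append(conv_transpose_out(up[-1], padding=pad))
print(up)    # [6, 12, 25, 50, 101]
```

Mixing "same" and "valid" padding is what lets the decoder recover the odd sizes (25, 101) lost by floor division in the encoder, so the skip-connection concatenations line up.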
[Figure: convolution (encoder) and deconvolution (decoder) parts of the network; the output for an input image is compared with the ground truth to compute the loss value]
Loss functions:
BCE loss,
Dice loss (soft Dice),
BCE-Dice loss,
Jaccard loss (soft Jaccard),
Lovász loss, or their combinations
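Loss function
Cross-entropy: H(y, p) = -\sum_i y_i \log(p_i)
import numpy as np

def softmax(X):
    # Row-wise softmax; subtracting the row maximum keeps exp() from overflowing
    exps = np.exp(X - np.max(X, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)

def cross_entropy(X, y):
    # X is the output of the last layer (num_examples x num_classes)
    # y holds the integer class labels (num_examples,)
    m = y.shape[0]
    p = softmax(X)
    log_likelihood = -np.log(p[np.arange(m), y])
    loss = np.sum(log_likelihood) / m
    return loss

Weighted cross-entropy (WCE) can be expressed by the following formula, where:
p: result of segmentation
r: ground truth label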
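For binary salt masks, the per-pixel cross-entropy can be sketched without any framework. The WCE formula itself is not reproduced here. The weighting below, which multiplies the positive-class (salt) term by `pos_weight`, is one common scheme and an assumption on our part, not necessarily the exact formula from the talk. It uses the p (prediction) / r (ground truth) naming from above, and `weighted_bce` is a hypothetical name.

```python
import math


def weighted_bce(pred, target, pos_weight=1.0, eps=1e-7):
    """Pixel-wise binary cross-entropy averaged over all pixels.

    pred: predicted probabilities p_n; target: ground-truth labels r_n (0 or 1).
    pos_weight scales the loss on positive (salt) pixels; 1.0 gives plain BCE.
    """
    total = 0.0
    for p, r in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(pos_weight * r * math.log(p) + (1 - r) * math.log(1 - p))
    return total / len(pred)


print(weighted_bce([0.5, 0.5], [1, 0]))                  # ln 2, about 0.693
print(weighted_bce([0.5, 0.5], [1, 0], pos_weight=2.0))  # 1.5 * ln 2, about 1.040
```

Raising `pos_weight` above 1 penalizes missed salt pixels more heavily, which helps when salt covers only a small part of many images.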
Loss function
The Dice similarity coefficient (DSC) measures the similarity between two regions in the images: DSC(S, R) = 2|S ∩ R| / (|S| + |R|).
Symbols in the loss function with one layer and the loss function with several layers:
S: result of segmentation
R: ground truth label
L: number of layers
w_l: weight coefficient of layer l
import theano.tensor as T

def soft_dice(y_pred, y_true):
    # y_pred is the softmax output, shape (num_samples, num_classes)
    # y_true is the one-hot encoding of the target, shape (num_samples, num_classes)
    intersect = T.sum(y_pred * y_true, 0)
    denominator = T.sum(y_pred, 0) + T.sum(y_true, 0)
    # Per-class Dice scores; a loss can be taken as 1 - mean(dice_scores)
    dice_scores = T.constant(2) * intersect / (denominator + T.constant(1e-6))
    return dice_scores
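The same soft overlap measures can be written framework-free for one flattened mask. This is a minimal sketch: `pred` holds predicted probabilities, `target` holds 0/1 labels, and the small `eps` mirrors the 1e-6 term in the Theano code. The Jaccard variant (soft IoU) is included because it appears in the list of loss functions. The corresponding losses are 1 minus each coefficient.

```python
def soft_dice(pred, target, eps=1e-6):
    """Soft Dice coefficient 2*sum(p*r) / (sum(p) + sum(r)) on flattened masks."""
    intersect = sum(p * r for p, r in zip(pred, target))
    return 2 * intersect / (sum(pred) + sum(target) + eps)


def soft_jaccard(pred, target, eps=1e-6):
    """Soft Jaccard (IoU): sum(p*r) / (sum(p) + sum(r) - sum(p*r))."""
    intersect = sum(p * r for p, r in zip(pred, target))
    return intersect / (sum(pred) + sum(target) - intersect + eps)


def dice_loss(pred, target):
    return 1.0 - soft_dice(pred, target)


pred, target = [1, 1, 0, 0], [1, 0, 0, 0]
print(soft_dice(pred, target))     # about 0.667
print(soft_jaccard(pred, target))  # about 0.5
```

Unlike pixel-wise cross-entropy, both measures depend on overlap relative to region size, so small salt regions are not swamped by the background.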
[Figure: loss and training accuracy vs. epoch]
Hardware used for training
• Desktop PC: Intel Core i7, NVIDIA GeForce GTX 1050 or better
• Kaggle cloud: NVIDIA Tesla K80
• epochs = 200
• batch_size = 32
[Figure: loss and learning-rate plots]
• Early stopping
• Reducing the learning rate
• Optimizers
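Early stopping and learning-rate reduction both watch the validation loss for a plateau. Keras provides them as the `EarlyStopping` and `ReduceLROnPlateau` callbacks. The sketch below only illustrates the underlying logic. `PlateauMonitor`, its patience values, and the reduction factor are hypothetical and not the settings used in the talk.

```python
class PlateauMonitor:
    """Hypothetical helper combining early stopping and learning-rate reduction."""

    def __init__(self, lr, lr_patience=2, stop_patience=4, factor=0.5, min_lr=1e-5):
        self.lr = lr
        self.lr_patience = lr_patience      # epochs without improvement before lowering lr
        self.stop_patience = stop_patience  # epochs without improvement before stopping
        self.factor = factor
        self.min_lr = min_lr
        self.best = float("inf")
        self.epochs_without_improvement = 0

    def update(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.epochs_without_improvement = 0
            return False
        self.epochs_without_improvement += 1
        if self.epochs_without_improvement % self.lr_patience == 0:
            self.lr = max(self.lr * self.factor, self.min_lr)
        return self.epochs_without_improvement >= self.stop_patience


monitor = PlateauMonitor(lr=0.01)
stopped_at = None
for epoch, loss in enumerate([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64, 0.5], start=1):
    if monitor.update(loss):
        stopped_at = epoch
        break
print(stopped_at, monitor.lr, monitor.best)  # 7 0.0025 0.6
```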
[Figure: test images → array of matrices (decimal numbers) → PNG images (black & white pixels)]
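Turning the network's decimal outputs into black & white pixels requires a threshold, and the pipeline picks it on validation data. This is a minimal sketch that assumes mean IoU is the selection criterion (the competition's own metric was more elaborate) and uses made-up validation data. All function names are hypothetical.

```python
def iou(pred_mask, true_mask):
    """IoU of two binary masks (flattened 0/1 lists); two empty masks count as 1."""
    inter = sum(p & t for p, t in zip(pred_mask, true_mask))
    union = sum(p | t for p, t in zip(pred_mask, true_mask))
    return 1.0 if union == 0 else inter / union


def binarize(probs, threshold):
    return [1 if p > threshold else 0 for p in probs]


def find_best_threshold(val_probs, val_masks, candidates):
    """Pick the threshold with the highest mean IoU over the validation images."""
    def mean_iou(t):
        scores = [iou(binarize(p, t), m) for p, m in zip(val_probs, val_masks)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_iou)


def to_pixels(mask):
    """Black & white pixel values for a PNG: 0 = black, 255 = white."""
    return [255 * v for v in mask]


val_probs = [[0.9, 0.6, 0.3, 0.1], [0.8, 0.4, 0.2, 0.7]]
val_masks = [[1, 1, 0, 0], [1, 0, 0, 1]]
best = find_best_threshold(val_probs, val_masks, [0.2, 0.5, 0.8])
print(best, to_pixels(binarize(val_probs[0], best)))  # 0.5 [255, 255, 0, 0]
```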
• Nives Kaprocki
• Dušan Kenjić
• Filip Baba
• Milorad Marković
• Ninoslav Jovanov
• Srđan Usorac
• dr Milan Bjelica
• dr Velibor Ilić
Detection of traffic participants in photos and videos
Datasets
• https://www.cityscapes-dataset.com/
• https://deepdrive.berkeley.edu/
• The Daimler Urban Segmentation Dataset: http://www.6d-vision.com/scene-labeling
• KITTI dataset: http://www.cvlibs.net/datasets/kitti/eval_road.php
U-Net: Convolutional Networks
https://arxiv.org/pdf/1505.04597.pdf
Training set: 29,000 labeled images
Validation set: 6,000-6,500 labeled images
NVIDIA DRIVE: technical hardware specifications
• 2x Xavier SoCs with integrated hardware engines
• 8-core “Carmel” CPUs based on the ARM v8 ISA
• Two NVIDIA Deep Learning Accelerators (DLAs) for processing convolutional neural networks used for object detection and recognition: 5 TOPS (FP16) | 10 TOPS (INT8)
• Volta-class GPU: 20 TOPS (INT8) | 1.3 TFLOPS (FP32)
• Programmable Vision Accelerator (PVA): 1.6 TOPS
• Stereo and Optical Flow Engine (SOFE): 6 TOPS
• Image Signal Processor (ISP): 1.5 gigapixels/s
(DL TOPS = deep learning tera-operations per second)
Thank you for your attention!
dr Velibor Ilić
ilicv@EUnet.rs
http://SOLAIR.EUnet.rs/~ilicv
http://www.linkedin.com/in/velibor
https://www.researchgate.net/profile/Velibor_Ilic/
NOVI SAD AI
Deep learning in Automotive
industry
Aleksa Ćorović
RT-RK Automotive
The Usage of the
YOLO Algorithm for
Traffic Participants
Detection
NOVI SAD AI
meetup #3.0
Deep learning in Automotive industry
31.10.2018.
1/24
Aleksa Ćorović
aleksa.corovic@systemli.org
Outline
1. Motivation
2. Algorithm
3. Training
4. Results
Autonomous driving
• Environment perception
• Different types of sensors
Why camera?
• Resolution
• Camera: Full HD x 36 FPS ≈ 74M points per second
• LIDAR: ~300k points
• Details
• Textures
[Figure: camera vs. LIDAR]
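A quick check of the camera figure, assuming "Full HD" means 1920x1080 pixels and counting every pixel as one point:

```python
full_hd_pixels = 1920 * 1080              # pixels in one Full HD frame
points_per_second = full_hd_pixels * 36   # at 36 frames per second
print(points_per_second)  # 74649600, i.e. about 74M points per second
```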
YOLO algorithm
• Joseph Redmon
• You Only Look Once
• Object detection:
• Localization
• Classification
Is there a car in the picture? yes/no
The object in the picture is: car, pedestrian, ...
YOLO algorithm
• Deep convolutional neural
network
• Input:
• Image
• Output:
• Object’s coordinates
• Object’s class
YOLO algorithm
• Neural network architecture
YOLO algorithm
• Divide the image into cells
[Figure: image divided into a grid of cells, columns 1-4 and rows 1-3]
• Predict bounding boxes
• Outputs
YOLO algorithm
• Detection layer's output tensor dimensions: N x N x ((4 + 1 + number of classes) x 3)
• N x N: number of cells
• 4: bounding box dimensions
• 1: probability that an object is inside the bounding box
• x 3: each cell predicts 3 bounding boxes
• Total number of bounding boxes: (13 x 13 + 26 x 26 + 52 x 52) x 3 = 10 647
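The tensor shapes and the 10 647 total can be recomputed. This minimal sketch assumes a 416x416 input and detection strides of 32, 16, and 8, which yield the 13, 26, and 52 grids (the talk does not state the input size). It also assumes 5 classes, matching the five Berkeley DeepDrive classes listed in the training section.

```python
def yolo_output_shapes(input_size=416, strides=(32, 16, 8), num_classes=5, boxes_per_cell=3):
    """Shapes N x N x ((4 + 1 + classes) x boxes) of the three detection layers."""
    depth = (4 + 1 + num_classes) * boxes_per_cell
    return [(input_size // s, input_size // s, depth) for s in strides]


def total_boxes(shapes, boxes_per_cell=3):
    return sum(n * n * boxes_per_cell for n, _, _ in shapes)


shapes = yolo_output_shapes()
print(shapes)               # [(13, 13, 30), (26, 26, 30), (52, 52, 30)]
print(total_boxes(shapes))  # 10647
```

With 5 classes each cell outputs (4 + 1 + 5) x 3 = 30 numbers. The box count does not depend on the number of classes.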
YOLO algorithm
• Filtering the bounding boxes:
• IoU threshold
• Non-max suppression
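Filtering the roughly ten thousand candidate boxes usually means dropping low-confidence boxes and then running non-max suppression based on IoU. This is a minimal greedy sketch. Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, and the 0.5 score and 0.45 IoU thresholds are illustrative assumptions.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(boxes, scores, score_threshold=0.5, iou_threshold=0.45):
    """Drop low-confidence boxes, then apply greedy non-max suppression."""
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with an already kept one
        if all(box_iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_boxes(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.68, so it is suppressed. Box 3 never reaches NMS because its score is below the threshold.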
Training
• Three parts of the loss function:
• Localization loss function
• Objectness loss function
• Classification loss function
Training
• Berkeley DeepDrive dataset:
• 100 000 images of traffic
• Each image has a .json file with annotations
• Different weather conditions and parts of the day
Class          Number    Percentage
Car            714 121   56.59%
Pedestrian      91 735    7.25%
Truck           30 012    2.38%
Traffic sign   239 961   19.01%
Traffic light  186 301   14.76%
Training
• Training image example:
Training
• Training image example:
Training
• Anchor boxes concept
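With anchor boxes, each ground-truth box is assigned to the predefined anchor whose shape overlaps it best, comparing sizes only as if both boxes shared a center. This is a minimal sketch. The anchor list is the default YOLOv3 set, assumed here because the talk does not say which anchors were used.

```python
# Assumed anchor sizes (width, height) in pixels: the default YOLOv3 anchors
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def shape_iou(wh1, wh2):
    """IoU of two boxes compared by size only (both centered at the same point)."""
    inter = min(wh1[0], wh2[0]) * min(wh1[1], wh2[1])
    return inter / (wh1[0] * wh1[1] + wh2[0] * wh2[1] - inter)


def best_anchor(box_wh, anchors=ANCHORS):
    """Index of the anchor whose shape overlaps the ground-truth box the most."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))


print(best_anchor((120, 95)))  # 6 -> anchor (116, 90)
print(best_anchor((12, 14)))   # 0 -> anchor (10, 13)
```

The network then only has to learn small corrections to the matched anchor's width and height, rather than box sizes from scratch.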
Training
• Software:
• Darknet neural network framework
• Hardware:
• PC with NVIDIA GTX 1060 (6 GB VRAM)
• Training duration:
• 14 days
• 125 epochs
Results
Epoch  Precision  Recall  F1 score  mAP     Avg IoU
40     0.37       0.35    0.36      18.98%  24.19%
47     0.39       0.37    0.38      21.44%  26.12%
56     0.37       0.39    0.38      23.49%  25.44%
75     0.40       0.48    0.44      30.98%  28.12%
90     0.58       0.53    0.56      44.06%  44.06%
109    0.60       0.54    0.57      44.53%  43.65%
120    0.63       0.55    0.59      46.60%  45.98%
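The F1 column is the harmonic mean of precision and recall. Recomputing it from the rounded values in the table reproduces most rows. Where the table's inputs were rounded (e.g. epoch 90), the result may differ by 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.63, 0.55), 2))  # 0.59 (epoch 120)
print(round(f1_score(0.37, 0.35), 2))  # 0.36 (epoch 40)
```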
Results
• Example:
Results
• Example:
Results
• Example:
NOVI SAD AI
Aleksa Ćorović
RT-RK Automotive
Thank you!
https://rs.linkedin.com/in/aleksacorovic
PhD Velibor Ilic
RT-RK - Senior research & development engineer
on
Semantic segmentation of images using deep convolutional
neural networks (pixel level segmentation)
Aleksa Corovic
Machine learning engineer
on
The Usage of the YOLO Algorithm for Traffic Participants
Detection
QA WITH:
World Summit
www.worldsummit.ai
10-11th of October
6000+ ATTENDEES
100+ COUNTRIES
140+ SPEAKERS
5+ CONTENT STREAMS
Conference session + panel discussion
PEER TALKS @ Artificial Intelligence Novi Sad City AI Conference
I. Jovan Stojanovic - Novi Sad City AI, ambassador / Where is AI today?
II. Karthik Muthuswamy - Google/SAP, Senior Data Scientist / Human-centred machine learning
III. Sasha Lazarevic - IBM Switzerland, Senior Solution Manager / AI with IBM Watson
IV. Oskar Marko - BioSense Institute, Researcher / AI in agriculture? It's possible
V. Cedric Bonard - Artificial intelligence in safety management
VI. Caroline Jeanmaire - Harvard/Future Society - Key issues for ethical machines
VII. Ruxandra Burtica - ADOBE - Lead machine learning engineer
Panel session and QA - Karthik Muthuswamy, Oskar Marko, Caroline Jeanmaire, Jovan Stojanovic, Sasha Lazarevic
PEER TALKS @ Artificial Intelligence Novi Sad City AI Workshop Session
I. Karthik Muthuswamy - Google/SAP, Senior Data Scientist - Large-scale Machine learning with TPUs
II. Oskar Marko - BioSense, Deep learning engineer - Evolutionary Algorithms
III. Filip Jekic - Maximus artificial intelligence - Deep learning recommender system in retail
IV. Valentina Djordjevic - Anomaly detection in Telecommunications
V. Ruxandra Burtica - ADOBE
Follow us at Novi Sad AI

Novi sad ai event 3-2018

  • 1.
  • 2.
    NOVI SAD APPLIEDINTELLIGENCE COMMUNITY City.AI NOVI SAD shaping up around a community whose goal are : To help local actors develop efficiently the Serbian branch on AI internationally To work around applied AI challenges with the local & global ecosystem actors To democratize AI innovation and close the gap between technology and society To train and challenge the local community
  • 3.
    LEVERAGING THE POTENTIALOF AI IN 50+ CITIES AFRICA Accra - Lagos ASIA Bangalore - Bangkok - Beirut - Chiang Mai - Hanoi - Hong Kong - Jakarta - Johor Bahru - Karachi - Lahore - Manila - Pune - Seoul - Singapore - Taipei AUSTRALASIA Wellington EUROPE Amsterdam - Berlin - Bratislava - Bristol - Brussels - Bucharest - Budapest - Cambridge - Cluj - Cologne - Copenhagen - Hamburg - Iasi - Krakow - Kyiv - London - Madrid - Munich - Novi Sad - Oxford - Paris - Sofia - Stockholm - Stuttgart - Tallinn - Tirana - Valencia - Valletta - Vienna - Vilnius NORTH AMERICA Austin - LA - New York - San Diego - San Francisco SOUTH AMERICA Bogota - La Paz - Sao Paulo
  • 4.
    NOVI SAD AITEAM Jovan Stojanovic Ambassador of Novi Sad AI and Belgrade AI Marko Jocic Co-Ambassador of Novi Sad-AI Jovana Miletic Co-ambassador of Novi Sad-AI Jovana Vukicevic Co-Ambassador of Belgrade-AI Jovan Stojanovic Co-Ambassador of Nis-AI
  • 5.
    PhD Velibor Ilic RTRK - Senior research & development engineer on Semantic segmentation of images using deep convolutional neural networks (pixel level segmentation) Aleksa Corovic Machine learning engineer on The Usage of the YOLO Algorithm for Traffic Participants Detection LESSONS LEARNED BY
  • 6.
    SEMANTIC SEGMENTATION OFIMAGES USING DEEP CONVOLUTIONAL NEURAL NETWORKS (pixel level segmentation) Novi Sad, October 2018 dr Velibor Ilić RT RK Automotive [Novi Sad AI] #3.0 - Deep Learning in Automotive Industry
  • 7.
    1. Convolutive neuralnetworks 2. What is Semantic Segmentation? 3. Architecture of Neural Network for Image Segmentation 4. Examples Overview of the presentation • Applying of semantic segmentation on geological data - detection of salt in the soil • Detection of traffic participants in photos and videos
  • 8.
    AI & MachineLearning in Automotive Industry Smart routing and Point of Interest optimization In-vehicle intelligence Predictive decisions Computer vision Predictive vehicle maintenance
  • 9.
    AI & MachineLearning in Automotive Industry Smart routing and Point of Interest optimization In-vehicle intelligence Predictive decisions Computer vision Predictive vehicle maintenance
  • 10.
    Convolutive neural networks Convolution+ ReLU Maxpooling layer Fully connected + ReLU Softmax • A convolutional neural network (CNN or ConvNet) is one of the most popular algorithms for deep learning, • model learns to perform classification tasks directly from images, video, text, or sound. • CNNs are useful for finding patterns in images to recognize objects, faces, and scenes. 224x224x 3 224x224x64 112x112x128 56x56x256 28x28x512 7x7x51214x14x512 4096 1000 Flatten - Vehicle - Buss - Truck - Bicycle - … - Pedestrian
  • 14.
  • 15.
    Convolutional neural networks: features learned for faces, vehicles, animals and chairs (figure)
  • 16.
    Three of the most common layers are convolution, activation (ReLU), and pooling.
    • Convolution puts the input images through a set of convolutional filters, each of which activates certain features from the images.
    • Rectified linear unit (ReLU) allows for faster and more effective training by mapping negative values to zero and maintaining positive values. This is sometimes referred to as activation, because only the activated features are carried forward into the next layer.
    • Pooling simplifies the output by performing nonlinear downsampling, which shrinks the feature maps and so reduces the number of parameters that the later layers of the network need to learn.
  • 17.
    Machine learning from 1999: three-layer neural networks trained with the backpropagation algorithm (http://solair.eunet.rs/~ilicv/NeuroVCL.html)
    • OCR - recognition of Cyrillic letters: input layer 12x12 = 144, hidden layer 35, output layer 30, 1590 examples (30 letters in multiple variants)
    • Pattern recognition: input layer 5x5 = 25, hidden layer 20, output layer 12, learning coefficient 0.25, 12x9 = 108 examples
    • Position detection: input layer 3x3 = 9, hidden layer 6, output layer 8, learning coefficient 0.25, 8 examples
  • 18.
    Difference between typical and convolutional neural networks
    • Local receptive field: the convolution layer slides a filter matrix over the array of image pixels and performs the convolution operation to obtain a convolved feature map.
    • Figure: performance versus amount of data for a traditional ML algorithm and for small, medium and large neural networks.
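The sliding-filter operation described above can be sketched in NumPy. This is a minimal illustration, assuming a single-channel image, stride 1 and no padding; the function name `conv2d` and the example filter are hypothetical. Like most deep learning libraries, it computes cross-correlation (the filter is not flipped).

```python
import numpy as np


def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only "sees" a kh x kw local receptive field.
            window = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map


# Hypothetical example: a vertical-edge filter on an image whose right half is bright.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge_filter = np.array([[-1, 1],
                        [-1, 1]], dtype=float)
feature_map = conv2d(image, edge_filter)
print(feature_map)  # strongest response where the dark and bright halves meet
```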
  • 19.
    Image analysis using convolutional neural networks
    • Classification: What's in the picture? (Vehicle, Truck, Bus, Bicycle, Pedestrian)
    • Localization: Where is the object? (object class, position x, position y, width, height)
    • Detection: What's in the picture and where? (object class, position x, position y, width and height for each object, e.g. Object 1: Vehicle, Object 2: Person)
    • Semantic segmentation: Which regions belong to which objects? (Person, Vehicle, Background)
  • 20.
    What is Semantic Segmentation?
    Semantic segmentation is an image processing task in which the class of every single pixel is determined. Coloring the regions of each class differently delimits the different objects in the processed image. Image segmentation is typically used to locate objects and boundaries.
    Examples:
    • Autonomous driving
    • Industrial inspection
    • Classification of terrain in satellite imagery
    • Medical imaging analysis
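Per-pixel classification can be illustrated with a tiny NumPy sketch, assuming the network has already produced a class score for every pixel (an H x W x C array). The class names, scores and color palette below are hypothetical.

```python
import numpy as np

# Hypothetical per-pixel class scores for a 2 x 3 image and 3 classes (H x W x C),
# e.g. the softmax output of a segmentation network.
CLASS_NAMES = ["background", "vehicle", "person"]
scores = np.array([
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.3, 0.3, 0.4]],
])

# Every pixel gets the class with the highest score.
label_map = scores.argmax(axis=-1)  # shape (2, 3), integer class ids

# Coloring each class differently makes the regions visible.
palette = np.array([[0, 0, 0],      # background: black
                    [0, 0, 255],    # vehicle: blue
                    [255, 0, 0]],   # person: red
                   dtype=np.uint8)
color_image = palette[label_map]    # shape (2, 3, 3), RGB

print(label_map)
```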
  • 21.
    Architecture of Neural Network for Image Segmentation
    Fully Convolutional network for segmentation: input image -> 4 x (3x3 conv + ReLU, 2x2 pool) -> 1x1 conv -> segmentation result
    • A Fully Convolutional neural network (FCN) is a normal CNN in which the last fully connected layer is replaced by another convolution layer with a large receptive field (here a 1x1 convolution), so the network outputs a map of class scores instead of a single vector. The receptive field is the region of the input tensor that a particular convolution window "sees".
    • Loss function: multi-class cross-entropy
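A 1x1 convolution is simply the same linear layer applied independently at every pixel, and the multi-class cross-entropy is then averaged over all pixels. The NumPy sketch below shows both under these assumptions; the function names, feature-map size and random weights are hypothetical.

```python
import numpy as np


def conv1x1(features, weights, bias):
    """1x1 convolution: H x W x C_in features, C_in x C_out weights -> H x W x C_out."""
    return features @ weights + bias  # same linear map at every pixel


def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)


def pixelwise_cross_entropy(logits, labels):
    """Multi-class cross-entropy averaged over all pixels.

    logits: H x W x K class scores, labels: H x W integer class ids
    """
    probs = softmax(logits)
    rows, cols = np.indices(labels.shape)
    return -np.mean(np.log(probs[rows, cols, labels]))


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))   # hypothetical 4x4 feature map with 8 channels
weights = rng.normal(size=(8, 3))       # 3 output classes
bias = np.zeros(3)
logits = conv1x1(features, weights, bias)
labels = rng.integers(0, 3, size=(4, 4))
print(logits.shape, pixelwise_cross_entropy(logits, labels))
```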
  • 22.
    Encoder-Decoder network architecture
    • Encoder (downsampling): four stages of 3x3 conv + ReLU followed by 2x2 pooling
    • Decoder (upsampling): four stages of 2x2 unpooling followed by 3x3 conv + ReLU
  • 23.
    Figure: convolution network (encoder) and deconvolution network (decoder) joined by residual connections. Each block is convolution + batch normalization + ReLU with a 3x3 kernel; the encoder uses two blocks per stage with 16, 32, 64 and 128 output channels, separated by max pooling layers, and the decoder uses upsample layers. Legend: convolution layer, batch normalization, ReLU layer, pooling layer, upsample layer.
  • 24.
    Batch normalization is a technique for improving the performance and stability of artificial neural networks. It normalizes the inputs of a layer over each mini-batch to zero mean and unit variance, and then applies a learned scale and shift.
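The normalization step can be sketched in NumPy for training mode, assuming an N x H x W x C mini-batch with statistics computed per channel. The function name and example data are hypothetical. The running averages that a framework keeps for inference are not shown.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Batch normalization of an N x H x W x C mini-batch (training mode).

    Each channel is normalized with the mean and variance of the current
    mini-batch, then scaled by gamma and shifted by beta (learned per channel).
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
batch = rng.normal(loc=7.0, scale=3.0, size=(8, 5, 5, 16))  # hypothetical activations
out = batch_norm_train(batch, gamma=np.ones(16), beta=np.zeros(16))
print(out.mean(), out.std())  # close to 0 and 1
```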
  • 25.
    ReLU activation function
    The function returns 0 for any negative input and returns any positive value x unchanged, so it can be written as f(x) = max(0, x).
    Example: the 4x4 input [5 2 -3 6 / 4 -7 2 -1 / 8 4 1 2 / 3 7 5 -3] becomes [5 2 0 6 / 4 0 2 0 / 8 4 1 2 / 3 7 5 0].
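The slide's ReLU example can be reproduced with NumPy, assuming the 16 values are laid out row by row as a 4x4 matrix.

```python
import numpy as np

# The 4x4 example from the slide, read row by row.
x = np.array([[5,  2, -3,  6],
              [4, -7,  2, -1],
              [8,  4,  1,  2],
              [3,  7,  5, -3]])

relu = np.maximum(0, x)  # f(x) = max(0, x), applied element-wise
print(relu)
```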
  • 26.
    Max pooling and unpooling
    • Max pooling (kernel = 2x2, stride = 2,2): the 4x4 input [5 2 3 6 / 4 7 2 1 / 8 4 1 2 / 3 7 5 3] becomes [7 6 / 8 5], and the index position (1-4) of each maximum inside its 2x2 window is stored: [4 2 / 1 3].
    • Unpooling (kernel = 2x2, stride = 2,2): the values [5 2 / 4 7] (detected class) are written back at the stored index positions, giving [0 0 0 2 / 0 5 0 0 / 4 0 0 0 / 0 0 7 0].
    • The stride parameter indicates the number of pixels by which the window moves.
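The pooling/unpooling example above can be reproduced with a small NumPy sketch. It assumes non-overlapping windows (stride equal to the kernel size) and numbers the positions inside each window 1 to 4 row by row, matching the slide; the function names are hypothetical.

```python
import numpy as np


def max_pool_with_indices(x, k=2):
    """Max pooling with a k x k kernel and stride k.

    Also returns, for every window, the position (1..k*k, row by row) of its maximum.
    """
    h, w = x.shape
    pooled = np.zeros((h // k, w // k), dtype=x.dtype)
    indices = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k].ravel()
            pos = int(np.argmax(window))
            pooled[i, j] = window[pos]
            indices[i, j] = pos + 1
    return pooled, indices


def max_unpool(values, indices, k=2):
    """Write each value back at its stored position; all other pixels become 0."""
    h, w = values.shape
    out = np.zeros((h * k, w * k), dtype=values.dtype)
    for i in range(h):
        for j in range(w):
            r, c = divmod(indices[i, j] - 1, k)
            out[i * k + r, j * k + c] = values[i, j]
    return out


x = np.array([[5, 2, 3, 6],
              [4, 7, 2, 1],
              [8, 4, 1, 2],
              [3, 7, 5, 3]])
pooled, indices = max_pool_with_indices(x)
unpooled = max_unpool(np.array([[5, 2], [4, 7]]), indices)
print(pooled, indices, unpooled, sep="\n")
```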
  • 27.
    Types of training convolutional neural networks
    • Training from scratch (+++, +++)
    • Transfer learning (++, ++)
    • Feature extraction (+, +)
  • 28.
    Examples of semantic segmentation of images
    • Applying semantic segmentation to geological data - detection of salt in the soil
    • Detection of traffic participants in pictures and videos of traffic situations
  • 29.
    Applying semantic segmentation to geological data - detection of salt in the soil
    TGS Salt Identification Challenge (Kaggle competition)
    Team: Aleksa Ćorović, Siniša Đurić, Mihajlo Jovanović, Marko Gostović, dr Mališa Marjan, dr Velibor Ilić
  • 30.
    Training data: seismic images of 101x101 pixels (4000 images), depth (50 - 959 m)
    Test data: 18000 images
    Figure: three input images with their masks
    https://www.kaggle.com/c/tgs-salt-identification-challenge
  • 31.
    Pipeline: Data loading -> Train/validation split -> Preprocessing -> Data augmentation -> Build model -> Training the model -> Find optimal threshold -> Prediction -> Create output images
  • 32.
    Pipeline step: Data loading
  • 33.
    Pipeline step: Train/validation split
  • 34.
    Pipeline step: Preprocessing (normalization and standardization)
    • Input data can be expressed in different units and ranges, e.g. variable a in (0 .. 10), b in (10 .. 10000), c in (-51 .. 23), d in (0.02 .. 1.24). Normalization converts every variable to the range 0 .. 1.
    • Data in individual variables may be unevenly distributed. Standardization reduces the importance of extreme values.
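Both transformations can be sketched in NumPy, working column by column. The sample values below are hypothetical; only their ranges follow the slide's variables a, b, c and d.

```python
import numpy as np

# Hypothetical samples of four variables a, b, c, d whose ranges match the slide.
data = np.array([[0.0, 10.0, -51.0, 0.02],
                 [4.0, 2500.0, 0.0, 0.60],
                 [10.0, 10000.0, 23.0, 1.24]])


def normalize(x):
    """Min-max normalization: map every column to the range 0..1."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))


def standardize(x):
    """Standardization: zero mean and unit standard deviation per column."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


print(normalize(data))
print(standardize(data))
```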
  • 35.
    Pipeline step: Data augmentation: random shifting, rotation, flipping and scaling
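The key point of augmentation for segmentation is that the image and its mask must receive exactly the same transform. The sketch below assumes NumPy arrays and covers only random flipping and shifting. Rotation and scaling need interpolation (for example `scipy.ndimage.rotate` and `scipy.ndimage.zoom`) and are left out. The function names and `max_shift` value are hypothetical.

```python
import numpy as np


def shift(a, dy, dx):
    """Shift a 2D array by (dy, dx) pixels and fill the uncovered border with zeros."""
    h, w = a.shape
    out = np.zeros_like(a)
    src = a[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def augment(image, mask, rng, max_shift=10):
    """Apply the same random horizontal flip and shift to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return shift(image, dy, dx), shift(mask, dy, dx)


rng = np.random.default_rng(42)
image = rng.random((101, 101))   # stands in for a 101x101 seismic image
mask = image > 0.5               # hypothetical salt mask
aug_image, aug_mask = augment(image, mask, rng)
print(aug_image.shape, aug_mask.shape)
```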
  • 36.
    Pipeline step: Build model
    Total params: 5,119,857; trainable params: 5,112,497; non-trainable params: 7,360
    Spatial size per stage (from the code comments): 101 -> 50, 50 -> 25, 25 -> 12, 12 -> 6, middle 6 -> 3, 6 -> 12, 12 -> 25, 25 -> 50, 50 -> 101
    Convolution (encoder):
        conv1 = Conv2D(start_neurons * 1, (3, 3), activation=None, padding="same")(input_layer)
        conv1 = residual_block(conv1, start_neurons * 1)
        conv1 = residual_block(conv1, start_neurons * 1, True)
        pool1 = MaxPooling2D((2, 2))(conv1)
        pool1 = Dropout(DropoutRatio / 2)(pool1)
    Deconvolution (decoder):
        deconv3 = Conv2DTranspose(start_neurons * 4, (3, 3), strides=(2, 2), padding="valid")(uconv4)
        uconv3 = concatenate([deconv3, conv3])
        uconv3 = Dropout(DropoutRatio)(uconv3)
        uconv3 = Conv2D(start_neurons * 4, (3, 3), activation=None, padding="same")(uconv3)
        uconv3 = residual_block(uconv3, start_neurons * 4)
        uconv3 = residual_block(uconv3, start_neurons * 4, True)
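The odd input size of 101 pixels explains why the decoder mixes `padding="valid"` and `padding="same"`. A 2x2 max pooling floors odd sizes (101 -> 50, 25 -> 12). A transposed convolution with kernel 3 and stride 2 doubles the size with "same" padding and gives (n - 1) * 2 + 3 with "valid" padding (12 -> 25, 50 -> 101). The plain-Python sketch below checks this arithmetic. It assumes "same" for the two decoder stages that the slide does not show; only deconv3 with "valid" appears in the code above.

```python
def pool_out(n):
    """Output size of MaxPooling2D((2, 2)) with the default 'valid' padding."""
    return (n - 2) // 2 + 1


def deconv_out(n, padding, kernel=3, stride=2):
    """Output size of a transposed convolution with kernel 3 and strides (2, 2)."""
    if padding == "same":
        return n * stride
    return (n - 1) * stride + kernel  # 'valid'


encoder = [101]
for _ in range(4):
    encoder.append(pool_out(encoder[-1]))

decoder = [encoder[-1]]
# Assumption: 'same' and 'valid' alternate; the slide only shows deconv3 with 'valid'.
for padding in ["same", "valid", "same", "valid"]:
    decoder.append(deconv_out(decoder[-1], padding))

print(encoder)  # [101, 50, 25, 12, 6]
print(decoder)  # [6, 12, 25, 50, 101]
```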
  • 37.
    Loss functions: BCE loss, Dice loss (soft Dice), BCE + Dice loss, Jaccard loss (soft Jaccard), Lovász loss, or their combinations
    Figure: input, ground truth and output masks with the corresponding loss value
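Most of the losses listed above can be written in a few lines for a binary mask of predicted probabilities. This is a minimal NumPy sketch under that assumption: the `EPS` smoothing constant and the example masks are hypothetical, and the Lovász loss is omitted because it needs considerably more code.

```python
import numpy as np

EPS = 1e-7  # avoids log(0) and division by zero


def bce_loss(pred, target):
    pred = np.clip(pred, EPS, 1 - EPS)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))


def soft_dice_loss(pred, target):
    intersection = np.sum(pred * target)
    return 1 - (2 * intersection + EPS) / (np.sum(pred) + np.sum(target) + EPS)


def soft_jaccard_loss(pred, target):
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return 1 - (intersection + EPS) / (union + EPS)


def bce_dice_loss(pred, target):
    return bce_loss(pred, target) + soft_dice_loss(pred, target)


target = np.array([[1.0, 0.0], [1.0, 0.0]])  # hypothetical ground-truth mask
good = np.array([[0.9, 0.1], [0.8, 0.2]])    # prediction close to the mask
bad = 1.0 - target                            # completely wrong prediction
for name, fn in [("BCE", bce_loss), ("Dice", soft_dice_loss),
                 ("Jaccard", soft_jaccard_loss), ("BCE+Dice", bce_dice_loss)]:
    print(name, round(float(fn(good, target)), 3), round(float(fn(bad, target)), 3))
```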
  • 38.
    Loss function: cross-entropy
    p: result of segmentation, r: ground truth label
    $H(y, p) = -\sum_i y_i \log(p_i)$
        import numpy as np
        def softmax(X):
            # Numerically stable softmax over the class axis
            exps = np.exp(X - np.max(X, axis=1, keepdims=True))
            return exps / np.sum(exps, axis=1, keepdims=True)
        def cross_entropy(X, y):
            # X is the output from the fully connected layer (num_examples x num_classes)
            # y holds the integer class labels (num_examples,)
            m = y.shape[0]
            p = softmax(X)
            log_likelihood = -np.log(p[range(m), y])
            loss = np.sum(log_likelihood) / m
            return loss
    Weighted cross-entropy (WCE) can be expressed by the following formula, where N is the number of pixels and w is the weight of the foreground class:
    $WCE = -\frac{1}{N}\sum_{n=1}^{N}\left[w\, r_n \log(p_n) + (1 - r_n)\log(1 - p_n)\right]$
  • 39.
    Loss function: the Dice similarity coefficient (DSC) measures the similarity between two regions in the images.
    Loss function with one layer: $DSC = \frac{2|S \cap R|}{|S| + |R|}$
    Loss function with several layers: $DSC = \frac{2\sum_{l=1}^{L} w_l |S_l \cap R_l|}{\sum_{l=1}^{L} w_l \left(|S_l| + |R_l|\right)}$
    S: result of segmentation, R: ground truth label, L: number of layers, w_l: weight coefficient of layer l
        import theano.tensor as T
        def soft_dice(y_pred, y_true):
            # y_pred is the softmax output of shape (num_samples, num_classes)
            # y_true is the one-hot encoding of the target, shape (num_samples, num_classes)
            intersect = T.sum(y_pred * y_true, 0)
            denominator = T.sum(y_pred, 0) + T.sum(y_true, 0)
            # Per-class soft Dice scores; the small constant avoids division by zero
            dice_scores = T.constant(2) * intersect / (denominator + T.constant(1e-6))
            return dice_scores
  • 40.
    Figure: loss and training accuracy versus epoch
  • 41.
    Hardware used for training:
    • Desktop PC, Intel Core i7, GeForce GTX 1050 or more advanced
    • Kaggle cloud, NVIDIA Tesla K80
    Training parameters: epochs = 200, batch_size = 32
  • 42.
    Figure: loss and learning rate plots
  • 44.
    Training control: EarlyStopping, reducing the learning rate, optimizers
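In Keras, early stopping and learning-rate reduction are provided as the `EarlyStopping` and `ReduceLROnPlateau` callbacks. To show the idea without a framework, here is a hypothetical plain-Python sketch of patience-based logic driven by the validation loss. It is not Keras's implementation, which has further options such as `min_delta` and `cooldown`; all names and default values are assumptions.

```python
def plateau_schedule(val_losses, lr=1e-3, factor=0.5, lr_patience=2,
                     stop_patience=4, min_lr=1e-6):
    """Return the learning rate used in each epoch and the epoch at which training stops."""
    best = float("inf")
    epochs_without_improvement = 0  # counter for early stopping
    stale_since_lr_change = 0       # non-improving epochs since the last LR reduction
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)
        if loss < best:
            best = loss
            epochs_without_improvement = 0
            stale_since_lr_change = 0
            continue
        epochs_without_improvement += 1
        stale_since_lr_change += 1
        if stale_since_lr_change >= lr_patience:
            lr = max(lr * factor, min_lr)  # reduce the learning rate on a plateau
            stale_since_lr_change = 0
        if epochs_without_improvement >= stop_patience:
            return lrs, epoch              # early stopping
    return lrs, len(val_losses) - 1


# Hypothetical validation losses: improvement, then a plateau.
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.6]
lrs, stop_epoch = plateau_schedule(val_losses)
print(stop_epoch, lrs)
```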
  • 48.
    Pipeline steps: Prediction and Create output images
    Test images -> array of matrices (decimal numbers) -> PNG images (black & white pixels)
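The path from decimal numbers to black & white pixels needs a threshold, which is the "Find optimal threshold" step of the pipeline. A minimal NumPy sketch, assuming the threshold is chosen to maximize mean IoU on the validation set. The metric, threshold grid, function names and data are all assumptions. Writing the result to a PNG file (for example with Pillow) is not shown.

```python
import numpy as np


def iou(pred_mask, true_mask):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return 1.0 if union == 0 else intersection / union


def find_best_threshold(probabilities, masks, thresholds=np.linspace(0.1, 0.9, 17)):
    """Pick the threshold that maximizes mean IoU over the validation images."""
    scores = [np.mean([iou(p > t, m) for p, m in zip(probabilities, masks)])
              for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])


def to_black_and_white(probability_map, threshold):
    """Turn a matrix of decimal numbers into black (0) and white (255) pixels."""
    return np.where(probability_map > threshold, 255, 0).astype(np.uint8)


# Hypothetical validation data: one 2x2 prediction and its ground-truth mask.
val_probs = np.array([[[0.93, 0.22], [0.64, 0.07]]])
val_masks = np.array([[[True, False], [False, False]]])
threshold, score = find_best_threshold(val_probs, val_masks)
print(threshold, score)
print(to_black_and_white(val_probs[0], threshold))
```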
  • 50.
    Detection of traffic participants in photos and videos
    Team: Nives Kaprocki, Dušan Kenjić, Filip Baba, Milorad Marković, Ninoslav Jovanov, Srđan Usorac, dr Milan Bjelica, dr Velibor Ilić
  • 51.
    Datasets:
    • Cityscapes: https://www.cityscapes-dataset.com/
    • Berkeley DeepDrive: https://deepdrive.berkeley.edu/
    • The Daimler Urban Segmentation Dataset: http://www.6d-vision.com/scene-labeling
    • KITTI dataset: http://www.cvlibs.net/datasets/kitti/eval_road.php
    U-Net: Convolutional Networks: https://arxiv.org/pdf/1505.04597.pdf
    Training set: 29000 labeled images; validation set: 6000-6500 labeled images
    NVIDIA DRIVE technical hardware specifications (TOPS DL = Deep Learning Tera-Ops):
    • 2x Xavier SoCs with integrated hardware engines
    • 8-core “Carmel” CPUs based on the ARM v8 ISA
    • Two NVIDIA Deep Learning Accelerators (DLAs) for processing convolutional neural networks used for object detection and recognition: 5 TOPS (FP16) | 10 TOPS (INT8)
    • Volta-class GPU: 20 TOPS (INT8) | 1.3 TFLOPS (FP32)
    • Programmable Vision Accelerator (PVA): 1.6 TOPS
    • Stereo and Optical Flow Engine (SOFE): 6 TOPS
    • Image Signal Processor (ISP): 1.5 Giga Pixels/s
  • 53.
    Thank you for your attention!
    dr Velibor Ilić, ilicv@EUnet.rs
    http://SOLAIR.EUnet.rs/~ilicv
    http://www.linkedin.com/in/velibor
    https://www.researchgate.net/profile/Velibor_Ilic/
  • 55.
    NOVI SAD AI: Deep learning in Automotive industry
    Aleksa Ćorović, RT-RK Automotive
    The Usage of the YOLO Algorithm for Traffic Participants Detection
  • 56.
    NOVI SAD AI meetup #3.0: Deep learning in Automotive industry, 31.10.2018. Aleksa Ćorović, aleksa.corovic@systemli.org
    Outline
    1. Motivation
    2. Algorithm
    3. Training
    4. Results
  • 57.
    Autonomous driving
    • Environment perception
    • Different types of sensors
  • 58.
    Why camera? (camera vs. LIDAR)
    • Resolution
      • Camera: Full HD x 36 FPS = 74M points per second
      • LIDAR: ~300k points per second
    • Details
    • Textures
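The camera figure follows from the frame size and frame rate. Assuming Full HD means 1920 x 1080 pixels, each pixel counts as one point, and both figures are per second, the comparison works out as:

```latex
1920 \times 1080 \times 36\ \tfrac{\text{frames}}{\text{s}} = 74\,649\,600 \approx 74\text{M points/s},
\qquad \frac{74\,649\,600}{300\,000} \approx 249
```

Under these assumptions, the camera delivers roughly 250 times more points per second than the LIDAR.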
  • 60.
    YOLO algorithm
    • Joseph Redmon
    • You Only Look Once
    • Object detection: localization and classification
      • Is there a car in the picture? yes/no
      • The object in the picture is: car, pedestrian, ...
  • 61.
    YOLO algorithm
    • Deep convolutional neural network
    • Input: image
    • Output: object's coordinates and object's class
  • 62.
    YOLO algorithm
    • Neural network architecture (figure)
  • 63.
    YOLO algorithm
    • Divide the image into cells (figure: grid with numbered rows and columns)
    • Predict bounding boxes
    • Outputs
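In YOLO, the grid cell that contains the centre of an object is responsible for predicting that object. A minimal sketch of that assignment, assuming a 416x416 input so that 13x13, 26x26 and 52x52 grids have cells of 32, 16 and 8 pixels. The input size and the function name are assumptions.

```python
def responsible_cell(center_x, center_y, image_size=416, grid_size=13):
    """Return (row, column) of the grid cell that contains a box centre."""
    cell_size = image_size / grid_size  # 32 pixels for a 416x416 input and a 13x13 grid
    return int(center_y // cell_size), int(center_x // cell_size)


print(responsible_cell(208, 100))                # (3, 6) on the 13x13 grid
print(responsible_cell(208, 100, grid_size=52))  # (12, 26) on the 52x52 grid
```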
  • 64.
    YOLO algorithm
    • Detection layer's output tensor dimensions: N x N x ((4 + 1 + number of classes) x 3), where
      • N x N is the number of cells,
      • 4 is the bounding box dimensions,
      • 1 is the probability that an object is inside the bounding box,
      • x 3 because each cell predicts 3 bounding boxes.
    • Total number of bounding boxes: (13 x 13 + 26 x 26 + 52 x 52) x 3 = 10 647
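The tensor-size formula and the box count above can be checked with a few lines of Python. The example assumes five classes (car, pedestrian, truck, traffic sign and traffic light, as in the dataset statistics of the training section); the function names are hypothetical.

```python
def detection_output_shapes(num_classes, grid_sizes=(13, 26, 52), boxes_per_cell=3):
    """Shape N x N x ((4 + 1 + classes) x 3) of each detection layer's output."""
    channels = (4 + 1 + num_classes) * boxes_per_cell
    return [(n, n, channels) for n in grid_sizes]


def total_boxes(grid_sizes=(13, 26, 52), boxes_per_cell=3):
    """Number of bounding boxes predicted over all three detection layers."""
    return sum(n * n for n in grid_sizes) * boxes_per_cell


print(detection_output_shapes(5))  # [(13, 13, 30), (26, 26, 30), (52, 52, 30)]
print(total_boxes())               # 10647
```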
  • 65.
    YOLO algorithm
    • Filtration of the bounding boxes:
      • IoU threshold
      • Non-max suppression
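Because every cell predicts several boxes, the same object is usually detected many times. The sketch below shows the common filtering scheme: drop low-confidence boxes, then run greedy non-max suppression, which keeps the best-scoring box and removes any box whose IoU with an already kept box exceeds a threshold. The box format (x1, y1, x2, y2), both threshold values and the function names are assumptions, not taken from the talk.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)


def filter_boxes(boxes, scores, score_threshold=0.5, iou_threshold=0.45):
    """Drop low-confidence boxes, then apply greedy non-max suppression."""
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a better-scoring kept box.
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_boxes(boxes, scores))  # [0, 2]
```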
  • 67.
    Training
    • Three parts of the loss function: localization loss, objectness loss, classification loss
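A simplified NumPy sketch of how the three parts can be combined. It loosely follows the YOLOv3 paper: squared error for box coordinates, and binary cross-entropy for objectness and for independent per-class predictions. It omits details of Darknet's actual loss, such as term weighting and the handling of boxes that overlap an object without being responsible for it. All names and shapes are assumptions.

```python
import numpy as np

EPS = 1e-7


def binary_cross_entropy(pred, target):
    pred = np.clip(pred, EPS, 1 - EPS)
    return -np.sum(target * np.log(pred) + (1 - target) * np.log(1 - pred))


def detection_loss(pred_boxes, true_boxes, pred_obj, true_obj, pred_cls, true_cls):
    """Simplified three-part loss for B predicted boxes and C classes.

    pred_boxes/true_boxes: B x 4, pred_obj/true_obj: B, pred_cls/true_cls: B x C.
    true_obj is 1 for boxes responsible for an object, 0 otherwise.
    """
    responsible = true_obj == 1
    # Localization: squared error on box coordinates, only where an object exists.
    localization = np.sum((pred_boxes[responsible] - true_boxes[responsible]) ** 2)
    # Objectness: is there an object in this box?
    objectness = binary_cross_entropy(pred_obj, true_obj)
    # Classification: independent per-class probabilities for responsible boxes.
    classification = binary_cross_entropy(pred_cls[responsible], true_cls[responsible])
    return localization + objectness + classification


true_boxes = np.array([[0.5, 0.5, 0.2, 0.3], [0.1, 0.1, 0.1, 0.1], [0.7, 0.2, 0.3, 0.3]])
true_obj = np.array([1.0, 0.0, 1.0])
true_cls = np.eye(5)[[0, 3, 1]]  # hypothetical one-hot classes for 3 boxes, 5 classes
perfect = detection_loss(true_boxes, true_boxes, true_obj, true_obj, true_cls, true_cls)
noisy = detection_loss(true_boxes + 0.1, true_boxes, np.full(3, 0.5), true_obj,
                       np.full((3, 5), 0.5), true_cls)
print(perfect, noisy)
```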
  • 68.
    Training
    • Berkeley DeepDrive dataset:
      • 100 000 images of traffic
      • Each image has a .json file with annotations
      • Different weather conditions and parts of the day
    Class            Number   Percentage
    Car             714 121       56.59%
    Pedestrian       91 735        7.25%
    Truck            30 012        2.38%
    Traffic sign    239 961       19.01%
    Traffic light   186 301       14.76%
  • 69.
    Training
    • Training image examples (figures)
  • 71.
    Training
    • Anchor boxes concept
  • 72.
    Training
    • Software: Darknet neural network framework
    • Hardware: PC with NVIDIA GTX 1060 (6 GB VRAM)
    • Training duration: 14 days, 125 epochs
  • 74.
    Results
    Epoch  Precision  Recall  F1 score  mAP     Avg IoU
    40     0.37       0.35    0.36      18.98%  24.19%
    47     0.39       0.37    0.38      21.44%  26.12%
    56     0.37       0.39    0.38      23.49%  25.44%
    75     0.40       0.48    0.44      30.98%  28.12%
    90     0.58       0.53    0.56      44.06%  44.06%
    109    0.60       0.54    0.57      44.53%  43.65%
    120    0.63       0.55    0.59      46.60%  45.98%
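The F1 score in the table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded table values. Because precision and recall are themselves rounded, the recomputed F1 can differ in the last digit. For example, epoch 90 gives 0.55 instead of the listed 0.56.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per epoch, copied from the results table.
results = {40: (0.37, 0.35), 47: (0.39, 0.37), 56: (0.37, 0.39), 75: (0.40, 0.48),
           90: (0.58, 0.53), 109: (0.60, 0.54), 120: (0.63, 0.55)}
for epoch, (p, r) in results.items():
    print(epoch, round(f1_score(p, r), 2))
```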
  • 75.
    Results
    • Detection examples (figures)
  • 78.
    NOVI SAD AI, Aleksa Ćorović, RT-RK Automotive
    Thank you!
    https://rs.linkedin.com/in/aleksacorovic
  • 79.
    QA WITH:
    • PhD Velibor Ilic, RT-RK - Senior research & development engineer, on Semantic segmentation of images using deep convolutional neural networks (pixel level segmentation)
    • Aleksa Corovic, Machine learning engineer, on The Usage of the YOLO Algorithm for Traffic Participants Detection
  • 80.
    World Summit AI, www.worldsummit.ai, 10-11th of October: 6000+ attendees, 100+ countries, 140+ speakers, 5+ content streams
  • 81.
    Conference session + panel discussion
    PEER TALKS @ Artificial Intelligence Novi Sad City AI Conference
    I. Jovan Stojanovic - Novi Sad City AI, ambassador / Where is AI today?
    II. Karthik Muthuswamy - Google/SAP, Senior Data Scientist / Human-centred machine learning
    III. Sasha Lazarevic - IBM Switzerland, Senior Solution Manager / AI with IBM Watson
    IV. Oskar Marko - BioSense Institute, Researcher / AI in agriculture? It's possible
    V. Cedric Bonard - Artificial intelligence in safety management
    VI. Caroline Jeanmaire - Harvard/Future Society / Key issues for ethical machines
    VII. Ruxandra Burtica - ADOBE, Lead Machine Learning Engineer
    Panel session and QA - Karthik Muthuswamy, Oskar Marko, Caroline Jeanmaire, Jovan Stojanovic, Sasha Lazarevic
    PEER TALKS @ Artificial Intelligence Novi Sad City AI Workshop Session
    I. Karthik Muthuswamy - Google/SAP, Senior Data Scientist / Large-scale machine learning with TPUs
    II. Oskar Marko - BioSense, Deep learning engineer / Evolutionary algorithms
    III. Filip Jekic - Maximus artificial intelligence / Deep learning recommender system in retail
    IV. Valentina Djordjevic - Anomaly detection in telecommunications
    V. Ruxandra Burtica - ADOBE
  • 82.
    Follow us at Novi Sad AI