Borys Tymchenko (Senior ML Research Engineer at VITech)
Let's talk about what to do when the only "eyes" you have are IP/CCTV cameras, but you want to detect objects and their properties.
https://dataphoenix.info/ods-ai-odessa-meetup-3/
ODS.ai Odessa Meetup #3: Object Detection in the Wild
1. Object Detection in the Wild
Borys Tymchenko
VITech
(an embarrassingly simple approach)
2. What do we need to solve?
- Detect objects
- Detect their attributes
- Track them in time
3. What do we need to solve in particular?
- Detect people
- Detect their clothes
  - Helmet
  - Vest
  - Glasses
  - Headphones
- Track them in time
- Challenging environment
4. Environment challenges us to be creative!
- People can occlude each other
- People can carry things
- People can be occluded by environment
- Tracks must be consistent
- Clothes must be detected per person
5. Detecting clothes: three approaches
- Top-down approach
  - Detect person
  - Within it, detect clothes
  - Play around with matching
- Bottom-up approach
  - Detect clothes
  - Try to stitch them into a person
- Single pass
  - Detect people in the frame
  - Detect clothes in the frame
  - Match them together
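The "match them together" step could look roughly like the sketch below. It assumes boxes are (x1, y1, x2, y2) tuples and that an item belongs to the person box covering the largest share of its area. The function names and the 0.5 coverage threshold are illustrative and do not come from the talk.

```python
def intersection_area(a, b):
    """Area of overlap between two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def match_items_to_people(people, items, min_coverage=0.5):
    """Assign each clothing box to the person box that covers most of it.

    Returns a dict: person index -> list of item indices.
    Items covered by less than `min_coverage` (hypothetical threshold) of
    their area stay unassigned, e.g. a helmet lying on a table.
    """
    assignment = {i: [] for i in range(len(people))}
    for j, item in enumerate(items):
        item_area = (item[2] - item[0]) * (item[3] - item[1])
        if item_area <= 0:
            continue  # degenerate box, nothing to match
        best_person, best_cov = None, 0.0
        for i, person in enumerate(people):
            cov = intersection_area(person, item) / item_area
            if cov > best_cov:
                best_person, best_cov = i, cov
        if best_person is not None and best_cov >= min_coverage:
            assignment[best_person].append(j)
    return assignment


people = [(0, 0, 10, 30), (20, 0, 30, 30)]
items = [(2, 0, 8, 5), (21, 1, 29, 6), (50, 50, 55, 55)]  # last one is unattended
matches = match_items_to_people(people, items)
```

When two person boxes overlap heavily, coverage alone cannot tell whose helmet it is, which is exactly the kind of ambiguity this matching step has to deal with.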
6. Top-down and bottom-up downsides
- Sensitive to occlusions
- Unattended clothes items can lead to false positives
- Overlapping people lead to false reconstructions
- Temporal inconsistency
8. Doing everything in a single pass: the method
Combine CenterNet and FCOS to detect simultaneously:
- object classes
- bounding boxes
- attributes
- tracking features
10. Training
- Focal Loss on objectness head
  - Training target is a Gaussian with the value 1 in the center and 0 on the edge of the box
- L1 loss on bbox coordinates head
  - Training target: for every pixel inside the bbox there are 4 coordinates (top, left, right, bottom)
- Focal Loss on classes head
  - Training target is a Gaussian with the value 1 in the center and 0 on the edge of the box
- Focal Loss on attributes head
  - Training target is a Gaussian with the value 1 in the center and 0 on the edge of the box
- Cross-Entropy on features head
  - Training target: class for every object in the dataset
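The heatmap and box targets described above can be sketched in plain Python. Several things here are assumptions rather than details from the slides: a Gaussian never reaches exactly 0, so `edge_value` sets how small it gets at the box edge; the four box coordinates are read as FCOS-style distances from a pixel to the box sides; and the focal loss is the penalty-reduced variant from CenterNet. A real implementation would use tensors instead of nested lists.

```python
import math


def gaussian_target(height, width, box, edge_value=0.01):
    """Heatmap with 1.0 at the box center, decaying to ~edge_value at the box edge.

    `box` is (x1, y1, x2, y2) in integer heatmap pixels.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Pick sigma per axis so that exp(-d^2 / (2 sigma^2)) == edge_value at the edge.
    k = math.sqrt(-2.0 * math.log(edge_value))
    sx = max((x2 - x1) / 2.0, 1e-6) / k
    sy = max((y2 - y1) / 2.0, 1e-6) / k
    heat = [[0.0] * width for _ in range(height)]
    for y in range(max(0, y1), min(height, y2 + 1)):
        for x in range(max(0, x1), min(width, x2 + 1)):
            dx2 = (x - cx) ** 2 / (2 * sx ** 2)
            dy2 = (y - cy) ** 2 / (2 * sy ** 2)
            heat[y][x] = math.exp(-(dx2 + dy2))
    return heat


def box_distance_targets(box):
    """For every pixel (x, y) inside the box: (top, left, right, bottom) distances."""
    x1, y1, x2, y2 = box
    return {
        (x, y): (y - y1, x - x1, x2 - x, y2 - y)
        for y in range(y1, y2 + 1)
        for x in range(x1, x2 + 1)
    }


def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss (CenterNet-style, assumed variant)."""
    loss, num_pos = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if t == 1.0:  # exact center pixel
                loss -= (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:  # soft negatives near the center are penalized less
                loss -= (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_pos, 1)


box = (2, 2, 6, 6)
target = gaussian_target(8, 8, box)
ltrb = box_distance_targets(box)

sharp = [[0.0] * 8 for _ in range(8)]
sharp[4][4] = 1.0  # confident prediction exactly at the center
flat = [[0.5] * 8 for _ in range(8)]  # uninformative prediction
sharp_loss = heatmap_focal_loss(sharp, target)
flat_loss = heatmap_focal_loss(flat, target)
```

A prediction peaked at the box center yields a much lower loss than a flat one, which is the behaviour the Gaussian targets are meant to encourage.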
11. Inference
- Select top-k peaks on the objectness heatmap
- From their coordinates get values from other heatmaps
- Apply scaling/sigmoid if needed
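The decoding steps above could be sketched as follows. This is a simplified illustration under stated assumptions: a "peak" is a pixel not smaller than any of its 8 neighbours (the 3x3 max-pooling trick used in CenterNet), the box head stores (top, left, right, bottom) distances, the attribute head outputs raw logits, and scaling by the output stride is left out. The function names, `k`, and `threshold` are hypothetical.

```python
import heapq
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def is_local_peak(heat, x, y):
    """True if heat[y][x] is >= every value in its 3x3 neighbourhood."""
    h, w = len(heat), len(heat[0])
    for ny in range(max(0, y - 1), min(h, y + 2)):
        for nx in range(max(0, x - 1), min(w, x + 2)):
            if heat[ny][nx] > heat[y][x]:
                return False
    return True


def decode(objectness, ltrb, attr_logits, k=2, threshold=0.3):
    """Pick the top-k objectness peaks and read the other heads at those pixels."""
    h, w = len(objectness), len(objectness[0])
    peaks = [
        (objectness[y][x], x, y)
        for y in range(h)
        for x in range(w)
        if objectness[y][x] >= threshold and is_local_peak(objectness, x, y)
    ]
    detections = []
    for score, x, y in heapq.nlargest(k, peaks):
        top, left, right, bottom = ltrb[y][x]
        detections.append({
            "score": score,
            "box": (x - left, y - top, x + right, y + bottom),
            "attributes": [sigmoid(v) for v in attr_logits[y][x]],
        })
    return detections


# Toy 5x5 head outputs with two objects.
objectness = [[0.1] * 5 for _ in range(5)]
objectness[1][1], objectness[3][3] = 0.9, 0.8
ltrb = [[(1, 1, 1, 1)] * 5 for _ in range(5)]
attr_logits = [[[0.0, 0.0] for _ in range(5)] for _ in range(5)]
attr_logits[1][1] = [2.0, -2.0]  # e.g. helmet: yes, vest: no
detections = decode(objectness, ltrb, attr_logits)
```

Because every head is read at the same pixel, each attribute vector is tied to one specific detection without any separate matching step.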
Tracking?
DeepSORT with provided features
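To show how the features head feeds the tracker, here is a sketch of appearance-based association only. It is not DeepSORT itself: DeepSORT also uses a Kalman filter for motion and solves the assignment with the Hungarian algorithm, whereas this sketch matches greedily by cosine distance. The function names and the `max_distance` value are illustrative assumptions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def associate(track_features, detection_features, max_distance=0.3):
    """Greedily pair tracks with detections in order of ascending distance.

    track_features: dict track_id -> embedding
    detection_features: list of embeddings from the features head
    Returns dict: detection index -> track_id. Detections that are too far
    from every track are absent and would start new tracks.
    """
    candidates = sorted(
        (cosine_distance(feat, det), track_id, j)
        for track_id, feat in track_features.items()
        for j, det in enumerate(detection_features)
    )
    matches, used_tracks = {}, set()
    for dist, track_id, j in candidates:
        if dist > max_distance:
            break  # all remaining pairs are even further apart
        if j in matches or track_id in used_tracks:
            continue
        matches[j] = track_id
        used_tracks.add(track_id)
    return matches


tracks = {7: [1.0, 0.0, 0.0], 9: [0.0, 1.0, 0.0]}
det_feats = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
track_matches = associate(tracks, det_feats)
```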
12. Problem detected: heatmaps are not aligned
Solution:
Multiply inputs to all heads by objectness during training (cue: CenterNetv2)
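The gating idea can be illustrated with nested lists. This sketch assumes the objectness map is already squashed to [0, 1] and has the same spatial size as the features. Where exactly the product is applied in the network, and whether gradients flow back through the objectness branch, is not stated on the slide. In a tensor framework this would be a single broadcasted multiplication.

```python
def gate_by_objectness(features, objectness):
    """Multiply every channel of a C x H x W feature map by an H x W objectness map."""
    return [
        [
            [v * o for v, o in zip(feat_row, obj_row)]
            for feat_row, obj_row in zip(channel, objectness)
        ]
        for channel in features
    ]


features = [
    [[1.0, 2.0], [3.0, 4.0]],      # channel 0
    [[10.0, 20.0], [30.0, 40.0]],  # channel 1
]
objectness = [[1.0, 0.0], [0.5, 0.0]]
gated = gate_by_objectness(features, objectness)
```

Pixels with zero objectness contribute nothing to the other heads, so those heads are pushed to respond at the same locations where the objectness head fires.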
14. What about data? And how to validate this?
- Pretrain on synthetic dataset (in-house)
- Collect data from different locations, label everything
- Train jointly, validate separately
- Hold out some locations entirely to check that the model generalizes well
If your project is non-commercial, use CrowdHuman, COCO, etc. to pretrain
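Holding out whole locations for validation could be sketched as below. The sample format (location, frame_id), the function name, and the parameters are assumptions for illustration. Validation data is returned per location so that metrics can be reported separately, which is one possible reading of "validate separately".

```python
import random
from collections import defaultdict


def split_by_location(samples, num_val_locations=1, seed=0):
    """Hold out entire locations so validation never sees a training camera site.

    samples: list of (location, frame_id) pairs.
    Returns (train, val_by_location).
    """
    by_location = defaultdict(list)
    for location, frame_id in samples:
        by_location[location].append((location, frame_id))
    locations = sorted(by_location)
    random.Random(seed).shuffle(locations)  # reproducible choice of held-out sites
    held_out = set(locations[:num_val_locations])
    train = [s for loc in locations if loc not in held_out for s in by_location[loc]]
    val = {loc: by_location[loc] for loc in held_out}
    return train, val


samples = [
    ("warehouse", 1), ("warehouse", 2),
    ("gate", 3),
    ("yard", 4), ("yard", 5),
]
train, val = split_by_location(samples)
```

Splitting by frame instead of by location would leak near-identical backgrounds into validation and overstate how well the model generalizes.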