During this presentation we will look at different variants of SaaS architectures built on top of ML / computer vision: – the advantages and disadvantages of different service design patterns – modes of “serving” models (in most cases, TF) – the influence of the architecture and its implementation on product development. Bonus: does a data scientist need to know anything other than data science?
3. Main tasks in CV for surveillance
● Verification (1:1, border control)
● Age/gender/emotion recognition (sometimes other properties)
● Identification (1:N, surveillance)
● Events and action recognition
(On the slide the tasks are arranged on an interest vs. difficulty chart.)
ⒸDataI
4. Let's build one together
Definition of product (technical) success: our CV product has to answer these questions:
● Who?
● Where?
● When?
● What?
5. Why bother? Let's use an API (or SDK)
● Amazon Rekognition
● Face++
● Meerkat
● Azure Cognitive Services
● Google Cloud Vision (no face recognition)
● VeriLook SDK
● iFace SDK
● Cognitec FaceVACS SDK
● Luxand FaceSDK
● Affectiva SDK
● Betaface SDK
7. Module pipeline
(Pipeline diagram; roughly:)
● Face branch: face detector -> face alignment -> face selector -> age + gender models; FacePrint -> face search
● Body branch: person detector -> body keypoint classifier -> action recognition
● Tracker: combines detections into a TRACK with metadata -> decisions, BA material
9. Objects and attributes

Object | Static attributes               | Dynamic attributes
Body   | Body embedding                  | Location, actions
Head   | (none listed)                   | Count, head pose
Face   | Embedding/ID; age, gender, race | Emotions
12. Video Streaming (Edge)
OpenCV
+ very simple to use
- Python's GIL limits multithreading
NVIDIA DeepStream SDK
+ fast & flexible
- runs only on Jetson & Tesla hardware
(Diagram: ⓒ NVIDIA DeepStream SDK)
13. Video Streaming (Edge)
3. GStreamer
+ plugin-based architecture
+ easy/fast video recording (any supported format)
+ easy/fast overlay drawing (using Cairo)
+ wide variety of ready-to-use plugins
- hard to understand
- weak community support
- many plugins have internal bugs
ⒸDataI & Taras Lishchenko
18. Min. GStreamer Pipeline
Supported video sources
Live:
● web camera
● RTSP camera
Not live:
● video file (any supported format)
● multiple files: video01.mp4, video02.mp4, video03.mp4 ...
Minimal pipeline:
● acquire frames from the video source
● decode
● convert to RGB
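The minimal pipeline above maps directly onto a gst-launch-style description string. A small helper that builds one per source type (a sketch, not the talk's code; the element names are standard GStreamer 1.x plugins, the helper itself is illustrative):

```python
# Build gst-launch-1.0-style descriptions for the minimal pipeline:
# acquire frames -> decode -> convert to RGB -> hand frames to the app.
CONVERT_TAIL = "videoconvert ! video/x-raw,format=RGB ! appsink"

def build_pipeline(source: str, location: str = "") -> str:
    """Return a pipeline description string for a given source type."""
    if source == "webcam":
        # v4l2src already emits raw frames, so no decode step is needed.
        return f"v4l2src ! {CONVERT_TAIL}"
    if source == "rtsp":
        return f"rtspsrc location={location} ! decodebin ! {CONVERT_TAIL}"
    if source == "file":
        return f"filesrc location={location} ! decodebin ! {CONVERT_TAIL}"
    raise ValueError(f"unsupported source: {source}")

# Example; the result can be fed to Gst.parse_launch() or gst-launch-1.0:
print(build_pipeline("rtsp", "rtsp://cam.local/stream"))
```

The same string works for single files and for multiple files by instantiating one pipeline per file.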
23. Problems with GStreamer
- Buffer offset/timestamps
  - with live video the offset is constant; with non-live video it is in the range [0, maxint]
  - Solution: GstIdentity (force offset increment over [0, maxint])
  - timestamps use CLOCK_MONOTONIC (the project requires CLOCK_REALTIME to sync video with annotations)
  - Solution: store a map CLOCK_MONOTONIC -> CLOCK_REALTIME
- x264enc is too slow (requires a lot of CPU power)
  - Solution: use plugins (h264parse) for DVR without conversion from RGB (drawback: can't draw on a non-RGB buffer)
- Python has limitations compared to the C version:
  - passing objects from Python to C buffers (metadata)
  - Solution: DIY Python wrappers for the C libs that work with GStreamer objects
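The CLOCK_MONOTONIC -> CLOCK_REALTIME map can be as simple as one offset captured when the stream starts, since both clocks advance at the same rate (a sketch, not the talk's actual code; it assumes the wall clock is not stepped mid-stream):

```python
import time

class ClockMap:
    """Map CLOCK_MONOTONIC timestamps (as carried by GStreamer buffers)
    to CLOCK_REALTIME wall-clock time, so frames can be synced with
    externally produced annotations."""

    def __init__(self):
        # Sample both clocks at (almost) the same instant; the offset
        # stays valid as long as the wall clock is not adjusted.
        self.offset = (time.clock_gettime(time.CLOCK_REALTIME)
                       - time.clock_gettime(time.CLOCK_MONOTONIC))

    def to_realtime(self, monotonic_ts: float) -> float:
        """Convert a monotonic timestamp (seconds) to wall-clock time."""
        return monotonic_ts + self.offset
```

In practice the map would be refreshed (or stored per stream) if NTP steps the wall clock.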
26. Sync batch mode
+ guarantees buffer order
+ easy to sync annotations with frame data
- N waiting points
- drops buffers
- GStreamer should emit buffers with a small delay to reduce wait time in the batch collector
27. Async batch mode
+ no waiting points
+ no need for GStreamer to emit buffers faster
+ better GPU load
- order between streams is not guaranteed
- hard to sync frames with annotations
- additional complexity to handle queues
29. Evolution
1. Face detection (hard to track):
a. Dlib
i. bad for small faces (the image must be upscaled -> performance decreases)
ii. too slow (runs on CPU)
b. MTCNN
i. bad with many faces (performance degrades as the number of faces grows, due to the architecture)
2. Person detection:
a. Haar cascade
i. poor quality
ii. too slow (runs on CPU)
b. Blob person detector
i. not invariant to noisy images
ii. too slow due to background subtraction
c. TinyYOLOv2 (Darkflow)
d. MobileNet-SSD
30. CPU usage
                Min      Max      Mean
test_all_cpu    26.0656  28.6466  27.1232
test_on_7_cpu   26.9883  32.5846  28.8077
test_on_6_cpu   26.9680  31.8769  29.0395
test_on_5_cpu   27.0186  32.4739  30.0355
test_on_4_cpu   28.0154  37.4240  32.2915
test_on_3_cpu   34.6486  44.4604  37.7983
test_on_2_cpu   48.0222  60.0206  50.0859
test_on_1_cpu   83.4291  96.7086  88.9920
Model: BodyEmbeddings
CPU: i7-7700HQ @ 2.80GHz
TensorFlow settings (explanation):
● intra_op_parallelism_threads = [0, NUM_CORES] (0 is best)
● inter_op_parallelism_threads = [0, NUM_CORES] (0 is best)
Conclusion: some models can be executed in parallel without a huge performance loss, as long as each one gets no more than half of the total number of cores.
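One way to give a model process "half of the cores" is CPU affinity. A Linux-only sketch using the standard library (not the talk's code; the talk's measurements set TF thread counts instead, but affinity achieves a similar split between two parallel model processes):

```python
import os

def pin_to_half_of_cores() -> set:
    """Restrict the current process to the first half of the CPUs it is
    allowed to run on, leaving the other half free for a second model
    process running in parallel. Linux-only (os.sched_setaffinity)."""
    allowed = sorted(os.sched_getaffinity(0))
    half = set(allowed[: max(1, len(allowed) // 2)])
    os.sched_setaffinity(0, half)
    return half
```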
31. One Session vs Multiple Sessions
Model: Body Embeddings
Conclusion: performance benefits from a single graph in a single session only on the CPU, and even there the difference is not significant (~12%).
32. Resize Methods
Nearest-neighbour resize with different implementations
Conclusion: OpenCV is the fastest at resizing. Use the nearest-neighbour method to get maximum performance.
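Nearest-neighbour resizing is just index arithmetic, which is why it is the cheapest method. A NumPy-only sketch of the idea (illustrative; in production one would use OpenCV's `cv2.resize` with nearest-neighbour interpolation, per the conclusion above):

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize: every output pixel copies the closest
    source pixel, so there is no interpolation arithmetic at all."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source col for each output col
    return img[rows[:, None], cols]

frame = np.arange(12, dtype=np.uint8).reshape(3, 4)
small = resize_nearest(frame, 2, 2)   # downscale 3x4 -> 2x2
```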
33. np.stack instead of np.concatenate (batch)
Batch size: 20
Image size: 640x360x3
Conclusion: when collecting images into a batch:
- append them to a list
- call np.stack(list)
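The point is that np.stack adds the batch dimension for you, whereas np.concatenate would require every frame to carry an extra leading axis first. A minimal sketch of the recommended pattern:

```python
import numpy as np

# Collect frames into a plain list, then build the batch in one call.
frames = [np.zeros((360, 640, 3), dtype=np.uint8) for _ in range(20)]

batch = np.stack(frames)  # shape: (20, 360, 640, 3)

# The np.concatenate equivalent needs each frame reshaped to (1, H, W, C)
# first, costing an extra per-frame operation:
batch2 = np.concatenate([f[None] for f in frames])
```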
41. Closed-set vs open-set search
Closed-set evaluation:
● cumulative match characteristic (CMC) curves
● receiver operating characteristic (ROC) curves
Open-set evaluation:
● detection and identification rate (DIR) curves (TPIR, FPIR, FNIR, ...)
42. Open-set video evaluation
Gallery: a set of images of interest.
Probes: a set of images used for querying.
In our case a probe might be:
● the best-quality face image among all images within the same person track
● any face image from the video
ⒸDataI & Vlad Khizanov
43. Metrics for Open-Set Evaluation
Definition: a query succeeds if its top result has similarity greater than t.
FPIR(t) = # of successful non-mate search queries / # of queries
TPIR(t) = # of successful mate search queries / # of queries
MISS(t) = # of unsuccessful search queries / # of queries
FPIR(t) + TPIR(t) + MISS(t) = 1
Note: sometimes FPIR* = 1 - FPIR
Note: here TPIR(t) = TPIR(t, 1), where 1 is the rank
Mate searches are those for which the person in the search image has a face image in the enrolled dataset.
Non-mate searches are those for which the person in the search image does not have a face image in the enrolled dataset.
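These three rates fall out directly from per-query records. A small sketch (the record layout is illustrative, not from the talk):

```python
def open_set_metrics(queries, t):
    """Each query is (top_similarity, is_mate). A query 'succeeds' when
    its top result's similarity exceeds threshold t; successes split into
    TPIR (mate queries) and FPIR (non-mate queries), the rest are MISS."""
    n = len(queries)
    tpir = sum(1 for sim, mate in queries if sim > t and mate) / n
    fpir = sum(1 for sim, mate in queries if sim > t and not mate) / n
    miss = sum(1 for sim, _ in queries if sim <= t) / n
    return fpir, tpir, miss

queries = [(0.9, True), (0.8, False), (0.3, True), (0.2, False)]
fpir, tpir, miss = open_set_metrics(queries, t=0.5)
```

Sweeping t over [0, 1] and plotting (FPIR(t), TPIR(t)) yields the parametric curve described on the next slide.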
44. Metrics on a chart
Usually the metrics are visualized as a curve in parametric form:
x(t) = FPIR(t)
y(t) = TPIR(t)
t = 0.0, 0.01, 0.02, ..., 1.0
Note: in practice it is useful to pick an optimal threshold, e.g. FPIR at a fixed FNIR (where FNIR = 1 - TPIR).
45. Extreme Value Machine
Given the conditions for the Margin Distribution Theorem, the probability that x' is included in the boundary estimated by xi is given by:

Ψ(xi, x'; κi, λi) = exp(−(‖xi − x'‖ / λi)^κi)    (1)

where ‖xi − x'‖ is the distance of x' from sample xi, and κi, λi are the Weibull shape and scale parameters, respectively, obtained from fitting to the smallest pairwise margin estimates.
ⒸDataI & Oleksiy Udod
https://arxiv.org/pdf/1506.06112.pdf
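Equation (1) is a Weibull survival-style expression and takes only a couple of lines of code (a sketch; symbols follow the slide):

```python
import math

def evm_psi(dist: float, kappa: float, lam: float) -> float:
    """EVM inclusion probability, Eq. (1):
    psi = exp(-(dist / lambda)^kappa),
    where dist = ||x_i - x'|| and kappa, lam are the Weibull shape and
    scale parameters fitted for sample x_i."""
    return math.exp(-((dist / lam) ** kappa))
```

At zero distance the probability is 1, and it decays monotonically as the probe moves away from xi.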
48. Serving with a RESTful API
(Architecture diagram:) each market server (1..N) runs a local Kafka broker and a MirrorMaker that replicates into a central cluster on AWS (Kafka brokers 1-3, N partitions each). Consumers A/B/C in an auto-scaling group read from the cluster, call the models (Model1, Model2..n) over a RESTful API, and write results to storage.
ⒸDataI & Konstantin Bulgakov
49. Serving in Kafka Streams
(Architecture diagram:) the same layout, but the models run inside the consumers: market servers 1..N mirror their local Kafka brokers into the central 3-broker AWS cluster, and consumers A/B/C (with TF loaded in-process, in an auto-scaling group) consume the partitions directly; there is no separate RESTful model service.
50. Message size distribution
The size distribution for 14K records was measured on the producer side by counting the string length of every message (1 string element ~ at least 1 byte).
ⒸDataI & Olesia Stestsiuk
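Measuring the distribution on the producer side amounts to recording len(message) per serialized record and summarizing. A stdlib-only sketch (the record layout is illustrative, not the talk's schema):

```python
import json
import statistics

def message_size_stats(records):
    """Serialize each record the way the producer would and summarize
    the string lengths (>= byte count for ASCII-dominated payloads)."""
    sizes = [len(json.dumps(r)) for r in records]
    return {
        "min": min(sizes),
        "max": max(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
    }

# Hypothetical track-annotation messages, just for illustration:
records = [{"track_id": i, "embedding": [0.0] * 8} for i in range(100)]
stats = message_size_stats(records)
```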
52. Pyflame
● based on the Linux ptrace(2) system call, not sys.settrace()
● no modification of the source code required
● can profile embedded Python interpreters like uWSGI
● can profile multi-threaded Python programs
● written in C++, with attention to speed and performance
● usually introduces less overhead than the built-in profile (or cProfile) modules, and emits richer profiling data
Just:
sudo pyflame -s 600 -r 0.001 --threads -p 1493 | ./flamegraph.pl > 10_min_every_milisec.svg
http://eng.uber.com/pyflame/
53. How to read Flame Graphs
● Each box represents a function in the stack (a "stack frame").
● The y-axis shows stack depth (number of frames on the stack). The top box shows the function
that was on-CPU. Everything beneath that is ancestry. The function beneath a function is its
parent, just like the stack traces shown earlier.
● The x-axis spans the sample population. It does not show the passing of time from left to right, as
most graphs do. The left to right ordering has no meaning (it's sorted alphabetically to maximize
frame merging).
● The width of the box shows the total time it was on-CPU or part of an ancestry that was on-CPU
(based on sample count). Functions with wide boxes may consume more CPU per execution than
those with narrow boxes, or, they may simply be called more often. The call count is not shown
(or known via sampling).
● The sample count can exceed elapsed time if multiple threads were running and sampled
concurrently.