Lessons learned building an AI-powered live streaming camera at ClueCon 2019, Chicago
Real-time communications have been evolving and going mainstream thanks to the improvement and appearance of new open source technologies that make development more affordable.
In this talk I will go through the design and development process of a live streaming application running on a Raspberry Pi and powered by image detection. I will talk about some open source media servers and frameworks for achieving that, the pros and cons of some of these potential solutions, what I learned building it, and some potential use cases of AI in WebRTC applications.
4. RTMP
WebRTC.ventures August 2019
• TCP based
• Adaptive bitrate streaming
• Low latency (< 1 sec)
• RTMP does not work natively in HTML5, iOS or Android
[Diagram: Video Stream → RTMP Server → Client with Flash]
5. HLS
• TCP based
• High latency (30s-60s)
• HLS works natively in all major OSes and browsers
[Diagram: Video Stream → HLS Server → Web client]
6. WebRTC
• UDP based
• Adaptive bitrate streaming
• Low latency (< 1 sec)
• Works natively in all major OSes and browsers
[Diagram: Video Stream → WebRTC gateway → Web client]
8. WebRTC native peer to peer live streaming
• It is cheap!
• But it doesn't sound like a good idea…
• The broadcaster will need to upload its stream as many times as there are viewers
• And the processing will be done on the broadcaster's device
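The upload-N-times problem is easy to quantify. A back-of-the-envelope sketch, using the 150 kbps per-viewer bitrate that appears in the Raspberry Pi comparison later in this deck; the viewer count is a made-up example:

```javascript
// Naive P2P broadcast: the broadcaster uploads one copy of the stream
// per viewer. 150 kbps matches the streams measured elsewhere in this
// deck; 100 viewers is a hypothetical audience size.
const bitrateKbps = 150;
const viewers = 100;
const upstreamMbps = (bitrateKbps * viewers) / 1000;
console.log(upstreamMbps); // 15 (Mbps of sustained upload, before any retransmission overhead)
```

Even a modest audience quickly exceeds a typical residential uplink, which is why a media server or SFU takes over the fan-out in the next slides.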
9. WebRTC with media server for live streaming
AI video processing on the edge
• Easier to develop and test
• Cheaper for the provider
AI video processing on the server
• Low battery consumption for clients
• “No” CPU limitations
“The future of AI is on the edge” (Samsung)
“ML algorithms that continuously learn require the computational horsepower and storage that only a server can provide” (Security Magazine)
10. Just use a CPaaS for live streaming
• Easier to implement
• More expensive to use
• No infrastructure maintenance
• The processing is not easy to do on a server that you don’t manage
[Diagram: CPaaS infrastructure]
11. AI, AI everywhere…
Two thirds of our 2019 WebRTC survey respondents are working on a WebRTC application with AI
12. AI image detection options
OpenCV: faster at manipulating data, easy to use
TensorFlow: more options available, I can train my own algorithm
But there are many other alternatives… Someone said PyTorch?
Combine both?
14. How to stream video from a Raspberry Pi
There are many options and frameworks…
Comparison on a Raspberry Pi 3:

Framework                            Latency (ms)   CPU   Framerate   Bitrate
Raspivid + VLC server                3000-4000      2%    30 fps      150 kbps
UV4L + VLC server                    2000-3000      3%    30 fps      150 kbps
Raspivid + GStreamer* RTP to Janus   1000-2000      2%    30 fps      150 kbps
UV4L WebRTC to Kurento               100-200        90%   30 fps      150 kbps
UV4L WebRTC to Janus                 100-200        90%   30 fps      150 kbps

*Using the default x264 encoding without tuning the parameters
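For reference, the "Raspivid + GStreamer RTP to Janus" row corresponds roughly to a pipeline like the one below. Treat it as a sketch, not the exact command used: the host, port and bitrate are placeholders.

```shell
# On the Raspberry Pi: take hardware-encoded H.264 from the camera and
# push it as RTP toward the Janus streaming plugin.
# JANUS_HOST and port 5000 are placeholders for your setup.
raspivid -t 0 -w 640 -h 480 -fps 30 -b 150000 -o - | \
  gst-launch-1.0 fdsrc ! h264parse ! \
  rtph264pay config-interval=1 pt=96 ! \
  udpsink host=JANUS_HOST port=5000
```

Because the Pi's camera stack already produces H.264, GStreamer only packetizes here, which is why the CPU column stays around 2%.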
16. Live Streaming with image detection on the edge
OK/slow when doing basic operations: 640×480 at < 15 fps
Bad if we start doing more CPU-intensive work: 640×480 at < 1 fps!
[Diagram: Peer 1 ↔ Peer 2]
Haar-cascade object detection with OpenCV: https://github.com/agonza1/native-webrtc-peer-to-peer
DeepLab MobileNetV2 image segmentation with TF
18. Live Streaming with image detection on the media server
Kurento already has some modules…
• Some examples exist
• We can use WebRTC on both legs easily
Janus + OpenCV
• Well maintained (the RTP plugin works great)
• We will need to create a new plugin… or not?
There are other options too, but we can’t do everything
20. Live Streaming with image detection on Janus
Goal vs. first try
[Diagram: RTP → Media Server → RTP]
Video parsing and encoding using GStreamer magic is not that easy
21. Live Streaming with image detection on Janus
The OpenCV video service captures and processes the RTP video stream:

// cv is assumed to be opencv4nodejs (implied by the cv.* calls),
// with OpenCV built with GStreamer support
const cv = require('opencv4nodejs');

// Read the RTP/H.264 stream through a GStreamer pipeline ending in appsink
const vCap = new cv.VideoCapture('udpsrc port=5000 ! application/x-rtp,payload=96 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! appsink');

// Re-encode processed frames and send them back over RTP
const w = new cv.VideoWriter('appsrc ! videoconvert ! video/x-raw,format=I420,width=640,height=480,framerate=25/1 ! x264enc ! rtph264pay ! udpsink host=127.0.0.1 port=8004', 0, 25, new cv.Size(640, 480));

while (!done) {
  const frame = vCap.read();
  // ...process frame into pFrame (e.g. Haar-cascade face detection)...
  w.write(pFrame);
}
22. Live Streaming with image detection on Janus
[Demo screenshot: “Thief Detected!”, Viewers: 2]

Framework                                         Latency (ms)   Max CPU   Framerate   Bitrate
Raspivid + GStreamer + OpenCV + Janus Streaming   300-2000*      1%        30 fps      150 kbps

*Depending on the GStreamer configuration we can optimize for latency
**We used Haar-cascade face detection
24. Some Conclusions
Camera WebRTC live streaming with video ML operations at under half a second of latency is possible
ML/AI on the edge is easier to scale but is limited today
By optimizing the algorithm and the transcoding it is possible to reduce latency by 80%
ML/AI on the server provides higher quality without affecting the client's battery, but has scalability and cost challenges
25. Projects Links
Native WebRTC with OpenCV
https://github.com/agonza1/native-webrtc-peer-to-peer/tree/opencv-facedetection
Native WebRTC with TF.js
https://github.com/agonza1/native-webrtc-peer-to-peer/tree/tensorflowjs
WebRTC Live Streaming using Janus and OpenCV
https://github.com/agonza1/WebRTC-Live-Streaming-with-AI
WebRTC Live Streaming using Kurento and OpenCV face detection
https://github.com/agonza1/kurento-rpi-live-streaming
For those of you I haven’t met yet…
I came to ClueCon from Chicago uptown to talk about some of the things I learned building live streaming video applications for several projects, using a Raspberry Pi demo project to show what options are out there…
Use cases in many verticals:
Content creation/social networks
Ads
Broadcasting/news
Livestreaming video games or playing live (this has lately become very popular on sites such as Twitch. By 2014, Twitch streams had more traffic than HBO's online service! And what about HQ Trivia?!)
WebRTC, HLS and RTMP protocols (search popularity)
Real-Time Messaging Protocol (RTMP) was initially a proprietary protocol developed for streaming audio, video and data over the Internet, between a Flash player and a server.
HLS streams video by breaking the overall stream into a sequence of small HTTP-based file downloads, each download loading one short chunk of the stream.
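Concretely, the player fetches a playlist that lists those chunks. A hypothetical excerpt of an HLS media playlist (segment names and durations are made up; a live playlist keeps growing and omits the final end tag):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
#EXTINF:6.0,
segment2.ts
```

Buffering a few of these 6-second chunks before playback is what produces the 30-60 s latency quoted earlier.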
WebRTC native peer to peer
WebRTC MCU or SFU
WebRTC using CPaaS
There is another option: relaying the stream to another peer, which relays it to another one, and so on… This helps solve the bandwidth/CPU issue for the broadcaster but ends up adding a lot of latency and quality degradation!
This hybrid approach is very common.
Some camera manufacturers have reserved space on their cameras to allow third-party plugin analytics to be installed which pass data directly to the server. The video doesn’t need to be decoded which saves precious CPU/GPU cycles
AI in and for RTC
Speech Analytics
Voicebots / AI assistants
Computer Vision
RTC optimization (e.g. FaceTime making you appear to look at the other person when you are actually looking at the screen)
Forecasting events
It is not an apples-to-apples comparison, but they are definitely 2 well-known ML frameworks capable of image detection.
OpenCV: easy to use, its CPU performance is better and it has been tested more. More robust!
TensorFlow: more complex, but I can train my own algorithm. Wider set of tools around TensorFlow.
We could also combine both!
The chart is not mine (it's from Wikipedia) but it is a great example of k-means clustering, which is used for image segmentation. There are many methods, using models or motion of the image too.
Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
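As a toy illustration of that grouping idea (not tied to any of the projects in this talk), a minimal 1-D k-means in JavaScript; for image segmentation the "points" would be pixel colors rather than numbers:

```javascript
// One k-means step: assign each point to the nearest centroid,
// then move each centroid to the mean of its assigned points.
function kmeansStep(points, centroids) {
  const clusters = centroids.map(() => []);
  for (const p of points) {
    let best = 0;
    for (let i = 1; i < centroids.length; i++) {
      if (Math.abs(p - centroids[i]) < Math.abs(p - centroids[best])) best = i;
    }
    clusters[best].push(p);
  }
  // Empty clusters keep their old centroid
  return clusters.map((c, i) =>
    c.length ? c.reduce((a, b) => a + b, 0) / c.length : centroids[i]
  );
}

// Two obvious groups around 1-2 and 9-11:
let centroids = [0, 5];
for (let i = 0; i < 10; i++) centroids = kmeansStep([1, 2, 9, 10, 11], centroids);
console.log(centroids); // [ 1.5, 10 ]
```

With pixel colors as points, each cluster becomes one segment of the image, which is the basis of the segmentation shown in the chart.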
H264 ALL 3 ABOVE
VP8 when using WebRTC (so no transcoding needed on the server)
Good: cheaper, which means it will be easier to scale.
Bad: limited by CPU; the device might heat up; battery drain.
Haar feature-based cascade classification is an effective object detection method. With higher execution speed, Haar-based classifiers typically involve fewer computations.
(The algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from them.)
TF: trained with VOC 2012 (Visual Object Classes Challenge 2012) http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#introduction
Summary: while well-trained CNNs can learn more parameters (and thus detect a larger variety of faces), Haar-based classifiers run faster. If we need a very high quality algorithm with a CNN (a high success rate), running things on the edge becomes a problem today.
If you are going to do image detection on the edge, do AI on the client/viewer side or don’t do WebRTC on the RPI
Send RTP with effects
Hardware accelerated H264?
Processing time above is between 100 ms and 700 ms per frame!
CPU usage without WebRTC goes up to 90%.
WebRTC + OpenCV on the RPi starts dropping frames…
The easiest way to stream to the browser is just to stream images, although this isn't a performant solution.
FreeSWITCH, Wowza, RED5…
Kurento already has modules ready to go
RPI can handle WebRTC at < 60% of CPU
200-500 ms at 500 kbps
Can someone guess from which show is that helmet?
I wanted to keep the video and tried with GStreamer; I was able to send media, modify it and stream it to Janus. But somewhere in the OpenCV service I was generating a malformed video.
I finally got it right by setting format=I420 explicitly in OpenCV's VideoWriter.
Then the bottleneck was in GStreamer: sending the RTP stream, transcoding for processing, etc.
We can improve it by changing the configuration; for example, adding tune=zerolatency makes x264 drop lookahead and B-frames so frames are passed through immediately, trading some compression efficiency for latency.
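In GStreamer terms the change sits on the x264enc element; a sketch of the tuned encoder settings (property names as in stock x264enc, values illustrative):

```
x264enc tune=zerolatency speed-preset=ultrafast key-int-max=30 bitrate=150
```

speed-preset trades compression for encode time, and a small key-int-max keeps keyframes frequent so late-joining viewers sync quickly.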
Then the problem was the processing in OpenCV: without optimizing the algorithm we had a lot of missing frames and latencies of about 1 second.
A bitrate increase didn't affect latency much; for example, 1 Mbps only increased the CPU usage a bit, from 1% to 3% in the case of the RTP stream.
To optimize the OpenCV face detection we did just 2 things:
1) Increased the minimum possible object size (minSize): objects smaller than that are ignored, and processing improves a lot (a 50-80% reduction).
2) Played with scaleFactor, the parameter specifying how much the image size is reduced at each image scale.
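Why those two knobs help can be seen from how a cascade detector scans an image pyramid: it runs once per scale, and the number of scales depends on scaleFactor and the minimum object size. A back-of-the-envelope model (illustrative only, not the OpenCV internals, which do more work per scale):

```javascript
// Number of pyramid scales a cascade detector visits before the search
// window outgrows the image: minSize * scaleFactor^k <= imageSize.
function pyramidScales(imageSize, minSize, scaleFactor) {
  return Math.floor(Math.log(imageSize / minSize) / Math.log(scaleFactor)) + 1;
}

// For a 480-pixel-tall frame:
console.log(pyramidScales(480, 30, 1.1));  // 30 scales -> slow
console.log(pyramidScales(480, 120, 1.3)); // 6 scales -> fast
```

Raising minSize and scaleFactor together cuts the scale count by roughly 5x in this toy model, which is consistent with the 50-80% processing reduction we measured.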
And that’s it! Thank you and feel free to ask me any questions!
In the future I hope to build something more complex with OpenCV…I have a couple of ideas already