Slides on the techniques used in the Temporal Segment Network (TSN), covering the basic ideas, a recap of BN-Inception, optical flow, and practical tricks in application. Prepared for a group paper reading at the University of Sydney.
2. Contents
• Temporal Segment Network (TSN):
basic ideas, method, and tricks in the training and test phases.
• Two-Stream CNN:
combination of spatial and temporal features, late-fusion comparison.
• BN-Inception:
review of the structure in detail, derived from GoogLeNet, usage in TSN.
• Optical Flow and Warped Optical Flow:
basic idea and different methods, dense flow, warped flow.
3. Authors
• Limin Wang (王利民): BS in NJU, PhD in CUHK with Xiaoou Tang, now postdoc in ETHZ.
• Yuanjun Xiong (熊元军): BE in Tsinghua, PhD in CUHK with Xiaoou Tang, now postdoc in CUHK.
• Zhe Wang (王哲): BE in ZJU, PhD in CUHK with Xiaogang Wang.
• Yu Qiao (乔宇): Professor in SIAT.
• Dahua Lin (林达华): Professor in CUHK. BS in USTC, PhD in MIT.
• Xiaoou Tang (汤晓鸥): Professor in CUHK. BE in USTC, PhD in MIT.
• Luc Van Gool: Professor in ETHZ.
5. Issues
1. Segments: How to select key frames/segments?
2. Modality: How to compute optical flow features? And how to utilize the flow features in a CNN?
3. Training and test: How to train and how to test?
4. Fusion of two CNNs: Are there other ways besides late fusion?
7. Two-Stream CNN
The idea comes from the human visual cortex, which contains two pathways: the ventral stream (object recognition) and the dorsal stream (motion detection).
[Simonyan, NIPS2014]
8. Two-Stream CNN
This method has proved useful. The picture shows the 96 learnt 7x7 filters for the flow stack (10 channels for x and 10 for y).
The image also shows the way optical flow features are used: flow images are stacked as channels. TSN also derives from this design.
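The channel-stacking described above can be sketched in NumPy. This is a minimal illustration with synthetic arrays, not the TSN code; whether the x- and y-flow channels are interleaved or grouped is an implementation detail, and interleaving is assumed here.

```python
import numpy as np

def stack_flow(flows_x, flows_y):
    """Interleave L x-flow and L y-flow frames into a 2L-channel input."""
    channels = []
    for fx, fy in zip(flows_x, flows_y):
        channels.extend([fx, fy])
    return np.stack(channels)  # shape (2L, H, W)

# 5 consecutive flow fields of size 224x224 -> a 10-channel network input
flows_x = [np.zeros((224, 224)) for _ in range(5)]
flows_y = [np.ones((224, 224)) for _ in range(5)]
inp = stack_flow(flows_x, flows_y)
```

With L = 5 this yields exactly the 10-channel temporal input used by TSN's flow stream.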
10. Recall: GoogLeNet
Differences from BN-Inception:
– layers
– filter numbers
– avg poolings
– add BN layers before each ReLU
[Szegedy, CVPR2015]
14. BN and Partial BN
Batch Normalization in Caffe: two layers
– BatchNorm Layer: normalizes each scalar feature independently
– Scale Layer: enables the net to recover the original activations
In TSN, however, things have changed:
– Flow images are quite different from RGB images, so it does not make sense to transfer the features or layer parameters directly from ImageNet.
– Even the RGB images are in a different domain from ImageNet, since we are dealing with action recognition instead of object recognition.
In that case: Partial Batch Normalization
– The mean and variance parameters are frozen to the values initialized from ImageNet, except for the first conv layer.
– The scale parameters (slope and bias) are trained as usual.
[Ioffe, ICML2015]
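A minimal NumPy sketch of the partial-BN idea (layer names and shapes are hypothetical, not taken from the TSN code): running statistics update only in the first conv's BN, while the affine scale and bias remain trainable everywhere.

```python
import numpy as np

def batch_norm(x, running_mean, running_var, gamma, beta,
               update_stats, momentum=0.9, eps=1e-5):
    """BN over the batch axis; running stats update only when update_stats is True."""
    if update_stats:  # in partial BN, only the first conv's BN does this
        batch_mean, batch_var = x.mean(axis=0), x.var(axis=0)
        running_mean[:] = momentum * running_mean + (1 - momentum) * batch_mean
        running_var[:] = momentum * running_var + (1 - momentum) * batch_var
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta  # scale layer: always trainable

# Hypothetical layer list: freeze all BN stats except the first layer's.
layers = ["conv1_bn", "conv2_bn", "inception3a_bn"]
updates = {name: (name == "conv1_bn") for name in layers}
```

The `updates` map is what "partial" means in practice: one boolean per BN layer deciding whether its mean/variance are re-estimated or stay frozen at the ImageNet values.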
16. Optical Flow
Core problem:
– How to locate the corresponding point in the next frame?
Basic assumptions:
– The brightness of an image point remains constant over time.
– Displacement and time steps are small.
Methods (built into OpenCV):
– Lucas-Kanade method and its pyramidal implementation: the earliest method, sparse optical flow (calcOpticalFlowPyrLK)
– Farneback method: used in TSN, dense optical flow (calcOpticalFlowFarneback)
– Brox method: used in Two-Stream CNN (BroxOpticalFlow)
17. Optical Flow: Lucas-Kanade Method
Suppose the point $(x, y)$ in the image has brightness $I(x, y, t)$.
Optical flow is defined as $(u, v)$, where
$$u = \frac{\partial x}{\partial t}, \qquad v = \frac{\partial y}{\partial t}$$
With the two assumptions, brightness constancy
$$I(x + \delta x,\ y + \delta y,\ t + \delta t) = I(x, y, t)$$
and Taylor's theorem
$$I(x + \delta x,\ y + \delta y,\ t + \delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t$$
we have
$$\frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v + \frac{\partial I}{\partial t} = 0$$
Assume that within a small patch, $(u, v)$ remains the same. We can then solve the above equation using the least-squares method.
[Lucas, 1981]
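The per-patch least-squares step can be sketched in NumPy (synthetic gradients, not a full flow pipeline): stack the constraint $I_x u + I_y v = -I_t$ over all pixels of a patch and solve for $(u, v)$.

```python
import numpy as np

def lucas_kanade_patch(Ix, Iy, It):
    """Solve Ix*u + Iy*v = -It over one patch in the least-squares sense."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # one row per pixel
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic 5x5 patch moving by (u, v) = (1, 2): then It = -(Ix*1 + Iy*2)
rng = np.random.default_rng(0)
Ix = rng.normal(size=(5, 5))
Iy = rng.normal(size=(5, 5))
It = -(Ix * 1.0 + Iy * 2.0)
u, v = lucas_kanade_patch(Ix, Iy, It)
```

Because the synthetic patch satisfies the constraint exactly, the least-squares solution recovers (1, 2); on real gradients the same system gives the best-fit motion for the patch.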
18. Warped Flow
Intuition:
– The movement of the camera is encoded in the frames.
Method:
– Find the correspondences between two frames
• Compute SURF descriptors of consecutive frames.
• Compute optical flow using the Farneback method and select the motion vectors at salient feature points.
• Estimate the homography using RANSAC.
– Remove inconsistent matches caused by humans (human motion is an outlier with respect to the camera motion)
• Use a human detector on each frame.
• Remove feature matches inside the human bounding box during homography estimation.
– Remove the camera movement from the optical flow
[Wang, ICCV2013]
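The final step can be sketched in NumPy under the assumption that the camera motion is described by a homography H: warp the pixel grid by H, take the induced displacement as the camera-caused flow, and subtract it from the estimated flow. (Estimating H itself would use the SURF/RANSAC pipeline above, e.g. via OpenCV's findHomography, which is omitted here.)

```python
import numpy as np

def camera_flow(H, h, w):
    """Displacement of each pixel under homography H (camera-induced flow)."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous coords
    warped = H @ pts
    warped = warped[:2] / warped[2]  # back to Cartesian
    fx = (warped[0] - xs.ravel()).reshape(h, w)
    fy = (warped[1] - ys.ravel()).reshape(h, w)
    return fx, fy

def warp_flow(flow_x, flow_y, H):
    """Subtract the camera-induced flow from the estimated optical flow."""
    fx, fy = camera_flow(H, *flow_x.shape)
    return flow_x - fx, flow_y - fy
```

For a pure 3-pixel horizontal camera pan, the camera flow is a constant (3, 0) field, and subtracting it leaves only the residual (human) motion.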
20. Training: Initialization
For the RGB ConvNet, they use the pre-trained BN-Inception model trained on ImageNet.
For the Flow ConvNet, they use a modified RGB pre-trained model:
– Rescale the flow images to the range [0, 255], which makes the value range of the optical flow fields the same as that of RGB images.
– Modify the weights of the first convolution layer of the RGB model by averaging the weights across the RGB channels and replicating the average by the channel number of the temporal network input.
Original channel numbers of each ConvNet:
– Spatial (RGB) net: 3, stands for RGB
– Temporal (Flow) net: 10, stands for 5 x-flow and 5 y-flow
[Wang, ECCV2016]
21. Training: Segment selection and processing
Why use segments:
– ConvNets are unable to model long-range temporal structure.
– A sparsely sampled sequence can represent the action.
Steps:
– Divide the original video into K segments of equal duration.
– Randomly sample one frame from each segment.
– In the classifier layers, each frame gets a score vector over all classes. An even average generates better results than the maximum or a weighted average.
Specifically, when K=3, the input dims of the two nets (train_val) are:
– Spatial (RGB) net: N x 9 x 224 x 224
– Temporal (Flow) net: N x 30 x 224 x 224
[Wang, ECCV2016]
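The two steps — one random frame per equal-length segment, then an even average of the per-frame scores — can be sketched as follows (a simplified illustration of the segmental sampling and consensus, not the original implementation):

```python
import numpy as np

def sample_segments(num_frames, k, rng):
    """Split [0, num_frames) into k equal segments; pick one random frame per segment."""
    edges = np.linspace(0, num_frames, k + 1).astype(int)
    return [int(rng.integers(lo, hi)) for lo, hi in zip(edges[:-1], edges[1:])]

def segmental_consensus(scores):
    """Even average of per-frame class scores, shape (k, num_classes)."""
    return np.mean(scores, axis=0)

rng = np.random.default_rng(0)
idx = sample_segments(90, 3, rng)  # e.g. one frame from each third of a 90-frame video
```

For a 90-frame video with K=3, each sampled index falls in its own third of the video, so the three frames jointly cover the whole clip.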
22. Training: Data Augmentation
The original size of the input images is 256 x 340. When fed into the net, the images are cropped to 224 x 224.
Corner cropping:
– The previous method was random cropping, meaning any part of the large image could be selected.
– With this method, only the four corners and the center are considered.
Scale jittering:
– Randomly select sizes from {256, 224, 192, 168}; width and height are the same.
– Rescale the cropped image to 224 x 224.
Although the two methods are exploited, the number of frames in each batch is not increased; however, each frame can have up to 40 variants (5 positions x 4 scales x 2 horizontal flips).
[Wang, ECCV2016]
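Corner cropping plus scale jittering can be sketched by enumerating the crop windows (a simplified illustration; the final resize to 224 x 224 and the horizontal flips are noted in comments rather than implemented):

```python
import numpy as np

def corner_crop_offsets(img_h, img_w, crop):
    """Top-left offsets for the four corners and the center."""
    dh, dw = img_h - crop, img_w - crop
    return [(0, 0), (0, dw), (dh, 0), (dh, dw), (dh // 2, dw // 2)]

def augmented_crops(img, sizes=(256, 224, 192, 168)):
    """All corner/center crops at all jittered scales."""
    h, w = img.shape[:2]
    crops = []
    for s in sizes:
        for y, x in corner_crop_offsets(h, w, s):
            # each crop would then be rescaled to 224x224 (and optionally flipped)
            crops.append(img[y:y + s, x:x + s])
    return crops

crops = augmented_crops(np.zeros((256, 340)))  # 4 scales x 5 positions = 20 crops
```

Doubling the 20 crops with horizontal flips gives the 40 variants per frame mentioned above.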
23. Test: Get video-level scores and accuracy
There is no segment operation in the test phase. From the paper, the batch size is set to 25, so the input size of the two nets becomes:
– Spatial (RGB) net: 25 x 3 x 224 x 224
– Temporal (Flow) net: 25 x 10 x 224 x 224
However, there are still tricks in the process:
– For short videos with fewer than 25 frames: repeat the first frame 25 times.
– For each input frame, the original size is still 256 x 340, so cropping at the four corners and the center, plus horizontal flipping, still occurs. The output blob for each video is therefore 25 x 10 x class_num.
– We want video-level accuracy instead of frame-level accuracy, so the above blob is averaged first over the 10 variants and then over the 25 frames to get video scores.
– Combination of the two modalities: with weights 1 for RGB and 1.5 for Flow.
[Wang, ECCV2016]
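The test-time aggregation can be sketched in NumPy (random blobs stand in for real network outputs; 101 classes as in UCF101): average over the 10 crop/flip variants, then over the 25 frames, then fuse the two modalities with the 1 : 1.5 weighting.

```python
import numpy as np

def video_scores(blob):
    """Average a (frames, variants, classes) blob over variants, then frames."""
    return blob.mean(axis=1).mean(axis=0)

def fuse(rgb_scores, flow_scores, w_rgb=1.0, w_flow=1.5):
    """Late fusion of the two modalities with the 1 : 1.5 weighting."""
    return w_rgb * rgb_scores + w_flow * flow_scores

rgb_blob = np.random.default_rng(0).random((25, 10, 101))   # placeholder outputs
flow_blob = np.random.default_rng(1).random((25, 10, 101))
fused = fuse(video_scores(rgb_blob), video_scores(flow_blob))
pred = int(np.argmax(fused))  # video-level class prediction
```

Accuracy is then computed on `pred` per video rather than per frame, which is the "video-level accuracy" the slide refers to.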
24. Evaluation
For example, on UCF101 split 1, my test result is 86.02% for RGB and 87.63% for Flow. The combined result (1:1.5) is 93.5%.
25. Contributions of TSN
Features:
• Use warped flow for ConvNets
• Tried RGB-difference features, but this modality proved not to be useful
Structures:
• Two-stream based on batch normalization
• Segment ConvNets
Methods:
• Partial Batch Normalization
• Cross-Modality Initialization
26. Reference
[Wang, ECCV2016] Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., & Van Gool, L. (2016, October). Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision (pp. 20-36). Springer International Publishing.
[Simonyan, NIPS2014] Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems (pp. 568-576).
[Ioffe, ICML2015] Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456).
[Szegedy, CVPR2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).
[Lucas, 1981] Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the Imaging Understanding Workshop (pp. 120-131).
[Wang, ICCV2013] Wang, H., & Schmid, C. (2013). Action recognition with improved trajectories. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3551-3558).