Cross-Platform Computer Vision Optimization
Yossi Cohen
Lecture at Google Technology User Group Tel-Aviv

                                        1
And You Should Do It!




                        2
Computer Vision Application Types
 • Augmented Reality
 • Gestures
 • Text Recognition
 • Depth Mapping
 • CV with active IR camera



                                             3
Conflicting Requirements
 Cross-Platform Solution
   • Run on all devices
   • Code once
   • Low-cost maintenance and updates

 Platform-Specific
   • Utilize all SoC capabilities for:
     - Fast processing / fast response
     - Low power consumption

                                                  4
Conflicting Development
 Cross-Platform Development
   • HTML5, Java: don't work for computer vision (yet)

 Platform-Specific Development
   • SIMD optimization (assembly)
   • Use platform-specific GPU, DSP
   • Use platform-specific HW accelerators:
     - Codecs
     - Rotators
     - Color space converters…

                                                5
Possible Solutions
 Don't optimize
   • Too much power consumption
   • Too sloooowwww

 Optimize for one platform (SoC)
   • Best performance on a single platform (the market-leading SoC)
   • Lose 50%+ of the market

 Optimize for ARM NEON only
   • Good performance on all ARM platforms
   • Lose the MIPS and x86 markets
   • Lose GPU, DSP and HW-specific acceleration capabilities

 Optimize for all platforms
   • Development costs
   • Knowledge problem
   • Fragmented code, high update and maintenance costs


                                                                                   6
Optimize for one processor architecture
 • Select a processor based on the target market:
   - For Android, it's ARM
 • Optimize for SIMD instructions:
   - NEON optimization (alternatively SSE or 3DNow!)
 • Advantages
   - ~1x-8x acceleration (depending on the function)
   - Fits ~95%+ of the Android market
 • Disadvantages
   - Not suited for x86 and MIPS
   - Does not utilize 100% of the SoC's capabilities:
     - Internal DSP
     - GPU
     - HW accelerators
     - VFU
 [Diagram: ARM NEON optimization vs. unutilized SoC blocks]
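To make the NEON option concrete, here is a minimal sketch (not code from the talk) of a binary threshold written with ARM NEON intrinsics. It processes 16 pixels per instruction and uses a scalar loop for the remaining pixels and for any non-NEON build such as x86 or MIPS. The function name `threshold_u8` is illustrative; the intrinsics `vdupq_n_u8`, `vld1q_u8`, `vcgtq_u8` and `vst1q_u8` come from the standard `arm_neon.h` header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Binary threshold: dst[i] = 255 where src[i] > thresh, otherwise 0. */
void threshold_u8(const uint8_t *src, uint8_t *dst, size_t n, uint8_t thresh)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;
#if HAVE_NEON
    /* Broadcast the threshold into all 16 lanes of a 128-bit register. */
    const uint8x16_t t = vdupq_n_u8(thresh);
    for (; i + 16 <= n; i += 16) {
        uint8x16_t v = vld1q_u8(src + i);
        /* vcgtq_u8 sets a lane to 0xFF where v > t and to 0x00 otherwise,
           which is exactly the 255/0 output we want. */
        vst1q_u8(dst + i, vcgtq_u8(v, t));
    }
#endif
    /* Scalar path: handles the tail on NEON builds and everything elsewhere. */
    for (; i < n; ++i)
        dst[i] = (src[i] > thresh) ? 255 : 0;
}
```

The scalar fallback keeps this single source file buildable on every ABI, but on x86 and MIPS it runs unaccelerated, which is the disadvantage listed above.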
                                                                       7
Optimize for a Single Processor
 • Select a single processor based on the target market:
   - Snapdragon 8960: the fastest processor
   - 250 design wins
 • Optimize for NEON
 • Optimize for the DSP
 • Optimize for the GPU
 • Advantages
   - You'll have the fastest app on the best, most widely used processor
 • Disadvantages
   - Development time
   - Need to support inferior/legacy processors as well
 [Diagram: optimized CPU (VeNum), optimized GPU, DSP]


                                                                           8
Selecting between two sub-optimal solutions



Isn’t there someone who will solve this in a better way?


                                                       9
10
Khronos
 • Standardization organization
 • Creates open, royalty-free APIs (unlike Oracle) for cross-hardware software
 • Best-known API: OpenGL
 • In Android:
   - OpenGL ES
   - OpenMAX
   - OpenSL ES




                                                     11
Khronos Vision of Cross Platform Computer Vision


 [Diagram: layered stack, with camera input entering on one side and video out leaving on the other]
   Application Layer: sensory input
   OpenCV: high-level algorithms
   OpenVL: integration layer
   OpenCL: DSP, HW accelerators, GPU

                                                      12
OpenVL
 • Integration API for computer vision (like OpenGL for graphics)
 • Implements computer vision primitives




                                          13
All we have to do is wait 5-7 years for market adoption…

If only there were a solution optimized both for ARM NEON and for the fastest CPU on the market
                                                  14
One Development Toolkit – Two Implementations

 [Diagram: FastCV for ARM (CPU, GPU, NEON, DSP) side by side with FastCV for Snapdragon (CPU, GPU, VeNum, DSP)]




                                                     15
FastCV Overview
 • FastCV is an API and library that enables real-time computer vision (CV) applications.
   - FastCV enables mobile devices to run CV applications efficiently.
   - FastCV allows developers to hardware-accelerate their CV applications.
   - FastCV is analogous to OpenGL ES in the rendering domain.
   - FastCV is a clean, modular library.




                                                                             16
FastCV Architecture
 [Diagram: layered architecture, top to bottom]
   CV Applications: AR, Gestures, Facial Recognition, Other
   APIs: Augmented Reality APIs, Gestures APIs, Facial Recognition APIs, Defined API
   Optimized Framework: QC Augmented Reality, QC Gesture Processing, QC Facial Recognition, 3rd-party CV frameworks
   Computer Vision APIs: FastCV Snapdragon, FastCV ARM
   Kernel: display drivers, camera drivers
   Hardware: Snapdragon CPU core(s), Adreno GPU, Video Core, Hexagon, connectivity, sensors, etc.


                                                                                                                                          17
FastCV 1.0 – Feature Grouping
 • Math / vector operations
   - Commonly used vector and math functions
 • Image processing
   - Image filtering, convolution and scaling operations
 • Image transformation
   - Perspective warp, affine transformations
 • Feature detection
   - FAST corner detection, Harris corner detection, Canny edge detection
 • Object detection
   - NCC-based template matching object detection functions
 • 3D reconstruction
   - Homography and pose evaluation functions
 • Color conversion
   - Commonly used formats supported, e.g., YUV, RGB, YCrCb
 • Clustering and search
   - Finding the K clusters that best fit a set of input points
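As an illustration of the "image filtering" group, the sketch below is a plain-C 3x3 Gaussian blur of the kind a reference (unaccelerated) implementation might contain. It is not FastCV's code: the function name, the row-stride parameter and the choice to copy border pixels unchanged are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 3x3 Gaussian blur on an 8-bit grayscale image using the kernel
   [1 2 1]^T [1 2 1] / 16. Border pixels are copied unchanged.
   `stride` is the number of bytes between the starts of two rows. */
void gaussian3x3_u8(const uint8_t *src, uint8_t *dst,
                    size_t width, size_t height, size_t stride)
{
    assert(src != NULL && dst != NULL && stride >= width);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            if (y == 0 || x == 0 || y + 1 == height || x + 1 == width) {
                dst[y * stride + x] = src[y * stride + x];
                continue;
            }
            /* p points at the top-left pixel of the 3x3 neighbourhood. */
            const uint8_t *p = src + (y - 1) * stride + (x - 1);
            unsigned sum =
                1 * p[0]          + 2 * p[1]              + 1 * p[2] +
                2 * p[stride]     + 4 * p[stride + 1]     + 2 * p[stride + 2] +
                1 * p[2 * stride] + 2 * p[2 * stride + 1] + 1 * p[2 * stride + 2];
            dst[y * stride + x] = (uint8_t)((sum + 8) / 16); /* +8 rounds the /16 */
        }
    }
}
```

An accelerated backend can keep this exact interface and replace the inner loop with NEON, VeNum or DSP code, so application code does not change when the implementation does.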

                                                                           18
Industry Computer Vision Solutions
 • FastCV is a processor-core-agnostic acceleration API
 • Khronos is looking to provide a standard CV API
   - Potentially utilizing portions of OpenCV
 • FastCV will evolve as the Khronos standard is defined

 [Diagram: industry CV stack, top to bottom]
   Application (with a media interface)
   High-level CV algorithms library
   Hardware Abstraction Layer: FastCV Hardware Acceleration API
   Implementations:
     - FastCV source, open for ARM (reference implementation)
     - HW-specific implementations by hardware vendors: FastCV for Snapdragon, FastCV for Nvidia, FastCV for Intel, FastCV for Others…
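The hardware abstraction layer in this diagram boils down to one public API whose calls are routed either to a reference implementation or to a vendor-specific one. Below is a minimal sketch of that pattern in C, using a function pointer chosen at start-up. All names (`cv_init`, `cv_sum_u8`, the backend enum) are hypothetical, and the "vendor" version is just an unrolled loop standing in for real NEON, DSP or GPU code; a real library would probe the SoC rather than take the backend as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of the (hypothetical) API call. */
typedef void (*sum_u8_fn)(const uint8_t *src, size_t n, uint32_t *out);

/* Portable reference implementation: always available. */
static void sum_u8_ref(const uint8_t *src, size_t n, uint32_t *out)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += src[i];
    *out = s;
}

/* Stand-in for a vendor-tuned backend: unrolled by 4 with independent
   accumulators, the way a SIMD version would split the work. */
static void sum_u8_vendor(const uint8_t *src, size_t n, uint32_t *out)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += src[i];
        s1 += src[i + 1];
        s2 += src[i + 2];
        s3 += src[i + 3];
    }
    for (; i < n; ++i)
        s0 += src[i];
    *out = s0 + s1 + s2 + s3;
}

typedef enum { BACKEND_REFERENCE, BACKEND_VENDOR } backend_t;

/* Dispatch slot; defaults to the reference path. */
static sum_u8_fn g_sum_u8 = sum_u8_ref;

/* Called once at start-up to bind the API to an implementation. */
void cv_init(backend_t backend)
{
    g_sum_u8 = (backend == BACKEND_VENDOR) ? sum_u8_vendor : sum_u8_ref;
}

/* Public API: applications call this and never see which backend ran. */
uint32_t cv_sum_u8(const uint8_t *src, size_t n)
{
    assert(src != NULL || n == 0);
    uint32_t out = 0;
    g_sum_u8(src, n, &out);
    return out;
}
```

Because applications only call `cv_sum_u8`, a vendor can ship a faster backend without any change to application code, and the reference path gives the same results on hardware nobody optimized for.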


                                                                                                   19
FastCV Compared To OpenCV
Function            OpenCV   FastCV   FastCV Snapdragon
NCC                 1.0x     9.0x     23.1x
Dot Product 128x4   1.0x     4.0x     10.0x
Convert YUV420      1.0x     1.4x     1.3x
Sobel               1.0x     1.8x     7.8x
Median3x3           1.0x     3.8x     51.9x
Gaussian3x3         1.0x     2.6x     4.1x
Gaussian5x5         1.0x     1.4x     2.9x
Threshold           1.0x     0.7x     9.7x
Integral Image      1.0x     1.1x     1.3x
Harris Corner       1.0x     2.8x     8.6x
Dilate              1.0x     1.4x     15.0x
Erode               1.0x     1.3x     15.0x
Perspective Fit     1.0x     21.5x    37.8x
LK Optical Flow     1.0x     2.0x     14.3x

                                                                   20
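Each number in the table is a speed-up relative to OpenCV, i.e. OpenCV's run time divided by the other library's run time, so a value below 1.0x (Threshold in the plain FastCV column, 0.7x) means slower than the baseline. The sketch below shows one way to produce such ratios; it is not the methodology behind the table, which the slide does not describe. It assumes a POSIX system providing `clock_gettime(CLOCK_MONOTONIC, ...)` and takes the median of several runs to damp scheduler noise; `median_time` and `speedup` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock (POSIX). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Runs fn(ctx) `runs` times and returns the median duration in seconds.
   The median is less sensitive than the mean to scheduler hiccups. */
double median_time(void (*fn)(void *), void *ctx, size_t runs)
{
    assert(runs > 0 && runs <= 64);
    double t[64];
    for (size_t i = 0; i < runs; ++i) {
        double start = now_seconds();
        fn(ctx);
        t[i] = now_seconds() - start;
    }
    qsort(t, runs, sizeof t[0], cmp_double);
    return (runs % 2) ? t[runs / 2] : 0.5 * (t[runs / 2 - 1] + t[runs / 2]);
}

/* Speed-up relative to a baseline, as in the table: 1.0x = baseline. */
double speedup(double baseline_seconds, double candidate_seconds)
{
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```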
Gain Is More Than Time
 • Measure CPU frequency along with run times
   - Use a single CPU core, with Linux in performance mode
 [Chart: CPU frequency plotted alongside a long and a short algorithm run time]
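The bullets suggest that wall-clock time alone can mislead on a mobile SoC whose clock frequency changes under load: a shorter run at a higher frequency may still consume more cycles than a longer run at a lower one. A minimal sketch, assuming a Linux device exposing the standard cpufreq sysfs file `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` (value in kHz), reads the frequency and converts a run time into approximate cycles. Treating the frequency as constant over the run is an assumption, which is why pinning the work to one core with the `performance` cpufreq governor matters. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Reads an integer kHz value from a cpufreq file such as
   /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq.
   Returns -1 if the file is missing or unreadable. */
long read_cpu_khz(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long khz = -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}

/* Approximate CPU cycles spent: time (s) x frequency (kHz x 1000 = Hz).
   Only meaningful if the frequency stayed constant during the run. */
double cycles_used(double seconds, long khz)
{
    assert(seconds >= 0.0 && khz > 0);
    return seconds * (double)khz * 1000.0;
}
```

For example, a run measured at 0.5 s with the core at 1,500,000 kHz corresponds to roughly 7.5 × 10^8 cycles.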




                                                                                     21
References
 • More on:
   - OpenMAX: http://www.slideshare.net/DSPIP/openmax-overview
   - OpenCL: http://www.slideshare.net/DSPIP/opencl-programming-101
   - OpenSL: http://www.slideshare.net/DSPIP/android-audio-opensl
 • Download FastCV:
   - https://developer.qualcomm.com/develop/mobile-technologies/computer-vision-fastcv/getting-started-guide




                                                                                 22
Thank you!
More about me:
 • Video expert
 • Lectures on video / Android / VoIP
 • Android native developer

Yossi Cohen
yossicohen19@gmail.com
http://www.mobilevideotech.com
+972-545-313092




                                                                        23


Editor's Notes

  • #7 Best Suboptimal Solution