1. The document describes a mobile image recognition system using a CNN model called Network-in-Network. It was implemented as iOS and Android apps that can recognize food images without needing an online server.
2. The system achieves high accuracy of 78.8% for top-1 and 95.2% for top-5 recognition of food images from the UECFOOD100 dataset, with a processing time of 55.7 ms per image. It uses techniques such as batch normalization and multi-threading to optimize performance on mobile devices.
3. The architecture was modified from the original Network-in-Network by adding batch normalization, reducing layers and kernels, and using multiple image sizes to balance recognition accuracy and speed. Global average pooling replaces the fixed average pooling, which allows the network to accept multiple input resolutions.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/08/how-transformers-are-changing-the-direction-of-deep-learning-architectures-a-presentation-from-synopsys/
Tom Michiels, System Architect for DesignWare ARC Processors at Synopsys, presents the “How Transformers are Changing the Direction of Deep Learning Architectures” tutorial at the May 2022 Embedded Vision Summit.
The neural network architectures used in embedded real-time applications are evolving quickly. Transformers are a leading deep learning approach for natural language processing and other time-dependent, series data applications. Now, transformer-based deep learning network architectures are also being applied to vision applications with state-of-the-art results compared to CNN-based solutions.
In this presentation, Michiels introduces transformers and contrasts them with the CNNs commonly used for vision tasks today. He examines the key features of transformer model architectures and shows performance comparisons between transformers and CNNs. He concludes the presentation with insights on why Synopsys thinks transformers are an important approach for future visual perception tasks.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/07/how-transformers-are-changing-the-nature-of-deep-learning-models-a-presentation-from-synopsys/
Tom Michiels, System Architect for ARC Processors at Synopsys, presents the “How Transformers Are Changing the Nature of Deep Learning Models” tutorial at the May 2023 Embedded Vision Summit.
The neural network models used in embedded real-time applications are evolving quickly. Transformer networks are a deep learning approach that has become dominant for natural language processing and other time-dependent, series data applications. Now, transformer-based deep learning network architectures are also being applied to vision applications with state-of-the-art results compared to CNN-based solutions.
In this presentation, Michiels introduces transformers and contrasts them with the CNNs commonly used for vision tasks today. He examines the key features of transformer model architectures and shows performance comparisons between transformers and CNNs. He concludes with insights on why his company thinks transformers will become increasingly important for visual perception tasks.
Future Internet: Managing Innovation and Testbed (Shinji Shimojo)
Innovation is a key word for ICT research and development. However, the road toward innovation is full of uncertainties and obstacles. Key elements for overcoming these obstacles seem to be agile management of people, software and hardware. In addition, we think that involving users in R&D has a strong effect on how uncertainty in R&D is managed. In this talk, I describe our approach to user involvement in JGN-X, an international future internet testbed, and Knowledge Capital, Osaka, a smart city experimental testbed.
Applying Deep Learning Vision Technology to low-cost/power Embedded Systems (Jenny Midwinter)
Slides from Ottawa Machine Learning Meetup from January 16, 2016.
Pierre Paulin, Director of R&D at Synopsys (Embedded Vision Subsystems), will be making a presentation on:
“Applying Deep Learning Vision Technology to Low-Cost, Low-Power Embedded Systems: An Industrial Perspective”
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki... (MIPI Alliance)
Panel discussion with Haran Thanigasalam, Intel Corporation, MIPI Camera Working Group chair; Natsuko Ibuki, Google LLC; Yuichi Mizutani, Sony Corporation; and Wonseok Lee, Samsung Electronics Co.
Rate and Performance Analysis of Indoor Optical Camera Communications in Opti... (Willy Anugrah Cahyadi)
This is a summary of my Ph.D. dissertation. The main topic is optical camera communication (OCC), which was being standardized in 2018 as a revision to the IEEE 802.15.7-2011 standard.
Inria Tech Talk: Improve Your Robotics and Augmented Reality Applications (Stéphanie Roger)
Whether you work in industry, robotics, healthcare or augmented reality, take advantage of ViSP to develop new business opportunities and industrial transfer.
ViSP (Visual Servoing Platform) is a technology used in robotics and augmented reality to control a robot with a camera.
Rethinking the Mobile Code Offloading Paradigm: From Concept to Practice (MobileSoft)
Rethinking the Mobile Code Offloading Paradigm: From Concept to Practice, by José I. Benedetto, Andrés Neyem, Jaime Navón and Guillermo Valenzuela. MobileSoft 2017, Buenos Aires.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/12/trends-in-neural-network-topologies-for-vision-at-the-edge-a-presentation-from-synopsys/
For more information about edge AI and computer vision, please visit:
https://www.edge-ai-vision.com
Pierre Paulin, Director of R&D for Embedded Vision at Synopsys, presents the “Trends in Neural Network Topologies for Vision at the Edge” tutorial at the September 2020 Embedded Vision Summit.
The widespread adoption of deep neural networks (DNNs) in embedded vision applications has increased the importance of creating DNN topologies that maximize accuracy while minimizing computation and memory requirements. This has led to accelerated innovation in DNN topologies.
In this talk, Paulin summarizes the key trends in neural network topologies for embedded vision applications, highlighting techniques employed by widely used networks such as EfficientNet and MobileNet to boost both accuracy and efficiency. He also touches on other optimization methods—such as pruning, compression and layer fusion—that developers can use to further reduce the memory and computation demands of modern DNNs.
This presentation describes my research and teaching activities since completing my PhD, and it is mainly based on the talk I gave to defend my HDR (Habilitation à Diriger des Recherches).
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi... (inside-BigData.com)
In this deck from PASC18, Robert Searles from the University of Delaware presents: Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures.
"Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages and programming models among other components in order to migrate large scale applications and explore parallelism on these machines. Although directive-based programming models allow programmers to worry less about programming and more about science, expressing complex parallel patterns in these models can be a daunting task especially when the goal is to match the performance that the hardware platforms can offer. One such pattern is wavefront. This paper extensively studies a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling.
We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP and OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation. Our experimental platform includes SummitDev, an ORNL representative architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an abstract parallelism model that is architecture independent, with a goal of creating software abstractions that can be used by applications employing the wavefront sweep motif."
Watch the video: https://wp.me/p3RLHQ-iPU
Read the Full Paper: https://doi.org/10.1145/3218176.3218228
and
https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/mipi-csi-2-image-sensor-interface-standard-features-enable-efficient-embedded-vision-systems-a-presentation-from-the-mipi-alliance/
Haran Thanigasalam, Camera and Imaging Consultant to the MIPI Alliance, presents the “MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedded Vision Systems” tutorial at the May 2023 Embedded Vision Summit.
As computer vision applications continue to evolve rapidly, there’s a growing need for a smarter standardized interface connecting multiple image sensors to processors for real-time perception and decision-making. In this presentation, Thanigasalam provides a deep dive into the latest version of the widely implemented CSI-2 v4.0 interface from MIPI Alliance.
This new version includes key features specifically designed to support computer vision applications, including democratized Smart Region of Interest, Always-On Sentinel Conduit, Multi-Pixel Compression and Latency Reduction and Transport Efficiency. These novel features enable sophisticated machine awareness with reduced system power and processing needs, making them well suited for consumer, commercial and infrastructure platforms.
The Network Revolution, John Zannos, Canonical (Alan Quayle)
The Network Revolution, changing how network enabled services are consumed.
John Zannos, Vice President - Cloud Platform / Alliances, Canonical
The telecom industry has been undergoing a revolution. NFV, cloud and IoT have rapidly changed what it means to be a carrier. Carriers are expected to supply services at internet speed with carrier-grade reliability. Moving to a software-based model for network services is the only way to accelerate time to market for new services.
Presented at TADSummit 2016, 15-16 Nov, Lisbon in the Sponsors' Plenary
Distributed Video Coding (DVC) has recently become popular among video coding researchers due to its attractive and promising features. In contrast to conventional video codecs, DVC shifts the complexity balance between the encoder and decoder. However, most reported DVC schemes have a high decoding delay, which hinders their practical application in real-time systems. In this work, we focus on speeding up the Side Information (SI) generation module in DVC, a major function in the DVC coding algorithm and one of the most time-consuming steps at the decoder. By implementing it with the Compute Unified Device Architecture (CUDA) on a general-purpose graphics processing unit (GPGPU), we show experimentally that the proposed parallelized SI generation algorithm obtains a considerable speedup.
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System (AI Frontiers)
This presentation will demonstrate our recent progress in developing advanced computer vision algorithms using embedded platforms for video-based face recognition, vehicle attribute analysis, urban management event detection, and high-density crowd counting. These algorithms combine the traditional CV approach with recent advances in deep learning to make high-performance computer vision systems practical and enable products in several vertical markets including intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. We will demonstrate algorithm design and optimization schemes for several recently available processors from Movidius, Nvidia, and ARM.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/fomo-real-time-object-detection-on-microcontrollers-a-presentation-from-edge-impulse/
Jan Jongboom, Co-founder and CTO of Edge Impulse, delivers the “FOMO: Real-Time Object Detection on Microcontrollers” tutorial at the May 2022 Embedded Vision Summit.
Object detection models are vital for many computer vision applications. They can show where an object is in a video stream, or how many objects are present. But they’re also very resource-intensive—models like MobileNet SSD can analyze a few frames per second on a Raspberry Pi 4, using a significant amount of RAM. This has put object detection out of reach for the most interesting devices: microcontrollers.
Microcontrollers are cheap, small, ubiquitous and energy efficient—and are thus attractive for adding computer vision to everyday devices. But microcontrollers are also very resource-constrained, with clock speeds of up to 200 MHz and less than 256 Kbytes of RAM—far too little to run complex object detection models. But… that’s about to change! In this talk, Jongboom outlines his company’s work on FOMO (“faster objects, more objects”), a novel DNN architecture for object detection, designed from the ground up to run on microcontrollers.
These slides compare the runtime of the OpenCV DNN module with our method; they are part of the slides presented at a workshop of the 13th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MOBIQUITOUS) (http://mobiquitous.org/2016/show/home). Details of the image recognition part are omitted.
1. Introduction of Mobile CNN
ⓒ 2016 UEC Tokyo.
2016/11/10 (Thu)
Ryosuke Tanno
Yanai Laboratory, Department of Informatics,
The University of Electro-Communications
2. Self Introduction
• Affiliation: first-year master's student at the University of Electro-Communications (Yanai Laboratory)
• Research:
– Bachelor: Implementation and Comparative Analysis of Image Recognition Systems based on Deep Learning on Mobile OS
– Master: Image Recognition and Image Transfer based on Deep Learning
3. Contributions
• Stand-alone DCNN-based mobile image recognition
– No need for a recognition server or communication
– Built-in trained DCNN model with UECFOOD-100
– Implemented as iOS/Android apps
– Released as an iOS app at https://goo.gl/4m2tQz and as an Android app (APK) at http://foodcam.mobi/
• Excellent performance with reasonable speed and model size
– UECFOOD100: 78.8% (top-1), 95.2% (top-5) in 55.7 ms with 5.5M weights (22 MB)
– Employing Network-in-Network
– Adding batch normalization and additional layers
• Multi-scale recognition
– Users can choose the balance between speed and accuracy
– 26.2 ms for 160x160 images ⇔ 55.7 ms for 227x227 images (on iPhone 7 Plus)
4. CNN architecture (1)
• The numbers of weights in AlexNet and VGG-16 are too large for mobile.
• GoogLeNet is too complicated for an efficient parallel implementation. (It has many branches.)
5. CNN architecture (2)
• We adopt Network-in-Network (NIN).
– No fully-connected layers (which means far fewer weights)
– Straight flow consisting of many conv layers
⇒ easy to implement in parallel
• Efficient computation of the conv layers is needed!
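NIN's characteristic building block is the 1x1 convolution (its "cccp" layer), which is simply the same linear map over channels applied at every pixel. A minimal pure-Python sketch of that idea (illustrative only, not the app's implementation):

```python
# A 1x1 convolution is a per-pixel linear map: every spatial position's
# channel vector is multiplied by one shared weight matrix.

def conv1x1(feature_map, weights):
    """feature_map: H x W x C_in nested lists; weights: C_out x C_in."""
    h = len(feature_map)
    w = len(feature_map[0])
    c_out = len(weights)
    return [[[sum(weights[o][i] * feature_map[y][x][i]
                  for i in range(len(weights[o])))
              for o in range(c_out)]
             for x in range(w)]
            for y in range(h)]

# 2x2 map with 2 input channels -> 1 output channel (weights sum channels)
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
w = [[1.0, 1.0]]          # one output channel: sum of the two inputs
print(conv1x1(fmap, w))   # [[[3.0], [7.0]], [[11.0], [15.0]]]
```

Because there is no spatial window, a 1x1 conv adds per-channel nonlinearity and mixing while staying trivially parallel across pixels, which fits the slide's point about parallel-friendly straight flows.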
6. Extension of NIN: adding BN, 5 layers, multiple image sizes
• Modified models (BN, 5-layer, multi-scale)
– Added BN layers just after all the conv/cccp layers
– Replaced the 5x5 conv with two 3x3 conv layers
– Reduced the number of kernels in conv4 from 1024 to 768
– Replaced the fixed average pooling with Global Average Pooling (GAP)
• Multiple image sizes (4-layer vs. 5-layer+BN): trade-off between accuracy and speed
– 227x227: 55.7 ms, 78.8%
– 180x180: 35.5 ms, 76.0%
– 160x160: 26.3 ms, 71.5%
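Global Average Pooling is what makes the multiple-image-size trick above possible: each channel's spatial map collapses to its mean, so the classifier's input shape no longer depends on the input resolution. A minimal pure-Python sketch (illustrative, not the actual implementation):

```python
# Global Average Pooling: reduce each channel's H x W map to its mean.
# Works for any spatial size, unlike a fixed-window average pooling.

def global_average_pool(feature_map):
    """feature_map: H x W x C nested lists -> list of C channel means."""
    h = len(feature_map)
    w = len(feature_map[0])
    c = len(feature_map[0][0])
    return [sum(feature_map[y][x][ch] for y in range(h) for x in range(w))
            / (h * w)
            for ch in range(c)]

# The same function handles a 2x2 and a 3x3 map with no code change.
small = [[[1.0], [2.0]], [[3.0], [4.0]]]            # 2x2, 1 channel
large = [[[float(v)] for v in row] for row in
         [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]         # 3x3, 1 channel
print(global_average_pool(small))  # [2.5]
print(global_average_pool(large))  # [5.0]
```

Feeding a 160x160 or 227x227 image into the same network simply produces smaller or larger final maps; GAP absorbs the difference.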
7. Fast Implementation on Mobile
• Speeding up conv layers → speeding up GEMM
– The computation of a conv layer is decomposed into an "im2col" operation and general matrix multiplication (GEMM)
– Multi-threading: use 2 cores on iOS and 4 cores on Android in parallel
– SIMD instructions (NEON on ARM-based processors)
• Total: iOS: 2 cores x 4 = 8 parallel calculations; Android: 4 cores x 4 = 16 parallel calculations
– BLAS library (highly optimized on iOS ⇔ not optimized on Android)
• BLAS (iOS: BLAS in the iOS Accelerate Framework; Android: OpenBLAS)
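The im2col + GEMM decomposition above can be sketched in pure Python (illustrative only; the apps delegate the GEMM to BLAS or NEON code): each k x k patch is flattened into a row, and the whole convolution then becomes a single matrix multiply.

```python
# im2col + GEMM: every conv output pixel is a dot product between a
# filter and an image patch, so unrolling all patches turns the conv
# layer into one big matrix multiplication.
# Sketch: 1-channel input, "valid" convolution, stride 1.

def im2col(img, k):
    """img: H x W list of lists -> one flattened k*k patch per row."""
    h, w = len(img), len(img[0])
    cols = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            cols.append([img[y + dy][x + dx]
                         for dy in range(k) for dx in range(k)])
    return cols  # (out_h * out_w) rows of length k*k

def conv_via_gemm(img, kernels, k):
    """kernels: list of flattened k*k filters -> per-filter output maps."""
    cols = im2col(img, k)
    # The GEMM step: filter matrix times patch matrix.
    return [[sum(f * p for f, p in zip(filt, col)) for col in cols]
            for filt in kernels]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
box = [1, 1, 1, 1]                   # 2x2 box filter (sums each patch)
print(conv_via_gemm(img, [box], 2))  # [[12, 16, 24, 28]]
```

This is why an optimized GEMM (Accelerate's BLAS on iOS, OpenBLAS or hand-written NEON on Android) dominates the conv-layer runtime discussed in the evaluation.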
9. Evaluation: Processing time
• iOS: BLAS >> NEON; Android: BLAS << NEON
– The BLAS library in the iOS Accelerate Framework is very efficient!
• Trade-off between accuracy and speed by changing the size of input images
– The fastest setting achieves "real" real-time recognition at 26.2 ms
[Chart: processing times for the iOS and Android configurations, with the fastest and most accurate settings highlighted]
10. Comparison to FV-based FoodCam with the UEC-FOOD100 dataset
• Much improved: 65.3% ⇒ 81.5% (top-1)
• Even for 160x160 inputs, improved: 65.3% ⇒ 71.5%
[Chart: top-N accuracy for N = 1 to 10 (y-axis 60.0% to 100.0%) for AlexNet, NIN 5-layer (104 ms), NIN 4-layer (67 ms), NIN 4-layer 160x160 (33 ms), and FV (Color+HOG) (65 ms); NIN reaches 81.5% top-1 and 96.2% top-5 vs. 65.3% top-1 and 86.7% top-5 for FV]