ADS Team 8 Final Presentation


The document describes a project to perform object detection in videos. The team's scope was to identify, list, localize and bound objects in video frames using machine learning. They chose the MS-COCO dataset and the SSD model for its efficiency and speed at object detection. A comparative analysis found SSD_MOBILENET_V1_COCO to have the best balance of speed and accuracy. The team performed transfer learning to customize the model for new object types. They developed a web application using Flask that streams video frames from the client to perform object detection and returns bounding box coordinates.

Object Detection in Videos
Team 8:
Priyesh Kaushik
Pranay Mankad

Introduction
• A video is a sequence of frames, each of which can contain several
objects at any given moment. Machine learning can be used to identify
these objects and make them searchable using tags.

Our Scope
• Identify objects in videos
• List the objects found
• Localize them per frame
• Bound them with boxes
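
As a rough illustration of this scope, the sketch below shows one way per-frame detections could be stored and turned into a tag index for searching. The `Detection` record, the `build_tag_index` helper and the sample values are hypothetical and are not taken from the project code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detected object in one frame."""
    frame: int   # index of the frame in the video
    label: str   # object class, used as the search tag
    box: tuple   # (xmin, ymin, xmax, ymax) in pixels


def build_tag_index(detections):
    """Map each label to the sorted list of frames in which it appears."""
    index = defaultdict(set)
    for det in detections:
        index[det.label].add(det.frame)
    return {label: sorted(frames) for label, frames in index.items()}


# Made-up detections for two consecutive frames.
detections = [
    Detection(0, "person", (10, 20, 110, 220)),
    Detection(0, "dog", (150, 90, 260, 200)),
    Detection(1, "person", (12, 22, 112, 222)),
]
tag_index = build_tag_index(detections)
print(tag_index)
```

Looking up `tag_index["person"]` then gives the frames to jump to when a viewer searches for that tag.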

Our Approach - Dataset
• We chose our dataset by comparing the mean number of objects per
image. MS-COCO had the highest mean of the datasets we looked at, so
we selected it.
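
The mean-objects-per-image figure can be computed directly from an annotation file. The sketch below assumes the COCO annotation layout (an `images` list and an `annotations` list keyed by `image_id`). The sample data is made up for illustration and the helper name `mean_objects_per_image` is hypothetical.

```python
import json
from collections import Counter


def mean_objects_per_image(coco):
    """Average number of annotated objects per image in a COCO-style dict."""
    counts = Counter(ann["image_id"] for ann in coco["annotations"])
    images = coco["images"]
    if not images:
        return 0.0
    # Images without any annotation count as zero objects.
    return sum(counts.get(img["id"], 0) for img in images) / len(images)


# Tiny made-up sample in the COCO layout (not real COCO data).
sample = json.loads("""
{
  "images": [{"id": 1}, {"id": 2}, {"id": 3}],
  "annotations": [
    {"image_id": 1, "category_id": 1, "bbox": [10, 10, 50, 80]},
    {"image_id": 1, "category_id": 18, "bbox": [70, 40, 30, 30]},
    {"image_id": 2, "category_id": 1, "bbox": [5, 5, 20, 40]},
    {"image_id": 2, "category_id": 3, "bbox": [60, 20, 90, 50]},
    {"image_id": 2, "category_id": 3, "bbox": [10, 70, 40, 20]},
    {"image_id": 2, "category_id": 62, "bbox": [0, 0, 15, 15]}
  ]
}
""")
mean_count = mean_objects_per_image(sample)
print(mean_count)
```

Running the same function over each candidate dataset's annotation file would give the numbers to compare.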

Approach – Selecting Model
• Several Convolutional Neural Network (CNN) models are available for
object detection. Based on our research, Faster R-CNN and SSD
(Single-Shot Multibox Detector) are highly efficient at detecting
objects in frames.
• Based on comparative results, we decided to go with the SSD model,
with the COCO dataset.

Comparative Analysis
Model Name                       Speed (ms)   COCO mAP [^1]
ssd_mobilenet_v1_coco            30           21
ssd_inception_v2_coco            42           24
faster_rcnn_inception_v2_coco    58           28
faster_rcnn_resnet50_coco        89           30
mAP is the mean average precision, the accuracy measure on which the models are compared.
After the comparative analysis, we decided to use SSD_MOBILENET_V1_COCO. Here are some details
about the model.
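
To make the speed and accuracy trade-off concrete, the sketch below ranks the four models using the figures from the Comparative Analysis table. The ranking criterion (mAP gained per millisecond) is an assumption chosen for illustration and is not necessarily the criterion the team used. The frames-per-second column assumes that Speed (ms) is the latency per frame.

```python
# (speed in ms, COCO mAP) copied from the Comparative Analysis table.
models = {
    "ssd_mobilenet_v1_coco": (30, 21),
    "ssd_inception_v2_coco": (42, 24),
    "faster_rcnn_inception_v2_coco": (58, 28),
    "faster_rcnn_resnet50_coco": (89, 30),
}


def map_per_ms(name):
    """Illustrative score: accuracy obtained per millisecond of inference."""
    speed_ms, coco_map = models[name]
    return coco_map / speed_ms


ranked = sorted(models, key=map_per_ms, reverse=True)
for name in ranked:
    speed_ms, coco_map = models[name]
    fps = 1000 / speed_ms  # assumes Speed (ms) is per-frame latency
    print(f"{name:32s} {fps:5.1f} fps  {map_per_ms(name):.2f} mAP/ms")
```

Under this criterion ssd_mobilenet_v1_coco ranks first, which matters for video because each frame has to be processed before the next one arrives.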

Single Shot Multibox Detection Specifics
• Takes inputs of 300x300 pixels
• Training requires images and their ground-truth bounding boxes
• Performs non-maximum suppression internally
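
Non-maximum suppression removes overlapping boxes that point at the same object and keeps only the highest-scoring one. The sketch below is a minimal greedy, class-agnostic version in plain Python. The 0.5 IoU threshold and the sample boxes are assumptions, and the SSD implementation itself may differ in detail (for example, by suppressing per class).

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-identical boxes around one object, plus one separate object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this sample the second box overlaps the first almost completely, so only the first and third boxes survive.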

SSD v/s The Rest
This comparison is based on a different dataset, but the proportions stay the same with COCO.

Transfer Learning
• We performed transfer learning on the SSD model, using Python,
LXML, LabelImg, Paperspace and TensorFlow.
• The steps involved were:
• Gathering images of the custom objects,
• Drawing bounding boxes on the images,
• Generating an XML file with the dimensions of each bounding box,
• Using TensorFlow to train the model on the object,
• Using Paperspace for access to a GPU,
• Using TensorBoard to monitor accuracy at various iterations.
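
The XML step can be sketched as follows. The example assumes the annotations are in the Pascal VOC layout that LabelImg can export, and it uses the standard library's `xml.etree.ElementTree` rather than LXML so that it runs without extra packages. The sample annotation, the `custom_object` label and the `parse_annotation` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC layout (assumed export format).
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>custom_object</name>
    <bndbox>
      <xmin>64</xmin><ymin>48</ymin><xmax>320</xmax><ymax>240</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_annotation(xml_text):
    """Return one record per labeled object, with boxes scaled to 0..1."""
    root = ET.fromstring(xml_text.strip())
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    records = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        records.append({
            "filename": root.findtext("filename"),
            "label": obj.findtext("name"),
            # Normalized (xmin, ymin, xmax, ymax), independent of image size.
            "box": (xmin / width, ymin / height, xmax / width, ymax / height),
        })
    return records


records = parse_annotation(SAMPLE_XML)
print(records)
```

Records like these are what a training pipeline would then convert into its own input format before fine-tuning the model.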

How we made it
• We started off using OpenCV to capture video and render it as
images.
• But OpenCV was harder to configure on cloud platforms as an API for
accessing web camera footage, which was one of our goals.
• So here is the flow we followed instead.
Flask Application
Client Side
WebRTC Image Stream
Start Object Detection
Client Side
Classify and Box Images
Return Coordinates
Render on Browser using JS
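
The "Return Coordinates" step can be sketched as a function that turns raw detector output into a JSON-friendly payload for the browser to draw. This is a minimal sketch under stated assumptions: boxes are normalized and ordered (ymin, xmin, ymax, xmax), the 0.5 score cut-off is arbitrary, and the `build_response` helper and its field names are hypothetical rather than the project's actual API.

```python
import json


def build_response(boxes, scores, labels, frame_width, frame_height,
                   min_score=0.5):
    """Convert normalized boxes to pixel coordinates for the client."""
    detections = []
    for (ymin, xmin, ymax, xmax), score, label in zip(boxes, scores, labels):
        if score < min_score:
            continue  # drop low-confidence detections
        detections.append({
            "label": label,
            "score": round(score, 3),
            "x": round(xmin * frame_width),
            "y": round(ymin * frame_height),
            "width": round((xmax - xmin) * frame_width),
            "height": round((ymax - ymin) * frame_height),
        })
    return {"detections": detections}


# Made-up detector output for a single 640x480 frame.
payload = build_response(
    boxes=[(0.25, 0.5, 0.75, 1.0), (0.0, 0.0, 0.5, 0.5)],
    scores=[0.9, 0.3],
    labels=["person", "chair"],
    frame_width=640,
    frame_height=480,
)
print(json.dumps(payload))
```

In a Flask view, a dictionary like this could be returned as the JSON response, leaving the JavaScript on the client to draw each rectangle over the video frame.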

Further down the line
• This application can be used for inventory management using
computer vision. We see segmentation as a way to bring smart
checkouts to convenience stores that may not be as heavy on
infrastructure as Amazon or its competition.
• Better performance could be achieved by pruning the model.

Work Allocation
• We split the work almost equally across all fields.
Priyesh Kaushik          Pranay Mankad
Implementation 50%       Implementation 50%
Model Training           Custom Object Training
Web Interfacing 25%      Web Interfacing 75%
Transfer Learning 75%    Transfer Learning 24%
Documentation 50%        Documentation 50%
Presentation 49%         Presentation 49%
