SMART-SCOOTER RIDER ASSISTANCE SYSTEM USING INTERNET OF
WEARABLE THINGS AND COMPUTER VISION
By
DEVANSH GUPTA
Submitted in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Computer and Data Sciences
CASE WESTERN RESERVE UNIVERSITY
May 2021
CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
We hereby approve the thesis/dissertation of
DEVANSH GUPTA
candidate for the degree of
Master of Science
Committee Chair
Dr. Ming-Chun Huang
Committee Member
Dr. Yanfang (Fanny) Ye
Committee Member
Dr. An Wang
Committee Member
Dr. Yinghui Wu
Date of Defense
April 26, 2021
∗
We also certify that written approval has been obtained
for any proprietary material contained therein.
TABLE OF CONTENTS
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter 1: Introduction, Objective and Contributions . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter 2: BACKGROUND of the Smart Scooter rider assistance system . . . . 8
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Hardware and Software Requirements . . . . . . . . . . . . . . . . . . . . 9
2.3 Wearable Gait Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Android OS and Programming . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Android Studio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6 Smart Scooter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.7 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.8 Multi-Task Cascaded Convolutional Neural Network (MTCNN) . . . . . . 13
2.9 FaceNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.10 MobileNetV2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.11 Support Vector Classification . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.12 Implementation of the Masked face recognition system . . . . . . . . . . . 23
2.13 Multiprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.14 Graphical Processing Units and CUDA . . . . . . . . . . . . . . . . . . . . 26
2.15 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Chapter 3: Related Work for the Smart scooter rider assistance system . . . . . 31
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Chapter 4: Implementation of STEADi application . . . . . . . . . . . . . . . . 35
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Method to calculate Balance of the rider . . . . . . . . . . . . . . . . . . . 36
4.3 Path Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 PotHoles Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Chapter 5: Experiment, Results, and Discussion . . . . . . . . . . . . . . . . . . 46
5.1 Experiment Design for Masked Face authentication . . . . . . . . . . . . . 46
5.2 Results for Masked Face authentication system . . . . . . . . . . . . . . . 47
5.2.1 Results from Test of Image processing and cropping with different
Image Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.2 Results from Test of Masked Face recognition with different Image
Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3 Experiment Design for the STEADi application . . . . . . . . . . . . . . . 50
5.4 Results for the STEADi application . . . . . . . . . . . . . . . . . . . . . . 52
5.4.1 First-time rider of the Smart Scooter . . . . . . . . . . . . . . . . . 52
5.4.2 Riding scooter on different terrains: Up and down the slope and
riding on the grass . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.5 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Chapter 6: Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . 57
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Future Scope and Research . . . . . . . . . . . . . . . . . . . . . . . . . . 58
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
LIST OF TABLES
2.1 The values of the loss weights at each stage of the MTCNN . . . . . . . . . 16
5.1 Time taken to process an image vs the resolution of the image . . . . . . . . 50
5.2 Time taken to recognize face mask vs the resolution of the image . . . . . . 50
LIST OF FIGURES
1.1 What discouraged students and locals from using an e-scooter? [8] . . . . . 3
1.2 Raw Images from the MAFA dataset. . . . . . . . . . . . . . . . . . . . . 5
1.3 Raw image from RMFRD . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Pictures taken automatically using OpenCV for training . . . . . . . . . . . 6
1.5 Extracted pictures from the automatic pictures taken using OpenCV . . . . 6
2.1 Hardware and Software of Smart Insole System. (a) 3D bifurcation of the
Insole System with Assembly structure (b) Insole Foot Pressure measure-
ment for fore, mid, and hind section. The highest pressure in that area
during a stride is indicated by the black dot in each area. The Red dashed
line indicates US-sized insole used in this research. The Graph shows the
measured GRF of each area during a gait cycle. . . . . . . . . . . . . . . . 10
2.2 Basic Android layers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Execution environment of the Android applications. . . . . . . . . . . . . 11
2.4 Network structure of MTCNN that includes three-stage multi-task deep
convolution networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 FaceNet takes an image as input and returns a 128-dimensional vector output . . 17
2.6 Triplets when at Step 1 and Triplets after FaceNet at Step 6. . . . . . . . . . 18
2.7 Depthwise convolution, uses 3 kernels to transform a 12x12x3 image to a
8x8x3 image and, Pointwise convolution, transforms an image of 3 chan-
nels to an image of 1 channel [22] . . . . . . . . . . . . . . . . . . . . . . 19
2.8 The main building block in MobileNetV2 . . . . . . . . . . . . . . . . . . 20
2.9 The compression and decompression inside a building block of the Mo-
bileNetV2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.10 Flowchart of Masked Face Recognition Process using SVC. . . . . . . . . . 24
2.11 Basic structure of the face recognition program. . . . . . . . . . . . . . . . 25
2.12 A multiprocessing optimization of the Masked Face Recognition system. . . 27
2.13 Memory model of CUDA Programming [23] . . . . . . . . . . . . . . . . 28
2.14 Complete overview of blocks using CUDA for face recognition system . . 29
2.15 Proposed system for CUDA based Masked Face recognition. . . . . . . . . 30
4.1 STEADi application flow-chart showcasing the workflow of how the appli-
cation checks for stability . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 Code for the balancing algorithm . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 (a) When the STEADi shows that rider is balanced (b) When the STEADi
shows that rider is not balanced . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Code to ask user for the location access. . . . . . . . . . . . . . . . . . . . 40
4.5 Settings page which gives various options to the user and grants the back-
ground location access. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.6 Code to update the location of the user. . . . . . . . . . . . . . . . . . . . . 42
4.7 STEADi application detecting potholes on the road . . . . . . . . . . . . . 43
4.8 Making sure if OpenCV loads properly. . . . . . . . . . . . . . . . . . . . 44
4.9 Code for accessing the camera. . . . . . . . . . . . . . . . . . . . . . . . . 44
4.10 Working of YOLO framework . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1 (a) Face Mask detection with wearing mask. (b) Face Mask detection with-
out wearing mask. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Precision-Recall curve for the Face mask detection system . . . . . . . . . 48
5.3 (a) Face Mask recognition with wearing a mask. (b) Face Mask recognition
without wearing a mask. . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4 Time taken to process an image vs the resolution of the image . . . . . . . . 49
5.5 Time taken to recognize face mask vs the resolution of the image . . . . . . 51
5.6 Insole sensor data when the rider was riding for the first-time. . . . . . . . . 53
5.7 Insole sensor data when the rider goes up the hill. . . . . . . . . . . . . . . 54
5.8 Insole sensor data when the rider tried every experiment simultaneously. . . 55
ACKNOWLEDGMENTS
I would like to thank everyone who has supported me and encouraged my passion for machine learning and software engineering in general. I would also like to thank my friends for their
support of my work and endless patience throughout the entire process. Most importantly,
I would like to thank my advisor, Dr. Ming-Chun Huang for his great ideas and without
whom this thesis would not be possible. Finally, I would like to thank my committee mem-
bers, Dr. Yanfang (Fanny) Ye, Dr. An Wang and, Dr. Yinghui Wu for reviewing my thesis
and providing me with valuable feedback.
LIST OF ACRONYMS
ADAS Advanced Driver Assistance Systems
AI Artificial Intelligence
AOT Ahead of Time
CUDA Compute Unified Device Architecture
FBI Federal Bureau of Investigation
GPU Graphical Processing Units
ID Identification
IoWT Internet of Wearable Things
MTCNN Multi-Task Cascaded Convolutional Neural Network
NMS Non-Maximum Suppression
NNN Natural Neural Network
OHA Open Handset Alliance
OpenCV Open Source Computer Vision
OS Operating System
PReLU Parametric ReLU
SVC Support Vector Classifier
WGL Wearable Gait Lab
Smart-Scooter Rider Assistance System using Internet of Wearable Things and
Computer Vision
By
Devansh Gupta
Abstract
Intelligent human/computer interaction systems have become an irreplaceable part of a student's life, whether through the extensive use of personal mobility vehicles such as smart scooters and bicycles or through biometric authentication systems used for any kind of authentication. The aim of this thesis is to propose an IoWT and computer vision-based solution that enhances campus safety, focusing on the personal mobility vehicle, the smart scooter, and on a facial recognition system for students, faculty, and workers wearing masks on campus. The thesis presents a one-of-a-kind "Smart-Scooter rider-assistance system," STEADi, which focuses on the safety of personal mobility vehicles such as smart scooters and helps riders while riding the scooter on side roads or sidewalks. The system proposed in this thesis uses a self-training parallel masked face recognition system for authorization and sensors for safety monitoring.
CHAPTER 1
INTRODUCTION, OBJECTIVE AND CONTRIBUTIONS
1.1 Introduction
With the growth in the student population, the issues revolving around the student’s safety
on-campus are becoming a major concern around the world. The safety issues range from
the misuse of personal mobility vehicles to the need for new facial recognition systems
which can process masked images. Although the safety of students around the world has
improved over the past year, these safety issues can be further addressed through the use of Artificial Intelligence (AI) and the Internet of Wearable Things (IoWT). This thesis proposes systems that either improve the current solutions available or solve these issues.
The system proposed in this thesis is a ”Smart-Scooter rider-assistance system,” STEADi.
The primary problem this system attempts to solve focuses on helping the riders riding their
personal mobility vehicles, smart scooters, on side roads or sidewalks. This system uses the Wearable Gait Lab [1], a wearable underfoot force-sensing intelligent unit, as one of its main components. The purpose of this system is to help students who are new to using smart scooters on campus avoid injuries and accidents by alerting the rider about unforeseeable conditions. The system provides adequate data for path tracking, the pothole detection system, and the human balancing ability of Smart Scooter riders. A straightforward machine learning approach, built on the Android application Defectdetect [2], makes it easy to identify potholes and other similar road surface irregularities from the accelerometer and the Wearable Gait Lab system data. Since the spread of the COVID-19 virus,
the use of the more established biometric systems based on fingerprints and passwords is
not safe, and facial recognition systems have struggled during the COVID-19 pandemic. One increasingly common modern-day annoyance is pulling out your smartphone for a hygienic, contact-free payment at a shop and glaring down at an error message, "Face Not Recognized". One of the well-known examples is Apple's Face Identification (Face ID), a technology that uses a grid of infrared dots to calibrate the physical appearance of a user's face. The proposed system also implements a unique authorization technology that improves on present-day facial recognition systems and can
recognize faces wearing masks. The face recognition module is based on three models:
Face detection using Multi-Task Cascaded Convolutional Neural Network (MTCNN) [3],
Face Mask Recognition system using MobileNetV2 [4], and a Support Vector Classifier (SVC)-based masked face recognition system [5]. Two algorithms are used to improve the performance of the system. The first is a multiprocessing system implemented in Python; the second uses PyCUDA, a Python library for CUDA, which gives Pythonic access to Nvidia's CUDA parallel computation API.
The thesis evaluates the proposed system using four balance tests based on different terrains and diverse riding experience levels with the Smart Scooter. The test results showed successful detection of several potholes in and around the Cleveland area, with 64.56% average precision and 69.12% recall on the smartphone. The system was able to alert riders, including less experienced ones, to potential road-related threats and balancing issues while riding on different terrains. The system
requires the rider to place a smart insole under each foot, while connected using Bluetooth
Low Energy (BLE) to a smartphone. The data is collected using the Android application, then normalized and used with the orientation sensor data of the smartphone to calculate a balancing score. The balancing score tells the user whether he/she is balancing the scooter
properly or not [6].
1.2 Objectives
The main objective behind researching the use of computer vision applications is to find a
way to enhance safety around campus. The smart-scooter company Bird conducted a study on scooter safety. The study concluded that scooters involve risks similar to bikes and other small personal mobility vehicles. As per the report, in 2017 bike-related emergency department visits topped 59 visits per 1 million miles cycled. Based on the data gathered solely from Bird scooter riders, 38 injuries per 1 million miles were reported by the company. Out of the 38 injuries suffered by Bird scooter riders, 27 were
around a university or an education campus. According to the book on Campus Attacks published by the Federal Bureau of Investigation (FBI) [7], in 2 out of every 5 attacks on campus the suspect's face was covered and was difficult to recognize from the surveillance cameras. In the past year, coronavirus masks have become a boon for crooks who previously hid their faces using bandanas. These are just a few of the safety issues students face during the pandemic due to the lack of masked face recognition systems. The systems discussed in this thesis provide some novel methods which use computer vision technology and IoWT
to tackle these safety-related problems.
Figure 1.1: What discouraged students and locals from using an e-scooter? [8]
1.3 Contributions
The contributions of this thesis are as follows:
• Data Collection and Processing: The first step in the implementation of the systems is collecting and processing data. The data for the Smart-scooter rider assistance system was collected in two parts.
1. The first dataset is used for assessing the balance of the rider and the second dataset is used for the pothole detection system. To collect data
for the balancing system, we used Wearable Gait Lab (WGL) [1]. The main
component of WGL that we used in this thesis is the Smart Insole System. The
Smart Insole System consists of 96 pressure sensors uniformly distributed on
the pressure sensor array, which ensures a high spatial resolution for plantar
pressure measurement. For details of the design method and mechanism of the
pressure sensor array, please refer to the former research [9]. The insole system
was put in the shoe of the rider, and the pressure sensor data is then collected using an Android-based smartphone device. For pothole detection, we used the Pothole-600 dataset [10]. The data provided was collected using a ZED stereo camera.
2. The data for the Masked face recognition system was also collected in two
parts. The first data is open-datasets available online and the second dataset is
generated using the user’s faces. In this project, the two open datasets we used
were provided by the National Natural Science Foundation of China, Wuhan
University [11]. The two open datasets are as follows:
(a) Masked Face Detection Dataset (MFDD): The dataset contains 24,771 images of people wearing masks. The definition of the face position in the underlying MAFA dataset is quite different from general face datasets: the MAFA face frame is square and starts near the eyebrows, and the labeling frame is not strict (the frame has a gap from the edge of the face), while in normal datasets the face frame extends above the forehead of the person. Figure 1.2 shows raw images from the MFDD dataset.
(a) (b)
Figure 1.2: Raw Images from the MAFA dataset.
Figure 1.3: Raw image from RMFRD
(b) Real-world Masked Face Recognition Dataset (RMFRD): The dataset is a collection of 5,000 pictures of 525 people wearing masks, and 90,000 images of the same 525 people without masks. Figure 1.3 shows a raw image from the RMFRD dataset.
The second dataset is generated by prompting the user to enter his/her name and then automatically taking pictures using the OpenCV library for 1 minute continuously. For the first 30 seconds, the user is advised to take pictures without a mask and for the latter 30 seconds with a mask. The pictures are then cropped using a previously trained Haar Cascade classifier to detect and crop the area around the faces in the picture (a minimal code sketch of this capture-and-crop step is given after this list). Figure 1.4 shows the pictures taken and processed using OpenCV.
(a) (b) (c)
Figure 1.4: Pictures taken automatically using OpenCV for training
(a) (b) (c)
Figure 1.5: Extracted pictures from the automatic pictures taken using OpenCV
• Provide a unique smart-scooter rider assistance system.
• Provide a self-learning real-time Masked Face recognition system.
• Provide a comparison of the performance of the masked face recognition system achieved by the multiprocessing and Compute Unified Device Architecture (CUDA) implementations.
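The capture-and-crop step described in the data collection item above can be summarized in a short Python script. This is a minimal illustration only, assuming a standard webcam, OpenCV's bundled frontal-face Haar cascade, and placeholder file paths; the thesis's actual enrollment code may differ.

import os
import time
import cv2

def capture_user_faces(name, seconds=60, out_dir="dataset"):
    # Detect faces with OpenCV's pre-trained frontal-face Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    os.makedirs(out_dir, exist_ok=True)
    cam = cv2.VideoCapture(0)  # default webcam (assumed)
    start, count = time.time(), 0
    while time.time() - start < seconds:
        ok, frame = cam.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
            # Crop the detected face region and save it for training.
            cv2.imwrite(f"{out_dir}/{name}_{count}.jpg", frame[y:y + h, x:x + w])
            count += 1
    cam.release()

# Per the text: first 30 seconds without a mask, the next 30 seconds with one.
capture_user_faces("user01")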
1.4 Thesis Organisation
The thesis has been organized in the following chapters:
• Chapter 1: Introduction and the research objective behind this study.
• Chapter 2: Presents an overview of the smart-scooter rider assistance system, STEADi.
• Chapter 3: Detailed literature review and related approaches for balancing systems, face recognition systems, and an upcoming rider assistance system.
• Chapter 4: Presents the methodology used and implementation for the proposed
smart-scooter rider assistance system.
• Chapter 5: Provides the experiments conducted and their results to prove the work-
ing of the proposed system. It also focuses on the discussion of the analysis carried
out and its conclusion.
• Chapter 6: Draws the Conclusion from the provided results and presents potential
future work.
CHAPTER 2
BACKGROUND OF THE SMART SCOOTER RIDER ASSISTANCE SYSTEM
2.1 Introduction
The "rider/driver assistance system" is most commonly known as an Advanced Driver Assistance System (ADAS) and was first derived from the field of automobiles. The definition of an ADAS differs based on the criteria used for classification. In the beginning it described any system that supports a driver/rider, for example a remote starter in a car. With the passage of time and everyday technological advancement, these systems became more and more complex and reliable, from anti-lock braking systems to navigation systems, mainly designed to render help to the driver. For an advanced human/computer interaction system like a facial recognition system, the false alarm rate, the accuracy of the system, and the computational cost it incurs to process an image are critical. Since the first commercial facial recognition applications were released, efficiency and speed have become the most crucial aspects of these applications [12].
With a growing number of Coronavirus infections around the world, it has become difficult, especially for students, to use devices like Apple's iPad for everything from taking notes to going through slides. Many universities around the world are going to follow a hybrid teaching structure in which a student can select whether he/she wants to take a class in person or online. With new smartphones relying mainly on face recognition as the primary unlocking system, and with students reaching for their phones every few minutes, there is a need for a facial recognition system that can detect faces covered by masks and unlock the smartphone.
This chapter describes the tools, libraries, and technologies used while developing the Smart Scooter rider assistance system. First, the hardware and software requirements used for developing and testing the system are listed. This is followed by an explanation of the technologies used to develop each mode of the system, with the background of the tools and libraries used.
2.2 Hardware and Software Requirements
The following requirements were chosen as the basis of our system:
• An Android-based smartphone with a built-in accelerometer and gyroscope, and a functional camera that outputs an image with a resolution higher than 1440x1080. We recommend any smartphone with at least a Qualcomm Adreno 420 (600 MHz) GPU and a quad-core CPU (2.5–2.7 GHz, e.g., a Snapdragon 801).
• A Smart Insole system to measure and record the gait parameters.
• A smart scooter to ride on.
2.3 Wearable Gait Lab
Wearable Gait Lab (WGL) is a wearable gait system developed by the SAIL lab at CWRU, which uses a force platform capable of measuring ground-reaction forces when worn under a person's foot. The main component of WGL that we used in this project is the Smart Insole System. The Smart Insole is an important system for realizing "gait analysis". Up to 96
pressure sensors were uniformly distributed on the pressure sensor array, which ensures a
high spatial resolution for plantar pressure measurement. For details of the design method
and mechanism of the pressure sensor array, please refer to the former research [9]. Figure 2.1 (a) shows a circuit board for signal acquisition and wireless data transmission.
A flexible Printed Circuit (FPC) connector is used to connect the pressure sensor array
for pressure signal acquisition. The IMU sensor, including accelerometer and gyroscope,
is used to measure the foot motion. A Micro-controller Unit (MCU) is used to control
the process of signal acquisition and data transmission. The sample rate for sensor data
Figure 2.1: Hardware and Software of Smart Insole System. (a) 3D bifurcation of the Insole
System with Assembly structure (b) Insole Foot Pressure measurement for fore, mid, and
hind section. The highest pressure in that area during a stride is indicated by the black dot
in each area. The Red dashed line indicates US-sized insole used in this research. The
Graph shows the measured GRF of each area during a gait cycle.
acquisition is 30 Hz. A wireless module (classic Bluetooth) is used to transfer the acquired
sensor data to a smartphone app for further processing [1].
Considering that people prefer sensors embedded into their clothing or accessories rather than wearing a separate device [13], all the hardware was packed into an insole-shaped package, which makes using the Smart Insole similar to using normal insoles.
2.4 Android OS and Programming
The Android Operating System (OS) is based on Linux and is a very popular and common computing platform. The first commercial version of the Android system came out in 2008, back when everyone was using a flip phone and BlackBerry was the biggest thing in the mobile phone industry. It first made its appearance in the form of a mobile platform. The Android platform is the work of the Open Handset Alliance (OHA), an organization with a mission to collaborate to "create a better mobile phone", which started
Figure 2.2: Basic Android layers.
from the first Android phone, the G1, manufactured by HTC. Figure 2.2 shows the basic view
of the Android layers.
Historically, Android applications have been written in the Java programming language. The main logic of the application, the bytecode, is generated by compiling the Java source code and is converted into executable code on the device ahead of run-time. This approach is known as Ahead of Time (AOT) compilation. Figure 2.3 shows the execution
environment of the Android applications.
Figure 2.3: Execution environment of the Android applications.
Every Android application is developed and deployed on a device together with a file that is essential for every application: AndroidManifest.xml. From the types of events the application can process to the permissions it requires, this file tells the OS how to interact with the application.
2.5 Android Studio
Android Studio is the easiest way to get started with the development of Android applications. It can be easily downloaded for any operating system (macOS, Windows, or Linux). The studio provides an efficient and user-friendly Java environment for developing and deploying applications, with the ability to test on either a simulator or a real device.
For more information about how to get started and build your own Android applications,
you can visit the official Android studio page [14].
2.6 Smart Scooter
The easiest way to describe a scooter is as a bike without pedals, a seat, or a chain. The momentum required by a common scooter is applied by pushing the ground backward, but a smart scooter uses a battery and a motor to maneuver the scooter after the initial push. The center of pressure of the smart scooter extends between the front and the back, always going along a line, ignoring wind resistance. The main challenge with a scooter is balancing and not falling left or right.
Smart scooters are an up-and-coming mobility service adding to the small but growing category of bike share and car share. With the use of app-based technology and the smart scooter, the service provides a simple yet intelligent way to rent a scooter for the short term. To rent a scooter, the rider first unlocks a smart scooter through the company's smartphone app by scanning the bar code provided on the handle of the scooter or, for those without smartphones, through a call or text service. The rider can end the trip by parking the scooter on the sidewalk. The parked scooter should be close to the curb and out of the pedestrian travel zone. Sometimes riders are required to confirm that they have parked the smart scooter correctly by submitting a photo through the company's app to end their rental. Smart scooters are powered almost exclusively by an electric motor, after an initial push to start the device. By mid-2018 multiple new companies had entered the
market and introduced new Smart scooter models.
2.7 OpenCV
Open Source Computer Vision (OpenCV) is one of the most dependable and widely used open-source libraries for computer vision, and it can be used with a wide variety of programming languages and platforms (C++, Python, Android, Java, Kotlin, etc.). It was first created at Intel by Gary Bradsky in 1999 and released in 2000 [15]. It provides various functions such as reading, writing, and displaying images, object detection, edge detection, etc.
The official Android port of the OpenCV library is OpenCV4Android [16]. Android support first came in a limited "alpha" version in 2010 with OpenCV 2.2 []; NVIDIA later joined the effort and released beta Android support with OpenCV 2.3.1 [17]. OpenCV 2.4 was the first official version that supported Android. Today the most stable and up-to-date way to write OpenCV applications for Android is the OpenCV Java API. In the OpenCV Java API, each function is wrapped in a Java interface, while the OpenCV functions themselves are written and compiled in C++, which introduces some performance overhead. With the development of cross-programming-language compatibility, OpenCV can be called directly through the Java functions while the OpenCV code itself is written in C++. In this method, you first code, develop, and test the OpenCV implementation on the host platform in C++; it is later rebuilt for the Android environment using the Android tools.
2.8 Multi-Task Cascaded Convolutional Neural Network (MTCNN)
One of the most widely used and most accurate face detection systems is MTCNN. This thesis uses this multi-task system, which performs facial feature point detection and face detection simultaneously. The MTCNN framework uses a cascaded structure similar to that of Viola-Jones and is based on a cascade of three CNNs. The network structure of the MTCNN algorithm is shown in Figure 2.4, and it consists of three stages:
1. Proposal Network (P-Net): A CNN used to generate bounding box regression vectors and candidate windows. The candidate windows are calibrated using these estimated bounding boxes, and then highly overlapped boxes are merged using Non-Maximum Suppression (NMS) [18]. NMS eliminates overlapping candidate windows by keeping the window with the highest probability and deleting the windows with smaller probabilities.
2. Refine Network (R-Net): The candidate windows from P-Net are fed to another CNN, called the refine network, which performs calibration and rejects false candidate windows; its output is passed on to the next stage.
3. Output-Network (O-Net): This network is used to find the output frame and the 5
feature points.
One of the main problems of the above-mentioned CNNs is that the kernels lack diversity, which limits the discriminative ability of the CNN. So, 3×3 kernels were used throughout to reduce parameters, with an activation function that uses Parametric ReLU (PReLU) [19]. ReLU was introduced because sigmoid activation functions suffer from the vanishing gradient problem. However, ReLU zeroes out negative inputs, so another activation function was introduced to handle this problem and further improve on ReLU: LeakyReLU, which, instead of zeroing out, multiplies negative inputs by a small number. This change alone did not show a considerable increase in model accuracy. Hence, PReLU was introduced, which uses backpropagation to learn that small value (the slope parameter), adapting it like other parameters such as biases and weights. The number of slope parameters to be learned is equal to the number of layers in a feed-forward network. The formula for PReLU can be written as:

$$f(y_i) = \max(0, y_i) + a_i \min(0, y_i)$$

where:
Figure 2.4: Network structure of MTCNN that includes three-stage multi-task deep convo-
lution networks.
• $f(y_i)$ is a ReLU function, if $a_i = 0$
• $f(y_i)$ is a LeakyReLU function, if $a_i > 0$ is a fixed small value
• $f(y_i)$ is a PReLU function, if $a_i$ is a parameter that can be learned
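As a quick numerical illustration of the formula above (not part of the thesis code), the three activations differ only in how the slope for negative inputs is chosen:

import numpy as np

def prelu(y, a):
    # f(y) = max(0, y) + a * min(0, y)
    return np.maximum(0.0, y) + a * np.minimum(0.0, y)

y = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(y, 0.0))    # ReLU: negatives are zeroed out
print(prelu(y, 0.01))   # LeakyReLU-style fixed small slope
print(prelu(y, 0.25))   # PReLU: the slope 0.25 would be learned via backpropagation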
The network is a multi-task cascaded network, which consists of three tasks assigned across the three stages mentioned above.
1. Face/No-Face Binary Classification: In this task, the network classifies whether the object in question is a face or not. This is handled using logistic regression, where the loss is given by:

$$L_i^{det} = -\left(y_i^{det}\log(p_i) + (1 - y_i^{det})\log(1 - p_i)\right)$$

where the real label is represented by $y_i^{det}$ and $p_i$ is the predicted probability that sample $i$ is a face.
2. Regression of Frames: After finding a face, the second task is to put that face in a bounding box. This is a regression problem, and the Euclidean distance loss is given by:

$$L_i^{box} = \|\hat{y}_i^{box} - y_i^{box}\|_2^2$$

3. Feature Point Calibration: This is also a regression task, which finds and marks facial feature points: the left and right eyes, the left and right corners of the mouth, and the nose. The Euclidean distance loss is given by:

$$L_i^{mark} = \|\hat{y}_i^{mark} - y_i^{mark}\|_2^2$$
The three losses mentioned above are combined using a weighted sum:

$$\min \sum_{i=1}^{N} \; \sum_{j \in \{det,\, box,\, mark\}} \alpha_j \, \beta_i^j \, L_i^j$$

where $\beta_i^j \in \{0, 1\}$ indicates whether loss $j$ is used for sample $i$, and $\alpha_j$ is the weight assigned to each loss. Table 2.1 shows the values of the loss weights at each stage.
Stage    α_det    α_box    α_mark
P-Net    1        0.5      0.5
R-Net    1        0.5      0.5
O-Net    1        0.5      1
Table 2.1: The values of the loss weights at each stage of the MTCNN
In this thesis, we chose to use MTCNN due to the speed, accuracy, and efficiency of the model.
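For reference, running detection with MTCNN takes only a few lines. The snippet below uses the open-source mtcnn Python package as an assumed stand-in for the implementation used in this work; the image path is a placeholder.

import cv2
from mtcnn import MTCNN  # assumed third-party package, not named in the thesis

detector = MTCNN()
image = cv2.cvtColor(cv2.imread("rider.jpg"), cv2.COLOR_BGR2RGB)

# Each detection carries the bounding box (from the frame regression task),
# a face/no-face confidence, and the five feature points produced by O-Net.
for face in detector.detect_faces(image):
    x, y, w, h = face["box"]
    print(face["confidence"], face["keypoints"]["left_eye"])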
2.9 FaceNet
FaceNet is a deep learning approach produced by Google that is used to extract features from an image [20]. It is a one-shot learning model which uses an end-to-end learning method instead of traditional softmax classification methods. In this network, the softmax function is removed and L2 normalization is used when calculating the feature representation loss. FaceNet outputs a vector of 128 numbers which represents the features of an image. These features are used as input by the face recognition system. Figure 2.5 shows that FaceNet takes an image as input and returns a 128-dimensional vector. This vector is known as an embedding, and images of the same face have similar embeddings.
Figure 2.5: FaceNet takes an image as input and returns a 128-dimensional vector output
Since these embeddings live in a 128-dimensional space, they can be imagined on a 2D plane and interpreted in a Cartesian coordinate system. This means the images can be plotted in a coordinate system using their embeddings. FaceNet works by calculating a vector embedding for an unseen image and then calculating the distances between the unseen image and the images in the dataset, i.e., the people known to the network. If the embedding distance is close enough to that of person "XYZ", it can be said that the unseen image is of the person "XYZ". FaceNet works in the following steps:
1. Generate a random vector embedding for each image in the dataset, meaning each image is randomly plotted on a 2D coordinate system.
2. Randomly select a person's image, called the pillar.
3. Randomly select another image of the same person as the pillar, i.e., a positive example.
4. Randomly select another image that is not of the pillar's person, i.e., a negative example.
5. Adjust the FaceNet parameters so that the positive examples are closer to the pillar than the negative examples.
6. Repeat steps 2 to 5 until no further adjustments are needed.
The process defined above is known as the triplet loss, which operates on triplets (pillar, positive, and negative examples). An objective function is then used to optimize the triplets that do not meet the requirement, while all others are simply passed through. Figure 2.6 shows a hypothetical version of Step 1 and Step 6.
Figure 2.6: Triplets when at Step 1 and Triplets after FaceNet at Step 6.
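A minimal NumPy sketch of the two ideas above, assuming 128-dimensional embeddings are already available; the margin and distance threshold are illustrative values, not taken from the thesis.

import numpy as np

def l2(u, v):
    return np.linalg.norm(u - v)

def triplet_loss(pillar, positive, negative, margin=0.2):
    # The pillar should end up closer to the positive than to the negative
    # example by at least `margin`; otherwise the triplet contributes loss.
    return max(0.0, l2(pillar, positive) ** 2 - l2(pillar, negative) ** 2 + margin)

def identify(unknown, gallery, threshold=1.0):
    # gallery: dict mapping a known person's name to their 128-D embedding.
    name, dist = min(((n, l2(unknown, e)) for n, e in gallery.items()),
                     key=lambda item: item[1])
    return name if dist < threshold else None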
2.10 MobileNetV2
MobileNetV2 was developed by Google and is based on an idea similar to that of its predecessor, MobileNetV1. MobileNetV1 [21] finds the convolution layers that are expensive to compute and replaces them with a new layer called the depthwise separable convolution. In normal convolution, one filter is applied to every input channel for every output channel of a layer, whereas a depthwise separable convolution is divided into two parts: a depthwise convolution, which applies K filters, one to each input channel, and a pointwise convolution, which uses a 1x1 convolution to produce the required output. Figure 2.7 shows the working of the depthwise separable convolution. The first layer in MobileNetV1 is a 3x3 convolution followed by 13 depthwise separable convolution layers. These layers are followed by batch normalization with the ReLU6 activation function; this version of ReLU prevents the over-expansion of the activations. This is further followed by an average pooling layer and then a classification layer (a 1x1 convolution) with softmax. MobileNets usually require about 9 times less computation than normal CNNs with the same accuracy.
Figure 2.7: Depthwise convolution, uses 3 kernels to transform a 12x12x3 image to a
8x8x3 image and, Pointwise convolution, transforms an image of 3 channels to an image
of 1 channel [22]
MobileNetV2 is the successor of MobileNetV1 and uses the same depthwise separable convolution idea. In MobileNetV2, the depthwise separable convolution block has three layers instead of the two in V1. The new block looks as in Figure 2.8. The first new layer is a 1x1 convolution layer, whose main purpose is to
Figure 2.8: The main building block in MobileNetV2
Figure 2.9: The compression and decompression inside a building block of the Mo-
bileNetV2
increase the number of channels in the input data; it is defined as the "expansion layer". The last two layers are the same as in V1: a depthwise convolution for filtering the inputs, then a 1x1 pointwise convolution layer. In V2, the pointwise layer performs a different function than it did in V1. In V1, it either doubled the number of channels or kept them the same, but in V2 it decreases the number of channels by projecting a high-dimensional channel into a lower-dimensional tensor; hence, it is known as the "projection layer".
The first layer in MobileNetV2 is a 3x3 convolution layer, instead of the new 1x1 expansion layer, and it is followed by 16 instances of the main building block defined above. These layers are followed by batch normalization with the ReLU6 activation function; this version of ReLU prevents the over-expansion of the activations. This is further followed by an average pooling layer and then a classification layer (a 1x1 convolution) with softmax.
The main idea behind V2 is to replace the expensive convolution layers with cheaper ones. The trick is to decompress and then compress the data as it flows through the main building block, with trainable parameters that can best do this job. Figure 2.9 shows how this decompression and compression works in a building block. The main job of MobileNetV2 [4] in the system proposed in this thesis is to detect whether a person is wearing a mask or not.
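As an illustration of the expansion-depthwise-projection structure described above, one building block can be written in Keras as follows. This is a sketch for clarity only and the parameter choices are assumptions; the thesis uses a pretrained MobileNetV2 [4] rather than assembling the blocks by hand.

import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, expand=6, out_channels=None, stride=1):
    in_channels = int(x.shape[-1])
    out_channels = out_channels or in_channels

    h = layers.Conv2D(expand * in_channels, 1, padding="same")(x)     # expansion layer
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)                                  # ReLU6

    h = layers.DepthwiseConv2D(3, strides=stride, padding="same")(h)   # depthwise filtering
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)

    h = layers.Conv2D(out_channels, 1, padding="same")(h)              # linear projection layer
    h = layers.BatchNormalization()(h)

    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])                                       # residual connection
    return h

inputs = tf.keras.Input(shape=(96, 96, 24))
outputs = inverted_residual(inputs, out_channels=24)                   # example usage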
2.11 Support Vector Classification
A face recognition system can be framed as a multi-class classification problem. The process involves defining several classes and finding which class the corresponding subject falls into. A basic face recognition system takes an input face image and extracts the facial features, which are then compared to the features of labeled facial data in a dataset. A feature similarity metric is used for comparing the labels, and the most analogous dataset entry is used to label the input face image.
A one-against-all strategy is used to train a multi-class Support Vector Machine (SVM) with k SVM classifiers, where k is the number of subjects to be recognized. Multi-class SVM classification is known as Support Vector Classification (SVC) [5]. Given a kernel function,
$$K(a, b) = \phi(a) \cdot \phi(b),$$

the SVC can be used to find the optimal approximation by maximizing the margin of the separating function $f(x)$:

$$f(x) = \sum_{i=1}^{n} y_i \alpha_i K(x, x_i) + b$$
Assume k is the number of faces to be recognized, n is the number of training faces, and $n_i$ is the number of training faces of person $i$:

$$n = \sum_{i=1}^{k} n_i$$

The k SVM classifiers $a_1, \ldots, a_k$ can be built, where for each classifier $a_j$, $j = 1, 2, \ldots, k$, the $n_j$ examples of person $j$ are positive and all other $\sum_{i=1, i \neq j}^{k} n_i$ examples are negative. After training the separating function $f(x)$, we obtain the k classifiers:

$$a_j = \sum_{i=1}^{n} y_i \alpha_{ji} K(x_i, x) + b_j$$

Each of the k classifiers generates its own output $c_i$. Assuming the test image belongs to one of the classes to be recognized, the recognition returns:

$$id = \arg\max_{i=1}^{k} c_i$$
where id is the identification number of the recognized face.
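The one-against-all training and the argmax decision described above map directly onto scikit-learn. The sketch below is an assumed illustration built on top of precomputed FaceNet embeddings; the random data stands in for real embeddings and person IDs.

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Stand-ins for real 128-D FaceNet embeddings and integer person IDs.
X = np.random.rand(200, 128)
y = np.random.randint(0, 5, size=200)

# One-against-all: one SVC per enrolled person, as described in the text.
clf = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

def recognize(embedding):
    # id = argmax over the k per-person classifier scores c_i.
    scores = clf.decision_function(embedding.reshape(1, -1))[0]
    return clf.classes_[int(np.argmax(scores))]

print(recognize(X[0]))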
2.12 Implementation of the Masked face recognition system
Figure 2.10 shows our proposed system for Masked Face Recognition (MFR). For MFR, the system first asks for the name of the user and then automatically takes pictures of the user for 2 minutes. The system asks the user to sit still and prompts the user to wear a mask after 1 minute. The system then uses the face mask detection algorithm, which is based on MTCNN (an object detection algorithm), to detect whether the images of the user show a mask or not. If images are without masks, the system creates a new image by superimposing a face mask on the image. During the training phase, the multi-class classifier was first trained on the MFDD and RMFRD datasets and then on the user's face dataset. Recognition is done by comparing the selected features from the target image against the selected features from the corresponding template images in the user face dataset.
2.13 Multiprocessing
Neural networks can be used efficiently to solve problems where the difficulty of mathematical modeling increases. With the increase in network size, there is an exponential increase in the time required to evaluate or train them. Many efforts have been made to reduce the training time of neural networks. One solution for reducing the training or evaluation time is to use the parallelization capacities of multicore/multithreaded CPUs or of graphics processing units (GPUs). Two software environments, CUDA and multiprocessing, have established themselves as quasi-standards for GPU and multi-core systems
Figure 2.10: Flowchart of Masked Face Recognition Process using SVC.
Figure 2.11: Basic structure of the face recognition program.
respectively.
Multiprocessing loosely refers to using more than one CPU unit in a computer system. The face recognition program runs as shown in the flow chart in Figure 2.11. In our system, each process is tasked with finding a potential candidate output. The system is divided into two main modes. First, if the system is used to recognize masked faces in a picture with multiple faces, then each detected face is assigned to a process that returns the ID number of the person detected. The maximum number of processes created depends on the number of cores of the CPU. If there are 4 cores in a CPU and each process is single-threaded, then the number of processes that can be spawned is 4. If the total number of faces detected in a photo is 6, then on a 4-core processor each process will handle 1 face and the remaining 2 faces will be queued until one of the processes completes. If there is only one person to be detected, then only 1 process is created and the time taken by this method to recognize is the same as without multiprocessing. The corresponding
version after implementing multiprocessing is shown in Figure 2.12.
Second, if the system is used for real-time facial recognition or on a video, each process handles (number of frames) / (number of processes) frames per second. If 4 processes are used and the total number of frames in the video each second is 40, then the number of frames to be processed
by each process is 10 frames, which are executed in parallel.
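The per-face split described above is essentially a process pool. The sketch below is illustrative only; recognize_face is a placeholder for the full per-face pipeline (detection crop, embedding, SVC) rather than the thesis's actual function.

from multiprocessing import Pool, cpu_count

def recognize_face(face_crop):
    # Placeholder for the per-face pipeline; returns the recognized person ID.
    return 0

def recognize_all(face_crops):
    # One worker per CPU core; with 4 cores and 6 faces, 4 run immediately
    # and the remaining 2 wait in the pool's queue, as described in the text.
    with Pool(processes=cpu_count()) as pool:
        return pool.map(recognize_face, face_crops)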
2.14 Graphical Processing Units and CUDA
Initially, GPUs were only used for their graphics display capabilities. At that time, a program would have to be converted to use either OpenGL or Direct3D, which made it difficult to use a GPU for general-purpose calculations. Nvidia released CUDA as a potential alternative use for Nvidia's devices [23]. It is essentially a C++ programming language extension for programming the graphics processing units provided by Nvidia [24]. It provides preprogrammed operations for massively parallel workloads through several libraries, along with direct control over the graphics processing unit (GPU).
Figure 2.12: A multiprocessing optimization of the Masked Face Recognition system.
Figure 2.13: Memory model of CUDA Programming [23]
The CUDA implementation, even though simpler in design than the multiprocessing version, was significantly more complex to implement. One of the main considerations when running code on a GPU is that data stored in system RAM cannot be accessed directly by the GPU, so all data must be transferred to one of the GPU memory types before a kernel is launched.
Figure 2.13 shows the memory model of CUDA, with a parallel computation block called a grid that is launched from the host, where the CPU is the host and the GPU is the device. Each grid is three-dimensional, with x, y, z coordinates, and can be launched by a device kernel (kernel 1 or kernel 2). Each grid contains blocks of threads in three dimensions, referred to by a thread index and a block index, which together are unique for each thread. Each thread processes its own face to find whether that face matches the dataset (as shown in Figure 2.14). Once the face is found, the output is returned and another face is
Figure 2.14: Complete overview of blocks using CUDA for face recognition system
processed for recognition instantly. The proposed system is shown in Figure 2.15.
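As a concrete illustration of the host/device split (data copied into GPU memory, one thread per face), the sketch below uses PyCUDA to compute squared embedding distances against a gallery of enrolled faces. It is an assumed example rather than the thesis's actual kernel; the sizes and names are placeholders.

import numpy as np
import pycuda.autoinit                      # creates a CUDA context on the device
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

module = SourceModule("""
__global__ void sq_dist(const float *gallery, const float *probe,
                        float *out, int dim, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per gallery face
    if (i >= n) return;
    float acc = 0.0f;
    for (int d = 0; d < dim; ++d) {
        float diff = gallery[i * dim + d] - probe[d];
        acc += diff * diff;
    }
    out[i] = acc;
}
""")
sq_dist = module.get_function("sq_dist")

n, dim = 1024, 128                                    # gallery size, embedding size
gallery = gpuarray.to_gpu(np.random.rand(n, dim).astype(np.float32))  # host -> device copy
probe = gpuarray.to_gpu(np.random.rand(dim).astype(np.float32))
out = gpuarray.empty(n, dtype=np.float32)

sq_dist(gallery.gpudata, probe.gpudata, out.gpudata, np.int32(dim), np.int32(n),
        block=(256, 1, 1), grid=((n + 255) // 256, 1))
best_match = int(np.argmin(out.get()))                # closest enrolled face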
2.15 Conclusion
The main aim of this chapter is to familiarize the reader with the tools and technologies used for this system. We first looked at the hardware and software requirements for the system. The chapter then explained the use of, and gave a brief description of, the WGL and the various models behind the implementation of the masked face recognition system. This helped us better understand the technologies and tools currently available to develop solutions for the issues in question.
Figure 2.15: Proposed system for CUDA based Masked Face recognition.
CHAPTER 3
RELATED WORK FOR THE SMART SCOOTER RIDER ASSISTANCE SYSTEM
3.1 Introduction
Progress in technology has given rise to new techniques and devices that allow an unbiased evaluation of balance parameters and to many advanced human/computer interaction systems, providing researchers with reliable information on a rider's ability to balance a personal mobility vehicle as well as with various authorization methods. This
chapter provides an in-depth review of some of the related works.
3.2 Related Work
One of the newer techniques to measure balance and pathology parameters is the use of ultrasonic senders and receivers. Ultrasonic sensors have been used to measure short step and stride length and the distance between feet [wahab2011Gait]. Even though this technique can be used widely to analyze the balance of a walking person, it cannot be used to evaluate the balance of a person riding a smart scooter because of the fixed positions of the rider's feet on the smart scooter.
Stephen M. Cain, James A. Ashton-Miller, and Noel C. Perkins studied the physics behind bike balancing. They conducted indoor experiments that utilized training rollers mounted on a force platform (OR6-5-2000, AMTI), an instrumented bicycle [25], and a motion capture system (Optotrak 3020, NDI) to measure the balancing dynamics of a bicycle. The bicycle [25] was equipped with sensors that measured steer torque, steer angle, bicycle frame roll rate, and speed. Their experiments showed that the best way to compute the balance of the rider is through the interrelation between pressure distribution and center of mass. Hence, we used the smart insole to calculate the pressure distribution of the smart scooter rider.
The balancing of the smart scooter is not the only problem faced by riders. Unknown road conditions like potholes, sudden changes in terrain, etc. are also major problems. There are various pothole sensing systems. One of the best known is the Pothole Patrol system [26], developed by the Massachusetts Institute of Technology. The system uses GPS-attached, Linux-powered Soekris 4801 embedded computers with external accelerometers (sampling rate of 380 Hz). The algorithm uses a simple machine learning approach that takes the vehicle speed and the X- and Z-axis acceleration into consideration to filter pothole events from non-pothole-related events like a railroad crossing.
National Taiwan University [27] developed a system that uses an HTC Diamond smartphone mounted on a motorcycle as the hardware platform. The smartphone comes with an external GPS and a built-in accelerometer with a sampling rate < 24 Hz. The system uses supervised and unsupervised machine learning approaches for pothole detection and is divided into two task systems: the server-side task uses a smooth-road model and an SVM (support vector machine), whereas the client side performs feature extraction, filtering, and segmentation.
Although the above-proposed systems for pothole detection show good performance, they are not suitable for implementation on a device with limited storage, processing power, and software resources. Our system for measuring the balance of the rider as well as detecting potholes is distinct from the prior work in that:
1. It only concentrates on the potholes as a single event, hence better utilization of the
hardware and software resources.
2. The balancing algorithm uses both the pressure distribution as well as the orientation
of the smart scooter using the smartphone device to examine the balance of the rider.
Shivashankar J. Bhutekar et al. proposed a modern GPU-based facial recognition system [28]. The researchers implemented the algorithm around the Viola-Jones framework [29] with AdaBoost for face detection and Eigenfaces [30] for face recognition. The paper showed that if a video frame contains more than one person's face, the CPU takes more time than the GPU to compute and recognize the faces: the CPU waits for all the faces in a frame to be detected, while in the parallel processing system a face is forwarded to the recognition system as soon as it is detected.
Ting He et al. [31] trained MLPs using basic GPU capabilities and CUDA functions, performing the vector operations and matrix multiplications on the GPU, which accelerated performance by 5.31 times compared to the CPU. Antonino Tumeo et al. researched a reconfigurable multiprocessing system for facial recognition which showed that the parallelized face recognition application is 63% faster than a single-processor solution.
An Israeli company, Corsight, developed a camera-based application to identify people
in real-time. The application mainly focuses on identifying people when half their face is
covered, even in low light conditions [32].
Due to the absence of large training datasets as well as ground-truth testing datasets, it has been difficult to train new Masked Face Recognition (MFR) models. Mengyue Geng et al. contributed the MFRD dataset, which consists of 9,742 masked face images annotated with mask region segmentation. A model is trained on the dataset using a center-based cross-domain ranking strategy with a Domain Constrained Ranking (DCR) loss [33]. Another method, proposed by Walid Hariri, discards the masked region and focuses on extracting features from the region above the mask (mostly the eyes and forehead) using VGG-16 [34]. A multilayer perceptron is then trained on these features to achieve 91.3% accuracy.
Some of the methods work on restoration of the faces. A system is proposed by Bagchi et al. [35] to restore facial features: thresholded depth map values are used to detect the missing regions of the 3D images, and Principal Component Analysis (PCA) is used for the restoration of the facial features. Similarly, a statistical shape model that can restore partial facial curves is applied by Drira et al. [36]. The missing regions were removed using an Iterative Closest Point (ICP) algorithm [37].
The smart scooter company Spin recently announced its advanced driver-assistance system (ADAS), which will be tested in the second half of 2021 in New York City. Spin Insight [38] has two levels, Level 1 and Level 2. Spin Insight Level 2 is the main ADAS and is powered by Drover AI's computer vision and machine learning platform. In addition, Spin's scooters will be equipped with a camera, on-board computing power, and sensors to detect sidewalk and bike lane riding and to issue collision alerts. The smart-scooter assistance system proposed in this thesis is very different from Spin Insight: it can be used while riding any smart scooter, regardless of the brand of the scooter. Both systems focus on the proper riding of smart scooters, but the smart-scooter assistance system provides benefits like rider balance checks, pothole detection, and path tracking, whereas Spin Insight provides collision alerts and parking alerts.
3.3 Conclusion
This chapter was crucial to show that there are no similar systems currently in use. For bigger personal transport vehicles like cars, rider-assistance systems are available. These systems include technologies such as, but not limited to, pothole detection while driving, navigation systems, and even lane departure warning (LDW), an advanced safety technology that alerts drivers when they unknowingly stray out of their lane. But no such system is currently in use on the roads for personal mobility vehicles like smart scooters, bikes, and skateboards. We also looked at an upcoming similar system by the smart scooter company Spin. In Chapter 4, we will look closely at the implementation of the system and the logic behind the balancing algorithm.
CHAPTER 4
IMPLEMENTATION OF STEADI APPLICATION
4.1 Introduction
The smart scooter rider assistance system is divided into three parts: the masked face authentication system, the Android application STEADi, and the WGL. STEADi is an Android application used with the Smart Insole system to assess student safety conditions on campus while riding a smart scooter. Previously, researchers have attempted to use smartphones to measure gait symmetry; however, STEADi expands upon this use case by incorporating biofeedback training and novel assessment strategies that can be used in the context of smart scooters and road safety.
The app is divided into three sections in the main window:
1. Balancing mode: the application screen turns shades of red or green based on the orientation and the calculated balance of the rider.
2. Tracking mode: lets the rider track his/her/their path while riding the scooter. The module can be used to pin a location on the map or to track the path for future records.
3. Pothole detection mode: detects potholes on the road and alerts the user beforehand by vibrating the phone and making a sound loud enough for the rider to hear but not loud enough to disturb others.
With the balancing mode, the user sees a screen that turns shades of red based on the error calculated by the balancing algorithm. In the tracking section, the user can see the path he/she has traveled while using the scooter, and in the pothole detection mode (PotDetect), the user can detect whether there is a pothole on the road; the app automatically tags the place where PotDetect detected the pothole for future reference. Figure 4.1 shows a flow-chart of how to use the STEADi app.
Figure 4.1: STEADi application flow-chart showcasing the workflow of how the application checks for stability
4.2 Method to calculate Balance of the rider
The Smart Insole data is recorded, normalized, and used together with the orientation
sensor data of the Android smartphone to calculate a balancing score.
1. The cumulative readings of the 96 pressure-point sensors are summed and averaged:

$$P_{Avg}(t) = \frac{1}{n}\sum_{i=1}^{n} p_i$$

where $n$ is the number of pressure-point sensors, $p_i$ is the recorded pressure of a
particular point sensor, and $P_{Avg}(t)$ is the average of the recorded pressures of all
the sensors at time $t$.
2. Every 5 seconds, a comparison is made between the cumulative pressure-point
averages PAvg(t − 5) and PAvg(t).
3. If the average pressure on the insole has increased or decreased over the next 5 seconds,
a boolean balance variable, "ifbal", is recorded. "ifbal" is TRUE if the pressure
remains the same across the 5-second window, and it is set to FALSE if there is a
change in the overall pressure. If PAvg(t − 5) > PAvg(t), the average pressure
decreased, which means the pressure on the insole is not evenly distributed.
To account for wavy rides or riders doing tricks, we also took the roll (the rotation of the
smartphone about the positive Z-axis towards the positive X-axis) and the azimuth (the angle
between magnetic north and the positive Y-axis, with a range of [0, 360] degrees) into
consideration. The imbalance range of the roll is [−∞, −1.0] ∪ [1.0, ∞] and the imbalance
range of the azimuth is [−∞, −0.75] ∪ [0.75, ∞].
If the "ifbal" variable changes value every 5 seconds and the recorded values of the
roll and azimuth are in the imbalance range, then we can say that the rider is not balancing
properly, and the "Balancing" screen section will change to RED. Figure 4.2 shows the
code behind the balancing algorithm; a simplified sketch of the same check is given after the figure.
Figure 4.3 shows the screens of the STEADi application when the rider is balanced
and when the rider is unbalanced.
Figure 4.2: Code for the balancing algorithm
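Because Figure 4.2 is reproduced only as an image, the following is a minimal Java sketch of the balance check described above. The class, method, and variable names (BalanceChecker, isImbalanced, azimuthDelta) and the pressure tolerance are illustrative assumptions, not the application's actual source.

```java
// Hypothetical sketch of the balancing check from Section 4.2 (not STEADi's actual code).
public class BalanceChecker {

    private static final double ROLL_LIMIT = 1.0;        // |roll| >= 1.0 is in the imbalance range
    private static final double AZIMUTH_LIMIT = 0.75;    // |azimuth change| >= 0.75 is in the imbalance range
    private static final double PRESSURE_EPSILON = 1e-3; // assumed tolerance for "same" pressure

    private double previousAvgPressure = Double.NaN;

    /** Average pressure of the 96 insole sensors at time t, i.e. PAvg(t). */
    public static double averagePressure(double[] pressures) {
        double sum = 0.0;
        for (double p : pressures) {
            sum += p;
        }
        return sum / pressures.length;
    }

    /**
     * Called once every 5 seconds with the latest readings.
     * Returns true when the rider should be flagged as imbalanced (screen turns RED).
     */
    public boolean isImbalanced(double[] pressures, double roll, double azimuthDelta) {
        double currentAvg = averagePressure(pressures);

        // "ifbal" stays TRUE while the overall pressure is unchanged between 5-second windows.
        boolean ifbal = Double.isNaN(previousAvgPressure)
                || Math.abs(previousAvgPressure - currentAvg) < PRESSURE_EPSILON;
        previousAvgPressure = currentAvg;

        // Orientation is "off" when the roll or azimuth falls into the imbalance ranges above.
        boolean orientationOff = Math.abs(roll) >= ROLL_LIMIT
                || Math.abs(azimuthDelta) >= AZIMUTH_LIMIT;

        // The rider is reported as imbalanced only when the pressure distribution changed
        // AND the phone orientation is in the imbalance range, as described in the text.
        return !ifbal && orientationOff;
    }
}
```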
Figure 4.3: (a) When the STEADi shows that rider is balanced (b) When the STEADi
shows that rider is not balanced
4.3 Path Tracking
The path tracking mode allows users to track their path while riding the smart scooter.
In this section we use the Google Maps API for Android. The API handles access to the
Maps server, map display, data downloading, and responses to map gestures. It also
provides additional information for map locations and allows user interaction with the
map. For more information related to the Maps SDK, please refer to [39]. For each session,
the system automatically plots the route the user has traveled. This feature requires the
following development modules:
1. Requesting the proper permissions: Due to privacy concerns, location-based apps are
required to request location permission. For our application we need to request the
background and foreground location permissions. Figure 4.4 shows the code to add
to the AndroidManifest.xml file to ask the user for location access, and Figure 4.5
shows the settings page that gives various options to the user and grants background
location access.

Figure 4.4: Code to ask user for the location access.
Figure 4.5: Settings page which gives various options to the user and grants the background
location access.
2. Requesting regular location updates: For the path tracking module it is necessary
to get regular updates about the user's location. The API returns the most accurate
geographical location of the user based on the longitude, latitude, velocity, and altitude
of the device. Figure 4.6 shows the three steps in updating the location: startLoca-
tionUpdates() to start receiving location updates, stopLocationUpdates() to stop the
updates when the app is closed, and updateTrack() to update the track on Google
Maps in the path tracking module. A combined sketch of both modules follows Figure 4.6.
Figure 4.6: Code to update the location of the user.
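Since Figures 4.4 through 4.6 are reproduced as images, the sketch below combines the two modules above in plain Java. It assumes the Google Play Services location library (FusedLocationProviderClient); the class name, the 5-second update interval, and the helper methods other than startLocationUpdates()/stopLocationUpdates()/updateTrack() are illustrative assumptions rather than STEADi's actual code.

```java
import android.Manifest;
import android.app.Activity;
import android.content.pm.PackageManager;
import android.os.Looper;
import androidx.core.app.ActivityCompat;
import com.google.android.gms.location.FusedLocationProviderClient;
import com.google.android.gms.location.LocationCallback;
import com.google.android.gms.location.LocationRequest;
import com.google.android.gms.location.LocationResult;
import com.google.android.gms.location.LocationServices;

// Hypothetical sketch of the path-tracking plumbing described above (not STEADi's actual code).
public class PathTracker {
    private static final int REQUEST_CODE_LOCATION = 1001; // arbitrary request code

    private final Activity activity;
    private final FusedLocationProviderClient locationClient;

    private final LocationCallback callback = new LocationCallback() {
        @Override
        public void onLocationResult(LocationResult result) {
            if (result.getLastLocation() != null) {
                // Append the new point to the path drawn on the map.
                updateTrack(result.getLastLocation().getLatitude(),
                            result.getLastLocation().getLongitude());
            }
        }
    };

    public PathTracker(Activity activity) {
        this.activity = activity;
        this.locationClient = LocationServices.getFusedLocationProviderClient(activity);
    }

    /** Requests foreground and background location permissions at runtime (cf. Figures 4.4 and 4.5). */
    public void requestPermissionsIfNeeded() {
        if (ActivityCompat.checkSelfPermission(activity, Manifest.permission.ACCESS_FINE_LOCATION)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(activity,
                    new String[] { Manifest.permission.ACCESS_FINE_LOCATION,
                                   Manifest.permission.ACCESS_BACKGROUND_LOCATION },
                    REQUEST_CODE_LOCATION);
        }
    }

    /** Starts regular location updates (every 5 seconds here, an assumed interval). */
    public void startLocationUpdates() {
        // Assumes the permissions above have already been granted.
        LocationRequest request = LocationRequest.create()
                .setInterval(5000)
                .setPriority(LocationRequest.PRIORITY_HIGH_ACCURACY);
        locationClient.requestLocationUpdates(request, callback, Looper.getMainLooper());
    }

    /** Stops the updates when the app is closed. */
    public void stopLocationUpdates() {
        locationClient.removeLocationUpdates(callback);
    }

    private void updateTrack(double latitude, double longitude) {
        // Placeholder: the real module adds the point to a Google Maps polyline.
    }
}
```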
4.4 PotHoles Detection
For this feature, we utilized a previously implemented OpenCV library on Android for
loading a deep neural network model that detects potholes, DefectDetect [2]. The model
uses the YOLO framework, which is trained to classify potholes from the live camera feed
of the Android device. The model returns four vertices that outline each pothole; we further
decorate the user interface and render them as a more interpretable rectangular box. The
structure and some weights of the model we use come from [40] [41]. The working of the
pothole detection system in the STEADi application can be seen in Figure 4.7.
To maximize its performance on this specific task, we tweaked the model and re-trained
the final two layers with well-labeled pothole images for 20 epochs. All training images
come from a crawler program provided in OpenCV. During the training process, we use
the mean squared error as the loss function:
$$L(y, y') = \frac{1}{N}\sum_{i=0}^{N}\left(y_i - y'_i\right)^2$$

where $L(y, y')$ is the mean squared loss and $y'$ is the predicted value.
Figure 4.7: STEADi application detecting potholes on the road
The following steps were taken to use OpenCV:
1. Making sure that OpenCV can load successfully. Figure 4.8 shows the code that
uses a BaseLoaderCallback() to test whether OpenCV is loaded correctly.
2. Getting the permission to access the camera on the Android device and enabling the
camera: Similar to requesting location access for the maps module, we need to access
the camera to use OpenCV and YOLO to detect potholes. If the user grants the camera
permission, each frame is sent to the DarkNet function to detect and classify the
potholes. The detection module automatically makes the phone vibrate if a pothole is
detected by the front camera. Figure 4.9 shows the code for accessing the camera,
and a minimal sketch of both steps follows the figure captions below.

Figure 4.8: Making sure if OpenCV loads properly.

Figure 4.9: Code for accessing the camera.
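The two steps above can be summarized with a minimal, hypothetical Java sketch based on the OpenCV Android SDK; the class name is an assumption for illustration, the runtime camera-permission request (analogous to the location-permission flow) is omitted for brevity, and the actual STEADi code is the one shown in Figures 4.8 and 4.9.

```java
import android.content.Context;
import org.opencv.android.BaseLoaderCallback;
import org.opencv.android.CameraBridgeViewBase;
import org.opencv.android.LoaderCallbackInterface;
import org.opencv.android.OpenCVLoader;

// Hypothetical sketch of steps 1 and 2 above (the real code appears in Figures 4.8 and 4.9).
public class OpenCvStartup {

    private final CameraBridgeViewBase cameraView;
    private final BaseLoaderCallback loaderCallback;

    public OpenCvStartup(Context context, CameraBridgeViewBase cameraView) {
        this.cameraView = cameraView;
        // Step 1: the BaseLoaderCallback is notified once the OpenCV native library is ready.
        this.loaderCallback = new BaseLoaderCallback(context) {
            @Override
            public void onManagerConnected(int status) {
                if (status == LoaderCallbackInterface.SUCCESS) {
                    // Step 2: enable the camera preview only after OpenCV has loaded;
                    // each delivered frame can then be passed to the pothole detector.
                    OpenCvStartup.this.cameraView.enableView();
                } else {
                    super.onManagerConnected(status);
                }
            }
        };
    }

    /** Typically called from onResume(): loads the statically linked OpenCV library. */
    public void loadOpenCv() {
        if (OpenCVLoader.initDebug()) {
            loaderCallback.onManagerConnected(LoaderCallbackInterface.SUCCESS);
        } else {
            // If the library is not bundled with the APK, OpenCVLoader.initAsync(...) could be
            // used here to load it through the OpenCV Manager application instead.
        }
    }
}
```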
Working of YOLO for potholes detection
We used the YOLO framework for pothole detection. One of the main reasons for choosing
this framework is its speed and efficiency: it can process up to 45 frames per second. It works
by taking the whole image as input, predicting the bounding-box coordinates that enclose
an object (in our case, potholes), and outputting class probabilities. The YOLO framework
is based on Darknet, an open-source neural network framework written in C and CUDA;
a brief sketch of how such a model is invoked follows Figure 4.10.
Figure 4.10: Working of YOLO framework
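As a concrete illustration of how a YOLO/Darknet model can be run on a camera frame with OpenCV's dnn module, the following hedged Java sketch loads the network and produces its raw output blobs. The file paths, the 416x416 input size, and the class name are assumptions, and decoding the outputs into the on-screen rectangles described earlier is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.dnn.Dnn;
import org.opencv.dnn.Net;

// Hypothetical sketch of running a YOLO/Darknet model on one frame with OpenCV's dnn module.
public class PotholeDetector {

    private final Net net;

    public PotholeDetector(String cfgPath, String weightsPath) {
        // Loads the Darknet network definition (.cfg) and the trained weights.
        this.net = Dnn.readNetFromDarknet(cfgPath, weightsPath);
    }

    /** Runs a forward pass on one BGR camera frame and returns the raw YOLO output blobs. */
    public List<Mat> detect(Mat frame) {
        // YOLO expects a normalized, fixed-size input blob; 416x416 is a common choice.
        Mat blob = Dnn.blobFromImage(frame, 1.0 / 255.0, new Size(416, 416),
                new Scalar(0, 0, 0), /* swapRB= */ true, /* crop= */ false);
        net.setInput(blob);

        List<Mat> outputs = new ArrayList<>();
        net.forward(outputs, net.getUnconnectedOutLayersNames());
        // Each output row holds box coordinates and class scores; thresholding and
        // non-maximum suppression would turn them into the displayed rectangles.
        return outputs;
    }
}
```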
4.5 Conclusion
This chapter focuses on the implementation of each section of the proposed Android
application, STEADi. It first showcases the workflow of the whole system and then explains
the working of the balancing algorithm and the pothole detection system. The chapter also
briefly discusses the permissions and the code behind each module. The next chapter
focuses on the experiments and results of the "smart-scooter rider assistance" system.
CHAPTER 5
EXPERIMENT, RESULTS, AND DISCUSSION
In this chapter, we discuss in detail the various experiments designed and their results,
followed by a discussion of the contributions of the system. The experiments were
performed in two parts: the first part focuses on the masked face authentication system,
and the second part focuses on the STEADi application.
5.1 Experiment Design for Masked Face authentication
The parallelization was compared at two stages of the masked face authentication system
development:
1. Image processing and cropping using OpenCV.
2. The whole masked face recognition pipeline.
Each test was performed using images at four different resolutions. All the tests were
performed using the following specifications:
• Intel Core i7 5th-generation processor with 4 cores
• NVIDIA GTX 1660 Ti with 1536 cores and 6 GB memory
• OpenCV version 4.5.0
• CUDA toolkit version 10.2
• Python 3.5
Figure 5.1: (a) Face Mask detection with wearing mask. (b) Face Mask detection without
wearing mask.
5.2 Results for Masked Face authentication system
Figure 5.1 shows the output of the face mask detection algorithm on the webcam feed.
The model has an input size of 480x640, 13 convolution layers in the backbone network,
and 24 layers in total, including the location and classification layers. Figure 5.2 shows
the precision-recall graph for the face mask detection system.
The accuracy of the masked face recognition system when wearing a mask is 79.9%,
depending on conditions such as lighting and the angle of the face, whereas the accuracy
of the system without a mask is about 92.34%. Figure 5.3 shows the masked face
recognition using SVC.
5.2.1 Results from Test of Image processing and cropping with different Image Resolution
As shown in Table 5.1, the resolution has a vital impact on the time taken to process an
image for every version. The multiprocessing version scaled as expected: with each
doubling of the number of cores, the run time decreased by roughly fifty percent, apart
from a small overhead caused by the increase in the number of cores.

Figure 5.2: Precision-Recall curve for the Face mask detection system

Figure 5.3: (a) Face Mask recognition with wearing a mask. (b) Face Mask recognition
without wearing a mask.

Figure 5.4: Time taken to process an image vs the resolution of the image
Converting the measured times into frames per second gives about 11.2 frames per second
for the original implementation at an image resolution of 480x640, which is the same
resolution as a typical security camera. The 4-core multiprocessed version achieves a
throughput of around 30.8 frames per second, almost 2.75 times faster than the original
version, whereas the CUDA version reaches about 110 frames per second for the 480x640
resolution, around 10 times faster than the original version. The run time appears to
increase linearly with the number of pixels, as can be seen in Figure 5.4.
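For clarity, the frames-per-second figures quoted above are obtained as the reciprocal of the measured per-frame processing time:

$$\mathrm{FPS} = \frac{1}{t_{\text{per frame}}}$$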
5.2.2 Results from Test of Masked Face recognition with different Image Resolution
As can be seen from Table 5.2, for the face recognition system the run times for different
image resolutions did not differ by much between the two multiprocessing versions.
Image Resolution (Pixels) | CPU Time | Multiprocessing Time with 2 cores | Multiprocessing Time with 4 cores | CUDA Time
480 x 640   | 8.9   | 5.1   | 3.24 | 0.9
960 x 1280  | 17.34 | 9.3   | 4.9  | 1.25
1280 x 720  | 23.56 | 12.12 | 7.18 | 1.9
2560 x 1440 | 58.98 | 14.78 | 8.12 | 3.28

Table 5.1: Time taken to process an image vs the resolution of the image
Image Resolution (Pixels) | CPU Time | Multiprocessing Time with 2 cores | Multiprocessing Time with 4 cores | CUDA Time
401 x 218   | 0.34  | 0.07  | 0.04 | 0.00123
960 x 640   | 0.83  | 0.098 | 0.06 | 0.004
1280 x 720  | 1.104 | 0.34  | 0.19 | 0.01
2560 x 1440 | 1.35  | 0.78  | 0.44 | 0.05

Table 5.2: Time taken to recognize face mask vs the resolution of the image
The fastest version is CUDA, and it maintained its lead throughout the testing, especially
on the high-resolution images, making the CUDA version the better choice for recognizing
faces or masks in a high-resolution image.
5.3 Experiment Design for the STEADi application
In this thesis, SPIN smart scooters were used for the experimental rides to test the STEADi
application in the scenarios listed below.
1. Up and down the slopes: The rider rides the scooter on a sloped road with the
insole in his/her shoes.
2. On the grass: The rider rides the scooter on grass or muddy road conditions with
the insole in his/her shoes.
3. On roads with numerous small potholes: The rider rides on a road with numerous
potholes so that we can collect data on how the pressure changes under this condition.

Figure 5.5: Time taken to recognize face mask vs the resolution of the image
4. A student who had never ridden a scooter was asked to try it for the first time with
the insole: it is necessary to record data from a rider who is very new to this or has
never ridden the scooter before.
5. Trying to take sharp turns: Taking sharp turns can induce imbalance and hence it is
very informative to record such data.
6. Trying to mimic when the rider is about to fall by deliberately leaning more to either
the left or the right side: it is not only new riders who struggle with balancing;
sometimes a rider with a hundred rides of experience can make mistakes and fall,
which made it crucial to record such data.
The riders were asked to perform each experiment with and without the smart insole
system. During the experiments, all the gait parameters from the smart insole were
recorded using a smartphone. The smart insole pressure values were recorded for each
scenario and saved into a CSV (comma-separated values) file.
5.4 Results for the STEADi application
Two main experiments were conducted based on the different scenarios mentioned in
Section 5.3. Based on the data recorded from the insole system, the cumulative values of
the 96 insole pressure sensors lie within the range [2, 3.2]: a value of 2 is recorded when
very little pressure is exerted on the insole, and 3.2 is recorded when the pressure exerted
on the insole is at its maximum. When a rider is standing on the ground, the recorded
pressure is in the range [2, 2.5], but when the rider puts his/her foot down to balance the
scooter, the pressure exerted is in the range [2.9, 3.2]. This is because when a person is
standing on the ground, the contact area is larger and the pressure is distributed equally
between both feet, whereas when a person tries to balance and puts one foot down, the
surface area supporting the body is that of a single foot and hence more pressure is exerted.
After every experiment, the rider was asked to repeat the experiment without the insole
system. The balancing system was not able to differentiate between the rider taking a turn
and the rider balancing: the system perceived that whenever the rider was taking a turn or
tilting with the road, he/she was not able to balance, and hence it turned the
"BALANCING" screen red.
5.4.1 First-time rider of the Smart Scooter
Figure 5.6 shows the pressure distribution for a first-time rider of the smart scooter.
The graph shows that the pressure exerted by the rider is unstable. An elevation in the
graph indicates that the rider put his/her foot down because the scooter became unbalanced,
whereas a dip in the pressure values shows the rider trying to balance the scooter by
exerting less pressure. The figure also shows several other patterns of the first-time rider.
The data for the first 100 ms indicates that the rider was almost able to balance the scooter.
After the 100 ms mark, the rider had some problems balancing and put a foot down more
often because the scooter was unbalanced. The different peaks in the graph indicate how
much pressure the rider applied when putting his/her foot on the ground: the higher the
pressure, the greater the chance that the rider was about to fall while balancing the scooter.
Figure 5.6: Insole sensor data when the rider was riding for the first-time.
5.4.2 Riding scooter on different terrains: Up and down the slope and riding on the grass
The maximum pressure on the insole system was recorded in two scenarios: first, when
the user went up and down the hill, and second, when riding on the grass. Interestingly, the
values collected on the grass and on the slopes were the same; a possible reason is that on
unsteady ground the body-weight pressure on the insole increases. Figure 5.7 shows the
data recorded when the rider goes up and down the slope. The most pressure is exerted
there because the user tries hardest to balance during that time. Sudden changes in the
pressure values suggest potholes on the road. Potholes can be differentiated from the
rider's foot-down action based on timing: if a rider puts his/her foot down, the minimum
amount of time before the rider puts the foot back and starts riding again is about 20 ms,
whereas for potholes the pressure value fluctuates every 5 seconds, depending on the
number of potholes on the road.
Figure 5.7: Insole sensor data when the rider goes up the hill.
We also recorded values when an experienced rider tried riding and performing the various
activities mentioned above simultaneously. Figure 5.8 shows the graph for this scenario.
5.5 Discussion and Conclusion
The accuracy of the balancing algorithm was very difficult to calculate. One of the main
reasons was potholes: every time a rider encounters a pothole on the road, the pressure
sensor values fluctuate rapidly, rendering the balancing algorithm unusable during that
interval. The pothole detection algorithm was tested in real time on the Android
smartphone as well as on a computer, using 200 images. The performance metrics used
were precision and recall: the method obtained 82.56% average precision and 84.12%
recall on the computer, whereas it obtained 64.56% average precision and 69.12% recall
on the smartphone. The system reached a processing speed of 0.031 s (31 FPS), whereas
when deployed on a smartphone the processing speed decreased to 0.016 s (16 FPS), due
to a larger reduction in model size and computational complexity.

Figure 5.8: Insole sensor data when the rider tried every experiment simultaneously.
For the masked face authentication, even though the 4-core version did not disappoint, a
16-core CPU would theoretically be able to keep up with the CUDA version. However, it
is more likely for a person to have a CUDA-capable Graphical Processing Unit (GPU) than
a very high-end 16-core CPU. The average consumer has a 4-core processor, so it was
most reasonable to compare that with CUDA.
There are both advantages and disadvantages to using multiprocessing and CUDA-
supported systems. Due to continuous improvements in CUDA and its features, support
can vary widely from one CUDA version to another and can also depend on the GPU
generation. The biggest advantage of multiprocessing is that it will run on all processors
regardless of generation. The CUDA version is faster in all cases, especially when dealing
with high-resolution images, whereas the multiprocessing system still processes
significantly faster than the plain CPU version and does not require any additional
hardware, unlike CUDA. Simply put, every system contains a CPU, but not every system
contains a CUDA-capable GPU.
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.1 Conclusion
This thesis presents an Android-based smart-scooter rider assistance system that consists of
four main modules: the mobile application, cameras, insole sensors, and the facial
authentication system. These modules allow the system to recognize rider and scooter
balancing behavior and to produce alerts and warnings when dangerous situations such as
imbalance and potholes are detected. We also detailed an appropriate ontology for the
recognized rider and vehicle behavior as well as the balancing and pothole detection
algorithms using an Android-based smartphone. The STEADi app was tried on multiple
Android devices. It was found that the app glitches on older devices because the OpenCV
module consumes a tremendous amount of resources. We offloaded the UI thread by
leaving all computation to other threads, yet this still did not solve the issue; for some
outdated devices, simply updating feedback from background threads is difficult to finish
in real time.
The main purpose of the authentication system accompanying the STEADi application
was to provide a system that recognizes faces even when a mask is worn, along with some
possible methods to improve its performance. Due to the changing world and the need to
adapt to new hardships, it became impractical for riders to unlock their application while
wearing masks. As of now, there are only around five or six systems that can recognize the
identity of a masked face, and they are all far from being as accurate as generalized facial
recognition software. Even though our facial recognition system does not have the best
accuracy among similar systems, it is unique in that it can learn from a very small dataset,
which allows users to train the system on their face by providing just 2 minutes of their
time for it to take their pictures. In this thesis, we tried to create a system as close as
possible to what a normal user has on their smartphone.
The smartphone-based solutions we elaborate on here do not require any specific model
of scooter, because the core sensor is placed inside the shoes and the auxiliary sensors are
the accelerometers and gyroscopes that come with mobile devices [42]. The issue of rider
safety is of the utmost importance, and there is currently no other system that tackles this
problem specifically for smart scooters; the proposed system focuses on providing student
smart-scooter riders with a reliable and efficient assistance system.
6.2 Future Scope and Research
Although this system is the first of its kind, it has huge potential and future scope. It can
be combined with e-scooter rider interaction data to provide better safety features such as
collision alerts, speeding alerts, and so on. Integrating a rider interaction system with the
rider assistance system would make the overall system more robust. Some of the issues
that such a combined system could address are as follows:
1. Rider alert system: Streets are a shared resource used by different users for a mul-
titude of reasons. Rider interaction data combined with Google Maps data would help
create a system that alerts riders, based on time and place, about the road segments
where accidents are most probable.
2. Capture e-scooter interactions with other road users
3. Determine e-scooter presence in traffic: The E-scooter Rider interaction will help to
know the presence of the e-scooters on the road and will tell us how many people
actually ride the scooters.
4. Recording general behavior of e-scooter riders like the use of helmet, parking alerts,
speeding, and following general rules of the road.
There is ample opportunity to improve the face authentication technology that the proposed
rider assistance system uses. Using a better classifier than SVC to classify masked faces is
one option for future work; on the other hand, this thesis tried to present the best possible
approaches for face detection and feature extraction for masked faces. As for the
parallelization, CUDA streams could be implemented, as they are based on asynchronous
kernel launches rather than the standard synchronous kernel launches CUDA uses by
default. Use of unified memory allocation is another direction for optimizing the proposed
CUDA implementation [43]. There is an urgent need for systems that help and protect
students on and off campus, and with advancing technology this can be achieved in an
efficient and optimized way.
REFERENCES
[1] D. Chen, Y. Cai, X. Qian, R. Ansari, W. Xu, K.-C. Chu, and M.-C. Huang, “Bring
gait lab to everyday life: Gait analysis in terms of activities of daily living,” IEEE
Internet of Things Journal, vol. 7, no. 2, pp. 1298–1312, 2019.
[2] D. Pinson and V. Yadav, “DefectDetect: An Android app that identifies potholes on a
road,” GitHub, 2018.
[3] L. Zhang, G. Gui, A. M. Khattak, M. Wang, W. Gao, and J. Jia, “Multi-task cas-
caded convolutional networks based intelligent fruit detection for designing auto-
mated robot,” IEEE Access, vol. 7, pp. 56 028–56 038, 2019.
[4] P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, “Ssdmnv2: A
real time dnn-based face mask detection system using single shot multibox detector
and mobilenetv2,” Sustainable cities and society, vol. 66, p. 102 692, 2021.
[5] W.-H. Lin, P. Wang, and C.-F. Tsai, “Face recognition using support vector model
classifier for user authentication,” Electronic Commerce Research and Applications,
vol. 18, pp. 71–82, 2016.
[6] “What is a shared electric scooter?” https://www.portlandoregon.gov/transportation/77294,
p. 1, 2020.
[7] D. A. Drysdale, Campus attacks: Targeted violence affecting institutions of higher
education. DIANE Publishing, 2010.
[8] H. Fitt and A. Curl, “Perceptions and experiences of lime scooters: Summary survey
results,” 2019.
[9] D. Chen, Y. Cai, and M.-C. Huang, “Customizable pressure sensor array: Design and
evaluation,” IEEE Sensors Journal, vol. 18, no. 15, pp. 6337–6344, 2018.
[10] R. Fan, U. Ozgunalp, B. Hosking, M. Liu, and I. Pitas, “Pothole detection based on
disparity transformation and road surface modeling,” IEEE Transactions on Image
Processing, vol. 29, pp. 897–908, 2019.
[11] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu, P. Yi, K. Jiang, N. Wang,
Y. Pei, et al., “Masked face recognition dataset and application,” arXiv preprint
arXiv:2003.09093, 2020.
[12] P. E. Hadjidoukas, V. V. Dimakopoulos, M. Delakis, and C. Garcia, “A high-performance
face detection system using openmp,” Concurrency and Computation: Practice and
Experience, vol. 21, no. 15, pp. 1819–1837, 2009.
[13] R. Steele, A. Lo, C. Secombe, and Y. K. Wong, “Elderly persons’ perception and ac-
ceptance of using wireless sensor networks to assist healthcare,” International jour-
nal of medical informatics, vol. 78, no. 12, pp. 788–801, 2009.
[14] Google, “Android Studio: Motion sensors,” https://developer.android.com/guide
/topics/sensors/sensors_motion.html#sensors-motion-grav, 2019.
[15] A. Mordvintsev and K. Abid, “Introduction to opencv-python tutorials,” Open CV
Official Site, 2013.
[16] J. Annuzzi, L. Darcey, and S. Conder, Introduction to Android application develop-
ment: Android essentials. Pearson Education, 2014.
[17] A. Kaehler and G. Bradski, Learning OpenCV 3: computer vision in C++ with the
OpenCV library. ” O’Reilly Media, Inc.”, 2016.
[18] R. Rothe, M. Guillaumin, and L. Van Gool, “Non-maximum suppression for object
detection by passing messages between windows,” in Asian conference on computer
vision, Springer, 2014, pp. 290–306.
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing
human-level performance on imagenet classification,” in Proceedings of the IEEE
international conference on computer vision, 2015, pp. 1026–1034.
[20] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face
recognition and clustering,” in Proceedings of the IEEE conference on computer
vision and pattern recognition, 2015, pp. 815–823.
[21] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. An-
dreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mo-
bile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[22] C.-F. Wang, “A basic introduction to separable convolutions,” https://towardsdatascience.com/
a-basic-introduction-to-separable-convolutions-b99ec3102728, 2020.
[23] NVIDIA, P. Vingelmann, and F. H. Fitzek, Cuda, release: 10.2.89, 2020.
[24] D. Bikov, M. Pashinska, and N. Stojkovic, “Parallel programming with cuda and
mpi,” 2020.
[25] S. M. Cain and N. C. Perkins, “Comparison of experimental data to a model for
bicycle steady-state turning,” Vehicle system dynamics, vol. 50, no. 8, pp. 1341–
1364, 2012.
[26] J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, and H. Balakrishnan, “The
pothole patrol: Using a mobile sensor network for road surface monitoring,” in Pro-
ceedings of the 6th international conference on Mobile systems, applications, and
services, 2008, pp. 29–39.
[27] Y.-c. Tai, C.-w. Chan, and J. Y.-j. Hsu, “Automatic road anomaly detection using
smart mobile device,” in conference on technologies and applications of artificial
intelligence, Hsinchu, Taiwan, Citeseer, 2010.
[28] S. J. Bhutekar and A. K. Manjaramkar, “Parallel face detection and recognition
on gpu,” International Journal of Computer Science and Information Technologies,
vol. 5, no. 2, pp. 2013–2018, 2014.
[29] P. Viola and M. J. Jones, “Robust real-time face detection,” International journal of
computer vision, vol. 57, no. 2, pp. 137–154, 2004.
[30] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of cognitive neuro-
science, vol. 3, no. 1, pp. 71–86, 1991.
[31] T. He, Z. Dong, K. Meng, H. Wang, and Y. Oh, “Accelerating multi-layer percep-
tron based short term demand forecasting using graphics processing units,” in 2009
Transmission & Distribution Conference & Exposition: Asia and Pacific, IEEE,
2009, pp. 1–4.
[32] Corsight, “A face-recognition tech that works even for masked faces,” https://www.israel21c.org/
a-face-recognition-tech-that-works-even-for-masked-faces/, 2020.
[33] M. Geng, P. Peng, Y. Huang, and Y. Tian, “Masked face recognition with genera-
tive data augmentation and domain constrained ranking,” in Proceedings of the 28th
ACM International Conference on Multimedia, 2020, pp. 2246–2254.
[34] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale
image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[35] P. Bagchi, D. Bhattacharjee, and M. Nasipuri, “Robust 3d face recognition in pres-
ence of pose and partial occlusions or missing parts,” arXiv preprint arXiv:1408.3709,
2014.
[36] H. Drira, B. B. Amor, A. Srivastava, M. Daoudi, and R. Slama, “3d face recognition
under expressions, occlusions, and pose variations,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2270–2283, 2013.
[37] A. S. Gawali and R. R. Deshmukh, “3d face recognition using geodesic facial curves
to handle expression, occlusion and pose variations,” International Journal of Com-
puter Science and Information Technologies, vol. 5, no. 3, pp. 4284–4287, 2014.
[38] Spin, “Spin ADAS,” https://blog.spin.pm/spin-insight-building-adas-for-micromobility-
98a3d88aa976, 2021.
[39] Google, “Maps SDK,” https://developers.google.com/maps/
documentation/android-sdk/overview, 2019.
[40] J. Pedoeem and R. Huang, “Yolo-lite: A real-time object detection algorithm opti-
mized for non-gpu computers,” arXiv preprint arXiv:1811.05588, 2018.
[41] “YOLO: Real-time object detection,” https://pjreddie.com/darknet/yolo/, 2018.
[42] R. Foppen, “Smart insoles: Prevention of falls in older people through instant risk
analysis and signalling,” 2020.
[43] R. Landaverde, T. Zhang, A. K. Coskun, and M. Herbordt, “An investigation of uni-
fied memory access performance in cuda,” in 2014 IEEE High Performance Extreme
Computing Conference (HPEC), IEEE, 2014, pp. 1–6.
63

More Related Content

Similar to Bast digital Marketing angency in shivagghan soraon prayagraj 212502

eclipse.pdf
eclipse.pdfeclipse.pdf
eclipse.pdfPerPerso
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Artur Filipowicz
 
Project report on Eye tracking interpretation system
Project report on Eye tracking interpretation systemProject report on Eye tracking interpretation system
Project report on Eye tracking interpretation systemkurkute1994
 
Im-ception - An exploration into facial PAD through the use of fine tuning de...
Im-ception - An exploration into facial PAD through the use of fine tuning de...Im-ception - An exploration into facial PAD through the use of fine tuning de...
Im-ception - An exploration into facial PAD through the use of fine tuning de...Cooper Wakefield
 
Aidan_O_Mahony_Project_Report
Aidan_O_Mahony_Project_ReportAidan_O_Mahony_Project_Report
Aidan_O_Mahony_Project_ReportAidan O Mahony
 
Bachelor Thesis .pdf (2010)
Bachelor Thesis .pdf (2010)Bachelor Thesis .pdf (2010)
Bachelor Thesis .pdf (2010)Dimitar Dimitrov
 
Uni v e r si t ei t
Uni v e r si t ei tUni v e r si t ei t
Uni v e r si t ei tAnandhu Sp
 
An Optical Character Recognition Engine For Graphical Processing Units
An Optical Character Recognition Engine For Graphical Processing UnitsAn Optical Character Recognition Engine For Graphical Processing Units
An Optical Character Recognition Engine For Graphical Processing UnitsKelly Lipiec
 
Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)Priyanka Kapoor
 
Au anthea-ws-201011-ma sc-thesis
Au anthea-ws-201011-ma sc-thesisAu anthea-ws-201011-ma sc-thesis
Au anthea-ws-201011-ma sc-thesisevegod
 
BE Project Final Report on IVRS
BE Project Final Report on IVRSBE Project Final Report on IVRS
BE Project Final Report on IVRSAbhishek Nadkarni
 
Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...
Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...
Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...mustafa sarac
 
Fulltext02
Fulltext02Fulltext02
Fulltext02Al Mtdrs
 
Ivo Pavlik - thesis (print version)
Ivo Pavlik - thesis (print version)Ivo Pavlik - thesis (print version)
Ivo Pavlik - thesis (print version)Ivo Pavlik
 

Similar to Bast digital Marketing angency in shivagghan soraon prayagraj 212502 (20)

eclipse.pdf
eclipse.pdfeclipse.pdf
eclipse.pdf
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
 
Project report on Eye tracking interpretation system
Project report on Eye tracking interpretation systemProject report on Eye tracking interpretation system
Project report on Eye tracking interpretation system
 
Im-ception - An exploration into facial PAD through the use of fine tuning de...
Im-ception - An exploration into facial PAD through the use of fine tuning de...Im-ception - An exploration into facial PAD through the use of fine tuning de...
Im-ception - An exploration into facial PAD through the use of fine tuning de...
 
document
documentdocument
document
 
BA1_Breitenfellner_RC4
BA1_Breitenfellner_RC4BA1_Breitenfellner_RC4
BA1_Breitenfellner_RC4
 
Aidan_O_Mahony_Project_Report
Aidan_O_Mahony_Project_ReportAidan_O_Mahony_Project_Report
Aidan_O_Mahony_Project_Report
 
Bachelor Thesis .pdf (2010)
Bachelor Thesis .pdf (2010)Bachelor Thesis .pdf (2010)
Bachelor Thesis .pdf (2010)
 
Uni v e r si t ei t
Uni v e r si t ei tUni v e r si t ei t
Uni v e r si t ei t
 
MSc_Thesis
MSc_ThesisMSc_Thesis
MSc_Thesis
 
An Optical Character Recognition Engine For Graphical Processing Units
An Optical Character Recognition Engine For Graphical Processing UnitsAn Optical Character Recognition Engine For Graphical Processing Units
An Optical Character Recognition Engine For Graphical Processing Units
 
Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)
 
Au anthea-ws-201011-ma sc-thesis
Au anthea-ws-201011-ma sc-thesisAu anthea-ws-201011-ma sc-thesis
Au anthea-ws-201011-ma sc-thesis
 
BE Project Final Report on IVRS
BE Project Final Report on IVRSBE Project Final Report on IVRS
BE Project Final Report on IVRS
 
jc_thesis_final
jc_thesis_finaljc_thesis_final
jc_thesis_final
 
web_based_ide
web_based_ideweb_based_ide
web_based_ide
 
Software guide 3.20.0
Software guide 3.20.0Software guide 3.20.0
Software guide 3.20.0
 
Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...
Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...
Explorations in Parallel Distributed Processing: A Handbook of Models, Progra...
 
Fulltext02
Fulltext02Fulltext02
Fulltext02
 
Ivo Pavlik - thesis (print version)
Ivo Pavlik - thesis (print version)Ivo Pavlik - thesis (print version)
Ivo Pavlik - thesis (print version)
 

Recently uploaded

Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLSeo
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024christinemoorman
 
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service DewasVip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewasmakika9823
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communicationskarancommunications
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...lizamodels9
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
DEPED Work From Home WORKWEEK-PLAN.docx
DEPED Work From Home  WORKWEEK-PLAN.docxDEPED Work From Home  WORKWEEK-PLAN.docx
DEPED Work From Home WORKWEEK-PLAN.docxRodelinaLaud
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Roomdivyansh0kumar0
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Dipal Arora
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurSuhani Kapoor
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Delhi Call girls
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 

Recently uploaded (20)

Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024
 
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service DewasVip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
DEPED Work From Home WORKWEEK-PLAN.docx
DEPED Work From Home  WORKWEEK-PLAN.docxDEPED Work From Home  WORKWEEK-PLAN.docx
DEPED Work From Home WORKWEEK-PLAN.docx
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
Best Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting PartnershipBest Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting Partnership
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 

Bast digital Marketing angency in shivagghan soraon prayagraj 212502

  • 1. SMART-SCOOTER RIDER ASSISTANCE SYSTEM USING INTERNET OF WEARABLE THINGS AND COMPUTER VISION By DEVANSH GUPTA Submitted in partial fulfillment of the requirements for the degree of Master of Science Department of Computer and Data Sciences CASE WESTERN RESERVE UNIVERSITY May 2021
  • 2. CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES We hereby approve the thesis/dissertation of DEVANSH GUPTA candidate for the degree of Master of Science Committee Chair Dr. Ming-Chun Huang Committee Member Dr. Yanfang (Fanny) Ye Committee Member Dr. An Wang Committee Member Dr. Yinghui Wu Date of Defense 26th April 2021 ∗ We also certify that written approval has been obtained for any proprietary material contained therein.
  • 3. TABLE OF CONTENTS List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi Chapter 1: Introduction, Objective and Contributions . . . . . . . . . . . . . . . 1 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.4 Thesis Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Chapter 2: BACKGROUND of the Smart Scooter rider assistance system . . . . 8 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Hardware and Software Requirements . . . . . . . . . . . . . . . . . . . . 9 2.3 Wearable Gait Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.4 Android OS and Programming . . . . . . . . . . . . . . . . . . . . . . . . 10 2.5 Android Studio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.6 Smart Scooter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 iii
  • 4. 2.7 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.8 Multi-Task Cascaded Convolutional Neural Network (MTCNN) . . . . . . 13 2.9 FaceNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.10 MobileV2Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.11 Support Vector Classification . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.12 Implementation of the Masked face recognition system . . . . . . . . . . . 23 2.13 Multiprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.14 Graphical Processing Units and CUDA . . . . . . . . . . . . . . . . . . . . 26 2.15 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 Chapter 3: Related Work for the Smart scooter rider assistance system . . . . . 31 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Chapter 4: Implementation of STEADi application . . . . . . . . . . . . . . . . 35 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 Method to calculate Balance of the rider . . . . . . . . . . . . . . . . . . . 36 4.3 Path Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.4 PotHoles Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Chapter 5: Experiment, Results, and Discussion . . . . . . . . . . . . . . . . . . 46 5.1 Experiment Design for Masked Face authentication . . . . . . . . . . . . . 46 iv
  • 5. 5.2 Results for Masked Face authentication system . . . . . . . . . . . . . . . 47 5.2.1 Results from Test of Image processing and cropping with different Image Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.2.2 Results from Test of Masked Face recognition with different Image Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 5.3 Experiment Design for the STEADi application . . . . . . . . . . . . . . . 50 5.4 Results for the STEADi application . . . . . . . . . . . . . . . . . . . . . . 52 5.4.1 First-time rider of the Smart Scooter . . . . . . . . . . . . . . . . . 52 5.4.2 Riding scooter on different terrains: Up and down the slope and riding on the grass . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.5 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 54 Chapter 6: Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . 57 6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 6.2 Future Scope and Research . . . . . . . . . . . . . . . . . . . . . . . . . . 58 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 v
  • 6. LIST OF TABLES 2.1 The value of losses at each stage of the . . . . . . . . . . . . . . . . . . . 16 5.1 Time taken to process an image vs the resolution of the image . . . . . . . . 50 5.2 Time taken to recognize face mask vs the resolution of the image . . . . . . 50 vi
  • 7. LIST OF FIGURES 1.1 What discouraged students and locals from using an e-scooter? [8] . . . . . 3 1.2 Raw Images from the MAFA dataset. . . . . . . . . . . . . . . . . . . . . 5 1.3 Raw image from RMFRD . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Pictures taken automatically using OpenCV for training . . . . . . . . . . . 6 1.5 Extracted pictures from the automatic pictures taken using OpenCV . . . . 6 2.1 Hardware and Software of Smart Insole System. (a) 3D bifurcation of the Insole System with Assembly structure (b) Insole Foot Pressure measure- ment for fore, mid, and hind section. The highest pressure in that area during a stride is indicated by the black dot in each area. The Red dashed line indicates US-sized insole used in this research. The Graph shows the measured GRF of each area during a gait cycle. . . . . . . . . . . . . . . . 10 2.2 Basic Android layers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Execution environment of the Android applications. . . . . . . . . . . . . 11 2.4 Network structure of MTCNN that includes three-stage multi-task deep convolution networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.5 FaceNet take image as an input ans return a 128 number vector output . . . 17 2.6 Triplets when at Step 1 and Triplets after FaceNet at Step 6. . . . . . . . . . 18 2.7 Depthwise convolution, uses 3 kernels to transform a 12x12x3 image to a 8x8x3 image and, Pointwise convolution, transforms an image of 3 chan- nels to an image of 1 channel [22] . . . . . . . . . . . . . . . . . . . . . . 19 2.8 The main building block in MobileNetV2 . . . . . . . . . . . . . . . . . . 20 vii
  • 8. 2.9 The compression and decompression inside a building block of the Mo- bileNetV2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.10 Flowchart of Masked Face Recognition Process using SVC. . . . . . . . . . 24 2.11 Basic structure of the face recognition program. . . . . . . . . . . . . . . . 25 2.12 A multiprocessing optimization of the Masked Face Recognition system. . . 27 2.13 Memory model of CUDA Programming [23] . . . . . . . . . . . . . . . . 28 2.14 Complete overview of blocks using CUDA for face recognition system . . 29 2.15 Proposed system for CUDA based Masked Face recognition. . . . . . . . . 30 4.1 STEADi application flow-chart showcasing the workflow of how the appli- cation checks for stability . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.2 Code for the balancing algorithm . . . . . . . . . . . . . . . . . . . . . . . 38 4.3 (a) When the STEADi shows that rider is balanced (b) When the STEADi shows that rider is not balanced . . . . . . . . . . . . . . . . . . . . . . . 39 4.4 Code to ask user for the location access. . . . . . . . . . . . . . . . . . . . 40 4.5 Settings page which gives various options to the user and grants the back- ground location access. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.6 Code to update the location of the user. . . . . . . . . . . . . . . . . . . . . 42 4.7 STEADi application detecting potholes on the road . . . . . . . . . . . . . 43 4.8 Making sure if OpenCV loads properly. . . . . . . . . . . . . . . . . . . . 44 4.9 Code for accessing the camera. . . . . . . . . . . . . . . . . . . . . . . . . 44 4.10 Working of YOLO framework . . . . . . . . . . . . . . . . . . . . . . . . 45 5.1 (a) Face Mask detection with wearing mask. (b) Face Mask detection with- out wearing mask. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.2 Precision-Recall curve for the Face mask detection system . . . . . . . . . 48 viii
  • 9. 5.3 (a) Face Mask recognition with wearing a mask. (b) Face Mask recognition without wearing a mask. . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.4 Time taken to process an image vs the resolution of the image . . . . . . . . 49 5.5 Time taken to recognize face mask vs the resolution of the image . . . . . . 51 5.6 Insole sensor data when the rider was riding for the first-time. . . . . . . . . 53 5.7 Insole sensor data when the rider goes up the hill. . . . . . . . . . . . . . . 54 5.8 Insole sensor data when the rider tried every experiment simultaneously. . . 55 ix
  • 10. ACKNOWLEDGMENTS I would like to thank for supporting me and encouraging my passion for machine learn- ing and software engineering in general. I would also like to thank my friends for their support of my work and endless patience throughout the entire process. Most importantly, I would like to thank my advisor, Dr. Ming-Chun Huang for his great ideas and without whom this thesis would not be possible. Finally, I would like to thank my committee mem- bers, Dr. Yanfang (Fanny) Ye, Dr. An Wang and, Dr. Yinghui Wu for reviewing my thesis and providing me with valuable feedback. x
  • 11. LIST OF ACRONYMS ADAS Advanced Driver Assistance Systems AI Artificial Intelligence AOH Ahead of Time CUDA Compute Unified Device Architecture FBI Federal Bureau of Investigation GPU Graphical Processing Units ID Identification IoWT Internet of Wearable Things MTCNN Multi-Task Cascaded Convolutional Neural Network NMS Non-Maximum Suppression NNN Natural Neural Network OHA Open Handset Alliance OpenCV Open Source Computer Vision OS Operating System PReLU Parametric ReLU SVC Support Vector Classifier WGL Wearable Gait Lab xi
  • 12. Smart-Scooter Rider Assistance System using Internet of Wearable Things and Computer Vision By Devansh Gupta Abstract Intelligent human/computer interaction systems have become an irreplaceable part of a student's life, be it the extensive use of personal mobility vehicles like smart-scooters and bicycles or the use of biometric systems for any kind of authentication. The aim of this thesis is to propose an IoWT and computer vision-based solution that enhances campus safety, focusing on the personal mobility vehicle, the Smart-scooter, and on a facial recognition system for students, faculty, and workers wearing masks on campus. The thesis presents a one-of-a-kind "Smart-Scooter rider-assistance system," STEADi, which focuses on the safety of personal mobility vehicles such as smart scooters and helps riders while they ride on side-roads or sidewalks. The system proposed in this thesis uses a self-training parallel Masked face recognition system for authorization and sensors for safety monitoring. xii
  • 13. CHAPTER 1 INTRODUCTION, OBJECTIVE AND CONTRIBUTIONS 1.1 Introduction With the growth in the student population, the issues revolving around student safety on-campus are becoming a major concern around the world. The safety issues range from the misuse of personal mobility vehicles to the need for new facial recognition systems which can process masked images. Although the safety of students around the world has improved over the past year, these safety issues can be further addressed through the use of Artificial Intelligence (AI) and the Internet of Wearable Things (IoWT). This thesis proposes systems to either improve the currently available solutions or solve these issues. The system proposed in this thesis is a "Smart-Scooter rider-assistance system," STEADi. The primary problem this system attempts to solve is helping riders who ride their personal mobility vehicles, smart scooters, on side-roads or sidewalks. This system uses the Wearable Gait Lab [1], a wearable underfoot force-sensing intelligent unit, as one of its main components. The purpose of this system is to help students who are new to using smart scooters on campus avoid injuries and accidents by alerting the rider about unforeseeable conditions. The system provides adequate data for path tracking, the pothole detection system, and the balancing ability of Smart Scooter riders. A straightforward machine learning approach, built on the Android application DefectDetect [2], makes it easy to identify potholes and other similar road-surface irregularities from the accelerometer and the Wearable Gait Lab system data. Since the spread of the COVID-19 virus, the use of the more established biometric systems based on fingerprints and passwords is not safe, whereas facial recognition systems have been having a hard time during the COVID-19 pandemic. One of the increasingly common modern-day annoyances is 1
  • 14. pulling out your smartphone for a hygienic contact-free payment at a shop, only to glare down at an error message, "Face Not Recognized". One well-known example is Apple's Face Identification (ID), a technology that uses a grid of infrared dots to map the physical appearance of a user's face. The proposed system also implements a unique authorization technology to improve present-day facial recognition systems so that they can recognize faces wearing masks. The face recognition module is based on three models: face detection using a Multi-Task Cascaded Convolutional Neural Network (MTCNN) [3], a face mask detection system using MobileNetV2 [4], and a Support Vector Classifier (SVC)-based masked face recognition system [5]. Two approaches are used to improve the performance of the system. The first is a multiprocessing system implemented in Python, and the second uses PyCUDA, a Python library for CUDA which gives Pythonic access to Nvidia's CUDA parallel computation API. The thesis evaluates the proposed system using four balance tests based on different terrains and on riders with diverse riding experience on Smart Scooters. The test results showed successful detection of several potholes in and around the Cleveland area with a 64.56% average precision and 69.12% recall on the smartphone. The system was able to alert riders, including the less experienced ones, about potential road-related threats and balancing issues while riding on different terrains. The system requires the rider to place a smart insole under each foot, connected over Bluetooth Low Energy (BLE) to a smartphone. The data is collected using the Android application, then normalized and used with the orientation sensor data of the smartphone to calculate a balancing score. The balancing score tells the user whether he/she is balancing the scooter properly or not [6]. 1.2 Objectives The main objective behind researching the use of computer vision applications is to find a way to enhance safety around campus. The smart-scooter company Bird conducted 2
  • 15. a study on scooter safety. The study concluded that scooters involve similar risks as bikes and other small personal mobility vehicles. As per the report in 2017, the bikes related emergency department visits topped, 59 visits per 1 million miles cycled. Based on the data gathered solely by Bird scooter riders, 38 injuries per 1 million miles were reported by the company. Out of the 38 injuries suffered by the Bird scooter riders, 27 of these 38 were around a university or an education campus. According to the book published byFederal Bureau of Investigation (FBI) on Campus Attacks [7], every 2 out of 5 attacks on campus, the suspect’s face was covered and was difficult to recognize from the surveillance cameras. In the past year, coronavirus masks have become a boon for crooks who hide their faces previously using bandanas. These are only one of the few safety issues students face during the pandemic due to the lack of masked face recognition systems. The system discussed in this thesis provide some novel methods which use computer vision technology and IoWT to tackle these safety-related problems. Figure 1.1: What discouraged students and locals from using an e-scooter? [8] 1.3 Contributions The contributions of this thesis are as follows: • Data Collection and Processing: The first step in implementation of the systems 3
  • 16. is collecting and processing data. The data for the Smart-scooter rider assistance system was collected in two parts. 1. The first part of the data is used for assessing the balance of the rider and the second part is used for the pothole detection system. To collect data for the balancing system, we used the Wearable Gait Lab (WGL) [1]. The main component of WGL that we used in this thesis is the Smart Insole System. The Smart Insole System consists of 96 pressure sensors uniformly distributed on the pressure sensor array, which ensures a high spatial resolution for plantar pressure measurement. For details of the design method and mechanism of the pressure sensor array, please refer to the former research [9]. The insole system was placed in the shoe of the rider and the pressure sensor data was then collected using an Android-based smartphone device. For pothole detection, we used the Pothole-600 dataset [10]. The data provided was collected using a ZED stereo camera. 2. The data for the Masked face recognition system was also collected in two parts. The first part consists of open datasets available online and the second dataset is generated from the users' faces. In this project, the two open datasets we used were provided by the National Natural Science Foundation of China, Wuhan University [11]. The two open datasets are as follows: (a) Masked Face Detection Dataset (MFDD): The dataset contains 24,771 images of people wearing masks. The definition of the face position in the MAFA dataset is quite different from general face datasets. The MAFA face frame is square near the eyebrows, and the labeling frame is not strict (the frame has a gap from the edge of the face), while in the normal datasets, the face frame is above the forehead of the person. Figure 1.2 shows the raw images from the MFDD dataset. 4
  • 17. (a) (b) Figure 1.2: Raw images from the MAFA dataset. Figure 1.3: Raw image from RMFRD (b) Real-world Masked Face Recognition Dataset (RMFRD): The dataset is a collection of 5,000 pictures of 525 people wearing masks, and 90,000 images of the same 525 people without masks. Figure 1.3 shows a raw image from the RMFRD dataset. The second dataset is generated by prompting the user to enter his/her name and then automatically taking pictures using the OpenCV library for 1 minute continuously. For the first 30 seconds, the user is advised to take pictures without a mask and for the later 30 seconds with a mask. The pictures are then cropped by using a pre-trained Haar Cascade classifier to detect and crop the area around the faces in the picture. Figure 1.4 shows the pictures taken and processed using OpenCV. 5
  • 18. (a) (b) (c) Figure 1.4: Pictures taken automatically using OpenCV for training (a) (b) (c) Figure 1.5: Extracted pictures from the automatic pictures taken using OpenCV • Provide a unique smart-scooter rider assistance system. • Provide a self-learning real-time Masked Face recognition system. • Provide a comparison of performance of the masked face recognition system achieved by multiprocessing and Compute Unified Device Architecture (CUDA) implementa- tions. 1.4 Thesis Organisation The thesis has been organized in the following chapters: • Chapter 1: Introduction and the research objective behind this study. • Chapter 2: Presents an overview of the smart-scooter rider assistance system, STEADi. • Chapter 3: Detailed Literature review and related approaches for balancing systems, face recognition system and an upcoming rider assistance system. • Chapter 4: Presents the methodology used and implementation for the proposed smart-scooter rider assistance system. 6
  • 19. • Chapter 5: Provides the experiments conducted and their results to prove the work- ing of the proposed system. It also focuses on the discussion of the analysis carried out and its conclusion. • Chapter 6: Draws the Conclusion from the provided results and presents potential future work. 7
  • 20. CHAPTER 2 BACKGROUND OF THE SMART SCOOTER RIDER ASSISTANCE SYSTEM 2.1 Introduction The ”rider/driver assistance system” are most commonly known as Advanced Driver Assis- tance Systems (ADAS) and was first derived from the field of automobiles. The definition of these ADAS differ based on criteria for classification. In the beginning it was described as a system that supports a driver/rider for example, remote-starter in the cars. With the pas- sage of time and everyday technological advancement, these systems started getting more and more complex and reliable, from the anti-braking systems to the navigation systems, they were mainly designed to render help to the driver. The significance of an advanced human/computer interaction system like a facial recognition system is critical to the false alarm rate, the accuracy of the system and, the computational cost it incurs to process an image. Since the first commercial facial recognition applications released, efficiency and speed have become the most crucial part of these applications. [12]. With a growing number of people infected with Corona-virus cases around the world, it has become difficult mainly for students using devices like Apple’s iPad from taking notes to going through slides. Many universities around the world are going to follow the hybrid teaching structure in which a student can select if he/she wants to take the class in- person or online. With the new smartphones having mainly face recognition as the primary unlocking system and the urge of students to use their mobile every 5 seconds there is a need for a facial recognition system that can detect faces covered by masks and unlocks the smartphones. This chapter describes the tools, libraries, and technologies used while developing the Smart Scooter rider assistance system. First, the hardware and software requirements used 8
  • 21. for developing and testing the system. This is followed by the explanation of technologies used to develop each mode of the system with the background of tools and libraries used. 2.2 Hardware and Software Requirements The following requirements were chosen as the basis of our system: • An android based smartphone with an inbuilt accelerometer and gyroscope, and a functional camera that outputs an image with a resolution higher than 1440x1080. We recommend any smartphone with a minimum Qualcomm Adreno 420 (600 MHz) GPU and quad-core CPU (2.7 GHz vs. 2.5 GHz S801). • A Smart Insole system to measure and record the gait parameters. • A smart scooter to ride on. 2.3 Wearable Gait Lab Wearable Gait Lab, is a wearable Gait system developed by SAIL lab at CWRU, which uses a force platform with the capability of measuring ground-reaction forces when worn under a person’s foot. The main component of WGL that we used in this project is the Smart Insole System. Smart Insole is an important system for realizing “Gait analysis”. Up to 96 pressure sensors were uniformly distributed on the pressure sensor array, which ensures a high spatial resolution for plantar pressure measurement. For details of the design method and mechanism of the pressure sensor array, please refer to the former research [9]. Figure Figure 2.1 (a) shows a circuit board for signal acquisition and data wireless transmission. A flexible Printed Circuit (FPC) connector is used to connect the pressure sensor array for pressure signal acquisition. The IMU sensor, including accelerometer and gyroscope, is used to measure the foot motion. A Micro-controller Unit (MCU) is used to control the process of signal acquisition and data transmission. The sample rate for sensor data 9
  • 22. Figure 2.1: Hardware and Software of Smart Insole System. (a) 3D bifurcation of the Insole System with assembly structure (b) Insole foot pressure measurement for the fore, mid, and hind sections. The highest pressure in each area during a stride is indicated by the black dot in that area. The red dashed line indicates the US-sized insole used in this research. The graph shows the measured GRF of each area during a gait cycle. acquisition is 30 Hz. A wireless module (classic Bluetooth) is used to transfer the acquired sensor data to a smartphone app for further processing. [1] Considering the fact that people prefer sensors embedded into their clothing or accessories to wearing a technology separately [13], all the hardware systems were packed into an insole-shaped package, which makes the use of the Smart Insole similar to normal insoles. 2.4 Android OS and Programming The Android Operating System (OS) is based on the Linux OS and is a very popular and common computing platform. The first commercial version of the Android system came in 2008, back when everyone was using a flip phone and Blackberry was the biggest thing in the mobile phone industry. It first made its appearance in the form of a mobile platform. The Android platform is the work of the Open Handset Alliance (OHA), an organization with a mission to collaborate to "create a better mobile phone", which started 10
  • 23. Figure 2.2: Basic Android layers. from the first Android phone, G1 manufactures by HTC. Figure 2.2 shows the basic view of the Android layers. Historically, Android applications have been written in the JAVA programming lan- guage. The main logic of the application, bytecode is generated after formatting and com- piling the JAVA source code and is converted into an executable code on the device during run-time. This approach is known as Ahead of Time (AOH). Figure 2.3 shows the execution environment of the Android applications. Figure 2.3: Execution environment of the Android applications. The Android application is developed and deployed on a device with a file which is essential for every application. The file is named AndroidManifest.xml. From types of events the application can process to permissions required by it, it tells the OS how to interact with the application. 11
  • 24. 2.5 Android Studio Android studio is the easiest way to get started with the development of Android appli- cations. It can be easily downloaded for any operating system (Mac OS, Windows OS, or Linux). The studio provided an efficient and user-friendly JAVA environment for deploying and developing applications with the ability to test either on a simulator or a real device. For more information about how to get started and build your own Android applications, you can visit the official Android studio page [14]. 2.6 Smart Scooter The easiest way to describe a scooter is a bike without a pedal, seat, and chain. The needed momentum required by a common scooter can be applied by pushing the ground backward but a Smart scooter uses a battery and a motor to maneuver the scooter after the initial push. The center of pressure for the smart scooter extends between the front and the back, always going along a line ignoring the wind resistance The main challenge with a scooter is balancing and falling left and right. Smart scooters are an up-and-coming new mobility service adding to the small yet vast category of bike share and car share. With the use of app-based technology and the smart-scooter, the service provides a simple yet intelligent ability to rent the scooter for the short term. To rent a scooter, the rider first unlocks a smart scooter through the company’s smartphone app by scanning the bar code provided on the handle of the scooter or for those without smartphones, a call, or text service to unlock. The ride can end the trip by parking the scooter on the sidewalk. The parked scooter should be close to the curb and out of the pedestrian travel zone. Sometimes the riders are required to confirm they have parked the smart scooter correctly by submitting a photo through the company’s app to end their rental process. Smart scooters are powered almost exclusively by an electric motor, after an initial push to start the device. By the mid of 2018 multiple new companies entered the 12
  • 25. market and introduced new Smart scooter models. 2.7 OpenCV Open Source Computer Vision (OpenCV) is one of the most dependable and widely used open source libraries for computer vision, and it can be used with a wide variety of programming languages and platforms (C++, Python, Android, Java, Kotlin, etc.). It was first created at Intel by Gary Bradski in 1999 and released in 2000 [15]. It provides various functions such as reading and writing images, displaying images, object detection, edge detection, etc. The official Android port of the OpenCV library is OpenCV4Android [16]. Android support first came in a limited "alpha" version in 2010 with OpenCV 2.2, and NVIDIA later joined to release beta Android support with OpenCV 2.3.1 [17]. OpenCV 2.4 was the first official version that was supported on Android. The most stable and up-to-date method to write an OpenCV application that runs on Android is now the OpenCV Java API. In the OpenCV Java API, each function is wrapped in a Java interface, while the underlying OpenCV functions are written and compiled in C++, which introduces some performance overhead. With the development of cross-programming-language compatibility, OpenCV can be called directly through Java functions while the OpenCV code itself is written in C++. In this method, you first code, develop, and test the OpenCV implementation on the host platform in C++; it is later rebuilt for the Android environment using the Android tools. 2.8 Multi-Task Cascaded Convolutional Neural Network (MTCNN) One of the most widely used and most accurate face detection systems is MTCNN. This thesis uses a multi-task system which performs face feature point detection and face detection simultaneously. The MTCNN framework uses a structure similar to that of Viola-Jones and is based on a cascade of three CNNs. The network structure of the MTCNN algorithm is shown in Figure 2.4 and consists of three stages: 13
  • 26. 1. Proposal Network (P-Net): A CNN used to generate bounding box regression vectors and candidate windows. The candidate windows are calibrated using these estimated bounding boxes, and the highly overlapped boxes are merged using Non-Maximum Suppression (NMS) [18]. NMS eliminates overlapping candidate windows by comparing windows with smaller probabilities against the larger-probability windows and deleting them. 2. Refine Network (R-Net): The P-Net output is fed to another CNN, called the refine network, which performs calibration and rejects the false candidate windows before passing the remaining ones on. 3. Output Network (O-Net): This network is used to find the output frame and the 5 feature points. One of the main limitations of the above-mentioned CNNs is the lack of diversity in the kernels, which limits the discriminative ability of the CNN. So, 3x3 kernels were used throughout to reduce parameters, together with an activation function that uses Parametric ReLU (PReLU) [19]. ReLU was introduced because sigmoid activation functions suffer from the vanishing gradient problem. However, ReLU zeroes out negative inputs, so another activation function was introduced to handle this problem and further increase accuracy. LeakyReLU, instead of zeroing out negative inputs, multiplies them by a small number; however, this change does not show a considerable increase in model accuracy. Hence, PReLU was introduced, which uses backpropagation to learn that small value (the slope parameter) so that it can adapt along with parameters like biases and weights. The number of slope parameters to be learned is equal to the number of layers in a feed-forward network. The formula for PReLU can be written as: f(y_i) = max(0, y_i) + a_i min(0, y_i), where: 14
  • 27. Figure 2.4: Network structure of MTCNN that includes three-stage multi-task deep convolution networks. • f(y_i) is a ReLU function if a_i = 0 • f(y_i) is a LeakyReLU function if a_i is a small fixed positive constant • f(y_i) is a PReLU function if a_i is a parameter that can be learned The network is a multi-task cascading network, which consists of three tasks, each assigned to one of the three stages mentioned above. 1. Face/No Face Binary classification: In this task, the network classifies whether the object in question is a face or not. This is handled using logistic regression, where the loss is 15
  • 28. given by: L_i^det = −(y_i^det log(p_i) + (1 − y_i^det) log(1 − p_i)) where the true label is represented by y_i^det and p_i is the predicted probability that the candidate is a face. 2. Regression of Frames: After finding the face, the second task is to put that face in a frame. This is a regression problem and the Euclidean distance loss is given by: L_i^box = ||ŷ_i^box − y_i^box||_2^2 3. Feature Point Calibration: This is also a regression task, which finds and marks points on facial features like the right and left eyes, the right and left corners of the lips, and the nose. The Euclidean distance loss is given by: L_i^mark = ||ŷ_i^mark − y_i^mark||_2^2 The three losses mentioned above are combined into a weighted objective: min Σ_{i=1}^{N} Σ_{j ∈ {det, box, mark}} α_j β_i^j L_i^j where β_i^j ∈ {0, 1} determines whether loss j is used for sample i and α_j is the weight assigned to each loss. Table 2.1 shows the loss weights used at each stage.
  Stage | α_det | α_box | α_mark
  P-Net | 1 | 0.5 | 0.5
  R-Net | 1 | 0.5 | 0.5
  O-Net | 1 | 0.5 | 1
  Table 2.1: The loss weights at each stage of the MTCNN 16
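As a worked illustration of how these per-task losses combine, here is a small NumPy sketch of the PReLU activation and the weighted multi-task loss; the sample values and switch settings below are placeholders that only mirror the structure of the formulas above, not values from the thesis.

    import numpy as np

    def prelu(y, a):
        """PReLU: f(y) = max(0, y) + a * min(0, y), with a learned slope a."""
        return np.maximum(0, y) + a * np.minimum(0, y)

    def det_loss(y_true, p):
        """Face/no-face cross-entropy loss for one sample."""
        return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    def l2_loss(y_hat, y):
        """Squared Euclidean distance for bounding-box or landmark regression."""
        return np.sum((y_hat - y) ** 2)

    # Loss weights for O-Net from Table 2.1; beta switches a task off for this sample.
    alpha = {"det": 1.0, "box": 0.5, "mark": 1.0}
    beta = {"det": 1, "box": 1, "mark": 0}
    losses = {
        "det": det_loss(1, 0.9),
        "box": l2_loss(np.array([0.1, 0.1, 0.8, 0.8]), np.array([0.12, 0.11, 0.79, 0.83])),
        "mark": 0.0,
    }
    total = sum(alpha[j] * beta[j] * losses[j] for j in ("det", "box", "mark"))
    print(prelu(np.array([-1.0, 2.0]), a=0.25), total)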
  • 29. In this thesis, we chose to use MTCNN due to the speed, accuracy, and efficiency of the model. 2.9 FaceNet FaceNet is a deep learning approach produced by Google which is used to extract features from an image [20]. It is a one-shot learning model which uses an end-to-end learning method instead of traditional softmax classification methods. In this network, the softmax function is removed and L2 normalization is used to calculate the feature representation loss. FaceNet outputs a vector of 128 numbers which represents the features of an image. These features are used as input by the face recognition system. Figure 2.5 shows that FaceNet takes an image as input and returns a 128-number vector as output. This vector is known as an embedding, and images of the same face have similar embeddings. Figure 2.5: FaceNet takes an image as input and returns a 128-number vector as output Since these embeddings live in a 128-dimensional space, they can, for illustration, be projected onto a 2D plane and interpreted in a Cartesian coordinate system. This means the images can be plotted in a coordinate system using their embeddings. FaceNet works by calculating a vector embedding for an unseen image and then calculating the distances between the unseen image and the images in the dataset, i.e., the people known to the network. If the embedding distance is close enough to that of person "XYZ", it can be said that the 17
  • 30. unseen image is of the person "XYZ". FaceNet works in the following steps: 1. Generate a random vector embedding for each image in the dataset, meaning each image is initially plotted at a random position in the coordinate system. 2. Randomly select an image of a person, called the pillar (commonly referred to as the anchor). 3. Randomly select another image of the same person as the pillar, giving a positive example. 4. Randomly select an image that is not of the pillar person, giving a negative example. 5. Adjust the FaceNet parameters so that the positive example is closer to the pillar than the negative example. 6. Repeat steps 2 to 5 until no further adjustments are needed. The process defined above is known as the triplet loss, which operates on triplets (pillar, positive example, negative example). An objective function is then used to optimize the triplets that do not meet the requirement, and all others are simply passed over. Figure 2.6 shows a hypothetical view of Step 1 and Step 6. Figure 2.6: Triplets at Step 1 and triplets after FaceNet at Step 6. 18
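To illustrate the identification step described above, the following is a minimal Python sketch that compares the embedding of an unseen image against stored embeddings by Euclidean distance; the gallery contents and the distance threshold are illustrative assumptions, not part of the thesis implementation.

    import numpy as np

    def identify(unseen_embedding, gallery, threshold=1.1):
        """Return the closest known identity, or None if no stored embedding is close enough.

        gallery: dict mapping person name -> 128-d numpy embedding, assumed to be
        precomputed with a FaceNet-style model. threshold is an illustrative cut-off.
        """
        best_name, best_dist = None, float("inf")
        for name, emb in gallery.items():
            dist = np.linalg.norm(unseen_embedding - emb)  # Euclidean distance in embedding space
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist < threshold else None

    # Hypothetical usage: in practice the embeddings come from the FaceNet network.
    gallery = {"XYZ": np.random.rand(128), "ABC": np.random.rand(128)}
    probe = np.random.rand(128)
    print(identify(probe, gallery))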
  • 31. 2.10 MobileNetV2 MobileNetV2 was developed by Google and is based on an idea similar to its predecessor, MobileNetV1. MobileNetV1 [21] identifies the convolution layers that are expensive to compute and replaces them with a new layer called the depthwise separable convolution layer. In a normal convolution, one filter is applied to every input channel for every output channel of a layer, whereas a depthwise separable convolution is divided into two parts: a depthwise convolution, which applies K filters, one to each input channel, and a pointwise convolution, which uses a 1x1 convolution to produce the required output. Figure 2.7 shows the working of the depthwise separable convolution. The first layer in MobileNetV1 is a 3x3 convolution followed by 13 depthwise separable convolution layers. These layers are followed by batch normalization with the ReLU6 activation function; this version of ReLU prevents the over-expansion of the activations. This is further followed by an average pooling layer and then a classification layer (a 1x1 convolution) with softmax. MobileNets typically require roughly 9 times less computation than a normal CNN with similar accuracy. Figure 2.7: Depthwise convolution uses 3 kernels to transform a 12x12x3 image into an 8x8x3 image and, pointwise convolution transforms an image of 3 channels into an image of 1 channel [22] MobileNetV2 is the successor of MobileNetV1, with the same convolutional architecture built around depthwise separable convolution layers. In MobileNetV2, the depthwise separable convolution block has three layers instead of the two in V1. The new block is shown in Figure 2.8. The first new layer is a 1x1 convolution layer, whose main purpose is to 19
  • 32. Figure 2.8: The main building block in MobileNetV2 20
  • 33. Figure 2.9: The compression and decompression inside a building block of the MobileNetV2 increase the number of channels in the input data, and it is known as the "expansion layer". The last two layers are the same as in V1: a depthwise convolution for filtering the inputs, then a 1x1 pointwise convolution layer. In V2, the pointwise layer performs a different function than it did in V1. In V1, it would either double the number of channels or keep them the same, but in V2 it decreases the number of channels by projecting a high-dimensional channel representation into a lower-dimensional tensor, and hence it is known as the "projection layer". The first layer in MobileNetV2 is a 3x3 convolution layer rather than the new 1x1 expansion layer, and it is followed by 16 repetitions of the main building block defined above. These layers are followed by batch normalization with the ReLU6 activation function; this version of ReLU prevents the over-expansion of the activations. This is further followed by an average pooling layer and then a classification layer (a 1x1 convolution) with softmax. The main idea behind V2 is to replace the expensive convolution layers with cheaper ones. The trick is to decompress and then compress the data as it flows through the main building block, with trainable parameters that can best do this job. Figure 2.9 shows how this decompression and compression work in a building block. The main job of MobileNetV2 [4] in the system proposed in this thesis is to detect whether a person is wearing a mask or not. 21
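As an illustration of the expansion–depthwise–projection structure described above, here is a minimal Keras sketch of one such building block; the expansion factor, strides, and input size are illustrative assumptions, not the exact configuration used in the thesis.

    import tensorflow as tf
    from tensorflow.keras import layers

    def inverted_residual_block(x, out_channels, expansion=6, stride=1):
        """One MobileNetV2-style block: 1x1 expansion -> 3x3 depthwise -> 1x1 projection."""
        in_channels = x.shape[-1]
        # Expansion layer: 1x1 convolution that increases the number of channels.
        h = layers.Conv2D(in_channels * expansion, 1, padding="same", use_bias=False)(x)
        h = layers.BatchNormalization()(h)
        h = layers.ReLU(max_value=6.0)(h)          # ReLU6 keeps activations from over-expanding
        # Depthwise convolution: filters each channel separately.
        h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
        h = layers.BatchNormalization()(h)
        h = layers.ReLU(max_value=6.0)(h)
        # Projection layer: 1x1 convolution back down to a low-dimensional tensor (no activation).
        h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
        h = layers.BatchNormalization()(h)
        # Residual connection when the shapes allow it.
        if stride == 1 and in_channels == out_channels:
            h = layers.Add()([x, h])
        return h

    inputs = layers.Input(shape=(224, 224, 3))
    stem = layers.Conv2D(32, 3, 2, "same")(inputs)
    outputs = inverted_residual_block(stem, out_channels=32)
    model = tf.keras.Model(inputs, outputs)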
  • 34. 2.11 Support Vector Classification A face recognition system can be framed as a multi-class classification problem. The process involves defining several classes and finding which class the corresponding subject falls into. A basic face recognition system takes an input face image and extracts the facial features, which are then compared to the features of labeled facial data in a dataset. A feature similarity metric is used for the comparison, and the most analogous dataset entry is used to label the input face image. A one-against-all strategy is used to train a multi-class Support Vector Machine (SVM) with k SVM classifiers, where k is the number of subjects to be recognized. Multi-class SVM classification is known as Support Vector Classification (SVC) [5]. Given a kernel function K(a, b) = φ(a) · φ(b), the SVC finds the optimal separating function f(x) = Σ_{i=1}^{n} y_i α_i K(x, x_i) + b Assume k is the number of faces to be recognized, n is the total number of training faces, and n_i is the number of training faces of person i, so that n = Σ_{i=1}^{k} n_i. The k SVM classifiers a_1, ..., a_k can be built, where for each classifier a_j, j = 1, 2, ..., k, the n_j examples of person j are positive and all other Σ_{i=1, i≠j}^{k} n_i examples are negative. After training the separating functions, we obtain the k classifiers a_j(x) = Σ_{i=1}^{n} y_i α_{ji} K(x_i, x) + b_j 22
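To make the one-against-all training concrete, here is a minimal scikit-learn sketch operating on face embeddings; the embedding arrays and identities are placeholders, and the RBF kernel is an assumption rather than the thesis's exact configuration.

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    # Placeholder training data: 128-d embeddings (e.g., FaceNet outputs) and integer identities.
    X_train = np.random.rand(60, 128)          # 60 training faces
    y_train = np.repeat(np.arange(3), 20)      # k = 3 subjects, 20 faces each

    # One-against-all: one SVM per subject, trained with that subject's faces as positives.
    clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))
    clf.fit(X_train, y_train)

    # Recognition: each of the k classifiers scores the probe embedding; take the arg max.
    probe = np.random.rand(1, 128)
    scores = clf.decision_function(probe)      # shape (1, k), one score per classifier
    identity = int(np.argmax(scores, axis=1)[0])
    print("recognized id:", identity)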
  • 35. Each of the k classifiers generates its own output a_i. It is assumed that the test image belongs to one of the classes to be recognized, and hence the recognition returns: id = argmax_{i=1,...,k} a_i where id is the identification number of the recognized face. 2.12 Implementation of the Masked face recognition system Figure 2.10 shows our proposed system for Masked face recognition (MFR). For MFR, the system first asks for the name of the user and then automatically takes pictures of the user for 2 minutes. The system asks the user to sit still and prompts the user to wear a mask after 1 minute. The system then uses the face mask detection algorithm, built on MTCNN, an object detection algorithm, to detect whether the images of the user are with a mask or not. If images are without masks, the system creates a new image by superimposing a face mask on the images. During the training phase, the multi-class classifier was first trained on the MFDD and RMFRD datasets and then on the user's face dataset. Recognition is done by comparing selected features from the target image against selected features from the corresponding template images in the user face dataset. 2.13 Multiprocessing Neural networks can be used efficiently to solve problems where the difficulty of mathematically modeling the problem increases. With the increase in network size, there is an exponential increase in the time required to evaluate or train. Many efforts have been made to reduce the training time taken by neural networks. One of the solutions to reduce the training or evaluation time is to use the parallelization capacities of multicore/multithreaded CPUs or of graphics processing units (GPUs). Two software environments, CUDA and multiprocessing, have established themselves as quasi-standards for GPU and multi-core systems 23
  • 36. Figure 2.10: Flowchart of Masked Face Recognition Process using SVC. 24
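To make the enrollment step of this flow concrete, here is a minimal Python/OpenCV sketch of capturing and cropping user face images, mirroring the procedure described in Section 2.12 and Chapter 1; the capture duration, output paths, and cascade file are illustrative assumptions rather than the exact values used in the application.

    import os
    import time
    import cv2

    def capture_user_faces(name, seconds=60, out_dir="dataset"):
        """Grab webcam frames for `seconds`, crop detected faces with a Haar cascade,
        and save the crops for later training (illustrative enrollment step)."""
        os.makedirs(out_dir, exist_ok=True)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        cap = cv2.VideoCapture(0)                  # default webcam
        start, count, prompted = time.time(), 0, False
        while time.time() - start < seconds:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
                count += 1
                cv2.imwrite(os.path.join(out_dir, f"{name}_{count}.jpg"),
                            frame[y:y + h, x:x + w])
            if not prompted and time.time() - start > seconds / 2:
                print("Please put on a mask for the remaining captures.")
                prompted = True
        cap.release()

    capture_user_faces("XYZ")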
  • 37. Figure 2.11: Basic structure of the face recognition program. respectively. Multiprocessing loosely refers to using more than one CPU unit in a computer system. The face recognition program runs as shown in the flow chart in Figure 2.11. In our system, each process is tasked to find the potential candidate output. The system is divided into two main processes. First, if the system is to be used to recognize the Masked face in a picture with multiple faces, then each face detected is assigned to a process that returns the id number of the person detected. The maximum number of processes created depends on the number of cores of the CPU. If there are 4 cores in a CPU and each process is single- threaded, then the number of processes that can be spawned is 4. If the total number of faces detected in a photo is 6 then in a 4-core processor, each process will process 1 face 25
  • 38. and the remaining 2 will be queued until any of the processes completes. If there is only one person to be detected, then only 1 process is created and the time taken by this method to recognize is the same as without multiprocessing. The corresponding version after implementing multiprocessing is shown in Figure 2.12. Second, if the system is used for real-time facial recognition or on a video, each process handles (number of frames) / (number of processes) frames per second. If 4 processes are used and the total number of frames in the video each second is 40, then the number of frames to be processed by each process is 10, and these are executed in parallel. 2.14 Graphical Processing Units and CUDA Initially, GPUs were only used for their graphics display capabilities. At that time, a program had to be converted to use either OpenGL or Direct3D, which made it difficult to use a GPU for general-purpose calculations. Nvidia released CUDA as a potential alternative use for Nvidia's devices [23]. It is essentially a C++ programming language extension for programming the graphics processing units provided by Nvidia [24]. It provides pre-programmed operations for massively parallel workloads through several libraries, as well as direct control over the graphical processing units (GPUs). 26
  • 39. Figure 2.12: A multiprocessing optimization of the Masked Face Recognition system. 27
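The per-face distribution of work shown in Figure 2.12 can be sketched with Python's multiprocessing module as below; recognize_face() is a hypothetical stand-in for the MTCNN + SVC pipeline, and the pool size simply follows the CPU core count as described above.

    import multiprocessing as mp
    import cv2

    def recognize_face(face_img):
        """Hypothetical worker: run the masked-face recognition pipeline on one cropped
        face and return its identification number (placeholder logic only)."""
        return hash(face_img.tobytes()) % 1000   # stand-in for MTCNN + SVC recognition

    def recognize_all(face_crops):
        # One process per core; extra faces wait in the queue until a worker is free.
        with mp.Pool(processes=mp.cpu_count()) as pool:
            return pool.map(recognize_face, face_crops)

    if __name__ == "__main__":
        frame = cv2.imread("group_photo.jpg")        # placeholder input image
        # Placeholder: assume face bounding boxes were already produced by the detector.
        boxes = [(10, 10, 100, 100), (150, 20, 100, 100)]
        crops = [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
        print(recognize_all(crops))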
  • 40. Figure 2.13: Memory model of CUDA Programming [23] The CUDA implementation, even though simpler in design than the multiprocess version, was significantly more complex to implement. One of the many considerations when running code on a GPU is that data stored in system RAM cannot be accessed directly by the GPU, so all the data must be transferred to one of the GPU memory types when a kernel is launched. Figure 2.13 shows the CUDA memory model with a parallel computation block called a grid, which is launched from the host, where the CPU is the host and the GPU is the device. Each grid is three-dimensional with x, y, z coordinates and can be launched by either device kernel 1 or kernel 2. Each grid contains thread blocks in three dimensions, referenced by a thread index and a block index which are unique for each thread. Each thread computes its own face to find whether the face matches the dataset present (as shown in Figure 2.14). Once the face is found, the output is returned and another face is 28
  • 41. Figure 2.14: Complete overview of blocks using CUDA for face recognition system processed for recognition instantly. The proposed system is shown in Figure 2.15 2.15 Conclusion The main aim of this chapter is to make familiar with the tools and technologies used for this system. We first looked at the hardware and software requirements for the system. This chapter then further explains the use and a brief description of the WGL and a various models behind the implementation of the masked face recognition system. This helped us better understand the technologies and tools currently available to develop solutions for issue in question. 29
  • 42. Figure 2.15: Proposed system for CUDA based Masked Face recognition. 30
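To illustrate the host-to-device transfer and kernel launch pattern discussed in Section 2.14, here is a minimal PyCUDA sketch that scores a probe face embedding against a gallery of embeddings, one gallery entry per GPU thread; the kernel, array sizes, and the idea of matching on embedding distances are illustrative assumptions, not the thesis's exact CUDA implementation.

    import numpy as np
    import pycuda.autoinit                      # initializes a CUDA context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void l2_distances(const float *gallery, const float *probe,
                                 float *dists, int n, int dim)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per gallery entry
        if (i >= n) return;
        float acc = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = gallery[i * dim + d] - probe[d];
            acc += diff * diff;
        }
        dists[i] = acc;                                   // squared L2 distance
    }
    """)
    l2_distances = mod.get_function("l2_distances")

    n, dim = 1024, 128                                    # placeholder gallery of 1024 embeddings
    gallery = np.random.rand(n, dim).astype(np.float32)
    probe = np.random.rand(dim).astype(np.float32)
    dists = np.empty(n, dtype=np.float32)

    threads = 256
    blocks = (n + threads - 1) // threads
    # drv.In/drv.Out copy the numpy arrays between system RAM and GPU memory around the launch.
    l2_distances(drv.In(gallery), drv.In(probe), drv.Out(dists),
                 np.int32(n), np.int32(dim),
                 block=(threads, 1, 1), grid=(blocks, 1))
    print("closest gallery index:", int(np.argmin(dists)))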
  • 43. CHAPTER 3 RELATED WORK FOR THE SMART SCOOTER RIDER ASSISTANCE SYSTEM 3.1 Introduction The progress in technology has given rise to new techniques and devices that allow an un- biased evaluation of balance parameters and many advanced human/computer interactions system, hence providing researchers with reliable information on a rider’s ability to balance a personal mobility vehicle as well as providing with various authorization methods. This chapter provides an in-depth review of some of the related works. 3.2 Related Work One of the new techniques to measure balance and pathology parameters is the use of ul- trasonic sender and receivers. Ultrasonic sensors have been used to measure short step and stride length and the distance between feet [wahab2011Gait]. Even though the technique can be used massively to analyze the balancing of a walking person, the technique cannot be used to evaluate the balancing of a person while riding a smart scooter due to the fixed positions of the feet of the rider on the smart scooter. Stephen M. Cain, James A. Ashton-Miller, and Noel C. Perkins studied the physics behind bike balancing. They conducted experiments indoors that utilize training rollers mounted on a force platform (OR6-5-2000, AMTI), an instrumented bicycle [25], and a motion capture system (Optotrak 3020, NDI) to measure the balancing dynamics of bicycle. The bicycle [25] was equipped with sensors that measured steer torque, angle, bicycle frame roll rate, and speed. The experiments conducted by them showed that the best method to compute the balance of the ride is the interrelation between pressure distribution and center of mass. Hence, we used the smart insole to calculate the pressure distribution of the 31
  • 44. smart scooter rider. The balancing of the smart scooter is not the only problem faced by the riders. The unknown road conditions like potholes, the sudden change in the terrain, etc. are also some major problems. There are various pothole sensing systems. One of the most known is the PotHole patrol system [26], developed by the Massachusetts Institute of Technology. The system uses a GPS attached Linux-powered Soekris 4801 embedded computers with external accelerometers (sampling rate of 380Hz). The algorithm uses a simple machine learning approach that takes the smart scooter speed and X and Z axis acceleration into consideration to filter the potholes vs the non-potholes-related events like a railroad cross- ing. National Taiwan University [27] developed a system that uses HTC Diamond, a motorcycle- based smartphone as a hardware platform. The smartphone comes with an external GPS and a built-in accelerometer with a sampling rate < 24Hz. The system uses a supervised and unsupervised machine learning approach for pothole detection and is divided into two tasks systems. The server-side task uses a smooth-road model and an SVM(support vector machine) whereas the client-side perform feature extraction, filtering, and segmentation. Although the above-proposed systems for pothole detection show good performance, they were not suitable to be implemented on a device with limited storage, processing power, and software resources. Our system for measuring the balance of the rider as well as the pothole detection distinct from the prior work as: 1. It only concentrates on the potholes as a single event, hence better utilization of the hardware and software resources. 2. The balancing algorithm uses both the pressure distribution as well as the orientation of the smart scooter using the smartphone device to examine the balance of the rider. Shivashankar J. Bhutekar1 et al. proposed a modern GPU-based Facial recognition system [28]. The researchers implemented the algorithm around the Viola and Jones [29] 32
  • 45. with Adaboost framework for facial detection and Eigenfaces [30] for facial recognition. The paper showed that if the video frame contains more than one person’s face the CPU takes more time than the GPU to computer and recognize the face. CPU waits for all the faces to detect in a frame while the parallel processing system as soon a face is detected, is forwarded to the recognition system. The MLPs were trained by Ting He et al. [31] using basic GPU capabilities and CUDA functions which resulted in the performance acceleration by 5.31 times as compared to CPU by doing vector operations and, the matrix multiplications on GPU. Antonino Tumeo1 et al. researched a reconfigurable multiprocessing system for facial recognition which shows that the parallelized face recognition application is 63% faster than a single processor solution. An Israeli company, Corsight, developed a camera-based application to identify people in real-time. The application mainly focuses on identifying people when half their face is covered, even in low light conditions [32]. Due to the absence of large training datasets as well as ground truth testing datasets, it has been difficult to train the new Masked Face Recognition(MFR) models. Mengyue Geng et al. contributed the MFRD dataset which consists of 9,742 mask region segmen- tation annotated masked face images. The dataset is trained using a center-based cross- domain ranking strategy, Domain Constrained Ranking (DCR) loss [33]. Another method proposed by Walid Hariri discards the masked region while focusing on extracting fea- tures from the region above the mask (mostly eyes and forehead) using VGG-16 [34]. The method is further trained on a Multilayer perceptron to achieve 91.3% accuracy. Some of the methods work on restoration of the faces. A system is proposed by Bagchi et al. [35] to restore facial features. the threshold depth map values are used to detect the missing regions of the 3D images and Principal Component Analysis (PCA) is used for the restoration of the facial features. Similarly, a statistical shape model which can restore the partial facial curves is applied by Drira et al. [36].The missing regions were removed using an Iterative closest point (ICP) algorithm [37]. 33
  • 46. The smart scooter company, Spin, recently announced their advanced driver-assistance systems (ADAS) which will be tested in the second half of 2021 in New York City. Spin Insight [38] has two levels, Level 1 & Level 2. Spin Insight Level 2 is the main ADAS system which is powered by Drover AI’s computer vision and machine learning platform. In addition to that Spin’s scooters will be equipped with a camera,on-board computing power, and sensors to detect sidewalk and bike lane riding collision alerts. The smart- scooter assistance system proposed in this paper is very different from the Spin Insight ADAS. The smart-scooter assistance system can be used while riding any smart-scooter inconsequential to the brand of the scooter. Both the systems focus on the proper riding of the smart scooters. The smart-scooter assistance system provides benefits like rider balance check, pothole detection, and path-tracking whereas the Spin Insight provides collision alerts and parking alerts. 3.3 Conclusion This chapter was crucial to show that there are no similar system currently in use. For bigger personal transport vehicles like cars, rider-assistance systems are available. The systems consists of technologies but not limited to, potholes detection while driving, nav- igation systems, and even a lane departure warning (LDW) system which is an advanced safety technology that alerts drivers when they unknowingly stray out of the lanes. But no such system is currently in use on roads for the personal mobility vehicles like smart scooters, bikes and skateboards. We also looked at a upcoming similar system by the smart scooter company SPIN. In Chapter 4 , we will be looking closely at the implementation of the system and the logic behind the balancing algorithm. 34
  • 47. CHAPTER 4 IMPLEMENTATION OF STEADI APPLICATION 4.1 Introduction The smart scooter rider assistance system is divided into three parts: the masked face au- thentication system, the android application, STEADi, and, the WGL. STEADi is an an- droid application used with a Smart Insole system to assess student safety conditions on campus while riding a smart scooter. Previously, researchers have attempted to use smart- phones to measure Gait symmetry, however, STEADi expands upon this use case by in- corporating biofeedback training and novel assessment strategies that can be used in the context of smart scooters and road safety. The app is divided into three sections in the main window: 1. Balancing mode, the screen on the application turns the shades of ”RED” or ”GREEN” based on the orientation and the calculated balance of the rider. 2. Tracking mode, lets the rider track his/her/their path while riding the scooter. The module can be used to pin a location on the map or to track the path for future records. 3. Potholes detection mode is used to detect the potholes on the road and alert the user beforehand by vibrating the phone and making a sound, loud enough for the rider to hear but not loud enough to disturb neighbors. With the balancing mode, the user will see a screen that turns shades of red based on the error calculated by the balancing algorithm. In the tracking section, the user can see the path, he/she has been traveling while using the scooter and in the potholes detection mode (PotDetect), the user can detect if there is a pothole on the road and the app automatically 35
  • 48. Figure 4.1: STEADi application flow-chart showcasing the workflow of how the application checks for stability tags the place where PotDetect detected the pothole for future reference. Figure 4.1 shows a flow-chart of how to use the STEADi app. 4.2 Method to calculate Balance of the rider The Smart Insole data is recorded, normalized, and used with the orientation sensor data of the Android smartphone to calculate a balancing score. 1. The cumulative pressure over the 96 pressure sensors is summed and averaged: P_avg(t) = (Σ_{i=1}^{n} p_i) / n where n is the number of pressure point sensors, p_i is the recorded pressure for that particular point sensor, and P_avg(t) is the average of the total recorded pressure of all the sensors at time t. 36
  • 49. 2. Every 5 seconds, a comparison is made between P_avg(t − 5) and P_avg(t), the averaged cumulative pressure values. 3. If the average pressure on the insole has increased or decreased over the last 5 seconds, a boolean balance variable, "ifbal", is recorded. "ifbal" is TRUE if the pressure remains the same across the 5-second window, and if there is a change in overall pressure the "ifbal" variable is set to FALSE. If P_avg(t − 5) > P_avg(t), the average pressure decreased, which means the pressure on the insole is not evenly distributed. To account for wavy rides or riders doing tricks, we also took the roll (the rotation of the smartphone about the positive Z-axis towards the positive X-axis) and the azimuth (the angle between magnetic north and the positive Y-axis, with range [0, 360] degrees) into consideration. The imbalance range of the roll is [−∞, −1.0] ∪ [1.0, ∞] and the imbalance range of the azimuth is [−∞, −0.75] ∪ [0.75, ∞]. If the "ifbal" variable changes value every 5 seconds and the recorded values of the roll and azimuth are in the imbalance range, then we can say that the rider is not balancing properly and the "Balancing" section of the screen changes to RED. Figure 4.2 shows the code behind the balancing algorithm. Figure 4.3 shows the screens of the STEADi application when the rider is balanced and when unbalanced. 37
  • 50. Figure 4.2: Code for the balancing algorithm 38
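Since Figure 4.2 reproduces the Android (Java) code as an image, the following is an illustrative Python sketch of the same balancing logic under the assumptions stated above; the function names, the 5-second comparison, and the imbalance thresholds follow the description in Section 4.2, and everything else is a placeholder.

    def average_pressure(samples):
        """Average of the 96 insole pressure sensor readings at one time step."""
        return sum(samples) / len(samples)

    def is_unbalanced(p_avg_prev, p_avg_now, roll, azimuth):
        """Return True when the rider is judged to be unbalanced.

        p_avg_prev / p_avg_now: average pressures taken 5 seconds apart.
        roll / azimuth: smartphone orientation readings.
        """
        # "ifbal" is TRUE when the pressure stays the same across the window;
        # a real implementation would use a tolerance rather than strict equality.
        ifbal = (p_avg_prev == p_avg_now)
        roll_bad = abs(roll) >= 1.0            # imbalance range [-inf, -1.0] U [1.0, inf]
        azimuth_bad = abs(azimuth) >= 0.75     # imbalance range [-inf, -0.75] U [0.75, inf]
        return (not ifbal) and roll_bad and azimuth_bad

    # Hypothetical usage with placeholder sensor values:
    prev = average_pressure([2.1] * 96)
    now = average_pressure([3.0] * 96)
    print(is_unbalanced(prev, now, roll=1.4, azimuth=0.9))   # True -> screen turns red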
  • 51. (a) (b) Figure 4.3: (a) When the STEADi shows that rider is balanced (b) When the STEADi shows that rider is not balanced 4.3 Path Tracking The path tracking mode allows the user to track its path while riding the Smart-scooters. In this section we use Google Maps API for Android. The API handles access to the Map’s server, map display, data downloading, and response to map gestures. The API also provides additional information for map locations and allows user interaction with the map. For more information related to Maps SDK please refer to [39]. For each session, the system automatically prints out the route user has traveled. This feature require the following development modules: 1. Requesting the proper permissions: Due to privacy concern, location based apps are 39
  • 52. required to request location permission. For our application we need to request the background and foreground location permissions. Figure 4.4 shows the code to add in the AndroidMainfest.xml file to ask user for the location access. Figure 4.5 shows the settings page which gives various options to the user and grants the background location access. Figure 4.4: Code to ask user for the location access. 40
  • 53. Figure 4.5: Settings page which gives various options to the user and grants the background location access. 2. Requesting regular location updates: For the path tracking module it was necessary to get regular updates about the user location. The API returns the most accurate geographical location of the user based on longitude, latitude, velocity and, altitude of the device. Figure 4.6 shows the three steps in updating the location: startLoca- tionUpdates() to start getting the location updates, stopLocationUpdates() to stop the location updates when the app is closed and, updateTrack() to update the track on the Google maps in path tracking module. 41
  • 54. Figure 4.6: Code to update the location of the user. 4.4 PotHoles Detection For this feature, we utilized a previously implemented OpenCV library on Android for loading a deep neural network model that detects potholes, DefectDetect [2]. The model uses a framework named YOLO, which is a Natural Neural Network (NNN). It is used to train on and classify potholes from the live camera feed of the Android device. The model returns four vertices that outline each pothole; we further decorate the user interface and turn the output into a more interpretable rectangular box. The structure and some weights of the model we use come from [40] [41]. The working of the pothole detection system in the STEADi application can be seen in Figure 4.7. To maximize its performance on this specific task, we tweaked the model and re-trained the final two layers with well-labeled pothole images for 20 epochs. All training images come from a crawler program provided with OpenCV. During the training process, we use the mean squared error as the loss function: L(y, y′) = (1/N) Σ_{i=1}^{N} (y_i − y′_i)² 42
  • 55. where L(y, y′) is the mean squared loss and y′ is the predicted value. Figure 4.7: STEADi application detecting potholes on the road The following steps were taken to use OpenCV: 1. Making sure that OpenCV loads successfully. Figure 4.8 shows the code that uses a BaseLoaderCallback() to test whether OpenCV is loaded correctly. 2. Getting permission to access the camera on the Android device and enabling the camera: Similar to getting location access for the maps, we needed camera access to use OpenCV and YOLO to detect potholes. If the user grants permission to use the camera, then each frame is sent to the DarkNet function to detect and classify the potholes. The detection module will automatically make the phone vibrate if any pothole is detected by the front camera. Figure 4.9 shows the code for 43
  • 56. Figure 4.8: Making sure if OpenCV loads properly. accessing the camera. Figure 4.9: Code for accessing the camera. Working of YOLO for potholes detection We used the YOLO framework for pothole detection. One of the main reasons to use this framework is its speed and efficiency. It can process up to 45 frames per second. It works by taking the whole image as input, predicting the bounding box coordinates that enclose an object (in our case, potholes), and outputting class probabilities. The YOLO framework is based on an open source Natural Neural Network (NNN) which is written in C and CUDA. 44
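To illustrate how a YOLO-style detector of this kind can be driven from Python with OpenCV's DNN module, here is a minimal sketch; the configuration/weights file names, the input size, and the confidence threshold are placeholders and assumptions, not the actual DefectDetect model files.

    import cv2
    import numpy as np

    # Placeholder model files; the actual pothole model weights are not distributed here.
    net = cv2.dnn.readNetFromDarknet("pothole-yolo.cfg", "pothole-yolo.weights")
    layer_names = net.getUnconnectedOutLayersNames()

    def detect_potholes(frame, conf_threshold=0.5):
        h, w = frame.shape[:2]
        # YOLO processes the whole image at once, resized to the network input size.
        blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
        net.setInput(blob)
        boxes = []
        for output in net.forward(layer_names):
            for det in output:                   # det = [cx, cy, bw, bh, objectness, class scores...]
                scores = det[5:]
                confidence = det[4] * scores[np.argmax(scores)]
                if confidence > conf_threshold:
                    cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                    boxes.append((int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)))
        return boxes

    frame = cv2.imread("road.jpg")               # placeholder frame from the camera feed
    for (x, y, bw, bh) in detect_potholes(frame):
        cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 0, 255), 2)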
  • 57. Figure 4.10: Working of YOLO framework 4.5 Conclusion This chapter focuses on the implementation of each section of the proposed Android application, STEADi. It first showcases the workflow of the whole system and then explains the working of the balancing algorithm and the pothole detection system. This chapter also briefly discusses the permissions and code behind each module. The next chapter focuses on the experiments and results of the "smart-scooter rider assistance" system. 45
  • 58. CHAPTER 5 EXPERIMENT, RESULTS, AND DISCUSSION In this chapter, we discuss in detail the various experiments designed and their results. This is followed by a discussion of the contributions of the system. The experiments were performed in two parts. The first part focuses on the Masked face authentication system and the second part focuses on the STEADi application. 5.1 Experiment Design for Masked Face authentication The parallelization was compared at two stages of the Masked face authentication system development: 1. Image processing and cropping using OpenCV. 2. The whole Masked face recognition pipeline. Each test was performed using images at four different resolutions. All the tests were performed using the following specifications: • Intel i7 5th generation processor with 4 cores. • NVIDIA GTX 1660 Ti with 1536 cores and 6 GB memory. • OpenCV version 4.5.0. • CUDA toolkit version 10.2. • Python 3.5. 46
  • 59. (a) (b) Figure 5.1: (a) Face Mask detection with wearing mask. (b) Face Mask detection without wearing mask. 5.2 Results for Masked Face authentication system Figure Figure 5.1 shows the result of the face mask detection algorithm output using the webcam feed. The model has a input size of 480x640, 13 convolution layers in the back- bone network and has 24 layers including location and classification layers. Figure Fig- ure 5.2 shows the precision-recall graph for the face mask detection system. The accuracy of the Masked face recognition system with wearing mask is 79.9% de- pending on the conditions like lighting, angle of the face, etc., whereas the accuracy of the system without wearing the mask is about 92.34%. Figure Figure 5.3 shows the masked face recognition using SVC. 5.2.1 Results from Test of Image processing and cropping with different Image Resolution As show in Table 5.1, the resolution has a vital impact on the time taken to process an image for all the version. The multiprocessing version scaled exactly as expected, with the 47
  • 60. Figure 5.2: Precision-Recall curve for the Face mask detection system (a) (b) Figure 5.3: (a) Face Mask recognition with wearing a mask. (b) Face Mask recognition without wearing a mask. 48
  • 61. Figure 5.4: Time taken to process an image vs the resolution of the image doubling of the number of cores, the run time decreased by roughly fifty percent, with a bit of overhead due to the increase in the number of cores. Converting the run time into frames per second gives about 11.2 frames per second for the original implementation at an image resolution of 480x640, which is the same resolution as a typical security camera. The 4-core multiprocessed version achieved a throughput of around 30.8 frames per second, which is almost 2.75 times faster than the original version, whereas the CUDA version, at 110 frames per second for a 480x640 image, was around 10 times faster than the original version. The run time seems to increase linearly with the number of pixels, as can be seen in Figure 5.4. 5.2.2 Results from Test of Masked Face recognition with different Image Resolution As can be seen from Table 5.2, for the face recognition system the run-time for different image resolutions did not differ by much between the two multiprocessing versions. The fastest 49
• 62. 5.2.2 Results from the Test of Masked Face Recognition with Different Image Resolutions As can be seen from Table 5.2, the run time of the face recognition system for different image resolutions did not differ by much between the two multiprocessing versions.

Table 5.1: Time taken to process an image vs. the resolution of the image

  Image Resolution (Pixels) | CPU Time | Multiprocessing Time (2 cores) | Multiprocessing Time (4 cores) | CUDA Time
  480 x 640                 | 8.9      | 5.1                            | 3.24                           | 0.9
  960 x 1280                | 17.34    | 9.3                            | 4.9                            | 1.25
  1280 x 720                | 23.56    | 12.12                          | 7.18                           | 1.9
  2560 x 1440               | 58.98    | 14.78                          | 8.12                           | 3.28

Table 5.2: Time taken to recognize a face mask vs. the resolution of the image

  Image Resolution (Pixels) | CPU Time | Multiprocessing Time (2 cores) | Multiprocessing Time (4 cores) | CUDA Time
  401 x 218                 | 0.34     | 0.07                           | 0.04                           | 0.00123
  960 x 640                 | 0.83     | 0.098                          | 0.06                           | 0.004
  1280 x 720                | 1.104    | 0.34                           | 0.19                           | 0.01
  2560 x 1440               | 1.35     | 0.78                           | 0.44                           | 0.05

The fastest version is CUDA, and it maintains its advantage throughout the testing, especially on high-resolution images. This makes the CUDA version the better choice for recognizing faces or masks in high-resolution images. 5.3 Experiment Design for the STEADi Application In this thesis, SPIN smart scooters were used for experimental rides to test the STEADi application in the scenarios listed below. 1. Up and down slopes: The rider rides the scooter on a road with slopes, with the insole in his/her shoes. 2. On grass: The rider rides the scooter on grass or muddy road conditions with the insole in his/her shoes. 3. On roads with numerous small potholes: The rider tries to ride on a road with numerous potholes so that we can collect data on how the pressure changes under
• 63. Figure 5.5: Time taken to recognize a face mask vs. the resolution of the image. this condition. 4. Asking a student who has never ridden a scooter to try it for the first time with the insole: It is necessary to record data from a rider who is very new to this or has never ridden the scooter before. 5. Taking sharp turns: Taking sharp turns can induce imbalance, and hence it is very informative to record such data. 6. Mimicking the moment when the rider is about to fall by deliberately leaning more to either the left or the right side: It is not only new riders who struggle to balance; sometimes a rider with a hundred rides of experience can make a mistake and fall, which made it crucial to record such data as well. The riders were asked to perform the experiments both with and without the smart insole system. During each experiment, all the gait parameters from the smart insole
• 64. were recorded using a smartphone. The smart insole pressure values were recorded for each scenario and saved into a CSV (comma-separated values) file. 5.4 Results for the STEADi Application Two main experiments were conducted based on the different scenarios mentioned in Section 5.3. Based on the data recorded from the insole system, it was observed that the cumulative values of the 96 insole pressure sensors lie within the range [2, 3.2]. A value of 2 is recorded when very little pressure is exerted on the insole, and 3.2 is recorded when the pressure exerted on the insole is at its maximum. When a rider is standing on the ground, the recorded pressure lies in the range [2, 2.5], but when the rider puts his/her foot down to balance the scooter, the pressure rises to the range [2.9, 3.2]. This is because a person standing on the ground distributes the pressure over the larger surface area of both feet, whereas a person putting one foot down to balance concentrates the body weight on the surface area of a single foot, so more pressure is exerted. After every experiment, the rider was asked to repeat the experiment without the insole system. The balancing system was not able to differentiate between the rider taking a turn and the rider balancing: it perceived that whenever the rider was taking a turn, or tilting with the road, he/she was unable to balance, and therefore turned the "BALANCING" screen red. 5.4.1 First-Time Rider of the Smart Scooter Figure 5.6 shows the pressure distribution of a first-time rider of the smart scooter. The graph shows that the pressure exerted by the rider is unstable. An elevation in the graph indicates that the rider put his/her foot down after losing balance on the scooter, whereas dips in the pressure values show the rider trying to balance the scooter by exerting less pressure. The figure also shows various other patterns of the first-time rider. The data for the first 100 ms indicates that the rider was almost able to balance the scooter.
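To make the thresholding logic above concrete, the sketch below detects foot-down events from the cumulative insole pressure stream using the ranges reported in this section. It is an illustrative sketch, not the STEADi source: the threshold constant and function names are assumptions based on the [2.9, 3.2] foot-down range described above.

    # Cumulative insole pressure is reported in [2.0, 3.2]; standing on the ground gives
    # roughly [2.0, 2.5], while putting a foot down to recover balance pushes the value
    # into [2.9, 3.2]. Values at or above FOOT_DOWN_THRESHOLD are therefore treated as
    # the start of a foot-down (balance-recovery) event.
    FOOT_DOWN_THRESHOLD = 2.9

    def detect_foot_down_events(pressure_samples, threshold=FOOT_DOWN_THRESHOLD):
        """Return the sample indices where a foot-down event begins (rising edges)."""
        events = []
        previously_down = False
        for i, value in enumerate(pressure_samples):
            is_down = value >= threshold
            if is_down and not previously_down:
                events.append(i)  # foot just touched the ground
            previously_down = is_down
        return events

    # Example: standing (~2.2), riding, then a foot-down spike around sample 4.
    samples = [2.1, 2.2, 2.3, 2.4, 3.0, 3.1, 2.5, 2.2]
    print(detect_foot_down_events(samples))  # -> [4]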
• 65. After the 100 ms mark, the rider had some problems balancing and put his/her foot down more often as the scooter became unbalanced. The different peaks in the graph indicate how much pressure the rider exerted when putting his/her foot on the ground: the higher the pressure, the more likely it was that the rider was about to fall while balancing the scooter. Figure 5.6: Insole sensor data when the rider was riding for the first time. 5.4.2 Riding the Scooter on Different Terrains: Up and Down the Slope and Riding on Grass The maximum pressure on the insole system was recorded in two scenarios: first, when the rider goes up and down the hill, and second, when riding on grass. Interestingly, the values collected on the grass and on the slopes were the same. A possible reason is that on less sturdy ground the body-weight pressure on the insole increases. Figure 5.7 shows the recording when the rider goes up and down the slope. The pressure exerted is highest here, because the rider has to balance the most during that time. Sudden changes in the pressure values suggest potholes on the road. Potholes can be differentiated from the foot-down action of the rider based on timing. Usually, if a rider puts his/her foot down, the minimum amount
• 66. of time for the rider to put his/her foot back and start riding again is 20 ms, whereas for potholes the pressure value fluctuates every 5 seconds, depending on the number of potholes on the road. Figure 5.7: Insole sensor data when the rider goes up the hill. We also recorded values from an experienced rider who attempted the various activities mentioned above simultaneously in a single ride. Figure 5.8 shows the graph for this scenario. 5.5 Discussion and Conclusion The accuracy of the balancing algorithm was very difficult to calculate. One of the main reasons was potholes: every time a rider encountered potholes on the road, the pressure sensor values fluctuated rapidly, rendering the balancing algorithm unusable. The pothole detection algorithm was tested in real time on an Android smartphone as well as on a computer, using 200 images. The performance metrics used were precision and recall: the method obtained 82.56% average precision and 84.12% recall on the computer, whereas it obtained 64.56% average precision and 69.12% recall on the
• 67. Figure 5.8: Insole sensor data when the rider tried every experiment simultaneously. smartphone. The system reached a processing speed of 0.031 s per frame (31 FPS) on the computer, whereas when deployed on a smartphone the processing speed decreased to 0.016 s (16 FPS), even with a large reduction in model size and computational complexity. For the masked face authentication, even though the 4-core version did not disappoint, a 16-core CPU would theoretically be able to keep up with the CUDA version. However, it is more likely for a person to have a CUDA-capable Graphical Processing Unit (GPU) than a very high-end 16-core CPU. The average consumer has a 4-core processor, so it was most reasonable to compare that against CUDA. There are both advantages and disadvantages to using multiprocessing and CUDA-supported systems. Because of continuous improvements to CUDA and its features, support can vary widely from one CUDA version to another and can also depend on the GPU generation. The biggest advantage of multiprocessing is that it will run on all processors regardless of generation. The CUDA version is faster in all cases, especially when dealing with high-resolution images, whereas the multiprocessing system still processes images significantly faster than the plain CPU version and does not require any additional hardware
• 68. to run, unlike CUDA. Put simply, every system contains a CPU, but not every system contains a CUDA-capable GPU.
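In practice, this trade-off can be handled by probing for a CUDA device at start-up and falling back to a multiprocessing pool when none is found. The sketch below shows one way to structure that check; the helper names and the placeholder preprocessing step are illustrative, not part of the thesis implementation.

    import multiprocessing as mp

    import cv2

    def has_cuda():
        # getCudaEnabledDeviceCount() returns 0 when OpenCV was built without CUDA
        # or when no CUDA-capable GPU is present on the machine.
        try:
            return cv2.cuda.getCudaEnabledDeviceCount() > 0
        except (AttributeError, cv2.error):
            return False

    def preprocess(path):
        # Placeholder single-image step: read, convert to grayscale, and resize.
        img = cv2.imread(path)
        return cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (160, 160))

    def process_batch(paths):
        if has_cuda():
            # A CUDA-capable GPU is available: hand frames to the GPU pipeline
            # (omitted in this sketch; see the CUDA timing discussion above).
            return [preprocess(p) for p in paths]
        # No GPU: fall back to multiprocessing across the available CPU cores.
        with mp.Pool(processes=mp.cpu_count()) as pool:
            return pool.map(preprocess, paths)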
• 69. CHAPTER 6 CONCLUSION AND FUTURE WORK 6.1 Conclusion This thesis presents an Android-based smart-scooter rider assistance system that consists of four main modules: the mobile application, cameras, insole sensors, and the facial authentication system. These modules allow the system to recognize rider and scooter balancing behavior and to produce alerts and warnings when dangerous situations such as imbalance and potholes are detected. We also detailed an appropriate ontology for the recognized rider and vehicle behaviors, as well as the balancing and pothole detection algorithms running on an Android-based smartphone. The STEADi app was tried on multiple Android devices. It was found that the app glitches on older devices because the OpenCV module consumes a tremendous amount of resources. We offloaded the UI thread by moving all computation to background threads, but this still did not solve the issue: on some outdated devices, even updating the UI with feedback from background threads is difficult to do in real time. The main purpose of the authentication system within the STEADi application was to provide a way to recognize faces even when the rider is wearing a mask, along with some possible methods to improve its performance. Due to the changing world and the need to adapt to new hardships, it was otherwise impossible for riders to open or close the application while wearing masks. As of now, there are only about five or six systems that can recognize the identity of a person wearing a mask, and they are all far from matching generalized facial recognition software. Even though our facial recognition system does not have the best accuracy among similar systems, it is unique in that it can learn from a very small dataset, which allows users to train it on their face by providing just two minutes of their time for the system to take pictures. In this thesis, we tried to create a system as
• 70. close as possible to what a normal user has on their smartphone. The smartphone-based solutions we elaborate on here do not require any specific model of scooter, because the core sensor is placed inside the shoes and the auxiliary sensors are the accelerometers and gyroscopes that come with mobile devices [42]. Rider safety is of the utmost importance, and no other system currently tackles this problem specifically for smart scooters; the proposed system focuses on providing student smart-scooter riders with a reliable and efficient assistance system. 6.2 Future Scope and Research Although this system is the first of its kind, it has huge potential and future scope. It can be combined with e-scooter rider interaction data to provide further safety features such as collision alerts, speeding alerts, and so on. Integrating a rider interaction system with the rider assistance system would make the overall system more robust. Some of the tasks such a combined system could address are as follows: 1. Rider alert system: Streets are a shared resource used by different users for a multitude of reasons. Combining rider interaction data with Google Maps data would help create a system that alerts drivers, based on time and place, to the roads and situations where accidents are most probable. 2. Capturing e-scooter interactions with other road users. 3. Determining e-scooter presence in traffic: The e-scooter rider interaction data would indicate the presence of e-scooters on the road and tell us how many people actually ride them. 4. Recording the general behavior of e-scooter riders, such as helmet use, parking, speeding, and compliance with the general rules of the road. There is ample opportunity to improve the face authentication technology that the proposed rider assistance system uses. Using a better classifier than SVC to classify
• 71. masked faces can be one of the options for future work. This thesis, on the other hand, tried to present the best possible approaches for face detection and feature extraction for masked faces. As for the parallelization, CUDA streams could be implemented: they provide asynchronous kernel launches, in contrast to the standard synchronous kernel launches that CUDA uses by default. The use of unified memory allocation could be another direction for optimizing the proposed CUDA implementation [43]. There is an urgent need for systems that help and protect students on and off campus, and with advancing technology this can be achieved in an efficient and optimized way.
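As a pointer for that direction, the sketch below shows what a stream-based version of the OpenCV preprocessing could look like in Python: each frame gets its own CUDA stream so that uploads and kernels from different frames can overlap rather than serializing on the default stream. This is an illustration of the idea under the assumption of an OpenCV build with CUDA support, not the thesis implementation.

    import cv2
    import numpy as np

    def preprocess_async(frames):
        streams, gpu_results = [], []
        for frame in frames:
            stream = cv2.cuda.Stream()          # one stream per frame
            gpu_frame = cv2.cuda_GpuMat()
            gpu_frame.upload(frame, stream)     # asynchronous host-to-device copy
            gray = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY, stream=stream)
            resized = cv2.cuda.resize(gray, (160, 160), stream=stream)
            streams.append(stream)
            gpu_results.append(resized)
        # Synchronize only after all work has been queued on the streams.
        for stream in streams:
            stream.waitForCompletion()
        return [g.download() for g in gpu_results]

    if __name__ == "__main__":
        fake_frames = [np.random.randint(0, 255, (480, 640, 3), np.uint8) for _ in range(4)]
        outputs = preprocess_async(fake_frames)
        print(len(outputs), outputs[0].shape)  # 4 (160, 160)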
• 72. REFERENCES [1] D. Chen, Y. Cai, X. Qian, R. Ansari, W. Xu, K.-C. Chu, and M.-C. Huang, “Bring gait lab to everyday life: Gait analysis in terms of activities of daily living,” IEEE Internet of Things Journal, vol. 7, no. 2, pp. 1298–1312, 2019. [2] D. Pinson and V. Yadav, “DefectDetect: An Android app that identifies potholes on a road,” GitHub repository, 2018. [3] L. Zhang, G. Gui, A. M. Khattak, M. Wang, W. Gao, and J. Jia, “Multi-task cascaded convolutional networks based intelligent fruit detection for designing automated robot,” IEEE Access, vol. 7, pp. 56 028–56 038, 2019. [4] P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, “SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2,” Sustainable Cities and Society, vol. 66, p. 102 692, 2021. [5] W.-H. Lin, P. Wang, and C.-F. Tsai, “Face recognition using support vector model classifier for user authentication,” Electronic Commerce Research and Applications, vol. 18, pp. 71–82, 2016. [6] “What is a shared electric scooter?” https://www.portlandoregon.gov/transportation/77294, 2020. [7] D. A. Drysdale, Campus Attacks: Targeted Violence Affecting Institutions of Higher Education. DIANE Publishing, 2010. [8] H. Fitt and A. Curl, “Perceptions and experiences of Lime scooters: Summary survey results,” 2019. [9] D. Chen, Y. Cai, and M.-C. Huang, “Customizable pressure sensor array: Design and evaluation,” IEEE Sensors Journal, vol. 18, no. 15, pp. 6337–6344, 2018. [10] R. Fan, U. Ozgunalp, B. Hosking, M. Liu, and I. Pitas, “Pothole detection based on disparity transformation and road surface modeling,” IEEE Transactions on Image Processing, vol. 29, pp. 897–908, 2019. [11] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu, P. Yi, K. Jiang, N. Wang, Y. Pei, et al., “Masked face recognition dataset and application,” arXiv preprint arXiv:2003.09093, 2020. [12] P. E. Hadjidoukas, V. V. Dimakopoulos, M. Delakis, and C. Garcia, “A high-performance face detection system using OpenMP,” Concurrency and Computation: Practice and Experience, vol. 21, no. 15, pp. 1819–1837, 2009.
• 73. [13] R. Steele, A. Lo, C. Secombe, and Y. K. Wong, “Elderly persons’ perception and acceptance of using wireless sensor networks to assist healthcare,” International Journal of Medical Informatics, vol. 78, no. 12, pp. 788–801, 2009. [14] Google, “Android Studio,” https://developer.android.com/guide/topics/sensors/sensors_motion.html#sensors-motion-grav, 2019. [15] A. Mordvintsev and K. Abid, “Introduction to OpenCV-Python tutorials,” OpenCV official site, 2013. [16] J. Annuzzi, L. Darcey, and S. Conder, Introduction to Android Application Development: Android Essentials. Pearson Education, 2014. [17] A. Kaehler and G. Bradski, Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library. O’Reilly Media, Inc., 2016. [18] R. Rothe, M. Guillaumin, and L. Van Gool, “Non-maximum suppression for object detection by passing messages between windows,” in Asian Conference on Computer Vision, Springer, 2014, pp. 290–306. [19] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034. [20] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823. [21] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017. [22] C.-F. Wang, “A basic introduction to separable convolutions,” https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728, 2020. [23] NVIDIA, P. Vingelmann, and F. H. Fitzek, CUDA, release 10.2.89, 2020. [24] D. Bikov, M. Pashinska, and N. Stojkovic, “Parallel programming with CUDA and MPI,” 2020. [25] S. M. Cain and N. C. Perkins, “Comparison of experimental data to a model for bicycle steady-state turning,” Vehicle System Dynamics, vol. 50, no. 8, pp. 1341–1364, 2012.
• 74. [26] J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, and H. Balakrishnan, “The pothole patrol: Using a mobile sensor network for road surface monitoring,” in Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services, 2008, pp. 29–39. [27] Y.-c. Tai, C.-w. Chan, and J. Y.-j. Hsu, “Automatic road anomaly detection using smart mobile device,” in Conference on Technologies and Applications of Artificial Intelligence, Hsinchu, Taiwan, Citeseer, 2010. [28] S. J. Bhutekar and A. K. Manjaramkar, “Parallel face detection and recognition on GPU,” International Journal of Computer Science and Information Technologies, vol. 5, no. 2, pp. 2013–2018, 2014. [29] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004. [30] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991. [31] T. He, Z. Dong, K. Meng, H. Wang, and Y. Oh, “Accelerating multi-layer perceptron based short term demand forecasting using graphics processing units,” in 2009 Transmission & Distribution Conference & Exposition: Asia and Pacific, IEEE, 2009, pp. 1–4. [32] Corsight, “A face-recognition tech that works even for masked faces,” https://www.israel21c.org/a-face-recognition-tech-that-works-even-for-masked-faces/, 2020. [33] M. Geng, P. Peng, Y. Huang, and Y. Tian, “Masked face recognition with generative data augmentation and domain constrained ranking,” in Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 2246–2254. [34] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014. [35] P. Bagchi, D. Bhattacharjee, and M. Nasipuri, “Robust 3D face recognition in presence of pose and partial occlusions or missing parts,” arXiv preprint arXiv:1408.3709, 2014. [36] H. Drira, B. B. Amor, A. Srivastava, M. Daoudi, and R. Slama, “3D face recognition under expressions, occlusions, and pose variations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2270–2283, 2013.
• 75. [37] A. S. Gawali and R. R. Deshmukh, “3D face recognition using geodesic facial curves to handle expression, occlusion and pose variations,” International Journal of Computer Science and Information Technologies, vol. 5, no. 3, pp. 4284–4287, 2014. [38] Spin, “Spin ADAS,” https://blog.spin.pm/spin-insight-building-adas-for-micromobility-98a3d88aa976, 2021. [39] Google, “Maps SDK,” https://developers.google.com/maps/documentation/android-sdk/overview, 2019. [40] J. Pedoeem and R. Huang, “YOLO-LITE: A real-time object detection algorithm optimized for non-GPU computers,” arXiv preprint arXiv:1811.05588, 2018. [41] “YOLO: Real-time object detection,” https://pjreddie.com/darknet/yolo/, 2018. [42] R. Foppen, “Smart insoles: Prevention of falls in older people through instant risk analysis and signalling,” 2020. [43] R. Landaverde, T. Zhang, A. K. Coskun, and M. Herbordt, “An investigation of unified memory access performance in CUDA,” in 2014 IEEE High Performance Extreme Computing Conference (HPEC), IEEE, 2014, pp. 1–6.