Ministry of Higher Education and Scientific Research
Monastir University
*-*-*-*-*
Higher Institute of Computer Science and Mathematics of
Monastir
Graduation Project
Android application for image recognition
using artificial neural network
A thesis presented to obtain a master's
degree in software engineering
Created by:
Helmi Ben Khalifa
Dr. Asma Kerkeni President
Dr. Manel Sekma Examiner
Dr. Souhail Mallat Supervisor
University year: 2020/2021
Acknowledgments
I want to thank:
My family, who has always supported and encouraged
me during the realization of this project,
My supervisor Dr. Souhail Mallat, for supporting this
project and providing valuable assistance,
All executives of the Higher Institute of Computer
Science and Mathematics of Monastir (ISIMM),
And the thesis committee who agreed to participate in
the evaluation of this project.
Thank you very much for helping me when I needed
help. Thank you very much for the support you have
given me to finish this work.
Abstract
This project is carried out as part of an end-of-studies dissertation presented
to the Higher Institute of Computer Science and Mathematics of Monastir
(ISIMM) to obtain a master's degree in software engineering. This thesis deals
with the implementation of an Android application for image classification.
The application works as a food detector for famous Tunisian dishes: it
applies a convolutional neural network to process images recorded by the
phone's built-in camera and provides an accurate prediction of what the dish
might be.
Keywords: Machine Learning, Image classification, Transfer learning, Computer
vision, Image recognition, PyTorch, Android.
Résumé
This project is carried out as part of the preparation of an end-of-studies
dissertation presented to the Higher Institute of Computer Science and
Mathematics of Monastir (ISIMM), with a view to obtaining a professional
master's degree in software engineering. This thesis concerns the
implementation of an Android application for image classification. The
application works as a food detector for famous Tunisian dishes by applying a
convolutional neural network to process images from the phone's camera, and
provides an accurate prediction of what the dish might be.
Keywords: Machine Learning, Image classification, Transfer learning, Computer
vision, Image recognition, PyTorch, Android.
Table of Contents
General Introduction......................................................................................................1
Introduction and context ............................................................................2
1.1 Introduction.................................................................................................................2
1.2 Project description.......................................................................................................2
1.3 Motivation...................................................................................................................3
1.4 Aim of the work ..........................................................................................................3
1.5 Dissertation structure...................................................................................................4
1.6 Development Methodology.........................................................................................4
1.6.1 Software development processes available..........................................................5
1.6.2 Methodologies comparison table.........................................................................8
1.6.3 The chosen method ............................................................................................10
1.7 Project Planning ........................................................................................................10
1.7.1 Project management processes ..........................................................................11
1.7.2 Gantt chart..........................................................................................................11
1.8 Conclusion.................................................................................................................12
Literature review....................................................................................... 13
2.1 Introduction...............................................................................................................13
2.2 Data Science..............................................................................................................13
2.2.1 Acquiring and storing data.................................................................................14
2.2.2 Asking Questions...............................................................................................14
2.2.3 Data preparation.................................................................................................14
2.2.4 Exploring data....................................................................................................14
2.2.5 Machine learning model ....................................................................................14
2.2.6 Visualization and communication......................................................................14
2.2.7 Deployment........................................................................................................14
2.3 Machine learning.......................................................................................................15
2.3.1 Unsupervised Learning......................................................15
2.3.2 Supervised Learning........................................................16
2.4 Mathematics for AI ...................................................................................................19
2.4.1 Linear algebra ....................................................................................................19
2.4.2 Calculus..............................................................................................................21
2.4.3 Multivariable calculus........................................................................................22
2.4.4 Mathematics Behind Neural Networks..............................................................23
2.5 Deep Learning ..........................................................29
2.5.1 Artificial Neural Network..................................................................................28
2.5.2 Recurrent Neural Network.................................................................................29
2.5.3 Convolutional Neural Network..........................................................................29
2.6 Neural network evaluation metrics ...........................................................................31
2.6.1 Classification accuracy ......................................................................................31
2.6.2 Confusion matrix ...............................................................................................31
2.6.3 Log Loss.............................................................................................................32
2.7 Conclusion.................................................................................................................32
Gathering requirements............................................................................33
3.1 Introduction...............................................................................................................33
3.2 Requirements.............................................................................................................33
3.2.1 Functional requirements.....................................................................................33
3.2.2 Nonfunctional requirements...............................................................................34
3.3 Technical requirements .............................................................................................34
3.3.1 Tools for preparing data.....................................................................................34
3.3.2 Machine learning method ..................................................................................36
3.3.3 PyTorch..............................................................................................................38
3.3.4 Deployment........................................................................................................41
3.4 Conclusion.................................................................................................................44
Software Design and Architecture........................................................... 45
4.1 Introduction...............................................................................................................45
4.2 Software best practices..............................................................................................45
4.2.1 Clean and modular code.....................................................................................45
4.2.2 Efficient code.....................................................................................................45
4.2.3 Refactoring code ................................................................................................45
4.2.4 Documentation...................................................................................................46
4.2.5 Version control...................................................................................................46
4.3 Software Design........................................................................................................46
4.4 ML Lifecycle.............................................................................................................47
4.4.1 Preparing data ....................................................................................................48
4.4.2 Algorithm Selection...........................................................................................48
4.4.3 Training the Model ............................................................................................51
4.4.4 Evaluating the Model.........................................................................................52
4.4.5 Deploying the model..........................................................................................52
4.5 Android software design ...........................................................................................52
4.5.1 Android Applications structure..........................................................................52
4.5.2 Activity lifecycle................................................................................................53
4.5.3 Software architecture .........................................................................................54
4.6 Conclusion.................................................................................................................57
Realization..................................................................................................58
5.1 Introduction...............................................................................................................58
5.2 Preparing the Development environment..................................................................58
5.2.1 Devices...............................................................................................................58
5.2.2 Tools ..................................................................................................................58
5.2.3 Installation..........................................................................................................58
5.3 Data preparing...........................................................................................................59
5.3.1 Gathering images ...............................................................................................59
5.3.2 Deleting redundant images ................................................................................60
5.3.3 Cropping Images................................................................................................60
5.3.4 Resizing and normalizing data...........................................................................60
5.3.5 Creating validation and test data........................................................................60
5.4 Creating the neural network model...........................................................................61
5.4.1 Pretrained model ................................................................................................61
5.4.2 Classifier Model.................................................................................................61
5.5 Training the model ....................................................................................................61
5.6 Training progress Test...............................................................................................61
5.7 Testing the model accuracy.......................................................................................62
5.8 Saving Model (serialization).....................................................................................64
5.9 Creating the Android application..............................................................................64
5.10 User Interface Design ............................................................................................65
5.10.1 Welcome Activity..............................................................................................65
5.10.2 Camera classification activity............................................................................65
5.10.3 Description activity............................................................................................66
5.11 Application icon ....................................................................................................66
5.12 Activities realization..............................................................................................66
5.12.1 Welcome Activity..............................................................................................66
5.12.2 Camera classification activity............................................................................67
5.12.3 Description Activity...........................................................................................67
5.13 Conclusion.............................................................................................................67
General Conclusion.......................................................................................................68
Webography .................................................................................................................. 69
Appendix........................................................................................................................ 72
List of figures
Figure 1 : Project description.....................................................................................................2
Figure 2 : Agile Methodology [6]..............................................................................................5
Figure 3 : Scrum Methodology [7] ............................................................................................5
Figure 4 : Extreme programming Methodology [8] ..................................................................6
Figure 5 : Unified Process [9]....................................................................................................7
Figure 6 : Rational Unified Process [6] .....................................................................................7
Figure 7 : Waterfall with overlapping phases [10] ....................................................................8
Figure 8 : Project management processes [12] ........................................................................11
Figure 9 : Gantt chart ...............................................................................................................12
Figure 10 : The data science process [13]................................................................................13
Figure 11 : K-means clustering................................................................................................15
Figure 12 : Hierarchical clustering ..........................................................................................16
Figure 13 : Support vector machine.........................................................................................17
Figure 14 : Nearest Neighbors ..................................................18
Figure 15 : Linear regression...................................................................................................18
Figure 16 : Neural network......................................................................................................19
Figure 17 : Neural network structure.......................................................................................21
Figure 18 : The Derivative As A Function ...............................21
Figure 19 : Multivariable functions .........................................................................................22
Figure 20 : The structure of an artificial neuron......................................................................24
Figure 21 : Example of a simple neural network.....................................................................27
Figure 22 : The chain rule........................................................................................................27
Figure 23 : Backpropagation....................................................................................................28
Figure 24 : Representation of RNN both in folded and unfolded forms .................................29
Figure 25 : Convolutional Neural Network .............................................................................30
Figure 26 : The annual winner of the ImageNet challenge [21]..............................................30
Figure 27 : Confusion matrix...................................................................................................31
Figure 28 : PurseHub interface ................................................................................................35
Figure 29 : Duplicate Photo Cleaner........................................................................................35
Figure 30 : JPEGCrops interface .............................................................................................36
Figure 31 : Online job listing growth.......................................................................................37
Figure 32 : Anaconda Navigator..............................................................................................38
Figure 33 : VSCode screenshot...................................................39
Figure 34 : Jupyter Notebooks in Visual Studio Code ............................................................39
Figure 35 : Matplotlib style sheets...........................................................................................40
Figure 36 : CUDA ecosystem diagram....................................................................................41
Figure 37 : OS market share [26].............................................................................................41
Figure 38 : Android studio interface........................................................................................43
Figure 39 : Material Design Components................................................................................44
Figure 40 : Project Structure....................................................................................................47
Figure 41 : Machine Learning Lifecycle [13]..........................................................................47
Figure 42 : Data Structures ......................................................................................................48
Figure 43 : Transfer Learning Technique ................................................................................48
Figure 44 : Neural network Structure ......................................................................................49
Figure 45 : Classifier Structure................................................................................................50
Figure 46 : Data Sets Division.................................................................................................51
Figure 47 : Checkpoints Design Pattern [31]...........................................................................51
Figure 48 : Activity lifecycle in Android [33].........................................................................53
Figure 49 : Welcome Activity.....................................................54
Figure 50 : CameraClassification Activity ..............................................................................55
Figure 51 : Description Activity ..............................................................................................55
Figure 52 : CameraClassification Class Diagram....................................................................56
Figure 53 : VSCode View........................................................................................................59
Figure 54 : Images Sample ......................................................................................................59
Figure 55 : Deleting repeated images. .....................................................................................60
Figure 56 : Cropping Images ...................................................................................................60
Figure 57 : Training progress...................................................................................................62
Figure 58 : Test Accuracy Output............................................................................................62
Figure 59 : Test Examples .......................................................................................................64
Figure 60 : Android studio Project...........................................................................................64
Figure 61 : Welcome Activity Layout .....................................................................................65
Figure 62 : Camera Classification Activity .............................................................................65
Figure 63 : Description Activity Layout..................................................................................66
List of tables
Table 1 : Methodologies comparison table..............................................................................10
Table 2 : Project tasks timeline................................................................................................12
Table 3 : Activation functions..................................................................................................25
Table 4 : Functional requirements ...........................................................................................33
Table 5 : Nonfunctional requirements .....................................................................................34
Table 6 : TensorFlow and PyTorch comparison......................................................................37
Table 7 : Kotlin, Java Comparison [27]..................................42
Table 8 : UI Components.........................................................................................................56
Table 9 : Kotlin Classes Description .......................................................................................57
Table 10 : Device Characteristics ............................................................................................58
Table 11: Development tools...................................................................................................58
Table 12 : Model accuracy on images from the web...............................................................63
Table 13 : Model accuracy on image from phone camera.......................................................63
List of Symbols
๐‘ƒ(๐‘ฅ), ๐‘ƒ(๐‘ฆ) : The independent probabilities of ๐‘ฅ and ๐‘ฆ
๐‘(๐‘ฅ|๐‘ฆ) : Probability ๐‘ฅ if the given ๐‘ฆ is true
โˆฅ
โˆฅ ๐‘ฃ
โ†’
โˆฅ
โˆฅ : Magnitude of a Vector
โˆ‘ ๐‘ฅ๐‘–
๐‘›
๐‘–=0 : Sum of the ๐‘ฅ๐‘–: ๐‘ฅ1 + . . . + ๐‘ฅ ๐‘›
๐‘‘๐‘“
dx
: Total derivative of ๐‘“ with respect to x
๐‘“(๐‘”(๐‘ฅ)) : Function composition
๐œ•๐‘“
๐œ•๐‘ฅ
: Partial derivative of ๐‘“ with respect to x
๐›ป๐‘“ : Gradient of a function ๐‘“
๐‘Ÿ๐‘’๐‘™๐‘ข(๐‘ฅ) : Rectified linear unit function
tanh(๐‘ฅ) : Hyperbolic tangent function
log(๐‘ฅ) : Logarithmic function
โ† : Assignment operator
List of Abbreviations
ML : Machine Learning
ANN : Artificial Neural Network
CNN : Convolutional Neural Network
RNN : Recurrent Neural Network
AI : Artificial Intelligence
XP : Extreme Programing
RUP : Rational Unified Process
API : Application Programming Interface
UI : User Interface
TP : True Positive
FP : False Positive
TN : True Negative
FN : False Negative
IDE : Integrated Development Environment
SVM : Support Vector Machine
VSCode : Visual Studio Code
OS : Operating System
JDK : Java Development Kit
SDK : Software Development Kit
DPC : Duplicate Photo Cleaner
GPU : Graphics Processing Unit
CPU : Central Processing Unit
ReLU : Rectified Linear Unit
General Introduction
These days, we are living in the golden age of artificial intelligence, which some have called
the next industrial revolution. This is especially true in machine learning and deep learning,
owing to the availability of massive data sets, known as big data, which keep growing very
fast and cover everything related to our lives: from images and videos posted daily on social
media websites to data collected periodically by smart sensors spread all over the world to
measure climate change and weather conditions.
Computer performance has also made breakthroughs possible in many fields of application,
including applications that a few years ago were just science fiction. The dramatic increase
in computational power and parallel computing in recent years removed many of the barriers in
the way of artificial intelligence and machine learning; these fields have been around since
the 1950s but remained mostly theoretical [1], due to the lack of powerful computing
resources and large data sets.
Artificial intelligence now contributes to our living conditions in a variety of ways. Image
recognition systems have achieved human-like performance, autonomous vehicles are
increasingly becoming a reality, business models are changing rapidly, and in medicine AI
enables automated clinical diagnoses and suggests treatments. It is very important to take
advantage of the available opportunities and provide useful solutions. This mission is not
exclusive to big tech companies: any software developer can also contribute to the artificial
intelligence field by deploying apps, especially for lightweight devices like smartphones,
which do not require a considerable budget or expensive equipment.
Smartphones and other mobile devices now dominate the market and improve every day. The
field of mobile devices is very dynamic and rapidly developing: programs that formerly
required powerful home computers are nowadays successfully ported to various mobile devices,
and people can carry much valuable equipment in a briefcase or pocket. Moreover, this opens
the door for AI to deliver practical products, innovative solutions, and smart services by
taking advantage of the cutting-edge technologies that phone providers offer, which
encourages the development of more demanding applications. In this project, we will try to
participate in the artificial intelligence revolution by developing an application using the
best technologies in this field.
Introduction and context
1.1 Introduction
The idea of this project is inspired by the famous "Cats vs Dogs" classification problem,
considered the "hello world" program of ML [2], in which we make a computer recognize
and discern between dog and cat images. ML is an application of artificial intelligence in
which we teach a machine to perform smart tasks. ML itself is a broad field of science, and
one of its most widely used subfields is computer vision, within which our project falls.
This chapter will present the roadmap and the software development process we will use to
create a complete image recognition program.
1.2 Project description
This project is a mobile application based on deep learning algorithms that runs on the
Android system. The application's main role is to identify different Tunisian dishes from
the live image feed of the phone's camera, as shown in figure 1, and to provide additional
information on the result, including calories, ingredients, etc. The application can
recognize up to ten different categories of Tunisian food.
Figure 1 : Project description
No internet connection is required on the user's phone, as the ML model will be integrated
directly into the application, and all of the image analysis will be performed directly by
the phone's processor. We can summarize the functionality of the app in four steps:
1. The user takes a picture of his food with the phone's camera.
2. Food Detector app analyses the food picture using a neural network.
3. Food Detector app displays what the food might be.
4. The user checks the complete food description.
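The four steps above can be sketched as a minimal pipeline. Every name in this sketch is a hypothetical placeholder standing in for the phone's camera API, the embedded neural network, and the bundled food database; the real application runs a serialized PyTorch model on-device rather than these stubs.

```python
# Hypothetical sketch of the app's four-step flow; none of these
# functions belong to the actual application's API.

def take_picture():
    # Stand-in for the phone camera: returns a dummy grayscale
    # image as a 224x224 grid of pixel intensities.
    return [[0.5] * 224 for _ in range(224)]

def classify_food(image):
    # Stand-in for the CNN: returns a (label, confidence) pair.
    # A real model would run a forward pass over the pixel tensor.
    return "couscous", 0.93

def food_description(label):
    # Stand-in for the local food database shipped with the app.
    descriptions = {
        "couscous": "Steamed semolina with vegetables and meat.",
    }
    return descriptions.get(label, "No description available.")

if __name__ == "__main__":
    image = take_picture()                    # 1. take a picture
    label, confidence = classify_food(image)  # 2. analyse it with the model
    print(f"{label} ({confidence:.0%})")      # 3. display the prediction
    print(food_description(label))            # 4. show the full description
```

Because the model ships inside the APK, every step of this pipeline runs locally, which is why the app needs no internet connection.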
In later sections, we will dive deeper into this application functionality and its architecture.
1.3 Motivation
After this brief introduction to the project, the utility of the app might still seem
unclear. To remove this ambiguity, we will mention some of the problems this app tries to
solve or minimize. The app is not necessarily useful for everyone right away, but it can
come in handy later, especially when traveling to a foreign country.
As we all know, people travel for many reasons, such as tourism, working or studying
abroad, vacations, etc. One of the problems many of us may face while traveling is eating
exotic food that we are not used to, typically the foreign country's local food. Sometimes
the struggle begins on the plane ride [3].
Besides this problem, we noticed that many people struggle to keep up with a diet plan and
make the wrong food choices. After some research, we found this is seemingly due to a lack
of knowledge about nutrition and food. As a nutrition coach puts it: "Knowledge is
imperative to any endeavor so why should this be any different in the case for weight loss?
Too many people blindly throw themselves into this game headfirst without doing any research
or laying any sort of foundation. It is this approach that results in yo-yo dieting and
relapses" [4].
In addition, AI services integrated into mobile apps are not yet widely used in Tunisia,
despite their significant influence in the world. AI combined with mobile apps can now
provide more sophisticated services than ever before.
1.4 Aim of the work
Having introduced the different challenges and problems, in this part we explain how this project will try to use machine learning techniques to resolve them.
The mentioned problems concern the lack of nutritional knowledge and bad experiences with exotic food. In this application, we gather information about different Tunisian dishes. When the user wants to discover more about his meal while he is in Tunisia, he can detect the food using the app. The food description will then be displayed in a simple layout on the phone screen. It will include all the relevant information, like ingredients, calories, and recommendations, so that the user can be aware of what he is eating and check whether it will break his diet plan or contains any ingredient he is allergic to.
This application will use machine learning technologies to help the user become familiar with new types of dishes, adopt healthy eating habits over eating junk food, and keep up with his diet. It also helps tourists avoid foods that taste strange or do not conform to their usual choices.
Nevertheless, we need to mention that this application only solves part of the puzzle, because it works only on Tunisian food, and even then, it only classifies 10 famous Tunisian dishes. This is because of the limited availability of images of some Tunisian dishes on the internet: for each type of food, we need at least 1,000 high-quality images to get a good result. The 1,000-image magic number comes from the original ImageNet classification challenge, where each category of the dataset had around 1,000 images. This was good enough to train early generations of image classifiers like AlexNet, suggesting that around 1,000 images per class are enough [5].
On top of that, training on a large amount of data demands huge processing resources, and it takes a long time to train a neural network to reach more than 90% accuracy. However, the project could still be extended, if the necessary capabilities and investments are available, to cover all Tunisian food types or even other countries' dishes.
1.5 Dissertation structure
This dissertation is divided into a set of chapters. The first chapter is “Introduction and context.” The next chapter is the “Literature review,” where we discuss machine learning and the methods used in this field. After that, we specify the application requirements in the “Gathering requirements” chapter, followed by the “Software design” chapter, in which we define the design patterns and architecture of the program. The last chapter is “Realization,” in which we create our machine learning model and deploy it to an Android application.
1.6 Development Methodology
A software development methodology divides software work into distinct stages with specific activities for more effective planning and management. Before choosing the right method for our project, we will present some of the popular methodological approaches used in software engineering and then compare them to find the most suitable one.
1.6.1 Software development processes available
โ€ข Agile
Agile is based on highly iterative and incremental development, creating software in short time boxes in order to minimize risks [6].
Figure 2 : Agile Methodology [6]
Agile is mostly a disciplined approach (figure 2) that anticipates the need for flexibility and applies frequent alterations before delivering the finished product.
โ€ข Scrum
Scrum is an agile methodology. It is a simple process that tries to speed up productivity and deliver products focused on satisfying customers.
Figure 3 : Scrum Methodology [7]
As shown in figure 3, it does this by breaking the complexity down into smaller tasks, then dividing them across the team members, where each one focuses on solving his dedicated task at a specific time according to a planning process. This step is repeated over and over. After each incremental step, the team re-evaluates the product's current direction and decides which strategy is the most effective to achieve the goal.
โ€ข Extreme programming
Extreme programming (XP) is also an agile method [6]. It is a lightweight process whose goal is to reduce the cost of changing software requirements.
Figure 4 : Extreme programming Methodology [8]
Figure 4 illustrates the different process steps of the XP method. XP takes traditional principles to extreme levels through several practices, including simple design, pair programming, constant testing, continuous integration, refactoring, coding standards, and small releases. It is mainly used for creating software within a volatile and dynamic environment, and it allows much better flexibility within the modeling process [4].
โ€ข Unified Process
The Unified Process is an architecture-centric, use-case-driven, iterative, and incremental development process that uses the Unified Modeling Language. It can be applied to software systems of varying technical and managerial complexity, across different application areas and organizational cultures. The Unified Process is divided into a series of timeboxed iterations, as shown in figure 5.
โ€ข RUP
The Rational Unified Process method is referred to as RUP. It separates the development process into four phases, as shown in figure 6: Inception, Elaboration, Construction, and Transition. It is considered an object-oriented and web-enabled program development methodology. This method helps software developers deal with changing requirements and provides guidelines, templates, and examples for all aspects of the software development stages [4]. It describes how specific development goals should be achieved.
Figure 5 : Unified Process [9]
Figure 6 : Rational Unified Process [6]
โ€ข Waterfall
The waterfall method is one of the most traditional and commonly used software development methodologies [4]. It differs from agile and unified processes in that it is a sequential design process, as shown in figure 7, meaning that the earlier phases define the subsequent ones. In the waterfall method, there are seven stages, from system feasibility down to operations and maintenance.
1.6.2 Methodologies comparison table
The following table lists some advantages and disadvantages of the mentioned software development processes to help us decide on the appropriate method for this project.
Figure 7 : Waterfall with overlapping phases [10]

Scrum
Advantages:
➢ It contains a backlog listing everything to do.
➢ The team decides how much work will be done.
➢ Communication, an important part of the process, is achieved through meetings, called Scrum events.
Disadvantages:
➢ Meetings can be too long.
➢ Requires a dedicated Scrum master.
➢ Hard to understand; requires team member guidance [6].

Extreme programming
Advantages:
➢ Faster product delivery.
➢ Allows software development teams to save the costs and time needed for project realization.
➢ Allows developers to produce quality software: regular testing during the development phases ensures the detection of all bugs.
Disadvantages:
➢ Impossible to know the exact estimation of the work effort required to produce the final product [6].

RUP
Advantages:
➢ A comprehensive methodology that can proactively resolve the project risks associated with the client's changing requirements, which requires careful management of change requests [4].
➢ Less time is required for integration, as the integration process continues throughout the software development cycle.
Disadvantages:
➢ The development process is too complicated and unorganized on massive projects that use new technology [4].
➢ The reuse of components may not be possible.

Waterfall
Advantages:
➢ Suitable for simple structured projects.
➢ Works well when requirements are well understood.
➢ The cost of the model is low.
➢ It includes testing, i.e., verification of completed operations and obtained results, at the closure of each development phase [5].
Disadvantages:
➢ No iterations during project realization [11].
➢ No working product is available until all phases are finished [11].
➢ It is very difficult to go back and fix the software, especially at the testing phase.

Table 1 : Methodologies comparison table
1.6.3 The chosen method
Based on table 1, we decided to use the waterfall methodology for this project. The idea behind the waterfall method is that the project progresses through an orderly sequence of steps, from the initial software concept down to the final system testing phase. This approach suits a project where cost and time are constrained and the scope and requirements are well understood. The waterfall methodology also provides a set of processes built on the principle of approval of the previous phase, which fits our need to deliver a complete and validated project within a specific time limit.
Lately, this method has faced criticism as outdated, due to the difficulty of fixing defects that appear in later stages: it is based on linear sequential phases that always move forward, making going back to solve problems very daunting. Many modified waterfall models have been produced in response to this problem, like the “sashimi model” (waterfall with overlapping phases). “The key feature of the Sashimi model is the possibility of overlapping development phases, i.e., introducing feedback into the classical waterfall model. The idea on which the model is based is identifying errors on time while the development phase is still in progress. For instance, errors made in the design phase are identified during implementation, while the design is still in progress” [11].
This waterfall with overlapping phases can overcome the original waterfall model's major problem: the difficulty of fixing errors that appear in stages that have already finished.
1.7 Project Planning
Project planning is a critical phase and part of the project management process. Project management is a structured discipline that defines project goals, strategy, planning, and motivations. Its main objective is to produce a complete project that complies with the project's nature and scope. Typically, it is divided into five steps (figure 8).
Figure 8 : Project management processes [12]
1.7.1 Project management processes
โ€ข Initiation
Initiation is the project's first process, where we define the idea behind the project, its overall goal, and its scope.
โ€ข Planning
Project planning is part of the project structure; it uses schedules such as Gantt charts to track progress within the project environment. It is also the process where we define the project management methodology.
โ€ข Execution
The execution phase is the third phase of the project management life cycle, and it is usually the longest. In this phase, we start executing our plan and methodology for developing the software.
โ€ข Control
Control is the phase where we monitor and observe the project by validating each step involved.
โ€ข Closure
Project closure is the last phase of the project life cycle. In this phase, we formally close the project and prepare it to be delivered and presented.
1.7.2 Gantt chart
A Gantt chart is commonly used in project management, as it shows the schedule of project tasks over a specified date range.
Task Name Start Date Due Date
Initiation and planning 13/02/2020 13/03/2020
Research and studies 14/03/2020 30/04/2020
Requirements Analysis 01/05/2020 31/05/2020
Software design 13/07/2020 31/08/2020
Realization 01/09/2020 31/10/2020
Thesis writing 01/04/2020 31/10/2020
Table 2 : Project tasks timeline
1.8 Conclusion
In this chapter, we introduced the project idea and discussed the problems we will try to solve through machine learning techniques. We then outlined the thesis structure and the software methodology we are going to use, and lastly presented the project planning.
Figure 9 : Gantt chart
Literature review
2.1 Introduction
After introducing the project, this chapter addresses the scientific background of machine learning development, with a detailed elaboration of the theoretical meaning of some essential concepts, followed by the relevant algorithms in this field, since this project aims at using cutting-edge technologies. This chapter includes the necessary knowledge to establish an integrated understanding of machine learning and related fields.
2.2 Data Science
Data science is a required field for developing ML programs, given the data management work needed to prepare the training data for our machine, so it is imperative to have a good understanding of it. Furthermore, many tutorial materials and e-learning websites about artificial intelligence and machine learning assume that you already have a good knowledge of data science and data analysis [13]. The data science process usually contains seven essential steps, as shown in figure 10.
Figure 10 : The data science process [13]
2.2.1 Acquiring and storing data
Acquiring and storing data is the first step in data analysis. We need to find data related to our subject, either by collecting it from the many sources available on the internet or by gathering observations and measurements from real-world experiments.
2.2.2 Asking Questions
This step can come either before or after gathering data, depending on the project subject. Here, we try to ask the right questions relevant to our topic, explaining how the data might be useful for understanding and defining the project's objectives.
2.2.3 Data preparation
The next step of data analysis is data preparation, which has two parts: data cleaning and data transformation. First, we clean the data of any wrong or duplicated values; then we transform it based on defined mapping rules [13].
2.2.4 Exploring data
After cleaning the data, we start exploring it, spending some time getting familiar with the different data sets we have. We can use descriptive statistics to discover patterns, build intuition, and understand the data's nature.
2.2.5 Machine learning model
Next, we need to define the best machine learning techniques and algorithms that can work
with our collected data types and extract features from them in a way that fits the functional
and business requirements of the project.
2.2.6 Visualization and communication
It is very important to communicate our findings to other people. This communication can take a variety of formats: we might create images, diagrams, or animations and share them in a paper, an email, a PowerPoint presentation, or an in-person conversation.
2.2.7 Deployment
Finally, we deploy the final machine learning model to the production environment. Then we start exploiting it and making decisions based on the results of the ML model.
2.3 Machine learning
Machine learning is a branch of computer science applied in different fields, like finance, medicine, games, and robotics, with the aim of training a machine to perform tasks intelligently, as a human does, by learning from data sets instead of explicitly coding the solution. This can be done by supervised or unsupervised learning.
2.3.1 Unsupervised Learning
Unsupervised learning is mostly used for clustering, where we are given a data set with no corresponding labels. We then attempt to learn some type of structure or pattern from the data and to extract useful information or features from it.
2.3.1.1 Unsupervised Learning Algorithms
There are different algorithms for unsupervised learning, like k-means, k-medoids, hierarchical clustering, Gaussian mixture models, and even neural networks, which can be used for both supervised and unsupervised learning but are mostly classified as supervised.
โ€ข K-means clustering
k-means estimate for a given number of data points the best centers of k clusters representing
it. In step one (figure11), we pick random k center points. And then, we connect points to the
closest center. In the third step, we recalculate centers based on the mean of points, and in the
last step, we repeat this process until there is no change in clusters data points.
Figure 11 : K-means clustering
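The four k-means steps above can be sketched in a few lines of pure Python. This is a minimal illustration on made-up 2-D points with k = 2, not the clustering code used in this project.

```python
import random

def kmeans(points, k, iterations=20):
    """Naive k-means on 2-D points: pick k random centers, assign each
    point to its closest center, recompute centers as means, repeat."""
    random.seed(0)                                # fixed seed for reproducibility
    centers = random.sample(points, k)            # step 1: pick k random centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                          # step 2: connect to closest center
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):         # step 3: recompute centers as means
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers                                # step 4: repeated until stable

# Two obvious blobs, one around (0, 0) and one around (10, 10).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, k=2)
```

With this data the two returned centers settle near the means of the two blobs.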
โ€ข Hierarchical clustering
In hierarchical clustering (figure 12), we group data according to similarity metrics that we define, like the Euclidean distance or the minimum/maximum distance between clusters. There are two approaches we can use: bottom-up or top-down. This approach is good for building tree structures from data similarities.
Figure 12 : Hierarchical clustering
โ€ข K-medoids clustering
K-medoids clustering is similar to k-means, except that it uses a different procedure for defining the clusters' centers: it picks a center from within the data set itself, then calculates the total cost of swapping that center with another data element, instead of computing a center outside of the data points. This algorithm can be more accurate, but it requires many iterations to converge.
2.3.2 Supervised Learning
Supervised learning is a machine learning approach mostly used when we have a problem whose dataset is written as a set of example-label pairs, with a label y associated with each example x [14]. In other words, supervised learning means that you have many examples for which you know the correct answers, and you have the computer figure out the rules for producing those answers.
2.3.2.1 Supervised Learning Algorithms
There are many supervised learning algorithms; below we cite the most used ones.
โ€ข Support Vector Machine
SVM is a classification algorithm that finds a separating line between two data classes (figure 13) that maximizes the distance to each class's nearest point equally. That distance is often called the margin.
Figure 13 : Support vector machine
โ€ข Naive Bayes
The Naïve Bayes algorithm is based on Bayes' theorem, which we can use to draw conclusions about an event x given the observed probability of an event y [14].
๐‘(๐‘ฅ|๐‘ฆ) =
๐‘ƒ(๐‘ฆ|๐‘ฅ) ยท ๐‘ƒ(๐‘ฅ)
๐‘ƒ(๐‘ฆ)
(2.1)
Using a large amount of data, we can count the occurrences of each element in our dataset and thus calculate the probability that a new data example belongs to a certain class.
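As a sketch of how equation (2.1) is used, the numbers below are invented for a toy spam example: x is "the message is spam" and y is "the message contains the word free".

```python
# Bayes' theorem (eq 2.1): P(x|y) = P(y|x) * P(x) / P(y), with made-up counts.
p_x = 0.4                  # prior: 40% of messages are spam (assumed)
p_y_given_x = 0.5          # 'free' appears in half of the spam messages
p_y_given_not_x = 0.1      # ... and in 10% of the legitimate ones

# Total probability of observing y (law of total probability):
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Posterior: how likely the message is spam once we see the word 'free'.
p_x_given_y = p_y_given_x * p_x / p_y
```

With these invented counts the posterior rises from 0.4 to about 0.77, which is exactly the kind of class-probability update a Naïve Bayes classifier performs for every feature.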
โ€ข Nearest Neighbors
The k-nearest neighbors algorithm is a non-parametric supervised learning method. It is very straightforward: we simply memorize all the data, and then we label a new example with the majority class of its k nearest neighbors, as illustrated in figure 14.
Figure 14 : Nearest Neighbors
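A minimal k-NN classifier following this description, on hypothetical 2-D points, might look like:

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(examples, query, k=3):
    """Label `query` with the majority class among its k nearest examples."""
    nearest = sorted(examples, key=lambda e: dist(e[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training points with two classes 'A' and 'B'.
data = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
        ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
label = knn_predict(data, query=(0.5, 0.5))   # all 3 nearest neighbors are 'A'
```

There is no training step at all: the "model" is the memorized data itself, which is why the method is called non-parametric.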
โ€ข Linear regression
Linear regression is an algorithm that tries to fit a line (figure 15) that best describes data sets involving more than one dimension: for example, the size of a house relative to its price, or a person's age and income. With linear regression, we can draw a line representing a mathematical relationship, based on a bunch of measured points, to map new continuous inputs to outputs.
Figure 15 : Linear regression
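The line fitting described above has a closed-form least-squares solution for one input variable. The house sizes and prices below are fabricated (and perfectly linear) purely to illustrate it:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (simple linear regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

sizes = [50, 80, 100, 120]          # hypothetical house sizes (m^2)
prices = [100, 160, 200, 240]       # hypothetical prices: exactly 2 * size
a, b = fit_line(sizes, prices)      # slope a is about 2, intercept b about 0
```

Once a and b are known, mapping a new continuous input to an output is just `a * x + b`.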
โ€ข Neural network
A neural network is a supervised learning method that uses multiple layers of connected nodes (figure 16). These layers transform an input set into a particular output set representing the result we want to get from the data. To obtain an accurate result, we need to train the neural network on labeled data, then reduce the error the model makes on new examples by changing the connected nodes' values using calculus formulas.
Figure 16 : Neural network
Due to their excellent performance, deep neural networks have been widely used in image analysis, speech recognition, target detection, face recognition, and other fields.
2.4 Mathematics for AI
Mathematics forms the foundations of machine learning. โ€œAs machine learning is applied to
new domains, developers of machine learning need to develop new methods and extend
existing algorithms. They are often researchers who need to understand the mathematical basis
of machine learning and uncover relationships between different tasksโ€ [14].
ML is very much an interdisciplinary field. Even though it runs as a computer program, it
heavily relies on calculus, statistics, linear algebra, and probability.
โ€ข Calculus tells us how to learn and optimize our linear model.
โ€ข Algebra makes running these algorithms possible as ML deals with matrices and
vectors to represent data (text, images, etc.).
โ€ข Statistics is at the core of everything. It is very helpful for optimization tasks.
โ€ข Probability helps predict the likelihood of an event occurring.
Therefore, we will illustrate some math concepts before diving into machine learning models.
2.4.1 Linear algebra
Some concepts of Linear Algebra are important for understanding the principles behind
Machine Learning. In this section, we will try to focus on the parts that are involved in ML.
2.4.1.1 Vectors
The fundamental building block of linear algebra is the vector, because linear algebra is the study of vectors and of certain rules to manipulate them [14]. There are three distinct but related ideas about vectors:
• In physics, a vector is a quantity that has both magnitude and direction and can be placed anywhere.
• In computer science, a vector is a collection of data where the order matters.
• In math, a vector can be anything; it can be drawn anywhere in space [15].
2.4.1.2 Operating on vectors
➢ Addition: we can add two vectors componentwise:
[1; 2] + [3; 1] = [4; 3]
➢ Scalar multiplication: we can scale a vector by multiplying it by a number:
2 · [1; 2] = [2; 4]
➢ Magnitude: the length of the vector:
v = [−1; 2; 3], ‖v‖ = √((−1)² + 2² + 3²) = √14
➢ Dot product: a way to measure the length of the projection of one vector onto another [15]:
[1; 2; −1] · [3; 1; 0] = 1·3 + 2·1 + (−1)·0 = 5
(Here [u; v] denotes a column vector.)
2.4.1.3 Matrices
A matrix in mathematics is a collection of vectors [14]; it represents a table of numbers arranged in rows and columns. In computer science, it is a two-dimensional array of numbers with m rows and n columns.
2.4.1.4 Linear Transformation
A linear transformation is like a function in math: it takes a vector and maps it to another vector, and the function itself is represented by a matrix.
[a b; c d] · [x; y] = x·[a; c] + y·[b; d] = [ax + by; cx + dy] = [x′; y′]    (2.2)
We transform the vector [x; y] by the matrix [a b; c d] and get [x′; y′] as output.
This is like the behavior of a neural network (figure 17): we give it an image as input, and it gives us the potential content of that image as output; in the case of a neural network, though, the transformation is a group of matrices and vectors.
Figure 17 : Neural network structure
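Equation (2.2) can be sketched as a tiny function applying a 2×2 matrix to a vector; the matrices below are arbitrary examples, not anything from the project:

```python
def transform(matrix, vec):
    """Apply the 2x2 matrix [[a, b], [c, d]] to the vector (x, y), as in eq (2.2)."""
    (a, b), (c, d) = matrix
    x, y = vec
    return (a * x + b * y, c * x + d * y)

stretched = transform([[2, 0], [0, 3]], (1, 1))   # scales x by 2 and y by 3
rotated = transform([[0, -1], [1, 0]], (1, 0))    # 90-degree rotation
```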
2.4.2 Calculus
In the last figure, we can see a simple neural network with interconnected nodes. These nodes are the core of the network, and to find their values we need to apply some calculus concepts, which we will explain in the following parts.
2.4.2.1 Derivative
The derivative is a fundamental concept in calculus. We can consider it as the average rate of change of a function with respect to a single variable, as shown in figure 18.
Figure 18 : The Derivative As A Function
The average rate of change is a measure of how much the function changes per unit.
(f(a+h) − f(a)) / ((a+h) − a) = (f(a+h) − f(a)) / h    (2.3)
Using eq (2.3), we can calculate the second and third derivatives, and so on up to the nth derivative. For neural networks, derivatives are very important in the optimization process and in reducing the error rate.
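Equation (2.3) also gives a direct way to estimate a derivative numerically, by taking a small h, as in this sketch:

```python
def derivative(f, a, h=1e-6):
    """Difference quotient from eq (2.3): (f(a + h) - f(a)) / h."""
    return (f(a + h) - f(a)) / h

# d/dx of x^2 at x = 3 is exactly 6; the estimate is close for small h.
est = derivative(lambda x: x * x, 3.0)
```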
2.4.2.2 Chain Rule
The chain rule allows us to calculate the derivative of a composition of two functions [15].
d/dx f(g(x)) = f′(g(x)) · g′(x)    (2.4)
We can extend the chain rule formula to calculate more complicated compositions, like the function f(g(k)), where k can itself be a composition of functions. In neural network models, we deal with multiple compositions, but with matrices and vectors instead of scalar functions.
2.4.3 Multivariable calculus
Multivariable calculus is the extension of calculus in which we deal with multivariable functions, i.e., functions that take more than one input number rather than just one variable.
2.4.3.1 Multivariable functions
Multivariable functions are functions that assign multiple variables to a real number [15].
Ex: f(x) = x + 12, a normal function with one variable x.
Ex: f(x, y) = 4x + 2y, a multivariable function with two variables x and y.
• We can write the function as a vector: z = f(x, y) = [4x; 2y]
• We can also graph the function in three dimensions, as shown in figure 19.
Figure 19 : Multivariable functions
2.4.3.2 Partial Derivatives
To calculate the derivatives of multivariable functions, we use partial derivatives, which are very similar to the ordinary derivative of equation (2.3). The difference is that we calculate the derivative with respect to each variable of the function, treating all the other variables as constants.
Ex: ๐‘“(๐‘ฅ, ๐‘ฆ) = 4๐‘ฅ + 2๐‘ฆ we choose random variables for x and y example (2,1)
Derivative with respect to x (๐œ•๐‘ฅ) :
๐œ•๐‘“
๐œ•๐‘ฅ
(2,1) =
๐œ•
๐œ•๐‘ฅ
(4๐‘ฅ2
+ 2 ยท 1) = 8๐‘ฅ = 1
Derivative with respect to y (๐œ•๐‘ฆ) :
๐œ•๐‘“
๐œ•๐‘ฆ
(2,1) =
๐œ•
๐œ•๐‘ฆ
(4 ยท 22
+ 2 ยท ๐‘ฆ) = 2
2.4.3.3 Gradient
The gradient is a vector that gathers all the partial derivatives of a function [15].
๐›ป๐‘“(๐‘ฅ, ๐‘ฆ, โ€ฆ ) =
[
๐œ•๐‘“
๐œ•๐‘ฅ
๐œ•๐‘“
๐œ•๐‘ฆ
โ‹ฎ ]
(2.5)
The gradient is essential for neural networks, as we are dealing with multivariable functions [14].
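Applying the difference quotient of eq (2.3) coordinate by coordinate gives a numerical version of the gradient in eq (2.5). The function below is the f(x, y) = 4x + 2y example from the previous section:

```python
def gradient(f, point, h=1e-6):
    """Numerical gradient: one partial difference quotient per coordinate (eq 2.5)."""
    grads = []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += h                 # nudge only the i-th variable
        grads.append((f(shifted) - f(point)) / h)
    return grads

# f(x, y) = 4x + 2y has the constant gradient (4, 2) at every point.
g = gradient(lambda p: 4 * p[0] + 2 * p[1], [2.0, 1.0])
```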
2.4.3.4 Gradient descent
Gradient descent is a popular algorithm used in machine learning, and many deep learning libraries support its implementation. It optimizes neural networks by iteratively moving in the direction of steepest descent, which is the negative of the gradient [16]. We will use gradient descent on the error function that we will discuss later.
2.4.4 Mathematics Behind Neural Networks
2.4.4.1 Perceptron
Neural networks are the central concept of deep learning. They consist of artificial neurons connected to each other. A neuron is the basic processing unit in a neural network model; generally, it is a unit with multiple inputs and a single output, which we call a perceptron.
Figure 20 : The structure of an artificial neuron
The perceptron equation (figure 20) is:
y = f(w₁x₁ + w₂x₂ + ⋯ + wₙxₙ + b)    (2.6)
f in equation (2.6) is an activation function. Generally, for neural networks, we use the Sigmoid, ReLU, or Tanh functions [14]. The activation function maps and normalizes outputs to new values that optimize computational performance without changing the network's computational state; it is simply a function that helps the neural network process input information and map it to the correct outputs.
Activation Function | Formula
Sigmoid | S(x) = 1 / (1 + e⁻ˣ)
ReLU | relu(x) = x if x ≥ 0, and 0 if x < 0
Tanh | tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ)
Table 3 : Activation functions
y in equation (2.6) represents the result we get from this perceptron. Its range can differ depending on the activation function, but it is mostly between 0 and 1. This value is then passed to the other connected neurons to solve more complex problems. If we think of this perceptron as an individual classification model, we can consider y as a value for classifying a particular input based on a specific formula.
For example, given a student's grades x₁, x₂, …, xₙ, we define weights w₁, w₂, …, wₙ and a bias b to form a function f, such that, for the student to succeed, this function needs to output a value higher than a particular threshold θ:
The student succeeds if
f( Σ_{i=1}^{n} wᵢxᵢ + b ) ≥ θ    (2.7)
The student fails if
f( Σ_{i=1}^{n} wᵢxᵢ + b ) < θ    (2.8)
2.4.4.2 Error function
For complex problems, we cannot determine the weights and bias by ourselves, so we need a way to compute these values; for this we use an error (or cost) function. The most used ones are the cross-entropy and mean squared error functions [17]. First, we initialize our model with random weights, and then we measure the error the model makes by comparing the output it gives us with the correct answer that we already know.
โ€ข Cross-entropy
E = −(1/m) Σ_{i=1}^{m} yᵢ · log(ŷᵢ)    (2.9)
For each label yᵢ (the correct result), we take the output ŷᵢ predicted by the classifier and, instead of multiplying them directly, apply the logarithm function for computational purposes. We then sum all the results and divide by m, the number of examples, to get the total error of the model, which we will later need to reduce.
โ€ข Mean squared error:
E = (1/m) Σ_{i=1}^{m} (yᵢ − ŷᵢ)²    (2.10)
The mean squared error function is another way to compute the model's error, by subtracting the predicted results from the correct labels and squaring the difference. It works very well for complex models.
2.4.4.3 Gradient descent
To reduce the cost of the error function, we need to use the gradient descent algorithm with
respect to all the weights of the function [17].
๐œต๐‘ฌ = (
๐๐‘ฌ
๐๐’˜ ๐Ÿ
, โ€ฆ ,
๐๐‘ฌ
๐๐’˜ ๐’
,
๐๐‘ฌ
๐๐›
) (2.11)
As we mentioned earlier in equation (2.5), the gradient is a vector full of partial derivatives
representing the steepest ascent direction of a function [14]. To reduce the error, we need to
take a negative step in that direction to update the weights.
wᵢ′ ← wᵢ − α · ∂E/∂wᵢ    (2.12)
We keep repeating this step for each weight in the model until we reach a local minimum of the error function, using a learning rate α to make small changes to the weights each time, because we do not want to overshoot a local minimum. We can set the value of α ourselves during the training and testing phase.
2.4.4.4 Feedforward
Feedforward is the process that multilayer neural networks perform to produce a prediction from an input vector [18]. It amounts to applying all the perceptrons of the model (figure 20).
Figure 21 : Example of a simple neural network
The compound function:
ŷ = σ( (w₅ w₆) · σ( [w₁ w₃; w₂ w₄] · [x₁; x₂] ) )    (2.13)
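The compound function above can be written out directly for the small network of figure 21; the six weights below are made-up values, and the bias is shared for simplicity:

```python
from math import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

def feedforward(x1, x2, w, b=0.0):
    """Feedforward for the 2-2-1 network of figure 21.
    w = (w1, w2, w3, w4, w5, w6), chosen arbitrarily here."""
    w1, w2, w3, w4, w5, w6 = w
    h1 = w1 * x1 + w2 * x2 + b                 # first hidden node
    h2 = w3 * x1 + w4 * x2 + b                 # second hidden node
    h = w5 * sigmoid(h1) + w6 * sigmoid(h2)    # output layer combination
    return sigmoid(h)                          # final prediction

y_hat = feedforward(1.0, 0.5, w=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))
```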
2.4.4.5 Backpropagation
Backpropagation is an algorithm for computing the gradient in multilayer neural networks. Since the error function is a composite function, it uses the chain rule we discussed earlier [18].
Chain rule:
Figure 22 : The chain rule
First, we run the feedforward process on the inputs x₁ and x₂ through the two layers W⁽¹⁾ = (w₁, w₂, w₃, w₄) and W⁽²⁾ = (w₅, w₆), as shown in figure 21.
h₁ = w₁x₁ + w₂x₂ + b    (2.14)
h₂ = w₃x₁ + w₄x₂ + b    (2.15)
h = w₅·σ(h₁) + w₆·σ(h₂)    (2.16)
ŷ = σ(h) = σ ∘ W⁽²⁾ ∘ W⁽¹⁾(x)    (2.17)
Then we need to calculate the derivative of the error function with respect to the weights, using the loss function of equation (2.10). E in equation (2.10) can be seen as a function of all the weights: E(W) = E(w₁, …, wₙ).
Figure 23 : Backpropagation
After that, we apply backpropagation. For example, the backpropagation of w₁ shown in figure 23, obtained by applying the chain rule to equation (2.17), is:
∂E/∂w₁ = (∂E/∂ŷ) · (∂ŷ/∂h) · (∂h/∂h₁) · (∂h₁/∂w₁)   (2.18)
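Equations (2.14)-(2.18) can be checked numerically. The sketch below uses the sigmoid for σ, a squared-error loss as an illustrative stand-in for E, and arbitrary weights; it verifies the chain-rule gradient of w₁ against a finite-difference estimate.

```python
import math

def sigma(z):
    """Sigmoid activation used for every sigma in equations (2.14)-(2.17)."""
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary example weights, bias, inputs, and label for the network of figure 21.
w1, w2, w3, w4, w5, w6, b = 0.5, -0.3, 0.8, 0.2, 1.0, -1.5, 0.1
x1, x2, y = 1.0, 2.0, 1.0

def forward(w1_):
    """Feedforward (equations 2.14-2.17) with w1 treated as a free variable."""
    h1 = w1_ * x1 + w2 * x2 + b
    h2 = w3 * x1 + w4 * x2 + b
    h = w5 * sigma(h1) + w6 * sigma(h2)
    return sigma(h)

def error(w1_):
    """Squared-error loss E = (y - y_hat)^2 / 2, an illustrative choice of E."""
    return 0.5 * (y - forward(w1_)) ** 2

# Backpropagation: the chain rule of equation (2.18),
# dE/dw1 = (dE/dy_hat)(dy_hat/dh)(dh/dh1)(dh1/dw1)
y_hat = forward(w1)
h1 = w1 * x1 + w2 * x2 + b
dE_dyhat = -(y - y_hat)
dyhat_dh = y_hat * (1 - y_hat)                 # sigmoid derivative at h
dh_dh1 = w5 * sigma(h1) * (1 - sigma(h1))      # sigmoid derivative at h1
dh1_dw1 = x1
analytic = dE_dyhat * dyhat_dh * dh_dh1 * dh1_dw1

# Numerical gradient as a sanity check of the chain-rule result.
eps = 1e-6
numeric = (error(w1 + eps) - error(w1 - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # the two gradients agree
```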
2.5 Deep Learning
Deep learning is a part of machine learning in which we rely on artificial neural networks to solve problems whose complexity and data volume cannot be handled by traditional machine learning methods.
2.5.1 Artificial Neural Network
An ANN is a group of multiple perceptron layers where forward propagation transforms input data through these layers to give us a new output. An ANN consists of three layer types: input, hidden, and output [19]. An ANN is simply a neural network where each node of a layer is fully connected with the next layer's nodes.
2.5.2 Recurrent Neural Network
The previous neural network we mentioned was trained using only current inputs; we did not consider prior inputs when generating the output. In an RNN, instead, we save results from the previous feedforward pass of the system to use in the next iteration (figure 24), so it can process sequential data without losing the relational information between elements. For example, in a text, we cannot just process each word by itself to predict a paragraph's meaning; we need to understand the word's context by processing the previous and subsequent words as well. In other words, we can consider a recurrent neural network as a looping process where we combine previous results with new inputs in each iteration to predict the final result.
2.5.3 Convolutional Neural Network
The ANN is an excellent neural network structure, and it works very well for solving specific problems. However, for image classification, it only works when we give it a set of images where the target object is placed in the center. ANNs are not well suited to images because these networks can suffer from vanishing and exploding gradients, especially in networks with many hidden layers, where the number of trainable parameters grows with the thousands of pixels in an image, each coded in 3 color channels. Flattening the image also leads the ANN to lose its spatial features [19]. A CNN, on the other hand, reduces these many parameters to a small number by using image filters that track spatial information and learn to extract features such as the edges of objects or shapes, as explained in figure 25.
Figure 24 : Representation of RNN both in folded and unfolded forms
Figure 25 : Convolutional Neural Network
CNNs are made of three main types of layers: convolutional layer, pooling layer, and fully
connected layer [20].
• Convolutional layer: its primary role is to track the picture's characteristics. It consists of a set of filters.
• Pooling layer: its main role is to reduce the dimensionality of the data.
• Fully connected layer: it outputs the results we want according to the task.
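The three layer types can be sketched with a few PyTorch calls (PyTorch is the framework adopted later in this document); the channel counts, kernel size, and 32x32 input below are arbitrary illustration choices, not values from this project.

```python
import torch
import torch.nn as nn

# One example of each CNN layer type; all sizes here are invented for
# illustration and do not come from the project's actual model.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)     # halves the spatial dimensions
fc = nn.Linear(8 * 16 * 16, 10)        # fully connected output layer

x = torch.randn(1, 3, 32, 32)          # one RGB image of 32x32 pixels
feat = pool(torch.relu(conv(x)))       # feature maps of shape (1, 8, 16, 16)
out = fc(feat.flatten(start_dim=1))    # class scores of shape (1, 10)
print(feat.shape, out.shape)
```

The pooling step is what shrinks the spatial dimensions from 32x32 to 16x16 while the convolution preserves them thanks to its padding.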
Since 2012, CNNs have achieved state-of-the-art results in the ImageNet challenge and caused a huge advance in this field.
Figure 26 : The annual winner of the ImageNet challenge [21]
โ€œImageNet is formally a project aimed at (manually) labeling and categorizing images into
almost 22,000 separate object categories for the purpose of computer vision researchโ€ [21].
2.6 Neural network evaluation metrics
Neural network evaluation metrics measure a machine learning model's performance compared to other models, using mathematical formulas, so that we can build a model that gives high accuracy. Below we will cite some of the metrics used for neural network models.
2.6.1 Classification accuracy
The first metric we will talk about is classification accuracy. It is mainly used in classification
models where we get a percentage representing the accuracy of the model.
๐ด๐‘๐‘๐‘ข๐‘Ÿ๐‘Ž๐‘๐‘ฆ =
๐‘๐‘ข๐‘š๐‘๐‘’๐‘Ÿ ๐‘œ๐‘“ ๐‘๐‘œ๐‘Ÿ๐‘Ÿ๐‘’๐‘๐‘ก ๐‘๐‘Ÿ๐‘’๐‘‘๐‘–๐‘๐‘ก๐‘–๐‘œ๐‘›๐‘ 
๐‘‡๐‘œ๐‘ก๐‘Ž๐‘™ ๐‘›๐‘ข๐‘š๐‘๐‘’๐‘Ÿ ๐‘œ๐‘“ ๐‘๐‘Ÿ๐‘’๐‘‘๐‘–๐‘๐‘ก๐‘–๐‘œ๐‘›๐‘ 
(2.19)
This equation is very intuitive. We divide the number of correct predictions by the total number
of predictions.
2.6.2 Confusion matrix
The confusion matrix is more expressive than accuracy in terms of the type of error. The model can make a false positive (FP) or a false negative (FN), and together with the true positives (TP) and true negatives (TN), these four combinations, shown in figure 27 below, help us understand some other metrics.
Figure 27 : Confusion matrix
• Precision
In some cases, we want to focus mostly on avoiding false positives like putting an important
email in a spam folder when it is not spam. For this kind of case, we use the precision metric.
๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› =
๐‘‡๐‘ƒ
๐‘‡๐‘ƒ + ๐น๐‘ƒ
(2.20)
If we have no false positives, the result will be 1, meaning the model is perfect for our case.
• Recall
recall = TP / (TP + FN)   (2.21)
Recall is the counterpart of the precision metric: here we focus instead on reducing false negatives (FN), for cases where missing a positive is the costly error.
• F1 score
The F1 score, calculated from precision and recall, represents an overall measure that summarizes the confusion matrix result.
F1 = 2 × (precision × recall) / (precision + recall)   (2.22)
2.6.3 Log Loss
Log loss is another way of assessing a machine learning model's performance, and it is also
often used as a loss function.
LogLoss = −(1/n) Σᵢ₌₁ⁿ [ yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ) ]   (2.23)
In this equation, n is the number of observations, yᵢ is the true label in the binary case (0 or 1), and pᵢ is the model's predicted probability.
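Equation (2.23) translates to a few lines of code; the labels and probabilities below are example values only.

```python
import math

def log_loss(y_true, y_prob):
    """Binary log loss, equation (2.23)."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / n

# Confident correct predictions give a low loss; hesitant ones a higher loss.
print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 4))
```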
2.7 Conclusion
Machine learning, in recent years, has become very advanced and capable of solving
challenging problems. In this chapter, we have mentioned many approaches and techniques
used in machine learning. For our project, we will use the convolutional neural network, as it is currently the most successful technique in the image classification field.
Gathering requirements
3.1 Introduction
Before creating any software project, it is necessary to define the technical and functional
requirements concerning the project's specifics. In this chapter, we will list and demonstrate the project's operational behavior and the methods and tools that we will use to create an Android application capable of classifying food images taken with a phone camera.
3.2 Requirements
3.2.1 Functional requirements
Functional requirement Description
Provide application guide The user should be able to use the application correctly by
providing him with a simple user guide inside the app.
Asking for camera permission The app should ask the user for permission before
accessing the phone camera.
Displaying camera preview The users should be able to see a preview of their target
food plate.
Classifying The application should run a classification process on
captured images.
Displaying results The application should display the probable food
categories as soon as it finishes the classification process.
Providing food description The user should be able to view a full description of the
classified food category.
Table 4 : Functional requirements
3.2.2 Nonfunctional requirements
Non-functional requirement Description
Design
The app should be aesthetically pleasing and
appropriately designed to satisfy the end-user
requirements.
Quality
The app should run smoothly and should avoid any
memory leaks and bugs that can affect the Android OS
or any other running apps.
Accuracy The app should provide reliable data to the user.
Compatibility
For any device that uses the Android operating system,
the app should run correctly without showing any bugs
or drawbacks.
Accessibility
The application must be designed and developed so that
anyone can use it.
Table 5 : Nonfunctional requirements
3.3 Technical requirements
3.3.1 Tools for preparing data
Machine learning programs, in general, require labeled data sets. These could be images, texts, or any data that can be represented numerically. For this project, we will use labeled images to train our model to classify new image inputs. We can get this data by collecting it from different sources; after that, we need to clean and prepare it.
3.3.1.1 Data scraping
The data will mainly be from the web, so we need a tool to scrape data from different websites.
In this project, we will use a software called Parsehub.
Figure 28 : ParseHub interface
ParseHub is a powerful and free tool for scraping different types of data, such as images, titles, and texts, from websites that contain large amounts of structured content, like lists of pictures or films. This tool offers an automatic and fast way to collect this data, as shown in figure 28.
3.3.1.2 Cleaning images
After data scraping, we need to make sure that there are no repeated images in our food data set. We cannot do this manually, as we are dealing with a high volume of data, so we will use a tool called "Duplicate Photo Cleaner", as shown in figure 29.
Figure 29 : Duplicate Photo Cleaner
DPC is an advanced image similarity detector. It is an excellent tool for everyone who takes
photos with their smartphone. Unlike ordinary duplicate image finders, Duplicate Photo
Cleaner can compare images based on how similar they look [22] .
3.3.1.3 Preparing images
The collected images must all be the same size, similar to the size of a photo taken with a phone camera. For this task, we will use a tool called JPEGCrops (figure 30).
Figure 30 : JPEGCrops interface
JPEGCrops is a Windows program created for the preparation of a batch of images for printing.
It provides lossless cropping with fixed aspects using jpegtran. [23]
3.3.2 Machine learning method
The project aims to apply Machine learning for creating a useful food classifier. We mentioned
many different types of machine learning approaches in the previous chapter, where we have
concluded that the best method used when building an image classifier is the convolutional
neural network.
3.3.2.1 Deep learning frameworks
For building deep learning models, there are several frameworks; the most famous ones are TensorFlow and PyTorch, as shown in the diagram in figure 31.
Figure 31 : Online job listing growth
3.3.2.2 Framework comparison
TensorFlow:
• Developed by Google
• Difficult debugging
• Open source
• Static network graph [24]
• Big community
• Good for production
• More mature

PyTorch:
• Developed by Facebook
• Good for debugging
• Open source
• Uses a dynamic computational graph [24]
• Based on Python
• Popular in research labs [24]
• Relatively new
Table 6 : TensorFlow and PyTorch comparison
We can see in table 6 that both frameworks are great for creating neural network models, but for this project we decided to use PyTorch, as it offers a more object-oriented approach and is well suited for learning and research.
3.3.3 PyTorch
PyTorch is an open-source machine learning library used for developing and training neural networks based on deep learning models. It is primarily developed by Facebook's AI research group. PyTorch can be used with Python as well as C++ [24].
3.3.3.1 Programming language
We need to install the Python language package for running the PyTorch framework. Python
is the most used language in machine learning and data analysis. It is very straightforward and
simple, especially for mathematicians and researchers who want to get involved in developing
programs related to their field. There are many distributions of Python, but the best distribution for data science and machine learning is Anaconda. It includes all the required libraries and APIs for machine learning, and we can flexibly add more through a graphical user interface called Anaconda Navigator, which enables us to launch applications and efficiently manage Conda packages.
Figure 32 : Anaconda Navigator
3.3.3.2 Coding environment
To write the convolutional neural network model code in Python, we need to prepare our coding
environment. There are many choices that we can pick from. The choice will not affect the
project quality, so each developer can decide the IDE he is comfortable with.
• Visual Studio Code
For this project, we will use VSCode. VSCode is a free and open-source code editor from
Microsoft, and it runs on all major platforms, so it is available for Mac OS, Windows, and
Linux. It is a very lightweight code editor and contains many additional plugins and APIs that
we can import, so it is an excellent choice for our developing machine learning project.
• Jupyter Notebook
Alongside VSCode, we will use Jupyter Notebook. A Jupyter notebook is like a web page holding a document in which you can execute chunks of programming code one chunk at a time, and insert explanatory text, data visualizations, tables, equations, and graphs alongside the code, as shown in figure 34. Jupyter Notebook is open-source and was created for data science and machine learning researchers.
Figure 34 : Jupyter Notebooks in Visual Studio Code
Figure 33 : VSCode screenshot
We will be using Jupyter notebooks because we need to see the output of our code fragments frequently, and we need to draw some graphs for debugging purposes, which is exactly what Jupyter Notebook allows.
3.3.3.3 Libraries and APIs
The PyTorch library contains many useful features for building a neural network, but we need
to use additional libraries to work with it.
• Matplotlib
Matplotlib is the most popular plotting library for Python. It provides numerous ways to create static and animated visuals (figure 35), and it works very well with PyTorch and NumPy.
Figure 35 : Matplotlib style sheets
• NumPy
NumPy is one of the most powerful Python libraries [25]. With NumPy, we can practice simple image processing techniques, because NumPy can represent images as multi-dimensional arrays. NumPy is a scientific computing library used by numerous other Python data science libraries. It contains many functions for linear algebra, statistics, simulation, data science, machine learning, and much more.
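A small illustration of the image-as-array idea, using synthetic pixel values rather than project data:

```python
import numpy as np

# A synthetic 4x4 RGB image (values 0-255) represented as a NumPy array,
# then converted to grayscale by averaging the three color channels.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :2, 0] = 255              # make the left half pure red

grayscale = image.mean(axis=2)     # collapse the 3 color channels
print(image.shape, grayscale.shape)
```

The third axis of the array is exactly the 3 color channels mentioned earlier, which is why averaging over it produces a single-channel image.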
• CUDA
A CNN consists of many hidden layers, so if we try to train it on the CPU, it will take an extremely long time. The solution is to use the GPU, which is built specifically for doing large numbers of linear algebra computations in parallel, and neural networks are fundamentally just linear algebra computations. If we run on the GPU, computation is done in parallel, and we get roughly 100 times the speed of the CPU. In PyTorch, we can move our model parameters from the CPU over to the GPU by installing the CUDA toolkit (figure 36) from Nvidia on our operating system.
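In PyTorch, this move is a short device-selection step; the tiny linear model below is a placeholder for the real network.

```python
import torch
import torch.nn as nn

# Pick the GPU when CUDA is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2)           # a tiny placeholder model for illustration
model = model.to(device)           # parameters now live on the chosen device

x = torch.randn(1, 10).to(device)  # inputs must be moved to the same device
print(next(model.parameters()).device.type)
```

The fallback keeps the same code runnable on machines without an Nvidia GPU, which is convenient during development.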
Figure 36 : CUDA ecosystem diagram
The CUDA toolkit is a software platform that pairs with Nvidia GPU devices to facilitate building programs that increase computational speed using the GPUs' parallel processing power.
3.3.4 Deployment
For the last phase of the project, we need to deploy our neural network to a mobile application
so that users can use it wherever they go.
3.3.4.1 Building Android application
There are two major mobile phone operating systems, iOS and Android. Android has taken most of the market share, as shown in figure 37, due to the varied price range of Android devices, which makes them affordable for many people in countries with developing economies.
Figure 37 : OS market share [26]
Besides that, iOS development requires a Mac computer and an iOS device, so I decided to target Android, as I have a Windows PC and an Android device.
3.3.4.2 Programming language
Android development has changed a lot recently. Android apps are now built either using Java
or Kotlin language. Java was the default language, but recently Google announced that Kotlin
will replace Java as the official language for Android development. We still can use Java, but
Kotlin is now considered more efficient.
3.3.4.3 Programming language comparison
Kotlin:
• Can infer the type of a variable at compile time.
• Null safe: all variable types are non-nullable by default.
• Provides developers the ability to extend an existing class with new functionalities.
• Does not have checked exceptions.
• Provides data classes that generate constructors, getters, and setters for us.

Java:
• We need to specify the type of declared variables explicitly.
• Allows assigning null to variables, which can cause null pointer exceptions.
• To add new functionality to a class, we need to create a new class that inherits from the parent.
• Contains checked exception support.
• We need to write a data class with its constructors, setters, and getters ourselves.
Table 7 : Kotlin and Java comparison [27]
As we can see in table 7, Kotlin is the most suitable choice for our project; it is also Google's preferred language for Android development.
3.3.4.4 Coding environment
For building the Android app, we will use Android Studio as the coding environment, because it is the official IDE for Android, made by Google.
Figure 38 : Android studio interface
As shown in figure 38 Android studio is a robust code editor that helps with creating new
projects, as well as adding new modules, and gives a comprehensive representation of the
project structure, providing quick access to resources, code, and files.
3.3.4.5 Libraries and APIs
• CameraX
CameraX is a Jetpack library designed to make camera app development easier [28], because writing a camera app using the standard camera API is a challenging task for developers. Google built this API to be easy to understand and to significantly reduce the total amount of code that we must write. The CameraX API is built on top of the Camera2 API to achieve a consistent experience across all device types.
• PyTorch with Android
After training our convolutional neural network model, we need to pass it to the Android app. PyTorch provides APIs that cover the standard preprocessing and integration tasks required for embedding machine learning models in mobile applications. It reduces integration errors by allowing a seamless path from training to deployment while remaining entirely within the PyTorch ecosystem.
• Material Design support library:
Material Design is a design language made by Google. It is an adaptable design system backed
by open-source code that helps developers build high-quality digital experiences. From design
guidelines to developer components (figure 39), Material design can help develop products
faster, and it makes sure our app works for all users, regardless of the platform.
Figure 39 : Material Design Components
3.4 Conclusion
So far, we have successfully selected the methods and tools needed for the project, based on the studies made and discussed in the earlier chapters. The next chapter will go into detail about the method for designing our application and the approach used.
Software Design and Architecture
4.1 Introduction
Before beginning the realization phase, in this chapter, we will define the overall software
architecture and the design patterns we will use during the project's realization. We will include
conceptual visualization and diagrams to illustrate our architecture.
4.2 Software best practices
This Android application is intended to be used by multiple users, so to make sure the code is efficient and to avoid mistakes, we will follow specific standards used in software engineering for data science [29].
4.2.1 Clean and modular code
The first practice we will talk about is writing code in a way that is clean and modular. Code
is clean when it is clear, simple, and compact. This makes it much easier for developers to
understand and reuse, especially when iterating over a project. Our code should also be modular, meaning the program is broken up into functions and modules. A module is just a file: just as we encapsulate code in a function and reuse it by calling the function in different places, we can encapsulate code within a module file and reuse it by importing it into separate files. This helps us write fewer unnecessary lines of code.
4.2.2 Efficient code
Writing efficient code is very important, especially for the user experience. There are two parts to making code efficient: reducing the time it takes to run, and reducing the amount of space and memory it takes up. This matters a great deal for our mobile application, since the app runs on the user's device and updates happen instantaneously. For the machine learning part, the model will be trained locally before being integrated into the Android app, so slower code is acceptable there; the essential thing is to produce a model that can classify images with the highest possible accuracy.
4.2.3 Refactoring code
Refactoring is a step done after writing a program that solves a new problem, because when writing code for the first time, we do not pay much attention to its structure and arrangement; we focus on just making the code work, which can leave the code somewhat unorganized and repetitive. That is why we should always go back and do some refactoring after achieving a working model. Refactoring means restructuring the code to improve its internal structure without changing its external functionality. It gives us a chance to clean, restructure, and modularize our program.
4.2.4 Documentation
Documentation is the additional text that comes with or is included in the software code to compactly describe it. It helps to clarify complex parts of the program, especially if we are
dealing with hundreds of lines. Documentation makes it easy to navigate throughout the code
without getting lost and quickly understand how and why different application components are
used. We can add different documentation types to our software like:
• In-line comments: used to clarify a specific line of code.
• Docstrings: a way to create documentation for a function or a module, describing its purpose and details.
• Project documentation: added at the project level, for example a readme file that documents details about the project.
4.2.5 Version control
The primary purpose of a version control system is to help multiple developers work independently on the same project without conflicts, but that is not its only use. We
still can benefit from using version control as it creates safe points that save our project
progress, and we can try out new code branches without losing previous code. For this project,
we will use Git because it is the most common version control system.
4.3 Software Design
Software design is usually broken into two different phases, architectural design, and detail
design. Architectural design is the process of dividing the programs into components, assigning
responsibilities for aspects of behavior to each component, and addressing how the components
interact with each other. Detail design is more related to the functional requirements, where we
create a full definition of every aspect of project development. In this project, we are dealing with two separate and independent programs: a machine learning program for creating a neural network, and an Android application that uses the produced model, as illustrated in figure 40.
Figure 40 : Project Structure
There is no direct interaction between these two systems. As shown in the figure above, they both run on separate timelines. First, the ML program trains and generates a convolutional neural network model; the model is then imported into the Android application to classify captured images. As these programs are not related to each other, we will go directly to the detailed design of each program individually.
4.4 ML Lifecycle
Machine learning is more of a data science analysis than a software development process, because it relies on training data sets and statistics to solve problems. It has a different lifecycle, as shown in figure 41. We have already completed the asking-questions phase in our "Introduction and context" chapter, so we will start directly from the preparing-data phase.
Figure 41 : Machine Learning Lifecycle [13]
4.4.1 Preparing data
Preparing data is the first process in machine learning. The first thing we need to do is define the data categories. For this project, we decided to go with only ten categories, because it is tough to find extensive image data for a novel project idea like classifying Tunisian food. Furthermore, we need approximately 1000 labeled images for each food type, as explained in the introduction chapter. Our image data set will be divided into ten different folders, where each folder's name represents the data label, as shown in figure 42.
Figure 42 : Data Structures
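The folder-per-label layout can be read with a few lines of standard-library Python; the two category names and the placeholder file below are invented for the sketch, while a real project would typically use torchvision's ImageFolder, which applies the same convention.

```python
import tempfile
from pathlib import Path

# Recreate the "folder name = label" layout of figure 42 in a temporary
# directory; the category names and the image file are invented examples.
root = Path(tempfile.mkdtemp())
for category in ["couscous", "lablabi"]:        # hypothetical labels
    (root / category).mkdir()
    (root / category / "img_001.jpg").touch()   # placeholder image file

# Build (image path, label) pairs the way an ImageFolder-style loader would.
samples = [(img, folder.name)
           for folder in sorted(root.iterdir()) if folder.is_dir()
           for img in sorted(folder.glob("*.jpg"))]

print([label for _, label in samples])
```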
4.4.2 Algorithm Selection
4.4.2.1 Transfer learning
For creating the neural network, we will use a technique that has proven to be very good at solving complex problems without the need for massive data sets [30] or months of training. This technique is called transfer learning: it means taking a model that has been trained for one task and then tuning it to accomplish another task (figure 43).
Figure 43 : Transfer Learning Technique
More specifically, transfer learning refers to the process of taking a pre-trained neural network, attaching our classifier model to it (figure 44), and training them on our dataset while freezing the weights of the CNN model, as it is already trained. This CNN model can still extract general features from our data samples, while the classifier model uses this information to classify the data in a way that is pertinent to our problem [30]. The technique has proven to work very well, especially for convolutional neural networks that have been trained on the millions of images of the "ImageNet" challenge, held each year to discover the best possible ML models. As we mentioned in the literature review chapter, CNNs use image filters to extract features from training data and then pass them to a classifier neural network for classification. When the model is trained on a colossal amount of data samples belonging to a large variety of categories, it becomes able to extract features from any new data. We can then attach our own classifier neural network, trained for our specific problem, without changing the pre-trained CNN filter parameters. This technique gives astonishing results, as has been shown numerous times in machine learning research papers.
Figure 44 : Neural network Structure
For this technique to work on our ten food categories, we need to create our classification model and add it at the end of the pre-trained CNN structure, as shown in figure 44. Many CNNs have performed well in the "ImageNet" competition. One of them is the MobileNet model, which we will be using, as it is a lightweight model of about 10 MB, compacted for mobile phone devices.
4.4.2.2 Classifier Model Structure
Once we have the pre-trained CNN, we need to replace its classifier with our own model. Classifier models are typically divided into three parts:
• Input layer
The classifier model's input layer is where the feature extraction result from the CNN's output layer is passed. To fit our model exactly, the input layer size must match the CNN output size, which is 2048 nodes for the MobileNet CNN shown in figure 45.
• Hidden layers
Defining the hidden layers is the most challenging part, as there are no specific rules; each problem has its own characteristics. What many developers do is test different structures and compare the results, but this process requires vast computing resources. For that reason, we will simply use the most common format for problems like ours, with only ten output classes.
• Output layer
The output layer is the result layer. Its size will be ten nodes, matching our defined food categories. We will use the softmax function, as shown in figure 45, to obtain a probability between 0 and 1 for each class and find the most probable one.
Figure 45 : Classifier Structure
The first and second hidden layers will use the ReLU activation function, with 1000 nodes in the first layer and 500 in the second.
4.4.3 Training the Model
For training the classifier model, we will use the cross-entropy loss function. Before that, we need to divide our image data sets into training, testing, and validation data [18], as illustrated in figure 46, because ML models tend to perform well on the training data but often cannot generalize to data that has not been seen before; this is called the overfitting problem. To avoid it, we reserve a proportion of the data for validation, to decide when we should stop the training process. Finally, we need another portion of data outside of the training process to test the model's real performance, as if it were working on real-world data.
Figure 46 : Data Sets Division
Validation data is involved in the training phase to examine the real performance of our model: we save a checkpoint at each iteration in which the model achieves a better result on the validation data set, as shown in figure 47.
Figure 47 : Checkpoints Design Pattern [31]
As figure 46 illustrates, the data is split into 80% training, 10% validation, and 10% testing.
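The checkpointing pattern of figure 47 can be sketched as follows; the tiny linear model and random tensors are stand-ins for the real classifier and food-image splits, invented for the example.

```python
import tempfile
import torch
import torch.nn as nn

# Checkpointing sketch (figure 47): save the model whenever the validation
# loss improves. Tiny random tensors stand in for the real data splits.
torch.manual_seed(0)
X_train, y_train = torch.randn(80, 16), torch.randint(0, 10, (80,))
X_val, y_val = torch.randn(10, 16), torch.randint(0, 10, (10,))

model = nn.Linear(16, 10)                  # placeholder classifier
criterion = nn.CrossEntropyLoss()          # the cross-entropy loss from above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
checkpoint = tempfile.NamedTemporaryFile(suffix=".pt", delete=False).name

best_val = float("inf")
for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    with torch.no_grad():                  # validation pass, no gradients
        val_loss = criterion(model(X_val), y_val).item()
    if val_loss < best_val:                # keep only the best checkpoint
        best_val = val_loss
        torch.save(model.state_dict(), checkpoint)

print(best_val < float("inf"))
```

At the end of training, the saved file holds the weights of the best-validated iteration rather than the last one, which is the point of the pattern.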
4.4.4 Evaluating the Model
To evaluate the model, we will use the testing data to measure the model's accuracy, dividing the number of correct classifications by the total number of elements in the test set.
4.4.5 Deploying the model
Before we proceed to the Android development, we need to save the final CNN model using a serialization method included in the PyTorch library, generating a serialized version of the model for the Android application. This model will then be packaged inside our application as an asset that we can run on the mobile device.
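One common PyTorch serialization path is TorchScript tracing, sketched below with a placeholder model; a real mobile deployment might additionally apply torch.utils.mobile_optimizer.optimize_for_mobile to the traced module.

```python
import tempfile
import torch
import torch.nn as nn

# TorchScript serialization sketch: trace a (placeholder) trained model and
# save it as a file the Android app can bundle as an asset.
model = nn.Sequential(nn.Linear(16, 10), nn.LogSoftmax(dim=1)).eval()
example = torch.randn(1, 16)      # example input required for tracing

traced = torch.jit.trace(model, example)
path = tempfile.NamedTemporaryFile(suffix=".pt", delete=False).name
traced.save(path)

reloaded = torch.jit.load(path)   # sanity check: same outputs after reload
print(torch.allclose(traced(example), reloaded(example)))
```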
4.5 Android software design
The second part of the project is about designing an Android application. We will start first by
presenting an overview of the Android application structure and activity lifecycle.
4.5.1 Android Applications structure
Mobile apps are slightly different from standard software for Android platform applications
use up to four basic components [32], and two other additional components.
4.5.1.1 Basic components
• Activities:
Activities are the fundamental building blocks for Android apps. An activity can be considered
as an individual window containing a graphical interface for interaction with the user.
• Services:
Services represent a process running in the background, designed for continuous operations,
which do not have a graphic interface. They are usually used to perform long-lasting tasks.
• Content providers:
Content providers grant a level of abstraction for any data stored on the device that can be
accessed from multiple applications.
• Broadcast receivers:
Broadcast receivers respond to system-wide broadcast messages that circulate in the device and alert applications to various events.
4.5.1.2 Additional components
• Fragments:
Fragments are an optional component. They help to change the configuration of activities to
support large and small screens on mobile devices.
• Views:
Views are the basic building blocks of the application user interface. They are arranged in a
tree and used to display text fields, images, buttons, and so on.
4.5.2 Activity lifecycle
Figure 48 : Activity lifecycle in Android [33]
The life cycle of an Android activity has four basic states controlled by six callbacks, as shown in figure 48:
• Launched state: the user launches the app by tapping its icon, and the Android system creates a new instance of the launched activity.
• Running state: the activity is displayed on the screen, executing its code or waiting for user input. This is the state between the onResume() and onPause() callbacks.
• Killed state: the activity has saved its necessary data and the user can still return to it, but the Android system has shut it down to free memory for a higher-priority app that the user is focusing on.
• Shutdown state: the final phase, in which the app's memory is released before it is shut down.
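The callback ordering behind these states can be sketched as a tiny simulation. This is plain Python used purely for illustration (real lifecycle code lives in Kotlin activity classes), but the callback names match the Android SDK: onCreate/onStart/onResume run when an activity is launched, and onPause/onStop/onDestroy run on the way to shutdown.

```python
class ActivitySim:
    """Minimal simulation of the Android activity callback order."""

    def __init__(self):
        self.log = []

    # Each method just records itself; real activities override these callbacks.
    def on_create(self):  self.log.append("onCreate")
    def on_start(self):   self.log.append("onStart")
    def on_resume(self):  self.log.append("onResume")   # entering Running state
    def on_pause(self):   self.log.append("onPause")    # leaving Running state
    def on_stop(self):    self.log.append("onStop")
    def on_destroy(self): self.log.append("onDestroy")  # Shutdown state

    def launch(self):
        # Launched -> Running
        self.on_create(); self.on_start(); self.on_resume()

    def shut_down(self):
        # Running -> Shutdown
        self.on_pause(); self.on_stop(); self.on_destroy()


a = ActivitySim()
a.launch()
a.shut_down()
print(a.log)
# ['onCreate', 'onStart', 'onResume', 'onPause', 'onStop', 'onDestroy']
```

Keeping long-running work out of these callbacks is what motivates the background thread used later in the classification activity.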
4.5.3 Software architecture
For this Android application, we will use three Activity classes: one to show a user guide, one to handle the camera preview and the image classification process, and one to display a full description of the recognized food type.
4.5.3.1 Welcome/Guide Activity
UI Components (figure 49):
• ConstraintLayout parent view for structuring and organizing child views.
• TextView for the activity title.
• ViewPager to display the guide.
• Button view to skip to the next activity.
Figure 49: Welcome Activity
4.5.3.2 CameraClassification Activity
UI Components (figure 50):
• ConstraintLayout parent view.
• TextureView for the camera preview.
• LinearLayout parent view.
• TextView/Button for the top-1 result.
• TextView/Button for the top-2 result.
• TextView/Button for the top-3 result.
Figure 50: CameraClassification Activity
4.5.3.3 Description Activity
UI Components (figure 51):
• ScrollView parent view.
• ConstraintLayout parent view.
• TextView for the food name.
• ImageView for the food image.
• LinearLayout parent view.
• TextView for the description text.
• TextView for the ingredients list.
Figure 51: Description Activity
UI Component | Description
TextView | A view that displays text or any type of string.
ImageView | A view that displays an image from its source path.
Button | A view that displays a button and handles click events.
TextureView | A view that displays a content stream, used here for the camera preview.
ViewPager | A view that lets the user swipe left or right through multiple pages of content; we will use it to display the guide.
LinearLayout | A view group that arranges its subviews linearly.
ConstraintLayout | A view group that places views using position constraints.
ScrollView | A view group that displays its subviews on a scrollable page.
Table 8: UI Components
4.5.3.4 CameraClassification Class Diagram
The UML class diagram shown in figure 52 explains the internal structure of the CameraClassification Activity.
Figure 52: CameraClassification Class Diagram
Class | Description
BaseModuleActivity | A base class for activities, from the Android SDK.
ImageClassificationActivity | The activity class responsible for the image classification process: reading camera data and loading the Machine Learning model.
AbstractCamera | Provides an API surface that connects to an Android device camera and requests an image stream.
BackgroundThread | Creates a background thread using the HandlerThread class.
PythonModel | Holds the Machine Learning model.
AnalyseResult | An inner class used to exchange result data between the background thread and the main thread.
Table 9: Kotlin Classes Description
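The BackgroundThread/AnalyseResult pairing follows a common pattern: inference runs off the main thread, and the result is handed back through a thread-safe channel so the UI thread can display it. A minimal sketch of that pattern, in Python for illustration only (the `analyse` function and the fake score list are hypothetical, not the app's actual code):

```python
import queue
import threading

# Thread-safe channel between the background thread and the main thread,
# playing the role of AnalyseResult message passing.
results = queue.Queue()


def analyse(scores):
    """Stand-in for running the model on one camera frame."""
    # Pretend the classifier output is `scores`; the top-1 class is the argmax.
    top1 = max(range(len(scores)), key=scores.__getitem__)
    results.put({"top1": top1, "scores": scores})


# Background worker does the heavy work, like HandlerThread on Android.
worker = threading.Thread(target=analyse, args=([0.1, 0.7, 0.2],))
worker.start()
worker.join()

# The main thread picks up the result and would update the UI here.
result = results.get()
print(result["top1"])  # 1
```

On Android the same hand-off is done by posting the result back to the main thread's Handler rather than polling a queue, but the division of labor is identical.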
4.6 Conclusion
In this chapter, we have outlined a software structure that answers the specified project requirements. We have defined the Machine Learning processes and algorithms used to build the convolutional neural network, and we have laid out the design of the Android application. We are now ready to move on to the realization phase.
.........................................................................................54 4.6 Conclusion.................................................................................................................57 Realization..................................................................................................58 5.1 Introduction...............................................................................................................58 5.2 Preparing the Development environment..................................................................58 5.2.1 Devices...............................................................................................................58 5.2.2 Tools ..................................................................................................................58 5.2.3 Installation..........................................................................................................58 5.3 Data preparing...........................................................................................................59 5.3.1 Gathering images ...............................................................................................59 5.3.2 Deleting redundant images ................................................................................60
  • 7. 5.3.3 Cropping Images................................................................................................60 5.3.4 Resizing and normalizing data...........................................................................60 5.3.5 Creating validation and test data........................................................................60 5.4 Creating the neural network model...........................................................................61 5.4.1 Pretrained model ................................................................................................61 5.4.2 Classifier Model.................................................................................................61 5.5 Training the model ....................................................................................................61 5.6 Training progress Test...............................................................................................61 5.7 Testing the model accuracy.......................................................................................62 5.8 Saving Model (serialization).....................................................................................64 5.9 Creating the Android application..............................................................................64 5.10 User Interface Design ............................................................................................65 5.10.1 Welcome Activity..............................................................................................65 5.10.2 Camera classification activity............................................................................65 5.10.3 Description activity............................................................................................66 5.11 Application icon ....................................................................................................66 5.12 Activities 
realization..............................................................................................66 5.12.1 Welcome Activity..............................................................................................66 5.12.2 Camera classification activity............................................................................67 5.12.3 Description Activity...........................................................................................67 5.13 Conclusion.............................................................................................................67 General Conclusion.......................................................................................................68 Webography .................................................................................................................. 69 Appendix........................................................................................................................ 72
List of figures

Figure 1 : Project description .......... 2
Figure 2 : Agile Methodology [6] .......... 5
Figure 3 : Scrum Methodology [7] .......... 5
Figure 4 : Extreme programming Methodology [8] .......... 6
Figure 5 : Unified Process [9] .......... 7
Figure 6 : Rational Unified Process [6] .......... 7
Figure 7 : Waterfall with overlapping phases [10] .......... 8
Figure 8 : Project management processes [12] .......... 11
Figure 9 : Gantt chart .......... 12
Figure 10 : The data science process [13] .......... 13
Figure 11 : K-means clustering .......... 15
Figure 12 : Hierarchical clustering .......... 16
Figure 13 : Support vector machine .......... 17
Figure 14 : Nearest Neighbors .......... 18
Figure 15 : Linear regression .......... 18
Figure 16 : Neural network .......... 19
Figure 17 : Neural network structure .......... 21
Figure 18 : The Derivative As A Function .......... 21
Figure 19 : Multivariable functions .......... 22
Figure 20 : The structure of an artificial neuron .......... 24
Figure 21 : Example of a simple neural network .......... 27
Figure 22 : The chain rule .......... 27
Figure 23 : Backpropagation .......... 28
Figure 24 : Representation of RNN both in folded and unfolded forms .......... 29
Figure 25 : Convolutional Neural Network .......... 30
Figure 26 : The annual winner of the ImageNet challenge [21] .......... 30
Figure 27 : Confusion matrix .......... 31
Figure 28 : PurseHub interface .......... 35
Figure 29 : Duplicate Photo Cleaner .......... 35
Figure 30 : JPEGCrops interface .......... 36
Figure 31 : Online job listing growth .......... 37
Figure 32 : Anaconda Navigator .......... 38
Figure 33 : VSCode screenshot .......... 39
Figure 34 : Jupyter Notebooks in Visual Studio Code .......... 39
Figure 35 : Matplotlib style sheets .......... 40
Figure 36 : CUDA ecosystem diagram .......... 41
Figure 37 : OS market share [26] .......... 41
Figure 38 : Android studio interface .......... 43
Figure 39 : Material Design Components .......... 44
Figure 40 : Project Structure .......... 47
Figure 41 : Machine Learning Lifecycle [13] .......... 47
Figure 42 : Data Structures .......... 48
Figure 43 : Transfer Learning Technique .......... 48
Figure 44 : Neural network Structure .......... 49
Figure 45 : Classifier Structure .......... 50
Figure 46 : Data Sets Division .......... 51
Figure 47 : Checkpoints Design Pattern [31] .......... 51
Figure 48 : Activity lifecycle in Android [33] .......... 53
Figure 49 : Welcome Activity .......... 54
Figure 50 : CameraClassification Activity .......... 55
Figure 51 : Description Activity .......... 55
Figure 52 : CameraClassification Class Diagram .......... 56
Figure 53 : VSCode View .......... 59
Figure 54 : Images Sample .......... 59
Figure 55 : Deleting repeated images .......... 60
Figure 56 : Cropping Images .......... 60
Figure 57 : Training progress .......... 62
Figure 58 : Test Accuracy Output .......... 62
Figure 59 : Test Examples .......... 64
Figure 60 : Android studio Project .......... 64
Figure 61 : Welcome Activity Layout .......... 65
Figure 62 : Camera Classification Activity .......... 65
Figure 63 : Description Activity Layout .......... 66
List of tables

Table 1 : Methodologies comparison table .......... 10
Table 2 : Project tasks timeline .......... 12
Table 3 : Activation functions .......... 25
Table 4 : Functional requirements .......... 33
Table 5 : Nonfunctional requirements .......... 34
Table 6 : TensorFlow and PyTorch comparison .......... 37
Table 7 : Kotlin, Java Comparison [27] .......... 42
Table 8 : UI Components .......... 56
Table 9 : Kotlin Classes Description .......... 57
Table 10 : Device Characteristics .......... 58
Table 11 : Development tools .......... 58
Table 12 : Model accuracy on images from the web .......... 63
Table 13 : Model accuracy on image from phone camera .......... 63

List of Symbols

P(x), P(y) : The independent probabilities of x and y
p(x|y) : Probability of x given that y is true
‖v‖ : Magnitude of a vector v
∑_{i=0}^{n} x_i : Sum of the x_i : x_0 + … + x_n
df/dx : Total derivative of f with respect to x
f(g(x)) : Function composition
∂f/∂x : Partial derivative of f with respect to x
∇f : Gradient of a function f
relu(x) : Rectified linear unit function
tanh(x) : Hyperbolic tangent function
log(x) : Logarithmic function
← : Assignment operator
List of Abbreviations

ML : Machine Learning
ANN : Artificial Neural Network
CNN : Convolutional Neural Network
RNN : Recurrent Neural Network
AI : Artificial Intelligence
XP : Extreme Programming
RUP : Rational Unified Process
API : Application Programming Interface
UI : User Interface
TP : True Positive
FP : False Positive
TN : True Negative
FN : False Negative
IDE : Integrated Development Environment
SVM : Support Vector Machine
VSCode : Visual Studio Code
OS : Operating System
JDK : Java Development Kit
SDK : Software Development Kit
DPC : Duplicate Photo Cleaner
GPU : Graphics Processing Unit
CPU : Central Processing Unit
ReLU : Rectified Linear Unit
Page | 1

General Introduction

These days, we are living in the golden age of artificial intelligence, which some have called the next industrial revolution. This is especially true in the machine learning and deep learning fields, given the availability of massive data sets, known as big data, that keep growing very fast and cover everything related to our lives: from the images and videos posted daily on social media websites to the data collected periodically by smart sensors spread all over the world to measure climate change and weather conditions.

Computer performance has also made breakthroughs that opened up many fields of application, applications that just a few years ago were science fiction. The dramatic increase in computational power and parallel computing in recent years removed many of the barriers in the way of artificial intelligence and machine learning; these fields have been around since the 1950s but remained largely theoretical [1] due to the lack of powerful computing resources and large data sets.

Artificial intelligence is now able to contribute to our living conditions in a variety of ways. Image recognition systems have achieved human-like performance, autonomous vehicles are increasingly becoming a reality, business models are changing rapidly, and in medicine AI enables automated clinical diagnoses and suggests treatments.

It is very important to take advantage of the available opportunities and provide useful solutions. This mission is not exclusive to big tech companies: any software developer can also contribute to the artificial intelligence field by deploying apps, especially for lightweight devices like smartphones, which do not require a considerable budget or expensive equipment.

Smartphones and other mobile devices are now ubiquitous and improving every day; the field of mobile devices is very dynamic and rapidly developing. Programs that formerly required powerful home computers are nowadays successfully ported to various mobile devices, and people can carry much valuable equipment in a briefcase or pocket. This opens the door for AI to deliver practical products, innovative solutions, and smart services by taking advantage of the cutting-edge technologies that phone providers offer, which encourages the development of more demanding applications.

In this project, we will try to participate in the artificial intelligence revolution by developing an application using the best technologies in this field.
Introduction and context

1.1 Introduction

The idea of this project is inspired by the famous "Cats vs Dogs" classification problem, which is considered the "hello world" program of ML [2], where we make a computer recognize and discern between dog and cat images. ML is an application of artificial intelligence in which we teach a machine to perform smart tasks. ML itself is a broad field of science, and one of its most widely used subfields is Computer Vision, within which our project falls. This chapter will present the roadmap and the software development process we will use to create a complete image recognition program.

1.2 Project description

This project is about a mobile application based on deep learning algorithms that runs on the Android system. The application's main role is to identify different Tunisian dishes from the live image feed of the phone's camera, as shown in figure 1, and to provide additional information on the result, including calories, ingredients, etc. The application can recognize up to ten different categories of Tunisian food.

Figure 1 : Project description

No internet connection is required on the user's phone, as the ML model will be integrated directly into the application, and all of the image analysis will be performed directly by the phone's processor. We can summarize the functionality of the app in four steps:
1. The user takes a picture of his food with the phone's camera.
2. The Food Detector app analyses the food picture using a neural network.
3. The Food Detector app displays what the food might be.
4. The user checks the complete food description.

In later sections, we will dive deeper into this application's functionality and its architecture.

1.3 Motivation

After this brief introduction to the project, the utility of the app might still seem a bit unclear. To remove this ambiguity, we will mention some of the problems that this app will try to solve or minimize. The app is not necessarily going to be useful right away for everyone, but it can become handy later, especially when we travel to a foreign country. As we all know, people travel for many reasons, such as tourism, working or studying abroad, vacations, etc. One of the problems that many of us may face while traveling is eating exotic food that we are not used to, typically the foreign country's local food. Sometimes the struggle begins while still on the plane ride [3].

Besides this problem, we noticed that many people struggle to keep up with a diet plan and make the wrong food choices. After some research, we found that this is seemingly due to a lack of knowledge about nutrition and food. As a nutrition coach puts it: "Knowledge is imperative to any endeavor so why should this be any different in the case for weight loss? Too many people blindly throw themselves into this game headfirst without doing any research or laying any sort of foundation. It is this approach that results in yo-yo dieting and relapses" [4].

In addition to all of this, AI services integrated into mobile apps are not yet widely used in Tunisia, despite their significant influence worldwide. AI combined with mobile apps can now provide more sophisticated services than ever before.
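To make steps 2 and 3 of the workflow above concrete, the sketch below shows how a classifier's raw output scores are typically turned into a single "what the food might be" answer: a softmax converts the scores into probabilities, and the highest-probability class is reported. This is a minimal, pure-Python illustration of that final scoring step only, not the project's actual code, and the dish labels are placeholders.

```python
import math

# Placeholder dish labels for illustration; the real app distinguishes ten classes.
CLASSES = ["couscous", "lablebi", "brik"]

def softmax(scores):
    """Turn raw network outputs (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(scores):
    """Return the most likely class label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]

label, confidence = top_prediction([0.2, 2.5, 0.8])
print(label, round(confidence, 3))  # the class with the highest score wins
```

In the real application this last step sits behind a convolutional neural network running on the phone, which produces the score vector from the camera image.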
1.4 Aim of the work

Having introduced the different challenges and problems, in this part we will explain how this project will try to use machine learning techniques to resolve them. The mentioned problems concern the lack of nutritional knowledge and bad experiences with exotic food. In this application, we will gather information about different Tunisian dishes. When the user wants to discover more about his meal while he is in Tunisia, he can detect the food using the app. The food description will then be displayed in a simple layout on the phone screen. It will include all the relevant information, like ingredients, calories, and recommendations, so that the user can be aware of what he is eating and check whether it will break his diet plan, or whether he is allergic to any ingredient included in the food.

This application will use machine learning technologies to help the user become familiar with new types of dishes, adopt healthy eating habits instead of overeating junk food, and keep up with his diet. It also helps tourists avoid foods that taste strange or do not conform to their usual choices.

Nevertheless, we need to mention that this application will only solve part of the puzzle, because it works only on Tunisian food, and even then it only classifies 10 famous Tunisian food types. This is because of the limited availability of images of some Tunisian dishes on the internet: for each type of food, we need at least 1,000 high-quality images to get a good result. The 1,000-image magic number comes from the original ImageNet classification challenge, where each category of the dataset had around 1,000 images. This was good enough to train the early generations of image classifiers like AlexNet, which suggests that around 1,000 images per class are enough [5]. On top of that, training on a large amount of data demands huge processing resources, and it takes a long time to train neural networks to reach more than 90% accuracy. However, the project could still be extended, if the necessary capabilities and investments are available, to encompass all Tunisian food types or even other countries' food.

1.5 Dissertation structure

The structure of this dissertation is divided into a set of chapters.
The first chapter is the "Introduction and context." The next chapter is the "Literature review", where we will discuss machine learning and the methods used in this field. After that, we will specify the application requirements in the "Gathering requirements" chapter, followed by the "Software design" chapter, in which we will define the design patterns and architecture of the program. The last chapter is the "Realization", in which we will create our machine learning model and then deploy it to an Android application.

1.6 Development Methodology

A software development methodology divides software development work into distinct stages with specific activities, for more effective planning and management. Before choosing
the right method for our project, we will state some of the popular methodological approaches used in software engineering. Then we will compare them to find the most suitable method.

1.6.1 Software development processes available

• Agile
Agile is based on highly iterative and incremental development, creating software in short time boxes in order to minimize risks [6].

Figure 2 : Agile Methodology [6]

Agile is mostly a disciplined approach (figure 2) that anticipates the need for flexibility and applies frequent alterations before delivering the finished product.

• Scrum
Scrum is an agile methodology. It is a simple process that tries to speed up productivity and deliver products that focus on satisfying customers.

Figure 3 : Scrum Methodology [7]

As shown in figure 3, it does this by breaking the complexity down into smaller tasks and dividing them across all team members, where each one focuses on solving his dedicated task at a specific time according to a planning process. This step is repeated time and time again. After each incremental step, the team members re-evaluate the product's current direction and decide which strategy is the most effective to achieve the goal.

• Extreme programming
Extreme programming is also an agile method [6]. It is a lightweight process whose goal is to reduce the cost of changing software requirements.

Figure 4 : Extreme programming Methodology [8]

Figure 4 illustrates the different process steps of the XP method. It takes traditional principles to extreme levels through several practices, including simple design, pair programming, constant testing, ongoing integration, refactoring, coding standards, and small releases. It is mainly used for creating software within a volatile and dynamic environment, and it allows for much better flexibility within the modeling process [4].

• Unified Process
The Unified Process is an architecture-centric, use-case-based, iterative, and incremental development process that uses the Unified Modeling Language. It can be applied to software systems with varying technical complexity and management levels, in different areas and different organizational cultures. The Unified Process is divided into a series of timeboxed iterations, as shown in figure 5.

• RUP
The Rational Unified Process method is referred to as RUP. It separates the development process into four phases, as shown in figure 6: Inception, Elaboration, Construction, and Transition. It is considered an object-oriented and web-enabled program development methodology. This method helps software developers deal with changing requirements and provides guidelines, templates, and examples for all aspects of the software development stages [4]. It describes how specific development goals should be achieved.

Figure 5 : Unified Process [9]

Figure 6 : Rational Unified Process [6]
• Waterfall
The waterfall method is one of the most traditional and commonly used software development methodologies [4]. It differs from agile and unified processes in that it is a sequential design process, as shown in figure 7, meaning that the earlier phases define the subsequent ones. In the waterfall method, there are seven stages, from system feasibility down to operations and maintenance.

Figure 7 : Waterfall with overlapping phases [10]

1.6.2 Methodologies comparison table

The following table lists some advantages and disadvantages of the mentioned software development processes to help us decide on the appropriate method for this project.

Scrum
Advantages:
➢ It contains a backlog listing out everything to do.
➢ The team decides how much work is to be done.
➢ Communication, which is an important part of the process, is achieved through meetings, called Scrum events.
Disadvantages:
➢ Meetings can be too long.
➢ Requires a dedicated scrum master.
➢ Hard to understand, requires team member guidance [6].

Extreme programming
Advantages:
➢ Faster product delivery.
➢ Allows software development groups to save the costs and time needed for project realization.
➢ Allows developers to produce quality software: regular testing during the development phases assures the detection of all bugs.
Disadvantages:
➢ Impossible to know the exact estimation of the job effort required to produce a final product [6].

RUP
Advantages:
➢ A comprehensive methodology that can proactively resolve project risks associated with the changing requirements of the client, which requires careful management of change requests [4].
➢ Less time is required for integration, as the integration process continues throughout the software development cycle.
Disadvantages:
➢ The development process is too complicated and unorganized on massive projects that use new technology [4].
➢ The reuse of components will not be possible.

Waterfall
Advantages:
➢ Suitable for simple structured projects.
➢ Works well when requirements are well understood.
➢ The cost of the model is low.
➢ It includes testing, i.e., verification of completed operations and obtained results at the closure of each development phase [5].
Disadvantages:
➢ No iterations during project realization [11].
➢ No working product is available until all phases are finished [11].
➢ It is very difficult to go back and fix the software, especially at the testing phase.

Table 1 : Methodologies comparison table

1.6.3 The chosen method

Based on table 1, we decided to use the waterfall methodology for this project. The idea behind the waterfall method is that the project progresses through an orderly sequence of steps, from the initial software concept down to the final system testing phase. This approach is suitable for this project, where cost and time are constrained and the scope and requirements are well understood. Also, the waterfall methodology gives a set of processes built on the principle of approval of the previous phase, which fits our need to deliver a complete and validated project within a specific time limit.

Lately, this method has faced some criticism for being outdated, due to the limitation in fixing defects that appear in later stages: it is based on linear sequential phases that always move forward, making going back and solving problems very daunting. Many modified waterfall models, like the "sashimi model" (waterfall with overlapping phases), have been produced in response to this problem. "The key feature of the Sashimi model is the possibility of overlapping development phases, i.e., introducing feedback into the classical waterfall model. The idea on which the model is based in identifying errors made on time while the development phase is still in progress. For instance, errors made in the design phase are identified during implementation, while the design is still in progress" [11]. The waterfall with overlapping phases version can overcome the original waterfall model's major problem, which is the difficulty of fixing errors that appear in earlier, already finished stages.

1.7 Project Planning

Project planning is a critical phase.
It is part of the project management process. Project management is considered a structured discipline that defines project goals, strategy, planning, and motivations. Its main objective is to produce a complete project that complies with the project's nature and scope. Typically, it is divided into five steps (figure 8).
Figure 8 : Project management processes [12]
1.7.1 Project management processes
• Initiation
Initiation is the project's first process, where we define the idea behind the project, its overall goal, and the project scope.
• Planning
Project planning uses schedules such as Gantt charts to track progress within the project environment. It is the process where we define the project management methodology.
• Execution
The execution phase is the third phase of the project management life cycle, and it is usually the longest phase of the project. In this phase, we start executing our plan and methodology for developing our software.
• Control
Control is the phase where we observe the project and validate each of its steps.
• Closure
Project closure is the last phase of the project life cycle. In this phase, we formally close the project and prepare it to be delivered and presented.
1.7.2 Gantt chart
The Gantt chart is commonly used in project management, as it is a way to show the schedule of project tasks over a specified date range.
Task Name               | Start Date | Due Date
Initiation and planning | 13/02/2020 | 13/03/2020
Research and studies    | 14/03/2020 | 30/04/2020
Requirements analysis   | 01/05/2020 | 31/05/2020
Software design         | 13/07/2020 | 31/08/2020
Realization             | 01/09/2020 | 31/10/2020
Thesis writing          | 01/04/2020 | 31/10/2020
Table 2 : Project tasks timeline
1.8 Conclusion
In this chapter, we introduced the project idea and discussed the problems that we will try to solve through the use of machine learning techniques. We then outlined the thesis structure and the software methodology we are going to use, and lastly presented the project planning.
Figure 9 : Gantt chart
Literature review
2.1 Introduction
After introducing the project, this chapter addresses the scientific background of machine learning program development, with a detailed elaboration of the theoretical meaning of some essential concepts, followed by the relevant algorithms in this field, since this project aims at using cutting-edge technologies. This chapter includes all the knowledge necessary to establish an integrated understanding of machine learning and related fields.
2.2 Data Science
Data science is a required field for developing ML programs, given the data management process needed to prepare training data for our machine to be trained on, so it is imperative to have a good understanding of it. Furthermore, many tutorials and e-learning websites about artificial intelligence and machine learning assume that you already have a good knowledge of data science and data analysis [13]. The data science process usually contains seven essential steps, as shown in figure 10.
Figure 10 : The data science process [13]
2.2.1 Acquiring and storing data
Acquiring and storing data is the first step in data analysis. We need to find data related to our subject, either by collecting it from the many sources available on the internet, or by gathering observations and measurements from real-world experiments.
2.2.2 Asking questions
This step can come either before or after gathering data, depending on the project subject. In this step, we try to ask the right questions relevant to our topic, explaining how the data might be useful for understanding and defining the project's objectives.
2.2.3 Data preparation
The next step of data analysis is data preparation, which has two parts: data cleaning and data transformation. First, we clean the data of any wrong or duplicated values. Then we transform it based on defined mapping rules [13].
2.2.4 Exploring data
After cleaning the data, we start exploring it. We spend some time getting familiar with the different data sets we have. We can use descriptive statistics to discover patterns, build intuition, and understand the data's nature.
2.2.5 Machine learning model
Next, we need to define the machine learning techniques and algorithms that work best with our collected data types and extract features from them in a way that fits the functional and business requirements of the project.
2.2.6 Visualization and communication
It is very important to communicate our findings to other people. This communication can take a variety of formats: we might create images, diagrams, or animations and share them in a paper, an email, a PowerPoint presentation, or an in-person conversation.
2.2.7 Deployment
Finally, we deploy the final machine learning model to the production environment. Then we start exploiting it and making decisions based on the results of the ML model.
2.3 Machine learning
Machine learning is a branch of computer science applied in different fields like finance, medicine, games, and robotics, with the aim of training a machine to perform tasks intelligently, as a human does, by using labeled data sets instead of explicitly coding the solution. This can be done by supervised or unsupervised learning.
2.3.1 Unsupervised learning
Unsupervised learning is mostly used for clustering, where we are given a data set with no corresponding labels. We then attempt to learn some type of structure or pattern from the data and to extract useful information or features from it.
2.3.1.1 Unsupervised learning algorithms
There are different algorithms for unsupervised learning, like k-means, k-medoids, hierarchical clustering, and Gaussian models. Neural networks can be used in both supervised and unsupervised settings, but they are mostly associated with supervised learning.
• K-means clustering
K-means estimates, for a given number of data points, the best centers of the k clusters representing them. In step one (figure 11), we pick k random center points. Then we connect each point to the closest center. In the third step, we recalculate the centers based on the mean of the points, and in the last step, we repeat this process until there is no change in the clusters' data points.
Figure 11 : K-means clustering
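The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation; the function name `kmeans` and the choice of taking the first k points as initial centers (instead of a random pick) are our own assumptions:

```python
import numpy as np

def kmeans(points, k, iters=100):
    """Minimal k-means: assign each point to its closest center, then move
    each center to the mean of its points, until nothing changes."""
    centers = points[:k].astype(float)   # simple init: take the first k points
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # step 2: connect each point to the closest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # step 3: recalculate each center as the mean of its assigned points
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        # step 4: repeat until the clusters no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

With two well-separated groups of points, the returned labels split them into two clusters after a couple of iterations.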
• Hierarchical clustering
In hierarchical clustering (figure 12), we group data according to a similarity metric that we define, such as the Euclidean distance or the minimum/maximum distance. There are two approaches we can use: bottom-up or top-down. This method is good for building tree structures from data similarities.
Figure 12 : Hierarchical clustering
• K-medoids clustering
K-medoids clustering is similar to k-means, except that it defines the cluster centers differently: it picks a center from within the data set, then calculates the total cost of swapping that center with another data element, instead of computing a center outside of the data. This algorithm can be more accurate, but it requires many iterations to converge.
2.3.2 Supervised learning
Supervised learning is a machine learning approach mostly used when we have a problem that comes with a dataset written as a set of example-label pairs, where a label y is associated with each example x [14]. In other words, supervised learning means that you have many examples for which you know the correct answers, and you have the computer figure out the rules for getting those answers.
2.3.2.1 Supervised learning algorithms
There are many supervised learning algorithms. Below we cite the most used ones.
• Support Vector Machine
SVM is a classification algorithm that finds a separating line between two data classes (figure 13) which maximizes the distance to each class's nearest point equally. That distance is often called the margin.
Figure 13 : Support vector machine
• Naive Bayes
The naive Bayes algorithm is based on Bayes' theorem, which we can use to draw conclusions about an event x given the observed probability of an event y [14]:

P(x|y) = P(y|x) · P(x) / P(y)   (2.1)

Using a large amount of data, we can count the occurrences of each element in our dataset. Thus, we can calculate the probability that a new data example belongs to a certain class.
• Nearest Neighbors
The k-nearest neighbors algorithm is a non-parametric supervised learning method. It is very straightforward: we simply memorize all the data, and then we label a new example with the majority class of its k nearest neighbors, as illustrated in figure 14.
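The "memorize, then take the majority class of the k nearest points" idea fits in a few lines of plain Python. The 2-D points and the labels 'a'/'b' below are made up for illustration:

```python
import math
from collections import Counter

# Toy labeled examples: class 'a' near the origin, class 'b' near (5, 5)
train = [((0.0, 0.0), 'a'), ((0.2, 0.1), 'a'), ((0.1, 0.3), 'a'),
         ((5.0, 5.0), 'b'), ((5.2, 4.9), 'b'), ((4.8, 5.1), 'b')]

def knn_predict(point, k=3):
    """Label a new example with the majority class of its k nearest neighbors."""
    # "memorize all data": the model is just the training list itself
    neighbors = sorted(train, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

A point near the origin is labeled 'a', and one near (5, 5) is labeled 'b'.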
Figure 14 : Nearest neighbors
• Linear regression
Linear regression is an algorithm that tries to fit a line (figure 15) that best describes data sets involving more than one dimension, for example, the size of a house relative to its price, or a person's age and income. With linear regression, we can draw a line representing a mathematical relationship, based on a bunch of measured points, to map new continuous inputs to outputs.
Figure 15 : Linear regression
• Neural network
A neural network is a supervised learning method that uses multiple layers of connected sets of nodes (figure 16). These layers transform an input set into a particular output set representing the result we want to get from specific data. In order to obtain an accurate result, we need to train the neural network on labeled data. We then reduce the error the model makes on new examples by changing the connected nodes' values using calculus formulas.
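The house-size example can be made concrete with ordinary least squares. The numbers below are made up for illustration (the prices follow price = 3·size + 10 exactly), and the function name `predict` is our own:

```python
import numpy as np

# Hypothetical data: house sizes (m2) and prices, generated as price = 3*size + 10
sizes  = np.array([50.0, 60.0, 80.0, 100.0, 120.0])
prices = np.array([160.0, 190.0, 250.0, 310.0, 370.0])

# Fit the line y = a*x + b by least squares: add a column of ones for the bias b
X = np.column_stack([sizes, np.ones_like(sizes)])
(a, b), *_ = np.linalg.lstsq(X, prices, rcond=None)

def predict(size):
    """Map a new continuous input (a size) to an output (a price) with the line."""
    return a * size + b
```

Because the toy data is exactly linear, the fit recovers a = 3 and b = 10, and `predict` interpolates new sizes.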
Figure 16 : Neural network
Due to their excellent performance, deep neural networks have been widely used in image analysis, speech recognition, target detection, face recognition, and other fields.
2.4 Mathematics for AI
Mathematics forms the foundations of machine learning. "As machine learning is applied to new domains, developers of machine learning need to develop new methods and extend existing algorithms. They are often researchers who need to understand the mathematical basis of machine learning and uncover relationships between different tasks" [14]. ML is very much an interdisciplinary field. Even though it runs as a computer program, it relies heavily on calculus, statistics, linear algebra, and probability.
• Calculus tells us how to learn and optimize our model.
• Algebra makes running these algorithms possible, as ML deals with matrices and vectors to represent data (text, images, etc.).
• Statistics is at the core of everything. It is very helpful for optimization tasks.
• Probability helps predict the likelihood of an event occurring.
Therefore, we will illustrate some math concepts before diving into machine learning models.
2.4.1 Linear algebra
Some concepts of linear algebra are important for understanding the principles behind machine learning. In this section, we will focus on the parts that are involved in ML.
2.4.1.1 Vectors
The fundamental building block of linear algebra is the vector, because linear algebra is the study of vectors and of certain rules to manipulate them [14]. There are three distinct but related ideas about vectors:
• In physics, a vector is a quantity that has both magnitude and direction and can be placed anywhere.
• In computer science, a vector is a collection of data where the order matters.
• In math, a vector could be anything. It can be drawn anywhere in space [15].
2.4.1.2 Operating on vectors
➢ Addition: we can add two vectors: (1, 2) + (3, 1) = (4, 3).
➢ Scalar multiplication: we can scale a vector by multiplying it by a number: 2 × (1, 2) = (2, 4).
➢ Magnitude: the length of the vector. For v = (−1, 2, 3): ‖v‖ = √((−1)² + 2² + 3²) = √14.
➢ Dot product: a way to measure the length of the projection of one vector onto another [15]: (1, 2, −1) · (3, 1, 0) = 1·3 + 2·1 + (−1)·0 = 5.
2.4.1.3 Matrices
A matrix in mathematics is a collection of vectors [14]. It represents a table of numbers arranged in rows and columns. In computer science, it is a two-dimensional set of numbers with m rows and n columns.
2.4.1.4 Linear transformation
A linear transformation is like a function in math: it takes a vector and transforms it into another vector, and the function itself is a matrix.

[a b; c d] · (x, y)ᵀ = x·(a, c)ᵀ + y·(b, d)ᵀ = (ax + by, cx + dy)ᵀ = (x′, y′)ᵀ   (2.2)
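The vector operations above, and the matrix-vector product of eq. (2.2), can be verified numerically. A small NumPy check (the example matrix A is an arbitrary choice of ours):

```python
import numpy as np

# Addition and scalar multiplication, matching the worked examples above
assert np.array_equal(np.array([1, 2]) + np.array([3, 1]), np.array([4, 3]))
assert np.array_equal(2 * np.array([1, 2]), np.array([2, 4]))

# Magnitude of v = (-1, 2, 3) is sqrt(14)
v = np.array([-1.0, 2.0, 3.0])
magnitude = np.linalg.norm(v)

# Dot product: (1, 2, -1) . (3, 1, 0) = 1*3 + 2*1 + (-1)*0 = 5
dot = int(np.dot(np.array([1, 2, -1]), np.array([3, 1, 0])))

# A linear transformation: the matrix [[a, b], [c, d]] maps (x, y) to (ax+by, cx+dy)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # arbitrary a, b, c, d
xy = np.array([1.0, 1.0])
transformed = A @ xy         # (1*1 + 2*1, 3*1 + 4*1) = (3, 7)
```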
We transform the vector (x, y)ᵀ by the matrix [a b; c d] and get (x′, y′)ᵀ as output. This is like the behavior of a neural network (figure 17): we take an image as input, and the network gives us the potential content of that image as output; in the case of a neural network, however, the transformation is a group of matrices and vectors.
Figure 17 : Neural network structure
2.4.2 Calculus
In the last figure, we can see a simple neural network with nodes interconnected with each other. These nodes represent the core of the network. To find their values, we need to apply some calculus concepts, which we explain in the following parts.
2.4.2.1 Derivative
The derivative is a fundamental concept in calculus. We can consider it as the rate of change of a function with respect to a single variable, as shown in figure 18.
Figure 18 : The derivative as a function
The average rate of change is a measure of how much the function changed per unit:

(f(a + h) − f(a)) / ((a + h) − a) = (f(a + h) − f(a)) / h   (2.3)
Using eq. (2.3) repeatedly, we can calculate the second and third derivatives, up to the nth derivative. For neural networks, derivatives are very important in the optimization process and in reducing the error rate.
2.4.2.2 Chain rule
The chain rule allows us to calculate the derivative of the composition of two functions [15]:

d/dx f(g(x)) = f′(g(x)) · g′(x)   (2.4)

We can extend the chain rule formula to calculate more complicated compositions, like the function f(g(k)), where k can itself be a composition of functions. In neural network models, we deal with multiple compositions, but instead of scalar functions, we use matrices and vectors.
2.4.3 Multivariable calculus
Multivariable calculus is the extension of calculus to multivariable functions, which involve more than one input variable.
2.4.3.1 Multivariable functions
Multivariable functions are functions that assign a real number to multiple variables [15].
Ex: f(x) = x + 12, a normal function with one variable x.
Ex: f(x, y) = 4x + 2y, a multivariable function with two variables x and y.
• We can write the function as a vector: z = f(x, y) = (4x, 2y)ᵀ.
• We can also graph the function in three dimensions, with axes X, Y, and Z.
Figure 19 : Multivariable functions
2.4.3.2 Partial derivatives
To calculate derivatives of multivariable functions, we use the partial derivative, which is very similar to the ordinary derivative of equation (2.3). The difference is that we calculate the derivative with respect to each variable of the function in turn, holding all the other variables constant.
Ex: f(x, y) = 4x + 2y, evaluated at the point (x, y) = (2, 1).
Derivative with respect to x: ∂f/∂x (2, 1) = ∂/∂x (4x + 2·1) = 4
Derivative with respect to y: ∂f/∂y (2, 1) = ∂/∂y (4·2 + 2y) = 2
2.4.3.3 Gradient
The gradient is a vector that gathers all the partial derivatives of a function [15]:

∇f(x, y, …) = (∂f/∂x, ∂f/∂y, …)ᵀ   (2.5)

The gradient is essential for neural networks, as we are dealing with multivariable functions [14].
2.4.3.4 Gradient descent
Gradient descent is a popular algorithm used in machine learning, and many deep learning libraries support its implementation. It is used to optimize neural networks by iteratively moving in the direction of steepest descent, which is the negative of the gradient [16]. We will be using gradient descent on the error function that we will talk about later.
2.4.4 Mathematics behind neural networks
2.4.4.1 Perceptron
Neural networks are the central concept of deep learning. They consist of artificial neurons connected to each other. A neuron is the basic processing unit in a neural network model. Generally, it is a unit with multiple inputs and a single output, which we call a perceptron.
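The partial derivatives above can be checked numerically with the difference quotient of eq. (2.3), and gradient descent can be demonstrated on a toy function. The step size h, the learning rate alpha, and the function g below are our own choices for illustration:

```python
# Numerical check of the partial derivatives of f(x, y) = 4x + 2y at (2, 1),
# then gradient descent on a toy function with a known minimum.

def f(x, y):
    return 4 * x + 2 * y

h = 1e-6
df_dx = (f(2 + h, 1) - f(2, 1)) / h   # vary x, hold y constant: about 4
df_dy = (f(2, 1 + h) - f(2, 1)) / h   # vary y, hold x constant: about 2

# Gradient descent: repeatedly step against the gradient of
# g(x, y) = (x - 1)^2 + (y + 2)^2, whose minimum is at (1, -2)
x, y = 0.0, 0.0
alpha = 0.1
for _ in range(200):
    gx, gy = 2 * (x - 1), 2 * (y + 2)   # exact gradient of g
    x, y = x - alpha * gx, y - alpha * gy
```

After a few hundred small steps, (x, y) has converged to the minimizer (1, −2), which is the behavior eq. (2.12) relies on when training a network.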
Figure 20 : The structure of an artificial neuron
Perceptron equation (figure 20):

y = f(w1·x1 + w2·x2 + ⋯ + wn·xn + b)   (2.6)

f in equation (2.6) is an activation function. For neural networks, we generally use the sigmoid, ReLU, or tanh functions [14]. The activation function maps and normalizes outputs to new values that optimize the computational performance without changing the network's computational state; it is simply a function that helps the neural network process input information and map it to the correct outputs.
Sigmoid: S(x) = 1 / (1 + e^(−x))
ReLU: relu(x) = x if x ≥ 0, 0 if x < 0
Tanh: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Table 3 : Activation functions
y in equation (2.6) represents the result we get from this perceptron. Its range can differ based on the activation function, but it is mostly between 0 and 1. This value is then passed to other connected neurons to solve more complex problems. If we think of this perceptron as an individual classification model, we can consider y as a value for classifying a particular input based on a specific formula. For example, given students' grades x1, x2, …, xn, we define weights w1, w2, …, wn and a bias b to form a function f, such that for a student to succeed, the function needs to output a value higher than a particular threshold θ:

Student succeeds if f(Σᵢ wᵢ·xᵢ + b) ≥ θ   (2.7)
Student fails if f(Σᵢ wᵢ·xᵢ + b) < θ   (2.8)

2.4.4.2 Error function
For complex problems, we cannot determine the weights and bias by ourselves. So, we need a way to compute these values, and for that we use an error (or cost) function. The most used ones are the cross-entropy and mean squared error functions [17]. First, we initialize our model with random
weights, and then we measure the error made by the model by comparing the output it gives us with the correct answer that we already know.
• Cross-entropy

E = −(1/m) Σᵢ₌₁..m yᵢ · log(ŷᵢ)   (2.9)

For each label yᵢ (the correct result), we use the output ŷᵢ predicted by the classifier; we take the logarithm of the prediction for computational reasons. Then we sum all the results and divide by m, the number of examples, to get the total error of our model, which we will need to reduce later.
• Mean squared error

E = (1/m) Σᵢ₌₁..m (yᵢ − ŷᵢ)²   (2.10)

The mean squared error function is another way to compute the model's error, by subtracting the predicted results from the correct labels and squaring the differences. It works very well for complex models.
2.4.4.3 Gradient descent
To reduce the cost of the error function, we use the gradient descent algorithm with respect to all the weights of the function [17]:

∇E = (∂E/∂w1, …, ∂E/∂wn, ∂E/∂b)   (2.11)

As we mentioned earlier in equation (2.5), the gradient is a vector of partial derivatives representing the direction of steepest ascent of a function [14]. To reduce the error, we take a step in the negative of that direction by updating the weights:

wᵢ′ ← wᵢ − α · ∂E/∂wᵢ   (2.12)

We keep repeating this step for each weight in the model until we reach a local minimum of the error function, using a learning rate α to make small changes to the weights each time
because we do not want to overshoot a local minimum of the function. We can choose the value of α ourselves during the training and testing phase.
2.4.4.4 Feedforward
Feedforward is the process by which a multilayer neural network computes the prediction for an input vector [18]. It amounts to applying all the perceptrons of the model (figure 21).
Figure 21 : Example of a simple neural network
Compound function:

ŷ = σ( (w5, w6) · σ( [w1 w2; w3 w4] · (x1, x2)ᵀ ) )   (2.13)

2.4.4.5 Backpropagation
Backpropagation is the algorithm used to compute the gradient for multilayer neural networks. Since the network's error function is a composite function, backpropagation uses the chain rule we discussed earlier [18].
Figure 22 : The chain rule
First, we do the feedforward pass of the inputs x1 and x2 through the two layers W⁽¹⁾(w1, w2, w3, w4) and W⁽²⁾(w5, w6), as shown in figure 21:

h1 = w1·x1 + w2·x2 + b   (2.14)
h2 = w3·x1 + w4·x2 + b   (2.15)
h = w5·σ(h1) + w6·σ(h2)   (2.16)
ŷ = σ(h) = σ ∘ W⁽²⁾ ∘ W⁽¹⁾ (x)   (2.17)

Then we calculate the derivative of the error function with respect to the weights, using the loss function of equation (2.10). E in equation (2.10) can be seen as a function of all the weights: E(W) = E(w1, …, wᵢ).
Figure 23 : Backpropagation
After that, we apply backpropagation. For example, the backpropagation step for w1 shown in figure 23, obtained by applying the chain rule to equation (2.17), is:

∂E/∂w1 = (∂E/∂ŷ) · (∂ŷ/∂h) · (∂h/∂h1) · (∂h1/∂w1)   (2.18)

2.5 Deep learning
Deep learning is the part of machine learning where we rely on artificial neural networks to solve problems that, in terms of data volume and complexity, cannot be solved by traditional machine learning methods.
2.5.1 Artificial Neural Network
An ANN is a group of multiple perceptron layers, where forward propagation transforms the input data through these layers to give a new output. An ANN consists of three kinds of layers: Input, Hidden, and
Output [19]. An ANN is simply a neural network in which each node of a layer is fully connected to the nodes of the next layer.
2.5.2 Recurrent Neural Network
The neural networks described so far are trained using only the current inputs; we do not consider prior inputs when generating the output. In an RNN, instead, we save the results of the system's previous feedforward pass to use in the next iteration (figure 24), so the network can process sequential data without losing the relational information between elements. For example, in a text, we cannot just process each word by itself to predict a paragraph's meaning; we need to understand each word's context by processing the previous and subsequent words as well. In other words, we can consider a recurrent neural network as a looping process where, at each iteration, we combine the previous results with the new inputs to predict the final result.
Figure 24 : Representation of RNN both in folded and unfolded forms
2.5.3 Convolutional Neural Network
The ANN is an excellent neural network structure, and it works very well for solving specific problems. However, for image classification it only works when given a set of images in which the target object is placed in the center. ANNs are not well suited to images because these networks can suffer from vanishing and exploding gradients, especially in networks with many hidden layers, since the number of trainable parameters grows with image size: an image can contain thousands of pixels, each coded in 3 color channels. A plain ANN also loses the spatial features of an image [19]. A CNN, on the other hand, reduces these many dimensions to a small number of parameters, using image filters that track spatial information and learn to extract features such as the edges of objects or shapes, as explained in figure 25.
Figure 25 : Convolutional Neural Network
CNNs are made of three main types of layers: convolutional layers, pooling layers, and fully connected layers [20].
• Convolutional layer: its primary role is to track the picture's characteristics. It consists of a set of filters.
• Pooling layer: its main role is to reduce the dimensionality of the data.
• Fully connected layer: outputs the results we want, according to the task.
Since 2012, CNNs have achieved state-of-the-art results in the ImageNet challenge and caused a huge advance in this field.
Figure 26 : The annual winner of the ImageNet challenge [21]
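What a convolutional filter and a pooling layer actually compute can be shown without any framework. A minimal NumPy sketch (the 4×4 "image" and the vertical-edge filter are toy values of ours; like most CNN libraries, the "convolution" is really a cross-correlation):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution: slide the filter over the image, sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Max pooling: keep the largest value of each size-by-size block."""
    oh, ow = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:oh * size, :ow * size].reshape(oh, size, ow, size).max(axis=(1, 3))

# Toy 4x4 image: dark left half, bright right half (a vertical edge)
image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
edge_filter = np.array([[-1., 1.],
                        [-1., 1.]])   # responds where brightness jumps left-to-right

features = convolve2d(image, edge_filter)  # strongest response along the edge
pooled = max_pool2d(features)              # reduced spatial dimensions
```

The feature map responds only in the column where the edge sits, and pooling keeps that strong response while shrinking the map, which is exactly the "track spatial features, then reduce dimensionality" behavior described above.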
"ImageNet is formally a project aimed at (manually) labeling and categorizing images into almost 22,000 separate object categories for the purpose of computer vision research" [21].
2.6 Neural network evaluation metrics
Neural network evaluation metrics measure a machine learning model's performance, compared to other models, using mathematical formulas, with the goal of producing a model that gives high accuracy. Below we cite some of the metrics used for neural network models.
2.6.1 Classification accuracy
The first metric is classification accuracy. It is mainly used for classification models, where we get a percentage representing the accuracy of the model:

Accuracy = Number of correct predictions / Total number of predictions   (2.19)

This equation is very intuitive: we divide the number of correct predictions by the total number of predictions.
2.6.2 Confusion matrix
The confusion matrix is more expressive than accuracy in terms of the type of error. The model can make a false positive (FP) or a false negative (FN), and the combinations shown in figure 27 (TP, FP, FN, and TN) help us understand some further metrics.
Figure 27 : Confusion matrix
• Precision
In some cases, we mostly want to avoid false positives, like putting an important email in a spam folder when it is not spam. For this kind of case, we use the precision metric:

precision = TP / (TP + FP)   (2.20)

If we have no false positives, the result will be 1, meaning the model is perfect for our case.
• Recall

recall = TP / (TP + FN)   (2.21)

Recall is the counterpart of the precision metric, where we focus instead on reducing false negatives.
• F1 score
The F1 score, calculated from precision and recall, represents an overall accuracy that summarizes the confusion matrix result:

F1 = 2 × (precision × recall) / (precision + recall)   (2.22)

2.6.3 Log loss
Log loss is another way of assessing a machine learning model's performance, and it is also often used as a loss function:

−(1/n) Σᵢ₌₁..n [ yᵢ·log(pᵢ) + (1 − yᵢ)·log(1 − pᵢ) ]   (2.23)

In this equation, n is the number of observations, yᵢ is the true label in the binary case (0 or 1), and pᵢ is the probability predicted by the model.
2.7 Conclusion
Machine learning has become very advanced in recent years and capable of solving challenging problems. In this chapter, we covered many approaches and techniques used in machine learning. For our project, we will use the convolutional neural network, as it is the latest and most successful technology in the image classification field.
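Equations (2.19)–(2.23) are straightforward to compute from a set of predictions. A small sketch on made-up binary labels and predicted probabilities:

```python
import math

# Made-up binary ground truth and predicted probabilities, for illustration only
y_true = [1, 0, 1, 1, 0, 1]
p_pred = [0.9, 0.2, 0.8, 0.3, 0.6, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]   # threshold at 0.5

# Confusion-matrix counts (figure 27)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy  = (tp + tn) / len(y_true)                  # eq. (2.19)
precision = tp / (tp + fp)                           # eq. (2.20)
recall    = tp / (tp + fn)                           # eq. (2.21)
f1 = 2 * precision * recall / (precision + recall)   # eq. (2.22)

# Log loss, eq. (2.23): penalizes confident wrong probabilities heavily
n = len(y_true)
log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, p_pred)) / n
```

On this toy data the model makes one false positive and one false negative, so precision and recall are both 0.75, and the F1 score, being their harmonic mean, is 0.75 as well.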
Gathering requirements
3.1 Introduction
Before creating any software project, it is necessary to define the technical and functional requirements specific to the project. In this chapter, we list and demonstrate the project's operational behavior and the methods and tools that we will use to create an Android application capable of classifying different food images from a phone camera.
3.2 Requirements
3.2.1 Functional requirements
• Provide an application guide: the user should be able to use the application correctly, helped by a simple user guide inside the app.
• Ask for camera permission: the app should ask the user for permission before accessing the phone camera.
• Display the camera preview: users should be able to see a preview of their target food plate.
• Classify: the application should run a classification process on captured images.
• Display results: the application should display the probable food categories as soon as it finishes the classification process.
• Provide a food description: the user should be able to view a full description of the classified food category.
Table 4 : Functional requirements
3.2.2 Nonfunctional requirements
• Design: the app should be aesthetically pleasing and appropriately designed to satisfy the end user's requirements.
• Quality: the app should run smoothly and avoid any memory leaks or bugs that could affect the Android OS or other running apps.
• Accuracy: the app should provide reliable data to the user.
• Compatibility: the app should run correctly, without bugs or drawbacks, on any device that uses the Android operating system.
• Accessibility: the application must be designed and developed so that anyone can use it.
Table 5 : Nonfunctional requirements
3.3 Technical requirements
3.3.1 Tools for preparing data
Machine learning programs generally require labeled data sets. These could be images, texts, or any data that can be represented numerically. For this project, we will use labeled images to train our model to classify new image inputs. We can get this data by collecting it from different sources; after that, we need to clean and prepare it.
3.3.1.1 Data scraping
The data will come mainly from the web, so we need a tool to scrape data from different websites. In this project, we will use a software tool called ParseHub.
Figure 28 : ParseHub interface
ParseHub is a powerful and free tool for scraping different types of data, such as images, titles, and texts, from websites that contain large amounts of content (lists of pictures, lists of films, and so on). It offers an automatic and fast way to collect this data, as shown in figure 28.
3.3.1.2 Cleaning images
After data scraping, we need to make sure that there are no repeated images in our food data set. We cannot do this manually because we are dealing with a high volume of data, so we will use a tool called "Duplicate Photo Cleaner", shown in figure 29.
Figure 29 : Duplicate Photo Cleaner
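Before running a similarity-based tool, a first exact-duplicate pass can be scripted in a few lines. The sketch below (folder layout and function name are our own, not part of the project's code) hashes each file's bytes and reports repeats. It only catches byte-identical copies, which is why a perceptual comparison tool such as Duplicate Photo Cleaner is still needed for near-duplicates.

```python
import hashlib
from pathlib import Path

def find_exact_duplicates(folder):
    """Group byte-identical files under `folder` by their SHA-256 digest."""
    seen = {}        # digest -> first path seen with that content
    duplicates = []  # (duplicate_path, original_path) pairs
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates
```

Each reported pair can then be reviewed or deleted before the similarity-based pass.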
DPC is an advanced image similarity detector and an excellent tool for everyone who takes photos with their smartphone. Unlike ordinary duplicate image finders, Duplicate Photo Cleaner can compare images based on how similar they look [22].
3.3.1.3 Preparing images
All images in the collection must have the same size, comparable to a photo taken with a phone camera. For this task, we will use a tool called JPEGCrops (figure 30).
Figure 30 : JPEGCrops interface
JPEGCrops is a Windows program created for preparing batches of images for printing. It provides lossless cropping with fixed aspect ratios using jpegtran [23].
3.3.2 Machine learning method
The project aims to apply machine learning to create a useful food classifier. We presented the different types of machine learning approaches in the previous chapter, where we concluded that the best method for building an image classifier is the convolutional neural network.
3.3.2.1 Deep learning frameworks
Several frameworks exist for building deep learning models; the most popular are TensorFlow and PyTorch, as shown in the diagram in figure 31.
Figure 31 : Online job listing growth
3.3.2.2 Framework comparison
TensorFlow:
• Developed by Google
• Difficult debugging
• Open source
• Static network graph [24]
• Big community
• Good for production
• More mature
PyTorch:
• Developed by Facebook
• Good for debugging
• Open source
• Uses a dynamic computational graph [24]
• Based on Python
• Popular in research labs [24]
• Relatively new
Table 6 : TensorFlow and PyTorch comparison
As table 6 shows, both frameworks are great for creating neural network models, but for this project we decided to use PyTorch, as it offers a more object-oriented approach and is well suited for learning and research.
3.3.3 PyTorch
PyTorch is an open-source machine learning library used for developing and training neural networks based on deep learning models. It is primarily developed by Facebook's AI research group. PyTorch can be used with Python as well as C++ [24].
3.3.3.1 Programming language
We need to install the Python language package to run the PyTorch framework. Python is the most widely used language in machine learning and data analysis. It is straightforward and simple, especially for mathematicians and researchers who want to get involved in developing programs related to their field. There are many distributions of Python, but the best one for data science and machine learning is Anaconda. It includes all the required libraries and APIs for machine learning, and we can flexibly add more through a graphical user interface called Anaconda Navigator, which lets us launch applications and efficiently manage Conda packages.
Figure 32 : Anaconda Navigator
3.3.3.2 Coding environment
To write the convolutional neural network model code in Python, we need to prepare our coding environment. There are many choices we can pick from. The choice will not affect the project quality, so each developer can pick the IDE they are comfortable with.
• Visual Studio Code
For this project, we will use VSCode. VSCode is a free and open-source code editor from Microsoft that runs on all major platforms: macOS, Windows, and Linux. It is a very lightweight editor with many additional plugins and APIs that we can import, so it is an excellent choice for developing our machine learning project.
• Jupyter notebook
Alongside VSCode, we will use Jupyter notebooks. A Jupyter notebook is like a web page holding a document in which you can execute chunks of programming code one chunk at a time, and insert explanatory text, data visualizations, tables, equations, and graphs alongside the code, as shown in figure 34. Jupyter Notebook is open-source and was created for data science and machine learning researchers.
Figure 33 : VSCode screenshot
Figure 34 : Jupyter Notebooks in Visual Studio Code
We will be using Jupyter notebooks because we need to see the output of our code fragments frequently, and we need to draw graphs for debugging purposes, which is exactly what Jupyter notebooks allow.
3.3.3.3 Libraries and APIs
The PyTorch library contains many useful features for building a neural network, but we need additional libraries to work with it.
• Matplotlib
Matplotlib is the most popular plotting library for Python. It provides numerous ways to create static and animated visuals (figure 35), and it works very well with PyTorch and NumPy.
Figure 35 : Matplotlib style sheets
• NumPy
NumPy is one of the most powerful Python libraries [25]. With NumPy, we can practice simple image processing techniques, because NumPy can represent images as multi-dimensional arrays. NumPy is a scientific computing library used by numerous other Python data science libraries. It contains many functions for linear algebra, statistics, simulation, data science, machine learning, and much more.
• CUDA
A CNN consists of many hidden layers, so training it on the CPU would take an impractically long time. The solution is to use the GPU, which is built specifically for running large numbers of linear algebra computations in parallel, and neural networks are fundamentally just large amounts of linear algebra. Running on the GPU, where computation is done in parallel, can be roughly 100 times faster than on the CPU. In PyTorch, we can move our model parameters from the CPU over to the GPU by installing the CUDA toolkit (figure 36) from Nvidia on our operating system.
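In PyTorch code, moving computation to the GPU comes down to a one-line device selection. The snippet below is a minimal sketch (the tiny linear model is only a placeholder) that falls back to the CPU when no CUDA device is available:

```python
import torch

# Pick the GPU when the CUDA toolkit and a compatible NVIDIA card are present,
# otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The model parameters and the input batch must live on the same device.
model = torch.nn.Linear(10, 2).to(device)
batch = torch.randn(4, 10, device=device)
output = model(batch)
print(output.shape)  # torch.Size([4, 2])
```

The same pattern applies unchanged to a full CNN: once the model and the data are on the GPU, every forward and backward pass runs there.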
Figure 36 : CUDA ecosystem diagram
The CUDA toolkit is a software platform that pairs with Nvidia GPU devices to facilitate building programs that increase computational speed using NVIDIA GPUs' parallel processing power.
3.3.4 Deployment
For the last phase of the project, we need to deploy our neural network to a mobile application so that users can use it wherever they go.
3.3.4.1 Building the Android application
There are two major mobile phone operating systems: iOS and Android. Android holds most of the market share, as shown in figure 37, due to the varied price range of Android devices, which makes them affordable for many people in countries with developing economies.
Figure 37 : OS market share [26]
Besides that, iOS development requires a Mac computer and an iOS device, so I decided to target Android, as I have a Windows PC and an Android device.
3.3.4.2 Programming language
Android development has changed a lot recently. Android apps are now built using either Java or Kotlin. Java was the default language, but Google recently announced that Kotlin replaces Java as the official language for Android development. We can still use Java, but Kotlin is now considered more efficient.
3.3.4.3 Programming language comparison
Kotlin:
• Can infer the type of a variable at compile time.
• Null safe: variables are non-nullable by default.
• Lets developers extend an existing class with new functionalities.
• Does not have checked exceptions.
• Provides data classes that handle the boilerplate for us.
Java:
• We need to specify the type of declared variables explicitly.
• Allows assigning null to variables, which can lead to null exceptions.
• To add new functionalities to a class, we need to create a new class that inherits from the parent.
• Supports checked exceptions.
• We need to write a data class with its constructors, setters, and getters ourselves.
Table 7 : Kotlin and Java comparison [27]
As we can see in table 7, Kotlin is the most suitable choice for our project; it is also Google's preferred language for Android development.
3.3.4.4 Coding environment
For building the Android app, we will use Android Studio as the coding environment because it is the official IDE for Android, made by Google.
Figure 38 : Android Studio interface
As shown in figure 38, Android Studio is a robust code editor that helps with creating new projects and adding new modules, and gives a comprehensive representation of the project structure, providing quick access to resources, code, and files.
3.3.4.5 Libraries and APIs
• CameraX
CameraX is a Jetpack library designed to make camera app development easier [28], because writing a camera app using the standard camera API is a challenging task for developers. That is why Google built this API: it is very easy to understand and significantly reduces the total amount of code that we must write. The CameraX API is built on top of the Camera2 API to achieve a consistent experience across all device types.
• PyTorch with Android
After training our convolutional neural network model, we need to pass it to an Android app. PyTorch provides APIs that cover the standard preprocessing and integration tasks required for embedding machine learning models in mobile applications, and reduces integration errors by allowing a seamless path from training to deployment while remaining entirely within the PyTorch ecosystem.
• Material design support library
Material Design is a design language made by Google. It is an adaptable design system backed by open-source code that helps developers build high-quality digital experiences. From design guidelines to developer components (figure 39), Material Design helps develop products faster and makes sure our app works for all users, regardless of the platform.
Figure 39 : Material Design Components
3.4 Conclusion
So far, we have selected the methods and tools needed for the project, based on the studies made and discussed in the previous chapter. The next chapter details the method and approach used for designing our application.
Software Design and Architecture

4.1 Introduction
Before beginning the realization phase, in this chapter we define the overall software architecture and the design patterns we will use during the project's realization. We include conceptual visualizations and diagrams to illustrate our architecture.
4.2 Software best practices
This Android application is intended to be used by many users, so to keep the code efficient and avoid mistakes, we will follow standards used in software engineering for data science [29].
4.2.1 Clean and modular code
The first practice is writing code that is clean and modular. Code is clean when it is clear, simple, and compact, which makes it much easier for developers to understand and reuse, especially when iterating over a project. Our code should also be modular, meaning the program is broken up into functions and modules. A module is simply a file: just as we encapsulate code in a function and reuse it by calling the function in different places, we can encapsulate code in a file and reuse it by importing it into other files. This helps us write fewer unnecessary lines of code.
4.2.2 Efficient code
Writing efficient code is very important, especially for the user experience. There are two sides to efficiency: reducing the time the code takes to run, and reducing the space and memory it uses. This matters for our mobile application, since the app runs on the user's device and updates must feel instantaneous. For machine learning, the model will be trained locally before being integrated into the Android app, so slower code is acceptable there: the essential thing is to produce a model that classifies images with the highest possible accuracy.
4.2.3 Refactoring code
Refactoring is a step done after writing a program that solves a new problem, because when writing code for the first time we do not pay much attention to its structure and arrangement; we focus on just making the code work, which can leave it somewhat unorganized and repetitive. That is why we should always go back and refactor after achieving a working model. Refactoring means restructuring the code to improve its internal structure without changing its external functionality. It gives us a chance to clean, restructure, and modularize our program.
4.2.4 Documentation
Documentation is the additional text that comes with, or is included in, the software code to describe it compactly. It helps clarify complex parts of the program, especially when dealing with hundreds of lines, makes it easy to navigate the code without getting lost, and helps readers quickly understand how and why different application components are used. We can add different types of documentation to our software:
• In-line comments: used to clarify a specific line of code.
• Docstrings: documentation for a function or a module describing its purpose and details.
• Project documentation: added at the project level, for example a readme file documenting details about the project.
4.2.5 Version control
The primary purpose of a version control system is to let multiple developers work independently on the same project without conflicts, but that is not its only use. Version control also creates safe points that save our project's progress, and lets us try out new code branches without losing previous code. For this project, we will use Git, the most common version control system.
4.3 Software Design
Software design is usually broken into two phases: architectural design and detail design. Architectural design is the process of dividing the program into components, assigning responsibilities for aspects of behavior to each component, and addressing how the components interact with each other.
Detail design is more closely related to the functional requirements; there we create a full definition of every aspect of project development. In this project, we are dealing with two separate and independent programs: a machine learning program for creating a neural network, and an Android application that uses the produced model, as illustrated in figure 40.
Figure 40 : Project Structure
There is no direct interaction between these two systems. As shown in the figure above, they run on separate timelines: first the ML program trains and generates a convolutional neural network model, and then the model is imported into the Android application to classify captured images. As these programs are independent of each other, we will go directly to the detailed design of each program individually.
4.4 ML Lifecycle
Machine learning is considered a data science analysis more than a software development process, because it relies on training data sets and statistics to solve problems. It has a different lifecycle, as shown in figure 41. Since we already covered the "asking questions" phase in the "Introduction and context" chapter, we will start directly from the data preparation phase.
Figure 41 : Machine Learning Lifecycle [13]
4.4.1 Preparing data
Preparing data is the first process in machine learning. The first thing we need to do is define the data categories. For this project, we decided to go with only ten categories, because it is tough to find extensive image data for a novel project idea like classifying Tunisian food. Furthermore, we need approximately 1000 labeled images for each food type, as explained in the introduction chapter. Our image data set will be divided into ten different folders, where each folder's name represents the data label, as shown in figure 42.
Figure 42 : Data Structures
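Because each folder name doubles as the class label, the label set can be derived directly from the directory layout. A minimal sketch (the data-root path and function name are our own, for illustration):

```python
from pathlib import Path

def list_classes(data_root):
    """Return the sorted class labels, one per sub-folder of `data_root`.

    This mirrors the layout expected by torchvision's ImageFolder:
    data_root/<label>/<image files>.
    """
    return sorted(p.name for p in Path(data_root).iterdir() if p.is_dir())
```

With the ten folders of figure 42 in place, calling `list_classes("food_data")` would return the ten dish names in alphabetical order, ready to map classifier output indices back to labels.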
  • 60. Page | 49 More specifically, Transfer learning refers to the process of taking a pre-trained neural network and using it with our classifier model (figure 44) and training them on our dataset by freezing the weights of the CNN model as it is already trained. This CNN model can still extract general features from our data samples, while the classifier model uses this information to classify the data in a way that is pertinent to our problem [30]. This technique has proven to work very well, especially for convolutional neural networks that have been trained on millions of images of the โ€œImageNetโ€ challenge that happen each year to discover the best possible ML model. Because as we mentioned in the literature revue chapter, CNNs use image filters to extract features from training data and then pass it to a classifier neural network for classification. And when the model is trained on a colossal amount of data samples belonging to a large variety of categories, the model becomes able to extract features from any new data. Then we can use it to add our classifier neural network to be trained for our specific problem without changing the pretrained CNN filter parameters. This technique gives an astonishing result, as it has been tested numerous times in different machine learning researchersโ€™ papers. . Figure 44 : Neural network Structure For this technique to work on our ten food categories, we need to create our classification model to be added at the end of the pre-trained CNN structure as shown in figure 44. Many CNNs have performed well in the โ€œImageNetโ€ competition. One of them is the MobileNet model, which we will be using, as it is a light version of about 10 MB space compacted for mobile phone devices.
4.4.2.2 Classifier Model Structure
Once we have the pre-trained CNN, we need to replace its classifier model with our own. Classifier models are typically divided into three parts:
• Input layer
The classifier model's input layer is where the feature extraction result from the CNN's output layer is passed. To fit it exactly to our model, the input layer size must match the CNN output size, which is 2048 nodes for the MobileNet CNN shown in figure 45.
• Hidden layers
Defining the hidden layers is the most challenging part, as there are no specific rules; each problem has its own characteristics. What many developers do is test different structures and compare the results, a process that requires vast computing resources. For that reason, we will use the most common format for problems like ours, where there are only ten output classes.
• Output layer
The output layer is the result layer. Its size will be ten nodes, matching our defined food categories. We will use the SoftMax function, as shown in figure 45, to produce a probability between 0 and 1 for each class and identify the most probable one.
Figure 45 : Classifier Structure
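The SoftMax function mentioned above turns the raw output scores into probabilities that sum to 1, with the largest score receiving the largest probability. A standard-library-only sketch of the computation (the three scores are made-up numbers; the real layer would receive ten):

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    # Subtracting the max is a standard trick for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Example: three raw scores from an output layer.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # the largest score gets the largest probability
```

In the project itself, PyTorch's built-in `nn.Softmax` (or `log_softmax` paired with the loss) performs this same computation over the ten class scores.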
The first and second hidden layers will use the ReLU activation function, with 1000 nodes in the first layer and 500 in the second.
4.4.3 Training the Model
For training the classifier model, we will use the cross-entropy loss function. Before that, we need to divide our image data sets into training, testing, and validation data [18], as illustrated in figure 46, because ML models tend to perform well on the training data but fail to generalize to data they have not seen before; this is called the overfitting problem. To avoid it, we set aside a proportion of the data for validation, used to decide when to stop the training process. We also keep another portion of data, outside the training process, to test the model's real performance as if it were working on real-world data.
Figure 46 : Data Sets Division (80% training, 10% validation, 10% testing)
Validation data is involved in the training phase to monitor the model's generalization: we save a checkpoint at each iteration in which the model improves its result on the validation data set, as shown in figure 47.
Figure 47 : Checkpoints Design Pattern [31]
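The 80/10/10 division and the "save only the best checkpoint" rule can both be expressed in a few lines of plain Python. The sketch below (function name, seed, and the accuracy numbers are illustrative, not project values) shuffles with a fixed seed and keeps a checkpoint only when the validation score improves:

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and split samples into 80% train, 10% validation, 10% test."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    train_end, val_end = int(0.8 * n), int(0.9 * n)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]

# Keep a checkpoint only when validation accuracy improves (figure 47 pattern).
best_accuracy = 0.0
for epoch, val_accuracy in enumerate([0.41, 0.58, 0.55, 0.63]):
    if val_accuracy > best_accuracy:
        best_accuracy = val_accuracy
        # In PyTorch, this is where torch.save(model.state_dict(), ...) would run.
        print(f"epoch {epoch}: new best checkpoint ({val_accuracy:.2f})")
```

Note that epoch 2 (0.55) does not trigger a save: only improvements over the running best produce a checkpoint, so the final saved model is always the one that generalized best on the validation set.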
4.4.4 Evaluating the Model
To evaluate the model, we will use the testing data to measure the model's precision, dividing the number of correct classifications by the total number of elements in the data set.
4.4.5 Deploying the model
Before we proceed to Android development, we need to save the final CNN model using a serialization method included in the PyTorch library, generating a serialized version of the model for the Android application. This model will then be packaged inside our application as an asset that we can run on the mobile device.
4.5 Android software design
The second part of the project is about designing the Android application. We start with an overview of the Android application structure and the activity lifecycle.
4.5.1 Android Applications structure
Mobile apps are slightly different from standard software: Android applications are built from up to four basic components [32], plus two additional component types.
4.5.1.1 Basic components
• Activities: Activities are the fundamental building blocks of Android apps. An activity can be considered an individual window containing a graphical interface for interaction with the user.
• Services: Services represent processes running in the background, designed for continuous operations that do not have a graphical interface. They are usually used to perform long-lasting tasks.
• Content providers: Content providers grant a level of abstraction for any data stored on the device that can be accessed from multiple applications.
• Broadcast receivers: Broadcast receivers respond to system messages that circulate in the device and alert applications to various events.
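The serialization step mentioned above boils down to tracing (or scripting) the trained model into TorchScript and saving it as a file that the PyTorch Android runtime can load. A minimal sketch with a toy model (the file name is illustrative; the real project would export the full trained classifier and may also run PyTorch's mobile optimizer):

```python
import torch

# Toy stand-in for the trained classifier.
model = torch.nn.Sequential(torch.nn.Linear(4, 10), torch.nn.Softmax(dim=1))
model.eval()  # disable training-only behaviour before exporting

# Trace the model with an example input to produce a TorchScript module.
example = torch.randn(1, 4)
traced = torch.jit.trace(model, example)
traced.save("food_classifier.pt")  # packaged as an asset in the Android app

# Sanity check: the reloaded module must reproduce the original outputs.
reloaded = torch.jit.load("food_classifier.pt")
assert torch.allclose(model(example), reloaded(example))
```

On the Android side, the PyTorch Mobile API loads this same `.pt` asset and runs inference on camera frames.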
4.5.1.2 Additional components
• Fragments: Fragments are an optional component. They help change the configuration of activities to support both large and small screens on mobile devices.
• Views: Views are the basic building blocks of the application user interface. They are arranged in a tree and used to display text fields, images, buttons, and so on.
4.5.2 Activity lifecycle
Figure 48 : Activity lifecycle in Android [33]
The life cycle of an Android activity has four basic states, controlled by six callbacks, as shown in figure 48:
• Launched state: the user launches the app by clicking on its icon, and the Android system creates a new instance of the launched activity.
• Running state: the activity is displayed on the screen, executing its code or waiting for user input. It is the state between the onResume() and onPause() callbacks.
• Killed state: the activity has saved its necessary data and the user can still return to it, but the Android system has shut it down to free memory for a higher-priority app that the user is focusing on.
• Shutdown state: the final phase, where the app's memory is released before it is shut down.
4.5.3 Software architecture
For this Android application, we will use three activity classes: one showing a user guide, one for the camera and the image classification process, and one displaying the full description of the resulting food type.
4.5.3.1 Welcome/Guide Activity
UI components (figure 49):
• ConstraintLayout parent view for structuring and organizing child views.
• TextView for the activity title.
• ViewPager to display the guide.
• Button view to skip to the next activity.
Figure 49 : Welcome Activity
4.5.3.2 CameraClassification Activity
UI components (figure 50):
• ConstraintLayout parent view.
• TextureView for the camera preview.
• LinearLayout parent view.
• TextView/Button for the top 1 result.
• TextView/Button for the top 2 result.
• TextView/Button for the top 3 result.
Figure 50 : CameraClassification Activity
4.5.3.3 Description Activity
UI components (figure 51):
• ScrollView parent view.
• ConstraintLayout parent view.
• TextView for the food name.
• ImageView for the food image.
• LinearLayout parent view.
• TextView for the description text.
• TextView for the ingredients list.
Figure 51 : Description Activity
• TextView: a view that displays text or any type of string.
• ImageView: a view that displays an image from its source path.
• Button: a view that displays a button UI component able to handle click events.
• TextureView: a view that displays a content stream.
• ViewPager: a view that allows the user to swipe left or right through multiple pages of content. We will use it to display the guide.
• LinearLayout: a view group that organizes subviews linearly.
• ConstraintLayout: a view group that places views using position constraints.
• ScrollView: a view group that displays subviews on a scrollable page.
Table 8 : UI Components
4.5.3.4 CameraClassification Class Diagram
The UML class diagram shown in figure 52 explains the internal structure of the CameraClassification activity.
Figure 52 : CameraClassification Class Diagram
• BaseModuleActivity: a base class for activities from the Android SDK.
• ImageClassificationActivity: the activity class responsible for the image classification process, reading camera data, and loading the machine learning model.
• AbstractCamera: provides an API surface that connects to an Android device camera and requests an image stream.
• BackgroundThread: creates a background thread using the HandlerThread class.
• PythonModel: responsible for holding the machine learning model.
• AnalyseResult: an inner class used to exchange result data between the background thread and the main thread.
Table 9 : Kotlin Classes Description
4.6 Conclusion
In this chapter, we have outlined a software structure that responds to the specified project requirements. We have also defined the machine learning processes and algorithms for creating a convolutional neural network, and the design patterns for the Android application. We are now ready to move on to the realization phase.