Android image recognition app using neural networks
Ministry of Higher Education and Scientific Research
Monastir University
*-*-*-*-*
Higher Institute of Computer Science and Mathematics of
Monastir
Graduation Project
Android application for image recognition
using artificial neural network
A thesis presented to obtain a master's
degree in software engineering
Created by:
Helmi Ben Khalifa
Dr. Asma Kerkeni President
Dr. Manel Sekma Examiner
Dr. Souhail Mallat Supervisor
University year: 2020/2021
Acknowledgments
I want to thank:
My family, who has always supported and encouraged
me during the realization of this project,
My supervisor Dr. Souhail Mallat, for supporting this
project and providing valuable assistance,
All executives of the Higher Institute of Computer
Science and Mathematics of Monastir (ISIMM),
And the thesis committee who agreed to participate in
the evaluation of this project.
Thank you very much for helping me when I needed
help. Thank you very much for the support you have
given me to finish this work.
Abstract
This project was carried out as an end-of-studies dissertation presented to the Higher
Institute of Computer Science and Mathematics of Monastir (ISIMM) to obtain a master's
degree in software engineering. This master's thesis deals with the implementation of an
Android application for image classification. The application works as a food detector for
famous Tunisian dishes: it applies a convolutional neural network to images captured by the
phone's built-in camera and returns its best estimate of which dish is on the plate.
Keywords: Machine Learning, Image classification, Transfer learning, Computer
vision, Image recognition, PyTorch, Android.
Résumé
This project was carried out as part of an end-of-studies dissertation presented to the
Higher Institute of Computer Science and Mathematics of Monastir (ISIMM) in order to obtain
a professional master's degree in software engineering. The dissertation concerns the
implementation of an Android application for image classification. The application works as
a food detector for famous Tunisian dishes: it applies a convolutional neural network to
images taken with the phone's camera and returns its best estimate of what the dish might be.
Keywords: Machine Learning, Image classification, Transfer learning, Computer vision, Image
recognition, PyTorch, Android.
Table of Contents
General Introduction
Introduction and context
1.1 Introduction
1.2 Project description
1.3 Motivation
1.4 Aim of the work
1.5 Dissertation structure
1.6 Development Methodology
1.6.1 Software development processes available
1.6.2 Methodologies comparison table
1.6.3 The chosen method
1.7 Project Planning
1.7.1 Project management processes
1.7.2 Gantt chart
1.8 Conclusion
Literature review
2.1 Introduction
2.2 Data Science
2.2.1 Acquiring and storing data
2.2.2 Asking Questions
2.2.3 Data preparation
2.2.4 Exploring data
2.2.5 Machine learning model
2.2.6 Visualization and communication
2.2.7 Deployment
2.3 Machine learning
2.3.1 Unsupervised Learning
List of tables
Table 1 : Methodologies comparison table
Table 2 : Project tasks timeline
Table 3 : Activation functions
Table 4 : Functional requirements
Table 5 : Nonfunctional requirements
Table 6 : TensorFlow and PyTorch comparison
Table 7 : Kotlin and Java comparison [27]
Table 8 : UI Components
Table 9 : Kotlin Classes Description
Table 10 : Device Characteristics
Table 11 : Development tools
Table 12 : Model accuracy on images from the web
Table 13 : Model accuracy on images from the phone camera
List of Symbols
$P(x)$, $P(y)$ : The independent probabilities of $x$ and $y$
$P(x \mid y)$ : Probability of $x$ given that $y$ is true
$\lVert \vec{v} \rVert$ : Magnitude of a vector
$\sum_{i=1}^{n} x_i$ : Sum of the $x_i$: $x_1 + \dots + x_n$
$\frac{df}{dx}$ : Total derivative of $f$ with respect to $x$
$f(g(x))$ : Function composition
$\frac{\partial f}{\partial x}$ : Partial derivative of $f$ with respect to $x$
$\nabla f$ : Gradient of a function $f$
$\mathrm{relu}(x)$ : Rectified linear unit function
$\tanh(x)$ : Hyperbolic tangent function
$\log(x)$ : Logarithmic function
$\leftarrow$ : Assignment operator
List of Abbreviations
ML : Machine Learning
ANN : Artificial Neural Network
CNN : Convolutional Neural Network
RNN : Recurrent Neural Network
AI : Artificial Intelligence
XP : Extreme Programming
RUP : Rational Unified Process
API : Application Programming Interface
UI : User Interface
TP : True Positive
FP : False Positive
TN : True Negative
FN : False Negative
IDE : Integrated Development Environment
SVM : Support Vector Machine
VSCode : Visual Studio Code
OS : Operating System
JDK : Java Development Kit
SDK : Software Development Kit
DPC : Duplicate Photo Cleaner
GPU : Graphics Processing Unit
CPU : Central Processing Unit
ReLU : Rectified Linear Unit
General Introduction
These days, we are living in the golden age of artificial intelligence, which some have called
the next industrial revolution. This is especially true of machine learning and deep learning,
which benefit from the availability of massive, fast-growing data sets known as big data that
cover almost every aspect of our lives, from images and videos posted daily on social media
websites to data collected periodically by smart sensors spread all over the world to measure
climate change and weather conditions.
Computer performance has also made breakthroughs that opened up many fields of application,
some of which were just science fiction a few years ago. The dramatic increase in computational
power and parallel computing in recent years removed many of the barriers in the way of
artificial intelligence and machine learning. These fields have been around since 1950, but
they long remained mostly theoretical [1] due to the lack of powerful computing resources and
large data sets.
Artificial intelligence now contributes to our living conditions in a variety of ways. Image
recognition systems have achieved human-like performance, autonomous vehicles are increasingly
becoming a reality, business models are changing rapidly, and in medicine AI enables automated
clinical diagnoses and suggests treatments. It is very important to take advantage of the
available opportunities and provide useful solutions. This mission is not exclusive to big
tech companies. Any software developer can also contribute to the artificial intelligence
field by deploying apps, especially on lightweight devices like smartphones, which do not
require a considerable budget or expensive equipment.
Smartphones and other mobile devices are now dominant and improve every day. The field of
mobile devices is very dynamic and rapidly developing. Programs that formerly required powerful
home computers are nowadays successfully ported to various mobile devices, and people can
carry much valuable equipment in their briefcase or pocket. Moreover, this opens the door for
AI to deliver practical products, innovative solutions, and smart services by taking advantage
of the cutting-edge technologies that phone providers offer, which encourage the development
of more demanding applications. In this project, we will try to participate in the artificial
intelligence revolution by developing an application using leading technologies in this field.
Introduction and context
1.1 Introduction
The idea of this project is inspired by the famous "Cats vs Dogs" classification problem, which
is considered the "hello world" program for ML [2], where we make a computer distinguish
between images of dogs and cats. ML is an application of artificial intelligence in which we
teach a machine to perform smart tasks. ML itself is a broad field of science, and one of its
most widely used subfields is computer vision, within which our project falls. This chapter
will present the roadmap and the software development process we will use to create a complete
image recognition program.
1.2 Project description
This project is about a mobile application that is based on deep learning algorithms and runs
on Android. The application's main role is to identify different Tunisian dishes from the live
image feed of the phone's camera, as shown in figure 1, and to provide additional information
on the result, including calories, ingredients, etc. The application can recognize up to ten
different categories of Tunisian food.
Figure 1 : Project description (panels: taking a picture, providing the food description)
No internet connection is required on the user's phone, as the ML model is integrated
directly into the application, and all image analysis is performed directly by the phone's
processor. We can summarize the functionality of the app in four steps:
1. The user takes a picture of his food with the phone's camera.
2. The Food Detector app analyses the food picture using a neural network.
3. The Food Detector app displays what the food might be.
4. The user checks the complete food description.
In later sections, we will dive deeper into the application's functionality and architecture.
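Steps 2 and 3 can be illustrated with a minimal Python sketch. It assumes that the network
outputs one raw score per class and that a softmax is used to turn the scores into
probabilities; the chapter does not state this. The label names and the scores are hypothetical
placeholders, not the classes or outputs of the actual model.

```python
import math

# Hypothetical placeholder labels: the ten real class names are not listed in this chapter.
LABELS = [f"dish_{i}" for i in range(10)]


def softmax(scores):
    """Turn raw network scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels=LABELS):
    """Return the most likely label and its probability (step 3)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up scores standing in for the output of the neural network (step 2).
scores = [0.1, 2.5, 0.3, 0.0, -1.0, 0.7, 0.2, 0.0, 1.1, -0.4]
label, confidence = predict(scores)
print(label, round(confidence, 2))
```

The returned label would then be used to look up the food description shown in step 4.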
1.3 Motivation
After this brief introduction to the project, the utility of the app might still seem unclear.
To remove this ambiguity, we will mention some of the problems that we will try to solve or
minimize with this app. The app is not necessarily going to be useful right away for everyone,
but it can come in handy later, especially when we travel to a foreign country.
Traveling is common for many reasons, such as tourism, working or studying abroad, vacations,
etc. One of the problems that many of us may face while traveling is eating exotic food that
we are not used to, typically the foreign country's local food. Sometimes the struggle begins
on the plane ride [3].
Besides this problem, we noticed that many people struggle to keep up with a diet plan and
make the wrong food choices. After some research, we found that this is seemingly due to a
lack of knowledge about nutrition and food. As a nutrition coach mentions: "Knowledge is
imperative to any endeavor so why should this be any different in the case for weight loss?
Too many people blindly throw themselves into this game headfirst without doing any research
or laying any sort of foundation. It is this approach that results in yo-yo dieting and
relapses" [4].
In addition to all of this, AI services integrated into mobile apps are not yet widely used in
Tunisia, despite their significant influence in the world. AI with mobile apps can now provide
more sophisticated services than ever before.
1.4 Aim of the work
Having introduced these challenges and problems, in this part we explain how this project
will try to use machine learning techniques to address them.
The problems mentioned above are the lack of nutritional knowledge and bad experiences with
exotic food. In this application, we will gather information about different Tunisian foods.
When the user wants to discover more about his meal while he is in Tunisia, he can detect the
food using the app. Then the food description will be displayed in a simple layout on the phone
screen. It will include all the relevant information like ingredients, calories,
recommendations, etc., so that the user can be aware of what he is eating and check whether it
will break his diet plan or whether he is allergic to any ingredient in the food.
This application will use machine learning technologies to help the user become familiar with
new types of dishes. Furthermore, it will help him adopt healthy eating habits instead of
eating junk food and encourage him to keep up with his diet. It also helps tourists avoid
foods that taste strange or do not conform to their usual choices.
Nevertheless, we need to mention that this application will only solve a part of the puzzle,
because it works only on Tunisian food, and even then it only classifies 10 famous Tunisian
food types. This is because of the limited availability of images of some Tunisian dishes on
the internet: for each type of food, we need at least 1,000 high-quality images to get a good
result. The 1,000-image magic number comes from the original ImageNet classification
challenge, where each category of the dataset had around 1,000 images. This was good enough to
train the early generations of image classifiers like AlexNet, which suggests that around
1,000 images per category are enough [5].
On top of that, training on a large amount of data demands a huge amount of processing
resources, and it takes a long time to train neural networks to reach more than 90% accuracy.
However, the project could still be extended to encompass all Tunisian food types, or even
other countries' food, if the necessary capabilities and investments are available.
1.5 Dissertation structure
This dissertation is divided into a set of chapters. The first chapter is the "Introduction
and context". The next chapter is the "Literature review", where we discuss machine learning
and the methods used in this field. After that, we specify the application requirements in
the "Gathering requirements" chapter, followed by the "Software design" chapter, in which we
define the design patterns and architecture of the program. Last is the "Realization" chapter,
in which we create our machine learning model and then deploy it to an Android application.
1.6 Development Methodology
A software development methodology divides the development work into distinct stages with
specific activities for more effective planning and management. Before choosing the right
method for our project, we will present some of the popular methodological approaches used in
software engineering. Then we will compare them to find the most suitable method.
1.6.1 Software development processes available
• Agile
Agile is based on highly iterative and incremental development, creating software in short
time boxes in order to minimize risks [6].
Figure 2 : Agile Methodology [6]
Agile is mostly a disciplined approach (figure 2) that anticipates the need for flexibility
and allows frequent changes before delivering the finished product.
• Scrum
Scrum is an agile methodology. It is a simple process that aims to speed up productivity and
deliver products that focus on satisfying customers.
Figure 3 : Scrum Methodology [7]
As shown in figure 3, it does this by breaking the complexity down into smaller tasks. Then it
divides them among the team members, each of whom focuses on solving his dedicated task in a
specific time frame according to a planning process. This step is repeated again and again.
After each incremental step, the team members re-evaluate the product's current direction and
decide which strategy is the most effective to achieve the goal.
• Extreme programming
Extreme programming (XP) is also an agile method [6]. It is a lightweight process whose goal
is to reduce the cost of changes in software requirements.
Figure 4 : Extreme programming Methodology [8]
Figure 4 illustrates the different process steps of XP. It takes traditional principles to
extreme levels through several practices, including simple design, pair programming, constant
testing, ongoing integration, refactoring, coding standards, and small releases. It is mainly
used for creating software within a volatile and dynamic environment, and it allows for much
better flexibility within the modeling process [4].
• Unified Process
The Unified Process is an architecture-centric, use-case-driven, iterative, and incremental
development process that uses the Unified Modeling Language. It can be applied to software
systems of varying technical complexity and management levels, in different application areas
and organizational cultures. The Unified Process is divided into a series of timeboxed
iterations, as shown in figure 5.
Figure 5 : Unified Process [9]
• RUP
The Rational Unified Process is referred to as RUP. It separates the development process into
four phases, as shown in figure 6: Inception, Elaboration, Construction, and Transition. It is
considered an object-oriented and web-enabled program development methodology. This method
helps software developers deal with changing requirements and provides guidelines, templates,
and examples for all aspects and stages of software development [4]. It describes how specific
development goals should be achieved.
Figure 6 : Rational Unified Process [6]
• Waterfall
The waterfall method is one of the most traditional and commonly used software development
methodologies [4]. It differs from the agile and unified processes in that it is a sequential
design process, as shown in figure 7, meaning that each phase feeds into the next one. In the
waterfall method, there are seven stages, from system feasibility down to operations and
maintenance.
Figure 7 : Waterfall with overlapping phases [10]
1.6.2 Methodologies comparison table
The following table lists some advantages and disadvantages of the software development
processes mentioned above to help us decide on the appropriate method for this project.
Methodology: Scrum
Advantages:
• It contains a backlog listing everything to do.
• The team decides how much work is to be done.
• Communication, which is an important part of the process, is achieved through meetings
called Scrum events.
Disadvantages:
• Meetings can be too long.
• It requires a dedicated Scrum master.
• It is hard to understand and requires team member guidance [6].
Methodology: Extreme programming
Advantages:
• Faster product delivery.
• Allows software development groups to save the costs and time needed for project realization.
• Allows developers to produce quality software: regular testing during the development phases
helps detect bugs early.
Disadvantages:
• It is impossible to know the exact estimate of the effort required to produce a final
product [6].
Methodology: RUP
Advantages:
• It is a comprehensive methodology that can proactively resolve project risks associated with
the changing requirements of the client, which requires careful management of change
requests [4].
• Less time is required for integration, as the integration process continues throughout the
software development cycle.
Disadvantages:
• The development process is too complicated and disorganized on massive projects that use new
technology [4].
• The reuse of components will not be possible.
Methodology: Waterfall
Advantages:
• Suitable for simple structured projects.
• Works well when requirements are well understood.
• The cost of the model is low.
• It includes testing, i.e., verification of completed operations and obtained results at the
closure of each development phase [5].
Disadvantages:
• No iterations during project realization [11].
• No working product will be available until all phases are finished [11].
• It is very difficult to go back and fix the software, especially at the testing phase.
Table 1 : Methodologies comparison table
1.6.3 The chosen method
Based on table 1, we decided to use the waterfall methodology for this project. The idea behind
the waterfall method is that the project progresses through an orderly sequence of steps, from
the initial software concept down to the final system testing phase. This approach is suitable
for this project, where cost and time are constrained and the scope and requirements are well
understood. Also, the waterfall methodology provides a set of processes built on the principle
that each phase is approved before the next begins, which fits our need to deliver a complete
and validated project within a specific time limit.
Lately, this method has faced some criticism for being outdated because defects that appear in
later stages are hard to fix: it is based on linear sequential phases that always move forward,
which makes going back and solving problems very daunting. Many modified waterfall models have
been produced in response to this problem, like the "sashimi model" (waterfall with overlapping
phases). "The key feature of the Sashimi model is the possibility of
overlapping development phases, i.e., introducing feedback into the classical waterfall model.
The idea on which the model is based in identifying errors made on time while the development
phase is still in progress. For instance, errors made in the design phase are identified during
implementation, while the design is still in progress" [11].
The waterfall with overlapping phases can overcome the original waterfall model's major
problem, which is the difficulty of fixing errors that appear in earlier finished stages.
1.7 Project Planning
Project planning is a critical phase and part of the project management process. Project
management is a structured discipline that defines project goals, strategy, planning, and
motivations. Its main objective is to produce a complete project that complies with the
project's nature and scope. Typically, it is divided into five processes (figure 8).
Figure 8 : Project management processes [12]
1.7.1 Project management processes
• Initiation
Initiation is the project's first process, where we define the idea behind the project, its
overall goal, and its scope.
• Planning
Planning is the process where we define the project management methodology and structure the
project, using schedules such as Gantt charts to track progress.
• Execution
Execution is the third process of the project management life cycle, and it is usually the
longest one. In this phase, we start executing our plan and methodology for developing our
software.
• Control
Control is the phase where we monitor the project by validating each step it includes.
• Closure
Closure is the last phase of the project life cycle. In this phase, we formally close our
project and prepare it to be delivered and presented.
1.7.2 Gantt chart
A Gantt chart is commonly used in project management to show the schedule of project tasks
over a specified date range.
Figure 9 : Gantt chart
Task Name               | Start Date | Due Date
Initiation and planning | 13/02/2020 | 13/03/2020
Research and studies    | 14/03/2020 | 30/04/2020
Requirements Analysis   | 01/05/2020 | 31/05/2020
Software design         | 13/07/2020 | 31/08/2020
Realization             | 01/09/2020 | 31/10/2020
Thesis writing          | 01/04/2020 | 31/10/2020
Table 2 : Project tasks timeline
1.8 Conclusion
In this chapter, we have introduced the project idea. We have also discussed the problems that
we will try to solve through the use of machine learning techniques. We have then outlined the
thesis structure and the software methodology that we are going to use, and lastly, we have
presented the project plan.
Literature review
2.1 Introduction
After introducing the project, in this chapter we address the scientific background of machine
learning program development. We explain the theoretical meaning of some essential concepts and
then present the relevant algorithms in this field, since this project aims at using
cutting-edge technologies. This chapter includes the knowledge necessary to establish an
integrated understanding of machine learning and related fields.
2.2 Data Science
Data science is a prerequisite for developing ML programs because of the data management
process needed to prepare the data on which our model will be trained, so it is imperative to
have a good understanding of it. Furthermore, many tutorials and e-learning websites about
artificial intelligence and machine learning assume that the reader already has a good
knowledge of data science and data analysis [13]. The data science process usually contains
seven essential steps, as shown in figure 10.
Figure 10 : The data science process [13]
2.2.1 Acquiring and storing data
Acquiring and storing data is the first step in data analysis. We need to find data related to
our subject, either by collecting it from the many sources available on the internet or by
gathering observations or measurements from real-world experiments.
2.2.2 Asking Questions
This step can come either before or after gathering data, depending on the project subject. In
this step, we try to ask the right questions relevant to our topic, explaining how the data
might be useful for understanding and defining the project's objectives.
2.2.3 Data preparation
The next step of data analysis is data preparation, which has two parts: data cleaning and
data transformation. First, we clean the data of any wrong or duplicated values. Then we
transform it using defined mapping tools [13].
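The two parts of data preparation can be sketched as follows. This is a minimal illustration
and not the pipeline used in this project: the record layout (an image path paired with a text
label), the file names, and the label names are hypothetical, and mapping labels to integer
class indices is only one possible transformation.

```python
def clean(records):
    """Drop records with missing values and exact duplicates, keeping the original order."""
    seen = set()
    cleaned = []
    for path, label in records:
        if not path or not label:
            continue  # wrong or missing value
        key = (path, label)
        if key in seen:
            continue  # duplicated value
        seen.add(key)
        cleaned.append(key)
    return cleaned


def transform(records):
    """Map each text label to an integer class index."""
    classes = sorted({label for _, label in records})
    index = {label: i for i, label in enumerate(classes)}
    return [(path, index[label]) for path, label in records], index


# Hypothetical raw records: one duplicate and one record with a missing label.
raw = [("a.jpg", "dish_a"), ("b.jpg", "dish_b"), ("a.jpg", "dish_a"), ("c.jpg", "")]
cleaned = clean(raw)
encoded, index = transform(cleaned)
print(encoded, index)
```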
2.2.4 Exploring data
After cleaning the data, we start exploring it. We spend some time getting familiar with the
different data sets we have. We can use descriptive statistics to discover patterns, build
intuition about the data, and understand its nature.
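As a small illustration of descriptive statistics, the sketch below summarizes a made-up image
data set with Python's standard library. The labels and image widths are hypothetical values
chosen for the example, not measurements from this project.

```python
from collections import Counter
import statistics

# Hypothetical per-image labels and widths in pixels.
labels = ["dish_a", "dish_b", "dish_a", "dish_c", "dish_a", "dish_b"]
widths = [640, 800, 640, 1024, 720, 800]

class_counts = Counter(labels)             # how many images each class has
mean_width = statistics.mean(widths)       # average image width
median_width = statistics.median(widths)   # middle value, less sensitive to outliers
spread = statistics.pstdev(widths)         # population standard deviation

print(class_counts.most_common())
print(mean_width, median_width, spread)
```

A strongly unbalanced class count or a very wide spread of image sizes is the kind of pattern
this step is meant to reveal before training.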
2.2.5 Machine learning model
Next, we need to choose the machine learning techniques and algorithms that work best with the
types of data we collected and can extract features from them in a way that fits the functional
and business requirements of the project.
2.2.6 Visualization and communication
It is very important to communicate our findings to other people. This communication can take
a variety of formats. We might create images, diagrams, or animations and share them on paper,
by email, or in a PowerPoint presentation, or have an in-person conversation.
2.2.7 Deployment
Finally, we need to deploy our final machine learning model to the production environment.
Then we start using the ML model's results to make decisions.
2.3 Machine learning
Machine learning is a branch of computer science applied in different fields like finance,
medicine, games, robotics, etc., with the aim of training a machine to perform tasks
intelligently, as a human does, by learning from data sets instead of explicitly coding the
solution. This can be done by supervised or unsupervised learning.
2.3.1 Unsupervised Learning
Unsupervised learning is mostly used for clustering, where we are given a data set with no
corresponding labels. We then attempt to learn some type of structure or pattern from the
data and to extract useful information or features from it.
2.3.1.1 Unsupervised Learning Algorithm
There are different algorithms for unsupervised learning, like k-means, k-medoids, hierarchical
clustering, and Gaussian mixture models. Neural networks can be used for both supervised and
unsupervised learning, but they are mostly classified as supervised learning.
• K-means clustering
For a given number of data points, k-means estimates the best centers of the k clusters
representing them. In step one (figure 11), we pick k random center points. Then, we assign
each point to its closest center. In the third step, we recalculate each center as the mean of
its points, and in the last step, we repeat this process until the cluster assignments no
longer change.
Figure 11 : K-means clustering
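The four k-means steps can be sketched in plain Python. This is a minimal illustration under
stated assumptions: the initial centers are sampled from the data points themselves (one common
way to pick random centers), an empty cluster keeps its previous center, and the points are
made-up 2D coordinates.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: pick k random centers
    assignment = None
    for _ in range(max_iter):
        # step 2: assign each point to its closest center
        new_assignment = [
            min(range(k), key=lambda c: dist2(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:  # step 4: stop when nothing changes
            break
        assignment = new_assignment
        # step 3: recompute each center as the mean of its points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = mean(members)
    return centers, assignment


# Made-up data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(pts, k=2)
print(centers, assignment)
```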
• Hierarchical clustering
In hierarchical clustering (figure 12), we group data according to a similarity metric that we
define, such as Euclidean distance or minimum/maximum distance. There are two approaches we
can use: bottom-up or top-down. This method is good for building tree structures from data
similarities.
Figure 12 : Hierarchical clustering
• K-medoids clustering
K-medoids clustering is similar to k-means, except that it uses another rule
for defining the cluster centers. Each center (medoid) is chosen among the elements of the data set
instead of being a computed point that may lie outside the data. The algorithm evaluates the total cost of
swapping a center with another data element and keeps the swaps that lower this cost. This algorithm can be more accurate, but it requires many
iterations to converge.
2.3.2 Supervised Learning
Supervised learning is a machine learning approach mostly used when the data set
is written as a set of example-label pairs, where a label y is associated
with each example x [14]. In other words, supervised learning means that we have many
examples for which we know the correct answers, and we let the computer
figure out the rules for getting these answers.
2.3.2.1 Supervised Learning Algorithms
There are many supervised learning algorithms. Below we cite the most used ones.
• Support Vector Machine
SVM is a classification algorithm that finds a separating line (more generally, a hyperplane) between two data classes
(figure 13) that maximizes the distance to the nearest point of each class. That distance
is often called the margin.
Figure 13 : Support vector machine
• Naive Bayes
The Naive Bayes algorithm is based on Bayes' theorem, which gives the probability
of an event x given that an event y has been observed [14].
$$P(x \mid y) = \frac{P(y \mid x) \cdot P(x)}{P(y)} \tag{2.1}$$
By using a large amount of data, we can count the occurrences of each element
in our dataset and estimate these probabilities. Thus, we can calculate the probability that a new
data example belongs to a certain class.
• Nearest Neighbors
The k-nearest neighbors algorithm is a non-parametric supervised learning method. It is very
straightforward: we simply memorize all the data, and then we label a new example with the majority class of its k
nearest neighbors, as illustrated in figure 14.
Figure 14 : Nearest Neighbors
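The nearest-neighbors idea (memorize the data, then vote among the k closest points) can be illustrated with a short sketch. The function name `knn_predict`, the labels "A" and "B", and the training points are hypothetical and chosen only for the example.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of ((x, y), label) pairs. Returns the majority label
    among the k training points closest to query."""
    # Sort the memorized examples by squared Euclidean distance to the query.
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    # Count the labels of the k nearest examples and return the most frequent one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: class "A" near (1, 1), class "B" near (6, 6).
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
label_near_a = knn_predict(train, (1.5, 1.5))
label_near_b = knn_predict(train, (6.5, 6.5))
```

There is no training step here: all the work happens at prediction time, which is why the method becomes slow on large data sets.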
• Linear regression
Linear regression is an algorithm that tries to fit the line (figure 15) that best describes a data set
relating two or more variables, for example, the size of a house and its price,
or the age of a person and that person's income. With linear regression, we can draw a line
representing a mathematical relationship based on a set of measured points, and use it to map
new continuous inputs to outputs.
Figure 15 : Linear regression
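As an illustration of fitting a line to measurements, the sketch below uses the closed-form least-squares solution for a single input variable. The house sizes and prices are invented numbers that lie exactly on the line price = 2 × size + 10, and the function name `fit_line` is an assumption for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house sizes and prices in arbitrary units.
sizes = [50, 80, 100, 120]
prices = [110, 170, 210, 250]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept  # map a new input to an output
```

Because the sample points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 10; with noisy real data the line would only approximate the points.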
• Neural network
A neural network is a supervised learning method that uses multiple layers of connected
nodes (figure 16). These layers transform an input into an output representing
the result we want to get from the data. In order to obtain an accurate result, we need to
train the neural network on labeled data. During training, we reduce the error the model makes
on the examples by adjusting the values (weights) of the connections between nodes using calculus.
Figure 16 : Neural network
Thanks to their excellent performance, deep neural networks are widely used in image
analysis, speech recognition, object detection, face recognition, and other fields.
2.4 Mathematics for AI
Mathematics forms the foundations of machine learning. "As machine learning is applied to
new domains, developers of machine learning need to develop new methods and extend
existing algorithms. They are often researchers who need to understand the mathematical basis
of machine learning and uncover relationships between different tasks" [14].
ML is very much an interdisciplinary field. Even though it runs as a computer program, it
relies heavily on calculus, statistics, linear algebra, and probability.
• Calculus tells us how to learn and optimize our model.
• Linear algebra makes running these algorithms possible, as ML uses matrices and
vectors to represent data (text, images, etc.).
• Statistics is at the core of everything. It is very helpful for optimization tasks.
• Probability helps predict the likelihood of an event occurring.
Therefore, we will illustrate some math concepts before diving into machine learning models.
2.4.1 Linear algebra
Some concepts of Linear Algebra are important for understanding the principles behind
Machine Learning. In this section, we will try to focus on the parts that are involved in ML.
2.4.1.1 Vectors
The fundamental building block of linear algebra is the vector, because linear algebra is the study of
vectors and of certain rules to manipulate them [14]. There are three distinct but related ideas
about vectors:
• In physics, a vector is a quantity that has both magnitude and direction and can be placed
anywhere in space.
• In computer science, a vector is an ordered collection of data, where the order matters.
• In mathematics, a vector is more general: it can be anything that can be added to another vector and multiplied by a number [15].
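2.4.1.2 Operating on vectors
• Addition: we can add two vectors component by component.
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$$
• Scalar multiplication: we can scale a vector by multiplying it by a number.
$$2 \times \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$
• Magnitude: the length of the vector.
$$\vec{v} = \begin{bmatrix} -1 \\ 2 \\ 3 \end{bmatrix}, \qquad \lVert \vec{v} \rVert = \sqrt{(-1)^2 + 2^2 + 3^2} = \sqrt{14}$$
• Dot product: a way to measure how much one vector projects onto another [15].
$$\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} = 1 \cdot 3 + 2 \cdot 1 + (-1) \cdot 0 = 5$$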
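The four vector operations above can be checked with a small sketch that represents vectors as plain Python lists. The helper names `add`, `scale`, `magnitude`, and `dot` are chosen for this example; in practice a library such as NumPy provides equivalent operations.

```python
import math


def add(u, v):
    """Component-wise sum of two vectors of the same length."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Multiply every component of v by the number c."""
    return [c * a for a in v]


def magnitude(v):
    """Length of the vector: square root of the sum of squared components."""
    return math.sqrt(sum(a * a for a in v))


def dot(u, v):
    """Sum of the products of matching components."""
    return sum(a * b for a, b in zip(u, v))


vector_sum = add([1, 2], [3, 1])
scaled = scale(2, [1, 2])
length = magnitude([-1, 2, 3])
product = dot([1, 2, -1], [3, 1, 0])
```

The inputs are the same numbers as in the worked examples, so the results can be compared directly with the equations.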
2.4.1.3 Matrices
A matrix in mathematics is a collection of vectors [14]. It represents a table of numbers
arranged in rows and columns. In computer science, it is a two-dimensional array of numbers
with m rows and n columns.
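2.4.1.4 Linear Transformation
A linear transformation is like a function in math. It takes a vector (or a matrix) and transforms it into another
vector (or matrix), and the function itself is represented by a matrix.
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = x \begin{bmatrix} a \\ c \end{bmatrix} + y \begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \tag{2.2}$$
We transform the vector $\begin{bmatrix} x \\ y \end{bmatrix}$ by the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and we get $\begin{bmatrix} x' \\ y' \end{bmatrix}$ as output.
This is like the behavior of a neural network (figure 17): we take an image as an input, and the network
gives us its probable content as output. In the case of a neural network, however, the transformation is a group
of matrices and vectors.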
Figure 17 : Neural network structure
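To make the matrix-times-vector rule of equation (2.2) concrete, the sketch below applies a 2 × 2 matrix to a vector. The function name `transform` and the example matrices are assumptions for illustration; the 90-degree rotation matrix is a standard example of a linear transformation.

```python
def transform(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a column vector (a list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Rotation by 90 degrees counter-clockwise: the x axis is sent to the y axis.
rotate_90 = [[0, -1],
             [1, 0]]
rotated = transform(rotate_90, [1, 0])

# General case: each output component is a weighted sum of the inputs.
general = transform([[1, 2],
                     [3, 4]], [5, 6])
```

Each output component is the dot product of one matrix row with the input vector, which is exactly the last step of equation (2.2).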
2.4.2 Calculus
In the last figure, we can see a simple neural network made of interconnected
nodes. These nodes represent the core of the network. To find their values, we need to apply
some calculus concepts, which we will explain in the following parts.
2.4.2.1 Derivative
The derivative is a fundamental concept in calculus. We can consider it as the rate of
change of a function with respect to a single variable, as shown in figure 18.
Figure 18 : The Derivative as a Function
The average rate of change is a measure of how much the function changes per unit over an interval of width h:
$$\frac{f(a+h) - f(a)}{(a+h) - a} = \frac{f(a+h) - f(a)}{h} \tag{2.3}$$
The derivative at a is the limit of this ratio as h approaches 0. We can calculate the second and third
derivatives, up to the nth derivative, by applying eq. (2.3) repeatedly. For
neural networks, derivatives are very important in the optimization process and in reducing the
error rate.
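The ratio in equation (2.3) can be evaluated numerically to see how it approaches the derivative as h shrinks. The sketch below uses the function f(x) = x², chosen only as an example, whose exact derivative at a = 3 is 6.

```python
def average_rate(f, a, h):
    """Average rate of change of f between a and a + h, as in equation (2.3)."""
    return (f(a + h) - f(a)) / h


def square(x):
    """Example function f(x) = x^2, whose derivative is 2x."""
    return x * x


# Shrinking h moves the average rate toward the derivative at a = 3.
rates = [average_rate(square, 3.0, h) for h in (1.0, 0.1, 0.001)]
```

For this function the ratio equals 6 + h, so the three values get closer and closer to 6.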
2.4.2.2 Chain Rule
The chain rule allows us to calculate the derivative of the composition of two functions [15].
$$\frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x) \tag{2.4}$$
We can extend the chain rule to more complicated compositions such as
$f(g(k))$, where k can itself be a composition of functions. In neural network models,
we deal with multiple compositions, but instead of scalar functions, we use matrices and vectors.
2.4.3 Multivariable calculus
Multivariable calculus is the extension of calculus where we deal with multivariable functions
that involve more than one input number, rather than just one variable.
2.4.3.1 Multivariable functions
Multivariable functions are functions that map several input variables to a real number [15].
Ex: $f(x) = x + 12$, an ordinary function with one variable x.
Ex: $f(x, y) = 4x + 2y$, a multivariable function with two variables x and y.
• We can write the function in vector form: $z = f(x, y) = \begin{bmatrix} 4 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$
• We can also graph the function in three dimensions, with axes X, Y, and Z (figure 19).
Figure 19 : Multivariable functions
2.4.3.2 Partial Derivatives
To calculate derivatives of multivariable functions, we need to use partial derivatives, which
are very similar to the ordinary derivative of equation (2.3). The difference is that, for each variable
of the function, we calculate the derivative with respect to it while holding all the other variables constant.
Ex: $f(x, y) = 4x^2 + 2y$, evaluated at an arbitrary point, for example (2, 1).
Derivative with respect to x ($\partial x$):
$$\frac{\partial f}{\partial x}(2, 1) = \frac{\partial}{\partial x}\left(4x^2 + 2 \cdot 1\right) = 8x = 16$$
Derivative with respect to y ($\partial y$):
$$\frac{\partial f}{\partial y}(2, 1) = \frac{\partial}{\partial y}\left(4 \cdot 2^2 + 2 \cdot y\right) = 2$$
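The two partial derivatives of the example can be verified numerically by nudging one variable while keeping the other fixed. This sketch assumes the function differentiated in the example is f(x, y) = 4x² + 2y; the helper names `partial_x` and `partial_y` and the step size h are choices made for the illustration.

```python
def f(x, y):
    """Example function f(x, y) = 4x^2 + 2y."""
    return 4 * x ** 2 + 2 * y


def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of the derivative with respect to x (y held constant)."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of the derivative with respect to y (x held constant)."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


# Evaluate both partial derivatives at the point (2, 1).
df_dx = partial_x(f, 2.0, 1.0)
df_dy = partial_y(f, 2.0, 1.0)
```

The estimates are close to 16 and 2 (up to rounding error), matching the analytic results 8x = 16 and 2; collected together, they form the gradient of f at (2, 1).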
2.4.3.3 Gradient
The gradient is a vector that gathers all the partial derivatives of a function [15].
$$\nabla f(x, y, \dots) = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix} \tag{2.5}$$
The gradient is essential for neural networks, as we are dealing with multivariable functions
[14].
2.4.3.4 Gradient descent
Gradient descent is a popular algorithm used in machine learning, and many deep learning
libraries provide an implementation of it. It is used to optimize neural networks by iteratively
moving in the direction of steepest descent, which is the negative of the gradient [16]. We will
apply gradient descent to the error function that we will discuss later.
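As a minimal sketch of the idea, the code below runs gradient descent on a simple bowl-shaped function, f(x, y) = (x − 3)² + (y + 1)², whose minimum is at (3, −1). The function, the learning rate of 0.1, and the number of steps are assumptions chosen for the illustration, not values used in the project.

```python
def gradient(x, y):
    """Gradient of f(x, y) = (x - 3)^2 + (y + 1)^2."""
    return 2 * (x - 3), 2 * (y + 1)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from the point `start`."""
    x, y = start
    for _ in range(steps):
        grad_x, grad_y = gradient(x, y)
        # Move in the direction of steepest descent (the negative gradient).
        x -= learning_rate * grad_x
        y -= learning_rate * grad_y
    return x, y


minimum = gradient_descent((0.0, 0.0))
```

Each step shrinks the distance to the minimum by a constant factor here, so after 100 steps the result is very close to (3, −1). A learning rate that is too large would instead make the iterates overshoot and diverge.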
2.4.4 Mathematics Behind Neural Networks
2.4.4.1 Perceptron
Neural networks are the central concepts of deep learning. They consist of artificial neurons
connected to each other. A neuron is the basic processing unit in a neural network model.
Generally, it is a unit with multiple inputs and a single output, which we call a perceptron.
Figure 20 : The structure of an artificial neuron
Perceptron equation (figure 20):
$$y = f(w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b) \tag{2.6}$$
$f$ in equation (2.6) is an activation function. Generally, for neural networks, we use the Sigmoid,
ReLU, or Tanh functions [14]. The activation function maps and normalizes the weighted sum of the inputs
to a new value that is easier to compute with. It is just a function that helps the neural network process its input
information and map it to the correct outputs.
Activation Function | Formula
Sigmoid | $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
ReLU | $\mathrm{relu}(x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$
Tanh | $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Table 3 : Activation functions
$y$ in equation (2.6) represents the result we get from this perceptron. Its range depends
on the activation function, but it is mostly between 0 and 1. This value is then passed to other
connected neurons to solve more complex problems. If we think of this perceptron as an
individual classification model, we can consider $y$ as a value for classifying a particular input
based on a specific formula.
For example, given a student's grades $x_1, x_2, \dots, x_n$, we define weights
$w_1, w_2, \dots, w_n$ and a bias $b$ to form a function $f$ such that, for the student to pass,
this function needs to output a value at or above a particular threshold $\theta$:
Student succeeded if
$$f\left(\sum_{i=1}^{n} w_i x_i + b\right) \ge \theta \tag{2.7}$$
Student failed if
$$f\left(\sum_{i=1}^{n} w_i x_i + b\right) < \theta \tag{2.8}$$
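The student example can be turned into a small sketch of a single perceptron with a sigmoid activation. The weights, bias, and threshold below are hypothetical values picked so that a student passes when the average of two grades is at least 10; they are not taken from the project.

```python
import math


def sigmoid(z):
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))


def perceptron(inputs, weights, bias, activation=sigmoid):
    """Equation (2.6): activation of the weighted sum of the inputs plus the bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def passed(grades, weights, bias, threshold=0.5):
    """Decision rule of equations (2.7) and (2.8)."""
    return perceptron(grades, weights, bias) >= threshold


# Hypothetical parameters: the weighted sum is (average grade - 10).
weights = [0.5, 0.5]
bias = -10.0
good_student = passed([12, 14], weights, bias)
weak_student = passed([6, 9], weights, bias)
```

With the sigmoid, a weighted sum of zero maps to exactly 0.5, which is why 0.5 is a natural threshold in this example.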
2.4.4.2 Error function
For complex problems, we cannot determine the weights and bias by ourselves. So, we need a way
to compute these values, and therefore we use an error (or cost) function. The most used ones are
the cross-entropy and the mean squared error functions [17]. First, we initialize our model with random
weights, and then we measure the error made by the model by comparing the output it gives us
with the correct answer that we already know.
• Cross-entropy
$$E = -\frac{1}{m} \sum_{i=1}^{m} y_i \cdot \log(\hat{y}_i) \tag{2.9}$$
For each label $y_i$ (the correct result), we take the output $\hat{y}_i$ predicted by the classifier and
multiply the label by the logarithm of the prediction. The logarithm turns products of probabilities into sums,
which is computationally convenient. Then we sum
all the results and divide by m, the number of examples, to get the total error of our model, which
we need to reduce later.
• Mean squared error
$$E = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 \tag{2.10}$$
The mean squared error function is another way to compute the error of the model: we subtract
the predicted results from the correct labels, square the differences, and average them. It works very well for complex models.
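Both error functions are short enough to write out directly. The sketch below follows equations (2.9) and (2.10) term by term; the function names and the sample labels and predictions are assumptions made for the example.

```python
import math


def cross_entropy(y_true, y_pred):
    """Equation (2.9): minus the mean of label * log(prediction)."""
    m = len(y_true)
    return -sum(y * math.log(p) for y, p in zip(y_true, y_pred)) / m


def mean_squared_error(y_true, y_pred):
    """Equation (2.10): mean of the squared differences."""
    m = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / m


# Hypothetical labels and predicted probabilities.
labels = [1, 0, 1]
predictions = [0.9, 0.2, 0.8]
mse = mean_squared_error(labels, predictions)

# A confident correct prediction costs less than an unsure or wrong one.
low_cost = cross_entropy([1], [0.9])
high_cost = cross_entropy([1], [0.1])
```

Note that the logarithm requires predictions strictly greater than 0, which holds for outputs of a sigmoid or softmax.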
2.4.4.3 Gradient descent
To reduce the value of the error function, we use the gradient descent algorithm, computing the gradient with
respect to all the weights and the bias of the function [17].
$$\nabla E = \left( \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}, \frac{\partial E}{\partial b} \right) \tag{2.11}$$
As we mentioned earlier in equation (2.5), the gradient is a vector of partial derivatives
that points in the direction of steepest ascent of a function [14]. To reduce the error, we need to
take a step in the opposite direction to update the weights.
$$w_i' \leftarrow w_i - \alpha \frac{\partial E}{\partial w_i} \tag{2.12}$$
We keep repeating this step for each weight in the model until we reach a local minimum
of the error function. The learning rate α keeps the change to the weights small at each step,
because we do not want to overshoot a local minimum of the function. We can set the value of α
ourselves during the training and testing phase.
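2.4.4.4 Feedforward
Feedforward is the process by which a multilayer neural network computes the prediction for an input
vector [18]. It means applying all the perceptrons of the model, layer by layer (figure 21).
Figure 21 : Example of a simple neural network
Compound function (bias terms omitted):
$$\hat{y} = \sigma \circ \begin{pmatrix} w_5 & w_6 \end{pmatrix} \circ \sigma \circ \begin{pmatrix} w_1 & w_2 \\ w_3 & w_4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \tag{2.13}$$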
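The compound function (2.13) can be sketched as two nested steps: a hidden layer with two neurons followed by a single output neuron. The weight values below are arbitrary numbers chosen for the example, and the sigmoid is assumed as the activation in both layers.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + math.exp(-z))


def feedforward(x, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass through a network with one hidden layer and one output neuron."""
    # Hidden layer: one perceptron per row of the weight matrix.
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + hidden_bias)
        for row in hidden_weights
    ]
    # Output layer: a single perceptron applied to the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Arbitrary example weights (w1..w4 in the matrix, w5 and w6 in the output layer).
hidden_weights = [[0.5, -0.5],
                  [0.3, 0.8]]
output_weights = [1.0, -1.0]
y_hat = feedforward([1.0, 2.0], hidden_weights, 0.0, output_weights, 0.0)
```

The output always lies strictly between 0 and 1 because the last step is a sigmoid. With all weights and biases set to zero, the network outputs exactly 0.5 for any input.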
2.4.4.5 Backpropagation
Backpropagation is an algorithm to compute the gradient for multilayer neural networks. Since the
error function of such a network is a composite function, it uses the chain rule we discussed earlier [18].
Chain rule:
Figure 22 : The chain rule
First, we run the feedforward process on the inputs $x_1$ and $x_2$ through the two layers
$W^{(1)}(w_1, w_2, w_3, w_4)$ and $W^{(2)}(w_5, w_6)$, as shown in figure 21.
$$h_1 = w_1 x_1 + w_2 x_2 + b \tag{2.14}$$
$$h_2 = w_3 x_1 + w_4 x_2 + b \tag{2.15}$$
$$h = w_5 \, \sigma(h_1) + w_6 \, \sigma(h_2) \tag{2.16}$$
$$\hat{y} = \sigma(h) = \sigma \circ W^{(2)} \circ \sigma \circ W^{(1)}(x) \tag{2.17}$$
Then we need to calculate the derivative of the error function with respect to the weights, using the
loss function of equation (2.10).
$E$ in equation (2.10) can be seen as a function of all the weights $w_i$: $E(W) = E(w_1, \dots, w_n)$
Figure 23 : Backpropagation
After that, we apply backpropagation. For example, the backpropagation of $w_1$ shown in figure
23 is obtained by calculating the partial derivatives of equation (2.17):
$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial \hat{y}} \, \frac{\partial \hat{y}}{\partial h} \, \frac{\partial h}{\partial h_1} \, \frac{\partial h_1}{\partial w_1} \tag{2.18}$$
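The chain of partial derivatives in equation (2.18) can be checked against a numerical estimate. The sketch below assumes a two-input network with two hidden neurons and one output, the sigmoid as activation, a squared error on a single example, and arbitrary weight values; the weight indexing (the first weight multiplies the first input of the first hidden neuron) is also an assumption of this example.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def forward(w, b, x1, x2):
    """Feedforward pass; w holds the six weights w1..w6 at indices 0..5."""
    h1 = w[0] * x1 + w[1] * x2 + b
    h2 = w[2] * x1 + w[3] * x2 + b
    h = w[4] * sigmoid(h1) + w[5] * sigmoid(h2)
    return h1, h2, h, sigmoid(h)


def error(w, b, x1, x2, y):
    """Squared error on a single example (equation (2.10) with m = 1)."""
    return (y - forward(w, b, x1, x2)[3]) ** 2


def grad_w1(w, b, x1, x2, y):
    """Derivative of the error with respect to w1 via the chain rule."""
    h1, _, _, y_hat = forward(w, b, x1, x2)
    dE_dyhat = -2 * (y - y_hat)                            # error w.r.t. prediction
    dyhat_dh = y_hat * (1 - y_hat)                         # sigmoid derivative at h
    dh_dh1 = w[4] * sigmoid(h1) * (1 - sigmoid(h1))        # through the hidden neuron
    dh1_dw1 = x1                                           # weighted sum w.r.t. w1
    return dE_dyhat * dyhat_dh * dh_dh1 * dh1_dw1


def numeric_grad_w1(w, b, x1, x2, y, eps=1e-6):
    """Central-difference estimate used to check the analytic result."""
    up = [w[0] + eps] + w[1:]
    down = [w[0] - eps] + w[1:]
    return (error(up, b, x1, x2, y) - error(down, b, x1, x2, y)) / (2 * eps)


weights = [0.5, -0.5, 0.3, 0.8, 1.0, -1.0]  # arbitrary example values
analytic = grad_w1(weights, 0.0, 1.0, 2.0, 1.0)
numeric = numeric_grad_w1(weights, 0.0, 1.0, 2.0, 1.0)
```

The two values agree to several decimal places, which is a common sanity check when implementing backpropagation by hand. Deep learning frameworks compute these derivatives automatically.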
2.5 Deep Learning
Deep learning is a part of machine learning in which we rely on artificial neural networks to solve
complex problems, involving large volumes of data, that cannot be solved by traditional machine
learning methods.
2.5.1 Artificial Neural Network
An ANN is a group of multiple perceptron layers where forward propagation transforms the input
data through these layers to give us a new output. An ANN consists of three types of layers: input, hidden, and
output [19]. An ANN is simply a neural network where each node of a layer is fully connected
to the nodes of the next layer.
2.5.2 Recurrent Neural Network
The previous neural network was trained using only the current inputs. We did not
consider prior inputs when generating the output. In an RNN, instead, we save results from the
previous feedforward pass of the system to use in the next iteration (figure 24), so it can
process sequential data without losing the relational information between its elements. For example,
in a text, we cannot just process each word by itself to predict a paragraph's meaning. We need
to understand the context of a word by processing the previous and subsequent words as well.
In other words, we can consider a recurrent neural network as a looping process where we
combine previous results with new inputs in each iteration to predict the final result.
Figure 24 : Representation of RNN both in folded and unfolded forms
2.5.3 Convolutional Neural Network
ANN is an excellent neural network structure, and it works very well for solving specific
problems. However, for image classification, it only works well when we give it a set of images
where the target object is placed in the center. ANNs are not well suited to images for two reasons.
First, the number of trainable parameters grows quickly with the image size: every pixel is an input,
and each pixel is coded in 3 color channels, so even a small image yields thousands of input values.
Second, networks with many hidden layers can suffer from vanishing and exploding gradients. An ANN
also flattens the image into a vector, which can make it lose the spatial
features of the image [19]. On the other hand, a CNN works with a much smaller
number of parameters by using image filters that track spatial information and learn to
extract features such as the edges of objects or shapes, as explained in figure 25.
Figure 25 : Convolutional Neural Network
CNNs are made of three main types of layers: convolutional layer, pooling layer, and fully
connected layer [20].
• Convolutional layer: its primary role is to detect the characteristic features of the picture. It consists
of a set of filters.
• Pooling layer: its main role is to reduce the dimensionality of the data.
• Fully connected layer: it outputs the results we want according to the task.
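The roles of the convolutional and pooling layers listed above can be illustrated on a tiny grayscale image stored as a list of rows. This is a simplified sketch: the image, the hand-written vertical-edge filter, and the function names are assumptions for the example, whereas a real CNN learns its filter values during training. As in most deep learning libraries, the filter is slid over the image without being flipped.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kernel_h)
                for b in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value of each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# A 5x5 image that is dark (0) on the left and bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written filter that responds where brightness increases from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]
feature_map = convolve2d(image, vertical_edge)
pooled = max_pool(feature_map)
```

The feature map is non-zero only in the column where the dark and bright regions meet, and pooling halves each dimension while keeping that response.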
Since 2012, CNNs have achieved state-of-the-art results in the ImageNet challenge and have driven
huge advances in this field.
Figure 26 : The annual winner of the ImageNet challenge [21]
"ImageNet is formally a project aimed at (manually) labeling and categorizing images into
almost 22,000 separate object categories for the purpose of computer vision research" [21].
2.6 Neural network evaluation metrics
Neural network evaluation metrics are mathematical formulas used to measure a machine learning model's
performance and to compare it with other models, in order to select a model that
gives high accuracy. Below we cite some of the metrics used in neural network models.
2.6.1 Classification accuracy
The first metric we will talk about is classification accuracy. It is mainly used in classification
models, where we get a percentage representing the accuracy of the model.
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \tag{2.19}$$
This equation is very intuitive. We divide the number of correct predictions by the total number
of predictions.
2.6.2 Confusion matrix
The confusion matrix is more expressive than accuracy because it shows the type of each error. A prediction
made by the model can be a true positive (TP), a false positive (FP), a false negative (FN), or a true
negative (TN), as shown in figure 27 below. These four counts help us understand some other metrics.
Figure 27 : Confusion matrix
• Precision
In some cases, we want to focus mostly on avoiding false positives, such as putting an important
email in the spam folder when it is not spam. For this kind of case, we use the precision metric.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2.20}$$
If we have no false positives, the result will be 1, meaning the model is perfect for our case.
• Recall
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2.21}$$
Recall is the counterpart of the precision metric: here we focus on reducing false negatives (FN), that is,
positive examples that the model misses.
• F1 score
The F1 score is calculated from precision and recall (it is their harmonic mean) and gives a single value that
summarizes the confusion matrix.
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{2.22}$$
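The metrics of equations (2.19) to (2.22) can all be computed from the four confusion-matrix counts. The sketch below uses invented binary labels and predictions (1 for positive, 0 for negative); the function names are chosen for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_score(p, r):
    return 2 * p * r / (p + r)


# Hypothetical ground truth and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
prec = precision(tp, fp)
rec = recall(tp, fn)
f1 = f1_score(prec, rec)
```

In this example there is one false positive and one false negative among eight predictions, so accuracy, precision, recall, and F1 all equal 0.75. The helper functions do not guard against division by zero, which would occur if the model never predicted the positive class.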
2.6.3 Log Loss
Log loss is another way of assessing a machine learning model's performance, and it is also
often used as a loss function.
$$-\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \tag{2.23}$$
In this equation, N is the number of observations, $y_i$ is the true label of observation i in the binary case (0
or 1), and $p_i$ is the probability predicted by the model.
2.7 Conclusion
Machine learning has become very advanced in recent years and is capable of solving
challenging problems. In this chapter, we have presented many approaches and techniques
used in machine learning. For our project, we will use a convolutional neural network, as it
is one of the most successful techniques in the image classification field.
Gathering requirements
3.1 Introduction
Before creating any software project, it is necessary to define the technical and functional
requirements concerning the project's specifics. In this chapter we will list and demonstrate the
project's operational behavior and the methods and tools that we will use to create an Android
application capable of classifying different food images from a phone camera.
3.2 Requirements
3.2.1 Functional requirements
Functional requirement | Description
Provide application guide | The user should be able to use the application correctly thanks to a simple user guide inside the app.
Asking for camera permission | The app should ask the user for permission before accessing the phone camera.
Displaying camera preview | The users should be able to see a preview of their target food plate.
Classifying | The application should run a classification process on captured images.
Displaying results | The application should display the probable food categories as soon as it finishes the classification process.
Providing food description | The user should be able to view a full description of the classified food category.
Table 4 : Functional requirements
3.2.2 Nonfunctional requirements
Non-functional requirement | Description
Design | The app should be aesthetically pleasing and appropriately designed to satisfy the end-user requirements.
Quality | The app should run smoothly and should avoid any memory leaks and bugs that can affect the Android OS or any other running apps.
Accuracy | The app should provide reliable data to the user.
Compatibility | The app should run correctly, without showing any bugs or drawbacks, on any device that uses the Android operating system.
Accessibility | The application must be designed and developed so that anyone can use it.
Table 5 : Nonfunctional requirements
3.3 Technical requirements
3.3.1 Tools for preparing data
Machine learning programs, in general, require labeled data sets. The data could be images, texts, or
any data that can be represented numerically. For this project, we will use labeled images to train
our model to classify new image inputs. We can get this data by
collecting it from different sources. After that, we need to clean and prepare the data.
3.3.1.1 Data scraping
The data will mainly come from the web, so we need a tool to scrape data from different websites.
In this project, we will use a software tool called ParseHub.
Figure 28 : ParseHub interface
ParseHub is a powerful and free tool for scraping different types of data, like images, titles, and texts,
from websites that contain a large amount of content, such as a list of pictures or a list of films.
This tool offers an automatic and fast way to collect these data, as shown in figure 28.
3.3.1.2 Cleaning images
After data scraping, we need to make sure that there are no repeated images in our food data
set. We cannot do this manually, as we are dealing with a high volume of data, so we will
use a tool called "Duplicate Photo Cleaner", as shown in figure 29.
Figure 29 : Duplicate Photo Cleaner
DPC is an advanced image similarity detector. It is an excellent tool for everyone who takes
photos with their smartphone. Unlike ordinary duplicate image finders, Duplicate Photo
Cleaner can compare images based on how similar they look [22].
3.3.1.3 Preparing images
The images in the collection must all be of the same size, and this size must be similar to that of a photo
taken with a phone camera. For this task, we will use a tool called JPEGCrops (figure 30).
Figure 30 : JPEGCrops interface
JPEGCrops is a Windows program created for the preparation of a batch of images for printing.
It provides lossless cropping with fixed aspects using jpegtran. [23]
3.3.2 Machine learning method
The project aims to apply Machine learning for creating a useful food classifier. We mentioned
many different types of machine learning approaches in the previous chapter, where we have
concluded that the best method used when building an image classifier is the convolutional
neural network.
3.3.2.1 Deep learning frameworks
There are different frameworks for building deep learning models. The most famous ones are
TensorFlow and PyTorch, as shown in the diagram in figure 31.
Figure 31 : Online job listing growth
3.3.2.2 Framework comparison
TensorFlow | PyTorch
Developed by Google | Developed by Facebook
Difficult debugging | Good for debugging
Open source | Open source
Static network graph [24] | Uses dynamic computational graph [24]
Big community | Based on Python
Good for production | Popular in research labs [24]
More mature | Relatively new
Table 6 : TensorFlow and PyTorch comparison
We can see in table 6 that both frameworks are great for creating neural network models, but
for this project, we decided to use PyTorch, as it offers a more object-oriented approach and
is good for learning and research.
3.3.3 PyTorch
PyTorch is an open-source machine learning library used for developing and training neural
networks based on deep learning models. It is primarily developed by Facebook's AI research
group. PyTorch can be used with Python as well as C++ [24].
3.3.3.1 Programming language
We need to install Python to run the PyTorch framework. Python
is one of the most used languages in machine learning and data analysis. It is very straightforward and
simple, especially for mathematicians and researchers who want to get involved in developing
programs related to their field. There are many distributions of Python; a popular data
science and machine learning distribution is Anaconda. It includes many of the libraries and
APIs commonly needed for machine learning, and we can add more through a graphical user interface
called Anaconda Navigator, which enables us to launch applications and efficiently manage
Conda packages.
Figure 32 : Anaconda Navigator
3.3.3.2 Coding environment
To write the convolutional neural network model code in Python, we need to prepare our coding
environment. There are many options to choose from. The choice will not affect the
project quality, so each developer can pick the IDE they are comfortable with.
• Visual Studio Code
For this project, we will use VSCode (figure 33). VSCode is a free and open-source code editor from
Microsoft, and it runs on all major platforms: macOS, Windows, and
Linux. It is a very lightweight code editor with many additional plugins and APIs that
we can install, so it is an excellent choice for developing our machine learning project.
Figure 33 : VSCode screenshot
• Jupyter notebook
Alongside VSCode, we will use Jupyter notebooks. A Jupyter notebook is like a web page that
holds a document where you can execute chunks of programming code one chunk at a time.
You can also insert explanatory text and even data visualizations, tables, equations, and graphs
alongside the code, as shown in figure 34. Jupyter Notebook is open-source and was created for data
science and machine learning researchers.
Figure 34 : Jupyter Notebooks in Visual Studio Code
We will be using Jupyter notebooks because we need to see the output of our code fragments
frequently, and we need to draw some graphs for debugging purposes, which is what Jupyter
notebooks allow.
3.3.3.3 Libraries and APIs
The PyTorch library contains many useful features for building a neural network, but we need
to use additional libraries to work with it.
• Matplotlib
Matplotlib is the most popular plotting library for Python. It provides numerous ways to create
static and animated visuals (figure 35), and it works very well with PyTorch and NumPy.
Figure 35 : Matplotlib style sheets
• NumPy
NumPy is one of the most powerful Python libraries [25]. With NumPy, we can practice simple
image processing techniques, because NumPy can represent images as multi-dimensional
arrays. NumPy is a scientific computing library used by numerous other Python data science
libraries. It contains many functions for linear algebra, statistics, simulation, data
science, machine learning, and much more.
• CUDA
A CNN consists of many hidden layers, so training it on the CPU would take a very long
time. The solution is to use the GPU, which is built specifically for doing
many linear algebra computations in parallel, and neural networks are fundamentally just
linear algebra computations. So, if we run on the GPU, the computation is done in
parallel, and training can be much faster than on the CPU (up to about 100 times, depending on the
model and hardware). To move our model parameters from the CPU to the GPU in PyTorch, we first
need to install the CUDA toolkit (figure 36) from Nvidia on our operating system.
Figure 36 : CUDA ecosystem diagram
The CUDA toolkit is a software platform that pairs with an Nvidia GPU device to facilitate building
programs that increase computational speed using the parallel processing power of NVIDIA GPUs.
3.3.4 Deployment
For the last phase of the project, we need to deploy our neural network to a mobile application
so that users can use it wherever they go.
3.3.4.1 Building Android application
There are two major mobile phone operating systems, iOS and Android. Android has taken most
of the market share, as shown in figure 37, due to the varied price range of Android devices, which
makes them affordable for many people in countries with developing economies.
Figure 37 : OS market share [26]
Besides that, iOS development requires a Mac computer and an iOS device. So, I decided to
use Android, as I have a Windows PC and an Android device.
3.3.4.2 Programming language
Android development has changed a lot recently. Android apps are now built using either the Java
or the Kotlin language. Java was the default language, but Google has announced that Kotlin
is now its preferred language for Android development. We can still use Java, but
Kotlin is now considered more efficient.
3.3.4.1 Programming language comparison
Kotlin | Java
Can infer the type of a variable at compile time. | We need to specify the type of declared variables explicitly.
Null safe: variable types are non-nullable by default. | Null can be assigned to any object variable, which can cause null pointer exceptions.
Lets developers extend an existing class with new functionalities. | To add new functionalities to a class, we need to create a new class that inherits from the parent.
Does not have checked exceptions. | Supports checked exceptions.
Provides data classes that generate constructors, getters, and setters for us. | We need to write a data class and its constructors, setters, and getters ourselves.
Table 7 : Kotlin and Java comparison [27]
As we can see in table 7, Kotlin is the most suitable choice for our project. It is also Google's
preferred language for Android development.
3.3.4.2 Coding environment
For building an Android app, we will use Android Studio as the coding environment because it
is the official IDE for Android, made by Google.
Figure 38 : Android Studio interface
As shown in figure 38, Android Studio is a robust code editor that helps with creating new
projects and adding new modules. It gives a comprehensive representation of the
project structure, providing quick access to resources, code, and files.
3.3.4.3 Libraries and APIs
• CameraX
CameraX is a Jetpack library designed to make camera app development easier [28]. Writing a
camera app with the standard camera API is a challenging task for developers, which is why
Google built this API: it is easy to understand and significantly reduces the amount of code
that we must write. CameraX is built on top of the Camera2 API to provide a consistent
experience across device types.
• PyTorch with Android
After training our convolutional neural network model, we need to pass it to an Android app.
PyTorch provides APIs that cover the standard preprocessing and integration tasks required to
run machine learning models in mobile applications. Because the whole process, from training
to deployment, stays within the PyTorch ecosystem, integration errors are reduced.
• Material Design support library:
Material Design is a design language made by Google. It is an adaptable design system backed
by open-source code that helps developers build high-quality digital experiences. From design
guidelines to developer components (figure 39), Material Design can help develop products
faster, and it helps make sure our app works for all users, regardless of the platform.
Figure 39 : Material Design Components
3.4 Conclusion
So far, we have selected the method and tools needed for the project, based on the studies
discussed in the earlier chapter. The next chapter will go into detail about the design of our
application and the approach used.
Software Design and Architecture
4.1 Introduction
Before beginning the realization phase, in this chapter, we will define the overall software
architecture and the design patterns we will use during the project's realization. We will include
conceptual visualization and diagrams to illustrate our architecture.
4.2 Software best practices
This Android application is intended to be used by multiple users, so to keep the code
efficient and avoid mistakes, we will follow specific standards used in software
engineering for data science [29].
4.2.1 Clean and modular code
The first practice is writing code that is clean and modular. Code is clean when it is clear,
simple, and compact. This makes it much easier for developers to understand and reuse code,
especially when iterating over a project. Our code should also be modular, meaning the
program is broken up into functions and modules. A module is just a file. In a function, we
encapsulate code and reuse it by calling the function in different places. In the same way, we
can encapsulate code within a module and reuse it by importing it into other files. This helps
us avoid writing unnecessary, repeated lines of code.
4.2.2 Efficient code
Writing efficient code is very important, especially for the user experience. There are two
parts to making code efficient: reducing the time it takes to run, and reducing the space and
memory it uses. This is very important for our mobile application, since the app runs on the
user's device and updates happen instantaneously. For machine learning, the model will be
trained locally before being integrated into the Android app, so slower code is acceptable
there: the essential thing is to produce a model that can classify images with the highest
possible accuracy.
4.2.3 Refactoring code
Refactoring is a step done after writing a program that solves a new problem. When writing
code for the first time, we tend not to pay attention to its structure and arrangement. We
focus on making the code work, which can leave it somewhat unorganized and repetitive. That is
why we should always go back and refactor after achieving a working model. Refactoring means
restructuring the code to improve its internal structure without changing its external
functionality. It gives us a chance to clean, restructure, and modularize our program.
4.2.4 Documentation
Documentation is the additional text that comes with or is included in the software code to
describe it concisely. It helps to clarify complex parts of the program, especially if we are
dealing with hundreds of lines. Documentation makes it easy to navigate the code without
getting lost and to quickly understand how and why different application components are
used. We can add different types of documentation to our software:
• In-line comments: used to clarify a specific line of code.
• Docstrings: documentation attached to a function or a module, describing its purpose and
details.
• Project documentation: added at the project level, for example a readme file that
documents details about the project.
4.2.5 Version control
The primary purpose of a version control system is to help multiple developers work
independently on the same project without creating conflicts, but that is not its only use. A
single developer still benefits from version control, as it creates safe points that save the
project's progress and lets us try out new code on branches without losing previous code. For
this project, we will use Git because it is the most common version control system.
4.3 Software Design
Software design is usually broken into two phases: architectural design and detailed design.
Architectural design is the process of dividing the program into components, assigning
responsibilities for aspects of behavior to each component, and addressing how the components
interact with each other. Detailed design is more related to the functional requirements, where
we create a full definition of every aspect of project development. In this project, we are
dealing with two separate and independent programs: a machine learning program for creating a
neural network, and an Android application that uses the produced model, as illustrated in
figure 40.
Figure 40 : Project Structure
There is no direct interaction between these two systems. As shown in figure 40, they run on
separate timelines. First, the ML program trains and generates a convolutional neural network
model; the model is then imported into the Android application to classify captured images.
As these programs are not related to each other, we will go directly to the detailed design of
each of them individually.
4.4 ML Lifecycle
Machine learning is considered a data science analysis more than a software development
process because it relies on training data sets and statistics to solve problems. It has a
different lifecycle, as shown in figure 41. We have already covered the asking questions phase
in our "Introduction and context" chapter, so we will start directly from the preparing data
phase.
Figure 41 : Machine Learning Lifecycle [13]
4.4.1 Preparing data
Preparing data is the first process in machine learning. The first thing we need to do is to define
data categories. For this project, we decided to go with only ten categories because it is tough
to find extensive image data on an innovative project idea, like classifying Tunisian food.
Furthermore, we need approximately 1000 labeled images for each food type, as explained in
the introduction chapter. Our image data sets will be divided into ten different folders where
the folder's name represents the data labels as shown in figure 42.
Figure 42 : Data Structures
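The folder-per-category layout described above can be read with a few lines of code. The following is a minimal sketch using only the Python standard library, not the loader used in this project: the function name `load_labeled_paths`, the accepted file extensions, and the two dish folder names in the demo are assumptions chosen for illustration.

```python
from pathlib import Path
import tempfile

# Assumed set of image extensions; adjust to the files actually collected.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def load_labeled_paths(root):
    """Return (class_names, samples), where the folder name is the label.

    samples is a list of (image_path, label_index) pairs and label_index
    is the position of the folder name in the sorted class_names list.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for image in sorted((root / name).iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image, label))
    return class_names, samples


# Demo on a throwaway directory with two placeholder categories.
with tempfile.TemporaryDirectory() as tmp:
    for dish in ("couscous", "brik"):  # hypothetical folder names
        folder = Path(tmp) / dish
        folder.mkdir()
        (folder / "sample_1.jpg").touch()
    class_names, samples = load_labeled_paths(tmp)
    sample_labels = [label for _, label in samples]
```

Sorting the folder names keeps the mapping from label index to category stable between runs, which matters when the same indices are later used to display the result in the app.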
4.4.2 Algorithm Selection
4.4.2.1 Transfer learning
For creating the neural network, we will use a technique that has proven to be very good at
solving complex problems without the need for massive data sets [30] or months of training.
This technique is called transfer learning: it means taking a model that has been trained for
one task and then tuning it to accomplish another task (figure 43).
Figure 43 : Transfer Learning Technique
More specifically, transfer learning refers to taking a pre-trained neural network, attaching
our classifier model to it (figure 44), and training the combination on our dataset while
freezing the weights of the CNN, as it is already trained. The CNN can still extract general
features from our data samples, while the classifier uses this information to classify the
data in a way that is pertinent to our problem [30]. This technique has proven to work very
well, especially with convolutional neural networks trained on the millions of images of the
"ImageNet" challenge, an annual competition to find the best possible ML model. As mentioned
in the literature review chapter, CNNs use image filters to extract features from training
data and then pass them to a classifier neural network for classification. When the model is
trained on a colossal number of samples from a large variety of categories, it becomes able to
extract features from new data. We can then add our classifier neural network and train it for
our specific problem without changing the pre-trained CNN filter parameters. This technique
gives strong results and has been tested numerous times in machine learning research papers.
Figure 44 : Neural network Structure
For this technique to work on our ten food categories, we need to create our classification
model and add it at the end of the pre-trained CNN structure, as shown in figure 44. Many
CNNs have performed well in the "ImageNet" competition. One of them is the MobileNet model,
which we will use, as it is a lightweight network of about 10 MB designed for mobile phone
devices.
4.4.2.2 Classifier Model Structure
Once we have the pre-trained CNN, we need to replace its classifier with our own model.
Classifier models are typically divided into three parts :
• Input layer
The classifier model's input layer receives the feature extraction result from the CNN output
layer. To fit the two together, the input layer size must match the CNN output size, which is
2048 nodes for the MobileNet CNN shown in figure 45.
• Hidden layers
Defining the hidden layers is the most challenging part, as there are no specific rules: each
problem has its own characteristics. Many developers test different structures and compare
the results, a process that requires vast computing resources. For that reason, we will simply
use a common structure for problems like ours, where there are only ten output classes.
• Output layer
The output layer is the result layer. It has ten nodes, one for each of our food categories.
We will use the SoftMax function, as shown in figure 45, to obtain a probability between 0 and
1 for each class; the class with the highest probability is the prediction.
Figure 45 : Classifier Structure
The first and second hidden layers will use the ReLU activation function. The first layer will
have 1000 nodes, and the second one 500 nodes.
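To make the data flow through the classifier concrete, the sketch below runs a forward pass through fully connected layers with ReLU on the hidden layers and SoftMax on the output. It is a pure-Python illustration under stated assumptions, not the PyTorch model built in this project: the weights are random rather than trained, all function names are hypothetical, and the layer sizes are scaled down so the example runs instantly. The structure described above would correspond to sizes of 2048, 1000, 500, and 10.

```python
import math
import random


def relu(values):
    """ReLU activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def make_layer(n_in, n_out, rng):
    """Random, untrained weights; for illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(features, layers):
    """Apply ReLU after every layer except the last, then SoftMax."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return softmax(activations)


rng = random.Random(0)
sizes = [8, 6, 4, 10]  # scaled-down stand-in for 2048 -> 1000 -> 500 -> 10
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
features = [rng.random() for _ in range(sizes[0])]  # stand-in for CNN features
probabilities = classify(features, layers)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
```

The index of the largest probability is the predicted food category; sorting the probabilities instead would give a ranked list of the most likely classes.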
4.4.3 Training the Model
For training the classifier model, we will use the cross-entropy loss function. Before that, we
need to divide our image data set into training, validation, and testing data [18], as
illustrated in figure 46. ML models tend to perform well on the training data but may fail to
generalize to data they have not seen before; this is called the overfitting problem. To avoid
it, we set aside a portion of the data for validation, in order to decide when to stop the
training process. At the end, we need another portion of data, kept outside the training
process, to test the model's real performance as if it were working on real-world data.
Figure 46 : Data Sets Division
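The division into training, validation, and testing data can be sketched as a shuffled split. This is a minimal standard-library example rather than the project's actual code: the function name `split_dataset` and the fixed seed are assumptions, and the 80/10/10 proportions are taken from the percentages that accompany figure 46.

```python
import random


def split_dataset(samples, train_ratio=0.8, validation_ratio=0.1, seed=42):
    """Shuffle the samples, then cut them into train, validation, and test parts.

    Whatever remains after the training and validation parts goes to testing.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    train_end = int(len(shuffled) * train_ratio)
    validation_end = train_end + int(len(shuffled) * validation_ratio)
    return (shuffled[:train_end],
            shuffled[train_end:validation_end],
            shuffled[validation_end:])


samples = list(range(1000))  # stand-in for 1000 labeled images
train, validation, test = split_dataset(samples)
```

Shuffling before cutting avoids a split in which one part contains only the first few categories, and the fixed seed keeps the test set the same across runs so results stay comparable.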
Validation data is used during the training phase to examine how well the model performs on
data it is not trained on. We save a checkpoint of the model at each iteration that achieves a
better result on the validation data set, as shown in figure 47.
Figure 47 : Checkpoints Design Pattern [31]
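The checkpoint rule described above, saving only when the validation result improves, can be sketched independently of any training library. In this hedged example the function name, the `save_checkpoint` callback, and the loss values are all made up for illustration; in a real run each loss would come from evaluating the model on the validation set after an epoch.

```python
def train_with_checkpoints(validation_losses, save_checkpoint):
    """Call save_checkpoint(epoch, loss) whenever the validation loss improves.

    validation_losses stands in for the loss measured after each epoch.
    Returns the epoch and loss of the best checkpoint.
    """
    best_loss = float("inf")
    best_epoch = None
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:  # strictly better than every earlier epoch
            best_loss = loss
            best_epoch = epoch
            save_checkpoint(epoch, loss)
    return best_epoch, best_loss


saved = []
history = [0.92, 0.71, 0.75, 0.64, 0.69]  # made-up validation losses
best_epoch, best_loss = train_with_checkpoints(
    history, lambda epoch, loss: saved.append((epoch, loss)))
```

With these values, checkpoints are written at epochs 1, 2, and 4. Epochs 3 and 5 have a higher validation loss than an earlier epoch, a typical sign that further training is starting to overfit, so the last saved checkpoint is the one to keep.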
Figure 46 labels: 80% training, 10% validation, 10% testing
4.4.4 Evaluating the Model
To evaluate the model, we will use the testing data to measure the model's accuracy, that is,
the number of correct classifications divided by the total number of elements in the test set.
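The evaluation ratio described above, correct classifications divided by the total number of test elements, can be written as a short function. This is an illustrative sketch: the function name and the label values are invented for the example and do not come from the project's results.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of test samples whose predicted label matches the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("both lists must have the same length")
    if not true_labels:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


predicted = [0, 3, 3, 7, 9]  # made-up model outputs (class indices)
actual = [0, 3, 4, 7, 1]     # made-up ground-truth labels
score = accuracy(predicted, actual)  # 3 correct out of 5
```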
4.4.5 Deploying the model
Before we proceed to the Android development, we need to save the final result of the CNN
model by using a serialization method included in the PyTorch library to generate a serialized
version of the model for the Android application. This model will then be packaged inside our
application as an asset that we can run on the mobile device.
4.5 Android software design
The second part of the project is about designing an Android application. We will start first by
presenting an overview of the Android application structure and activity lifecycle.
4.5.1 Android Applications structure
Mobile apps are slightly different from standard software. On the Android platform,
applications use up to four basic components [32] and two additional components.
4.5.1.1 Basic components
• Activities :
Activities are the fundamental building blocks of Android apps. An activity can be considered
an individual window containing a graphical interface for interaction with the user.
• Services :
Services represent a process running in the background, designed for continuous operations,
which does not have a graphical interface. They are usually used to perform long-lasting tasks.
• Content providers :
Content providers grant a level of abstraction for any data stored on the device that can be
accessed from multiple applications.
• Broadcast receivers :
Broadcast receivers respond to system messages that circulate in the device and alert
applications to various events.
4.5.1.2 Additional components
• Fragments :
Fragments are an optional component. They help to change the configuration of activities to
support large and small screens on mobile devices.
• Views :
Views are the basic building blocks of the application user interface. They are arranged in a
tree and used to display text fields, images, buttons, and so on.
4.5.2 Activity lifecycle
Figure 48 : Activity lifecycle in Android [33]
The life cycle of an Android activity has four basic states controlled by six callbacks, as shown
in figure 48:
• Launched state: the user launches the app by clicking on its icon. The Android system then
creates a new instance of the launched activity.
• Running state: the activity is displayed on the screen, executing its code or waiting for
user input. It is the state between the onResume() and onPause() callbacks.
• Killed state: the activity still keeps its necessary data and the user can still return to
it, but the Android system has shut it down to free memory for a higher-priority app that
the user is focusing on.
• Shutdown state: the final phase, where the app's memory is released before the app is shut
down.
4.5.3 Software architecture
For this Android application, we will use three Activity classes: one for showing a user guide,
one for the camera and image classification process, and one to display the full description of
the resulting food type.
4.5.3.1 Welcome/Guide Activity
UI Components (figure 49):
• ConstraintLayout parent view for structuring and organizing child views.
• TextView for the activity title.
• ViewPager to display the guide.
• Button view to skip to the next Activity.
Figure 49 : Welcome Activity
4.5.3.2 CameraClassification Activity
UI Components (figure 50):
• ConstraintLayout parent view.
• TextureView for camera preview.
• LinearLayout parent view.
• TextView/Button for top 1 result.
• TextView/Button for top 2 result.
• TextView/Button for top 3 result.
Figure 50 : CameraClassification Activity
4.5.3.3 Description Activity
UI Components (figure 51) :
• ScrollView parent view.
• ConstraintLayout parent view.
• TextView for food name.
• ImageView for food image.
• LinearLayout parent view.
• TextView for Description text.
• TextView for Ingredients list.
Figure 51 : Description Activity
UI Component | Description
TextView | TextView is a view that displays a text or any type of string.
ImageView | ImageView is a view that displays an image from its source path.
Button | Button is a view that displays a button UI component that can handle click events.
TextureView | TextureView is a view that displays the content stream.
ViewPager | ViewPager is a view that allows the user to swipe left or right to view multiple contents. We will use it to display the guide.
LinearLayout | LinearLayout is a view group that linearly organizes subviews.
ConstraintLayout | ConstraintLayout is a view group to place views with position constraints.
ScrollView | ScrollView is a view group to display subviews on a scrollable page.
Table 8 : UI Components
4.5.3.4 CameraClassification Class Diagram
The UML class diagram shown in figure 52 explains the internal structure of the
CameraClassification Activity.
Figure 52 : CameraClassification Class Diagram
Class | Description
BaseModuleActivity | It is a base class for activities from Android SDK.
ImageClassificationActivity | This activity class is responsible for the image classification process, reading camera data, and loading the Machine Learning model.
AbstractCamera | The class provides an API surface that connects to an Android device camera and requests an image stream.
BackgroundThread | This class is used to create a background thread using the HandlerThread class.
PythonModel | This class is responsible for holding the Machine Learning model.
AnalyseResult | This is an inner class to exchange result data between the background thread and the main thread.
Table 9 : Kotlin Classes Description
4.6 Conclusion
In this chapter, we have outlined the software structure in a way that meets the specified
project requirements. We have also defined the machine learning processes and algorithms for
creating a convolutional neural network, as well as the design of the Android application. We
are now ready to move on to the realization phase.