0_Phase-1 report.pdf

Department of Computer Science and Engineering
Global Campus, Jakkasandra Post, Kanakapura Taluk, Ramanagara District, Pin Code: 562 112
2022-2023
A Project Phase 1 Report on
“Analysis And Detection Of Autism Spectrum Disorder Using ML”
Submitted in partial fulfilment for the award of the degree of
BACHELOR OF TECHNOLOGY IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
PRATHAMESH SAMDADIYA
19BTRCS065
SANTHOSH RAJ R
19BTRCS069
ROSHAN KUMAR
19BTRCS061
Under the guidance of
Dr. Yogesh Kumaran S
Assistant Professor
Department of Computer Science & Engineering
Faculty of Engineering & Technology
JAIN Deemed to be University

CERTIFICATE
This is to certify that the project work titled “Analysis And Detection Of Autism
Spectrum Disorder Using ML” is carried out by Prathamesh Samdadiya (19BTRCS065),
Santhosh Raj R (19BTRCS069) and Roshan Kumar (19BTRCS061), bonafide students of
Bachelor of Technology at the Faculty of Engineering & Technology, Jain Deemed-to-be
University, Bangalore, in partial fulfillment for the award of the degree of Bachelor of Technology
in Computer Science & Engineering, during the year 2022-2023.
Dr. Yogesh Kumaran S
Assistant Professor, Dept. of CSE,
Faculty of Engineering & Technology,
Jain Deemed to be University
Date:

Dr. Mahesh T R
Head of the Department, Dept. of CSE,
Faculty of Engineering & Technology,
Jain Deemed to be University
Date:

Dr. S A Hariprasad
Director,
Faculty of Engineering & Technology,
Jain Deemed to be University
Date:
Name of the Examiner Signature of Examiner
1.
2.

DECLARATION
We, Prathamesh Samdadiya (19BTRCS065), Santhosh Raj R (19BTRCS069) and
Roshan Kumar (19BTRCS061), students of seventh semester B.Tech in Computer
Science & Engineering at the Faculty of Engineering & Technology, Jain Deemed-to-be
University, hereby declare that the project titled “Analysis And Detection Of
Autism Spectrum Disorder Using ML” has been carried out by us and submitted in partial
fulfilment for the award of the degree of Bachelor of Technology in Computer Science &
Engineering during the academic year 2022-2023. Further, the matter presented in the
project has not been submitted previously by anybody for the award of any degree or
diploma to any other University, to the best of our knowledge and faith.
Signature
Name 1: Prathamesh Samdadiya
USN: 19BTRCS065
Name 2: Santhosh Raj R
USN: 19BTRCS069
Name 3: Roshan Kumar
USN: 19BTRCS061

ACKNOWLEDGEMENT
It is a great pleasure for us to acknowledge the assistance and support of a large
number of individuals who have been responsible for the successful completion of this project
work.
First, we take this opportunity to express our sincere gratitude to the Faculty of
Engineering & Technology, Jain Deemed to be University, for providing us with a great
opportunity to pursue our Bachelor’s Degree in this institution.
In particular, we would like to thank Dr. S A Hariprasad, Director, Faculty of
Engineering & Technology, Jain Deemed to be University, for their constant encouragement.
We would like to thank Dr. Geetha, Dean Academic, Faculty of Engineering &
Technology, Jain Deemed to be University, for their constant encouragement and support.
It is a matter of immense pleasure to express our sincere thanks to Dr. Mahesh T R ,
Head of the Department, Computer Science & Engineering, Jain Deemed to be University,
for providing right academic guidance that made our task possible.
We would like to thank Dr. Mahesh T R, Program Head, Faculty of Engineering &
Technology, Jain Deemed to be University, for their constant encouragement and expert advice.
We would like to thank our guide Dr. Yogesh Kumaran S, Assistant Professor, Dept.
of Computer Science & Engineering, Jain Deemed to be University, for sparing their
valuable time to extend help in every step of our project work, which paved the way for smooth
progress and fruitful culmination of the project.
We would like to thank our Project Coordinator Dr. R.Chandramma and Dr. Rajat
Bhardawaj and all the staff members of Computer Science & Engineering for their support.
We are also grateful to our family and friends who provided us with every requirement
throughout the course.
We would like to thank one and all who directly or indirectly helped us in completing
the Project Phase 1 work successfully.
Signature of Students

ABSTRACT
Autism Spectrum Disorder (ASD) is a neurological disorder that can affect a
person's language, learning, cognitive and social skills throughout life.
Symptoms usually appear during development and affect about 3% of the
population worldwide. The disorder also includes restricted and repetitive
patterns of behaviour. The term "spectrum" in autism spectrum disorder refers
to the wide range of symptoms and severity. Some children show signs of autism
spectrum disorder from an early age, such as decreased eye contact, lack of
response to their name, or indifference to caregivers. Other children develop
normally in the first few months or years of life, but then suddenly become
withdrawn or aggressive, or lose language skills they had already acquired.
Symptoms are usually seen by the age of 2 years. ASD begins in childhood and
continues through adolescence and adulthood. Motivated by the increasing use of
machine learning techniques in medical diagnostic research, this report
explores the use of Logistic Regression, Naive Bayes, Support Vector Machine,
Convolutional Neural Network and Random Forest classifiers to analyse and
detect autism spectrum disorder.

TABLE OF CONTENTS
List of Figures
Nomenclature used
Abstract
Chapter 1. Introduction
Chapter 2. Literature Survey
Chapter 3. Objective and Methodology
3.1 Objective
3.2 Methodology
Chapter 4. System Design
4.1 System Architecture
Chapter 5. Hardware and Software Requirements
5.1 Hardware Requirements
5.2 Software Requirements
Chapter 6. Results and Discussion
Chapter 7. Conclusion and Future Work
References

LIST OF FIGURES
Fig. No. Description of the figure
3.1 Methodology
4.1 System Architecture
NOMENCLATURE USED
R-CNN Region Based Convolutional Neural Networks
CNN Convolutional Neural Network
VGG Visual Geometry Group

Chapter 1
INTRODUCTION
The prevalence of Autism Spectrum Disorder (ASD) is increasing rapidly across all age groups of the
population. Early detection of this neurological condition can greatly help patients maintain their
mental and physical health. With the increasing application of machine learning-based models to
predict various human diseases, early detection based on health and physiological parameters
appears to be feasible. This has increased interest in the detection and analysis of ASD
in order to improve treatment. ASD can be difficult to recognize because several other
mental disorders have symptoms that are very similar to those of ASD.
Autism spectrum disorders are related to the development of the human brain. People with
autism spectrum disorder generally have difficulty with social interaction and communication with
others, and the condition usually affects a person throughout life. Both environmental and
genetic factors can contribute to the disorder. Symptoms can begin
as early as three years of age and last a lifetime.
A patient with this disorder cannot be completely cured, but if symptoms are detected early, their
effects can be mitigated for some time. Scientists have yet to pinpoint the exact cause of ASD, although
human genes are assumed to be responsible, influencing development in interaction with the
environment.
Several risk factors are associated with ASD, for example low birth weight, a sibling with ASD,
and older parents. People with ASD also show social interaction and communication difficulties such as:
● Does not respond appropriately to sounds
● Does not want to be hugged
● Cannot use gestures
● Does not interact with others
● Inappropriate attachment
● Wants to live alone
● Echoes words or phrases (echolalia), etc.
People with ASD also show restricted interests and repetitive behaviors. The following
list gives concrete examples of such behavior.
● Repeating certain actions, such as saying a word or phrase over and over again.
● Getting upset when a routine changes.
● Having a focused interest in certain aspects of a subject, such as numbers or facts.
● Being more or less sensitive than other people to sensory input such as light and noise.

Early detection and treatment are the most important steps in reducing the symptoms of autism
spectrum disorder and improving the quality of life of people with ASD.
However, there is no medical test that detects autism. Symptoms of ASD are usually
recognized by observation. For older children and adolescents in school, ASD symptoms are usually
first identified by parents and teachers and then evaluated by the school's special education
team, which may suggest that these children see a doctor for the necessary tests. Because some
symptoms of ASD can overlap with other psychiatric disorders, it is much more difficult to recognize
ASD symptoms in adults than in older children and adolescents. Observing
changes in a child's behavior is easier, because such changes can be seen as early as 6 months of age,
earlier than autism-specific findings in brain images.
Department of CSE, FET, Bangalore.

Chapter 2
LITERATURE SURVEY
Vaishali R and Sasikala R [3] proposed a method to identify autism with an optimal set of
behaviors. In this work, an ASD diagnostic dataset with 21 features from the UCI machine learning
repository was tested with a swarm intelligence-based binary firefly feature selection wrapper.
The alternative hypothesis of the experiment is that a machine learning model can achieve better
classification accuracy with a minimal subset of features. Using a swarm intelligence-based
single-objective binary firefly feature selection wrapper, the authors found that 10 of the 21 features in
the ASD dataset were sufficient to distinguish between ASD and non-ASD patients. With the best
subset of features, this approach yields average accuracies ranging from 92.12% to 97.95%,
which is approximately equal to the average accuracy obtained using all features of the ASD
diagnostic dataset.
M. S. Mythili and A. R. Mohamed Shanavas [4] conducted a study on ASD using classification
techniques. The main purpose of this paper was to identify autism and its levels.
Neural network, SVM and fuzzy techniques were used with the WEKA tool to analyze student
behavior and social interaction.
Fadi Thabtah [5] proposed an ASD screening model using machine learning adaptation and
DSM-5 fulfillment. Screening tools are used to meet the goals of ASD screening. In this paper, the researcher
described ASD machine learning classification and its strengths and weaknesses. The researcher
sought to highlight issues with existing ASD screening tools that rely on the DSM-IV
instead of the DSM-5 manual, and with the consistency of such tools.
Baihua Li, Arjun Sharma, James Meng, Senthil Purushwalkam and Emma Gowen [6] used machine learning
classifiers to identify autistic adults using imitation. The purpose of this study was to investigate
fundamental issues related to the test conditions and the identified motor parameters. The dataset contains
16 ASC participants performing a series of hand gestures. Using machine learning techniques, 40 kinematic
parameters were extracted from 8 imitation conditions. This study demonstrates that, for small samples,
machine learning techniques can be applied to analyze high-dimensional data and to support the diagnostic
classification of autism.
J. A. Kosmicki, V. Sochat, M. Duda and D. P. Wall [7] searched for a minimal
set of features to detect autism. The authors applied a machine learning
approach to clinical ASD assessment scores from ADOS, administered to a subset of children on the autism
spectrum. Eight different machine learning algorithms were used in this work, with
stepwise backward feature selection applied to 4540 score sheets. Using 9 of the 28
behaviors in Module 2 and 12 of the 28 behaviors in Module 3, the approach identified ASD risk with an
overall accuracy of 98.27% and 97.66%, respectively.

Chapter 3
OBJECTIVE AND METHODOLOGY
3.1 Objective
● The major objective of this application is to analyze the presence of Autism
Spectrum Disorder in toddlers, teenagers and adults.
● To predict the presence of the disorder, which cannot easily be identified by ordinary observation.
● To identify the model that provides the highest accuracy based on the
results.
3.2 Methodology
The dataset [1] we used was compiled by Dr. Fadi Thabtah [5] and contains categorical,
continuous and binary attributes. Originally, the dataset had 1054 instances along with 18
attributes (including a class variable). As the dataset contained several non-contributive and
categorical attributes, we had to pre-process the data. Preprocessing refers to the
transformations applied to a data set before it is fed into a model. It is done to clean raw or
noisy data and adapt it for training and analysis. We removed non-contributing attributes,
namely 'Case_No', 'Who completed the test' and 'Qchat-10-Score'.
To deal with categorical values, we use label encoding. Label encoding converts labels into
numeric form to make them machine readable, and repeated labels are assigned the same value
each time. Four attributes with 2 classes (Sex, Jaundice, Family_mem_with_ASD and
Class/ASD_Traits) were selected for binary label encoding. Label encoding is
unsuitable for features with more than 2 classes, because the integer codes imply an ordering.
One-hot encoding is therefore used for multi-class features to
prevent the model from inferring a hierarchical ordering. The feature 'Ethnicity', which has 11 classes, was
one-hot encoded.
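The two encodings described above can be illustrated with a minimal sketch in plain Python. The sample rows and category values are made up for illustration; only the column names 'Sex' and 'Ethnicity' come from the dataset description, and the helper names `label_encode` and `one_hot_encode` are hypothetical, not part of the project code.

```python
# Minimal sketch of label encoding and one-hot encoding in plain Python.
# The sample values below are invented for illustration only.

def label_encode(values):
    """Map each distinct label to an integer; repeated labels get the same code."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn one multi-class column into one 0/1 column per class."""
    classes = sorted(set(values))
    rows = [[1 if v == c else 0 for c in classes] for v in values]
    return rows, classes


sex = ["m", "f", "f", "m"]                        # two classes: label encoding
ethnicity = ["asian", "black", "asian", "white"]  # multi-class: one-hot encoding

sex_codes, sex_mapping = label_encode(sex)
eth_rows, eth_classes = one_hot_encode(ethnicity)

print(sex_codes)     # one integer code per row
print(eth_classes)   # column order of the one-hot matrix
print(eth_rows)      # exactly one 1 per row
```

With one-hot encoding no class receives a larger number than another, so the model cannot read an order into the categories.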

Fig. 3.1 Methodology of the system
1. DATA PRE-PROCESSING: Data preprocessing is the technique of
transforming raw data into a meaningful and understandable form. Real-world data is
often incomplete and inconsistent, and contains many errors and null values. Properly
preprocessed data generally leads to better results. Various data preprocessing methods are
used to handle incomplete and inconsistent data, such as missing value handling,
outlier detection, data discretization, and data reduction (dimensionality and numerosity reduction).
The problem of missing values in these datasets was addressed by imputation.
2. TRAINING MODEL: The entire dataset is split into two parts, a training set
and a test set, in a ratio of 80:20. For
cross-validation, the training data is again split into two parts, a training
set and a validation set, also in a ratio of 80:20. The final training,
validation and test sets are the sets on which classification was performed.
a. LOGISTIC REGRESSION: LR is a regression tool used to analyze binary
dependent variables. Its output value is either 0 or 1. It is used with
continuous-valued data. It models the relationship between the dependent
binary variable and a nominal or ordinal variable. It can be represented by the
sigmoid function.
b. CONVOLUTIONAL NEURAL NETWORK: CNN is a well-known deep
learning technique used to build models for various problems [10], [11], [12].
It is a feedforward neural network inspired by the human brain. A CNN
model contains an input layer, an output layer, and several hidden layers,
i.e., convolutional layers, max pooling layers, fully connected layers, and regularization
layers. Their activations can be computed using matrix multiplication
followed by a bias offset.
c. SUPPORT VECTOR MACHINE: SVM is a supervised machine
learning approach, in its basic form a linear classifier, used for classification and regression.
It is widely applied to pattern recognition problems and is relatively resistant to overfitting.
SVM separates classes by defining a decision
boundary [8].
d. NAIVE BAYES: The Naive Bayes classifier is a supervised learning algorithm.
It is a generative model based on joint probability distributions. The
concept of Naive Bayes rests on the assumption that features are independent given the class. Its
training time is shorter than that of SVM and maximum entropy (ME) models. It calculates the posterior
probability of a record from the prior probability and the likelihood [9].
e. RANDOM FOREST CLASSIFIER: The Random Forest classifier is a flexible
algorithm that can be used for classification, regression, and other tasks
[13]. It works by building multiple decision trees on the given data. After
a prediction is obtained from each tree, the final result is selected by voting.
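Three of the mechanics described in the steps above (the two-stage 80:20 split, the sigmoid used by logistic regression, and the voting step of a random forest) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the shuffling seeds and the use of integer row indices as a stand-in for the 1054 instances are all hypothetical.

```python
import math
import random
from collections import Counter


def split_80_20(items, seed=0):
    """Shuffle a copy of items and return (80% part, 20% part)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_small = len(shuffled) * 20 // 100   # size of the 20% part, rounded down
    return shuffled[n_small:], shuffled[:n_small]


def sigmoid(z):
    """Logistic function: maps a real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rows = list(range(1054))                             # stand-in for the 1054 instances
train_full, test = split_80_20(rows)                 # 80:20 train/test split
train, validation = split_80_20(train_full, seed=1)  # 80:20 split of the training part

print(len(train), len(validation), len(test))
print(sigmoid(0.0))                                  # a score of 0 maps to probability 0.5
print(majority_vote(["ASD", "no ASD", "ASD"]))
```

Splitting the training part a second time leaves roughly 64% of the data for training, 16% for validation and 20% for testing.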

Chapter 4
SYSTEM DESIGN
System design involves system architecture and working of the modules. The
functioning of the system is explained using UML diagrams.
4.1 System Architecture
Figure 4.1 shows the general functions and processes of the system. We start by
preprocessing the dataset to handle missing values and outliers, remove noise, and
encode categorical attributes. Feature engineering is also used to select the most useful
of the features present in the dataset. This reduces the dimensionality of the data,
increasing speed and efficiency during training. Once the dataset is preprocessed, the
output labels (ASD or no ASD) are predicted using classification algorithms such as
Logistic Regression, Naive Bayes, Support Vector Machines, K-Nearest Neighbors, and
Random Forest Classifier. The accuracy of each classifier is observed and compared. In
addition, metrics such as F1 score, precision and recall are calculated to better evaluate
each classifier. A classifier that works well achieves a testing accuracy close to its
training accuracy. The classifier with the best performance is considered the best model and used for further training and
classification. A brief description of this approach is given in the methodology section.
Fig. 4.1 Autism spectrum disorder prediction

Chapter 5
HARDWARE AND SOFTWARE REQUIREMENTS
The following are basic hardware and software required to train and test the program.
5.1 Hardware Requirements
1. Processor : Intel Dual-Core processor.
2. RAM : 2-4 GB.
3. HDD : 10 GB.
5.2 Software Requirements
1. Operating System - Windows 10, 8, 7 or XP.
2. Documentation - MS Word, MS PowerPoint, MS Excel.
3. Language - Python.

Chapter 6
RESULTS AND DISCUSSION
6.1 Results and Discussion
Performance evaluation metrics: Measuring performance is key to verifying how well a
classification model achieves its goals. Performance metrics are used to assess
the effectiveness of a classification model on the test dataset. To evaluate
model performance, it is important to choose appropriate metrics, such as the confusion matrix,
accuracy, specificity, and sensitivity. These metrics are computed from the counts of true
positives, true negatives, false positives and false negatives in the confusion matrix.
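The formulas themselves are not reproduced in this copy of the report. As a hedged sketch, the standard definitions of these metrics can be computed from confusion-matrix counts as shown below. The label vectors are invented for illustration and are not results from this project; the function names are hypothetical, and the code assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Standard metric definitions; assumes no denominator is zero."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)            # also called recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Invented labels: 1 = ASD, 0 = no ASD.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
print(counts)
print(scores)
```

Sensitivity measures how many true ASD cases are caught, while specificity measures how many non-ASD cases are correctly cleared, which is why both are reported alongside accuracy.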
Experimental results of the various machine learning algorithms, using all features, are
presented for the adult, child, and adolescent ASD screening datasets. All 21 features are
selected to find the specificity, sensitivity, and accuracy of the predictive model. We used
Gaussian NB to implement the Naive Bayes algorithm. The RBF kernel was used for SVM with a
gamma value of 0.1. We used K=5 for K-NN. The ANN used the Adam optimizer with a
learning rate of 0.01 for 100 epochs. The CNN used the ReLU activation function, the Adam
optimizer, the binary cross-entropy loss function, 16 and 32 filters, and a dropout of 0.5 for 150 epochs.
The overall performance measurements for all machine learning classifiers on all three datasets
are detailed below.
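To make two of the settings above concrete, the following sketch evaluates the RBF kernel with gamma = 0.1 and a Gaussian Naive Bayes posterior in plain Python. It is an illustration of the underlying formulas only, not the library code used in the experiments; the class priors, means, variances and feature values are hypothetical numbers chosen for the example.

```python
import math


def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gaussian_nb_posterior(x, priors, means, variances):
    """Posterior per class: prior times the product of per-feature Gaussian likelihoods."""
    scores = {}
    for label, prior in priors.items():
        likelihood = 1.0
        for xi, m, v in zip(x, means[label], variances[label]):
            likelihood *= gaussian_pdf(xi, m, v)
        scores[label] = prior * likelihood
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Hypothetical one-feature model (for example a screening score).
priors = {"ASD": 0.5, "no ASD": 0.5}
means = {"ASD": [8.0], "no ASD": [3.0]}
variances = {"ASD": [4.0], "no ASD": [4.0]}

print(rbf_kernel([1.0, 0.0], [0.0, 0.0]))   # exp(-0.1)
posterior = gaussian_nb_posterior([7.0], priors, means, variances)
print(posterior)
```

A larger gamma makes the kernel fall off faster with distance, so each training point influences a smaller neighborhood.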
(Overall Results for Autistic Spectrum Disorder Screening Data for Adult)

Evaluation of the different machine learning models on the adult ASD diagnostic dataset revealed
accuracies ranging from 95.75% to 99.53% on the original dataset. The K-NN classifier with
K=5 achieved the lowest accuracy of 95.75%, while the CNN achieved the highest
prediction accuracy of 99.53%. The learning curve of each machine learning algorithm also
reflects the performance of the predictive model.
(Overall Results for Autistic Spectrum Disorder Screening Data for children)

Chapter 7
CONCLUSION AND FUTURE WORK
7.1 Conclusion
This study attempted to detect autism spectrum disorder using various machine learning and
deep learning techniques. We analyzed the performance of the implemented models using various
performance metrics to detect ASD in nonclinical datasets from three age groups: children,
adolescents, and adults. Compared with the results of another recent study [3] on this issue,
the CNN classifier outperformed the SVM when all feature attributes were included after missing value
treatment. In this work, after handling missing values, both the SVM-based and CNN-based
models show the same predictive accuracy of about 98.30% for the ASD child dataset.
However, for the remaining two datasets, the CNN-based model achieved higher
accuracy than the other modeling techniques considered. These results strongly suggest that the
CNN-based model can be used to detect autism spectrum disorder in place of the
conventional machine learning classifiers proposed in previous studies.

REFERENCES
1. Dataset: https://www.kaggle.com/fabdelja/autism-screening-for-toddlers.
Accessed 1 Oct 2019.
2. Thabtah, F. An accessible and efficient autism screening method for
behavioral data and predictive analyses. Health Informatics Journal
2019;25(4):1739–55. https://doi.org/10.1177/1460458218796636.
3. Vaishali, R., and R. Sasikala. (2018) “A machine learning based approach to
classify Autism with optimum behavior sets.” International
Journal of Engineering & Technology 7(4).
4. M. S. Mythili and A. R. Mohamed Shanavas. (2014) “A study on Autism
spectrum disorders using classification techniques.” International
Journal of Soft Computing and Engineering (IJSCE).
5. Fadi Thabtah. (2017). “Autism spectrum disorder screening: machine
learning adaptation and DSM-5 fulfillment.” In Proceedings of the 1st
International Conference on Medical and Health Informatics.
6. Baihua Li, Arjun Sharma, James Meng, Senthil Purushwalkam, and
Emma Gowen. (2017) “Applying machine learning to identify autistic
adults using imitation: An exploratory study.”
7. J. A. Kosmicki, V. Sochat, M. Duda, and D. P. Wall. (2015) “Searching
for a minimal set of behaviors for autism detection through feature
selection-based machine learning.”
8. Keerthi, S. Sathiya, Shirish Krishnaj Shevade, Chiranjib
Bhattacharyya, and Karuturi Radha Krishna Murthy. (2001)
“Improvements to Platt's SMO algorithm for SVM classifier design.”
Neural Computation, 13(3):637-649.
9. John, George H., and Pat Langley. (1995). “Estimating continuous
distributions in Bayesian classifiers.” In Proceedings of the Eleventh
conference on Uncertainty in artificial intelligence (pp. 338-345).
Morgan Kaufmann Publishers Inc.
10. Sarfaraz Masood, Abhinav Rai, Aakash Aggarwal, Mohammad
Najmud Doja, and Musheer Ahmad. (2018) “Detecting distraction of
drivers using convolutional neural network.” Pattern Recognition
Letters.
11. Sarfaraz Masood, Adhyan Srivastava, Harish Chandra Thuwal, and
Musheer Ahmad. (2018). “Real-time sign language gesture (word)
recognition from video sequences using CNN and RNN.” In Intelligent
Engineering Informatics (pp. 623-632). Springer, Singapore.
12. Sarfaraz Masood, Harish Chandra Thuwal, and Adhyan Srivastava.
(2018). “American Sign Language character recognition using
convolution neural network.” In Smart Computing and Informatics
(pp. 403-412). Springer, Singapore.
13. Random Forests(r), Explained.
https://www.kdnuggets.com/2017/10/random-forests-explained.html.
Accessed 8 Oct 2019.
