Machine Learning Model Autoselection for Cloud Analytics
1. Venkat Java Projects
Mobile:+91 9966499110
Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com
Dynamic Autoselection and Autotuning of Machine Learning Models
for Cloud Network Analytics
Cloud networks are monitored through signals so that resources can be
allocated properly, and the allocation decision is made with machine learning
algorithms such as KNN, Naïve Bayes, Random Forest, Decision Tree,
Boosting, Stochastic Gradient, Gradient Boosting, and Multilayer Perceptron.
The accuracy of these algorithms can change as signals become available or
unavailable: a model trained on one set of signal data may predict with high
accuracy and then lose accuracy when the data changes. If a low-accuracy
model is chosen to predict cloud resource allocation, its predictions may be
wrong.
To overcome this issue, the author introduces an Autoselection and
Autotuning concept for choosing models. The application trains a model with
each of several algorithms, and the resulting accuracies are voted on to find
the most accurate model. Whichever algorithm gives the highest accuracy is
chosen automatically, and the application tunes itself to use that model for
further prediction of cloud resources.
By voting on accuracy, the application can tune itself to the best-working
model.
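The accuracy vote described above can be sketched as follows. This is a
minimal illustration and not the project's actual code: the two toy learners
(train_majority, train_1nn), the autoselect helper, and the sample data are
all hypothetical stand-ins for the real algorithms and the real stream data.

```python
from collections import Counter

def accuracy(model, X, y):
    """Fraction of records whose predicted label matches the true label."""
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)

def train_majority(X, y):
    """Toy learner (assumption): always predicts the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def train_1nn(X, y):
    """Toy learner (assumption): predicts the label of the nearest training record."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict

def autoselect(trainers, X_train, y_train, X_test, y_test):
    """Train every candidate, score it on held-out data, and return the best one."""
    models, scores = {}, {}
    for name, train in trainers.items():
        models[name] = train(X_train, y_train)
        scores[name] = accuracy(models[name], X_test, y_test)
    best = max(scores, key=scores.get)  # highest accuracy wins the vote
    return best, models[best], scores

# Made-up records: two numeric features, label 0 = normal, 1 = attack.
X_train = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.1, 0.2]]
y_train = [0, 0, 1, 1, 0]
X_test = [[0.05, 0.05], [0.95, 0.9]]
y_test = [0, 1]

best_name, best_model, scores = autoselect(
    {"majority": train_majority, "1nn": train_1nn},
    X_train, y_train, X_test, y_test)
print(best_name, scores)
```

The selected model (best_model here) is the one the application would keep
using for later predictions until a new round of scoring picks a different
winner.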
To get the best accuracy from each algorithm, a clustering technique is
applied to the dataset to group close or similar records into clusters, and
only the cluster that gave the best accuracy is chosen for training (for all
algorithms).
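A rough sketch of this cluster-then-train idea is shown below. It assumes a
plain k-means with 2 clusters, a toy nearest-class-mean classifier, and a
shared held-out test set for scoring; the write-up does not say which
clustering method or evaluation split the application uses, so all of these
are assumptions, and the data is made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(X, k=2, iters=10):
    """Plain k-means; centroids start at the first k records (an assumption)."""
    centroids = [list(x) for x in X[:k]]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def train_nearest_mean(X, y):
    """Toy classifier (assumption): predict the class whose mean is closest."""
    means = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(means, key=lambda lab: sq_dist(x, means[lab]))

def accuracy(model, X, y):
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)

# Made-up one-feature data; the two far-away records distort the full model.
X = [[0.0], [100.0], [1.0], [2.0], [3.0], [101.0]]
y = [0, 0, 0, 1, 1, 0]
X_test = [[0.5], [1.0], [2.5], [3.0]]
y_test = [0, 0, 1, 1]

assign = kmeans(X, k=2)
subsets = {"full": (X, y)}
for c in sorted(set(assign)):
    rows = [i for i, a in enumerate(assign) if a == c]
    subsets[f"cluster{c + 1}"] = ([X[i] for i in rows], [y[i] for i in rows])

# Train on the full data and on each cluster, then keep the best training set.
scores = {name: accuracy(train_nearest_mean(Xs, ys), X_test, y_test)
          for name, (Xs, ys) in subsets.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy data the model trained on one cluster beats the model trained on
everything, because the distant records no longer pull the class means apart.
Real data may behave differently.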
This application consists of the following phases:
1) Matching a single optimized model to a given context in a dynamic
environment. The aim is a single trained model with the best accuracy in a
dynamic, distributed (network) environment where data changes frequently.
2) Creating and building multiple models and selecting the best for a given
context. Multiple models are built with multiple algorithms so that the best
one can be chosen.
3) Closed loop, auto-selection mechanism in the cloud DevOps environment.
Here DevOps refers to the selection and tuning of the model being done
automatically.
4) Using unsupervised clustering to segment the dataset ahead of supervised
classification. A clustering technique groups similar data into clusters, and
the best-performing cluster is chosen for training the model.
5) End-to-end comparison with Ensemble Machine Learning. The approach can be
compared with other algorithms to choose the best one.
6) Deep learning implementation and tuning of some of its hyperparameters.
These algorithms can also be compared with Deep Learning, but Deep Learning
takes much longer to generate a model and may not be suitable for cloud
services where a quick response is required.
To implement the above concept, this paper proposes 3 algorithms:
Model Auto Selection Algorithm: This algorithm chooses the model from
whichever algorithm correctly predicts the class labels of the given test data
or provides better accuracy.
Model Selection with Unsupervised Learning + Supervised Learning Algorithm:
This algorithm clusters the data and generates the training model from only
the cluster that gives better accuracy.
Parameter Tuning with Autoselection Algorithm: The accuracy of each algorithm
is compared with that of the others, and the algorithm with the best accuracy
is auto-selected.
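The parameter tuning step could look roughly like the sketch below, which
tries several values of k for a toy KNN classifier and keeps the value with
the best validation accuracy. The write-up does not state which
hyperparameters are tuned, so the choice of k, the helper names (knn_predict,
tune_k), and the data are assumptions made for illustration only.

```python
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Majority label among the k nearest training records (squared Euclidean)."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X_train[i])))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def tune_k(X_train, y_train, X_val, y_val, candidates):
    """Score every candidate k on validation data and return the best one."""
    results = {}
    for k in candidates:
        hits = sum(knn_predict(X_train, y_train, x, k) == label
                   for x, label in zip(X_val, y_val))
        results[k] = hits / len(y_val)
    best_k = max(results, key=results.get)  # first k with the top accuracy
    return best_k, results

# Made-up one-feature data with two noisy labels (at 1.5 and 8.5).
X_train = [[0.0], [1.0], [1.5], [2.0], [8.0], [8.5], [9.0], [10.0]]
y_train = [0, 0, 1, 0, 1, 0, 1, 1]
X_val = [[1.4], [8.6], [0.2], [9.8]]
y_val = [0, 1, 0, 1]

best_k, results = tune_k(X_train, y_train, X_val, y_val, candidates=[1, 3, 5])
print(best_k, results)
```

With k = 1 the noisy neighbours decide two of the four validation records
wrongly, while k = 3 outvotes them, so the tuned setting is the one carried
into the accuracy comparison with the other algorithms.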
Dataset Information
To implement this paper, the author used the ‘UNSW-NB15’ dataset, which
contains attack and non-attack signatures of HTTP requests. The dataset is
passed to the application as streams (the streams stand in for signal data
arriving from a distributed network environment). The application monitors
each stream, generates a training model, and predicts whether a new request
contains an attack signature or a non-attack signature. The dataset is
available in the dataset folder. That folder contains a ‘DatasetURL.txt’ file
with the URL from which the dataset was downloaded, and a
‘Dataset_Information.txt’ file with information about the dataset, such as
descriptions of its columns.
Implementation Details and Screen Shots
To implement this project I designed two applications called ‘StreamSender’
and ‘StreamReceiver’. StreamSender sends streams of 500 records each to
StreamReceiver, which implements 5 algorithms: Random Forest, Decision Tree,
KNN, Stochastic Gradient, and Naïve Bayes. The receiver first clusters the
data; each algorithm then generates a model on the entire stream and one more
model on each of cluster1 and cluster2. Whichever gives the best accuracy is
autoselected and tuned for predicting future request data.
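The way the sender cuts the dataset into streams of 500 records can be
sketched as below. This shows only the batching step; how StreamSender
actually transports the records to StreamReceiver is not described here, and
the batches helper and the stand-in record list are hypothetical.

```python
def batches(records, size=500):
    """Yield consecutive slices of `records`, each holding at most `size` rows."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

# Stand-in for dataset rows; the real row count is not assumed here.
records = list(range(1200))
streams = list(batches(records))
print([len(s) for s in streams])
```

The last stream is simply shorter when the number of records is not a
multiple of the stream size.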
First, double-click the ‘run.bat’ file in the StreamReceiver folder to get the
screen below, and let it run.
The receiver shows two screens: the first displays stream details and the
second displays the accuracy of every algorithm generated from each stream.
Now double-click the ‘run.bat’ file in the StreamSender folder to get the
screen below.
In the above screen, click the ‘Upload UNSW-NB15 Cloud Monitoring Dataset’
button to upload the dataset; see the screen below.
In the above screen I am uploading the ‘UNSW-NB15’ dataset, which contains
normal and attack signatures. After uploading the dataset, we get the screen
below.
In the above screen we can see that the dataset is uploaded. Now click the
‘Start Streaming’ button to send the dataset, stream by stream, to the
StreamReceiver application.
In the above screen, each stream's data is displayed in the table as it is
sent. The column names in the table show what values the dataset contains.
The second-to-last column holds attack names and the last column holds 0 and
1 values, where 0 means a normal signature and 1 means an attack signature.
See the last columns of the above table in the screen below.
In the above screen, whenever the last column holds 1, the name of the attack
is displayed in the column beside it. After sending the streams, wait until
you get the dialog box shown below; it indicates that all streams have been
processed on the receiver side.
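The column layout described above (features first, then the attack name, then
the 0/1 label) can be read as in the following sketch. The split_record helper
and the sample rows are made up for illustration, and it is an assumption that
the attack-name field is empty for normal records.

```python
def split_record(row):
    """Split one row into (features, attack_name, label) using the layout above."""
    *features, attack_name, label = row
    return features, attack_name, int(label)

# Made-up rows: the last value is the label, the one before it the attack name.
attack_row = ["0.12", "tcp", "http", "Exploits", "1"]
normal_row = ["0.03", "udp", "dns", "", "0"]

for row in (attack_row, normal_row):
    features, attack_name, label = split_record(row)
    kind = "attack" if label == 1 else "normal"
    print(kind, attack_name, features)
```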
Now we can see the processing details of each stream on the StreamReceiver
application screen.
In the above screen, on the receiver side, we can see that 3 streams were
received; for each stream, the 5 algorithms generate training models and show
their accuracy. See the screen below.
In the above screen we can see the accuracy of all 5 algorithms for each
stream, and the selected line shows which algorithm was auto-selected with the
best accuracy. Scroll down the screen to see the accuracy of all 5 algorithms
for all 3 streams. In our implementation the Stochastic Gradient algorithm
gives better accuracy, and the same algorithm also performs better in the
paper. See the screen below, scrolled down.
In the above screen we can see the accuracy for all 3 streams. Now click the
‘Accuracy Graph’ button to see the accuracy as a graph.
In the above graph, the x-axis represents the full features and the cluster1
and cluster2 features, and the y-axis represents the accuracy for each. Each
algorithm is drawn as a line in a different colour, and we can see the
Stochastic Gradient algorithm performing well compared to the other
algorithms. Now click ‘Features Graph’ to see which cluster's features give
better accuracy.
In the above graph we can see that 2 clusters were generated and that the
cluster2 features give better accuracy than cluster1, so cluster2 can be
chosen as the auto-selected model. The x-axis represents the cluster1 and
cluster2 features and the y-axis represents accuracy.