International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 09 | Sep 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 814
Comparatively Analysis on K-Means++ and Mini Batch K-Means
Clustering Algorithm in Cloud Computing with Map Reduce
Virendra Tiwari¹, Dr. Akhilesh A. Waoo²
¹Assistant Professor, Dept. of Computer Science & Application, AKS University, Satna, Madhya Pradesh, India
²Associate Professor, Dept. of Computer Science & Application, AKS University, Satna, Madhya Pradesh, India
-----------------------------------------------------------------------***--------------------------------------------------------------------
Abstract - Cloud computing provides the functionality to host
servers in a dispersed network so that a wide range of customers
can access applications as utilities over the internet. One of
the most important drivers of the development of cloud
computing is the rapidly growing volume of Big Data, which
needs to be managed and examined efficiently. Experts and
researchers design and use multiple algorithms with
intelligent approaches to extract useful knowledge from huge
data sets. Hadoop MapReduce is an emerging software platform
for easily writing applications that process huge amounts of
data in parallel on large clusters of commodity hardware in a
reliable and fault-tolerant manner. Only specific unsupervised
learning algorithms for big data problems run successfully
under MapReduce and have been deployed on high-volume data
sets. The performance of MapReduce frameworks degrades under
sequential processing approaches. To overcome this limitation
and reduce costs, cloud computing can be combined with parallel
processing in MapReduce, a powerful and effective approach for
future technology enhancements. This survey gives a brief
overview of cloud computing and a comparative analysis of two
popular clustering algorithms, K-means++ and Mini Batch
K-means.
Key Words: Cloud computing, Big data, Hadoop,
MapReduce, K-Means++, Mini Batch K-means.
1. INTRODUCTION
Cloud computing, in simple terms, is the on-demand availability of
hardware resources for computing or storage over the Internet
instead of on our computer's hard drive. With cloud computing,
clients can access files and applications from any remote device
through the internet. It reduces the work and cost for the client
of setting up and managing the hardware and the required
infrastructure [11][12]. The cloud provider takes ownership of
setting up and maintaining the hardware infrastructure. The
best-known cloud service providers are Amazon, Alibaba, Google and
Microsoft [13].
Most famous cloud service providers [11][14]:
a) Google: Google Cloud Platform addresses problems with
accessible AI and data analytics, and uses resources
such as hard disks, computers and virtual machines
located at Google data centers. Google is a dedicated
cloud computing service provider, with all storage
available online so that it can work with cloud apps
such as Google Sheets, Google Docs and Google
Slides. Most of Google's services could be
considered cloud computing: Gmail, Google Maps,
Google Calendar, Picasa, Google Analytics and so on.
b) Alibaba: Alibaba is one of the largest cloud computing
platforms, with a global footprint of over 1500 CDN
nodes worldwide, 19 regions and 56 availability
zones, serving approximately 200 countries. Its
prominent features help to protect and back up your
data and to achieve faster results.
c) Apple iCloud: Apple's cloud service is mainly used
for online backup, storage and synchronization of
your mail, calendar, contacts and more. All the data
we need remains available on a macOS, iOS or
Windows device.
d) Amazon Cloud Drive: A web hosting platform that
offers reliable and cost-effective storage solutions.
If you have Amazon Prime, you get unlimited image
storage. It is essentially storage for anything digital.
Amazon's cloud offering is among the most popular, as
Amazon was an early entrant to the cloud computing space.
The world has created 90% of its data in the last two years. This
data comes from everywhere, and the volume will surely only grow
with the arrival of the Internet of Things. By 2020, it is
estimated that approximately 1.7 MB of data will be created every
second for every person on the planet [13]. Processing such a
huge amount of data will be extremely challenging for data
analysts and researchers. Classical sequential programming
methods are not very efficient, and hence parallel processing
has gained importance. The cloud provides the most suitable
environment, with Hadoop MapReduce as the processing component:
a programming paradigm that enables massive scalability across
hundreds or thousands of servers in a Hadoop cluster [16][17].
We are virtually floating in oceans of data; with the advent of
Big Data, MapReduce emerged as one of the most effective solutions.
A number of researchers are working on extending MapReduce with
new features, functionality and mechanisms to adapt it to a new
set of problems [18]. Various sequential algorithms are nowadays
being converted to the Hadoop MapReduce programming paradigm.
A few algorithms may not be parallelizable, so they are used less
in the real world, since parallelization is of the essence for
processing Big Data. Hence some of the conventional,
useful algorithms for clustering may provide a great
platform for, and enhancements to, new emerging technology,
and may also be very beneficial to business enterprises
[13].
2. BACKGROUND
a) Cloud computing
Cloud computing is the delivery of different services, such as
managing hardware and software, through the Internet. It
has evolved since the inception of the internet owing to its
efficiency in storing data and performing computation, and its
lower maintenance cost. Cloud-based storage makes it possible to
save files and documents to a remote database and retrieve them
on demand. Services can be public or private. Public services
are provided online for a fee, while private services are
hosted on a network for particular clients [19].
Figure. 1: Cloud computing services type
Figure. 2: Cloud computing deployment models
As shown in Fig. 1, cloud provider services can be
categorized into three delivery types: Software as a
Service (SaaS), Infrastructure as a Service (IaaS) and
Platform as a Service (PaaS). Cloud deployment models
come in private, hybrid, public and community forms,
based on their location, as shown in Figure 2. Public
clouds, based on a shared cost model, have many customers
compared to other forms of cloud. Private clouds suit
organizations that have high management demands,
high security requirements and high availability requirements,
but they are usually not cost efficient. A community cloud is a
model shared mutually between organizations [21].
b) Hadoop
Hadoop is an open-source software framework considered
to be one of the best tools for handling and analyzing very large
volumes of data. It is currently used by Google, Facebook,
LinkedIn, Yahoo, Twitter and others. It has two major components:
the Hadoop Distributed File System (HDFS) and MapReduce.
Hadoop has become a widely used distributed computing
platform, especially over the past 10 years, as big data has
evolved, with social media generating petabytes of data every
day that can be used for data mining, predictive analytics and
machine learning applications. Hadoop can run on a single node
or on a multi-node cluster, and is designed specifically for
storing and analyzing huge amounts of unstructured data in a
distributed computing environment.
When HDFS takes data as input, it breaks the information
down into separate blocks and distributes the separated
blocks to different nodes in a cluster, which enables
highly efficient parallel processing. HDFS can be thought of
as DataNodes (which store the actual data in HDFS), a NameNode
(which holds the metadata for HDFS) and a Secondary NameNode
(which merges the edit logs with the fsimage from the NameNode).
The daemon process that manages the MapReduce programming
paradigm on HDFS is the JobTracker [20][13].
Figure. 3: Data flow in HDFS
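The block-splitting idea described above can be illustrated with a toy sketch. It is not HDFS code: the function names `split_into_blocks` and `place_blocks`, the tiny block size and the round-robin placement are all assumptions made for illustration, and a real HDFS cluster also replicates blocks and applies its own placement policy, none of which is modelled here.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the input into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign block indices to nodes round-robin (a hypothetical policy)."""
    placement = {node: [] for node in nodes}
    for index, _block in enumerate(blocks):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(blocks, ["node1", "node2"])
```

Because each node holds different blocks, the work on those blocks can proceed on all nodes at the same time, which is the property the MapReduce layer relies on.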
c) MapReduce paradigm
MapReduce is a processing technique and a programming
model for distributed computing over huge amounts of data.
A MapReduce program executes in three stages: map, shuffle
and reduce. The map phase processes the input and turns
small chunks of the data into key-value pairs [3][22].
Counting the initial splitting of the input, the complete
process runs in four phases: splitting, mapping, shuffling
and reducing. The reduce phase combines the data based on
common keys and performs the reduce operation defined by the
user. Parallelism comes from the many mappers created to read
the data, which do not run sequentially, and this gives high
throughput [5][21]. The core advantage
of the MapReduce framework is its fault tolerance: each node in
the cluster is expected to report periodically while work is
being processed, so a node that stops reporting can be detected.
Figure. 4: Map reduce in HDFS
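The four phases named above (splitting, mapping, shuffling and reducing) can be sketched in a few lines of plain Python. This is a single-process illustration using the classic word-count task, not Hadoop code; all function names are assumptions introduced here, and on a real cluster each chunk would be handled by a separate mapper.

```python
from collections import defaultdict


def split_input(lines, num_splits):
    """Splitting: divide the input records into num_splits chunks."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(chunk):
    """Mapping: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_phase(mapped_chunks):
    """Shuffling: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducing: combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, num_splits=2):
    chunks = split_input(lines, num_splits)
    # Each chunk is mapped independently of the others.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["big data big clusters", "data in the cloud"])
```

The map step never looks at more than one chunk, which is why the mappers can run in parallel; only the shuffle brings values with the same key together.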
3. K-Means++ CLUSTERING ALGORITHM
The K-Means++ algorithm is an improved and more robust version of
the popular and widely used K-means clustering algorithm.
The K-Means algorithm chooses its initial centroids randomly
and then forms the clusters for a given dataset [23].
The performance and accuracy of k-means depend heavily on the
quality of the initialization, and the algorithm has been
observed to sometimes give less accurate clusters. To overcome
this limitation, we can use the improved clustering method
K-Means++, which improves the initialization part of the
K-Means algorithm [13][24].
The K-Means++ technique takes as input k, the number of
clusters to be generated, and a set of n objects [4]. The
improved algorithm mainly aims to initialize the centroids for
the traditional k-means clustering algorithm [25]. By starting
from better centroids, K-Means++ can reduce the computation
time, and it can handle huge data sets.
K-Means++ is a partitioning-based clustering algorithm that
works as follows [25].
[1] Choose the initial centroid uniformly at random from
the data set.
[2] For each data point X, compute D(X), the minimum
squared distance between X and the nearest centroid
that has already been chosen.
[3] Choose another data point as the next centroid using
a weighted probability distribution in which a point X
is chosen with probability proportional to D(X).
[4] Repeat steps 2 and 3 until the required K centroids
have been chosen.
[5] Assign each record or instance point to the cluster
centre at the least distance.
[6] Replace each cluster centroid with the mean value
of all points in its cluster.
Steps 5 and 6 are the usual k-means iteration and are repeated
until the centroids no longer change.
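The steps above can be sketched as follows. This is a minimal single-machine illustration in plain Python, not the authors' implementation: the function names, the representation of points as tuples, the `max_iter` limit and the stopping rule (stop when the centroids no longer change) are assumptions made here.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng=None):
    """Steps 1 to 4: choose k initial centroids."""
    rng = rng or random.Random()
    centroids = [rng.choice(points)]                      # step 1
    while len(centroids) < k:                             # step 4
        # Step 2: D(X), squared distance to the nearest chosen centroid.
        d = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d) == 0:
            # Every point coincides with a centroid; fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        # Step 3: sample the next centroid with probability proportional to D(X).
        centroids.append(rng.choices(points, weights=d, k=1)[0])
    return centroids


def assign(points, centroids):
    """Step 5: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
        clusters[j].append(p)
    return clusters


def update(clusters, centroids):
    """Step 6: replace each centroid with the mean of its cluster."""
    return [tuple(sum(x) / len(c) for x in zip(*c)) if c else old
            for c, old in zip(clusters, centroids)]


def kmeans_pp(points, k, max_iter=100, rng=None):
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(max_iter):
        new_centroids = update(assign(points, centroids), centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centroids = kmeans_pp(pts, k=2, rng=random.Random(0))
```

Points that already serve as centroids have D(X) = 0 and so cannot be drawn again, while points far from every centroid are the most likely to be picked next.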
In the MapReduce pattern, the algorithm above can be
implemented as follows.
• Map function: The input data is stored in HDFS as a
sequence file of <key, value> pairs, and the job produces
a set of <key, value> pairs as its output [6].
The framework splits the data across all mappers,
creates a map task for each split and assigns each
map task to a TaskTracker [7]. The job runs in three
steps: map, then shuffle, then reduce [26][27].
• Reduce function: After mapping, reducers are used
to compute step 2 of the algorithm above.
The Reducer's job is to process the data that comes
from the mapper [8]. After processing, the
intermediate data can be written to HDFS or stored
locally [9]. New centers are then generated, which
can be used in further iterations [27].
Figure. 5: K-Means++ Map Reduce
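One way to picture a map and reduce function pair for clustering is a single assignment-and-update pass (steps 5 and 6 of the algorithm above) written in the map/reduce style. The sketch below is an assumption about how such a pass could look, not the design described in the cited works: it runs in one Python process, a dictionary stands in for the shuffle, and the names `kmeans_map`, `kmeans_reduce` and `kmeans_iteration` are hypothetical.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(point, centroids[i])))


def kmeans_map(split, centroids):
    """Map: emit (centroid index, (point, 1)) for each point in one split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def kmeans_reduce(key, values):
    """Reduce: average all points that were assigned to one centroid."""
    count = sum(n for _, n in values)
    sums = [sum(coords) for coords in zip(*(p for p, _ in values))]
    return key, tuple(s / count for s in sums)


def kmeans_iteration(splits, centroids):
    grouped = defaultdict(list)          # stands in for the shuffle step
    for split in splits:                 # each split would go to one mapper
        for key, value in kmeans_map(split, centroids):
            grouped[key].append(value)
    new_centroids = list(centroids)      # centroids with no points are kept
    for key, values in grouped.items():
        _, new_centroids[key] = kmeans_reduce(key, values)
    return new_centroids


splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
new_centers = kmeans_iteration(splits, [(1.0, 1.0), (9.0, 9.0)])
```

The driver would call `kmeans_iteration` repeatedly, feeding the new centers back in, which matches the remark above that the generated centers are used in further iterations.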
4. MINI BATCH K-MEANS
The mini batch k-means clustering algorithm [29] is an
alternative, modified version of the k-means algorithm.
Its advantage is that it uses mini-batches to reduce the
computation time and cost on large datasets, since it does not
use the whole dataset in each iteration [33]. Moreover, it
attempts to provide a near-optimal clustering result. To
accomplish this, mini batch k-means takes as input randomly
drawn mini-batches of a fixed size, which are subsets of the
whole dataset [30].
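A minimal sketch of the mini-batch idea is given below. It follows the per-centre learning-rate update commonly associated with web-scale k-means [29], but it is only an illustration under stated assumptions: the function names, the tuple representation of points and the fixed number of iterations are choices made here, not details taken from the cited works.

```python
import random


def nearest_index(point, centroids):
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(point, centroids[i])))


def mini_batch_kmeans(points, centroids, batch_size, iterations, rng=None):
    rng = rng or random.Random()
    centroids = [list(c) for c in centroids]
    counts = [0] * len(centroids)        # points seen so far per centroid
    for _ in range(iterations):
        # Draw a random mini-batch instead of scanning the whole dataset.
        batch = rng.sample(points, min(batch_size, len(points)))
        # Fix the assignments for this batch before moving any centroid.
        assigned = [nearest_index(p, centroids) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]        # per-centroid learning rate
            centroids[j] = [(1 - eta) * c + eta * x
                            for c, x in zip(centroids[j], p)]
    return [tuple(c) for c in centroids]


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers = mini_batch_kmeans(data, [(0.0, 0.0), (10.0, 10.0)],
                            batch_size=4, iterations=20,
                            rng=random.Random(1))
```

Each iteration touches only `batch_size` points, so the cost per iteration does not grow with the size of the dataset; the shrinking learning rate makes every centroid a running mean of the points assigned to it.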
The Mini Batch K-Means framework with MapReduce
[31][32]:
1. An initial distance cutoff is calculated from the input
sequence using a sample of points.
2. The MapReduce job splits the input files into chunks, each
called an InputSplit, and assigns each split to a mapper, which
runs an instance of the k-means algorithm on it.
3. Each mapper produces a set of centroids c that is much larger
than the final expected number of clusters k, but small
enough to fit into memory.
4. The clustering results of each split are then passed
to a single reducer, which uses the streaming ball k-means
function to combine the clusters as required.
After the reduce step, OutputFormat organizes the key-value
pairs from the Reducer for writing to HDFS.
A distance-cutoff parameter is required when measuring the
distance between a point and the centroids: depending on the
cutoff, the point is either merged into one of the existing
clusters or becomes a new centroid that forms a new cluster.
To determine the optimal clusters, the points that form a
cluster with one of the centroids become the value in the
key-value pair.
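The merge-or-create decision can be sketched as a single pass over the points. This is a deliberately simplified, deterministic illustration: the function name `cluster_with_cutoff` and the fixed cutoff are assumptions, and actual streaming k-means implementations may make the decision probabilistically and adjust the cutoff as they go, which is not modelled here.

```python
import math


def cluster_with_cutoff(points, cutoff):
    """One pass: merge a point into its nearest centroid if it lies within
    the cutoff distance, otherwise start a new cluster at that point."""
    centroids = []                       # entries are [mean, weight]
    for p in points:
        if centroids:
            j = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i][0]))
            if math.dist(p, centroids[j][0]) <= cutoff:
                mean, w = centroids[j]
                # Weighted running mean of the merged points.
                merged = tuple((m * w + x) / (w + 1) for m, x in zip(mean, p))
                centroids[j] = [merged, w + 1]
                continue
        centroids.append([tuple(p), 1])
    return centroids


sketch = cluster_with_cutoff(
    [(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.5)], cutoff=2.0)
```

A small cutoff yields many centroids and a large one yields few, which is why the choice of cutoff controls how many intermediate centroids each mapper hands to the reducer.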
5. CONCLUSIONS AND FUTURE WORK
Today, Big Data and Cloud Computing are two mainstream
technologies for managing and processing large volumes of
data: Big Data deals with the massive scale of the data,
whereas Cloud Computing provides the infrastructure.
Big Data is generating demand for efficient computing,
well-managed storage resources and better ways to process
the data [27]. Researchers have found that conventional
methods fail to cope with such challenges, which is why Cloud
Computing and Hadoop MapReduce have been found efficient and
have gained popularity [10]. MapReduce tackles huge data sets by
distributing the processing across many nodes and then
reducing, or combining, the results from those nodes. In
summary, the Mini Batch K-means and K-Means++ algorithms
discussed and analyzed in this paper are robust and
efficient for large data sets. K-means++ tends to
select centroids that are far away from the existing
centroids, which gives a significant improvement over other
methods with respect to outliers, whereas the Mini Batch
K-means algorithm is not only suitable and efficient for
processing huge data sets but also maintains accuracy while
reducing the amount of computation required [28][31]. It has
also been found to give relatively stable precision for
outlier detection. To scale well and process large datasets
efficiently, this algorithm needs to be improved so that it
handles outliers efficiently and gives better and more
accurate results with reduced run time.
REFERENCES
[1] Rajashree Shettar, Bhimasen V. Purohit, "A Review on Clustering Algorithms Applicable for Map Reduce", International Conference on Computational Systems for Health & Sustainability, pp. 176-178, 2015.
[2] Adem Tepe, Güray Yilmaz, "A Survey on Cloud Computing Technology and Its Application to Satellite Ground Systems", International Conference on Recent Advances in Space Technologies (RAST), 2013.
[3] K. Singh and R. Kaur, "Hadoop: Addressing challenges of Big Data", 2014 IEEE International Advance Computing Conference (IACC), Gurgaon, 2014, pp. 686-689.
[4] Weihzong Zhao, Huifang Ma, Qing He, "Parallel K-Means clustering Based on MapReduce", Springer-Verlag Berlin, Heidelberg, 2009.
[5] Borthakur D., "The Hadoop Distributed File System: Architecture and design", 2007.
[6] Sangeeta Ahuja, M. Ester, H. P. Kriegel, J. Sander, X. Xu, "A Density based algorithm for discovering clusters in large spatial database with noise", Second International Conference on Knowledge Discovery and Data Mining, 1996.
[7] B. Dai and I. Lin, "Efficient Map/Reduce-Based DBSCAN Algorithm with Optimized Data Partition", 2012 IEEE Fifth International Conference on Cloud Computing, Honolulu, HI, 2012, pp. 59-66.
[8] V. Gaede, O. Günther, "Multidimensional access methods", ACM Comput. Surv., Vol. 30, No. 2, pp. 170-231, 1998.
[9] Varad Meru, "Data clustering: Using MapReduce", Software Developers Journal, 2013.
[10] Das A.S., Datar M., Garg, and Rajaram S., "Google news personalization scalable online collaborative filtering", pp. 271-280, 2007.
[11] Aaqib Rashid and Amit Chaturvedi, "Cloud Computing Characteristics and Services", 2019.
[12] Mahantesh N. Birje and Praveen S. Challagidad, "Cloud computing review: concepts, technology, challenges and security", 2017.
[13] Sweekruth S Badiger and Sushmitha N, "A Review on K-means++ Clustering Algorithm in Cloud Computing with Map Reduce", 2019.
[14] Dr. Richa Purohit, "Comparative Analysis of Few Cloud Service Providers Considering Their Distinctive Properties", 2017.
[15] Vishal R. Pancholi and Dr. Bhadresh P. Patel, "A Study on Services Provided by Various Service Providers of Cloud Computing", 2017.
[16] https://www.ibm.com/analytics/hadoop/mapreduce.
[17] Dr. Anitha Patil, "Securing mapreduce programming paradigm in hadoop, cloud and big data eco-system", 2018.
[18] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", 2004.
[19] Palvinder, "Survey Paper on Cloud Computing", 2014.
[20] Nasrullah and Ms. Akanksha Bana, "Review paper on hadoop configuration and implementation in virtual cloud environment", 2018.
[21] Santhosh Voruganti, "Map Reduce a Programming Model for Cloud Computing Based On Hadoop Ecosystem", 2014.
[22] Bisma Bashir, "An Approach of MapReduce Programming Model for Cloud Computing", 2017.
[23] Md. Zakir Hossain, Md. Nasim Akhtar, R.B. Ahmad and Mostafijur Rahman, "A dynamic K-means clustering for data mining", 2019.
[24] Dhruv Sharma, Krishnaiya Thulasiraman and John N. Jiang, "A network science-based k-means++ clustering method for power systems network equivalence", 2019.
[25] G. Yamini and Dr. B. Renuka Devi, "A New Hybrid Clustering Technique Based on Mini-batch K-means And K-means++ For Analysing Big Data", 2018.
[26] N. Deshai, S. Venkataramana and Dr. G. P. Saradhi Varma, "Research Paper on Big Data Hadoop Map Reduce Job Scheduling", 2018.
[27] Ch. Shobha Rani and Dr. B. Rama, "MapReduce with Hadoop for Simplified Analysis of Big Data", 2017.
[28] Bo Xiao, Zhen Wang, Qi Liu, and Xiaodong Liu, "SMK-means: An Improved Mini Batch K-means Algorithm Based on Mapreduce with Big Data", 2018.
[29] Sculley D., "Web-scale k-means clustering", 2010.
[30] Ali Feizollah, Nor Badrul Anuar, Rosli Salleh and Fairuz Amalina, "Comparative Study of K-means and Mini Batch K-means Clustering Algorithms in Android Malware Detection Using Network Traffic Analysis", 2014.
[31] Meghana M Chavan, Asawari Patil, Lata Dalvi and Ajinkya Patil, "Mini Batch K-Means Clustering On Large Dataset", 2015.
[32] Mr. Krishna Yadav and Mr. Jwalant Baria, "Mini-Batch K-Means Clustering Using Map-Reduce in Hadoop", 2014.
[33] Nadeem Akthar, Mohd Vasim Ahamad and Shahbaaz Ahmad, "MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce", 2016.

More Related Content

What's hot

Impactofcloudcomputing 141103103626-conversion-gate01
Impactofcloudcomputing 141103103626-conversion-gate01Impactofcloudcomputing 141103103626-conversion-gate01
Impactofcloudcomputing 141103103626-conversion-gate01
Rabia Naushad
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
IJECEIAES
 
Computer project
Computer projectComputer project
Computer project
Pranav Nedungadi
 
2018 19 Cloudcomputing
2018 19 Cloudcomputing2018 19 Cloudcomputing
2018 19 Cloudcomputing
Rajesh Math
 
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHMIMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
Associate Professor in VSB Coimbatore
 
E04432934
E04432934E04432934
E04432934
IOSR-JEN
 
Single cloud
Single cloudSingle cloud
Single cloud
Mazikk
 
Efficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big databaseEfficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big database
ijccsa
 
Green cloud computing
Green cloud computing Green cloud computing
Green cloud computing
JauwadSyed
 
Energy Saving by Virtual Machine Migration in Green Cloud Computing
Energy Saving by Virtual Machine Migration in Green Cloud ComputingEnergy Saving by Virtual Machine Migration in Green Cloud Computing
Energy Saving by Virtual Machine Migration in Green Cloud Computing
ijtsrd
 
Opportunistic job sharing for mobile cloud computing
Opportunistic job sharing for mobile cloud computingOpportunistic job sharing for mobile cloud computing
Opportunistic job sharing for mobile cloud computing
ijccsa
 
Am36234239
Am36234239Am36234239
Am36234239
IJERA Editor
 
A review on serverless architectures - function as a service (FaaS) in cloud ...
A review on serverless architectures - function as a service (FaaS) in cloud ...A review on serverless architectures - function as a service (FaaS) in cloud ...
A review on serverless architectures - function as a service (FaaS) in cloud ...
TELKOMNIKA JOURNAL
 
B02120307013
B02120307013B02120307013
B02120307013
theijes
 
Disaster Recovery in Business Continuity Management
Disaster Recovery in Business Continuity ManagementDisaster Recovery in Business Continuity Management
Disaster Recovery in Business Continuity Management
ijtsrd
 
A REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENT
A REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENTA REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENT
A REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENT
INTERNATIONAL JOURNAL OF COMPUTERS AND TECHNOLOGY
 
An Introduction to Cluster Computing
An Introduction to Cluster ComputingAn Introduction to Cluster Computing
An Introduction to Cluster Computing
ijtsrd
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Robert Grossman
 
Energy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computingEnergy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computing
Divaynshu Totla
 

What's hot (20)

Impactofcloudcomputing 141103103626-conversion-gate01
Impactofcloudcomputing 141103103626-conversion-gate01Impactofcloudcomputing 141103103626-conversion-gate01
Impactofcloudcomputing 141103103626-conversion-gate01
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
Computer project
Computer projectComputer project
Computer project
 
2018 19 Cloudcomputing
2018 19 Cloudcomputing2018 19 Cloudcomputing
2018 19 Cloudcomputing
 
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHMIMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
 
E04432934
E04432934E04432934
E04432934
 
Single cloud
Single cloudSingle cloud
Single cloud
 
Efficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big databaseEfficient and reliable hybrid cloud architecture for big database
Efficient and reliable hybrid cloud architecture for big database
 
Green cloud computing
Green cloud computing Green cloud computing
Green cloud computing
 
Energy Saving by Virtual Machine Migration in Green Cloud Computing
Energy Saving by Virtual Machine Migration in Green Cloud ComputingEnergy Saving by Virtual Machine Migration in Green Cloud Computing
Energy Saving by Virtual Machine Migration in Green Cloud Computing
 
Opportunistic job sharing for mobile cloud computing
Opportunistic job sharing for mobile cloud computingOpportunistic job sharing for mobile cloud computing
Opportunistic job sharing for mobile cloud computing
 
Am36234239
Am36234239Am36234239
Am36234239
 
50120140502008
5012014050200850120140502008
50120140502008
 
A review on serverless architectures - function as a service (FaaS) in cloud ...
A review on serverless architectures - function as a service (FaaS) in cloud ...A review on serverless architectures - function as a service (FaaS) in cloud ...
A review on serverless architectures - function as a service (FaaS) in cloud ...
 
B02120307013
B02120307013B02120307013
B02120307013
 
Disaster Recovery in Business Continuity Management
Disaster Recovery in Business Continuity ManagementDisaster Recovery in Business Continuity Management
Disaster Recovery in Business Continuity Management
 
A REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENT
A REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENTA REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENT
A REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENT
 
An Introduction to Cluster Computing
An Introduction to Cluster ComputingAn Introduction to Cluster Computing
An Introduction to Cluster Computing
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
Energy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computingEnergy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computing
 

Similar to IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering Algorithm in Cloud Computing with Map Reduce

IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET Journal
 
A Brief Introduction to Cloud Computing
A Brief Introduction to Cloud ComputingA Brief Introduction to Cloud Computing
A Brief Introduction to Cloud Computing
IRJET Journal
 
IRJET- An Ample Analysis of Cloud Computing Assessment Issues and Challenges
IRJET- An Ample Analysis of Cloud Computing Assessment Issues and ChallengesIRJET- An Ample Analysis of Cloud Computing Assessment Issues and Challenges
IRJET- An Ample Analysis of Cloud Computing Assessment Issues and Challenges
IRJET Journal
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
IRJET Journal
 
A Survey on Cloud Computing Security Issues, Vendor Evaluation and Selection ...
A Survey on Cloud Computing Security Issues, Vendor Evaluation and Selection ...A Survey on Cloud Computing Security Issues, Vendor Evaluation and Selection ...
A Survey on Cloud Computing Security Issues, Vendor Evaluation and Selection ...
Eswar Publications
 
IRJET- A Detailed Study and Analysis of Cloud Computing Usage with Real-Time ...
IRJET- A Detailed Study and Analysis of Cloud Computing Usage with Real-Time ...IRJET- A Detailed Study and Analysis of Cloud Computing Usage with Real-Time ...
IRJET- A Detailed Study and Analysis of Cloud Computing Usage with Real-Time ...
IRJET Journal
 
IRJET - Cloud Computing Over Traditional Computing
IRJET - Cloud Computing Over Traditional ComputingIRJET - Cloud Computing Over Traditional Computing
IRJET - Cloud Computing Over Traditional Computing
IRJET Journal
 
A PROPOSED MODEL FOR IMPROVING PERFORMANCE AND REDUCING COSTS OF IT THROUGH C...
A PROPOSED MODEL FOR IMPROVING PERFORMANCE AND REDUCING COSTS OF IT THROUGH C...A PROPOSED MODEL FOR IMPROVING PERFORMANCE AND REDUCING COSTS OF IT THROUGH C...
A PROPOSED MODEL FOR IMPROVING PERFORMANCE AND REDUCING COSTS OF IT THROUGH C...
ijccsa
 
A Proposed Model for Improving Performance and Reducing Costs of IT Through C...
A Proposed Model for Improving Performance and Reducing Costs of IT Through C...A Proposed Model for Improving Performance and Reducing Costs of IT Through C...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering Algorithm in Cloud Computing with Map Reduce

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 09 | Sep 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 814

Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering Algorithm in Cloud Computing with Map Reduce

Virendra Tiwari1, Dr. Akhilesh A. Waoo2
1 Assistant Professor, Dept. of Computer Science & Application, AKS University, Satna, Madhya Pradesh, India
2 Associate Professor, Dept. of Computer Science & Application, AKS University, Satna, Madhya Pradesh, India

Abstract - Cloud computing makes it possible to host servers in a distributed network so that a wide range of customers can access applications as utilities over the internet. One of the most important drivers of cloud computing is the rapidly growing volume of Big Data, which must be managed and examined efficiently. Experts and researchers design and use many algorithms with intelligent approaches to extract useful knowledge from huge data sets. Hadoop MapReduce is an emerging software platform for easily writing applications that process huge amounts of data in parallel on large clusters of commodity hardware in a reliable and fault-tolerant manner. Only specific unsupervised learning algorithms for big data problems run successfully under MapReduce and are deployed on high-volume data sets. The performance of MapReduce frameworks degrades under sequential processing approaches. To overcome this limitation and reduce costs, cloud computing can be combined with parallel processing in MapReduce, a powerful and effective approach for future technology enhancements.
This survey gives a brief overview of cloud computing and a comparative analysis of two popular clustering algorithms, K-Means++ and Mini Batch K-Means.

Key Words: Cloud computing, Big data, Hadoop, MapReduce, K-Means++, Mini Batch K-Means.

1. INTRODUCTION

Cloud computing, in simple terms, is the on-demand availability of hardware resources for computing or storage over the Internet instead of on our computer's hard drive. With cloud computing, clients can access files and applications from any remote device through the internet. It reduces the work and cost for the client of setting up and managing the hardware and required infrastructure [11][12]. The cloud provider takes ownership of setting up and maintaining the hardware infrastructure. The most famous cloud service providers are Amazon, Alibaba, Google and Microsoft [13].

Most famous cloud service providers [11][14]:

a) Google: Google Cloud Platform addresses problems with accessible AI and data analytics and uses resources such as hard disks, computers, and virtual machines located at Google data centers. Google is a dedicated cloud computing service provider, with all storage available online so that it works with cloud apps such as Google Sheets, Google Docs, and Google Slides. Most of Google's services could be considered cloud computing: Gmail, Google Maps, Google Calendar, Picasa, Google Analytics and so on.

b) Alibaba: Alibaba is one of the largest cloud computing platforms, with a global footprint of over 1500 CDN nodes worldwide, 19 regions and 56 availability zones across more than 200 countries. Its prominent features help to protect and back up your data and to achieve faster results.

c) Apple iCloud: Apple's cloud service is mainly used for online backup, storage, and synchronization of your mail, calendar, contacts, and more. All the data we need remains available on Mac OS, iOS, or Windows devices.
d) Amazon Cloud Drive: A web hosting platform that offers reliable and cost-effective storage solutions. If you have Amazon Prime, you get unlimited image storage. It is essentially storage for anything digital, and it is the most popular as it was the first to enter the cloud computing space.

The world has created 90% of its data in the last two years; this data comes from everywhere, and the volume is only going to grow with the arrival of the Internet of Things. By 2020, it is estimated that approximately 1.7 MB of data will be created every second for every person on the planet [13]. Processing such a huge amount of data will be extremely challenging for data analysts and researchers. Classical sequential programming methods are not very efficient, and hence parallel processing has gained importance. The cloud provides the most suitable environment, with Hadoop MapReduce as the processing component: a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster [16][17].

We are virtually floating in oceans of data; with the advent of Big Data, MapReduce emerged as a most effective solution. Many researchers are working on extending MapReduce with new features, functionality and mechanisms to adapt it to new sets of problems [18]. Various sequential algorithms are nowadays being converted to the Hadoop MapReduce programming paradigm. A few algorithms cannot be parallelized, so they see less use in the real world, since parallelization is essential for processing Big Data. Hence some of the conventional useful algorithms for clustering may provide a great
platform and enhancements to the new emerging technology, and can also be very beneficial to business enterprise organizations [13].

2. BACKGROUND

a) Cloud computing

Cloud computing is the delivery of different services, such as managing hardware and software, through the Internet. It has evolved since the inception of the internet because of its efficiency in storing data, its computation capacity and its lower maintenance cost. Cloud-based storage makes it possible to save files and documents to a remote database and retrieve them on demand. Services can be public or private. Public services are provided online for a fee, while private services are hosted on a network for particular clients [19].

Figure. 1: Cloud computing services type
Figure. 2: Cloud computing deployment models

As shown in Fig. 1, cloud provider services can be categorized into three delivery types: Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). Cloud deployment comes in private, hybrid, public and community forms, based on location, as shown in Figure 2. Public clouds, based on a shared cost model, have many customers compared to other forms of cloud. Private clouds are well suited to organizations with high management demands, high security requirements, and availability requirements, but are usually not cost efficient. A community cloud is a model mutually shared between organizations [21].

b) Hadoop

Hadoop is an open-source software framework considered to be one of the best tools for handling and analyzing very large volumes of data. It is currently used by Google, Facebook, LinkedIn, Yahoo, Twitter, etc. It has two major components: the Hadoop Distributed File System (HDFS) and MapReduce.
Hadoop has been a widely used distributed computing platform, especially over the past 10 years, as big data has evolved with social media generating petabytes of data every day, which can be used for data mining, predictive analytics and machine learning applications. Hadoop can run as a single-node or multi-node cluster and is designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment. When HDFS takes data as input, it breaks the data into separate blocks and distributes the blocks to different nodes in the cluster, which enables highly efficient parallel processing. HDFS can be thought of as the DataNode (stores the actual data in HDFS) + the NameNode (holds the metadata for HDFS) + the Secondary NameNode (merges the edit logs with the fsimage from the NameNode). The daemon process that manages the MapReduce programming paradigm on top of HDFS is the JobTracker [20][13].

Figure. 3: Data flow in HDFS

c) Map reduce paradigm

The MapReduce programming paradigm is a technique and a programming model for distributed computing that processes huge amounts of data. A MapReduce program executes in three stages: map, shuffle and reduce. The map phase processes the input and breaks it into several small chunks of data expressed as key-value pairs [3][22]. Counting the initial splitting of the input, the complete process runs through four phases: splitting, mapping, shuffling and reducing. The reduce phase combines the data that share a common key and performs the reduce operation defined by the user. Parallelism comes from the many mappers that are created to read the data concurrently rather than sequentially, which gives high throughput [5][21]. The core advantage
of the MapReduce framework is its fault tolerance, which rests on the periodic reports expected from each node in the cluster while work is being processed.

Figure. 4: Map reduce in HDFS

3. K-Means++ CLUSTERING ALGORITHM

K-Means++ is an improved and more robust version of the popular and widely used K-Means clustering algorithm. The K-Means algorithm initializes the centroids randomly and then forms the clusters for a given dataset [23]. The performance and accuracy of K-Means depend heavily on the quality of this initialization, and with a poor initialization the algorithm has been observed to give less accurate clusters. To overcome this limitation, we can use the improved clustering method K-Means++, which improves the initialization part of the K-Means algorithm [13][24]. The K-Means++ technique takes as input k, the number of clusters to be generated, and a set of n objects [4]. The improved algorithm mainly aims to initialize the centroids for the traditional K-Means clustering algorithm [25]. K-Means++ reduces the computational time and can handle huge data sets. K-Means++ is a partitioning-based clustering algorithm that works as follows [25].

[1] Choose the first centroid uniformly at random from the data set.
[2] For each data point X, compute D(X), the squared distance between X and the nearest centroid that has already been chosen.
[3] Choose another data point as the next centroid using a weighted probability distribution in which each point X is chosen with probability proportional to D(X).
[4] Repeat 2 and 3 until the required K centroids have been chosen.
[5] Assign each record or instance to the cluster centre at the least distance.
[6] Replace each cluster centroid with the mean value of all points in the cluster.

In the map-reduce pattern, the algorithm above can be implemented as follows.

• Map function: HDFS stores the input data as a sequence file of <key, value> pairs, and the job produces a set of <key, value> pairs as its output [6]. The input data is split across all mappers, a map task is created for each split, and each map task is assigned to a TaskTracker [7]. The first step is Map, the second step is Shuffle and the third step is Reduce [26][27].

• Reduce function: After mapping, reducers are used for computing Step 2 of the algorithm above. The reducer's job is to process the data that comes from the mapper [8]. After processing, the intermediate data can be put in HDFS or stored locally [9]. New centers are then generated, which can be used in further iterations [27].

Figure. 5: K-Means++ Map Reduce

4. Mini Batch K-means

The mini batch k-means clustering algorithm [29] is an alternative, modified version of the k-means algorithm. Its advantage is that it uses mini-batches to reduce the computation time and cost on large datasets, since it does not use the whole dataset in each iteration [33]. Moreover, it attempts to provide an optimal clustering result. To accomplish this, mini batch k-means takes as input randomly drawn mini-batches of a fixed size, which are subsets of the whole dataset [30]. The Mini Batch K-Means framework with Map-Reduce works as follows [31][32]:

1. An initial distance cutoff is calculated from a sample of points taken from the input sequence.
2. The MapReduce job splits the input files into chunks, called InputSplits, and assigns each split to a mapper, which runs an instance of the k-means algorithm.
3. Each mapper produces a set of centroids c that is much larger than the final expected number of clusters k, but small enough to fit into memory.
4. The clustering results of each split are then passed to a single reducer, which uses the streaming Ball k-means function to combine the clusters as required.

Before the Map routine is invoked, OutputFormat organizes the key-value pairs from the Reducer so they can be written to HDFS. A distance-cutoff parameter is required when measuring the distance between a point and the centroids: based on the cutoff, the point is either merged into one of the existing clusters or becomes a new centroid that forms a new cluster. To determine the optimal clusters, the points that form a cluster with one of the centroids become the value in the key-value pair.

5. CONCLUSIONS AND FUTURE WORK

Today, Big Data and Cloud Computing are the two mainstream technologies for managing and processing big data. Big Data deals with massive scales of data, whereas Cloud Computing provides the infrastructure. Big Data is generating demand for efficient computing, well-managed storage resources and the best way to process the data [27]. Researchers have found that conventional methods fail to cope with such challenges, which is why Cloud Computing and Hadoop MapReduce have proved efficient and gained popularity [10]. MapReduce tackles huge data sets by distributing processing across many nodes and then reducing, or combining, the results from those nodes. In summary, the Mini Batch K-means and K-Means++ algorithms discussed and analyzed in this research are robust and efficient for large data sets.
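The K-Means++ seeding steps of Section 3 and the mini-batch idea of Section 4 can be sketched in plain Python as below. This is a minimal single-machine sketch under stated assumptions, not the MapReduce implementation surveyed here: the function names kmeanspp_seed and mini_batch_kmeans are hypothetical, points are assumed to be tuples of floats, and the per-centre learning-rate update follows our reading of the mini-batch formulation cited as [29].

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_index(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeanspp_seed(points, k, rng):
    """Seeding steps [1]-[4]: pick k initial centroids from `points`."""
    # Step [1]: first centroid uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Step [2]: D(X) = squared distance to the nearest chosen centroid.
        d = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d) == 0:
            # Assumption: if every point coincides with a centroid,
            # fall back to a uniform draw.
            centroids.append(rng.choice(points))
        else:
            # Step [3]: draw the next centroid with probability
            # proportional to D(X).
            centroids.append(rng.choices(points, weights=d, k=1)[0])
    return centroids


def mini_batch_kmeans(points, centroids, batch_size, iterations, rng):
    """Refine `centroids` using small random batches instead of all points."""
    centroids = [list(c) for c in centroids]
    counts = [0] * len(centroids)  # points seen so far by each centroid
    size = min(batch_size, len(points))
    for _ in range(iterations):
        batch = rng.sample(points, size)
        # Assign the whole batch first, then update the centroids.
        assigned = [nearest_index(p, centroids) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate
            # Move centroid j a fraction eta of the way towards p.
            centroids[j] = [c + eta * (x - c)
                            for c, x in zip(centroids[j], p)]
    return centroids
```

A typical call would first seed with kmeanspp_seed(points, k, random.Random(42)) and then pass the returned seeds to mini_batch_kmeans together with a batch size and an iteration count. Passing the random generator explicitly keeps runs reproducible.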
K-means++ always tries to select centroids that are far away from the existing centroids, which gives a significant improvement over other methods, although outliers remain an issue, whereas the Mini Batch K-means algorithm is found to be not only suitable and efficient for processing huge data, but also to preserve accuracy while reducing the amount of computation required [28][31]. It is also found to give relatively stable precision for outlier detection. To scale well and process large datasets efficiently, this algorithm needs to be improved so that it can reduce outliers efficiently and give better, more accurate results with reduced run time.

REFERENCES

[1] Rajashree Shettar, Bhimasen V. Purohit, "A Review on Clustering Algorithms Applicable for Map Reduce", International Conference on Computational Systems for Health & Sustainability, pp. 176-178, 2015.
[2] Adem Tepe, Güray Yilmaz, "A Survey on Cloud Computing Technology and Its Application to Satellite Ground Systems", International Conference on Recent Advances in Space Technologies (RAST), 2013.
[3] K. Singh and R. Kaur, "Hadoop: Addressing challenges of Big Data", 2014 IEEE International Advance Computing Conference (IACC), Gurgaon, 2014, pp. 686-689.
[4] Weizhong Zhao, Huifang Ma, Qing He, "Parallel K-Means clustering Based on MapReduce", Springer-Verlag Berlin, Heidelberg, 2009.
[5] D. Borthakur, "The Hadoop Distributed File System: Architecture and design", 2007.
[6] Sangeeta Ahuja, M. Ester, H. P. Kriegel, J. Sander, X. Xu, "A Density based algorithm for discovering clusters in large spatial database with noise", Second International Conference on Knowledge Discovery and Data Mining, 1996.
[7] B. Dai and I. Lin, "Efficient Map/Reduce-Based DBSCAN Algorithm with Optimized Data Partition", 2012 IEEE Fifth International Conference on Cloud Computing, Honolulu, HI, 2012, pp. 59-66.
[8] V. Gaede, O. Günther, "Multidimensional access methods", ACM Comput. Surv., Vol. 30, No. 2, pp. 170-231, 1998.
[9] Varad Meru, "Data clustering: Using MapReduce", Software Developers Journal, 2013.
[10] Das A. S., Datar M., Garg, and Rajaram S., "Google news personalization: scalable online collaborative filtering", pp. 271-280, 2007.
[11] Aaqib Rashid and Amit Chaturvedi, "Cloud Computing Characteristics and Services", 2019.
[12] Mahantesh N. Birje and Praveen S. Challagidad, "Cloud computing review: concepts, technology, challenges and security", 2017.
[13] Sweekruth S Badiger and Sushmitha N, "A Review on K-means++ Clustering Algorithm in Cloud Computing with Map Reduce", 2019.
[14] Dr. Richa Purohit, "Comparative Analysis of Few Cloud Service Providers Considering Their Distinctive Properties", 2017.
[15] Vishal R. Pancholi and Dr. Bhadresh P. Patel, "A Study on Services Provided by Various Service Providers of Cloud Computing", 2017.
[16] https://www.ibm.com/analytics/hadoop/mapreduce.
[17] Dr. Anitha Patil, "Securing mapreduce programming paradigm in hadoop, cloud and big data eco-system", 2018.
[18] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", 2004.
[19] Palvinder, "Survey Paper on Cloud Computing", 2014.
[20] Nasrullah and Ms. Akanksha Bana, "Review paper on hadoop configuration and implementation in virtual cloud environment", 2018.
[21] Santhosh Voruganti, "Map Reduce a Programming Model for Cloud Computing Based On Hadoop Ecosystem", 2014.
[22] Bisma Bashir, "An Approach of MapReduce Programming Model for Cloud Computing", 2017.
[23] Md. Zakir Hossain, Md. Nasim Akhtar, R.B. Ahmad and Mostafijur Rahman, "A dynamic K-means clustering for data mining", 2019.
[24] Dhruv Sharma, Krishnaiya Thulasiraman and John N. Jiang, "A network science-based k-means++ clustering method for power systems network equivalence", 2019.
[25] G. Yamini and Dr. B. Renuka Devi, "A New Hybrid Clustering Technique Based on Mini-batch K-means And K-means++ For Analysing Big Data", 2018.
[26] N. Deshai, S. Venkataramana and Dr. G. P. Saradhi Varma, "Research Paper on Big Data Hadoop Map Reduce Job Scheduling", 2018.
[27] Ch. Shobha Rani and Dr. B. Rama, "MapReduce with Hadoop for Simplified Analysis of Big Data", 2017.
[28] Bo Xiao, Zhen Wang, Qi Liu, and Xiaodong Liu, "SMK-means: An Improved Mini Batch K-means Algorithm Based on Mapreduce with Big Data", 2018.
[29] Sculley D, "Web-scale k-means clustering", 2010.
[30] Ali Feizollah, Nor Badrul Anuar, Rosli Salleh and Fairuz Amalina, "Comparative Study of K-means and Mini Batch K-means Clustering Algorithms in Android Malware Detection Using Network Traffic Analysis", 2014.
[31] Meghana M Chavan, Asawari Patil, Lata Dalvi and Ajinkya Patil, "Mini Batch K-Means Clustering On Large Dataset", 2015.
[32] Mr. Krishna Yadav and Mr. Jwalant Baria, "Mini-Batch K-Means Clustering Using Map-Reduce in Hadoop", 2014.
[33] Nadeem Akthar, Mohd Vasim Ahamad and Shahbaaz Ahmad, "MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce", 2016.