SlideShare a Scribd company logo
1

Abstract— In recent years, data is being generated from
different sources and storing it and processing them is a great
issue. This data which cannot be handled using conventional
system is termed as Bigdata. Bigdata is a term that describes the
huge volume of data both structured and unstructured data. As
unstructured data keeps on evolving, there is a demand for
processing pipeline which combines NoSQL and Bigdata
processing platform such as MapReduce. Cassandra is a type of
NoSQL database. The suitability of a given NoSQL depends on
the problem it must solve. In this paper, we are going to process
the Denmark’s traffic datasets using variable parameters and
store it in Cassandra datastore which will lend a hand to the
transportation department to monitor and control the traffic.
Index Terms— Bigdata, Cassandra, Cloud, MapReduce
I. INTRODUCTION
Bigdata [1], [18], [19] is data that exceeds the processing
capacity of conventional database system. The data is too big,
moves too fast, or does not fit the structures of your database
architectures. To gain value from this data, you must choose
an alternative way to process it. Bigdata has become viable as
cost-effective approaches have emerged to tame the volume,
velocity and variability [11] of massive data. Within this data
lie valuable patterns and information, previously hidden
because of the amount of work required to extract them.
To leading corporations, such as Google, this power has
been in reach for some time, but at fantastic cost. Today’s
commodity hardware, cloud architectures and open source
software bring bigdata processing into the reach of the less
well-resourced. Bigdata processing is eminently feasible for
even the small garage setups, who can cheaply rent server
time in the cloud. The value of bigdata to an organization falls
into two categories: analytical use and enabling new products.
Bigdata analytics can reveal insights hidden previously by
data too costly to process such as peer influence among
customers, revealed by analyzing shopper’s transaction and
social and geographical data. Being able to process every item
of data in reasonable time removes the troublesome need for
sampling and promotes an investigative approach to data, in
contrast to the somewhat static nature of running
predetermined reports. The emergence of bigdata into the
enterprise brings with it necessary counterpart agility. The
three V’s of volume, velocity and variety are commonly used
to characterize different aspects of bigdata.
Volume: The benefit gained from the ability to process
large amounts of information in the main attraction of bigdata
Analytics. Assuming that the volumes of data are larger than
those conventional relational database infrastructures can
cope with, processing options break down broadly into a
choice between massively parallel processing architectures
[2] – data warehouses or databases such as Green Plum – and
Apache Hadoop-based [3] solutions. Hadoop’s MapReduce
[4] involves distributing a dataset among multiple servers and
operating on data: the “map” stage. The partial results are then
recombined: the “reduce” stage.
Velocity: The importance of data’s velocity- the increasing
rate at which data flows into an organization- has followed a
similar pattern to that of volume. Industry terminology for
such fast-moving data tends to be either “streaming data” or
“complex event processing”. There are two main reasons to
consider streaming processing. The first is when the input
data are too fast to store in their entirely: in order to keep
storage requirements practical, some level of analysis [17]
must occur as the data streams in. The second reason to
consider streaming is where the application mandates
immediate response of the data. It’s not just about input data.
The velocity of a system’s outputs can matter too.
Variety: Rarely does data present itself in a form perfectly
ordered and ready for processing. A common theme in
bigdata systems is that the source data is diverse, and doesn’t
fall into neat relational structures. It could be text from social
networks, image data, a raw feed directly from a sensor
source. A common use of bigdata processing is to take
unstructured data and extract ordered meaning, for
consumption either by humans or as a structured input to an
application.
Apache Cassandra [5] is a highly scalable,
high-performance distributed database designed to handle
large amounts of both structured and unstructured data across
many commodity servers, providing high availability [6] with
no single point of failure. It is a type of NoSQL database. The
primary objective of a NoSQL database is to have simplicity
of design, horizontal scaling and fine control over availability.
In this paper, we present MapReduce streaming [10] over
Cassandra datasets. Apache Cassandra is an open source,
distributed and decentralized storage system, for managing
very large amounts of unstructured data spread across the
world. Initially we retrieve the input data from Cassandra
store. The dataset taken under evaluation is the Denmark’s
traffic dataset. We use Apache Cassandra in our analysis
combined with MapReduce solution for processing the
datasets. We introduce the performance implications of
MAPREDUCE STREAMING OVER
CASSANDRA DATASETS
K S Archana#1
and Dr. P Ezhumalai*2
#
krishganesh70@gmail.com, *
hod.cse@rmd.ac.in
#1
M.E, Department of Computer Science, R.M.D Engineering College, Chennai, India
*2
B.E, M.Tech, Ph.D, Head of the Department, Computer Science, R.M.D Engineering College, Chennai, India
ISBN:978-1534910799
www.iaetsd.in
Proceedings of ICAER-2016
©IAETSD 201675
2
processing data directly from the Cassandra servers is high
when compared to the usage of Apache Hadoop [7].
II. BACKGROUND
A. Cassandra
Cassandra is a non-relational and column oriented,
distributed database. It was originally developed by Facebook.
It is now an open source Apache project. Cassandra was
designed to fulfill the storage needs of the Inbox search
problem. Cassandra is designed to store large datasets over a
set of commodity machines by using a peer-to-peer cluster
structure to promote horizontal scalability. In the
column-oriented data model of Cassandra, a column is the
smallest component of data. Columns associated with a
certain key constitute a row. Each row can contain any
number of columns. A collection of row forms a column
family, which is similar to tables in relational databases.
Records in the column families are stored in sorted order by
row keys, in separate files. The keyspace congregates one or
more column families in the application, similar to a schema
in a relational database. Interesting aspects of the Cassandra
framework include independence from any additional file
systems like HDFS, scalability, replication support for fault
tolerance, balanced data partitioning, and MapReduce
support with a Hadoop plug-in [8].
B. Cassandra Data Model
A table in Cassandra is a distributed multi dimensional map
indexed by a key. The value is an object which is highly
structured. The row key in a table is a string with no size
restrictions, although typically 16 to 36 bytes long. Every
operation under a single row key is atomic per replica no
matter how many columns are being read or written into.
Columns are grouped together into sets called column
families very much similar to what happens in the Bigtable
system. Cassandra exposes two kinds of column families,
Simple and Super column families. Super column families can
be visualized as a column family within a column family.
Furthermore, applications can specify the sort order of
columns within a Super column or Simple column family. The
system allows column to be sorted either by time or by name.
time sorting of column is exploited by applications like Inbox
Search where the results are always displayed in time sorted
order. Any column within a column family is accessed using
the convention column_family: column and any column
within a column family that is of type super is accessed using
the convention column_family: super_column: column.
Typically applications use a dedicated Cassandra cluster and
manage them as part of their service. Although the system
supports the notion of multiple tables all deployments have
only one table in their schema.
C. System Architecture of Cassandra
The architecture of a storage system that needs to operate in
a production setting is complex. In addition to the actual data
persistence component, the system needs to have following
characteristics: scalable and robust solutions for load
balancing, membership and failure detection, failure recovery,
replica synchronization, overload handling, state transfer,
concurrency and job scheduling, request routing, system
monitoring [14] and alarming, and configuration management.
The core distributed technique used in Cassandra is
partitioning, replication, membership, failure handling and
scaling. All these modules work in synchrony to handle
read/write requests. Typically a read/write request for a key
gets routed to any node in the Cassandra cluster. The node
then determines the replicas for this particular key. For writes,
the system routes the requests to the replicas and waits for a
quorum of replicas to acknowledge the completion of the
writes. For reads, based on the consistency guarantees
required by the client, the system either routes the request to
the closest replica or routes the request to all replicas and
waits for a quorum of responses.
III. MAPREDUCE PROGRAMMING MODEL
Created at Google in response to the problem of creating
web search indexes, the MapReduce framework is the
powerhouse behind most of today’s Big Data processing [15],
[16], [20]. The important innovation of MapReduce is the
ability to take a query over a dataset, divide it, and run it in
parallel over multiple nodes. Distributing the computation
solves the issue of data too large to fit onto a single machine.
For that computation to take place, each server must have
access to the data. MapReduce is a programming model and
an associated implementation for processing and generating
large data sets. Users specify a map function that processes a
key/value pair to generate a set of intermediate key/value
pairs, and a reduce function that merges all intermediate
values associated with the same intermediate key. Programs
written in this functional style are automatically parallelized
and executed on a large cluster of commodity machines. The
run-time system takes care of the details of partitioning the
input data, scheduling the program’s execution across a set of
machines, handling machine failures, and managing the
required inter-machine communication. This allows the
programmers without any experience with parallel and
distributed systems to easily utilize the resources of a large
distributed system. MapReduce runs on a large cluster of
commodity machines and is highly scalable: a typical
MapReduce computation process many terabytes of data on
thousands of machines. Programmers find the system easy to
use: hundreds of MapReduce programs have been
implemented and upwards of one thousand MapReduce jobs
are executed on Google’s clusters every day.
Over the past five years, the authors and many others at
Google have implemented hundreds of special-purpose
computations that process large amounts of raw data, such as
crawled documents, web request logs, etc., to compute
various kinds of derived data, such as inverted indices,
various representations of the graph structure of web
documents, etc.
ISBN:978-1534910799
www.iaetsd.in
Proceedings of ICAER-2016
©IAETSD 201676
3
IV. GOOGLE CLOUD PLATFORM
Google Cloud [9] Platform is a cloud computing platform
by Google that offers hosting on the same supporting
infrastructure that Google uses internally for end-user
products like Google Search and You Tube. Cloud Platform
provides developer products to build a range of programs
from simple websites to complex applications. Google Cloud
Platform is a part of a suite of enterprise solution from Google
for Work and provides a set of modular cloud-based services
with a host of development tools. For example, hosting and
computing, cloud storage, data storage, translations APIs and
prediction APIs.
The Google Cloud Platform is composed of a family of
products, each including a web interface, a command-line tool
and a REST API.
Google App Engine is a Platform as a service for
sandboxed web applications. App Engine offers automatic
scaling with resources increased automatically to handle
server load.
Google Compute Engine is the Infrastructure as a Service
component of the Google Cloud Platform that enables users to
launch virtual machines on demand.
Google Cloud Storage is an online storage service for files.
Google Cloud Datastore is a fully managed, highly
available NoSQL data storage for non-relational data that
includes a REST API.
V. IMPLEMENTATION
A. Creating Virtual Machine Instances
An instance is a virtual machine hosted on Google’s
infrastructure. We can create an instance by using the Google
Cloud Platform Console or using the gcloud command-line
tool. Create Cloud Platform Console Project. Enable billing
for your project. Create an instance named Cassandra. Choose
windows Server 2012 R2. After creation of the instance
connect to your instance. Each instance belongs to a Google
Cloud Platform Console project, and a project can have one or
more instances.
When we create an instance in a project, we specify the
zone, operating system, and machine type of that instance.
When you delete an instance, it is removed from the project.
B. Google Compute Engine
Google Compute Engine delivers virtual machines running
in Google’s innovative data centers and worldwide fiber
network. Compute engine’s tooling and workflow support
enable scaling from single instances to global instances,
load balancing cloud computing.
Compute Engine’s VMs boot quickly, come with persistent
disk storage, and deliver consistent performance. Our virtual
servers are available in many configurations including
predefined sizes or the option to create Custom Machine
Types optimized for your specific needs. Flexible pricing and
automatic sustained use discounts make Compute Engine the
leader in price/performance. Compute Engine offers
predefined virtual machine configurations for every need
from micro to instances with 32 vCPUs or 208 GB of
memory, in standard, high memory, and high CPU.
C. Checking the Instance Status
When we first create an instance, we should check the
instance status to see if it is running before we can expect it to
respond to requests. It can take a couple seconds before our
instance is fully up and running after the initial request. We
can also check the status of an instance at any time after
instance creation. Instances can be in the following states:
Provisioning- Resources are being reserved for the
instance. The instance is not running yet.
Staging- Resources have been acquired and instance is
being prepared for launch.
Running- The instance is booting up or running.
Stopping- The instance is being stopped either due to a
failure, or the instance being shut down. This is a temporary
status and the instance will move to terminated.
Terminated- The instance was shut down or encountered a
failure, either through the API or from inside the guest.
D. Accessing the Instance
The creator of an instance has full root privileges on the
instance. On a windows instance, the creator can use the
Cloud Platform Console to generate a username and
password. Another use can also be added to access the
instance by the administrator.
There are some tools to manage the instances. To create and
manage instances, we can use a variety of tools, including the
Google Cloud Platform Console, the gcloud command-line,
and the REST API. To perform advanced configuration,
connect to the instance using Secure Shell (SSH) for Linux
instance (or) Remote Desktop Protocol (RDP) for Windows.
By default, each Compute Engine instance has a small root
persistent disk that contains the operating system. When
application running on our instance require more storage
space, we can add additional storage options to our Cassandra
instance.
E. Processing the Dataset
A project can have up to five networks, and each Compute
Engine instance belongs to one network. Instances in the same
network communicate with each other through a local area
network protocol. An instance uses the internet to
communicate with any machine, virtual or physical, outside of
its own network.
Now the Denmark’s traffic dataset is being loaded to the
Cassandra server. Java code is written to connect the cluster to
the input datasets. The dataset consists of vehicle id, average
measured time, average speed, external id, time stamp,
vehicle count. Based on the average speed of the vehicle at
particular location [12], [13] for a particular time stamp we
are going to analyze the traffic. This analyzed data would be
helpful to monitor and control the traffic in the city. Based on
the results we would frame the time limit on the traffic signals
to have a safe journey on the road. To process the datasets we
use MapReduce implementation. Final results are retrieved
from the Cassandra datastore. To reduce the workload we are
not using Hadoop Distributed File System.
ISBN:978-1534910799
www.iaetsd.in
Proceedings of ICAER-2016
©IAETSD 201677
4
V. PERFORMANCE RESULTS
We are interested in data locality experiments as it is a core
fact of the MapReduce model. Data locality is accomplished
by placing the input splits on the local disk of computing
nodes. However, when using a distributed database a record
might be sitting on any node in the cluster. In order to force
data locality, the processing tasks of a compute node should
be chosen based on the range of records stored in the local
machine database instance. Since we are using Google Cloud
Platform Console our storage gets expanded when needed
according to usage. Cassandra is optimized for writes and
superiority over reads is observed both in local and remote
cases.
VI. CONCLUSION
For applications that require processing of terabytes size of
datasets, mass storage system is required. Cassandra is one of
the NoSQL database to process huge datasets within short
span of time. We have opted MapReduce solution since a
large variety of problems are easily expressible and it is used
to scale large clusters of machines comprising of thousand
machines
REFERENCES
[1] Deepali Kishor Jadhav, “Bigdata: The New Challenge in Data
Mining,” International Journal Of Innovative Research in Computer
science and Technology (IJIRCST) ISSN:2347-5552, vol.1, Issue-2,
September, 2013
[2] Z.Fadika, M.R.Head, and M.Govindaraju, “Parallel and Distributed
Apporoach for Processing Large-Scale XML Datasets,” in 10th
IEEE/ACM International Conference on Grid Computing, October
2009, pp.5-7.
[3] Apache Hadoop, http://hadoop.apache.org.
[4] J.Dean and S.Ghemwat, “MapReduce: Simplified Data Processing on
Large Clusters,” Communications of the ACM, vol.51, no.1,
pp.107-113, 2008.
[5] A.Lakshman and P.Malik, “Cassandra: structured storage system on a
p2p network”. In Proceedings of the 28th ACM Symposium on
Principles of Distributed Computing, PODC’09, pages 5-5, New
York, NY, USA, 2009, ACM.
[6] Jim Gray and Pat Helland, “The Dangers of Replication and a
Solution”. In Proceedings of the 1996 ACM SIGMOD International
Conference on Management of Data, pages 173-182, 1996.
[7] K.Shvachko, H.Kuang, S.Radia, and R.Chansler, “The Hadoop
Distributed File System”. In Mass Storage Systems and Technologies
(MSST), 2010 IEEE 26th Symposium on, pages 1-10, May 2010.
[8] E.Dede, B.Sendir, P.Kuzlu, J.Hartog, and M.Govindaraju, “An
Evaluation of Cassandra for Hadoop”. In Proceedings of the 2013
IEEE Sixth International Conference on Cloud Computing,
CLOUD’13, pages 494-501, Washington, DC, USA, 2013, IEEE
Computer Society.
[9] Omar Arif Abdul-Rahman, Kento Aida, “Towards Understanding the
Usage Behaviour of Google Cloud Users: The Mice and Elephants
Phenomenon”. In Proceedings of the 2014 IEEE Sixth International
Conference on Cloud Computing Technology and Science
(CloudCom), pages 272-277, Dec 2014, Singapore.
[10] E.Dede, Z.Fadika, J.Hartog, M.Govindaraju, L.Ramakrishnan,
D.Gunter, and R.Cnon, “ Marrisa: MapReduce implementation for
streaming science applications”. In E-Science (e-science), 2012 IEEE
8th International Conference on, pages 1-8, 2012.
[11] P.C.Zikopoulos, C.Eaton, D.deRoos, T.Deutsch, and G.Lapis,
“Understanding Big Data – Analytics for enterprise class Hadoop and
streaming data,” McGraw-Hill, 2012.
[12] R.Abbas, “The Social Implications of Location–Based Services: An
Observation Study of Users,” J.Location-Based Services, vol. 5,
nos.3-4, 2011, pp. 156-181.
[13] K.Michael and M.G.Michael, “The Social and Behavioural
Implications of Location-Based Services,” vol. 5, nos. 3-4, 2011, pp.
121-137.
[14] D.Bollier and C.M.Frestone, “The Promise and Peril of Big Data,”
Aspen Institute, Communications and Society Program, Washington,
DC, USA, 2010.
[15] E.Birney, “The Making of Encode: Lessons for Big Data Projects”,
Nature, vol. 489, no. 7414, 2012, pp. 49-51.
[16] C.Bizer et al., “The Meaning Use of Big Data: Four Perspectives- Four
Challenges,” ACM SIGMOD Record, vol. 40, no. 4, 2012, pp. 56-60.
[17] J.Cohen et al., “Mad Skills: New Analysis Practices for Big Data’”
Proc. VLDB Endowment, vol. 2, no. 2, 2009, pp. 1481-92.
[18] E.Dumbill, “Making Sense of Big Data,” Big Data, vol. 1, no.1, 2013,
pp. 1-2.
[19] J.Manyika et al., “Big Data: The Next Frontier for Innovation,
Competition, and Productivity,” McKinsey Global Institute, Tech.
Rep., 2011.
[20] D.Boyd and K.Crawford, “Critical Questions Big Data: Provocations
for a Cultural, Technological, and Scholarly Phenomenon,”
Information, Communication and Society, vol. 15, no. 5, 2012, pp.
662-79.
ISBN:978-1534910799
www.iaetsd.in
Proceedings of ICAER-2016
©IAETSD 201678

More Related Content

What's hot

Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
Sysfore Technologies
 
Trends in Computer Science and Information Technology
Trends in Computer Science and Information TechnologyTrends in Computer Science and Information Technology
Trends in Computer Science and Information Technology
peertechzpublication
 
Hadoop
HadoopHadoop
Hadoop
Ankit Prasad
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
IJCSIS Research Publications
 
Data Storage and Management project Report
Data Storage and Management project ReportData Storage and Management project Report
Data Storage and Management project Report
Tushar Dalvi
 
Analysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchAnalysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho bench
StevenChike
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
MaulikLakhani
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
Cognizant
 
Dsm project-h base-cassandra
Dsm project-h base-cassandraDsm project-h base-cassandra
Dsm project-h base-cassandra
Shantanu Deshpande
 
Analysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchAnalysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho bench
StevenChike
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
IOSR Journals
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
ijdms
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
inventionjournals
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
cscpconf
 

What's hot (18)

paper
paperpaper
paper
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Trends in Computer Science and Information Technology
Trends in Computer Science and Information TechnologyTrends in Computer Science and Information Technology
Trends in Computer Science and Information Technology
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
 
Data Storage and Management project Report
Data Storage and Management project ReportData Storage and Management project Report
Data Storage and Management project Report
 
Analysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchAnalysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho bench
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Dsm project-h base-cassandra
Dsm project-h base-cassandraDsm project-h base-cassandra
Dsm project-h base-cassandra
 
Analysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchAnalysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho bench
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
 

Viewers also liked

Các hàm thông dụng có sẵn trong php
Các hàm thông dụng có sẵn trong phpCác hàm thông dụng có sẵn trong php
Các hàm thông dụng có sẵn trong php
Son Nguyen
 
Jubakitの紹介
Jubakitの紹介Jubakitの紹介
Jubakitの紹介
JubatusOfficial
 
тот самый длинный день в году
тот самый длинный день в годутот самый длинный день в году
тот самый длинный день в году
Тотемская Библиотека
 
Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016
Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016
Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016
Fatoumata Chérif
 
Impactos de las tics en la educación
Impactos de las tics en la educaciónImpactos de las tics en la educación
Impactos de las tics en la educación
Diego bejarano
 
ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ
ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ
ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ
ΡΟΥΛΑ ΚΟΚΟΡΕΑ
 
El enigma de gaspar
El enigma de gasparEl enigma de gaspar
El enigma de gaspar
sacra_bachoco
 
Versiones de los sistemas operativos
Versiones de los sistemas operativosVersiones de los sistemas operativos
Versiones de los sistemas operativos
Diego bejarano
 
Počítačové sítě I, lekce 6: Techniky přenosu dat
Počítačové sítě I, lekce 6: Techniky přenosu datPočítačové sítě I, lekce 6: Techniky přenosu dat
Počítačové sítě I, lekce 6: Techniky přenosu dat
Jiří Peterka
 
13 función lógica si.error
13 función lógica si.error13 función lógica si.error
13 función lógica si.error
joselyn coello
 
errores de los datos
 errores de los datos  errores de los datos
errores de los datos
Diego bejarano
 
Teoria TPACK
Teoria TPACKTeoria TPACK
Teoria TPACK
Lucelia Braga
 

Viewers also liked (13)

Các hàm thông dụng có sẵn trong php
Các hàm thông dụng có sẵn trong phpCác hàm thông dụng có sẵn trong php
Các hàm thông dụng có sẵn trong php
 
Jubakitの紹介
Jubakitの紹介Jubakitの紹介
Jubakitの紹介
 
тот самый длинный день в году
тот самый длинный день в годутот самый длинный день в году
тот самый длинный день в году
 
Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016
Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016
Revue de presse cérémonie de remise du rapport des consultations -29 juin 2016
 
Impactos de las tics en la educación
Impactos de las tics en la educaciónImpactos de las tics en la educación
Impactos de las tics en la educación
 
ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ
ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ
ΠΡΟΕΛΕΥΣΗ ΚΑΙ ΕΞΕΛΙΞΗ ΤΟΥ ΟΝΟΜΑΤΟΣ ΤΗΣ ΠΟΛΗΣ ΤΟΥ ΑΓΡΙΝΙΟΥ
 
El enigma de gaspar
El enigma de gasparEl enigma de gaspar
El enigma de gaspar
 
Versiones de los sistemas operativos
Versiones de los sistemas operativosVersiones de los sistemas operativos
Versiones de los sistemas operativos
 
Počítačové sítě I, lekce 6: Techniky přenosu dat
Počítačové sítě I, lekce 6: Techniky přenosu datPočítačové sítě I, lekce 6: Techniky přenosu dat
Počítačové sítě I, lekce 6: Techniky přenosu dat
 
Iku
IkuIku
Iku
 
13 función lógica si.error
13 función lógica si.error13 función lógica si.error
13 función lógica si.error
 
errores de los datos
 errores de los datos  errores de los datos
errores de los datos
 
Teoria TPACK
Teoria TPACKTeoria TPACK
Teoria TPACK
 

Similar to Iaetsd mapreduce streaming over cassandra datasets

Big Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. SparkBig Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. Spark
Graisy Biswal
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111NavNeet KuMar
 
MAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningMAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine Learning
Gianvito Siciliano
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
hktripathy
 
B1803031217
B1803031217B1803031217
B1803031217
IOSR Journals
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
Information Security Awareness Group
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
huda2018
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Vipin Batra
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
Stefan Prutianu
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
Stefan Ceriu
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
Debajani Mohanty
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
ijdms
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
nallagangus
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of database
Alireza Kamrani
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
snehal parikh
 
Big Data
Big DataBig Data
Big Data
Kirubaburi R
 
Key aspects of big data storage and its architecture
Key aspects of big data storage and its architectureKey aspects of big data storage and its architecture
Key aspects of big data storage and its architecture
Rahul Chaturvedi
 

Similar to Iaetsd mapreduce streaming over cassandra datasets (20)

Big Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. SparkBig Data: RDBMS vs. Hadoop vs. Spark
Big Data: RDBMS vs. Hadoop vs. Spark
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
 
MAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningMAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine Learning
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
B1803031217
B1803031217B1803031217
B1803031217
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of database
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data
Big DataBig Data
Big Data
 
Key aspects of big data storage and its architecture
Key aspects of big data storage and its architectureKey aspects of big data storage and its architecture
Key aspects of big data storage and its architecture
 

More from Iaetsd Iaetsd

iaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissioniaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmission
Iaetsd Iaetsd
 
iaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdliaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdl
Iaetsd Iaetsd
 
iaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarmiaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarm
Iaetsd Iaetsd
 
iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...
Iaetsd Iaetsd
 
iaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seatiaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seat
Iaetsd Iaetsd
 
iaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan applicationiaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan application
Iaetsd Iaetsd
 
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBSREVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
Iaetsd Iaetsd
 
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
Iaetsd Iaetsd
 
Fabrication of dual power bike
Fabrication of dual power bikeFabrication of dual power bike
Fabrication of dual power bike
Iaetsd Iaetsd
 
Blue brain technology
Blue brain technologyBlue brain technology
Blue brain technology
Iaetsd Iaetsd
 
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
Iaetsd Iaetsd
 
iirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic birdiirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic bird
Iaetsd Iaetsd
 
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growthiirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
Iaetsd Iaetsd
 
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithmiirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
Iaetsd Iaetsd
 
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
Iaetsd Iaetsd
 
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
Iaetsd Iaetsd
 
iaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocoliaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocol
Iaetsd Iaetsd
 
iaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databasesiaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databases
Iaetsd Iaetsd
 
iaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineriesiaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineries
Iaetsd Iaetsd
 
iaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using paraboliciaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using parabolic
Iaetsd Iaetsd
 

More from Iaetsd Iaetsd (20)

iaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissioniaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmission
 
iaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdliaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdl
 
iaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarmiaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarm
 
iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...
 
iaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seatiaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seat
 
iaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan applicationiaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan application
 
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBSREVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
 
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
 
Fabrication of dual power bike
Fabrication of dual power bikeFabrication of dual power bike
Fabrication of dual power bike
 
Blue brain technology
Blue brain technologyBlue brain technology
Blue brain technology
 
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
 
iirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic birdiirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic bird
 
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growthiirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
 
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithmiirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
 
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
 
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
 
iaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocoliaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocol
 
iaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databasesiaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databases
 
iaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineriesiaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineries
 
iaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using paraboliciaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using parabolic
 

Recently uploaded

Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
Kamal Acharya
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
Kamal Acharya
 

Recently uploaded (20)

Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
 

Iaetsd mapreduce streaming over cassandra datasets

  • 1. 1  Abstract— In recent years, data is being generated from different sources and storing it and processing them is a great issue. This data which cannot be handled using conventional system is termed as Bigdata. Bigdata is a term that describes the huge volume of data both structured and unstructured data. As unstructured data keeps on evolving, there is a demand for processing pipeline which combines NoSQL and Bigdata processing platform such as MapReduce. Cassandra is a type of NoSQL database. The suitability of a given NoSQL depends on the problem it must solve. In this paper, we are going to process the Denmark’s traffic datasets using variable parameters and store it in Cassandra datastore which will lend a hand to the transportation department to monitor and control the traffic. Index Terms— Bigdata, Cassandra, Cloud, MapReduce I. INTRODUCTION Bigdata [1], [18], [19] is data that exceeds the processing capacity of conventional database system. The data is too big, moves too fast, or does not fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it. Bigdata has become viable as cost-effective approaches have emerged to tame the volume, velocity and variability [11] of massive data. Within this data lie valuable patterns and information, previously hidden because of the amount of work required to extract them. To leading corporations, such as Google, this power has been in reach for some time, but at fantastic cost. Today’s commodity hardware, cloud architectures and open source software bring bigdata processing into the reach of the less well-resourced. Bigdata processing is eminently feasible for even the small garage setups, who can cheaply rent server time in the cloud. The value of bigdata to an organization falls into two categories: analytical use and enabling new products. Bigdata analytics can reveal insights hidden previously by data too costly to process such as peer influence among customers, revealed by analyzing shopper’s transaction and social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports. The emergence of bigdata into the enterprise brings with it necessary counterpart agility. The three V’s of volume, velocity and variety are commonly used to characterize different aspects of bigdata. Volume: The benefit gained from the ability to process large amounts of information in the main attraction of bigdata Analytics. Assuming that the volumes of data are larger than those conventional relational database infrastructures can cope with, processing options break down broadly into a choice between massively parallel processing architectures [2] – data warehouses or databases such as Green Plum – and Apache Hadoop-based [3] solutions. Hadoop’s MapReduce [4] involves distributing a dataset among multiple servers and operating on data: the “map” stage. The partial results are then recombined: the “reduce” stage. Velocity: The importance of data’s velocity- the increasing rate at which data flows into an organization- has followed a similar pattern to that of volume. Industry terminology for such fast-moving data tends to be either “streaming data” or “complex event processing”. There are two main reasons to consider streaming processing. The first is when the input data are too fast to store in their entirely: in order to keep storage requirements practical, some level of analysis [17] must occur as the data streams in. The second reason to consider streaming is where the application mandates immediate response of the data. It’s not just about input data. The velocity of a system’s outputs can matter too. Variety: Rarely does data present itself in a form perfectly ordered and ready for processing. A common theme in bigdata systems is that the source data is diverse, and doesn’t fall into neat relational structures. It could be text from social networks, image data, a raw feed directly from a sensor source. A common use of bigdata processing is to take unstructured data and extract ordered meaning, for consumption either by humans or as a structured input to an application. Apache Cassandra [5] is a highly scalable, high-performance distributed database designed to handle large amounts of both structured and unstructured data across many commodity servers, providing high availability [6] with no single point of failure. It is a type of NoSQL database. The primary objective of a NoSQL database is to have simplicity of design, horizontal scaling and fine control over availability. In this paper, we present MapReduce streaming [10] over Cassandra datasets. Apache Cassandra is an open source, distributed and decentralized storage system, for managing very large amounts of unstructured data spread across the world. Initially we retrieve the input data from Cassandra store. The dataset taken under evaluation is the Denmark’s traffic dataset. We use Apache Cassandra in our analysis combined with MapReduce solution for processing the datasets. We introduce the performance implications of MAPREDUCE STREAMING OVER CASSANDRA DATASETS K S Archana#1 and Dr. P Ezhumalai*2 # krishganesh70@gmail.com, * hod.cse@rmd.ac.in #1 M.E, Department of Computer Science, R.M.D Engineering College, Chennai, India *2 B.E, M.Tech, Ph.D, Head of the Department, Computer Science, R.M.D Engineering College, Chennai, India ISBN:978-1534910799 www.iaetsd.in Proceedings of ICAER-2016 ©IAETSD 201675
  • 2. 2 processing data directly from the Cassandra servers is high when compared to the usage of Apache Hadoop [7]. II. BACKGROUND A. Cassandra Cassandra is a non-relational and column oriented, distributed database. It was originally developed by Facebook. It is now an open source Apache project. Cassandra was designed to fulfill the storage needs of the Inbox search problem. Cassandra is designed to store large datasets over a set of commodity machines by using a peer-to-peer cluster structure to promote horizontal scalability. In the column-oriented data model of Cassandra, a column is the smallest component of data. Columns associated with a certain key constitute a row. Each row can contain any number of columns. A collection of row forms a column family, which is similar to tables in relational databases. Records in the column families are stored in sorted order by row keys, in separate files. The keyspace congregates one or more column families in the application, similar to a schema in a relational database. Interesting aspects of the Cassandra framework include independence from any additional file systems like HDFS, scalability, replication support for fault tolerance, balanced data partitioning, and MapReduce support with a Hadoop plug-in [8]. B. Cassandra Data Model A table in Cassandra is a distributed multi dimensional map indexed by a key. The value is an object which is highly structured. The row key in a table is a string with no size restrictions, although typically 16 to 36 bytes long. Every operation under a single row key is atomic per replica no matter how many columns are being read or written into. Columns are grouped together into sets called column families very much similar to what happens in the Bigtable system. Cassandra exposes two kinds of column families, Simple and Super column families. Super column families can be visualized as a column family within a column family. Furthermore, applications can specify the sort order of columns within a Super column or Simple column family. The system allows column to be sorted either by time or by name. time sorting of column is exploited by applications like Inbox Search where the results are always displayed in time sorted order. Any column within a column family is accessed using the convention column_family: column and any column within a column family that is of type super is accessed using the convention column_family: super_column: column. Typically applications use a dedicated Cassandra cluster and manage them as part of their service. Although the system supports the notion of multiple tables all deployments have only one table in their schema. C. System Architecture of Cassandra The architecture of a storage system that needs to operate in a production setting is complex. In addition to the actual data persistence component, the system needs to have following characteristics: scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request routing, system monitoring [14] and alarming, and configuration management. The core distributed technique used in Cassandra is partitioning, replication, membership, failure handling and scaling. All these modules work in synchrony to handle read/write requests. Typically a read/write request for a key gets routed to any node in the Cassandra cluster. The node then determines the replicas for this particular key. For writes, the system routes the requests to the replicas and waits for a quorum of replicas to acknowledge the completion of the writes. For reads, based on the consistency guarantees required by the client, the system either routes the request to the closest replica or routes the request to all replicas and waits for a quorum of responses. III. MAPREDUCE PROGRAMMING MODEL Created at Google in response to the problem of creating web search indexes, the MapReduce framework is the powerhouse behind most of today’s Big Data processing [15], [16], [20]. The important innovation of MapReduce is the ability to take a query over a dataset, divide it, and run it in parallel over multiple nodes. Distributing the computation solves the issue of data too large to fit onto a single machine. For that computation to take place, each server must have access to the data. MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows the programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation process many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day. Over the past five years, the authors and many others at Google have implemented hundreds of special-purpose computations that process large amounts of raw data, such as crawled documents, web request logs, etc., to compute various kinds of derived data, such as inverted indices, various representations of the graph structure of web documents, etc. ISBN:978-1534910799 www.iaetsd.in Proceedings of ICAER-2016 ©IAETSD 201676
  • 3. 3 IV. GOOGLE CLOUD PLATFORM Google Cloud [9] Platform is a cloud computing platform by Google that offers hosting on the same supporting infrastructure that Google uses internally for end-user products like Google Search and You Tube. Cloud Platform provides developer products to build a range of programs from simple websites to complex applications. Google Cloud Platform is a part of a suite of enterprise solution from Google for Work and provides a set of modular cloud-based services with a host of development tools. For example, hosting and computing, cloud storage, data storage, translations APIs and prediction APIs. The Google Cloud Platform is composed of a family of products, each including a web interface, a command-line tool and a REST API. Google App Engine is a Platform as a service for sandboxed web applications. App Engine offers automatic scaling with resources increased automatically to handle server load. Google Compute Engine is the Infrastructure as a Service component of the Google Cloud Platform that enables users to launch virtual machines on demand. Google Cloud Storage is an online storage service for files. Google Cloud Datastore is a fully managed, highly available NoSQL data storage for non-relational data that includes a REST API. V. IMPLEMENTATION A. Creating Virtual Machine Instances An instance is a virtual machine hosted on Google’s infrastructure. We can create an instance by using the Google Cloud Platform Console or using the gcloud command-line tool. Create Cloud Platform Console Project. Enable billing for your project. Create an instance named Cassandra. Choose windows Server 2012 R2. After creation of the instance connect to your instance. Each instance belongs to a Google Cloud Platform Console project, and a project can have one or more instances. When we create an instance in a project, we specify the zone, operating system, and machine type of that instance. When you delete an instance, it is removed from the project. B. Google Compute Engine Google Compute Engine delivers virtual machines running in Google’s innovative data centers and worldwide fiber network. Compute engine’s tooling and workflow support enable scaling from single instances to global instances, load balancing cloud computing. Compute Engine’s VMs boot quickly, come with persistent disk storage, and deliver consistent performance. Our virtual servers are available in many configurations including predefined sizes or the option to create Custom Machine Types optimized for your specific needs. Flexible pricing and automatic sustained use discounts make Compute Engine the leader in price/performance. Compute Engine offers predefined virtual machine configurations for every need from micro to instances with 32 vCPUs or 208 GB of memory, in standard, high memory, and high CPU. C. Checking the Instance Status When we first create an instance, we should check the instance status to see if it is running before we can expect it to respond to requests. It can take a couple seconds before our instance is fully up and running after the initial request. We can also check the status of an instance at any time after instance creation. Instances can be in the following states: Provisioning- Resources are being reserved for the instance. The instance is not running yet. Staging- Resources have been acquired and instance is being prepared for launch. Running- The instance is booting up or running. Stopping- The instance is being stopped either due to a failure, or the instance being shut down. This is a temporary status and the instance will move to terminated. Terminated- The instance was shut down or encountered a failure, either through the API or from inside the guest. D. Accessing the Instance The creator of an instance has full root privileges on the instance. On a windows instance, the creator can use the Cloud Platform Console to generate a username and password. Another use can also be added to access the instance by the administrator. There are some tools to manage the instances. To create and manage instances, we can use a variety of tools, including the Google Cloud Platform Console, the gcloud command-line, and the REST API. To perform advanced configuration, connect to the instance using Secure Shell (SSH) for Linux instance (or) Remote Desktop Protocol (RDP) for Windows. By default, each Compute Engine instance has a small root persistent disk that contains the operating system. When application running on our instance require more storage space, we can add additional storage options to our Cassandra instance. E. Processing the Dataset A project can have up to five networks, and each Compute Engine instance belongs to one network. Instances in the same network communicate with each other through a local area network protocol. An instance uses the internet to communicate with any machine, virtual or physical, outside of its own network. Now the Denmark’s traffic dataset is being loaded to the Cassandra server. Java code is written to connect the cluster to the input datasets. The dataset consists of vehicle id, average measured time, average speed, external id, time stamp, vehicle count. Based on the average speed of the vehicle at particular location [12], [13] for a particular time stamp we are going to analyze the traffic. This analyzed data would be helpful to monitor and control the traffic in the city. Based on the results we would frame the time limit on the traffic signals to have a safe journey on the road. To process the datasets we use MapReduce implementation. Final results are retrieved from the Cassandra datastore. To reduce the workload we are not using Hadoop Distributed File System. ISBN:978-1534910799 www.iaetsd.in Proceedings of ICAER-2016 ©IAETSD 201677
  • 4. 4 V. PERFORMANCE RESULTS We are interested in data locality experiments as it is a core fact of the MapReduce model. Data locality is accomplished by placing the input splits on the local disk of computing nodes. However, when using a distributed database a record might be sitting on any node in the cluster. In order to force data locality, the processing tasks of a compute node should be chosen based on the range of records stored in the local machine database instance. Since we are using Google Cloud Platform Console our storage gets expanded when needed according to usage. Cassandra is optimized for writes and superiority over reads is observed both in local and remote cases. VI. CONCLUSION For applications that require processing of terabytes size of datasets, mass storage system is required. Cassandra is one of the NoSQL database to process huge datasets within short span of time. We have opted MapReduce solution since a large variety of problems are easily expressible and it is used to scale large clusters of machines comprising of thousand machines REFERENCES [1] Deepali Kishor Jadhav, “Bigdata: The New Challenge in Data Mining,” International Journal Of Innovative Research in Computer science and Technology (IJIRCST) ISSN:2347-5552, vol.1, Issue-2, September, 2013 [2] Z.Fadika, M.R.Head, and M.Govindaraju, “Parallel and Distributed Apporoach for Processing Large-Scale XML Datasets,” in 10th IEEE/ACM International Conference on Grid Computing, October 2009, pp.5-7. [3] Apache Hadoop, http://hadoop.apache.org. [4] J.Dean and S.Ghemwat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, vol.51, no.1, pp.107-113, 2008. [5] A.Lakshman and P.Malik, “Cassandra: structured storage system on a p2p network”. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, PODC’09, pages 5-5, New York, NY, USA, 2009, ACM. [6] Jim Gray and Pat Helland, “The Dangers of Replication and a Solution”. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 173-182, 1996. [7] K.Shvachko, H.Kuang, S.Radia, and R.Chansler, “The Hadoop Distributed File System”. In Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pages 1-10, May 2010. [8] E.Dede, B.Sendir, P.Kuzlu, J.Hartog, and M.Govindaraju, “An Evaluation of Cassandra for Hadoop”. In Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, CLOUD’13, pages 494-501, Washington, DC, USA, 2013, IEEE Computer Society. [9] Omar Arif Abdul-Rahman, Kento Aida, “Towards Understanding the Usage Behaviour of Google Cloud Users: The Mice and Elephants Phenomenon”. In Proceedings of the 2014 IEEE Sixth International Conference on Cloud Computing Technology and Science (CloudCom), pages 272-277, Dec 2014, Singapore. [10] E.Dede, Z.Fadika, J.Hartog, M.Govindaraju, L.Ramakrishnan, D.Gunter, and R.Cnon, “ Marrisa: MapReduce implementation for streaming science applications”. In E-Science (e-science), 2012 IEEE 8th International Conference on, pages 1-8, 2012. [11] P.C.Zikopoulos, C.Eaton, D.deRoos, T.Deutsch, and G.Lapis, “Understanding Big Data – Analytics for enterprise class Hadoop and streaming data,” McGraw-Hill, 2012. [12] R.Abbas, “The Social Implications of Location–Based Services: An Observation Study of Users,” J.Location-Based Services, vol. 5, nos.3-4, 2011, pp. 156-181. [13] K.Michael and M.G.Michael, “The Social and Behavioural Implications of Location-Based Services,” vol. 5, nos. 3-4, 2011, pp. 121-137. [14] D.Bollier and C.M.Frestone, “The Promise and Peril of Big Data,” Aspen Institute, Communications and Society Program, Washington, DC, USA, 2010. [15] E.Birney, “The Making of Encode: Lessons for Big Data Projects”, Nature, vol. 489, no. 7414, 2012, pp. 49-51. [16] C.Bizer et al., “The Meaning Use of Big Data: Four Perspectives- Four Challenges,” ACM SIGMOD Record, vol. 40, no. 4, 2012, pp. 56-60. [17] J.Cohen et al., “Mad Skills: New Analysis Practices for Big Data’” Proc. VLDB Endowment, vol. 2, no. 2, 2009, pp. 1481-92. [18] E.Dumbill, “Making Sense of Big Data,” Big Data, vol. 1, no.1, 2013, pp. 1-2. [19] J.Manyika et al., “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” McKinsey Global Institute, Tech. Rep., 2011. [20] D.Boyd and K.Crawford, “Critical Questions Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon,” Information, Communication and Society, vol. 15, no. 5, 2012, pp. 662-79. ISBN:978-1534910799 www.iaetsd.in Proceedings of ICAER-2016 ©IAETSD 201678