SlideShare a Scribd company logo
Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
Vol. 4, No. 1, March 2016, pp. 74~80
ISSN: 2089-3272, DOI: 10.11591/ijeei.v4i1.195  74
Received October 9, 2015; Revised January 6, 2016; Accepted January 20, 2016
Big Data-Survey
PSG Aruna Sri*
1
, Anusha M
2
1
Department of Electronics and Computer Engineering, K L University,
2
Research Scholar, Dept of CSE, K L UNIVERSITY
Greenfields, Vaddeswaram, Guntur District, Andhra Pradesh 522502
*Corresponding author, e-mail: arunasri_2012@kluniversity.in
1
, anushaaa9@kluniversity.in
2
Abstract
Big data is the term for any gathering of information sets, so expensive and complex, that it gets
to be hard to process for utilizing customary information handling applications. The difficulties incorporate
investigation, catch, duration, inquiry, sharing, stockpiling, Exchange, perception, and protection
infringement. To reduce spot business patterns, anticipate diseases, conflict etc., we require bigger data
sets when compared with the smaller data sets. Enormous information is hard to work with utilizing most
social database administration frameworks and desktop measurements and perception bundles, needing
rather enormously parallel programming running on tens, hundreds, or even a large number of servers. In
this paper there was an observation on Hadoop architecture, different tools used for big data and its
security issues.
Keywords: Big data, Hadoop, Software tools, Map Reduce
1. Introduction
We make 2.5 quintillion bytes of information [1] - so much that 90% of the data on the
planet today has been made in the last two years alone. This data originates from all around:
sensors used to accumulate atmosphere data, posts to social media sites, digital pictures and
videos, and cell phone GPS signal. This enormous amount of the data is known as "Big data".
Big data is a catchphrase, or motto, utilizes to describe a massive volume of both structured and
unstructured data that is so huge that it's complicated to process using conventional database
and software procedures. In most project circumstances the information is too big or it shifts too
quickly or it surpasses existing processing ability.
Big data has the potential to help organizations to improve operations and make faster
by taking more intelligent decisions. Now-a-days, Big Data is the term which finds to be normal
in IT businesses. As there was enormous information in the industry although there is nothing
before big data which comes into imagine. Big data is really an advancing term that illustrates
any huge amount of organized, semi organized information that can possibly be extracting for
data. Although big data doesn't refer to any particular quantity, so this term is often utilized
when talking about petabytes and exabytes of data Big data is a comprehensive term for
expansive accumulation of the data sets so this large and complex that it gets to be
troublesome to work with conventional data processing applications. At the point when
managing bigger datasets, organizations face challenges in creating and managing big data. No
standard tools and procedures for searching and analyzing large data sets in business analytics
A case of Big data may be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data
comprising of billions to trillions of records of a huge number of individuals all from distinctive
sources for example agreements, web, moveable data. The data is commonly approximately
organized data that is frequently fragmented also, inaccessible. The problems faced by big data
are analyzing, capturing, searching, sharing, storing, transferring, visualization and privacy
abuse. Larger data sets is needed in order to prevent diseases, combat crime, spot business
trends and so on. Because of large information sets in these area researchers identify
limitations frequently like meteorology, genomics, connectomics, complex physics simulations,
biological, ecological research, and funding and trade information. Data sets increased their size
due to collecting data from sensing portable devices, aerial sensory technology, software logs,
cameras, microphones, radio-frequency identification readers, and wireless sensor networks.
In March 2012 [2], the Obama administration announced the big data research and
development initiative. The leading IT companies, such as SAG, Oracle, IBM, Microsoft, SAP
 ISSN: 2089-3272
IJEEI Vol. 4, No. 1, March 2016 : 74 – 80
75
and HP, have spent more than $15 billion on buying data management and analytics software.
Big data defined as far back as 2001, industry expert Doug Laney (right now with Gartner)
expressed the standard meaning of big data as the 5 Vs of big data: volume, velocity, variety,
veracity and value, as shown Figure 1.
Figure 1. Five (5) V’s factor of big data
Volume: At present big scale of systems are flooded with constantly increasing information,
simply growing terabytes or even petabytes of data.
Velocity: Information is flowing at unique speed and should be deal with a sensible way. In real
time for many organizations it is difficult to deal with RFID tag, sensors and smart metering data.
Variety: Organized and unorganized information are producing a variety of data types by
making it feasible to search novel approaches, while analyzing these information collectively,
prediction might be attained as data flow into the organization.
Veracity: Identifying and verifying inconsistent information is significant, to accomplish faithful
study. Creating faith in big data is a big challenge to manage even more variety of data is
available.
Value: It is to be derived from big data. There is no reason for building the capacity to store and
manage, if unable to get the value from data.
One of the capable remarks on the advances of the arrangement with the Big Data is Hadoop.
2. Hadoop
Hadoop was made by Doug Cutting and Mike Cafarella in 2005. Doug Cutting, who was working
at Yahoo! at the time, named it after his kid's toy elephant.
Hadoop system is shown Figure 2 and consists [3]:
Reliable: this software can handle both hardware and software failures.
Scalable: Designed for gigantic size of processors, memory, and local appended capacity
Distributed: Map reduce recommends parallel programming model and provides concept of
replication
IJEEI ISSN: 2089-3272 
Big Data-Survey (PSG Aruna Sri)
76
Figure 2. Hadoop system [1]
Hadoop is open-source programming that allows loyal, flexible, expressed figuring on
groups of reserved servers. That utilization the Map-Reduce framework presented by Google by
utilizing the idea of map and reduce functions that remarkable utilized in Functional
Programming. In spite of the fact that the Hadoop structure is composed in Java, it permits
designers to send custom-composed projects coded in Java or some other dialect to process
information in a parallel manner over hundreds or a great many thing servers. It is streamlined
for touching read requests (streaming reads), where transforming incorporates of checking all
the information. Contingent upon the intricacy of the procedure and the volume of information,
reaction time can change from minutes to hours. While Hadoop can forms information quick, so
its key focal point is its monstrous adaptability. Hadoop is right now being utilized for file web
seeks, email spam location, proposal motors, expectation in budgetary administrations, genome
control in life sciences, furthermore, for investigation of unstructured information, for example,
log, content, and click stream. While a considerable lot of these applications could indeed be
actualized in a social database (RDBMS) figure 2, the primary center of the Hadoop system is
practically not the same as a RDBMS. The accompanying examines some of these distinctions
Hadoop is especially helpful when:
 For handling of Complex data.
 Need of conversion of unstructured information to structured information
 Usage of SQL queries
 Recursive calculations are very large
 Complex geo-spatial investigation or genome sequencing
 Machine learning
 Data sets are so substantial it couldn't be possible fit into database RAM, plates, or require
an excess of centers (10's of TB up to PB)
 Data worth does not defend cost of steady ongoing accessibility, for example, files or
exceptional interest information, which can be moved to Hadoop and stay accessible at
lower expense
 Results are not required continuously
 Fault resistance is basic
 Significant custom coding would be obliged to handle employment booking
Hadoop was motivated by Google's Map Reduce, a product structure in which an
application is separated into various little parts. Any of these parts (additionally called sections
or squares) can be run on any hub in the group. Doug Cutting, Hadoop's maker, named the
system after his youngster's full toy elephant. The current Apache Hadoop biological system
comprises of the Hadoop bit, Map Reduce, the Hadoop circulated record framework (HDFS)
what's more, various related activities, for example, Apache Hive, HBase and Zookeeper. The
Hadoop structure is utilized by significant players including Google, Yahoo and IBM, generally
for applications including web search tools and promoting. The favored working frameworks are
Windows and Linux be that as it may Hadoop can likewise work with BSD and OS X.
 ISSN: 2089-3272
IJEEI Vol. 4, No. 1, March 2016 : 74 – 80
77
A distributed file system framework is a customer/server-based application that permits
customers to get to and process information put away on the server as though it were all alone
PC. At the point when a client gets to a document on the server, the server sends the client a
duplicate of the document, which is stored on the client's PC while the information is being
prepared and is then come back to the server. Preferably, a convey record framework arranges
document and registry administrations of individual servers into a worldwide catalog in such a
way that remote information access is not area particular however is indistinguishable from any
customer. All records are available to all clients of the worldwide record framework and
association is progressive what more, registry based is. Since more than one customer may get
to the same information all the while, the server must have a system in spot, (for example,
keeping up data about the times of access) to arrange overhauls so that the customer
dependably gets the most current adaptation of information and that information clashes don't
emerge. Dispersed record frameworks ordinarily utilize record or database replication
(conveying duplicates of information on different servers) to ensure against information access
failures [4]. Sun Microsystems' Network File System (NFS), Novell NetWare, Microsoft's
Distributed File Framework, and IBM/Transarc's DFS are a few samples of distributed file
systems framework.
3. HDFS
Hadoop frame work consists the Hadoop Distributed File System (HDFS) [4]. HDFS is
planned and improved to store information more than a lot of ease equipment in an
appropriated manner, as shown in Figure 3.
Figure 3. HDFS architecture
The structure of HDFS is master/slave. HDFS cluster consists of one Name Node, a
master server that manages the classification system namespace and regulates access to files
by clients. In addition, there square measure variety of information nodes, sometimes one per
node within the cluster, that manage storage hooked up to the nodes that they run on. HDFS
exposes a classification system namespace and permits user knowledge to be hold on in files.
Internally, a file is split into one or a lot of blocks and these blocks square measure hold on
during a set of information Nodes. The Name Node executes classification system namespace
operations like opening, closing, and renaming files and directories. It conjointly determines the
mapping of blocks to knowledge Nodes. The info Nodes square measure to responsibility for
serving scans and write requests from the file system’s clients. The info Nodes conjointly
performs block formation, deletion, and replication upon instruction from the Name Node.
The Name Node and knowledge Node square measure items of package designed to
run on goods machines. These machines generally run a GNU/Linux software system (OS).
HDFS is constructed exploitation the Java language; any machine that supports Java will run
the Name Node or the Data Node package. Usage of the extremely moveable Java language
IJEEI ISSN: 2089-3272 
Big Data-Survey (PSG Aruna Sri)
78
implies that HDFS will be deployed on a large variety of machines. A typical readying includes a
dedicated machine that runs solely the Name Node package. Every of the opposite machines
within the cluster runs one instance of the Data Node package. The design doesn't preclude
running multiple knowledge Nodes on an equivalent machine however during a real readying
that's seldom the case. The existence of Name Node during a cluster greatly simplifies the
design of the system. The Name Node is that the intermediary and repository for all HDFS data.
The system is intended to flow the user knowledge through the Name Node.
The base Apache Hadoop structure is made out of the taking after modules: Hadoop
Common-contains libraries and utilities required by other Hadoop modules. Hadoop Distributed
File System (HDFS) - a appropriated document framework that stores information on product
machines, giving high total data transfer capacity over the group. Hadoop Map Reduce – a
programming model for huge scale information preparing. All the modules in Hadoop are
planned with a basic presumption that equipment disappointments (of individual machines, or
racks of machines) are normal also, consequently ought to be naturally taken care of in
programming by the structure. Apache Hadoop’s Map Reduce and HDFS parts initially got
individually from Google's Map Reduce and Google File System (GFS) papers."Hadoop"
frequently alludes not to simply the base Hadoop bundle yet rather to the Hadoop Ecosystem
fig.4 which incorporates the greater part of the extra programming bundles that can be
introduced on top of or nearby Hadoop, for example, Apache Hive, Apache Pig and A HBase.
4. Map Reduce Framework
Map Reduce as shown in Figure 4 is a product system for appropriated transforming of
Big data sets on PC groups [5]. It is first grown by Google. Map Reduce is planned to
encourage also, improve the preparing of incomprehensible measures of information in parallel
on extensive bunches of merchandise equipment in a solid, issue tolerant way. Map Reduce is
the key calculation that the Hadoop Map Reduce motor uses to circulate work around a bunch.
Commonplace Hadoop bunch coordinates Map Reduce and HFDS layer. In Map Reduce layer
job tracker assigns tasks to the task tracker. Master node job tracker also allots tasks to the
slave node task tracker figure.
Figure 4. Map reduce is based on the Master-Slave architecture
Master node contains -
 Job tracker node (Map Reduce layer)
 Task tracker node (Map Reduce layer)
 Name node (HFDS layer)
 Data node (HFDS layer)
Multiple slave nodes contain -
 Task tracker node (Map Reduce layer)
 Data node (HFDS layer)
 Map Reduce layer has job and task tracker nodes
 HFDS layer has name and data nodes
A. Map Reduce core functionality (I):
Map & Reduce stage plays an significant role in map reduce core functionality.
 ISSN: 2089-3272
IJEEI Vol. 4, No. 1, March 2016 : 74 – 80
79
Map stage: In Map step, master node takes consideration of large problem input and divided
into smaller problems and allotted to worker nodes. These nodes process the smaller problems
and return to the master node.
• Map (key1, value)  list<key2, value2>
Reduce stage: In this Reduce stage, Master node takes the response from the sub problems
and joins them in a predefined manner to get the output to original problem, as shown in
Figure 5.
• Reduce (key2, list < value2 >)  list
Figure 5. Reduce stage
5. PIG
Pigs [6] was at first shaped at Yahoo! to permit individuals utilizing Hadoop to
concentrate all the more on examining substantial information sets and invest less moment of
time is needed to compose mapper and reducer programs. Pig is comprised of two segments:
the first is the is called Pig Latin and the second is a runtime situation where Pig Latin programs
are executed.
6. HIVE
Apache Hive [7] was initially developed by face book. It has data warehouse structure
built on top of hadoop for analysis and inquiry of data. By default, Hive stores metadata in an
installed Apache Derby Database and other customer/server databases like MySQL can
alternatively be used. Right now, there are four document configurations upheld in Hive, which
are TEXTFILE SEQUENCEFILE, ORC and RCFILE.
7. HBase
HBase [8] is a section arranged database administration framework that keeps running
on top of HDFS. HBase applications are composed in Java much like an average Map Reduce
application. A HBase framework consists an arrangement of tables. Table Consists of rows and
columns like a conventional database.
8. Issues
Although a considerable measure of research is going on big data yet at the same time.
Several ideas are still to be investigated. Scientists would attempt to upgrade security stage to
enhance capacity of programming to discover propelled dangers, respond in like manner and
would create preventive measures for future. Specialists would attempt to enhance quality and
dependability of security framework. A few analysts are wanting to taken up information
accumulation, pretreatment, incorporation, Map Reduce and investigation utilizing machine
learning strategies. They would utilize the outcomes for securing and actualizing preventive
measures from dangers to big business information. Specialists would attempt to outline the
meet the creation needs of endeavors for growing high caliber item by applying efforts to
IJEEI ISSN: 2089-3272 
Big Data-Survey (PSG Aruna Sri)
80
establish safety with the assistance of big data Analytics with Hadoop. A few specialists are
utilizing systems administration checking tools like Packet pig, Mahout and so on to Improve the
security levels. Targeted threats will be analyzed by using hadoop cluster.
9. Conclusion
Big data is going to keep developing amid the following years, and every information
researcher will need to oversee considerably more measure of information will be more
different, bigger, and speedier. We talked about a few experiences about the theme, and what
we consider are the fundamental concerns and primary issues for what’s to come. Big data is
turning into new final boundary for experimental information research and for business
applications. Everyone is warmly welcomed to take part in this fearless trip.
References
[1] Ms Vibhavari Chavan et al. “Survey Paper on Big Data”. International Journal of Computer Science
and Information Technologies. 2014; 5 (6), ISSN: 0975-9646.
[2] Michael R Lyu. “Service-generated Big Data and Big Data-as-a-Service”. Internetware. 2013.
[3] Suman Arora et al. “Survey Paper on Scheduling in Hadoop”. International Journal of Advanced
Research in Computer Science and Software Engineering. 2014; 4.
[4] Dhruba Borthakur. “The Hadoop Distributed File System: Architecture and Design”. The Apache
Software Foundation. 2007.
[5] Yaxiong Zhao et al. “Dache: A Data AwareCaching for Big-Data Applications Using the MapReduce
Framework”. Tsinghua science and technology. 2014; 19(1), ISSN: l1007-0214l.
[6] Apache Pig. Available at http://pig.apache.org
[7] Apache Hive. Available at http://hive.apache.org
[8] Apache HBase. Available at http://hbase.apache.org

More Related Content

What's hot

ANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEWANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEW
International Journal of Technical Research & Application
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
IJRESJOURNAL
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security Alliance
Information Security Awareness Group
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
Information Security Awareness Group
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social media
Supriya Radhakrishna
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
IJSRD
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
nandhiniarumugam619
 
Big data
Big dataBig data
Big data
Big dataBig data
Big data
Nimish Kochhar
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
nabati
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data
Shallote Dsouza
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
Indu Khemchandani
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
Pankajkumar496281
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
Dux Chandegra
 
Big data mining
Big data miningBig data mining
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
bobosenthil
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Nishant Gandhi
 
IRJET- Big Data: A Study
IRJET-  	  Big Data: A StudyIRJET-  	  Big Data: A Study
IRJET- Big Data: A Study
IRJET Journal
 

What's hot (20)

1
11
1
 
ANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEWANALYTICS OF DATA USING HADOOP-A REVIEW
ANALYTICS OF DATA USING HADOOP-A REVIEW
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security Alliance
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social media
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 
Big data mining
Big data miningBig data mining
Big data mining
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
IRJET- Big Data: A Study
IRJET-  	  Big Data: A StudyIRJET-  	  Big Data: A Study
IRJET- Big Data: A Study
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 

Similar to Big Data-Survey

DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond HillDOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
ClaraZara1
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
IRJET Journal
 
Big Data Hadoop
Big Data HadoopBig Data Hadoop
Big Data Hadoop
Techsparks
 
Big data
Big dataBig data
Big data
Nimish Kochhar
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
IJERA Editor
 
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAREAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
ijp2p
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
ijcisjournal
 
How to tackle big data from a security
How to tackle big data from a securityHow to tackle big data from a security
How to tackle big data from a security
Tyrone Systems
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
IJSRD
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
Shahbaz Anjam
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data Analytics
Audrey Britton
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
Valarmathi V
 
Big Data
Big DataBig Data
Big Data
Faisal Ahmed
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
Shree M.L.Kakadiya MCA mahila college, Amreli
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
 
Big data
Big dataBig data
Big data
Mohamed Salman
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
rajsharma159890
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
Rajesh Kumar
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
Sysfore Technologies
 

Similar to Big Data-Survey (20)

DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond HillDOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
DOCUMENT SELECTION USING MAPREDUCE Yenumula B Reddy and Desmond Hill
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
Big Data Hadoop
Big Data HadoopBig Data Hadoop
Big Data Hadoop
 
Big data
Big dataBig data
Big data
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAREAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
How to tackle big data from a security
How to tackle big data from a securityHow to tackle big data from a security
How to tackle big data from a security
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data Analytics
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
Big Data
Big DataBig Data
Big Data
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Big data
Big dataBig data
Big data
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 

More from ijeei-iaes

An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
ijeei-iaes
 
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and ControlDevelopment of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
ijeei-iaes
 
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best UpsurgeAnalysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
ijeei-iaes
 
Design for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-DistanceDesign for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-Distance
ijeei-iaes
 
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
ijeei-iaes
 
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
ijeei-iaes
 
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A ReviewMitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
ijeei-iaes
 
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
ijeei-iaes
 
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
ijeei-iaes
 
Intelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy ConsumptionIntelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy Consumption
ijeei-iaes
 
Analysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic ToolsAnalysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic Tools
ijeei-iaes
 
A Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur ClassificationA Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur Classification
ijeei-iaes
 
Computing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of GrapheneComputing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of Graphene
ijeei-iaes
 
A Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine StabilityA Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine Stability
ijeei-iaes
 
Fuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane StructureFuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane Structure
ijeei-iaes
 
Site Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for LagosSite Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for Lagos
ijeei-iaes
 
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
ijeei-iaes
 
Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...
ijeei-iaes
 
A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...
ijeei-iaes
 
Wireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation DetectionWireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation Detection
ijeei-iaes
 

More from ijeei-iaes (20)

An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data ClusteringAn Heterogeneous Population-Based Genetic Algorithm for Data Clustering
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering
 
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and ControlDevelopment of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
 
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best UpsurgeAnalysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
 
Design for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-DistanceDesign for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-Distance
 
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
 
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
 
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A ReviewMitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
 
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
 
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
 
Intelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy ConsumptionIntelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy Consumption
 
Analysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic ToolsAnalysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic Tools
 
A Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur ClassificationA Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur Classification
 
Computing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of GrapheneComputing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of Graphene
 
A Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine StabilityA Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine Stability
 
Fuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane StructureFuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane Structure
 
Site Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for LagosSite Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for Lagos
 
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
 
Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...
 
A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...
 
Wireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation DetectionWireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation Detection
 

Recently uploaded

Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
Kamal Acharya
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 

Recently uploaded (20)

Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 

Big Data-Survey

  • 1. Indonesian Journal of Electrical Engineering and Informatics (IJEEI) Vol. 4, No. 1, March 2016, pp. 74~80 ISSN: 2089-3272, DOI: 10.11591/ijeei.v4i1.195  74 Received October 9, 2015; Revised January 6, 2016; Accepted January 20, 2016 Big Data-Survey PSG Aruna Sri* 1 , Anusha M 2 1 Department of Electronics and Computer Engineering, K L University, 2 Research Scholar, Dept of CSE, K L UNIVERSITY Greenfields, Vaddeswaram, Guntur District, Andhra Pradesh 522502 *Corresponding author, e-mail: arunasri_2012@kluniversity.in 1 , anushaaa9@kluniversity.in 2 Abstract Big data is the term for any gathering of information sets, so expensive and complex, that it gets to be hard to process for utilizing customary information handling applications. The difficulties incorporate investigation, catch, duration, inquiry, sharing, stockpiling, Exchange, perception, and protection infringement. To reduce spot business patterns, anticipate diseases, conflict etc., we require bigger data sets when compared with the smaller data sets. Enormous information is hard to work with utilizing most social database administration frameworks and desktop measurements and perception bundles, needing rather enormously parallel programming running on tens, hundreds, or even a large number of servers. In this paper there was an observation on Hadoop architecture, different tools used for big data and its security issues. Keywords: Big data, Hadoop, Software tools, Map Reduce 1. Introduction We make 2.5 quintillion bytes of information [1] - so much that 90% of the data on the planet today has been made in the last two years alone. This data originates from all around: sensors used to accumulate atmosphere data, posts to social media sites, digital pictures and videos, and cell phone GPS signal. This enormous amount of the data is known as "Big data". Big data is a catchphrase, or motto, utilizes to describe a massive volume of both structured and unstructured data that is so huge that it's complicated to process using conventional database and software procedures. In most project circumstances the information is too big or it shifts too quickly or it surpasses existing processing ability. Big data has the potential to help organizations to improve operations and make faster by taking more intelligent decisions. Now-a-days, Big Data is the term which finds to be normal in IT businesses. As there was enormous information in the industry although there is nothing before big data which comes into imagine. Big data is really an advancing term that illustrates any huge amount of organized, semi organized information that can possibly be extracting for data. Although big data doesn't refer to any particular quantity, so this term is often utilized when talking about petabytes and exabytes of data Big data is a comprehensive term for expansive accumulation of the data sets so this large and complex that it gets to be troublesome to work with conventional data processing applications. At the point when managing bigger datasets, organizations face challenges in creating and managing big data. No standard tools and procedures for searching and analyzing large data sets in business analytics A case of Big data may be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data comprising of billions to trillions of records of a huge number of individuals all from distinctive sources for example agreements, web, moveable data. The data is commonly approximately organized data that is frequently fragmented also, inaccessible. The problems faced by big data are analyzing, capturing, searching, sharing, storing, transferring, visualization and privacy abuse. Larger data sets is needed in order to prevent diseases, combat crime, spot business trends and so on. Because of large information sets in these area researchers identify limitations frequently like meteorology, genomics, connectomics, complex physics simulations, biological, ecological research, and funding and trade information. Data sets increased their size due to collecting data from sensing portable devices, aerial sensory technology, software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks. In March 2012 [2], the Obama administration announced the big data research and development initiative. The leading IT companies, such as SAG, Oracle, IBM, Microsoft, SAP
  • 2.  ISSN: 2089-3272 IJEEI Vol. 4, No. 1, March 2016 : 74 – 80 75 and HP, have spent more than $15 billion on buying data management and analytics software. Big data defined as far back as 2001, industry expert Doug Laney (right now with Gartner) expressed the standard meaning of big data as the 5 Vs of big data: volume, velocity, variety, veracity and value, as shown Figure 1. Figure 1. Five (5) V’s factor of big data Volume: At present big scale of systems are flooded with constantly increasing information, simply growing terabytes or even petabytes of data. Velocity: Information is flowing at unique speed and should be deal with a sensible way. In real time for many organizations it is difficult to deal with RFID tag, sensors and smart metering data. Variety: Organized and unorganized information are producing a variety of data types by making it feasible to search novel approaches, while analyzing these information collectively, prediction might be attained as data flow into the organization. Veracity: Identifying and verifying inconsistent information is significant, to accomplish faithful study. Creating faith in big data is a big challenge to manage even more variety of data is available. Value: It is to be derived from big data. There is no reason for building the capacity to store and manage, if unable to get the value from data. One of the capable remarks on the advances of the arrangement with the Big Data is Hadoop. 2. Hadoop Hadoop was made by Doug Cutting and Mike Cafarella in 2005. Doug Cutting, who was working at Yahoo! at the time, named it after his kid's toy elephant. Hadoop system is shown Figure 2 and consists [3]: Reliable: this software can handle both hardware and software failures. Scalable: Designed for gigantic size of processors, memory, and local appended capacity Distributed: Map reduce recommends parallel programming model and provides concept of replication
  • 3. IJEEI ISSN: 2089-3272  Big Data-Survey (PSG Aruna Sri) 76 Figure 2. Hadoop system [1] Hadoop is open-source programming that allows loyal, flexible, expressed figuring on groups of reserved servers. That utilization the Map-Reduce framework presented by Google by utilizing the idea of map and reduce functions that remarkable utilized in Functional Programming. In spite of the fact that the Hadoop structure is composed in Java, it permits designers to send custom-composed projects coded in Java or some other dialect to process information in a parallel manner over hundreds or a great many thing servers. It is streamlined for touching read requests (streaming reads), where transforming incorporates of checking all the information. Contingent upon the intricacy of the procedure and the volume of information, reaction time can change from minutes to hours. While Hadoop can forms information quick, so its key focal point is its monstrous adaptability. Hadoop is right now being utilized for file web seeks, email spam location, proposal motors, expectation in budgetary administrations, genome control in life sciences, furthermore, for investigation of unstructured information, for example, log, content, and click stream. While a considerable lot of these applications could indeed be actualized in a social database (RDBMS) figure 2, the primary center of the Hadoop system is practically not the same as a RDBMS. The accompanying examines some of these distinctions Hadoop is especially helpful when:  For handling of Complex data.  Need of conversion of unstructured information to structured information  Usage of SQL queries  Recursive calculations are very large  Complex geo-spatial investigation or genome sequencing  Machine learning  Data sets are so substantial it couldn't be possible fit into database RAM, plates, or require an excess of centers (10's of TB up to PB)  Data worth does not defend cost of steady ongoing accessibility, for example, files or exceptional interest information, which can be moved to Hadoop and stay accessible at lower expense  Results are not required continuously  Fault resistance is basic  Significant custom coding would be obliged to handle employment booking Hadoop was motivated by Google's Map Reduce, a product structure in which an application is separated into various little parts. Any of these parts (additionally called sections or squares) can be run on any hub in the group. Doug Cutting, Hadoop's maker, named the system after his youngster's full toy elephant. The current Apache Hadoop biological system comprises of the Hadoop bit, Map Reduce, the Hadoop circulated record framework (HDFS) what's more, various related activities, for example, Apache Hive, HBase and Zookeeper. The Hadoop structure is utilized by significant players including Google, Yahoo and IBM, generally for applications including web search tools and promoting. The favored working frameworks are Windows and Linux be that as it may Hadoop can likewise work with BSD and OS X.
  • 4.  ISSN: 2089-3272 IJEEI Vol. 4, No. 1, March 2016 : 74 – 80 77 A distributed file system framework is a customer/server-based application that permits customers to get to and process information put away on the server as though it were all alone PC. At the point when a client gets to a document on the server, the server sends the client a duplicate of the document, which is stored on the client's PC while the information is being prepared and is then come back to the server. Preferably, a convey record framework arranges document and registry administrations of individual servers into a worldwide catalog in such a way that remote information access is not area particular however is indistinguishable from any customer. All records are available to all clients of the worldwide record framework and association is progressive what more, registry based is. Since more than one customer may get to the same information all the while, the server must have a system in spot, (for example, keeping up data about the times of access) to arrange overhauls so that the customer dependably gets the most current adaptation of information and that information clashes don't emerge. Dispersed record frameworks ordinarily utilize record or database replication (conveying duplicates of information on different servers) to ensure against information access failures [4]. Sun Microsystems' Network File System (NFS), Novell NetWare, Microsoft's Distributed File Framework, and IBM/Transarc's DFS are a few samples of distributed file systems framework. 3. HDFS Hadoop frame work consists the Hadoop Distributed File System (HDFS) [4]. HDFS is planned and improved to store information more than a lot of ease equipment in an appropriated manner, as shown in Figure 3. Figure 3. HDFS architecture The structure of HDFS is master/slave. HDFS cluster consists of one Name Node, a master server that manages the classification system namespace and regulates access to files by clients. In addition, there square measure variety of information nodes, sometimes one per node within the cluster, that manage storage hooked up to the nodes that they run on. HDFS exposes a classification system namespace and permits user knowledge to be hold on in files. Internally, a file is split into one or a lot of blocks and these blocks square measure hold on during a set of information Nodes. The Name Node executes classification system namespace operations like opening, closing, and renaming files and directories. It conjointly determines the mapping of blocks to knowledge Nodes. The info Nodes square measure to responsibility for serving scans and write requests from the file system’s clients. The info Nodes conjointly performs block formation, deletion, and replication upon instruction from the Name Node. The Name Node and knowledge Node square measure items of package designed to run on goods machines. These machines generally run a GNU/Linux software system (OS). HDFS is constructed exploitation the Java language; any machine that supports Java will run the Name Node or the Data Node package. Usage of the extremely moveable Java language
  • 5. IJEEI ISSN: 2089-3272  Big Data-Survey (PSG Aruna Sri) 78 implies that HDFS will be deployed on a large variety of machines. A typical readying includes a dedicated machine that runs solely the Name Node package. Every of the opposite machines within the cluster runs one instance of the Data Node package. The design doesn't preclude running multiple knowledge Nodes on an equivalent machine however during a real readying that's seldom the case. The existence of Name Node during a cluster greatly simplifies the design of the system. The Name Node is that the intermediary and repository for all HDFS data. The system is intended to flow the user knowledge through the Name Node. The base Apache Hadoop structure is made out of the taking after modules: Hadoop Common-contains libraries and utilities required by other Hadoop modules. Hadoop Distributed File System (HDFS) - a appropriated document framework that stores information on product machines, giving high total data transfer capacity over the group. Hadoop Map Reduce – a programming model for huge scale information preparing. All the modules in Hadoop are planned with a basic presumption that equipment disappointments (of individual machines, or racks of machines) are normal also, consequently ought to be naturally taken care of in programming by the structure. Apache Hadoop’s Map Reduce and HDFS parts initially got individually from Google's Map Reduce and Google File System (GFS) papers."Hadoop" frequently alludes not to simply the base Hadoop bundle yet rather to the Hadoop Ecosystem fig.4 which incorporates the greater part of the extra programming bundles that can be introduced on top of or nearby Hadoop, for example, Apache Hive, Apache Pig and A HBase. 4. Map Reduce Framework Map Reduce as shown in Figure 4 is a product system for appropriated transforming of Big data sets on PC groups [5]. It is first grown by Google. Map Reduce is planned to encourage also, improve the preparing of incomprehensible measures of information in parallel on extensive bunches of merchandise equipment in a solid, issue tolerant way. Map Reduce is the key calculation that the Hadoop Map Reduce motor uses to circulate work around a bunch. Commonplace Hadoop bunch coordinates Map Reduce and HFDS layer. In Map Reduce layer job tracker assigns tasks to the task tracker. Master node job tracker also allots tasks to the slave node task tracker figure. Figure 4. Map reduce is based on the Master-Slave architecture Master node contains -  Job tracker node (Map Reduce layer)  Task tracker node (Map Reduce layer)  Name node (HFDS layer)  Data node (HFDS layer) Multiple slave nodes contain -  Task tracker node (Map Reduce layer)  Data node (HFDS layer)  Map Reduce layer has job and task tracker nodes  HFDS layer has name and data nodes A. Map Reduce core functionality (I): Map & Reduce stage plays an significant role in map reduce core functionality.
  • 6.  ISSN: 2089-3272 IJEEI Vol. 4, No. 1, March 2016 : 74 – 80 79 Map stage: In Map step, master node takes consideration of large problem input and divided into smaller problems and allotted to worker nodes. These nodes process the smaller problems and return to the master node. • Map (key1, value)  list<key2, value2> Reduce stage: In this Reduce stage, Master node takes the response from the sub problems and joins them in a predefined manner to get the output to original problem, as shown in Figure 5. • Reduce (key2, list < value2 >)  list Figure 5. Reduce stage 5. PIG Pigs [6] was at first shaped at Yahoo! to permit individuals utilizing Hadoop to concentrate all the more on examining substantial information sets and invest less moment of time is needed to compose mapper and reducer programs. Pig is comprised of two segments: the first is the is called Pig Latin and the second is a runtime situation where Pig Latin programs are executed. 6. HIVE Apache Hive [7] was initially developed by face book. It has data warehouse structure built on top of hadoop for analysis and inquiry of data. By default, Hive stores metadata in an installed Apache Derby Database and other customer/server databases like MySQL can alternatively be used. Right now, there are four document configurations upheld in Hive, which are TEXTFILE SEQUENCEFILE, ORC and RCFILE. 7. HBase HBase [8] is a section arranged database administration framework that keeps running on top of HDFS. HBase applications are composed in Java much like an average Map Reduce application. A HBase framework consists an arrangement of tables. Table Consists of rows and columns like a conventional database. 8. Issues Although a considerable measure of research is going on big data yet at the same time. Several ideas are still to be investigated. Scientists would attempt to upgrade security stage to enhance capacity of programming to discover propelled dangers, respond in like manner and would create preventive measures for future. Specialists would attempt to enhance quality and dependability of security framework. A few analysts are wanting to taken up information accumulation, pretreatment, incorporation, Map Reduce and investigation utilizing machine learning strategies. They would utilize the outcomes for securing and actualizing preventive measures from dangers to big business information. Specialists would attempt to outline the meet the creation needs of endeavors for growing high caliber item by applying efforts to
  • 7. IJEEI ISSN: 2089-3272  Big Data-Survey (PSG Aruna Sri) 80 establish safety with the assistance of big data Analytics with Hadoop. A few specialists are utilizing systems administration checking tools like Packet pig, Mahout and so on to Improve the security levels. Targeted threats will be analyzed by using hadoop cluster. 9. Conclusion Big data is going to keep developing amid the following years, and every information researcher will need to oversee considerably more measure of information will be more different, bigger, and speedier. We talked about a few experiences about the theme, and what we consider are the fundamental concerns and primary issues for what’s to come. Big data is turning into new final boundary for experimental information research and for business applications. Everyone is warmly welcomed to take part in this fearless trip. References [1] Ms Vibhavari Chavan et al. “Survey Paper on Big Data”. International Journal of Computer Science and Information Technologies. 2014; 5 (6), ISSN: 0975-9646. [2] Michael R Lyu. “Service-generated Big Data and Big Data-as-a-Service”. Internetware. 2013. [3] Suman Arora et al. “Survey Paper on Scheduling in Hadoop”. International Journal of Advanced Research in Computer Science and Software Engineering. 2014; 4. [4] Dhruba Borthakur. “The Hadoop Distributed File System: Architecture and Design”. The Apache Software Foundation. 2007. [5] Yaxiong Zhao et al. “Dache: A Data AwareCaching for Big-Data Applications Using the MapReduce Framework”. Tsinghua science and technology. 2014; 19(1), ISSN: l1007-0214l. [6] Apache Pig. Available at http://pig.apache.org [7] Apache Hive. Available at http://hive.apache.org [8] Apache HBase. Available at http://hbase.apache.org