International Journal of Innovative Research in Information Security (IJIRIS) ISSN: 2349-7017(O)
Volume 1 Issue 3 (September 2014) ISSN: 2349-7009(P)
www.ijiris.com
_________________________________________________________________________________________________
© 2014, IJIRIS- All Rights Reserved Page - 17
Big Data: Review, Classification and Analysis
Survey
K.Arun Dr.L.Jabasheela
Department of Computer Applications, Department of Computer Applications,
Jeppiaar Engineering College, Panimalar Engineering College,
Chennai, India. Chennai, India.
Abstract— World Wide Web plays an important role in providing various knowledge sources to the world, which helps
many applications to provide quality service to the consumers. As the years go on, the web is overloaded with a lot of
information and it becomes very hard to extract the relevant information from the web. This gives way to the evolution
of the Big Data and the volume of the data keeps increasing rapidly day by day. Data mining techniques are used to
find the hidden information from the big data. In this paper we focus on the review of Big Data, its data classification
methods and the way it can be mined using various mining methods.
Keywords- Big Data, Data Mining, Data Classification, Mining Techniques
I. INTRODUCTION
The concept of big data has been endemic within computer science since the earliest days of computing. “Big Data”
originally meant the volume of data that could not be processed by traditional database methods and tools. Each time a
new storage medium was invented, the amount of data accessible exploded because it could be easily accessed. The
original definition focused on structured data, but most researchers and practitioners have come to realize that most of the
world’s information resides in massive, unstructured information, largely in the form of text and imagery. The explosion
of data has not been accompanied by a corresponding new storage medium. The structure of this paper is as follows:
Section 2 is about Big Data, Section 3 Big Data Characteristics, Section 4 Architecture and Classification, Sections 5, 6,
and 7 discuss on Big Data Analytics, Open Source Revolution, and Mining Techniques for Big Data, and finally Section
8 concludes the paper.
II. BIG DATA
Big Data is a new term assigned to datasets so large that we cannot manage them with the traditional
data mining techniques and software tools available. “Big Data” appears as a concrete, large dataset which hides
information in its massive volume, which cannot be explored without using new algorithms or data mining techniques.
III. BIG DATA CHARACTERISTICS
We have all heard of the 3Vs of big data, which are Volume, Variety and Velocity, yet there are other Vs that IT, business
and data scientists need to be concerned with, most notably big data Veracity.
• Data Volume: Data volume measures the amount of data available to an organization, which does not
necessarily have to own all of it as long as it can access it. As data volume increases, the value of different data
records will decrease in proportion to age, type, richness, and quantity among other factors.
• Data Variety: Data variety is a measure of the richness of the data representation – text, images, video, audio,
etc. From an analytic perspective, it is probably the biggest obstacle to effectively using large volumes of data.
Incompatible data formats, non-aligned data structures, and inconsistent data semantics represent significant
challenges that can lead to analytic sprawl.
• Data Velocity: Data velocity measures the speed of data creation, streaming, and aggregation. Ecommerce has
rapidly increased the speed and richness of data used for different business transactions (for example, web-site
clicks). Data velocity management is much more than a bandwidth issue; it is also an ingest issue.
• Data Veracity: Data veracity refers to the biases, noise and abnormality in data. The question is whether the data
being stored and mined is meaningful to the problem being analyzed. Veracity is the biggest challenge in data
analysis when compared to things like volume and velocity.
IV. BIG DATA ARCHITECTURE AND CLASSIFICATION
This "Big data architecture and patterns" series presents a structured and pattern-based approach to simplify the task
of defining an overall big data architecture [8].
Fig 1: Big Data Architecture
Because it is important to assess whether a business scenario is a big data problem, we include pointers to help determine
which business problems are good candidates for big data solutions.
TABLE 1: Big Data Business Problem by type
Business problem | Big data type | Description
Utilities: Predict power consumption | Machine-generated data | Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that needs to be analyzed. Utilities also run big, expensive, and complicated systems to generate power. Each grid includes sophisticated sensors that monitor voltage, current, frequency, and other important operating characteristics.
Telecommunications: Customer churn analytics | Web and social data; Transaction data | Telecommunications operators need to build detailed customer churn models that include social media and transaction data, such as CDRs, to keep up with the competition. The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behaviour of customers. Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.
Marketing: Sentiment analysis | Web and social data | Marketing departments use Twitter feeds to conduct sentiment analysis to determine what users are saying about the company and its products or services, especially after a new product or release is launched. Customer sentiment must be integrated with customer profile data to derive meaningful results. Customer feedback may vary according to customer demographics.
Customer service: Call monitoring | Human-generated | IT departments are turning to big data solutions to analyze application logs to gain insight that can improve system performance. Log files from various application vendors are in different formats; they must be standardized before IT departments can use them.
Retail: Personalized messaging based on facial recognition and social media | Web and social data; Biometrics | Retailers can use facial recognition technology in combination with a photo from social media to make personalized offers to customers based on buying behaviour and location. This capability could have a tremendous impact on retailers' loyalty programs, but it has serious privacy ramifications. Retailers would need to make the appropriate privacy disclosures before implementing these applications.
Retail and marketing: Mobile data and location-based targeting | Machine-generated data; Transaction data | Retailers can target customers with specific promotions and coupons based on location data. Solutions are typically designed to detect a user's location upon entry to a store or through GPS. Location data combined with customer preference data from social networks enable retailers to target online and in-store marketing campaigns based on buying history. Notifications are delivered through mobile applications, SMS, and email.
a. From classifying big data to choosing a big data solution
If you have spent any time investigating big data solutions, you know it's no simple task. This series takes you through the
major steps involved in finding the big data solution that meets your needs. We begin by looking at types of data
described by the term "big data." To simplify the complexity of big data types, we classify big data according to various
parameters and provide a logical architecture for the layers and high-level components involved in any big data solution.
Next, we propose a structure for classifying big data business problems by defining atomic and composite classification
patterns. These patterns help determine the appropriate solution pattern to apply. We include sample business problems
from various industries. And finally, for every component and pattern, we present the products that offer the relevant
function.
b. Classifying business problems according to big data type
Business problems can be categorized into types of big data problems. Down the road, we'll use this type to determine
the appropriate classification pattern (atomic or composite) and the appropriate big data solution. But the first step is to
map the business problem to its big data type. Table 1 lists common business problems and assigns a big data type to each.
Categorizing big data problems by type makes it simpler to see the characteristics of each kind of data. These
characteristics can help us understand how the data is acquired, how it is processed into the appropriate format, and how
frequently new data becomes available. Data from different sources has different characteristics; for example, social
media data can have video, images, and unstructured text such as blog posts, coming in continuously.
c. Using big data type to classify big data characteristics
It's helpful to look at the characteristics of the big data along certain lines — for example, figure 2 shows how the
data is collected, analyzed, and processed. Once the data is classified, it can be matched with the appropriate big data
pattern:
Fig 2 : Big Data Classification
Analysis type — whether the data is analyzed in real time or batched for later analysis. Give careful consideration to
choosing the analysis type, since it affects several other decisions about products, tools, hardware, data sources, and
expected data frequency. A mix of both types may be required by the use case. For fraud detection, analysis must be done
in real time or near real time; for trend analysis supporting strategic business decisions, analysis can be in batch mode.
Processing methodology — the type of technique to be applied for processing data (e.g., predictive, analytical, ad-hoc
query, and reporting). Business requirements determine the appropriate processing methodology. A combination of
techniques can be used. The choice of processing methodology helps identify the appropriate tools and techniques to be
used in your big data solution.
Data frequency and size — how much data is expected and at what frequency does it arrive. Knowing frequency and size
helps determine the storage mechanism, storage format, and the necessary pre-processing tools. Data frequency and size
depend on data sources: on demand (as with social media data), continuous real-time feed (weather data, transactional
data), or time series (time-based data).
Data type — Type of data to be processed — transactional, historical, master data, and others. Knowing the data type
helps segregate the data in storage.
Content format — Format of incoming data — structured (RDBMS, for example), unstructured (audio, video, and
images, for example), or semi-structured. Format determines how the incoming data needs to be processed and is key to
choosing tools and techniques and defining a solution from a business perspective.
Data source — Sources of data (where the data is generated) — web and social media, machine-generated, human-
generated, etc. Identifying all the data sources helps determine the scope from a business perspective. The figure shows
the most widely used data sources.
Data consumers — A list of all of the possible consumers of the processed data:
Business processes
Business users
Enterprise applications
Individual people in various business roles
Part of the process flows
Other data repositories or enterprise applications
Hardware — the type of hardware on which the big data solution will be implemented — commodity hardware or state of
the art. Understanding the limitations of hardware helps inform the choice of big data solution.
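As an illustration, the classification dimensions listed above can be recorded in a small data structure. The following is only a sketch: the class and field names are hypothetical, and apart from the real-time analysis type for fraud detection (stated above), the values in the example profile are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class AnalysisType(Enum):
    REAL_TIME = "real time"
    BATCH = "batch"


@dataclass
class BigDataProfile:
    """One record per business problem, one field per classification dimension."""
    analysis_type: AnalysisType
    processing_methodology: str   # e.g. predictive, analytical, ad-hoc query, reporting
    data_frequency: str           # e.g. on demand, continuous feed, time series
    data_type: str                # e.g. transactional, historical, master data
    content_format: str           # structured, unstructured, or semi-structured
    data_sources: list
    data_consumers: list
    hardware: str = "commodity"   # assumed default; "state of the art" is the alternative


# Fraud detection needs real-time (or near real-time) analysis;
# the remaining values are assumptions chosen for illustration.
fraud_detection = BigDataProfile(
    analysis_type=AnalysisType.REAL_TIME,
    processing_methodology="predictive",
    data_frequency="continuous feed",
    data_type="transactional",
    content_format="structured",
    data_sources=["machine-generated"],
    data_consumers=["business processes"],
)
```

Once a business problem is described in this way, the profile can be compared against candidate big data patterns dimension by dimension.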
V. BIG DATA ANALYTICS
Big data analytics refers to the process of collecting, organizing and analyzing large sets of data ("big data") to discover
patterns and other useful information. Not only will big data analytics help you to understand the information contained
within the data, but it will also help identify the data that is most important to the business and future business decisions.
Big data analysts basically want the knowledge that comes from analyzing the data.
a. The Benefits of Big Data Analytics
Enterprises are increasingly looking to find actionable insights into their data. Many big data projects originate from
the need to answer specific business questions. With the right big data analytics platforms in place, an enterprise can
boost sales, increase efficiency, and improve operations, customer service and risk management.
b. The Challenges of Big Data Analytics
For most organizations, big data analysis is a challenge. Consider the sheer volume of data and the many different
formats of the data (both structured and unstructured data) collected across the entire organization and the many different
ways different types of data can be combined, contrasted and analyzed to find patterns and other useful information.
The first challenge is in breaking down data silos to access all data an organization stores in different places and often in
different systems. A second big data challenge is in creating platforms that can pull in unstructured data as easily as
structured data. This massive volume of data is typically so large that it's difficult to process using traditional database
and software methods.
c. Big Data Requires High-Performance Analytics
To analyze such a large volume of data, big data analytics is typically performed using specialized software tools and
applications for predictive analytics, data mining, text mining, and forecasting and data optimization. Collectively these
processes are separate but highly integrated functions of high-performance analytics. Using big data tools and software
enables an organization to process extremely large volumes of data that a business has collected to determine which data
is relevant and can be analyzed to drive better business decisions in the future.
d. Examples of How Big Data Analytics is Used Today
As technology to break down data silos and analyze data improves, business can be transformed in all sorts of ways.
Big Data allows researchers to decode human DNA in minutes, predict where terrorists plan to attack, determine which
gene is most likely to be responsible for certain diseases and, of course, which ads you are most likely to respond to on
Facebook. The business cases for leveraging Big Data are compelling. For instance, Netflix mined its subscriber data to
put the essential ingredients together for its recent hit House of Cards, and subscriber data also prompted the company to
bring Arrested Development back from the dead.
Another example comes from one of the biggest mobile carriers in the world. France's Orange launched its Data for
Development project by releasing subscriber data for customers in the Ivory Coast. The 2.5 billion records, which were
made anonymous, included details on calls and text messages exchanged between 5 million users. Researchers accessed
the data and sent Orange proposals for how the data could serve as the foundation for development projects to improve
public health and safety. Proposed projects included one that showed how to improve public safety by tracking cell phone
data to map where people went after emergencies; another showed how to use cellular data for disease containment.
VI. TOOLS : OPEN SOURCE REVOLUTION
Apache Hadoop [3]: software for data-intensive distributed applications, based on the MapReduce programming model
and a distributed file system called Hadoop Distributed Filesystem (HDFS). Hadoop allows writing applications that
rapidly process large amounts of data in parallel on large clusters of compute nodes. A MapReduce job divides the input
dataset into independent subsets that are processed by map tasks in parallel. This step of mapping is then followed by a
step of reducing tasks. These reduce tasks use the output of the maps to obtain the final result of the job.
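The map and reduce steps described above can be sketched in plain Python on a single machine. This is not Hadoop code: the function names are hypothetical, the word-count task is chosen only for illustration, and the map tasks run sequentially here, whereas a real cluster would run them in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_task(chunk):
    """Map step: emit a (word, 1) pair for every word in one input subset."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_task(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(chunks):
    # Each chunk is independent, so these map tasks could run in parallel.
    mapped = [map_task(chunk) for chunk in chunks]
    # Group the intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for word, count in chain.from_iterable(mapped):
        groups[word].append(count)
    return dict(reduce_task(word, counts) for word, counts in groups.items())


# The input dataset is divided into two independent subsets.
chunks = [["big data is big"], ["data mining finds patterns in data"]]
word_counts = run_job(chunks)
```

The grouping of intermediate pairs by key is what allows each reduce task to work on the output of all map tasks for its key.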
Apache Pig [6]: software for analyzing large data sets that consists of a high-level language similar to SQL for expressing
data analysis programs, coupled with infrastructure for evaluating these programs. It contains a compiler that produces
sequences of MapReduce programs.
Cascading [10]: software abstraction layer for Hadoop, intended to hide the underlying complexity of MapReduce jobs.
Cascading allows users to create and execute data processing workflows on Hadoop clusters using any JVM-based
language.
Scribe [11]: server software developed by Facebook and released in 2008. It is intended for aggregating log data
streamed in real time from a large number of servers.
Apache HBase [4]: non-relational columnar distributed database designed to run on top of Hadoop Distributed
Filesystem (HDFS). It is written in Java and modeled after Google’s BigTable. HBase is an example of a NoSQL data
store.
Apache Cassandra [2]: another open source distributed database management system developed by Facebook. Cassandra
is used by Netflix as the back-end database for its streaming services.
Apache S4 [15]: platform for processing continuous data streams. S4 is designed specifically for managing data streams.
S4 apps are designed by combining streams and processing elements in real time.
In Big Data Mining, there are many open source initiatives. The most popular are the following:
– Apache Mahout [5]: Scalable machine learning and data mining open source software based mainly on Hadoop. It has
implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative
filtering and frequent pattern mining.
– MOA [9]: Stream data mining open source software to perform data mining in real time. It has implementations of
classification, regression, clustering and frequent item set mining and frequent graph mining. It started as a project of the
Machine Learning group of University of Waikato, New Zealand, famous for the WEKA software. The streams
framework [12] provides an environment for defining and running stream processes using simple XML based definitions
and is able to use MOA.
– R [16]: open source programming language and software environment designed for statistical computing and
visualization. R was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand
beginning in 1993 and is used for statistical analysis of very large data sets.
– Vowpal Wabbit [13]: open source project started at Yahoo! Research and continuing at Microsoft Research to design a
fast, scalable, useful learning algorithm. VW is able to learn from terafeature datasets. It can exceed the throughput of
any single machine network interface when doing linear learning, via parallel learning.
– PEGASUS [12]: big graph mining system built on top of MAPREDUCE. It allows finding patterns and anomalies in
massive real-world graphs.
– GraphLab [14]: high-level graph-parallel system built without using MAPREDUCE. GraphLab computes over
dependent records which are stored as vertices in a large distributed data-graph. Algorithms in GraphLab are expressed as
vertex-programs which are executed in parallel on each vertex and can interact with neighboring vertices.
VII. MINING TECHNIQUES FOR BIG DATA
There are many different types of analysis that can be done in order to retrieve information from big data. Each type of
analysis will have a different impact or result. Which type of data mining technique you should use really depends on the
type of business problem that you are trying to solve. Different analyses will deliver different outcomes and thus provide
different insights. One of the common ways to recover valuable insights is via the process of data mining. Data mining is
a buzzword that often is used to describe the entire range of big data analytics, including collection, extraction, analysis
and statistics. This, however, is too broad, as data mining specifically refers to the discovery of previously unknown
interesting patterns, unusual records or dependencies. When developing your big data strategy it is important to have a
clear understanding of what data mining is and how it can help you.
i. Anomaly or Outlier detection
Anomaly detection refers to the search for data items in a dataset that do not match a projected pattern or expected
behaviour. Anomalies are also called outliers, exceptions, surprises or contaminants and they often provide critical and
actionable information. An outlier is an object that deviates significantly from the general average within a dataset or a
combination of data. It is numerically distant from the rest of the data and therefore, the outlier indicates that something
is out of the ordinary and requires additional analysis.
Anomaly detection is used to detect fraud or risks within critical systems. The anomalies it finds have all the
characteristics to be of interest to an analyst, who can analyse them further to find out what is really going on. It can
help find extraordinary occurrences that could indicate fraudulent actions, flawed procedures or areas where a certain
theory is invalid. It is important to note that in large datasets, a small number of outliers is common. Outliers may indicate bad data
but may also be due to random variation or may indicate something scientifically interesting. In all cases, additional
research is required.
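One simple way to make "numerically distant from the rest of the data" concrete is a z-score test, sketched below. The z-score rule is our choice for illustration, not a method prescribed by the sources surveyed; the function name, the sample values and the threshold of 2.5 standard deviations are all assumptions.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.5):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily transaction counts with one unusual day.
daily_transactions = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = find_outliers(daily_transactions)
```

Because a single extreme value also inflates the mean and the standard deviation, the threshold has to be tuned to the dataset, and flagged items still need the additional analysis described above.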
ii. Association rule learning
Association rule learning enables the discovery of interesting relations (interdependencies) between different variables
in large databases. Association rule learning uncovers hidden patterns in the data that can be used to identify variables
within the data and the co-occurrences of different variables that appear with the greatest frequencies.
Association rule learning is often used in the retail industry when finding patterns in point-of-sales data. These patterns
can be used when recommending new products to others based on what others have bought before or based on which
products are bought together. If this is done correctly, it can help organisations increase their conversion rate. A well-
known example is that thanks to data mining, Walmart, already in 2004, discovered that Strawberry Pop-tarts sales
increase by seven times prior to a hurricane. Since this discovery, Walmart places the Strawberry Pop-Tarts at the
checkouts prior to a hurricane.
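Association rules are commonly scored by support (how often an item set occurs) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a handful of made-up point-of-sale baskets; the transactions and function names are hypothetical and serve only to illustrate the two measures.

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions)


def support(itemset, transactions):
    """Fraction of all transactions that contain the item set."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = count_containing(antecedent | consequent, transactions)
    return both / count_containing(antecedent, transactions)


# Hypothetical point-of-sale baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Here bread and milk appear together in three of the five baskets, and three of the four baskets containing bread also contain milk.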
iii. Clustering analysis
Clustering analysis is the process of identifying data sets that are similar to each other to understand the differences as
well as the similarities within the data. Clusters have certain traits in common that can be used to improve targeting
algorithms. For example, clusters of customers with similar buying behaviour can be targeted with similar products and
services in order to increase the conversion rate. A result from a clustering analysis can be the creation of
personas. Personas are fictional characters created to represent the different user types within a targeted demographic,
attitude and/or behaviour set that might use a site, brand or product in a similar way. The programming language R has
a large variety of functions to perform relevant cluster analysis and is therefore especially relevant for performing a
clustering analysis.
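A minimal sketch of one clustering method, k-means, is given below in Python. The choice of k-means, the customer features (visits per month, average basket value) and the initial centroids are assumptions made for illustration; the text above does not prescribe a particular algorithm.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean of the cluster (kept unchanged if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket value).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, [points[0], points[3]])
```

Each resulting cluster groups customers with similar buying behaviour, which is the kind of segment that could then be described by a persona.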
iv. Classification analysis
Classification Analysis is a systematic process for obtaining important and relevant information about data, and
metadata – data about data. The classification analysis helps identify to which of a set of categories different types of
data belong. Classification analysis is closely linked to cluster analysis as the classification can be used to cluster data.
Your email provider performs a well-known example of classification analysis: they use algorithms that are capable of
classifying your email as legitimate or mark it as spam. This is done based on data that is linked with the email or the
information that is in the email, for example certain words or attachments that indicate spam.
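The spam example can be sketched with a small word-based classifier. The code below uses a naive Bayes model with add-one smoothing, which is our choice for illustration and not necessarily what any email provider uses; the training messages, labels and function names are hypothetical.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_messages)
        words_in_label = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so that unseen words do not zero out the probability.
            score += math.log((counts[word] + 1) / (words_in_label + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages.
examples = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("project report attached", "legitimate"),
]
word_counts, label_counts = train(examples)
prediction = classify("free prize offer", word_counts, label_counts)
```

Unlike clustering, the categories here are fixed in advance by the labelled examples, and the model only decides which of them a new message belongs to.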
v. Regression analysis
Regression analysis tries to define the dependency between variables. It assumes a one-way causal effect from one
variable to the response of another variable. Independent variables can be affected by each other but it does not mean that
this dependency is both ways as is the case with correlation analysis. A regression analysis can show that one variable is
dependent on another but not vice-versa.
Regression analysis is used to determine different levels of customer satisfaction and how they affect customer loyalty,
and how service levels can be affected by, for example, the weather. A more concrete example is that a regression analysis
can help you find the love of your life on an online dating website. The website eHarmony uses a regression model that
matches two individual singles based on 29 variables to find the best partner.
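The simplest regression model, a straight line fitted by ordinary least squares, can be sketched as follows. The satisfaction and loyalty figures are invented for illustration, and the function name is hypothetical; this is not the model used by any company named above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: satisfaction score (1-5) and repeat purchases per year.
satisfaction = [1, 2, 3, 4, 5]
repeat_purchases = [12, 19, 31, 42, 49]

slope, intercept = fit_line(satisfaction, repeat_purchases)
predicted = slope * 3.5 + intercept  # expected repeat purchases at a score of 3.5
```

The fitted slope estimates how much the response (loyalty) changes per unit of the explanatory variable (satisfaction), which is the one-way dependency described above.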
Data mining can help organisations and scientists to find and select the most important and relevant information. This
information can be used to create models that help predict how people or systems will behave, so that you can
anticipate it. The more data you have, the better the models you can create with data mining
techniques will become, resulting in more business value for your organisation.
VIII. CONCLUSION
This paper describes the advent of Big Data, its architecture and its characteristics. We discussed the
classification of Big Data according to business needs and how far it helps decision making in the business
environment. Our future work focuses on the analysis part of big data classification by implementing different data
mining techniques.
REFERENCES
[1] http://www.pro.techtarget.com
[2] Apache Cassandra, http://cassandra.apache.org.
[3] Apache Hadoop, http://hadoop.apache.org.
[4] Apache HBase, http://hbase.apache.org.
[5] Apache Mahout, http://mahout.apache.org.
[6] Apache Pig, http://www.pig.apache.org/.
[7] http://www.webopedia.com/
[8] http://www.ibm.com/library/
[9] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer. MOA: Massive Online Analysis http://moa.cms.waikato.ac.nz/.
Journal of Machine Learning Research (JMLR), 2010.
[10] Cascading, http://www.cascading.org/.
[11] Facebook Scribe, https://github.com/facebook/scribe.
[12] U. Kang, D. H. Chau, and C. Faloutsos. PEGASUS: Mining Billion-Scale Graphs in the Cloud. 2012.
[13] J. Langford. Vowpal Wabbit, http://hunch.net/~vw/, 2011.
[14] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new parallel framework
for machine learning. In Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, July
2010.
[15] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed Stream Computing Platform. In ICDM
Workshops, pages 170–177, 2010.
[16] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing,
Vienna, Austria, 2012. ISBN 3-900051-07-0.

More Related Content

What's hot

IRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its ChallengesIRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its Challenges
IRJET Journal
 
13 pv-do es-18-bigdata-v3
13 pv-do es-18-bigdata-v313 pv-do es-18-bigdata-v3
13 pv-do es-18-bigdata-v3
Aravindharamanan S
 
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Taniya Fansupkar
 
uae views on big data
  uae views on  big data  uae views on  big data
uae views on big data
Big Data: Review, Classification and Analysis Survey

International Journal of Innovative Research in Information Security (IJIRIS), Volume 1, Issue 3 (September 2014)
ISSN: 2349-7017(O), ISSN: 2349-7009(P), www.ijiris.com
© 2014, IJIRIS. All Rights Reserved.

K. Arun, Department of Computer Applications, Jeppiaar Engineering College, Chennai, India.
Dr. L. Jabasheela, Department of Computer Applications, Panimalar Engineering College, Chennai, India.

Abstract: The World Wide Web plays an important role in providing various knowledge sources to the world, which helps many applications to provide quality service to consumers. As the years go on, the web is overloaded with information and it becomes very hard to extract the relevant information from it. This gives way to the evolution of Big Data, and the volume of data keeps increasing rapidly day by day. Data mining techniques are used to find the hidden information in big data. In this paper we focus on the review of Big Data, its data classification methods and the way it can be mined using various mining methods.

Keywords: Big Data, Data Mining, Data Classification, Mining Techniques

I. INTRODUCTION
The concept of big data has been endemic within computer science since the earliest days of computing. “Big Data” originally meant the volume of data that could not be processed by traditional database methods and tools. Each time a new storage medium was invented, the amount of accessible data exploded because it could be easily accessed. The original definition focused on structured data, but most researchers and practitioners have come to realize that most of the world’s information resides in massive, unstructured information, largely in the form of text and imagery. The explosion of data has not been accompanied by a corresponding new storage medium.
The structure of this paper is as follows: Section 2 is about Big Data, Section 3 covers Big Data Characteristics, Section 4 covers Architecture and Classification, Sections 5, 6, and 7 discuss Big Data Analytics, the Open Source Revolution, and Mining Techniques for Big Data, and finally Section 8 concludes the paper.

II. BIG DATA
Big Data is a new term assigned to datasets that are so large that we cannot manage them with the traditional data mining techniques and software tools available. “Big Data” appears as a concrete, large dataset that hides information in its massive volume, which cannot be explored without using new algorithms or data mining techniques.

III. BIG DATA CHARACTERISTICS
The 3Vs of big data (Volume, Variety and Velocity) are well known, yet there are other Vs that IT, business and data scientists need to be concerned with, most notably Veracity.
• Data Volume: Data volume measures the amount of data available to an organization, which does not necessarily have to own all of it as long as it can access it. As data volume increases, the value of different data records will decrease in proportion to age, type, richness, and quantity, among other factors.
• Data Variety: Data variety is a measure of the richness of the data representation: text, images, video, audio, etc. From an analytic perspective, it is probably the biggest obstacle to effectively using large volumes of data. Incompatible data formats, non-aligned data structures, and inconsistent data semantics represent significant challenges that can lead to analytic sprawl.
• Data Velocity: Data velocity measures the speed of data creation, streaming, and aggregation. Ecommerce has rapidly increased the speed and richness of data used for different business transactions (for example, web-site clicks). Data velocity management is much more than a bandwidth issue; it is also an ingest issue.
• Data Veracity: Data veracity refers to the biases, noise and abnormality in data. It asks whether the data being stored and mined is meaningful to the problem being analyzed. Veracity is the biggest challenge in data analysis when compared with things like volume and velocity.

IV. BIG DATA ARCHITECTURE AND CLASSIFICATION
This "Big data architecture and patterns" series presents a structured and pattern-based approach to simplify the task of defining an overall big data architecture [8].
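Before turning to the architecture, the four characteristics from Section III can be made concrete with a small sketch. The record layout (`ts`, `format`, `payload`), the sample values, and the choice of "payload is present" as a stand-in for veracity are all assumptions made for illustration; the paper does not define how these quantities should be measured.

```python
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"ts": "2014-09-01T10:00:00", "format": "text", "payload": "page view"},
    {"ts": "2014-09-01T10:00:01", "format": "image", "payload": "photo.jpg"},
    {"ts": "2014-09-01T10:00:02", "format": "text", "payload": None},
    {"ts": "2014-09-01T10:00:04", "format": "audio", "payload": "call.wav"},
]


def summarize(records):
    """Return rough volume, variety, velocity and veracity figures.

    Assumes a non-empty list of records with ISO-8601 timestamps.
    """
    times = [datetime.fromisoformat(r["ts"]) for r in records]
    span_seconds = (max(times) - min(times)).total_seconds()
    usable = [r for r in records if r["payload"] is not None]
    return {
        # Volume: how many records are available.
        "volume": len(records),
        # Variety: how many distinct representations appear.
        "variety": len({r["format"] for r in records}),
        # Velocity: records arriving per second over the observed span.
        "velocity": len(records) / span_seconds if span_seconds > 0 else float("inf"),
        # Veracity (crude proxy): fraction of records with a usable payload.
        "veracity": len(usable) / len(records),
    }


summary = summarize(records)
```

In practice each measure would be far richer (bytes rather than record counts, schema-level variety, bias and noise checks for veracity), but the sketch shows that the four Vs are separate properties of the same dataset.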
Fig 1: Big Data Architecture

Because it is important to assess whether a business scenario is a big data problem, we include pointers to help determine which business problems are good candidates for big data solutions.

TABLE 1: Big Data Business Problem by Type

Business problem: Utilities: Predict power consumption
Big data type: Machine-generated data
Description: Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that need to be analyzed. Utilities also run big, expensive, and complicated systems to generate power. Each grid includes sophisticated sensors that monitor voltage, current, frequency, and other important operating characteristics.

Business problem: Telecommunications: Customer churn analytics
Big data type: Web and social data; Transaction data
Description: Telecommunications operators need to build detailed customer churn models that include social media and transaction data, such as call detail records (CDRs), to keep up with the competition. The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behaviour of customers. Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.

Business problem: Marketing: Sentiment analysis
Big data type: Web and social data
Description: Marketing departments use Twitter feeds to conduct sentiment analysis to determine what users are saying about the company and its products or services, especially after a new product or release is launched. Customer sentiment must be integrated with customer profile data to derive meaningful results. Customer feedback may vary according to customer demographics.

Business problem: Customer service: Call monitoring
Big data type: Human-generated
Description: IT departments are turning to big data solutions to analyze application logs to gain insight that can improve system performance. Log files from various application vendors are in different formats; they must be standardized before IT departments can use them.

Business problem: Retail: Personalized messaging based on facial recognition and social media
Big data type: Web and social data; Biometrics
Description: Retailers can use facial recognition technology in combination with a photo from social media to make personalized offers to customers based on buying behaviour and location. This capability could have a tremendous impact on retailers' loyalty programs, but it has serious privacy ramifications. Retailers would need to make the appropriate privacy disclosures before implementing these applications.

Business problem: Retail and marketing: Mobile data and location-based targeting
Big data type: Machine-generated data; Transaction data
Description: Retailers can target customers with specific promotions and coupons based on location data. Solutions are typically designed to detect a user's location upon entry to a store or through GPS. Location data combined with customer preference data from social networks enable retailers to target online and in-store marketing campaigns based on buying history. Notifications are delivered through mobile applications, SMS, and email.

a. From classifying big data to choosing a big data solution
If you have spent any time investigating big data solutions, you know it's no simple task. This series takes you through the major steps involved in finding the big data solution that meets your needs. We begin by looking at types of data described by the term "big data." To simplify the complexity of big data types, we classify big data according to various parameters and provide a logical architecture for the layers and high-level components involved in any big data solution. Next, we propose a structure for classifying big data business problems by defining atomic and composite classification patterns. These patterns help determine the appropriate solution pattern to apply. We include sample business problems from various industries. Finally, for every component and pattern, we present the products that offer the relevant function.

b. Classifying business problems according to big data type
Business problems can be categorized into types of big data problems. Down the road, we'll use this type to determine the appropriate classification pattern (atomic or composite) and the appropriate big data solution. But the first step is to map the business problem to its big data type. Table 1 lists common business problems and assigns a big data type to each. Categorizing big data problems by type makes it simpler to see the characteristics of each kind of data. These characteristics can help us understand how the data is acquired, how it is processed into the appropriate format, and how frequently new data becomes available. Data from different sources has different characteristics; for example, social media data can have video, images, and unstructured text such as blog posts, coming in continuously.

c. Using big data type to classify big data characteristics
It's helpful to look at the characteristics of the big data along certain lines; for example, Figure 2 shows how the data is collected, analyzed, and processed. Once the data is classified, it can be matched with the appropriate big data pattern:

Fig 2: Big Data Classification

Analysis type: whether the data is analyzed in real time or batched for later analysis. Give careful consideration to choosing the analysis type, since it affects several other decisions about products, tools, hardware, data sources, and expected data frequency. A mix of both types may be required by the use case:
• Fraud detection: analysis must be done in real time or near real time.
• Trend analysis for strategic business decisions: analysis can be in batch mode.
Processing methodology: the type of technique to be applied for processing data (e.g., predictive, analytical, ad-hoc query, and reporting). Business requirements determine the appropriate processing methodology. A combination of techniques can be used. The choice of processing methodology helps identify the appropriate tools and techniques to be used in your big data solution.
Data frequency and size: how much data is expected and at what frequency it arrives. Knowing frequency and size helps determine the storage mechanism, storage format, and the necessary pre-processing tools. Data frequency and size depend on data sources:
• On demand, as with social media data
• Continuous feed, real-time (weather data, transactional data)
• Time series (time-based data)
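The three parameters described so far (analysis type, processing methodology, and data frequency) can be recorded per use case, as in the minimal sketch below. The class and function names are hypothetical. The pairing of fraud detection with real-time analysis and trend analysis with batch mode comes from the text, while the methodology and frequency values assigned to each are assumptions chosen only to fill in the example.

```python
from dataclasses import dataclass
from enum import Enum


class AnalysisType(Enum):
    REAL_TIME = "real time"
    BATCH = "batch"


class Frequency(Enum):
    ON_DEMAND = "on demand"          # e.g. social media data
    CONTINUOUS = "continuous feed"   # e.g. weather or transactional data
    TIME_SERIES = "time series"      # time-based data


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical record of classification parameters for one use case."""
    name: str
    analysis_type: AnalysisType
    methodology: str       # e.g. "predictive", "analytical", "ad-hoc query", "reporting"
    frequency: Frequency


def needs_low_latency(profile: WorkloadProfile) -> bool:
    """Real-time analysis constrains later choices of tools and hardware."""
    return profile.analysis_type is AnalysisType.REAL_TIME


# Analysis types follow the text; methodology and frequency are assumed values.
fraud = WorkloadProfile(
    "fraud detection", AnalysisType.REAL_TIME, "predictive", Frequency.CONTINUOUS
)
trends = WorkloadProfile(
    "trend analysis", AnalysisType.BATCH, "reporting", Frequency.TIME_SERIES
)
```

Keeping the parameters in one typed record makes it easy to extend the profile as further classification parameters are introduced.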
Data type: the type of data to be processed (transactional, historical, master data, and others). Knowing the data type helps segregate the data in storage.
Content format: the format of incoming data, which may be structured (RDBMS, for example), unstructured (audio, video, and images, for example), or semi-structured. Format determines how the incoming data needs to be processed and is key to choosing tools and techniques and defining a solution from a business perspective.
Data source: the sources of data (where the data is generated), such as web and social media, machine-generated, and human-generated. Identifying all the data sources helps determine the scope from a business perspective. The figure shows the most widely used data sources.
Data consumers: a list of all of the possible consumers of the processed data:
• Business processes
• Business users
• Enterprise applications
• Individual people in various business roles
• Part of the process flows
• Other data repositories or enterprise applications
Hardware: the type of hardware on which the big data solution will be implemented, either commodity hardware or state of the art. Understanding the limitations of hardware helps inform the choice of big data solution.

V. BIG DATA ANALYTICS
Big data analytics refers to the process of collecting, organizing and analyzing large sets of data ("big data") to discover patterns and other useful information. Not only will big data analytics help you to understand the information contained within the data, but it will also help identify the data that is most important to the business and future business decisions. Big data analysts basically want the knowledge that comes from analyzing the data.

a. The Benefits of Big Data Analytics
Enterprises are increasingly looking to find actionable insights in their data. Many big data projects originate from the need to answer specific business questions. With the right big data analytics platforms in place, an enterprise can boost sales, increase efficiency, and improve operations, customer service and risk management.

b. The Challenges of Big Data Analytics
For most organizations, big data analysis is a challenge. Consider the sheer volume of data, the many different formats (both structured and unstructured) collected across the entire organization, and the many different ways in which different types of data can be combined, contrasted and analyzed to find patterns and other useful information. The first challenge is in breaking down data silos to access all the data an organization stores in different places and often in different systems. A second big data challenge is in creating platforms that can pull in unstructured data as easily as structured data. This massive volume of data is typically so large that it's difficult to process using traditional database and software methods.

c. Big Data Requires High-Performance Analytics
To analyze such a large volume of data, big data analytics is typically performed using specialized software tools and applications for predictive analytics, data mining, text mining, forecasting, and data optimization. Collectively these processes are separate but highly integrated functions of high-performance analytics. Using big data tools and software enables an organization to process the extremely large volumes of data that a business has collected, to determine which data is relevant and can be analyzed to drive better business decisions in the future.

d. Examples of How Big Data Analytics is Used Today
As technology to break down data silos and analyze data improves, business can be transformed in all sorts of ways.
Big Data allows researchers to decode human DNA in minutes, predict where terrorists plan to attack, determine which gene is most likely to be responsible for certain diseases and, of course, which ads you are most likely to respond to on Facebook. The business cases for leveraging Big Data are compelling. For instance, Netflix mined its subscriber data to put the essential ingredients together for its recent hit House of Cards, and subscriber data also prompted the company to bring Arrested Development back from the dead.
Another example comes from one of the biggest mobile carriers in the world. France's Orange launched its Data for Development project by releasing subscriber data for customers in the Ivory Coast. The 2.5 billion records, which were made anonymous, included details on calls and text messages exchanged between 5 million users. Researchers accessed the data and sent Orange proposals for how the data could serve as the foundation for development projects to improve
public health and safety. Proposed projects included one that showed how to improve public safety by tracking cell phone data to map where people went after emergencies; another showed how to use cellular data for disease containment.
VI. TOOLS : OPEN SOURCE REVOLUTION
Apache Hadoop [3]: software for data-intensive distributed applications, based on the MapReduce programming model and a distributed file system called Hadoop Distributed Filesystem (HDFS). Hadoop allows writing applications that rapidly process large amounts of data in parallel on large clusters of compute nodes. A MapReduce job divides the input dataset into independent subsets that are processed by map tasks in parallel. This mapping step is then followed by a step of reduce tasks, which use the output of the maps to obtain the final result of the job.
Apache Pig [6]: software for analyzing large data sets that consists of a high-level language similar to SQL for expressing data analysis programs, coupled with infrastructure for evaluating these programs. It contains a compiler that produces sequences of MapReduce programs.
Cascading [10]: software abstraction layer for Hadoop, intended to hide the underlying complexity of MapReduce jobs. Cascading allows users to create and execute data processing workflows on Hadoop clusters using any JVM-based language.
Scribe [11]: server software developed by Facebook and released in 2008. It is intended for aggregating log data streamed in real time from a large number of servers.
Apache HBase [4]: non-relational columnar distributed database designed to run on top of Hadoop Distributed Filesystem (HDFS).
It is written in Java and modeled after Google's BigTable. HBase is an example of a NoSQL data store.
Apache Cassandra [2]: another open source distributed database management system, originally developed by Facebook. Cassandra is used by Netflix as the back-end database for its streaming services.
Apache S4 [15]: platform for processing continuous data streams. S4 is designed specifically for managing data streams. S4 apps are designed by combining streams and processing elements in real time.
In Big Data Mining, there are many open source initiatives. The most popular are the following:
– Apache Mahout [5]: scalable machine learning and data mining open source software based mainly on Hadoop. It has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.
– MOA [9]: stream data mining open source software to perform data mining in real time. It has implementations of classification, regression, clustering, frequent item set mining and frequent graph mining. It started as a project of the Machine Learning group of the University of Waikato, New Zealand, famous for the WEKA software. The streams framework [12] provides an environment for defining and running stream processes using simple XML-based definitions and is able to use MOA.
– R [16]: open source programming language and software environment designed for statistical computing and visualization. R was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, beginning in 1993 and is used for statistical analysis of very large data sets.
– Vowpal Wabbit [13]: open source project started at Yahoo! Research and continuing at Microsoft Research to design a fast, scalable, useful learning algorithm. VW is able to learn from terafeature datasets. It can exceed the throughput of any single machine network interface when doing linear learning, via parallel learning.
– PEGASUS [12]: big graph mining system built on top of MapReduce. It allows finding patterns and anomalies in massive real-world graphs.
– GraphLab [14]: high-level graph-parallel system built without using MapReduce. GraphLab computes over dependent records which are stored as vertices in a large distributed data-graph. Algorithms in GraphLab are expressed as vertex-programs which are executed in parallel on each vertex and can interact with neighboring vertices.
VII. MINING TECHNIQUES FOR BIG DATA
There are many different types of analysis that can be done in order to retrieve information from big data. Each type of analysis will have a different impact or result. Which data mining technique you should use depends on the type of business problem that you are trying to solve. Different analyses will deliver different outcomes and thus provide
different insights. One of the common ways to recover valuable insights is via the process of data mining. Data mining is a buzzword that is often used to describe the entire range of big data analytics, including collection, extraction, analysis and statistics. This, however, is too broad, as data mining refers specifically to the discovery of previously unknown interesting patterns, unusual records or dependencies. When developing your big data strategy it is important to have a clear understanding of what data mining is and how it can help you.
i. Anomaly or Outlier detection
Anomaly detection refers to the search for data items in a dataset that do not match a projected pattern or expected behaviour. Anomalies are also called outliers, exceptions, surprises or contaminants, and they often provide critical and actionable information. An outlier is an object that deviates significantly from the general average within a dataset or a combination of data. It is numerically distant from the rest of the data and therefore indicates that something is out of the ordinary and requires additional analysis. Anomaly detection is used to detect fraud or risks within critical systems; the anomalies it finds are of interest to an analyst, who can analyse them further to find out what is really going on. It can help find extraordinary occurrences that could indicate fraudulent actions, flawed procedures or areas where a certain theory is invalid. It is important to note that in large datasets, a small number of outliers is common.
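The idea of an outlier as a value that is numerically distant from the rest of the data can be illustrated with a minimal z-score sketch in Python. The z-score rule is only one simple way to flag outliers and is an assumption here, as are the function name find_outliers, the sample readings and the threshold of 2.0; none of them come from the surveyed tools.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=3.0):
    """Return the values whose z-score (distance from the mean,
    measured in sample standard deviations) exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the average.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]
outliers = find_outliers(readings, threshold=2.0)
print(outliers)  # prints [95]
```

In a small sample a single extreme value inflates the standard deviation, which is why a lower threshold than the conventional 3.0 is used in this example; flagged values are candidates for further analysis rather than confirmed errors.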
Outliers may indicate bad data but may also be due to random variation or may indicate something scientifically interesting. In all cases, additional research is required.
ii. Association rule learning
Association rule learning enables the discovery of interesting relations (interdependencies) between different variables in large databases. It uncovers hidden patterns in the data that can be used to identify variables within the data and the co-occurrences of different variables that appear with the greatest frequencies. Association rule learning is often used in the retail industry to find patterns in point-of-sale data. These patterns can be used to recommend new products to customers based on what others have bought before or on which products are bought together. If this is done correctly, it can help organisations increase their conversion rate. A well-known example is that, thanks to data mining, Walmart discovered as early as 2004 that sales of Strawberry Pop-Tarts increase seven-fold prior to a hurricane. Since this discovery, Walmart places Strawberry Pop-Tarts at the checkouts prior to a hurricane.
iii. Clustering analysis
Clustering analysis is the process of identifying data sets that are similar to each other in order to understand the differences as well as the similarities within the data. Clusters have certain traits in common that can be used to improve targeting algorithms. For example, clusters of customers with similar buying behaviour can be targeted with similar products and services in order to increase the conversion rate. One result of a clustering analysis can be the creation of personas. Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behaviour set that might use a site, brand or product in a similar way.
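As a rough illustration of grouping customers with similar buying behaviour, the sketch below implements a basic k-means loop (Lloyd's algorithm) in plain Python. k-means is only one of many clustering techniques and is chosen here as an assumption; the customer data (visits per month, average basket value), the function name k_means and the initial centroids are hypothetical.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points, and repeat."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid closest to p (Euclidean distance).
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 20), (2, 25), (1, 22), (9, 80), (10, 85), (8, 90)]
centroids, clusters = k_means(customers, [customers[0], customers[3]])
print(clusters)
```

Each resulting cluster could then serve as the starting point for a persona; in practice the number of clusters and the choice of initial centroids strongly influence the outcome.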
The programming language R has a large variety of functions for cluster analysis and is therefore especially well suited to performing a clustering analysis.
iv. Classification analysis
Classification analysis is a systematic process for obtaining important and relevant information about data and metadata (data about data). Classification analysis helps identify to which of a set of categories different types of data belong. It is closely linked to cluster analysis, as the classification can be used to cluster data. Your email provider performs a well-known example of classification analysis: it uses algorithms that are capable of classifying your email as legitimate or marking it as spam. This is done based on data that is linked with the email or on the information in the email, for example certain words or attachments that indicate spam.
v. Regression analysis
Regression analysis tries to define the dependency between variables. It assumes a one-way causal effect from one variable to the response of another variable. Independent variables can be affected by each other, but this does not mean that the dependency runs both ways, as is the case with correlation analysis. A regression analysis can show that one variable is dependent on another but not vice versa. Regression analysis is used to determine different levels of customer satisfaction and how they affect customer loyalty, and how service levels can be affected by, for example, the weather. A more concrete example is that a regression analysis
can help you find the love of your life on an online dating website. The website eHarmony uses a regression model that matches two individual singles based on 29 variables to find the best partner.
Data mining can help organisations and scientists to find and select the most important and relevant information. This information can be used to create models that help predict how people or systems will behave, so that you can anticipate it. The more data you have, the better the models you can create using data mining techniques will become, resulting in more business value for your organisation.
VIII. CONCLUSION
This paper describes the advent of Big Data, its architecture and its characteristics. We discussed the classification of Big Data according to business needs and how far it will help in decision making in the business environment. Our future work focuses on the analysis part of big data classification by implementing different data mining techniques.
REFERENCES
[1] http://www.pro.techtarget.com
[2] Apache Cassandra, http://cassandra.apache.org.
[3] Apache Hadoop, http://hadoop.apache.org.
[4] Apache HBase, http://hbase.apache.org.
[5] Apache Mahout, http://mahout.apache.org.
[6] Apache Pig, http://www.pig.apache.org/.
[7] http://www.webopedia.com/
[8] http://www.ibm.com/library/
[9] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer. MOA: Massive Online Analysis, http://moa.cms.waikato.ac.nz/. Journal of Machine Learning Research (JMLR), 2010.
[10] Cascading, http://www.cascading.org/.
[11] Facebook Scribe, https://github.com/facebook/scribe.
[12] U. Kang, D. H. Chau, and C. Faloutsos. PEGASUS: Mining Billion-Scale Graphs in the Cloud. 2012.
[13] J. Langford. Vowpal Wabbit, http://hunch.net/~vw/, 2011.
[14] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, July 2010.
[15] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed Stream Computing Platform. In ICDM Workshops, pages 170–177, 2010.
[16] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2012. ISBN 3-900051-07-0.