POORNIMA UNIVERSITY
Submitted by:
Nitesh Saxena
M.TECH(CE)
SEMINAR
PRESENTATION ON:
BIG DATA ANALYTICS
Submitted to:
Ass. Prof: Nidhi Mishra
CONTENT
1. Introduction
2. List of papers
3. Review process adopted
4. List of issues
5. List of solution approaches
6. Issue wise review
7. Strengths and Weaknesses
8. Scope of our work
9. Conclusion
10. References
INTRODUCTION
• Human beings now create 2.5 quintillion bytes of data per
day. The rate of data creation has increased so much that 90%
of the data in the world today has been created in the last two
years alone.
• The term Big Data refers to large-scale information
management and analysis technologies that exceed the
capability of traditional data processing technologies.
• The incorporation of Big Data is changing Business Intelligence
and Analytics by providing new tools and opportunities for
leveraging large quantities of structured and unstructured
data.
• Big data analysis: efficient and effective handling of large data.
LIST OF PAPERS
1) “Mobile Agent based New Framework for Improving Big Data
Analysis” (2013)
2) “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time
Big Data Analysis” (2013)
3) “IOT-StatisticDB: A General Statistical Database Cluster
Mechanism for Big Data Analysis in the Internet of Things” (2013)
4) “Road Traffic Big Data Collision Analysis Processing
Framework” (2012)
5) “RUBA: Real-time Unstructured Big Data Analysis
Framework” (2013)
6) “An Integrated Framework for Disaster Event Analysis in Big Data
Environment” (2013)
7) “Large Imbalance Data Classification Based on MapReduce
for Traffic Accident Prediction” (2014)
8) “Addressing Big Data Problem Using Hadoop and
Map Reduce” (2012)
9) “Big R: Large-scale Analytics on Hadoop using R” (2014)
10) “High Performance and Fault Tolerant Distributed File
System for Big Data Storage and Processing using
Hadoop” (2014)
11) “Big Data Analysis Using Apache Hadoop” (2012)
12) “5Ws model for Big Data Analysis and Visualization” (2013)
13) “IRIS recognition on Hadoop: a biometrics system
implementation on cloud computing” (2012)
14) “Log analysis in cloud computing environment with Hadoop
and Spark” (2011)
15) “Minimizing Big Data Problems using Cloud Computing Based
on Hadoop Architecture” (2012)
16) “Big R: Large-scale Analytics on Hadoop using R” (2014)
17) “Access Security on Cloud Computing Implemented in
Hadoop System” (2012)
18) “Big Data Analysis Using Apache Hadoop” (2012)
19) “Applying Hadoop’s MapReduce Framework on
Clustering the GPS Signals through Cloud Computing” (2011)
20) “IRIS recognition on Hadoop: a biometrics system
implementation on cloud computing” (2012)
21) “Mass Log Data Processing and Mining Based on
Hadoop and Cloud Computing” (2011)
22) “H2T: A simple Hadoop-to-Twister Translator for Cloud
Computing” (2012)
23) “An In-depth Study of Map Reduce in Cloud Environment”
(2014)
24) “Optimizing Multiway Joins in a Map-Reduce Environment”
(2012)
25) “Comparing Map-Reduce and FREERIDE for Data-Intensive
Applications” (2013)
Review process adopted
• The review process has five stages:
1. Stage 0
2. Stage 1
3. Stage 2
4. Stage 3
5. Stage 3+
• Stage 0 – “Get a feel”
In this stage we collect the source material from the field:
conference research papers.
• Stage 1 – “Get the picture”
We form an overall picture of our research area from the collected papers.
• Stage 2 – “Get the detail”
We record the key information for each paper, such as title,
issue, and solution approach, and determine what we are
looking for and where to find it.
• Stage 3 – “Evaluate the detail”
Here we examine each solution approach in detail: algorithm,
methodology, mathematical explanation, and assumptions.
• Stage 3+ – “Synthesize”
Here we synthesize the review: topic, issue, solution
approach, mathematical explanation of the solution approach,
and type of research, and identify alternative approaches.
LIST OF ISSUES
• These papers address three main issues, listed below:
Paper no.                             Issue
1, 2, 3, 12, 13, 14, 15, 16           Big data analysis
4, 6, 7, 17, 18, 19, 20               Real-time big data analysis using Hadoop in cloud computing
5, 8, 9, 10, 11, 21, 22, 23, 24, 25   Classification of big data using tools and frameworks
LIST OF SOLUTION APPROACHES
Paper no.: 1, 2, 3, 12, 13, 14, 15, 16
Issue: Big data analysis
Solutions:
1) MapReduce Agent Mobility (MRAM), used to overcome the
drawbacks of Hadoop.
2) A new plug-in system, PuntStore, with pLSM (Punt Log Structured
Merge Tree), which improves read and write throughput in a NoSQL
database. COLA (Cache-Oblivious Lookahead Array) is also used for
efficient insertion and range queries.
3) “IOT-StatisticDB”, a statistical database cluster mechanism that
can support complicated statistical queries through PostgreSQL 8.2.4.
12) A 5Ws model to analyze big data attributes, patterns, and
densities between data.
Paper no.: 4, 6, 7, 17, 18, 19, 20
Issue: Real-time big data analysis using Hadoop in cloud computing
Solutions:
4) The Road Traffic Big Data Collision Analysis Processing Framework
proposes a distributed CEP (complex event processing) system that
dynamically distributes the event-processing load for road traffic events.
6) An integrated framework using Co-occurring Theory and a Markov chain
approach to find probabilities.
7) The Hadoop framework and a sampling method for removing the imbalance
in data.
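The slides do not say which sampling method paper 7 uses to remove class imbalance. As a rough illustration only, the sketch below assumes random undersampling, one common option: records from the larger classes are dropped at random until every class is as small as the rarest one. The function name, the record layout, and the 95/5 split are all hypothetical.

```python
import random
from collections import defaultdict

def undersample(records, label_of, seed=0):
    """Randomly keep only as many records per class as the smallest class has."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[label_of(rec)].append(rec)
    # Size of the rarest class; every class is cut down to this size.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Hypothetical data: 95 "no_accident" rows and 5 "accident" rows.
data = [("no_accident", i) for i in range(95)] + [("accident", i) for i in range(5)]
balanced = undersample(data, label_of=lambda rec: rec[0])
```

In a MapReduce setting the same idea would typically be applied per partition or through a sampling rate per class; this single-machine version only shows the balancing step itself.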
Paper no.: 5, 8, 9, 10, 11, 21, 22, 23, 24, 25
Issue: Classification of big data using tools and frameworks
Solutions:
• Hadoop Distributed File System (HDFS), Hadoop cluster,
MapReduce programming framework
• Visual clustering analysis
• RUBA Unstructured Big Data Analysis framework
• Apache Hadoop
Issue-Wise Findings:
Issue 1: Big Data Analysis
• Worked to improve big data analysis and overcome the drawbacks of
Hadoop.
• Designed and developed MapReduce Agent Mobility (MRAM), which is
based on the Java Agent Development Framework (JADE).
• Discussed a few research works on big data analysis using Hadoop and
stated the performance and reliability drawbacks of Hadoop for big data
analysis.
• Designed and developed a new plug-in system, PuntStore, with the pLSM
(Punt Log Structured Merge Tree) index engine to provide scalable and
efficient index services for real-time data analysis.
• pLSM can satisfy the need to perform index probes
in write-optimized systems.
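To make the idea of index probes in a write-optimized structure concrete, here is a minimal sketch of a generic LSM-tree. It is not pLSM or PuntStore; the class name, the flush threshold, and the absence of compaction are simplifying assumptions for illustration.

```python
import bisect

class TinyLSM:
    """Toy LSM-tree: an in-memory buffer plus sorted, immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}   # recent writes, unsorted
        self.runs = []       # flushed runs, oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Writes leave memory as one sorted run instead of in-place updates.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Probe runs from newest to oldest so the latest value wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # second write reaches the limit and triggers a flush
db.put("a", 3)   # newer value for "a" stays in the memtable
```

Writes stay cheap because they only touch the buffer, while a read may have to probe several runs. That read cost is what index engines for write-optimized stores try to reduce.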
Issue 2: Real-time big data analysis using Hadoop in cloud computing
• Worked to solve the road traffic collision problem through big data
analysis and processing.
• Tested the proposed framework on road traffic data from a 45-mile section
of the I-880N freeway in California, USA, integrating freeway traffic big
data and collision data over a ten-year period (1 TB in size) to obtain the
collision probability.
• Worked on real-time analysis and dynamic modification in unstructured
big data analysis.
• Noted that the number of compute nodes becomes insufficient as the number
of map tasks increases with growing dataset size.
• Hadoop lets users write distributed software easily even if
they know nothing about the underlying infrastructure.
• Applied a Markov chain with transition probabilities to the random
variables of cubes and used the result to find the probability of disaster
events.
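The Markov chain step in the last bullet can be sketched as follows. The states and transition probabilities here are invented for illustration and are not taken from the reviewed paper; the code only shows how a transition matrix turns a starting distribution into event probabilities after a number of steps.

```python
def step(dist, P):
    """One Markov step: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_after(dist, P, steps):
    """Apply the transition matrix P to dist the given number of times."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Hypothetical states and transition probabilities (each row sums to 1).
STATES = ["normal", "warning", "disaster"]
P = [
    [0.90, 0.09, 0.01],
    [0.50, 0.40, 0.10],
    [0.20, 0.30, 0.50],
]

start = [1.0, 0.0, 0.0]  # assume the system begins in "normal"
after_two = distribution_after(start, P, 2)
print(dict(zip(STATES, after_two)))
```

The entry for "disaster" in `after_two` is the probability of being in that state two steps after starting from "normal" under these assumed numbers.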
Issue 3: Classification of big data using tools and frameworks
• Investigated database-kernel-level parallel statistical analysis
techniques for massive sensor sampling data in the Internet of Things.
• The General Statistical Database Cluster Mechanism for Big Data Analysis
in the Internet of Things (“IOT-StatisticDB”) targets statistical analysis of
sensor sampling data, one of the most important procedures in IoT systems
for transforming “data” into “knowledge”.
• Designed and developed a 5Ws model to analyze big data attributes,
patterns, and densities between data.
• Hadoop Distributed File System (HDFS), Hadoop cluster.
• MapReduce programming framework.
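The MapReduce programming model named above can be illustrated with the classic word count. This is a single-process sketch in plain Python, not Hadoop code; the function names are hypothetical and the shuffle step that Hadoop performs across the cluster is simulated with a dictionary.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data analysis", "Big Data tools"])
```

On a real cluster the map and reduce functions run in parallel on different nodes over blocks stored in HDFS; only the structure of the three phases carries over from this sketch.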
STRENGTHS
• Addresses failure of Hadoop’s centralized master node and improves the
fault tolerance of the system.
• MRAM increases data analysis performance compared with
Hadoop.
• Replaces MySQL with NoSQL, increasing read and write throughput
and making search, insertion, and deletion in the database easier.
• Provides parallel statistical analysis techniques for massive sensor sampling
data in the Internet of Things.
• Solves the problem of sampling sensor data in a parallel and distributed
system.
• Provides information about big data patterns and visualization
using the 5Ws model.
• Can identify attackers’ locations or IP addresses using the 5Ws model
and its applications.
• Many kinds of real-time big data analysis can be done using Hadoop
clustering techniques.
• Hadoop and HBase can be used to analyze real-time road
traffic collision data.
• CEP analysis can be used to analyze unstructured big data such as CCTV
data and process it in a distributed system.
• Information about the current situation of a disaster event can be
obtained.
WEAKNESSES
• Event analysis methods cannot yet provide fast and
reliable insight from real-time data.
• MRAM is based on the Java Agent Development
Framework (JADE), which makes it more complex to
develop.
• pLSM with NoSQL requires more storage space and memory to
operate.
• It is not easy to apply statistical analysis methods to
unstructured data in a parallel and distributed environment.
• Obtaining useful traffic data from loop detectors is quite
difficult.
SCOPE OF OUR WORK
• Further work can be done on Hadoop techniques such as
MapReduce, HDFS, and HBase to process
distributed data using the MRAM framework.
• We can apply the RUBA framework to the fields of U-city, U-plant,
and ITS.
• In future we can extend the 5Ws model by applying
density classification to more areas and more data sets and
by using Gapminder’s visualization techniques.
• We can improve current disaster event analysis methods
to give faster and more reliable insight.
• Future work will focus on performance evaluation and
modeling of Hadoop data-intensive applications on cloud
platforms such as Amazon Elastic Compute Cloud (EC2).
Conclusion
We have reviewed 25 research papers published between
2011 and 2014 on Big Data Analysis. The review process
consists of a five-stage analysis. We found three main
issues in the field of Big Data, viz. big data analysis tools,
classification of big data using tools and frameworks, and
real-time big data analysis.
After examining the solution approaches, we concluded that
Big Data Analysis is the main area in which future work
can be done. Of the many solution approaches we found,
MapReduce Agent Mobility (MRAM), PuntStore with
pLSM (Punt Log Structured Merge Tree), the “IOT-StatisticDB”
statistical database cluster mechanism, and visual clustering
analysis are the most promising because of their advantages and
properties.
References
1) Youssef M. Essa, Gamal Attiya, and Ayman El-Sayed, “Mobile Agent
based New Framework for Improving Big Data Analysis,” IEEE, 2013,
ISBN 978-1-4799-2829-3.
2) Jin Wang, Yong Zhang, Yang Gao, and Chunxiao Xing, “pLSM: A Highly
Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis,” 2013
IEEE 37th Annual Computer Software and Applications Conference.
3) Jinson Zhang, “5Ws model for big data analysis and visualization,” 2013
IEEE 16th International Conference on Computational Science and
Engineering.
4) Zhiming Ding, Xu Gao, Jiajie Xu, and Hong Wu, “IOT-StatisticDB: A General
Statistical Database Cluster Mechanism for Big Data Analysis in the
Internet of Things,” IEEE, 2013, ISBN 978-0-7695-5046-6.
5) Duckwon Chung, Xuhua Rui, Dugki Min, and Hwasoo Yeo, “Road Traffic Big
Data Collision Analysis Processing Framework,” 2013.
6) Pyke Tin, Thi Thi Zin, and Takashi Toriu, “An Integrated Framework for
Disaster Event Analysis in Big Data Environments,” 2013 Ninth International
Conference on Intelligent Information Hiding and Multimedia Signal
Processing.
7) Jaein Kim, Nacwoo Kim, Joonho Park, and Kwangik Seo, “RUBA: Real-time
Unstructured Big Data Analysis Framework,” IEEE, 2013,
ISBN 978-1-4799-0698-7.
Big data analytics

More Related Content

What's hot

Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
Robert Grossman
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
csandit
 
Big data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at KitwareBig data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at Kitware
bigdataviz_bay
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
Robert Grossman
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
Geoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
Geoffrey Fox
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
Geoffrey Fox
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on Summit
Ganesan Narayanasamy
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
Robert Grossman
 
Toward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data AnalysisToward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data Analysis
Larry Smarr
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualization
bigdataviz_bay
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
bigdataviz_bay
 
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...
Tal Lavian Ph.D.
 
Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013
Vijay Srinivas Agneeswaran, Ph.D
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
Vaibhav Dhattarwal
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Robert Grossman
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Ian Foster
 

What's hot (17)

Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
Big data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at KitwareBig data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at Kitware
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on Summit
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
 
Toward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data AnalysisToward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data Analysis
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualization
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...
 
Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 

Similar to Big data analytics

High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
Geoffrey Fox
 
B1803031217
B1803031217B1803031217
B1803031217
IOSR Journals
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
siliconsudipt
 
Big data mining
Big data miningBig data mining
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
hktripathy
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
SoftServe
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
nabati
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
ElsonPaul2
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
Softweb Solutions
 
Big data business case
Big data   business caseBig data   business case
Big data business case
Karthik Padmanabhan ( MLE℠)
 
FR.pptx
FR.pptxFR.pptx
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
Geoffrey Fox
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
HPCC Systems
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
IJSTA
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
ijdms
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
Abhi Jit
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
Nagarjuna D.N
 
Big Data As a service - Sethuonline.com | Sathyabama University Chennai
Big Data As a service - Sethuonline.com | Sathyabama University ChennaiBig Data As a service - Sethuonline.com | Sathyabama University Chennai
Big Data As a service - Sethuonline.com | Sathyabama University Chennai
sethuraman R
 

Similar to Big data analytics (20)

High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
B1803031217
B1803031217B1803031217
B1803031217
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
 
Big data mining
Big data miningBig data mining
Big data mining
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
FR.pptx
FR.pptxFR.pptx
FR.pptx
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Big Data As a service - Sethuonline.com | Sathyabama University Chennai
Big Data As a service - Sethuonline.com | Sathyabama University ChennaiBig Data As a service - Sethuonline.com | Sathyabama University Chennai
Big Data As a service - Sethuonline.com | Sathyabama University Chennai
 

Big data analytics

  • 1. POORNIMA UNIVERSITY Submitted by: Nitesh Saxena M.TECH(CE) SEMINAR REPRESENTATION ON : BIG DATA ANALYTICS Submitted to: Ass. Prof: Nidhi Mishra
  • 2. CONTENT 1. Introduction 2. List of papers 3. Review process adopted 4. List of issues 5. List of solution approaches 6. Issue wise review 7. Strengths and Weaknesses 8. Scope of our work 9. Conclusion 10.References
  • 3. INTRODUCTION   Human beings now create 2.5 quintillion bytes of data per day. The rate of data creation has increased so much that 90% of the data in the world today has been created in the last two years alone.  The term Big Data refers to large scale information management and analysis technologies that exceed the capability of traditional data processing technologies.  The incorporation of Big Data is changing Business Intelligence and Analytics by providing new tools and opportunities for leveraging large quantities of structured and unstructured data.  Big data analysis-Efficient and effective handling of large data 
  • 4. LIST OF PAPERS
      1) “Mobile Agent based New Framework for Improving Big Data Analysis” (2013)
      2) “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data” (2013)
      3) “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things” (2013)
      4) “Road Traffic Big Data Collision Analysis Processing Framework” (2012)
      5) “RUBA: Real-time Unstructured Big Data Analysis Framework” (2013)
      6) “An Integrated Framework for Disaster Event Analysis in Big Data Environment” (2013)
  • 5. LIST OF PAPERS (continued)
      7) “Large Imbalance Data Classification Based on MapReduce for Traffic Accident Prediction” (2014)
      8) “Addressing Big Data Problem Using Hadoop and Map Reduce” (2012)
      9) “Big R: Large-scale Analytics on Hadoop using R” (2014)
      10) “High Performance and Fault Tolerant Distributed File System for Big Data Storage and Processing using Hadoop” (2014)
      11) “Big Data Analysis Using Apache Hadoop” (2012)
      12) “5Ws model for Big Data Analysis and Visualization” (2013)
      13) “IRIS recognition on Hadoop: a biometrics system implementation on cloud computing” (2012)
  • 6. LIST OF PAPERS (continued)
      14) “Log analysis in cloud computing environment with Hadoop and Spark” (2011)
      15) “Minimizing Big Data Problems using Cloud Computing Based on Hadoop Architecture” (2012)
      16) “Big R: Large-scale Analytics on Hadoop using R” (2014)
      17) “Access Security on Cloud Computing Implemented in Hadoop System” (2012)
      18) “Big Data Analysis Using Apache Hadoop” (2012)
      19) “Applying Hadoop’s MapReduce Framework on Clustering the GPS Signals through Cloud Computing” (2011)
      20) “IRIS recognition on Hadoop: a biometrics system implementation on cloud computing” (2012)
  • 7. LIST OF PAPERS (continued)
      21) “Mass Log Data Processing and Mining Based on Hadoop and Cloud Computing” (2011)
      22) “H2T: A simple Hadoop-to-Twister Translator for Cloud Computing” (2012)
      23) “An In-depth Study of Map Reduce in Cloud Environment” (2014)
      24) “Optimizing Multiway Joins in a Map-Reduce Environment” (2012)
      25) “Comparing Map-Reduce and FREERIDE for Data-Intensive Applications” (2013)
  • 8. REVIEW PROCESS ADOPTED
      The review process has five stages:
      1. Stage 0
      2. Stage 1
      3. Stage 2
      4. Stage 3
      5. Stage 3+
  • 9. REVIEW PROCESS ADOPTED (continued)
      • Stage 0: “Get a feel”. In this stage, we collect data from the environment, namely conference research papers.
      • Stage 1: “Get the picture”. We form a picture of our research from the collected data.
  • 10. REVIEW PROCESS ADOPTED (continued)
      • Stage 2: “Get the detail”. We define all the information about the research topic, such as the title, issue, and solution approach, from the collected data, and determine what we are looking for and where to find it.
      • Stage 3: “Evaluate the detail”. Here we define the solution approach in detail: algorithm, methodology, mathematical explanation, and assumptions.
  • 11. REVIEW PROCESS ADOPTED (continued)
      • Stage 3+: “Synthesize”. Here we synthesize our review: its topic, issue, solution approach, mathematical explanation of the solution approach, and type of research, and we identify the alternative approaches.
  • 12. LIST OF ISSUES
      These papers present different issues, listed below by paper number:
      • Papers 1, 2, 3, 12, 13, 14, 15, 16: Big data analysis
      • Papers 4, 6, 7, 17, 18, 19, 20: Real time big data analysis using Hadoop in cloud computing
      • Papers 5, 8, 10, 11, 21, 22, 23, 24, 25: Classification of big data using Tools and Frameworks
  • 13. LIST OF SOLUTION APPROACHES
      Papers 1, 2, 3, 12, 13, 14, 15, 16. Issue: Big data analysis. Solutions:
      1) MapReduce Agent Mobility (MRAM), used to overcome the drawbacks of Hadoop.
      2) A new plug-in system, PuntStore, with pLSM (Punt Log Structured Merge Tree), which improves read and write throughput in a NoSQL database. COLA (Cache-Oblivious Lookahead Array) was also used for efficient insertion and range queries.
      3) “IOT-StatisticDB”, a Statistical Database Cluster Mechanism that can support complicated statistical queries through PostgreSQL 8.2.4.
      12) A 5Ws model to analyze big data attributes, patterns, and the densities between data.
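The log-structured merge idea that pLSM builds on can be illustrated with a toy sketch. This is a generic LSM-tree written for illustration, not the PuntStore or pLSM design from paper 2; the class name `ToyLSM` and the `memtable_limit` parameter are assumptions introduced here.

```python
import bisect


class ToyLSM:
    """Generic toy LSM-tree: an in-memory buffer plus sorted, immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # recent writes, unsorted
        self.runs = []                        # sorted runs, newest last

    def put(self, key, value):
        # A write only touches memory, which is what makes the structure
        # write-optimized.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the buffer into a sorted run (a sequential write on real disks).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the buffer, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def range(self, lo, hi):
        # Merge oldest to newest so that later writes overwrite earlier ones.
        merged = {}
        for run in self.runs:
            for k, v in run:
                if lo <= k <= hi:
                    merged[k] = v
        for k, v in self.memtable.items():
            if lo <= k <= hi:
                merged[k] = v
        return sorted(merged.items())
```

A real index engine would also compact runs in the background and keep them on disk; the sketch only shows why writes are cheap while a point lookup may have to probe several runs.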
  • 14. LIST OF SOLUTION APPROACHES (continued)
      Papers 4, 6, 7, 17, 18, 19, 20. Issue: Real time big data analysis using Hadoop in cloud computing. Solutions:
      4) The Road Traffic Big Data Collision Analysis Processing Framework proposed a distributed CEP that dynamically distributes the event processing load for road traffic events.
      6) An integrated framework using co-occurrence theory and a Markov chain approach to find probabilities.
      7) The Hadoop framework and a sampling method for removing the imbalance in the data.
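The slide does not say which sampling method paper 7 uses to remove the imbalance. As one common option, random undersampling of the majority class can be sketched as follows; the function name `undersample`, the `label_of` callback, and the `seed` parameter are assumptions made for this illustration.

```python
import random
from collections import defaultdict


def undersample(records, label_of, seed=0):
    """Randomly drop records until every class is as small as the rarest one."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[label_of(rec)].append(rec)
    # Size of the rarest class; raises ValueError if records is empty.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced
```

For example, 3 "accident" records and 97 "normal" records would be reduced to 3 of each. Undersampling discards data, so on real traffic data an oversampling or hybrid scheme may be preferable.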
  • 15. LIST OF SOLUTION APPROACHES (continued)
      Papers 5, 8, 9, 10, 11, 21, 22, 23, 24, 25. Issue: Classification of big data using Tools and Frameworks. Solutions:
      • Hadoop Distributed File System (HDFS), Hadoop cluster, Map Reduce programming framework
      • Visual clustering analysis
      • RUBA Unstructured Big Data Analysis framework
      • Apache Hadoop
  • 16. ISSUE-WISE FINDINGS
      Issue 1: Big Data Analysis
      • Worked to improve big data analysis and overcome the drawbacks of Hadoop.
      • Designed and developed MapReduce Agent Mobility (MRAM), which is based on the Java Agent Development Framework (JADE).
      • Discussed a few research works on big data analysis using Hadoop and stated the drawbacks of Hadoop in performance and reliability for big data analysis.
      • Designed and developed a new plug-in system, PuntStore, with the pLSM (Punt Log Structured Merge Tree) index engine to provide scalable and efficient index services for real-time data analysis.
      • The Punt LSM (pLSM) can satisfy the need to perform index probes in write-optimized systems.
  • 17. ISSUE-WISE FINDINGS (continued)
      Issue 2: Real time big data analysis using Hadoop in cloud computing
      • Worked to solve the road traffic collision problem through big data analysis and processing.
      • Tested the proposed framework on road traffic data from a 45-mile section of the I-880N freeway in CA, USA, integrating freeway traffic big data and collision data over a ten-year period (1 TB in size) to obtain the collision probability.
      • Worked on real-time analysis and dynamic modification in unstructured big data analysis.
      • Noted that the number of compute nodes becomes insufficient as the number of map tasks increases with growing dataset size.
      • Hadoop lets users write distributed software easily even if they know nothing about the underlying infrastructure.
      • A Markov chain with transition probabilities was applied to the random variables of cubes, and the result was used to find the probability of disaster events.
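The Markov chain step mentioned above can be made concrete with a small sketch. It shows only the generic calculation, pushing a state distribution through a transition matrix; the two states and all numbers below are hypothetical and are not taken from the disaster event paper.

```python
def step(dist, P):
    """One Markov step: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, P, steps):
    """Apply the transition matrix P to the distribution `steps` times."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain: state 0 = "normal", state 1 = "disaster event".
# Each row of P holds the transition probabilities out of one state and sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
start = [1.0, 0.0]  # assume the system starts in the "normal" state

after_two = distribution_after(start, P, 2)
```

Here `after_two[1]` is the probability of being in the "disaster event" state after two transitions, under these made-up probabilities.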
  • 18. ISSUE-WISE FINDINGS (continued)
      Issue 3: Classification of big data using Tools and Frameworks
      • Worked to investigate database kernel level, parallel statistical analysis techniques for massive sensor sampling data in the Internet of Things.
      • Statistical analysis of sensor sampling data, as supported by the General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things (“IOT-StatisticDB”), is one of the most important procedures in IoT systems for transforming “data” into “knowledge”.
      • Designed and developed a 5Ws model to analyze big data attributes, patterns, and the densities between data.
      • Hadoop Distributed File System (HDFS), Hadoop cluster.
      • Map Reduce programming framework.
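The Map Reduce programming model named above can be illustrated with a single-process sketch. It imitates the map, shuffle, and reduce phases in plain Python and does not use the Hadoop API; every function name below is an assumption made for this illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    # Apply the user-supplied mapper to each record, yielding (key, value) pairs.
    for rec in records:
        yield from mapper(rec)


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    # Apply the user-supplied reducer to each key and its list of values.
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return sum(values)


counts = run_job(["big data", "Big Data analysis"], word_mapper, sum_reducer)
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network; the user still writes only the mapper and the reducer, which is the point made on the previous slide about programming distributed software easily.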
  • 19. STRENGTHS
      • Solves the problem of a failing centralized master node and improves the fault tolerance of the system in Hadoop.
      • MRAM increases data analysis performance compared to Hadoop.
      • Replaces MySQL with NoSQL, increasing read and write throughput and making searching, insertion, and deletion in the database easier.
      • Provides parallel statistical analysis techniques for massive sensor sampling data in the Internet of Things.
      • Solves the problem of sampling sensor data in a parallel and distributed system.
      • Provides information about big data patterns and visualization by using the 5Ws model.
      • Can find an attacker's location or IP addresses using the 5Ws model and its applications.
  • 20. STRENGTHS (continued)
      • Many kinds of real-time big data analysis can be done using Hadoop clustering techniques.
      • Hadoop and HBase techniques can be used to analyze real-time road traffic collision data.
      • CEP analysis can be used to analyze unstructured big data such as CCTV data and process it in a distributed system.
      • One can obtain information about the current situation of a disaster event.
  • 21. WEAKNESSES
      • Event analysis methods cannot yet provide fast and reliable insight into real-time data.
      • MRAM is based on the Java Agent Development Framework (JADE), which makes it more complex to develop.
      • pLSM NoSQL requires more storage space and memory to do its work.
      • It is difficult to apply statistical analysis methods to unstructured data in a parallel and distributed environment.
      • Obtaining useful traffic data from loop detectors is quite hard.
  • 22. SCOPE OF OUR WORK
      • Further work can be done on Hadoop techniques such as the MapReduce, HDFS, and HBase environment to process distributed data by using the MRAM framework.
      • We can apply the RUBA framework to the fields of U-city, U-plant, and ITS.
      • In future we can use the 5Ws model by deploying the densities classification in more areas and on more data sets, together with Gapminder’s visualization techniques.
      • We can improve the current disaster event analysis methods for faster and more reliable insight.
      • Future work will focus on performance evaluation and modeling of Hadoop data-intensive applications on cloud platforms such as Amazon Elastic Compute Cloud (EC2).
  • 23. CONCLUSION
      We have reviewed 25 research papers on Big Data Analysis, published from 2011 to 2014. The review process consists of a five-stage analysis (Stage 0 to Stage 3+). We found three main issues in the field of Big Data: Big data analysis tools, Classification of big data using Tools and Frameworks, and Real Time Big Data Analysis. After identifying the solution approaches, we concluded that Big Data Analysis is the main area in which future work can be done. Of the many solution approaches we found, MapReduce Agent Mobility (MRAM), PuntStore with pLSM (Punt Log Structured Merge Tree), the “IOT-StatisticDB” Statistical Database Cluster Mechanism, and Visual clustering analysis are the most promising because of their advantages and properties.
  • 24. REFERENCES
      1) Youssef M. ESSA, Gamal ATTIYA and Ayman EL-SAYED, “Mobile Agent based New Framework for Improving Big Data Analysis”, 978-1-4799-2829-3/13 $26.00 © 2013 IEEE
      2) Jin Wang, Yong Zhang, Yang Gao, Chunxiao Xing, “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis”, 2013 IEEE 37th Annual Computer Software and Applications Conference
      3) Jinson Zhang, “5Ws model for big data analysis and visualization”, 2013 IEEE 16th International Conference on Computational Science and Engineering
      4) Zhiming Ding, Xu Gao, Jiajie Xu, and Hong Wu, “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things”, 978-0-7695-5046-6/13 $26.00 © 2013 IEEE
      5) Duckwon Chung, Xuhua Rui, Dugki Min, Hwasoo Yeo, “Road Traffic Big Data Collision Analysis Processing Framework” (2013)
  • 25. REFERENCES (continued)
      6) Pyke Tin, Thi Thi Zin, Takashi Toriu, “An Integrated Framework for Disaster Event Analysis in Big Data Environments”, 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
      7) Jaein Kim, Nacwoo Kim, Joonho Park, Kwangik Seo, “RUBA: Real-time Unstructured Big Data Analysis Framework”, 978-1-4799-0698-7/13/$31.00 © 2013 IEEE
      8) Youssef M. ESSA, Gamal ATTIYA and Ayman EL-SAYED, “Mobile Agent based New Framework for Improving Big Data Analysis”, 978-1-4799-2829-3/13 $26.00 © 2013 IEEE
      9) Jin Wang, Yong Zhang, Yang Gao, Chunxiao Xing, “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis”, 2013 IEEE 37th Annual Computer Software and Applications Conference
      10) Jinson Zhang, “5Ws model for big data analysis and visualization”, 2013 IEEE 16th International Conference on Computational Science and Engineering
      11) Zhiming Ding, Xu Gao, Jiajie Xu, and Hong Wu, “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things”, 978-0-7695-5046-6/13 $26.00 © 2013 IEEE
      12) Duckwon Chung, Xuhua Rui, Dugki Min, Hwasoo Yeo, “Road Traffic Big
  • 26. REFERENCES (continued)
      13) Pyke Tin, Thi Thi Zin, Takashi Toriu, “An Integrated Framework for Disaster Event Analysis in Big Data Environments”, 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
      14) Jaein Kim, Nacwoo Kim, Joonho Park, Kwangik Seo, “RUBA: Real-time Unstructured Big Data Analysis Framework”, 978-1-4799-0698-7/13/$31.00 © 2013 IEEE
      15) Youssef M. ESSA, Gamal ATTIYA and Ayman EL-SAYED, “Mobile Agent based New Framework for Improving Big Data Analysis”, 978-1-4799-2829-3/13 $26.00 © 2013 IEEE
      16) Jin Wang, Yong Zhang, Yang Gao, Chunxiao Xing, “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis”, 2013 IEEE 37th Annual Computer Software and Applications Conference
      17) Jinson Zhang, “5Ws model for big data analysis and visualization”, 2013 IEEE 16th International Conference on Computational Science and Engineering
      18) Zhiming Ding, Xu Gao, Jiajie Xu, and Hong Wu, “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things”, 978-0-7695-5046-6/13 $26.00 © 2013 IEEE
      19) Duckwon Chung, Xuhua Rui, Dugki Min, Hwasoo Yeo, “Road Traffic Big
  • 27. REFERENCES (continued)
      20) Pyke Tin, Thi Thi Zin, Takashi Toriu, “An Integrated Framework for Disaster Event Analysis in Big Data Environments”, 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
      21) Jaein Kim, Nacwoo Kim, Joonho Park, Kwangik Seo, “RUBA: Real-time Unstructured Big Data Analysis Framework”, 978-1-4799-0698-7/13/$31.00 © 2013 IEEE
      22) Youssef M. ESSA, Gamal ATTIYA and Ayman EL-SAYED, “Mobile Agent based New Framework for Improving Big Data Analysis”, 978-1-4799-2829-3/13 $26.00 © 2013 IEEE
      23) Jin Wang, Yong Zhang, Yang Gao, Chunxiao Xing, “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis”, 2013 IEEE 37th Annual Computer Software and Applications Conference
      24) Jinson Zhang, “5Ws model for big data analysis and visualization”, 2013 IEEE 16th International Conference on Computational Science and Engineering
      25) Zhiming Ding, Xu Gao, Jiajie Xu, and Hong Wu, “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the