SlideShare a Scribd company logo
Big Data Analytics
CE Dep, IAUSDJ.ac.ir
23-10-95(12 Jan, 2017)
Survey
 explosive increase of global data
 In 2011 overall created and copied data
volume in the world was 1.8ZB
 the term of big data is mainly used to
describe enormous datasets
 many government agencies announced
 major plans to accelerate big data research
and applications
 Big data is an abstract concept
 Different with “massive data” or “very big
data.”
 In 2010, Apache Hadoop defined big data as
“datasets which could not be captured,
managed, and processed by general
computers within an acceptable scope.”
 in May 2011, McKinsey & Company. Big data
shall mean such datasets which could not be
acquired, stored, and managed by classic
database software
 Doug Laney, an analyst of META (presently
Gartner)defined challenges and opportunities
brought about by increased data with a 3Vs
model
 The increase ofVolume,Velocity, andVariety
 The increase ofVolume,Velocity,Variety and
Big DataValue
 retailers that fully utilize big data may
improve their profit by more than 60 %
 developed economies in Europe could save
over EUR 100 billion (which excludes the
effect of reduced frauds, errors, and tax
difference)
 Extract wisdom from data
Data representation
 Data representation aims to make data more
meaningful for computer analysis and user
interpretation
 Levels of heterogeneity in type, structure,
semantics, organization, granularity, and
accessibility.
Redundancy reduction and data
compression
 There is high level of redundancy in datasets
 Is effective to reduce the indirect cost of the
entire system
Data life cycle management
 Values hidden in big data depend on data
freshness
 Which data shall be stored and which data
shall be discarded.
Analytical mechanism
 Traditional RDBMSs are strictly designed
with a lack of scalability and expandability
 We shall find a compromising solution
between RDBMSs and non-relational
databases.
Data confidentiality
 Distributed nodes and servers
 Third party for processing
Energy management
 From both economy and environment
perspectives
 With increase of data volume and analytical
demands, processing, storage, and
transmission of big data consume more
electric energy so:
 System-level power consumption control
 Management mechanism.
Expendability and scalability
 The analytical system of big data must
support present and future datasets.
 The analytical algorithm must be able to
process increasingly expanding and more
complex datasets.
Cooperation
 Analysis of big data is an interdisciplinary
research, which requires experts in different
fields cooperate.
 The main objective of cloud computing is to
use huge computing and storage resources
under concentrated management.
 The development of cloud computing
provides solutions for the storage and
processing of big data.
 The distributed storage technology based on cloud computing can
effectively manage big data
 The parallel computing capacity by virtue of cloud computing can
improve the efficiency of acquisition and analyzing big data.
 Cloud computing transforms the IT
architecture
 While big data influences business decision
making.
 Big data and cloud computing have different
 target customers
 Cloud computing is an advanced IT solution
 Big data focusing on business operations.
 Enormous amount of networking sensors are
embedded into various devices and machines
in the real world.
 collect various kinds of data, such as
environmental data, geographical data,
astronomical data, and logistic data.
 Mobile equipment's, transportation facilities,
public facilities, and home appliances could
all be data acquisition equipments in IoT
 Not only is a platform for concentrated
storage of data.
 acquiring data
 managing data
 organizing data
 leveraging the data values and functions.
 Hadoop is widely used in big data
applications in the industry, e.g., spam
filtering, network searching, clickstream
analysis, and social recommendation.
 Academic research is now based on Hadoop.
 In June 2012,Yahoo runs Hadoop in 42,000
servers at four data centers to support its
products and services, e.g., searching and
spam filtering
 Facebook announced that their Hadoop
cluster can process 100 PB data, which grew
by 0.5 PB per day as in November 2012.
 many companies provide Hadoop ommercial
execution and/or support, including Cloudera,
IBM, MapR, EMC, and Oracle.
Enterprise data
 Amazon processes millions of terminal
operations and more than 500,000 queries
from third-party sellers per day.
 Walmart processes one million customer
trades per hour, over 2.5PB.
 Akamai analyzes 75 million events per day
 Come from industry, agriculture, traffic,
transportation, medical care, public
departments, and families, etc.
 The data generated from IoT has the following
features:
 Large-scale data
 Heterogeneity
 Strong time and space correlation
 Effective data accounts for only a small portion of
the big data
 High-throughput bio-measurement
technologies are developed
 The completion of HGP (Human Genome
Project)
 One sequencing of human gene may
generate 100-600GB raw data.
 1.15 million human samples and 150,000
animal, plant, and microorganism samples in
China National Genebank, 2015, 30 million.
 Clinical medical care and medical R & D Data
 Pittsburgh Medical Center has stored 2TB such
data.
 By 2015, the average data volume of every
hospital will increase from 167TB to 665TB.
 Astronomy data, recorded 25TB data from
1998 to 2008.
 In the beginning of 2008, the Atlas
experiment of Large Hadron Collider (LHC) of
European Organization for Nuclear Research
generates raw data at 2PB/s and stores about
10TB processed data per year.
 Mobile data.
 Etc.
 Data collection
 Data transmission
 Data pre-processing
 Log files
 Sensing
 Methods for acquiring network data(crowlers
etc.)
 Lipcap-based pocket capture-easy to use library
 Zero copy packet capture
 Monitoring software’s:Wireshark, Smartsniff,
WinNetCap
 Mobile equipment’s
 Inter-DCN transmissions
 from data source to data center
 Intra-DCNTransmissions
 data communication flows within data centers
 High speed infrastructure
The collected datasets vary with respect to noise,
redundancy, and consistency, and it is undoubtedly a
waste to store meaningless data.
Integration
 Combination of data from different sources and
provides users with a uniform view of data
Cleaning
 Cleaning: data cleaning is a process to identify
inaccurate, incomplete, or unreasonable data,
Redundancy elimination
 Direct Attached Storage (DAS)
 Various hard disks are directly connected with
servers
 Network Storage
 Network Attached Storage (NAS)
 Storage Area Network (SAN).
 NAS
 Consistency
 A distributed storage system requires multiple
servers to cooperatively store data
 Availability
 It would be desirable if the entire system is not
seriously affected to satisfy customer’s requests in
terms of reading and writing.
 PartitionTolerance
 The distributed storage still works well when the
network is partitioned
 Eric Brewer proposed a CAP theory in 2000,
which indicated that a distributed system
could not simultaneously meet the
requirements on consistency, availability, and
partition tolerance; at most two of the three
requirements can be satisfied simultaneously.
 Seth Gilbert and Nancy Lynch from MIT
proved the correctness of CAP theory in 2002.
 Eric Brewer proposed a CAP theory in 2000,
which indicated that a distributed system
could not simultaneously meet the
requirements on consistency, availability, and
partition tolerance; at most two of the three
requirements can be satisfied simultaneously.
 Seth Gilbert and Nancy Lynch from MIT
proved the correctness of CAP theory in 2002.
 classified into three bottom-up levels:
1. File systems
2. Databases
3. Programming models.
 classified into three bottom-up levels:
1. File systems
 Google’s GFS, HDFS, Microsoft Cosmos,
Facebook Haystack,TaobaoTFS and FastDFS
 classified into three bottom-up levels:
1. File systems
2. Databases, NoSQL
1. Key-value Databases: Dynamo, Voldemort
2. Column-oriented Database: BigTable, Cassandra
3. Document Database: More complex with KV
MongoDB, SimpleDB, and CouchDB
 classified into three bottom-up levels:
1. File systems
2. Databases, NoSQL
3. Programming models
1.MapReduce: Automatic parallel processing. Sawzall
of Google, Pig Latin ofYahoo, Hive of Facebook,
and Scope of Microsoft.
2.Dryad: Dryad is a general-purpose distributed
execution engine for processing parallel
applications. And All-Pairs.
 Use proper statistical methods to analyze
massive data, to concentrate, extract, and
refine useful data hidden in a batch of chaotic
datasets, and to identify the inherent law of
the subject matter.
 Cluster Analysis: is a statistical method for
grouping objects, and specifically, classifying
objects according to some features.
 Factor Analysis: is basically targeted at
describing the relation among many
elements with only a few factors, i.e.,
grouping several closely related variables into
a factor, and the few factors are then used to
reveal the most information of the original
data.
 Correlation Analysis: is an analytical method
for determining the law of relations, such as
correlation, correlative dependence, and
mutual restriction, among observed
phenomena and accordingly conducting
forecast and control.
 Regression Analysis: is a mathematical tool
for revealing correlations between one
variable and several other variables.
 A/BTesting: also called bucket testing. It is a
technology for determining how to improve
target variables by comparing the tested
group. Big data will require a large number of
tests to be executed and analyzed.
 Statistical Analysis: Statistical analysis is
based on the statistical theory, a branch of
applied mathematics.
 Data Mining Algorithms: Data mining is a
process for extracting hidden, unknown, but
potentially useful information and knowledge
from massive, incomplete, noisy, fuzzy, and
random data.
 Bloom Filter: Bloom Filter consists of a series
of Hash functions.
 Hashing: it is a method that essentially
transforms data into shorter fixed-length
numerical values or index values.
 Index: index is always an effective method to
reduce the expense of disk reading and
writing, and improve insertion, deletion,
modification, and query speeds in both
traditional relational databases that manage
structured data, and other technologies that
manage semi structured and unstructured
data
 Triel: also called trie tree, a variant of Hash
Tree. It is mainly applied to rapid retrieval and
word frequency statistics.The main idea of
Triel is to utilize common prefixes of
character strings to reduce comparison on
character strings to the greatest extent, so as
to improve query efficiency.
 Parallel Computing: compared to traditional
serial computing, parallel computing refers to
simultaneously utilizing several computing
resources to complete a computation task.
 Presently, some classic parallel computing
models include MPI (Message Passing
Interface), MapReduce, and Dryad
1.Real-time vs. offline analysis:
Real-time is mainly used in E-commerce and
finance.
Offline analysis: is usually used for
applications without high requirements on
response time, e.g., machine learning,
statistical analysis, and recommendation
algorithms.
2.Analysis at different levels
Memory-level analysis: is for the case where the
total data volume is smaller than the maximum
memory of a cluster.
BI analysis: is for the case when the data scale
surpasses the memory level but may be imported
into the BI analysis environment.
Massive analysis: is for the case when the data
scale has completely surpassed the capacities of BI
products and traditional relational databases.
3. Analysis with different complexity
 R (30.7 %): R, an open source programming
language and software environment, is
designed for data mining/analysis and
visualization.
 Excel (29.8 %): Excel, a core component of
Microsoft Office, provides powerful data
processing and statistical analysis
capabilities.
 Rapid-I Rapidminer (26.7 %): Rapidminer is an
open source software used for data mining,
machine learning, and predictive analysis.
 KNMINE (21.8%): KNIME (Konstanz
Information Miner) is a user-friendly,
intelligent, and open-source rich data
integration, data processing, data analysis,
and data mining platform
 Weka/Pentaho (14.8 %):Weka, abbreviated
fromWaikato Environment for Knowledge
Analysis, is a free and open-source machine
learning and data mining software written in
Java.
 Application of big data in enterprises
 Application of IoT based big data
 Application of online social network-oriented
big data
 Application of online social network-oriented big
data
 Structure-based Applications
 EarlyWarning
 Real-time Monitoring
 Real-time Feedback
 Applications of healthcare and medical big
data
 prediction
 Fundamental problems of big data:There is a
compelling need for a rigorous and holistic
definition of big data, a structural model of
big data, a formal description of big data.
 Standardization of big data: An evaluation
system of data quality and an evaluation
standard/benchmark of data computing
efficiency should be developed.
 Evolution of big data computing modes:This
includes memory mode, data flow mode,
PRAM mode, and MR mode, etc.
 Evolution of big data computing modes:This
includes memory mode, data flow mode,
PRAM mode, and MR mode, etc.
 Format conversion of big data
 Big data transfer
 Real-time performance of big data
 Processing of big data
 Big data management
 Searching, mining, and analysis of big data
 Integration and provenance of big data
 Big data application
 Big data privacy
 Protection of personal privacy during data acquisition
 Personal privacy data may also be leaked during storage,
transmission, and usage, even if acquired with the
permission of users
 Data quality: Data quality influences big data utilization.
Low quality data wastes transmission and storage
resources with poor usability
 Big data safety mechanism: Big data brings about
challenges to data encryption due to its large scale and
high diversity.
“Patience ensures
victory.”
Thank you for
your attention.

More Related Content

What's hot

Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
C. Scyphers
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
Suman Saurabh
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
Arvind Kalyan
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Haluan Irsad
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
Febiyan Rachman
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
VIKAS KATARE
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Nishant Gandhi
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
Vishwajeet Jadeja
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
Thomas Kejser
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
Mahmoud Yassin
 
Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big Data
Fujitsu UK
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Big Data simplified
Big Data simplifiedBig Data simplified
Big Data simplified
Praveen Hanchinal
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Joey Li
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 
Big data ppt
Big data pptBig data ppt
Big data ppt
Shweta Sahu
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
Mahmoud Yassin
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
Matthew Dennis
 

What's hot (20)

Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big Data
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Big Data simplified
Big Data simplifiedBig Data simplified
Big Data simplified
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
 

Similar to Big data analytics, survey r.nabati

Big Data
Big DataBig Data
Big Data
Vinayak Kamath
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
ijtsrd
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
Sarvesh Meena
 
E018142329
E018142329E018142329
E018142329
IOSR Journals
 
Big Data
Big DataBig Data
Big Data
Kirubaburi R
 
Big data mining
Big data miningBig data mining
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
Polash Halder
 
TCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYATCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYAAditya Srinivasan
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Sreedhar Chowdam
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
Nitesh Ghosh
 
Big Data & Data Mining
Big Data & Data MiningBig Data & Data Mining
Big Data & Data Mining
Md Mizanur Rahman
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
IJRESJOURNAL
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
Anusha sweety
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
IRJET Journal
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
aciijournal
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
aciijournal
 

Similar to Big data analytics, survey r.nabati (20)

Big Data
Big DataBig Data
Big Data
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
E018142329
E018142329E018142329
E018142329
 
Big Data
Big DataBig Data
Big Data
 
U0 vqmtq3m tc=
U0 vqmtq3m tc=U0 vqmtq3m tc=
U0 vqmtq3m tc=
 
Big data mining
Big data miningBig data mining
Big data mining
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
 
TCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYATCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYA
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
 
Big Data & Data Mining
Big Data & Data MiningBig Data & Data Mining
Big Data & Data Mining
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 

More from nabati

Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadehSmart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
nabati
 
Ip core -iausdj.ac.ir
Ip core -iausdj.ac.irIp core -iausdj.ac.ir
Ip core -iausdj.ac.ir
nabati
 
Introduction to R r.nabati - iausdj.ac.ir
Introduction to R   r.nabati - iausdj.ac.irIntroduction to R   r.nabati - iausdj.ac.ir
Introduction to R r.nabati - iausdj.ac.ir
nabati
 
Contiki IoT simulation
Contiki IoT simulationContiki IoT simulation
Contiki IoT simulation
nabati
 
Cloud computing standards and protocols r.nabati
Cloud computing standards and protocols r.nabatiCloud computing standards and protocols r.nabati
Cloud computing standards and protocols r.nabati
nabati
 
Internet of things (IoT) and big data- r.nabati
Internet of things (IoT) and big data- r.nabatiInternet of things (IoT) and big data- r.nabati
Internet of things (IoT) and big data- r.nabati
nabati
 
Graph theory concepts complex networks presents-rouhollah nabati
Graph theory concepts   complex networks presents-rouhollah nabatiGraph theory concepts   complex networks presents-rouhollah nabati
Graph theory concepts complex networks presents-rouhollah nabati
nabati
 
Random walks on graphs - link prediction by Rouhollah Nabati
Random walks on graphs - link prediction by Rouhollah NabatiRandom walks on graphs - link prediction by Rouhollah Nabati
Random walks on graphs - link prediction by Rouhollah Nabati
nabati
 
Introduction to latex by Rouhollah Nabati
Introduction to latex by Rouhollah NabatiIntroduction to latex by Rouhollah Nabati
Introduction to latex by Rouhollah Nabati
nabati
 

More from nabati (9)

Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadehSmart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
 
Ip core -iausdj.ac.ir
Ip core -iausdj.ac.irIp core -iausdj.ac.ir
Ip core -iausdj.ac.ir
 
Introduction to R r.nabati - iausdj.ac.ir
Introduction to R   r.nabati - iausdj.ac.irIntroduction to R   r.nabati - iausdj.ac.ir
Introduction to R r.nabati - iausdj.ac.ir
 
Contiki IoT simulation
Contiki IoT simulationContiki IoT simulation
Contiki IoT simulation
 
Cloud computing standards and protocols r.nabati
Cloud computing standards and protocols r.nabatiCloud computing standards and protocols r.nabati
Cloud computing standards and protocols r.nabati
 
Internet of things (IoT) and big data- r.nabati
Internet of things (IoT) and big data- r.nabatiInternet of things (IoT) and big data- r.nabati
Internet of things (IoT) and big data- r.nabati
 
Graph theory concepts complex networks presents-rouhollah nabati
Graph theory concepts   complex networks presents-rouhollah nabatiGraph theory concepts   complex networks presents-rouhollah nabati
Graph theory concepts complex networks presents-rouhollah nabati
 
Random walks on graphs - link prediction by Rouhollah Nabati
Random walks on graphs - link prediction by Rouhollah NabatiRandom walks on graphs - link prediction by Rouhollah Nabati
Random walks on graphs - link prediction by Rouhollah Nabati
 
Introduction to latex by Rouhollah Nabati
Introduction to latex by Rouhollah NabatiIntroduction to latex by Rouhollah Nabati
Introduction to latex by Rouhollah Nabati
 

Recently uploaded

IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 

Recently uploaded (20)

IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

Big data analytics, survey r.nabati

  • 1. Big Data Analytics CE Dep, IAUSDJ.ac.ir 23-10-95(12 Jan, 2017) Survey
  • 2.
  • 3.  explosive increase of global data  In 2011 overall created and copied data volume in the world was 1.8ZB  the term of big data is mainly used to describe enormous datasets  many government agencies announced  major plans to accelerate big data research and applications
  • 4.
  • 5.  Big data is an abstract concept  Different with “massive data” or “very big data.”  In 2010, Apache Hadoop defined big data as “datasets which could not be captured, managed, and processed by general computers within an acceptable scope.”
  • 6.  in May 2011, McKinsey & Company. Big data shall mean such datasets which could not be acquired, stored, and managed by classic database software
  • 7.  Doug Laney, an analyst of META (presently Gartner)defined challenges and opportunities brought about by increased data with a 3Vs model  The increase ofVolume,Velocity, andVariety
  • 8.  The increase ofVolume,Velocity,Variety and Big DataValue
  • 9.  retailers that fully utilize big data may improve their profit by more than 60 %  developed economies in Europe could save over EUR 100 billion (which excludes the effect of reduced frauds, errors, and tax difference)  Extract wisdom from data
  • 10. Data representation  Data representation aims to make data more meaningful for computer analysis and user interpretation  Levels of heterogeneity in type, structure, semantics, organization, granularity, and accessibility.
  • 11. Redundancy reduction and data compression  There is high level of redundancy in datasets  Is effective to reduce the indirect cost of the entire system
  • 12. Data life cycle management  Values hidden in big data depend on data freshness  Which data shall be stored and which data shall be discarded.
  • 13. Analytical mechanism  Traditional RDBMSs are strictly designed with a lack of scalability and expandability  We shall find a compromising solution between RDBMSs and non-relational databases.
  • 14. Data confidentiality  Distributed nodes and servers  Third party for processing
  • 15. Energy management  From both economy and environment perspectives  With increase of data volume and analytical demands, processing, storage, and transmission of big data consume more electric energy so:  System-level power consumption control  Management mechanism.
  • 16. Expendability and scalability  The analytical system of big data must support present and future datasets.  The analytical algorithm must be able to process increasingly expanding and more complex datasets.
  • 17. Cooperation  Analysis of big data is an interdisciplinary research, which requires experts in different fields cooperate.
  • 18.
  • 19.  The main objective of cloud computing is to use huge computing and storage resources under concentrated management.  The development of cloud computing provides solutions for the storage and processing of big data.  The distributed storage technology based on cloud computing can effectively manage big data  The parallel computing capacity by virtue of cloud computing can improve the efficiency of acquisition and analyzing big data.
  • 20.  Cloud computing transforms the IT architecture  While big data influences business decision making.  Big data and cloud computing have different  target customers  Cloud computing is an advanced IT solution  Big data focusing on business operations.
  • 21.
  • 22.  Enormous amount of networking sensors are embedded into various devices and machines in the real world.  collect various kinds of data, such as environmental data, geographical data, astronomical data, and logistic data.  Mobile equipment's, transportation facilities, public facilities, and home appliances could all be data acquisition equipments in IoT
  • 23.
  • 24.  Not only is a platform for concentrated storage of data.  acquiring data  managing data  organizing data  leveraging the data values and functions.
  • 25.  Hadoop is widely used in big data applications in the industry, e.g., spam filtering, network searching, clickstream analysis, and social recommendation.  Academic research is now based on Hadoop.  In June 2012,Yahoo runs Hadoop in 42,000 servers at four data centers to support its products and services, e.g., searching and spam filtering
  • 26.  Facebook announced that their Hadoop cluster can process 100 PB data, which grew by 0.5 PB per day as in November 2012.  many companies provide Hadoop ommercial execution and/or support, including Cloudera, IBM, MapR, EMC, and Oracle.
  • 27.
  • 28. Enterprise data  Amazon processes millions of terminal operations and more than 500,000 queries from third-party sellers per day.  Walmart processes one million customer trades per hour, over 2.5PB.  Akamai analyzes 75 million events per day
  • 29.  Come from industry, agriculture, traffic, transportation, medical care, public departments, and families, etc.  The data generated from IoT has the following features:  Large-scale data  Heterogeneity  Strong time and space correlation  Effective data accounts for only a small portion of the big data
  • 30.  High-throughput bio-measurement technologies are developed  The completion of HGP (Human Genome Project)  One sequencing of human gene may generate 100-600GB raw data.  1.15 million human samples and 150,000 animal, plant, and microorganism samples in China National Genebank, 2015, 30 million.
  • 31.  Clinical medical care and medical R & D Data  Pittsburgh Medical Center has stored 2TB such data.  By 2015, the average data volume of every hospital will increase from 167TB to 665TB.
  • 32.  Astronomy data, recorded 25TB data from 1998 to 2008.  In the beginning of 2008, the Atlas experiment of Large Hadron Collider (LHC) of European Organization for Nuclear Research generates raw data at 2PB/s and stores about 10TB processed data per year.  Mobile data.  Etc.
  • 33.  Data collection  Data transmission  Data pre-processing
  • 34.  Log files  Sensing  Methods for acquiring network data(crowlers etc.)  Lipcap-based pocket capture-easy to use library  Zero copy packet capture  Monitoring software’s:Wireshark, Smartsniff, WinNetCap  Mobile equipment’s
  • 35.  Inter-DCN transmissions  from data source to data center  Intra-DCNTransmissions  data communication flows within data centers  High speed infrastructure
  • 36. The collected datasets vary with respect to noise, redundancy, and consistency, and it is undoubtedly a waste to store meaningless data. Integration  Combination of data from different sources and provides users with a uniform view of data Cleaning  Cleaning: data cleaning is a process to identify inaccurate, incomplete, or unreasonable data, Redundancy elimination
  • 37.
  • 38.  Direct Attached Storage (DAS)  Various hard disks are directly connected with servers  Network Storage  Network Attached Storage (NAS)  Storage Area Network (SAN).
  • 40.  Consistency  A distributed storage system requires multiple servers to cooperatively store data  Availability  It would be desirable if the entire system is not seriously affected to satisfy customer’s requests in terms of reading and writing.  PartitionTolerance  The distributed storage still works well when the network is partitioned
  • 41.  Eric Brewer proposed a CAP theory in 2000, which indicated that a distributed system could not simultaneously meet the requirements on consistency, availability, and partition tolerance; at most two of the three requirements can be satisfied simultaneously.  Seth Gilbert and Nancy Lynch from MIT proved the correctness of CAP theory in 2002.
  • 42.  Eric Brewer proposed a CAP theory in 2000, which indicated that a distributed system could not simultaneously meet the requirements on consistency, availability, and partition tolerance; at most two of the three requirements can be satisfied simultaneously.  Seth Gilbert and Nancy Lynch from MIT proved the correctness of CAP theory in 2002.
  • 43.  classified into three bottom-up levels: 1. File systems 2. Databases 3. Programming models.
  • 44.  classified into three bottom-up levels: 1. File systems  Google’s GFS, HDFS, Microsoft Cosmos, Facebook Haystack,TaobaoTFS and FastDFS
  • 45.  classified into three bottom-up levels: 1. File systems 2. Databases, NoSQL 1. Key-value Databases: Dynamo, Voldemort 2. Column-oriented Database: BigTable, Cassandra 3. Document Database: More complex with KV MongoDB, SimpleDB, and CouchDB
  • 46.  classified into three bottom-up levels: 1. File systems 2. Databases, NoSQL 3. Programming models 1.MapReduce: Automatic parallel processing. Sawzall of Google, Pig Latin ofYahoo, Hive of Facebook, and Scope of Microsoft. 2.Dryad: Dryad is a general-purpose distributed execution engine for processing parallel applications. And All-Pairs.
  • 47.
  • 48.
  • 49.  Use proper statistical methods to analyze massive data, to concentrate, extract, and refine useful data hidden in a batch of chaotic datasets, and to identify the inherent law of the subject matter.
  • 50.  Cluster Analysis: is a statistical method for grouping objects, and specifically, classifying objects according to some features.
  • 51.  Factor Analysis: is basically targeted at describing the relation among many elements with only a few factors, i.e., grouping several closely related variables into a factor, and the few factors are then used to reveal the most information of the original data.
  • 52.  Correlation Analysis: is an analytical method for determining the law of relations, such as correlation, correlative dependence, and mutual restriction, among observed phenomena and accordingly conducting forecast and control.
  • 53.  Regression Analysis: is a mathematical tool for revealing correlations between one variable and several other variables.
  • 54.  A/BTesting: also called bucket testing. It is a technology for determining how to improve target variables by comparing the tested group. Big data will require a large number of tests to be executed and analyzed.
  • 55.  Statistical Analysis: Statistical analysis is based on the statistical theory, a branch of applied mathematics.
  • 56.  Data Mining Algorithms: Data mining is a process for extracting hidden, unknown, but potentially useful information and knowledge from massive, incomplete, noisy, fuzzy, and random data.
  • 57.  Bloom Filter: Bloom Filter consists of a series of Hash functions.
  • 58.  Hashing: it is a method that essentially transforms data into shorter fixed-length numerical values or index values.
  • 59.  Index: index is always an effective method to reduce the expense of disk reading and writing, and improve insertion, deletion, modification, and query speeds in both traditional relational databases that manage structured data, and other technologies that manage semi structured and unstructured data
  • 60.  Triel: also called trie tree, a variant of Hash Tree. It is mainly applied to rapid retrieval and word frequency statistics.The main idea of Triel is to utilize common prefixes of character strings to reduce comparison on character strings to the greatest extent, so as to improve query efficiency.
  • 61.  Parallel Computing: compared to traditional serial computing, parallel computing refers to simultaneously utilizing several computing resources to complete a computation task.  Presently, some classic parallel computing models include MPI (Message Passing Interface), MapReduce, and Dryad
  • 62. 1.Real-time vs. offline analysis: Real-time is mainly used in E-commerce and finance. Offline analysis: is usually used for applications without high requirements on response time, e.g., machine learning, statistical analysis, and recommendation algorithms.
  • 63. 2.Analysis at different levels Memory-level analysis: is for the case where the total data volume is smaller than the maximum memory of a cluster. BI analysis: is for the case when the data scale surpasses the memory level but may be imported into the BI analysis environment. Massive analysis: is for the case when the data scale has completely surpassed the capacities of BI products and traditional relational databases.
  • 64. 3. Analysis with different complexity
  • 65.  R (30.7 %): R, an open source programming language and software environment, is designed for data mining/analysis and visualization.  Excel (29.8 %): Excel, a core component of Microsoft Office, provides powerful data processing and statistical analysis capabilities.
  • 66.  Rapid-I Rapidminer (26.7 %): Rapidminer is an open source software used for data mining, machine learning, and predictive analysis.  KNMINE (21.8%): KNIME (Konstanz Information Miner) is a user-friendly, intelligent, and open-source rich data integration, data processing, data analysis, and data mining platform
  • 67.  Weka/Pentaho (14.8 %):Weka, abbreviated fromWaikato Environment for Knowledge Analysis, is a free and open-source machine learning and data mining software written in Java.
  • 68.
  • 69.  Application of big data in enterprises  Application of IoT based big data  Application of online social network-oriented big data  Application of online social network-oriented big data  Structure-based Applications  EarlyWarning  Real-time Monitoring  Real-time Feedback
  • 70.  Applications of healthcare and medical big data
  • 72.
  • 73.  Fundamental problems of big data:There is a compelling need for a rigorous and holistic definition of big data, a structural model of big data, a formal description of big data.  Standardization of big data: An evaluation system of data quality and an evaluation standard/benchmark of data computing efficiency should be developed.
  • 74.  Evolution of big data computing modes:This includes memory mode, data flow mode, PRAM mode, and MR mode, etc.
  • 75.  Evolution of big data computing modes:This includes memory mode, data flow mode, PRAM mode, and MR mode, etc.
  • 76.  Format conversion of big data  Big data transfer  Real-time performance of big data  Processing of big data
  • 77.  Big data management  Searching, mining, and analysis of big data  Integration and provenance of big data  Big data application
  • 78.  Big data privacy  Protection of personal privacy during data acquisition  Personal privacy data may also be leaked during storage, transmission, and usage, even if acquired with the permission of users  Data quality: Data quality influences big data utilization. Low quality data wastes transmission and storage resources with poor usability  Big data safety mechanism: Big data brings about challenges to data encryption due to its large scale and high diversity.
  • 80. Thank you for your attention.

Editor's Notes

  1. Dryad is a directed acyclic graph (DAG) Dryad executes operations on the vertexes in clusters and transmits data via data channels, including documents, TCP connections, and shared-memory FIFO. During operation, resources in a logic operation graph are automatically map to physical resources.