Big Data Analytics

 Big data analytics is the process of examining large volumes of
data of varied types (big data) to uncover hidden patterns, unknown
correlations, and other real-time insights.
 Uses of big data analytics: Google Search recommendations,
Satyamev Jayate, gene reading.
Data Mining vs. Big Data Analytics
 Data mining
   Data constraints apply: the data must be neat and clean.
   Elaborate ETL is required, so insights have to wait until the
    ETL cycle completes.
 Big data analytics
   Big data cannot be expected to be neat, as much of it is
    unstructured.
   Big data analytics provides real-time insights.

Types of analytics:
 Descriptive
 Diagnostic
 Predictive
 Prescriptive

 Relational databases proved inadequate for storing and processing
big data.
 As a result, a new class of big data technologies has emerged
and is being used in many big data analytics environments.
 The technologies associated with big data analytics include:
   Hadoop
   MapReduce
   NoSQL

 Hadoop is an open-source, Java-based programming framework.
 It supports the processing and storage of large data sets in a
distributed computing environment.
 Components of Hadoop:
   HDFS (Hadoop Distributed File System)
   MapReduce

 HDFS stores data in a distributed, scalable, and fault-tolerant
way.
 The NameNode holds metadata about the data stored on the DataNodes.
 The DataNodes hold the actual data in the form of blocks and can
communicate with each other and with the NameNode.
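
The split between NameNode metadata and DataNode blocks can be illustrated with a toy in-memory model. This is only a sketch, not the HDFS API: the names `ToyNameNode`, `ToyDataNode`, `put`, and `read`, the 4-byte block size, the replication factor of 2, and the round-robin placement are all assumptions chosen to keep the example small. Real HDFS uses much larger blocks and its own placement policy.

```python
class ToyDataNode:
    """Holds raw blocks, keyed by block id (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class ToyNameNode:
    """Holds only metadata: which blocks form a file and where they live."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size      # assumed tiny so the split is visible
        self.replication = replication    # assumed; must not exceed node count
        self.metadata = {}                # filename -> [(block_id, [node names])]

    def put(self, filename, data):
        entries = []
        for index, offset in enumerate(range(0, len(data), self.block_size)):
            block_id = f"{filename}#{index}"
            block = data[offset:offset + self.block_size]
            # Round-robin placement: each block goes to `replication` nodes.
            holders = [
                self.datanodes[(index + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            for node in holders:
                node.blocks[block_id] = block
            entries.append((block_id, [node.name for node in holders]))
        self.metadata[filename] = entries

    def read(self, filename, failed=()):
        """Reassemble a file, skipping any DataNodes listed in `failed`."""
        by_name = {node.name: node for node in self.datanodes}
        parts = []
        for block_id, holders in self.metadata[filename]:
            alive = [name for name in holders if name not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            parts.append(by_name[alive[0]].blocks[block_id])
        return b"".join(parts)


nodes = [ToyDataNode(f"dn{i}") for i in range(3)]
namenode = ToyNameNode(nodes)
namenode.put("tweets.txt", b"hello big data")

# Fault tolerance: the file is still readable with one DataNode down.
restored = namenode.read("tweets.txt", failed={"dn1"})
```

Because every block has a second copy on another node, losing `dn1` does not lose data; the NameNode's metadata is what lets the reader find a surviving replica.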

 MapReduce is a programming model designed for processing large
volumes of data in parallel by dividing the work into a set of
independent tasks, as in the previous example, where Twitter data
was processed on different servers on the basis of months.
 Hadoop includes an open-source implementation of the MapReduce
model.
 A MapReduce job combines two Java functions: Mapper() and
Reducer().
 Example: checking the popularity of words in a text.

 The Mapper function maps the split files and provides input to
the reducer:
    Mapper(filename, file_contents):
        for each word in file_contents:
            emit(word, 1)
 The Reducer function combines the input provided by the mapper
and produces the output:
    Reducer(word, values):
        sum = 0
        for each value in values:
            sum = sum + value
        emit(word, sum)
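
The Mapper and Reducer pseudocode above can be tried out as a single-process simulation. This is a minimal sketch, not the Hadoop API: the helper `run_word_count`, the sample splits, and the choice to lowercase and split on whitespace are assumptions made for illustration. In a real cluster the framework runs the mappers on different machines and performs the grouping (shuffle) step between the two phases.

```python
from collections import defaultdict


def mapper(filename, file_contents):
    """Emit (word, 1) for every word in one input split."""
    for word in file_contents.lower().split():
        yield word, 1


def reducer(word, values):
    """Sum all the counts emitted for one word."""
    return word, sum(values)


def run_word_count(splits):
    """Hypothetical driver: map every split, group by key, then reduce."""
    grouped = defaultdict(list)
    for filename, contents in splits.items():
        # Map phase: each split could be handled by a different machine.
        for word, count in mapper(filename, contents):
            # Shuffle step: gather all values that share the same key.
            grouped[word].append(count)
    # Reduce phase: one reducer call per distinct word.
    return dict(reducer(word, values) for word, values in grouped.items())


# Assumed sample input: two splits of one larger text.
splits = {
    "part-1.txt": "big data needs big tools",
    "part-2.txt": "data tools for big data",
}
counts = run_word_count(splits)
```

Here `counts["big"]` and `counts["data"]` are both 3, because the reducer sums the 1s emitted for those words across both splits.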

 NoSQL stands for "Not only SQL".
 It refers to non-relational database management systems.
 It is used where no fixed schema is required and data is scaled
horizontally.
 Four categories of NoSQL databases:
   Key-value pair
   Columnar databases
   Graph databases
   Document databases
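
To make the four categories more concrete, the sketch below models the same small data set in each style using plain Python structures. It is an illustration only: the records, field names, and the `FOLLOWS` relationship are invented, and each layout is a deliberate simplification of what real products in that category do.

```python
import json

# Key-value: the store sees only a key and an opaque value.
key_value = {"user:42": json.dumps({"name": "Asha", "city": "Pune"})}

# Document: the store understands the fields inside each document.
documents = [
    {"_id": 42, "name": "Asha", "city": "Pune"},
    {"_id": 7, "name": "Ravi", "city": "Delhi"},
]

# Columnar (simplified): values of one column are kept together by row id.
columns = {
    "name": {42: "Asha", 7: "Ravi"},
    "city": {42: "Pune", 7: "Delhi"},
}

# Graph: entities are nodes and relationships are first-class edges.
edges = [(42, "FOLLOWS", 7)]

# The access pattern each model makes natural:
name_by_key = json.loads(key_value["user:42"])["name"]             # lookup by key
in_pune = [d["name"] for d in documents if d["city"] == "Pune"]    # query by field
all_cities = sorted(columns["city"].values())                      # scan one column
followed_by_42 = [dst for src, rel, dst in edges
                  if src == 42 and rel == "FOLLOWS"]               # follow a relation
```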

 Key-value pair
   Keys are used to retrieve values from opaque data blocks.
   Works like a hash map.
   Tremendously fast.
   Drawback: no provision for content-based queries.
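
The trade-off can be seen in a toy hash-map-backed store. This is a sketch under stated assumptions, not any particular product's API: `ToyKeyValueStore`, `put`, `get`, `scan`, and the sample records are hypothetical names and data introduced for illustration.

```python
class ToyKeyValueStore:
    """Hash-map-backed store whose values are opaque bytes it never inspects."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Fast path: a single hash lookup, regardless of what the value holds.
        return self._data.get(key)

    def scan(self, predicate):
        # The store keeps no index on content, so a content-based query
        # has to examine every value one by one.
        return [key for key, value in self._data.items() if predicate(value)]


store = ToyKeyValueStore()
store.put("user:1", b"name=Asha;city=Pune")
store.put("user:2", b"name=Ravi;city=Delhi")

by_key = store.get("user:2")
in_pune = store.scan(lambda value: b"city=Pune" in value)
```

Retrieval by key stays fast as the store grows, while the `scan` call grows with the number of stored values, which is the drawback noted above.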

Stay Tuned With Us for More
Information
https://www.linkedin.com/company/tyronesystems
https://twitter.com/tyronesystems
https://www.facebook.com/tyronesystems
