Big Data Analytics
 Big data analytics is the process of examining large amounts of
data of a variety of types (big data) to uncover hidden patterns,
unknown correlations, and other real-time insights.
 Uses of big data analytics: Google Search recommendations,
Satyamev Jayate, gene reading.
Data Mining vs. Big Data Analytics
Data Mining
 Data constraints apply: the data must be neat and clean.
 Elaborate ETL is required, so insights have to wait for the
ETL cycle to complete.
Big Data Analytics
 Big data cannot be assumed to be neat, as much of it is
unstructured.
 Big data analytics provides real-time insights.
Types of big data analytics:
 Descriptive
 Diagnostic
 Predictive
 Prescriptive
 Traditional relational databases struggle to store and process
big data.
 As a result, a new class of big data technologies has emerged
and is used in many big data analytics environments.
 The technologies associated with big data analytics include:
 Hadoop
 MapReduce
 NoSQL
 Hadoop is an open-source, Java-based programming framework.
 It supports the processing and storage of large data sets in a
distributed computing environment.
 Components of Hadoop:
 HDFS (Hadoop Distributed File System)
 MapReduce
 HDFS stores data in a distributed, scalable, and fault-tolerant
way.
 The NameNode holds metadata about the data stored on the
DataNodes.
 DataNodes hold the actual data in the form of blocks, and they
can communicate with one another and with the NameNode.
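
The split between metadata and block storage described above can be illustrated with a toy in-memory model. This is only a sketch under loud assumptions: `ToyNameNode`, `ToyDataNode`, `block_size`, and `replication` are hypothetical names chosen for illustration, the block size is deliberately tiny, and real HDFS works over a network rather than with Python dictionaries.

```python
from itertools import cycle


class ToyDataNode:
    """Holds the actual bytes of blocks, keyed by block id (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class ToyNameNode:
    """Holds only metadata: which blocks form a file and where they live."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size    # tiny on purpose, for illustration
        self.replication = replication  # number of copies kept per block
        self.metadata = {}              # filename -> [(block_id, [node names])]

    def write(self, filename, data):
        targets = cycle(self.datanodes)
        entries = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{filename}#{offset // self.block_size}"
            chunk = data[offset:offset + self.block_size]
            # Place each copy of the block on the next node in rotation.
            holders = [next(targets) for _ in range(self.replication)]
            for node in holders:
                node.blocks[block_id] = chunk
            entries.append((block_id, [n.name for n in holders]))
        self.metadata[filename] = entries

    def read(self, filename, failed=()):
        by_name = {n.name: n for n in self.datanodes}
        parts = []
        for block_id, holders in self.metadata[filename]:
            # Skip nodes that are down and read from a surviving replica.
            alive = [h for h in holders if h not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            parts.append(by_name[alive[0]].blocks[block_id])
        return b"".join(parts)


nodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
namenode = ToyNameNode(nodes)
namenode.write("tweets.txt", b"hello big data")
restored = namenode.read("tweets.txt", failed={"dn1"})
```

Because every block is stored on two nodes, the file can still be read when `dn1` is treated as failed, which is the fault-tolerance idea in miniature.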
 MapReduce is a programming model designed for processing large
volumes of data in parallel by dividing the work into a set of
independent tasks. For example, Twitter data can be split by
month, with each month processed on a different server.
 Hadoop provides an implementation of MapReduce.
 A job combines two functions, written in Java in Hadoop:
Mapper() and Reducer().
 Example: counting how often each word occurs, to check the
popularity of a text.
 The Mapper function processes each input split and provides
input to the reducer:
Mapper(filename, file_contents):
    for each word in file_contents:
        emit(word, 1)
 The Reducer function combines the values provided by the
mappers for each word and produces the output:
Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
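
The mapper and reducer above can be tried out on a single machine. The following is a minimal sketch, not Hadoop code: the `shuffle` and `word_count` helpers and the sample file names are assumptions added for illustration, standing in for the grouping step that a MapReduce framework performs between the map and reduce phases.

```python
from collections import defaultdict


def mapper(filename, file_contents):
    """Emit a (word, 1) pair for every word in one input split."""
    for word in file_contents.split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (hypothetical stand-in for the framework's shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, values):
    """Sum the counts collected for a single word."""
    return word, sum(values)


def word_count(splits):
    """Run map, shuffle, and reduce over a dict of {filename: contents}."""
    mapped = []
    for filename, contents in splits.items():
        mapped.extend(mapper(filename, contents))
    return dict(reducer(word, values) for word, values in shuffle(mapped).items())


# Sample input invented for this sketch.
splits = {
    "part-1.txt": "big data needs big tools",
    "part-2.txt": "data tools scale out",
}
counts = word_count(splits)
```

Each call to `mapper` depends only on its own split, and each call to `reducer` depends only on one word's values, which is what lets a real cluster run them on different servers.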
 NoSQL stands for "Not only SQL".
 A non-relational database management system.
 Used where no fixed schema is required and data is scaled
horizontally.
 Four categories of NoSQL databases:
 Key-value pair
 Columnar database
 Graph database
 Document database
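
The four categories differ mainly in the shape of the data they store. The sketch below uses plain Python structures as simplified stand-ins, not real database APIs; the user records, field names, and the `FOLLOWS` relationship are invented for illustration.

```python
# Key-value: the store sees only a key and an opaque value.
key_value = {"user:1": b"Asha|Pune", "user:2": b"Ravi|Delhi"}

# Document: each record is a self-describing document; fields may differ
# from one document to the next because there is no fixed schema.
documents = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ravi", "city": "Delhi", "follows": [1]},
]

# Columnar (simplified view): values are grouped by column, not by row.
columns = {
    "name": {1: "Asha", 2: "Ravi"},
    "city": {1: "Pune", 2: "Delhi"},
}

# Graph: nodes plus explicit relationships (edges) between them.
graph_nodes = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
graph_edges = [(2, "FOLLOWS", 1)]

# How each shape is typically accessed.
value_for_key = key_value["user:1"]
names_in_pune = [d["name"] for d in documents if d["city"] == "Pune"]
all_cities = list(columns["city"].values())
followers_of_asha = [
    graph_nodes[src]["name"]
    for src, relation, dst in graph_edges
    if relation == "FOLLOWS" and dst == 1
]
```

The access patterns hint at where each category fits: direct lookups by key, queries on document fields, scans over a single column, and traversals along relationships.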
Key-Value Pair
 Keys are used to fetch values from opaque data blocks.
 Works like a hash map.
 Tremendously fast.
 Drawback: no provision for content-based queries.
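
The speed and the drawback both follow from the hash-map design, as a small sketch shows. `ToyKeyValueStore` and its `put`, `get`, and `scan` methods are hypothetical names for illustration and do not correspond to any particular product's API.

```python
class ToyKeyValueStore:
    """A hash-map-backed store whose values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # A hash lookup: on average constant time, whatever the store size.
        return self._data.get(key, default)

    def scan(self, predicate):
        # The store cannot look inside values, so a content-based question
        # means visiting every entry and letting the caller interpret it.
        return [key for key, value in self._data.items() if predicate(value)]


store = ToyKeyValueStore()
store.put("session:42", b"user=asha;cart=3")
store.put("session:43", b"user=ravi;cart=0")

by_key = store.get("session:42")
by_content = store.scan(lambda raw: b"user=ravi" in raw)
```

Fetching by key touches one entry, while the content-based lookup has to walk the whole store, which is why such queries are usually left to other NoSQL categories.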
Stay Tuned With Us for More
Information
https://www.linkedin.com/company/tyronesystems
https://twitter.com/tyronesystems
https://www.facebook.com/tyronesystems
