Outline
 Introduction
 What is Big Data
 Generators of Big Data
 Characteristics of Big Data
 Benefits of Big Data
 Hadoop
 Hadoop components
 Big Data vs. Hadoop
Introduction
What is Big Data
 Very large data sets that may be analyzed computationally to
reveal patterns, trends, and associations, especially those relating to
customer behavior and interactions.
 A technology term for data that has become too large to be
managed with the methods that previously worked well.
Big Data generators
This data comes from everywhere:
sensors used to gather climate information,
posts to social media sites,
digital pictures,
online shopping,
airlines, and many more…
This data is “big data.”
Characteristics of Big Data
“Big data is data characterized by three attributes: Volume, Velocity, and Variety.”
Volume
 The size of the data, which determines the value and potential of the data under
consideration. The name “Big Data” itself refers to size, hence this
characteristic.
Variety
 Data today comes in all types of formats: structured, numeric data in traditional
databases; unstructured text documents, email, stock ticker data, and financial
transactions; and semi-structured data too.
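As a rough illustration of the three kinds of data named above, the sketch below reads one tiny sample of each using only the Python standard library. The sample values and the helper name describe_variety are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table (hypothetical sample rows).
structured = "customer_id,amount\n101,25.50\n102,14.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing fields that may differ from record to record.
semi_structured = '{"user": "alice", "tags": ["sale", "shoes"]}'
record = json.loads(semi_structured)

# Unstructured: free text with no schema; here we only count its words.
unstructured = "Great service, fast delivery, will order again"
word_count = len(unstructured.split())


def describe_variety():
    """Summarize what can be read directly from each kind of sample."""
    return {
        "structured_columns": sorted(rows[0].keys()),
        "semi_structured_keys": sorted(record.keys()),
        "unstructured_words": word_count,
    }


print(describe_variety())
```

The structured sample exposes its columns up front, the semi-structured sample carries its own field names, and the unstructured sample needs extra processing before anything beyond a word count can be extracted.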
Velocity
 The speed at which data is generated and must be processed to meet demand and
support growth and development.
 Facebook generates 100 TB of data daily
 Twitter generates 8 TB of data daily
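To make the daily figures above easier to compare, the short calculation below converts them into a sustained rate per second. It assumes decimal units (1 TB = 1,000,000 MB) and an even spread over 24 hours, both simplifications; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_tb_to_mb_per_second(tb_per_day):
    """Convert a daily volume in TB to an average rate in MB/s.

    Assumes decimal units: 1 TB = 1,000,000 MB.
    """
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


# Daily volumes as quoted on the slide.
for source, tb in [("Facebook", 100), ("Twitter", 8)]:
    print(f"{source}: {daily_tb_to_mb_per_second(tb):.0f} MB/s")
```

Under these assumptions, 100 TB per day works out to about 1,157 MB every second and 8 TB per day to about 93 MB every second, which is why velocity is treated as a characteristic in its own right.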
Benefits of Big Data
 Cost reduction from big data technologies
 Time reduction from big data
 Developing new big data-based offerings
 Supporting internal business decisions
 Real-time big data isn’t just a process for storing petabytes or exabytes of
data in a data warehouse; it’s about the ability to make better decisions and
take meaningful actions at the right time.
Advantages and disadvantages of big data
What is Hadoop
 A flexible and available architecture for large-scale computation and data
processing on a network of commodity hardware
 A framework that allows for distributed processing of large data sets across
clusters of commodity servers
 – Stores large amounts of data
 – Processes the large amounts of data stored
 Results are retrieved from HDFS
Hadoop Components
 Hadoop Distributed File System (HDFS)
 MapReduce
 NameNode
 DataNode
 Pig, Hive
MapReduce
MapReduce is a programming model for processing large data sets with a parallel, distributed
algorithm on a cluster.
 Scale-out architecture: add servers to increase processing power
 Security and authentication: works with HDFS security to make sure that only approved users
can operate against the data in the system
 Resource manager: employs data locality and server resources to determine optimal computing
operations
 Optimized scheduling: completes jobs according to prioritization
 Flexibility: procedures can be written in virtually any programming language
 Resiliency and high availability: multiple job and task trackers ensure that jobs fail
independently and restart automatically
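The programming model described above can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not Hadoop's actual API: the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "hadoop processes big data"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be spread across servers, which is what makes the scale-out architecture in the list above possible.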
BI vs. Big Data
Big Data presentation