Big Data and Hadoop
A Glimpse on Big Data and Hadoop
Outline:
• Introduction to Big Data
• Big Data Architecture: Tools and Technologies
• What is Hadoop?
• Key Distinctions of Hadoop
• Core Hadoop components
What is Big Data?
• Big Data is a term for a collection of data sets so large and complex that
they become difficult to process using on-hand database management
tools or traditional approaches.
• Lots of data
• A combination of structured and unstructured data
Big Data in four words:
• Data Volume
• Data Velocity
• Data Variety
• Data Veracity
Challenges:
• Data capture
• Storage
• Search
• Sharing
• Analytics
• Visualization
Big Data Architecture: Tools and Technologies
Hadoop
• Low-cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest
NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types
  • Document stores
  • Key-value stores
  • Graph databases
Analytic RDBMS
• Optimized for bulk-load and fast aggregate query workloads
• Types
  • Column-oriented
  • MPP
  • In-memory
What is Hadoop?
• Apache Hadoop is an open-source framework for distributed storage and
processing of large sets of data on commodity hardware. Hadoop enables
businesses to quickly gain insight from massive amounts of structured and
unstructured data.
• Hadoop was created by Doug Cutting and Mike Cafarella
• It is designed to scale up from a single server to thousands of machines
• Hadoop provides a reliable shared storage and analysis system
Hadoop History
Why move to Hadoop?
Hadoop is popular because it:
• Allows distributed processing of large data sets across clusters of
computers using a simple programming model.
• Is cheaper than traditional proprietary technologies from vendors such
as Oracle and IBM, because it runs on low-cost commodity hardware.
• Has become a de facto standard for storing, processing, and
analyzing hundreds of terabytes to petabytes of data.
• Can handle all types of data from disparate systems, such as
server logs, emails, sensors, and images.
Hadoop core components:
• Hadoop is a system for large-scale data processing
• It has two main components:
  • Hadoop Distributed File System (HDFS):
    • Distributed across "nodes"
    • Natively redundant
    • NameNode tracks block locations
  • MapReduce:
    • Splits a task across processors
    • Shuffle and sort
    • Clustered storage
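To make the HDFS side of this slide concrete, here is a minimal Python sketch of the idea: a file is cut into blocks, each block is copied to several nodes, and a NameNode-like table records which nodes hold which block. This is a toy model, not Hadoop's API. The function names, the 4-byte block size, the node names, and the round-robin placement are all assumptions made for illustration. Real HDFS uses much larger blocks (128 MB by default in recent releases), and its default replication factor is 3.

```python
from typing import Dict, List

BLOCK_SIZE = 4    # toy value for illustration; real HDFS blocks are far larger
REPLICATION = 3   # matches HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str],
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Build toy NameNode metadata: block index -> nodes holding a replica.

    Round-robin placement is a simplification chosen for this sketch.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def survives_failure(block_map: Dict[int, List[str]], failed_node: str) -> bool:
    """True if every block still has a replica on some node other than failed_node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in block_map.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"hello hadoop")
    block_map = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"])
    for index, holders in block_map.items():
        print(index, blocks[index], holders)
    print("survives loss of node3:", survives_failure(block_map, "node3"))
```

Because every block lives on several nodes, losing any single node leaves all data readable, which is what "natively redundant" refers to.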
Hadoop Distributed File System
• HDFS is the primary distributed storage used by Hadoop applications.
• HDFS was designed to be a scalable, fault-tolerant, distributed storage
system that works closely with MapReduce.
• HDFS supports shell-like commands for interacting with the file system directly
• Features of HDFS are:
  • Rack awareness
  • Minimal data motion
  • Utilities
  • Highly operable
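Rack awareness can be illustrated with a small Python sketch of the default replica placement idea described in the HDFS documentation: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The `topology` dictionary, the node and rack names, and the function `place_replicas` are hypothetical and exist only for this example. It also assumes the writer runs on a DataNode and that some remote rack has at least two nodes.

```python
import random
from typing import Dict, List, Optional


def place_replicas(topology: Dict[str, List[str]], writer: str,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick three replica locations for one block in a rack-aware way.

    topology maps a rack name to the nodes in that rack (hypothetical layout).
    """
    rng = rng or random.Random()

    # Find the rack that contains the writing node.
    local_rack = next(
        (rack for rack, nodes in topology.items() if writer in nodes), None
    )
    if local_rack is None:
        raise ValueError(f"unknown node: {writer}")

    # The second and third replicas share one remote rack, so that rack
    # must offer at least two distinct nodes.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("this sketch needs a remote rack with two or more nodes")

    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]


if __name__ == "__main__":
    cluster = {"rack1": ["a", "b"], "rack2": ["c", "d", "e"]}
    print(place_replicas(cluster, "a"))
```

Spreading replicas over two racks means the block survives the loss of a whole rack, while keeping two of the three copies in one rack limits the traffic that has to cross between racks during a write.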
MapReduce:
• MapReduce is a framework for processing parallelizable problems
across huge datasets
• Uses a cluster or grid of computers to process the data
• MapReduce's key benefits are:
  • Simplicity
  • Scalability
  • Speed
  • Built-in recovery
  • Minimal data motion
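The MapReduce model can be sketched in plain Python with the classic word-count example. This runs in a single process and does not use Hadoop's API. The function names are chosen for illustration only. It shows the three stages in order: map emits (word, 1) pairs, shuffle and sort groups the pairs by key, and reduce sums each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle and sort: group values by key and order the groups by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and sort, then reduce over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big Data and Hadoop"]))
```

In a real cluster, the map calls run in parallel on the nodes that already store the input blocks, and only the much smaller intermediate pairs travel over the network. That is where the scalability and minimal data motion benefits come from.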
Open Discussion
References:
• http://en.wikipedia.org/wiki/Apache_Hadoop - Apache Hadoop Wiki
• http://hadoop.apache.org/ - Apache Hadoop Project
• http://www-01.ibm.com/software/data/infosphere/hadoop/ - IBM's Definition for Big Data and Hadoop
• http://hortonworks.com/hadoop/ - Hadoop Sandbox
Thank you
Presented by:
Prashanth Yennampelli
pyennamp@gmail.com