Hadoop
(An application of big data)
Presented by:
Ansuman Mohapatro
1201110094, CSE
Contents
• Introduction to big data
• Data sources
• What is Hadoop?
• Why Hadoop?
• How Hadoop works
• MapReduce algorithm
• Problems
• Conclusion
Introduction to big data
• Doug Cutting and Mike Cafarella worked on a project called “Nutch”, from which Hadoop later grew.
• Big data: data that traditional systems are unable to process.
• A problem faced by many organisations, such as Google, IBM and Facebook.
• Explosive growth of data – difficult to make sense of.
• The 3 V's – velocity, variety, volume.
Data sources
• Facebook: generates >25 TB daily.
• Airbus: >10 TB every 30 minutes.
• Smartphones: >5 billion camera phones, which are GPS-enabled.
• Internet users: >2 billion people; Cisco estimates internet traffic at 8 ZB per year.
• E-mail: 300 billion messages sent every day.
What is Hadoop?
• Open-source software for storing and processing big data.
• Distributed.
• Framework.
• Massive data storage.
• Faster processing.
Why Hadoop?
• Low cost – HDFS.
• Computing power.
• Scalability.
• Storage flexibility.
• Inherent data processing and self-healing capabilities.
• Handles large data, heavy calculation and unstructured data.
How Hadoop works
• HDFS – a Java-based distributed file system that can store all kinds of data.
• MapReduce – a software programming model for processing large sets of data in parallel.
• YARN – a resource management framework for scheduling and handling resource requests from distributed applications.
• Pig – a platform for manipulating data stored in HDFS.
• Hive – a data warehouse.
• ZooKeeper – an application that coordinates distributed processes.
MapReduce algorithm
• Large data -> split into smaller pieces, each mapped to a computer -> results grouped by theme (key) -> reduced on a single computer -> output.
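The flow above can be sketched in a few lines of plain Python. This is only an illustration of the map -> group by theme -> reduce idea, not Hadoop's actual API: the word-count task and the names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are assumptions made for this example, and everything runs on one machine instead of a cluster.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one small piece of data."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key (the "theme")."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run the three phases in order over a list of text chunks."""
    mapped = []
    # On a real cluster each chunk would be mapped on a different computer;
    # here the chunks are simply processed one after another.
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical input: the "large data" already split into two pieces.
    print(word_count(["big data big", "data hadoop"]))
```

Each chunk is mapped independently, which is what lets the map step spread over many computers; only the grouped results need to come together for the reduce step.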
Problems
• MapReduce – not suitable for iterative and interactive analytic tasks.
• MapReduce is file-intensive – it creates multiple files.
• Talent gap.
• Fragmented data security issues.
• Lack of tools for data quality and standardization.
Conclusion
• Select the right projects for Hadoop implementation.
• Rethink and adapt the existing architecture to Hadoop.
• Plan the availability of skills and resources before starting.
• Prepare to deliver trusted data for areas that impact business insight and operations.
• Adopt lean and agile integration principles.
• Gain an edge over the competition.
Thank you
