Hadoop
(An application of big data)
Presented by: Ansuman Mohapatro
1201110094, CSE

Content
• Introduction to big data.
• Data sources.
• What is Hadoop?
• Why Hadoop?
• How Hadoop works.
• MapReduce algorithm.
• Problems.
• Conclusion.

Introduction to big data
• Doug Cutting and Mike Cafarella worked on a project called “Nutch”, an open-source web search engine from which Hadoop grew.
• Big data: data that traditional systems are unable to process.
• A problem faced by many organisations, such as Google, IBM, and Facebook.
• The explosive growth of data makes it difficult to make sense of.
• The 3 Vs: velocity, variety, volume.

Data sources
• Facebook: generates >25 TB of data daily.
• Airbus: >10 TB every 30 minutes.
• Smartphones: >5 billion camera phones, which are GPS-enabled.
• Internet users: >2 billion people; Cisco estimates internet traffic at 8 ZB per year.
• E-mail: 300 billion messages sent every day.
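Converting two of these figures to a per-day rate puts them on a common scale. This is simple arithmetic on the slide's own numbers (which are lower bounds, since they are given as ">"), assuming decimal units (1 ZB = 1000 EB):

```latex
\begin{aligned}
\text{Airbus:}\quad & 10\ \text{TB} \times \frac{24 \times 60\ \text{min}}{30\ \text{min}} = 10 \times 48 = 480\ \text{TB/day} \approx 19 \times \text{Facebook's } 25\ \text{TB/day} \\
\text{Internet traffic:}\quad & \frac{8\ \text{ZB/year}}{365\ \text{days/year}} \approx 0.022\ \text{ZB/day} \approx 22\ \text{EB/day}
\end{aligned}
```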

What is Hadoop?
• Open-source software for storing and processing big data.
• Distributed.
• Framework.
• Massive data storage.
• Faster processing.

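Why Hadoop?
• Low cost: open-source software, and HDFS stores data on commodity hardware.
• Computing power: processing is distributed across many nodes.
• Scalability: the cluster grows by adding nodes.
• Storage flexibility: structured and unstructured data can be stored without preprocessing.
• Inherent data protection and self-healing capabilities.
• Handles large data volumes, heavy calculations, and unstructured data.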
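The self-healing point can be illustrated with a toy model of HDFS-style replication. This is a hypothetical, simplified sketch, not Hadoop code: it assumes each block is kept on 3 distinct nodes (the HDFS default replication factor), uses made-up node and block names, and places replicas round-robin instead of using HDFS's rack-aware policy. When a node fails, every block that lost a replica is re-copied to the least-loaded live node.

```python
"""Toy model of HDFS-style replication and self-healing (hypothetical, simplified).

Assumptions: each block is stored on REPLICATION distinct nodes (3 is the HDFS
default); node/block names are made up; placement is round-robin, not HDFS's
rack-aware policy.
"""
from collections import defaultdict

REPLICATION = 3


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: {nodes[(i + k) % len(nodes)] for k in range(replication)}
        for i, block in enumerate(blocks)
    }


def heal(placement, failed, live_nodes, replication=REPLICATION):
    """Forget a failed node and re-copy under-replicated blocks to live nodes."""
    # Replicas currently held by each surviving node, used to pick the least-loaded target.
    load = defaultdict(int)
    for holders in placement.values():
        for node in holders - {failed}:
            load[node] += 1
    for holders in placement.values():
        holders.discard(failed)
        while len(holders) < replication:
            candidates = [n for n in live_nodes if n not in holders]
            if not candidates:
                break  # too few live nodes to restore full replication
            target = min(candidates, key=lambda n: (load[n], n))
            holders.add(target)  # stands in for copying the block from a surviving replica
            load[target] += 1
    return placement


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)
    heal(placement, "node2", ["node1", "node3", "node4"])
    print(placement)
```

After `node2` fails, `blk_0` and `blk_1` each regain a third replica on a surviving node, so no data is lost as long as one copy survived. Real HDFS drives this from DataNode heartbeats on the NameNode; the sketch only shows the bookkeeping idea.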

How Hadoop works
• HDFS: a Java-based distributed file system that can store all kinds of data.
• MapReduce: a software programming model for processing large data sets in parallel.
• YARN: a resource-management framework for scheduling and handling resource requests from distributed applications.
• Pig: a platform for manipulating data stored in HDFS.
• Hive: a data warehouse built on Hadoop.
• ZooKeeper: a service that coordinates distributed processes.
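As a rough illustration of how HDFS stores a large file, the hypothetical sketch below splits a file of a given size into fixed-size blocks. It assumes a 128 MiB block size (the default `dfs.blocksize` in Hadoop 2.x and later; clusters may configure a different value) and models only the block layout, not DataNodes or real I/O.

```python
"""Sketch: how HDFS splits a file into fixed-size blocks (hypothetical, simplified).

Assumption: 128 MiB blocks, the default dfs.blocksize in Hadoop 2.x and later.
Only the (offset, length) layout is modelled; no DataNodes or real I/O.
"""

MiB = 1024 * 1024
BLOCK_SIZE = 128 * MiB


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of `file_size` bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be shorter
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    for offset, length in split_into_blocks(300 * MiB):
        print(f"block at {offset // MiB} MiB, {length // MiB} MiB long")
```

A 300 MiB file yields blocks of 128, 128 and 44 MiB. In HDFS the last block only occupies as much disk space as the data it holds, and each block can then be processed by a separate map task.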

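MapReduce algorithm
• Large data is split into smaller pieces, each mapped (processed) on a different computer; the intermediate results are then collected and reduced, in the simplest case on a single computer, to produce the output.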
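The map, shuffle and reduce flow described above can be sketched with the classic word-count example. This is a single-process Python simulation; the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical and are not Hadoop's Java API. In a real cluster each input split would be mapped on a different node, and reduce tasks could run on several machines.

```python
"""Word count as a single-process MapReduce simulation (hypothetical names).

In Hadoop each input split is mapped on a different node and the shuffle moves
data over the network; here everything runs in one Python process.
"""
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit (word, 1) for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into one result."""
    return key, sum(values)


def map_reduce(chunks):
    """Run map on every chunk, shuffle the pairs, then reduce each group."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    # Two "splits" of a larger input, as if stored on two different nodes.
    print(map_reduce(["Big data is big", "Hadoop processes big data"]))
```

This prints `{'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'processes': 1}`. Because each `map_phase` call depends only on its own chunk, the map step is what Hadoop can spread across many machines.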

Problems
• MapReduce is not suitable for iterative and interactive analytic tasks.
• MapReduce is file-intensive: it creates multiple files, because each job writes its results to storage.
• Talent gap: a shortage of people skilled in Hadoop and MapReduce.
• Fragmented data security issues.
• Lack of tools for data quality and standardization.
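To see why iterative work is costly, the hypothetical sketch below runs connected-component label propagation as a chain of separate "jobs", each reading its input from a file and writing its output to a new file, much as chained MapReduce jobs hand results to each other through HDFS. Local JSON files stand in for HDFS, and the algorithm is only an example of an iterative computation, not something taken from the slides.

```python
"""Iterative label propagation run as a chain of file-based "jobs" (hypothetical).

Each iteration reads the previous result from disk and writes a new file, the way
chained MapReduce jobs hand results to each other through HDFS. Local JSON files
stand in for HDFS; connected components is just an example iterative algorithm.
Node names must be strings so they survive the JSON round-trip.
"""
import json
import os
import tempfile


def run_job(in_path, out_path, edges):
    """One job: every node takes the smallest label among itself and its neighbours."""
    with open(in_path) as f:
        labels = json.load(f)
    new_labels = dict(labels)
    for u, v in edges:
        # Map emits each node's label to its neighbour; reduce keeps the minimum.
        new_labels[v] = min(new_labels[v], labels[u])
        new_labels[u] = min(new_labels[u], labels[v])
    with open(out_path, "w") as f:
        json.dump(new_labels, f)
    return new_labels != labels  # True if another iteration is needed


def connected_components(nodes, edges, workdir):
    """Iterate jobs until labels stop changing; return final labels and files written."""
    files = [os.path.join(workdir, "iter_0.json")]
    with open(files[0], "w") as f:
        json.dump({node: i for i, node in enumerate(nodes)}, f)
    changed = True
    while changed:
        out_path = os.path.join(workdir, f"iter_{len(files)}.json")
        changed = run_job(files[-1], out_path, edges)
        files.append(out_path)
    with open(files[-1]) as f:
        return json.load(f), files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        labels, files = connected_components(["a", "b", "c", "d"], [("a", "b"), ("b", "c")], tmp)
        print(labels, f"after writing {len(files)} files")
```

Even this four-node graph needs three jobs (plus the initial input file), and every job pays for a full read and write of the data. Longer chains in larger graphs multiply that cost, which is why in-memory engines are usually preferred for iterative analytics.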

Conclusion
• Select the right projects for Hadoop implementation.
• Rethink and adapt existing architecture to Hadoop.
• Plan the availability of skills and resources before starting.
• Prepare to deliver trusted data for areas that impact business insight and operations.
• Adopt lean and agile integration principles.
• Gain an edge over the competition.
