Agenda
   Need for a new processing platform (Big Data)
   Origin of Hadoop
   What Hadoop is and what it is not
   Hadoop architecture
   Hadoop components (Common/HDFS/MapReduce)
   Hadoop ecosystem
   When should we go for Hadoop?
   Real-world use cases
   Questions
Need for a new processing platform (Big Data)
   What is Big Data?
       - Twitter (over 7 TB/day)
       - Facebook (over 10 TB/day)
       - Google (over 20 PB/day)
   Where does it come from?
   Why take so much pain?
        - Information is everywhere, but where is the knowledge?
   Existing systems scale vertically (a bigger machine)
   Why Hadoop? It scales horizontally (more machines)
Origin of Hadoop
   Seminal Google whitepapers on the Google File System (2003) and
    MapReduce (2004), a new programming paradigm to
    handle data at internet scale
   Hadoop started as part of the Nutch project
   In Jan 2006 Doug Cutting started working
    on Hadoop at Yahoo
   Factored out of Nutch in Feb 2006
   First release of Apache Hadoop in
    September 2007
   Jan 2008: Hadoop became a top-level
    Apache project
Hadoop distributions
   Amazon
   Cloudera
   MapR
   Hortonworks
   Microsoft Windows Azure
   IBM InfoSphere BigInsights
   Datameer
   EMC Greenplum HD Hadoop distribution
   Hadapt
What is Hadoop?
   Flexible infrastructure for large-scale computation and data
    processing on a network of commodity hardware
   Written primarily in Java
   Open source, distributed under the Apache license
   Core components: Hadoop Common, HDFS and MapReduce
What Hadoop is not
   A replacement for existing data warehouse systems
   A file system
   An online transaction processing (OLTP) system
   A replacement for all programming logic
   A database
Hadoop architecture
   High-level view: NameNode (NN), DataNode (DN), JobTracker (JT), TaskTracker (TT)
HDFS (Hadoop Distributed File System)
   Default storage layer for the Hadoop cluster
   NameNode (metadata) and DataNodes (block storage)
   File system namespace similar to a local file system
    (directories and files)
   Master/slave architecture (1 master, n slaves)
   Virtual, not physical: it sits on top of each node's local file system
   Configurable replication (the replication factor can be set per file)
   Data is stored as blocks (64 MB by default in Hadoop 1.x, 128 MB in
    Hadoop 2.x and later, configurable) spread across the DataNodes
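The block and replication bullets above can be made concrete with a little arithmetic. The sketch below is plain Java with no Hadoop dependency. The class name `HdfsBlockSketch`, the node names and the round-robin placement are assumptions made for illustration; real HDFS uses its own placement policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: mimics how a file is cut into fixed-size blocks and replicated. */
public class HdfsBlockSketch {
    // 64 MB, the default block size quoted on the slide (configurable in a real cluster).
    public static final long BLOCK_SIZE = 64L * 1024 * 1024;

    /** Number of blocks needed for a file of the given size (the last block may be partial). */
    public static long blockCount(long fileSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    /**
     * Assigns each block to `replication` distinct DataNodes in round-robin order.
     * Round-robin is a simplification chosen for this sketch, not the real HDFS policy.
     */
    public static List<List<String>> placeBlocks(long fileSizeBytes, List<String> dataNodes, int replication) {
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("replication exceeds number of DataNodes");
        }
        List<List<String>> placement = new ArrayList<>();
        long blocks = blockCount(fileSizeBytes);
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(dataNodes.get((int) ((b + r) % dataNodes.size())));
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long fileSize = 200L * 1024 * 1024; // a 200 MB file needs 4 blocks of 64 MB
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4"); // hypothetical DataNodes
        List<List<String>> placement = placeBlocks(fileSize, nodes, 3);
        for (int i = 0; i < placement.size(); i++) {
            System.out.println("block " + i + " -> " + placement.get(i));
        }
    }
}
```

With a replication factor of 3, every block lives on three different nodes, so losing any single DataNode leaves two copies of each block available.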
HDFS architecture
Data replication in HDFS.
Rack awareness
Large Hadoop clusters are typically arranged in racks, and
network traffic between nodes within the same rack
is much more desirable than network traffic across racks.
In addition, the NameNode tries to place replicas of each block on
multiple racks for improved fault tolerance. A default
installation assumes all the nodes belong to the same rack.
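The trade-off described above (keep traffic local, yet spread replicas over racks) can be sketched in a few lines of plain Java. The class `RackAwareSketch`, the node and rack names, and the selection order are assumptions for illustration only. This is a simplified stand-in, not the placement policy that HDFS itself implements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: picks replica nodes so that a block can survive the loss of a rack. */
public class RackAwareSketch {

    /**
     * Chooses up to `replication` nodes for one block: the writer's node first,
     * then one node on a different rack, then any unused nodes in map order.
     *
     * @param nodeToRack maps node name to rack name (iteration order is the preference order)
     */
    public static List<String> chooseReplicas(Map<String, String> nodeToRack, String writer, int replication) {
        if (!nodeToRack.containsKey(writer)) {
            throw new IllegalArgumentException("unknown writer node: " + writer);
        }
        List<String> chosen = new ArrayList<>();
        if (replication <= 0) {
            return chosen;
        }
        chosen.add(writer); // first replica stays local to avoid network traffic
        String writerRack = nodeToRack.get(writer);
        // Second replica: the first node on a different rack, if one exists.
        if (replication > 1) {
            for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
                if (!e.getValue().equals(writerRack)) {
                    chosen.add(e.getKey());
                    break;
                }
            }
        }
        // Remaining replicas: any nodes not used yet.
        for (String node : nodeToRack.keySet()) {
            if (chosen.size() >= replication) {
                break;
            }
            if (!chosen.contains(node)) {
                chosen.add(node);
            }
        }
        return chosen;
    }

    /** Number of distinct racks covered by the chosen replicas. */
    public static long racksCovered(Map<String, String> nodeToRack, List<String> replicas) {
        return replicas.stream().map(nodeToRack::get).distinct().count();
    }

    public static void main(String[] args) {
        Map<String, String> topology = new LinkedHashMap<>(); // hypothetical two-rack cluster
        topology.put("dn1", "rack1");
        topology.put("dn2", "rack1");
        topology.put("dn3", "rack2");
        topology.put("dn4", "rack2");
        List<String> replicas = chooseReplicas(topology, "dn1", 3);
        System.out.println(replicas + " spans " + racksCovered(topology, replicas) + " racks");
    }
}
```

If every node is mapped to the same rack, as in a default installation, the second step finds no remote rack and all replicas end up on one rack, which is why configuring the topology matters on larger clusters.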
MapReduce
   Framework provided by Hadoop to process
    large amounts of data in parallel across a cluster of
    machines
   A job typically comprises three classes:
    Mapper class
    Reducer class
    Driver class
   JobTracker/TaskTracker
   The reduce phase starts only after all map tasks
    are done
   Takes (k, v) pairs as input and emits (k, v) pairs
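   import java.io.IOException;
   import java.util.StringTokenizer;
   import org.apache.hadoop.io.IntWritable;
   import org.apache.hadoop.io.LongWritable;
   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.mapreduce.Mapper;

   // Word-count mapper, nested inside the job's driver class: emits (word, 1) per token.
   public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
       private final static IntWritable one = new IntWritable(1);
       private Text word = new Text();

       @Override
       public void map(LongWritable key, Text value, Context context)
               throws IOException, InterruptedException {
           String line = value.toString();
           StringTokenizer tokenizer = new StringTokenizer(line);
           while (tokenizer.hasMoreTokens()) {
               word.set(tokenizer.nextToken());
               context.write(word, one);
           }
       }
   }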
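To see what happens to the (word, 1) pairs after the mapper emits them, the following sketch simulates the map, shuffle and reduce phases of word count inside a single JVM. It uses only the Java standard library. The class `WordCountSimulation` and its method names are hypothetical, and the real framework distributes these phases over many machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Illustrative only: simulates map -> shuffle -> reduce for word count in one JVM. */
public class WordCountSimulation {

    /** Map phase: emit (word, 1) for every token of one input line. */
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    /** Shuffle phase: group all emitted values by key, with keys in sorted order. */
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs all three phases over the given lines and returns word -> count. */
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line)); // every map call finishes before reducing begins
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(wordCount(Arrays.asList("to be or not to be", "to do")));
    }
}
```

The grouping step is why a reducer cannot produce its final sum for a key until all mappers are done: any remaining mapper might still emit another value for that key.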
MapReduce job flow
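Modes of operation
   Standalone (local) mode: everything runs in a single JVM on the local file system
   Pseudo-distributed mode: all daemons run on one machine, each in its own JVM
   Fully distributed mode: daemons run across a cluster of machines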
Hadoop ecosystem
When should we go for Hadoop?
   Data is too large for existing systems
   Processes are independent of each other
   Online analytical processing (OLAP)
   Better scalability
   Parallelism
   Unstructured data
Real-world use cases
   Clickstream analysis
   Sentiment analysis
   Recommendation engines
   Ad targeting
   Search quality
What I have been doing…
   Seismic Data Management & Processing
   WITSML Server & Drilling Analytics
   Orchestra Permission Map management for Search
   SDIS (just started)
   Next steps: get your hands dirty with
    code in a workshop on …
     Hadoop configuration
     HDFS data loading
     MapReduce programming
     HBase
     Hive & Pig
Questions?

Introduction to Apache Hadoop
