Hadoop and BigData - July 2016

Hadoop and BigData
Ranjith Sekar
July 2016
Agenda
 What are BigData and Hadoop?
 Hadoop Architecture
 HDFS
 MapReduce
 Installing Hadoop
 Develop & Run a MapReduce Program
 Hadoop Ecosystems
Introduction
Data
 Structured
 Relational DB
 Library catalogues (date, author, place, subject, etc.)
 Semi Structured
 CSV, XML, JSON, NoSQL database
 Unstructured
Unstructured Data
 Machine Generated
 Satellite images
 Scientific data
 Photographs and video
 Radar or sonar data
 Human Generated
 Word, PDF, Text
 Social media data (Facebook, Twitter, LinkedIn)
 Mobile data (text messages)
 Website content (blogs, Instagram)
Storage

  • 7. Key Terms
      Commodity hardware: PCs that can be used to form clusters.
      Node: a commodity server connected to the others through a network device.
      NameNode = master node; DataNode = slave node.
      Cluster: a set of interconnected nodes/systems in a network.
  • 10. BigData
      Traditional approaches are no longer fit for data analysis because of the rapid growth in data volume.
      Involves handling large volumes of data (petabytes to zettabytes), structured or unstructured.
      Datasets grow so large that they are difficult to capture, store, manage, share, analyze and visualize with typical database software tools.
      Generated by different sources around us, such as systems, sensors and mobile devices.
      2.5 quintillion bytes of data are created every day.
      80-90% of the data in the world today has been created in the last two years alone.
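To put the "2.5 quintillion bytes" figure into the storage units used elsewhere in the deck, the conversion can be sketched as below. This is only an illustration: the `humanize` helper is hypothetical, it uses decimal (SI) units, and the yearly figure assumes a constant daily rate, which the slides do not claim.

```python
# Decimal (SI) byte units, smallest to largest.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def humanize(num_bytes):
    """Express a byte count in the largest SI unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


daily = 2.5e18          # 2.5 quintillion bytes per day (figure from the slide)
yearly = daily * 365    # assumption: constant daily rate, for illustration only

print(humanize(daily))   # 2.5 EB
print(humanize(yearly))  # 912.5 EB
```

In other words, the daily figure is about 2.5 exabytes, and a year at that rate would approach one zettabyte.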
  • 11. Flood of Data
      More than 3 billion internet users in the world today.
      The New York Stock Exchange generates about 4-5 TB of data per day.
      Twitter processes 7 TB of data every day.
      Facebook processes 10 TB of data every day, and its data is growing at 7 PB per month.
      Interestingly, 80% of this data is unstructured.
      With this massive quantity of data, businesses need fast, reliable, deeper data insight.
      Therefore, BigData solutions based on Hadoop and other analytics software are becoming more and more relevant.
  • 12. Dimensions of BigData
      Volume: Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
      Velocity: Often time-sensitive, big data must be used as it is streaming into the enterprise in order to maximize its value to the business.
      Variety: Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.
  • 13. BigData Benefits
      Analyze markets and derive new strategies to improve business in different geographic locations.
      Measure the response to campaigns, promotions, and other advertising media.
      Use the medical history of patients and hospitals to provide better and quicker service.
      Re-develop your products.
      Perform risk analysis.
      Create new revenue streams.
      Reduce maintenance costs.
      Make faster, better decisions.
      Develop new products & services.
  • 16. Hadoop
      Inspired by the Google File System paper (2003).
      Created by Doug Cutting and Mike Cafarella; Cutting continued its development at Yahoo.
      Hadoop 0.1.0 was released in April 2006.
      Open source project of the Apache Software Foundation.
      A framework written in Java.
      Provides distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
      Naming: Hadoop is named after a toy elephant that belonged to Doug Cutting's son.
  • 17. Hardware & Software
      Hardware (commodity hardware)

                 Medium                   High
        CPU      8 physical cores         12 physical cores
        Memory   16 GB                    48 GB
        Disk     4 disks x 1 TB = 4 TB    12 disks x 3 TB = 36 TB
        Network  1 Gb Ethernet            10 Gb Ethernet or InfiniBand

      Software
        OS
          Red Hat Enterprise Linux (RHEL)
          CentOS
          Ubuntu
        Java
          Oracle JDK 1.6 (v 1.6.31)
  • 18. When Hadoop?
      When you must process lots of unstructured data.
      When your processing can easily be made parallel.
      When running batch jobs is acceptable.
      When you have access to lots of cheap hardware.
  • 22. Hadoop Configurations
      Standalone Mode
        All Hadoop services run in a single JVM on a single machine.
      Pseudo-Distributed Mode
        Each Hadoop service runs in its own JVM, but all on a single machine.
      Fully Distributed Mode
        Hadoop services run in individual JVMs, and the JVMs reside on separate machines in a single cluster.
  • 23. Hadoop Core Services  NameNode  Secondary NameNode  DataNode  ResourceManager  ApplicationMaster  NodeManager
  • 24. How does Hadoop work?  Stage 1  The user submits a job, specifying the locations of the input and output files in HDFS and the JAR file of the MapReduce program.  The job is configured by setting parameters specific to it.  Stage 2  The Hadoop job client submits the job and its configuration to the JobTracker.  The JobTracker distributes the work to the TaskTrackers on the slave nodes.  The JobTracker schedules the tasks and monitors them, providing status and diagnostic information to the job client.  Stage 3  The TaskTrackers execute the tasks as defined by the MapReduce implementation.  The input is processed and the output is stored in HDFS.
  • 26. HDFS
  • 27. Hadoop Distributed File System (HDFS)  Java-based file system for storing large volumes of data.  Scales up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks.  Access  Java API  Python/C for non-Java applications  Web GUI through HTTP  FS Shell - shell-like commands that directly interact with HDFS
  • 28. HDFS Features  HDFS can handle large data sets.  Since HDFS deals with large scale data, it supports a multitude of machines.  HDFS provides a write-once-read-many access model.  HDFS is built using the Java language making it portable across various platforms.  Fault Tolerance and availability are high.
  • 30. File Storage in HDFS  Files are split into multiple blocks (chunks) and stored on different machines.  Blocks – 64 MB (default), 128 MB (recommended).  Replication – provides fault tolerance and availability; the replication factor is configurable.  No storage space is wasted: the last block occupies only as much space as the data it holds. E.g. 420MB file stored as
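The 420 MB example on this slide can be worked through as plain arithmetic. The sketch below is illustrative only and uses no Hadoop classes; the class name `HdfsBlockMath`, its method names, and the replication factor of 3 used in `main` are assumptions made for the example, not values taken from the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative arithmetic only; this does not talk to HDFS.
class HdfsBlockMath {

    // Returns the size (in MB) of each block a file of fileMb would be split into.
    static List<Long> blockSizes(long fileMb, long blockMb) {
        if (fileMb < 0 || blockMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Long> blocks = new ArrayList<>();
        long remaining = fileMb;
        while (remaining > 0) {
            // Every block is full except possibly the last one.
            long size = Math.min(remaining, blockMb);
            blocks.add(size);
            remaining -= size;
        }
        return blocks;
    }

    // Raw cluster storage consumed once every block is stored 'replication' times.
    static long rawStorageMb(long fileMb, int replication) {
        return fileMb * replication;
    }

    public static void main(String[] args) {
        System.out.println(blockSizes(420, 64));   // [64, 64, 64, 64, 64, 64, 36]
        System.out.println(blockSizes(420, 128));  // [128, 128, 128, 36]
        System.out.println(rawStorageMb(420, 3));  // 1260 (assumed replication factor of 3)
    }
}
```

In both cases the final block holds only the remaining 36 MB, which is the sense in which no space is wasted.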
  • 31. NameNode  One per Hadoop cluster; acts as the master server.  Runs on commodity hardware with a Linux operating system.  The NameNode software is responsible for:  Managing the file system namespace.  Regulating clients' access to files.  Executing file system operations such as renaming, closing, and opening files and directories.
  • 32. Secondary NameNode  The NameNode keeps the metadata describing the stored data in RAM.  The Secondary NameNode contacts the NameNode periodically and takes a copy of its metadata.  If the NameNode crashes, the metadata can be restored from the copy held by the Secondary NameNode.
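In Hadoop this periodic copy is a checkpoint: the namespace image is merged with the log of edits made since the previous image. The following is a minimal in-memory sketch of that idea, not the actual NameNode code; the class `CheckpointSketch`, its methods, and the text format of the log entries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "image + edit log"; real HDFS metadata is far richer.
class CheckpointSketch {
    // Last saved snapshot of the namespace: path -> replication factor.
    private Map<String, Integer> image = new TreeMap<>();
    // Changes since the snapshot, stored as "ADD path replication" or "DELETE path".
    // Paths containing spaces are not supported by this simple format.
    private final List<String> editLog = new ArrayList<>();

    void addFile(String path, int replication) {
        editLog.add("ADD " + path + " " + replication);
    }

    void deleteFile(String path) {
        editLog.add("DELETE " + path);
    }

    // Replays the log on top of a copy of the snapshot to get the current namespace.
    Map<String, Integer> currentNamespace() {
        Map<String, Integer> state = new TreeMap<>(image);
        for (String edit : editLog) {
            String[] parts = edit.split(" ");
            if (parts[0].equals("ADD")) {
                state.put(parts[1], Integer.parseInt(parts[2]));
            } else {
                state.remove(parts[1]);
            }
        }
        return state;
    }

    // Checkpoint: fold the log into a new snapshot and start with an empty log.
    void checkpoint() {
        image = currentNamespace();
        editLog.clear();
    }

    int pendingEdits() {
        return editLog.size();
    }

    public static void main(String[] args) {
        CheckpointSketch s = new CheckpointSketch();
        s.addFile("/data/a.txt", 3);
        s.addFile("/data/b.txt", 3);
        s.deleteFile("/data/a.txt");
        System.out.println(s.pendingEdits());     // 3
        s.checkpoint();
        System.out.println(s.pendingEdits());     // 0
        System.out.println(s.currentNamespace()); // {/data/b.txt=3}
    }
}
```

Keeping the log short is the point of the checkpoint: after a restart, only the edits made since the last snapshot need to be replayed.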
  • 33. DataNode  Many per Hadoop Cluster.  Uses inexpensive commodity hardware.  Contains actual data.  Performs read/write operations on file based on request.  Performs block creation, deletion, and replication according to the instructions of the NameNode.
  • 34. HDFS Command Line Interface  View existing files (ls)  Copy files from local (copyFromLocal / put)  Copy files to local (copyToLocal / get)  Change the replication factor of a file (setrep)
  • 37. MapReduce  Heart of Hadoop.  Programming model/algorithm for data processing.  Hadoop can run MapReduce programs written in various languages (Java, Ruby, Python, etc.).  MapReduce programs are inherently parallel.  Master-Slave Model.  Mapper  Performs filtering and sorting.  Reducer  Performs a summary operation.
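The map, group-by-key, and reduce steps can be sketched in plain Java without a cluster. This is a single-process illustration of the programming model only, not how Hadoop executes a job; the class `LocalWordCount` and its method names are made up for the example, and the two input lines are the sample input used later in the deck.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of the MapReduce model; no Hadoop classes involved.
class LocalWordCount {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: summarise the values of one key, here by summing them.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in order over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(run(lines)); // {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

Because each call to `map` depends only on its own line and each call to `reduce` only on its own key, both phases can be spread across machines, which is what makes the model inherently parallel.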
  • 39. Job Tracker  One per Hadoop cluster.  Controls the overall execution of a MapReduce program.  Manages the TaskTrackers running on the DataNodes.  Tracks available and utilized resources.  Tracks the running jobs and provides fault tolerance.  Receives periodic heartbeats from each TaskTracker.
  • 40. Task Tracker  Many per Hadoop cluster.  Executes and manages the individual tasks assigned by the JobTracker.  Sends periodic status updates to the JobTracker about the execution of the job.  Handles the data motion between map() and reduce().  Notifies the JobTracker if any task fails.
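Heartbeat-based failure detection between a master and its workers can be sketched as follows. This is a simplified model, not the JobTracker's implementation: the class `HeartbeatMonitor`, the tracker names, and the 10-second timeout are assumptions chosen for the example, and timestamps are passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical master-side bookkeeping of worker heartbeats.
class HeartbeatMonitor {
    private final long timeoutMillis;
    // Worker id -> time (in ms) of the most recent heartbeat.
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a worker reports in.
    void heartbeat(String trackerId, long nowMillis) {
        lastSeen.put(trackerId, nowMillis);
    }

    // Workers whose last heartbeat is older than the timeout are presumed lost,
    // so their tasks could be rescheduled elsewhere.
    List<String> lostTrackers(long nowMillis) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                lost.add(entry.getKey());
            }
        }
        Collections.sort(lost);
        return lost;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(10_000); // assumed timeout
        monitor.heartbeat("tracker-1", 0);
        monitor.heartbeat("tracker-2", 0);
        monitor.heartbeat("tracker-1", 8_000);
        System.out.println(monitor.lostTrackers(12_000)); // [tracker-2]
    }
}
```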
  • 43. Installing Hadoop  Prerequisites  Installation
     Download : http://hadoop.apache.org/releases.html
     > tar xzf hadoop-x.y.z.tar.gz
     > export JAVA_HOME=/user/software/java6/
     > export HADOOP_INSTALL=/home/tom/hadoop-x.y.z
     > export PATH=$PATH:$HADOOP_INSTALL/bin
     > hadoop version
    Hadoop 0.20.0
  • 44. Pseudo-Distributed Mode Configuration
     core-site.xml
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value>
      </property>
    </configuration>
     hdfs-site.xml
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
     mapred-site.xml
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
      </property>
    </configuration>
     Formatting HDFS
     > hadoop namenode -format
     Start HDFS & MapReduce
     > start-dfs.sh
     > start-mapred.sh
     Stop HDFS & MapReduce
     > stop-dfs.sh
     > stop-mapred.sh
  • 45. Develop & Run a MapReduce Program
  • 46. Mapper
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      // Reused for every emitted pair to avoid allocating objects per token.
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      // key: byte offset of the line in the input file; value: the line itself.
      @Override
      public void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
          word.set(tokenizer.nextToken());
          context.write(word, one); // emit (word, 1)
        }
      }
    }
  • 47. Reducer
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      // The org.apache.hadoop.mapreduce API passes the values as an Iterable;
      // declaring an Iterator here would overload reduce() instead of overriding it.
      @Override
      public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
          sum += value.get();
        }
        context.write(key, new IntWritable(sum)); // emit (word, total count)
      }
    }
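  • 48. Main Program
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Deprecated in Hadoop 2.x, where Job.getInstance(conf, "wordcount") is preferred.
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);

        // Types of the (key, value) pairs written by the reducer.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0]: input directory in HDFS; args[1]: output directory (must not exist yet).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }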
  • 49. Input Data
    $ bin/hadoop dfs -ls /user/ranjith/mapreduce/input/
    /user/ranjith/mapreduce/input/file01
    /user/ranjith/mapreduce/input/file02
    $ bin/hadoop dfs -cat /user/ranjith/mapreduce/input/file01
    Hello World Bye World
    $ bin/hadoop dfs -cat /user/ranjith/mapreduce/input/file02
    Hello Hadoop Goodbye Hadoop
  • 50. Run
     Create Jar WordCount.jar
     Run Command
    > hadoop jar WordCount.jar jbr.hadoopex.WordCount /user/ranjith/mapreduce/input/ /user/ranjith/mapreduce/output
     Output
    $ bin/hadoop dfs -cat /user/ranjith/mapreduce/output/part-r-00000
    Bye 1
    Goodbye 1
    Hadoop 2
    Hello 2
    World 2
     Link : http://javabyranjith.blogspot.in/2015/10/hadoop-word-count-example-with-maven.html
  • 52. Hadoop Ecosystem  HDFS & MapReduce  Ambari - provisioning, managing, and monitoring Apache Hadoop clusters.  Pig – high-level scripting language (Pig Latin) whose scripts are compiled into MapReduce programs.  Mahout - scalable, commercial-friendly machine learning for building intelligent applications.  Hive – data warehouse layer that uses a metastore to expose HDFS data as tables for SQL-like queries.  HBase - open source, non-relational, distributed database.  Sqoop – CLI application for transferring data between relational databases and Hadoop.  ZooKeeper - distributed configuration service, synchronization service, and naming registry for large distributed systems.  Oozie – defines and manages workflows.