What is Hadoop

What is Big Data

  • 1. Big Data
      By: Asis Mohanty, CBIP, CDMP
  • 2. The Three V’s of Big Data
      o Variety: Unstructured and semi-structured data (text, machine logs, clickstream, social blogs, etc.) is becoming as strategic as traditional structured data.
      o Volume: Data coming in from new sources, together with increased regulation in multiple areas, means storing more data for longer periods of time.
      o Velocity: Machine data, as well as data coming from new sources, is being ingested at speeds not even imagined a few years ago.
  • 3. Six Key Hadoop Data Types
      o Sentiment: how your customer feels
      o Clickstream: website visitor data
      o Sensor/Machine: data from remote sensors and machines
      o Geographic: location-based data
      o Server Logs: log files automatically created by servers
      o Text: millions of web pages, emails and documents
  • 4. Changes in Analyzing Data
      o Big Data is fundamentally changing the way we analyze information.
      o We can now analyze vast amounts of data rather than evaluating sample sets.
      o Historically we have had to look at causes. Now we can look at patterns and correlations in the data, which give us a much better perspective.
  • 5. The Need for Hadoop
      o Store and use all types of data
      o Process all the data
      o Scalability
      o Commodity hardware
      [Figure labels: Scale (Storage and Processing); Traditional DBMS; EDW; MPP Analytics; NoSQL; Hadoop Platform]
  • 6. Integrating Hadoop
      [Figure: integration diagram; recoverable labels listed below]
      o Sources: RDBMS, Streams, Social Media, Data Marts, Machine Logs
      o Ingestion: Sqoop/Flume, Open Source ETL, Streaming ETL
      o Access paths: Direct Access, NoSQL, Data Warehouse Access (EDW), Data Mart Access, Big Data Access
      o Business Intelligence Platform (Cloud): Performance & ad hoc reporting, OLAP, Dashboards, Exploratory Visualization, Statistical Analytics, Machine Learning
  • 7. What is Hadoop?
      o A framework for data-intensive processing
      o Designed to scale massively
      o Processes the entire contents of a file (instead of attempting to read a portion of it)
      o Very fast for very large jobs
      o Not fast for small jobs
      o Does not provide caching or indexing (tools like HBase can provide these features if needed)
      o Designed to tolerate hardware and software failures
  • 8. What is Hadoop 2.0?
      The Apache Hadoop 2.0 project consists of the following modules:
      o Hadoop Common: the utilities that provide support for the other Hadoop modules
      o HDFS: the Hadoop Distributed File System
      o YARN: a framework for job scheduling and cluster resource management
      o MapReduce: for processing large data sets in a scalable and parallel fashion
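To make the MapReduce module a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps for a word count. It is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real job would run these phases in parallel across the cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be spread over many machines, which is the property the "scalable and parallel" wording refers to.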
  • 9. What is YARN?
      o YARN is a sub-project of Hadoop at the Apache Software Foundation that takes Hadoop beyond batch processing to enable broader data processing
      o It extends the Hadoop platform by supporting non-MapReduce workloads associated with other programming models
      o The core concept of YARN was born out of a need to have Hadoop work for more real-time and streaming capabilities
      o As more and more data landed in Hadoop, enterprises demanded that Hadoop extend its capabilities
      o As part of Hadoop 2.0, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines
      o This streamlines MapReduce to do what it does best: process data
      o Multiple applications can run in Hadoop, all sharing common resource management
      o Many organizations are already building applications on YARN in order to bring them into Hadoop
      o With Hadoop 2.0 and YARN, organizations can use Hadoop for streaming, interactive and a world of other Hadoop-based applications
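The idea of several engines sharing one pool of cluster resources can be illustrated with a toy model. The sketch below is not the YARN API: `ResourceManager`, `request_container` and `release` are hypothetical names, memory is the only resource tracked, and there is no queueing or scheduling policy.

```python
class ResourceManager:
    """Toy stand-in for a cluster-wide resource manager (memory only)."""

    def __init__(self, total_memory_mb: int) -> None:
        self.free_mb = total_memory_mb
        self.containers: list[tuple[str, int]] = []  # (application, memory in MB)

    def request_container(self, app: str, memory_mb: int) -> bool:
        """Grant a container if enough memory is free; otherwise refuse."""
        if memory_mb > self.free_mb:
            return False
        self.free_mb -= memory_mb
        self.containers.append((app, memory_mb))
        return True

    def release(self, app: str) -> int:
        """Return all of an application's containers to the shared pool."""
        freed = sum(mb for name, mb in self.containers if name == app)
        self.containers = [(n, mb) for n, mb in self.containers if n != app]
        self.free_mb += freed
        return freed


if __name__ == "__main__":
    rm = ResourceManager(total_memory_mb=4096)
    print(rm.request_container("mapreduce-job", 2048))   # batch workload
    print(rm.request_container("storm-topology", 1024))  # streaming workload
    print(rm.request_container("spark-app", 2048))       # refused: pool exhausted
    rm.release("mapreduce-job")
    print(rm.request_container("spark-app", 2048))       # fits after the release
```

The point of the sketch is only that the batch, streaming and in-memory applications all ask the same manager for capacity, rather than each engine owning its own slice of the cluster.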
  • 10. YARN: Taking Hadoop Beyond Batch
      With YARN, applications run natively in Hadoop.
      [Figure: layered stack, top to bottom]
      o Applications that run natively in Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4,…), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search) (Weave…)
      o YARN (Cluster Resource Management)
      o HDFS2 (Redundant, Reliable Storage)
  • 11. How it Works
      1. Put the data into HDFS in raw format
      2. Use Pig to explore and transform
      3. Data analysts use Hive to query the data
      4. Data scientists use MapReduce, R and Mahout to mine the data
      [Figure labels: Raw Data; Hadoop Distributed File System; Structured Data; Answer to Question = $$; Predictive KPI = ##]
  • 12. Getting Relational Data & Raw Data into Hadoop
      o Sqoop is a tool for transferring data between an RDBMS and Hadoop
      o Flume is a tool for streaming data into Hadoop
      [Figure labels: Table in RDBMS; Sqoop Job; Raw Data; Flume Agent; Hadoop Distributed File System]
  • 13. What is Pig?
      o Pig is an extension of Hadoop that simplifies the ability to query large HDFS datasets
      o Pig is made up of two main components:
          • A data processing language called Pig Latin
          • A compiler that compiles and runs Pig Latin scripts
      o Pig was created at Yahoo! to make it easier to analyze the data in HDFS without the complexities of writing a traditional MapReduce program
      o With Pig, you can develop MapReduce jobs with a few lines of Pig Latin
  • 14. What is Hive?
      o Hive is a subproject of the Apache Hadoop project that provides a data warehousing layer built on top of Hadoop
      o Hive allows you to define a structure for your unstructured big data, simplifying analysis and queries by introducing a familiar, SQL-like language called HiveQL
      o Hive is for data analysts familiar with SQL who need to do ad hoc queries, summarization and data analysis on their HDFS data
      o Hive is not a relational database
      o Hive uses a database to store metadata, but the data that Hive processes is stored in HDFS
      o Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates
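The split between metadata (kept in a database) and data (kept as files in HDFS) can be sketched in a few lines. This is an illustration in plain Python, not Hive itself: `METASTORE`, `RAW_FILE`, `read_table` and `total_seconds_by_url` are hypothetical names, and the tab-separated layout and column names are assumptions made for the example.

```python
import csv
import io

# Hypothetical stand-in for the metadata database: table name -> layout.
METASTORE = {
    "page_views": {
        "columns": [("user_id", int), ("url", str), ("seconds", float)],
        "delimiter": "\t",
    },
}

# Hypothetical stand-in for a raw, untyped file sitting in HDFS.
RAW_FILE = "1\t/home\t3.5\n2\t/cart\t10.0\n1\t/cart\t1.5\n"


def read_table(name: str, raw_text: str):
    """Apply the stored structure to raw text only at read time."""
    meta = METASTORE[name]
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    for fields in reader:
        yield {col: cast(value) for (col, cast), value in zip(meta["columns"], fields)}


def total_seconds_by_url(rows) -> dict[str, float]:
    """Roughly what a GROUP BY url with SUM(seconds) would compute."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["url"]] = totals.get(row["url"], 0.0) + row["seconds"]
    return totals


if __name__ == "__main__":
    print(total_seconds_by_url(read_table("page_views", RAW_FILE)))
```

The raw file is never rewritten; only the table definition gives it columns and types, which is why the same data can be re-read later under a different structure.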
  • 15. How HDFS Works
      1. Client sends a request to the NameNode to add a file to HDFS
      2. NameNode tells the client how and where to distribute the blocks
      3. Client breaks the data into blocks and distributes the blocks to the DataNodes
      4. DataNodes replicate the blocks (as instructed by the NameNode)
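The four steps above can be sketched as a small simulation. This is not the HDFS client API: `split_into_blocks` and `plan_placement` are hypothetical names, and the round-robin placement is a deliberate simplification (a real NameNode also considers racks and free space).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Client side: break the file into fixed-size blocks (the last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def plan_placement(num_blocks: int, datanodes: list[str], replication: int) -> dict[int, list[str]]:
    """NameNode side: choose `replication` distinct DataNodes for every block.

    Simplified round-robin policy, assumed for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    plan: dict[int, list[str]] = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        plan[block_id] = [
            datanodes[(start + r) % len(datanodes)] for r in range(replication)
        ]
    return plan


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    plan = plan_placement(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
    for block_id, nodes in plan.items():
        # The first node receives the block from the client; the rest hold replicas.
        print(block_id, blocks[block_id], nodes)
```

With three replicas per block, losing any single DataNode still leaves two copies of every block, which is how the design copes with hardware failures on commodity machines.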
  • 19. Hortonworks HDP: Enterprise Hadoop 1.x Distribution
      Hortonworks Data Platform (HDP), Enterprise Hadoop:
      o The ONLY 100% open source and complete distribution
      o Enterprise grade, proven and tested at scale
      o Ecosystem endorsed to ensure interoperability
      [Figure: HORTONWORKS DATA PLATFORM (HDP) stack; recoverable labels listed below]
      o HADOOP CORE: HDFS, MAP REDUCE
      o PLATFORM SERVICES (Enterprise Readiness): High Availability, Disaster Recovery, Security and Snapshots
      o DATA SERVICES: HIVE (HCATALOG), PIG, HBASE
      o LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS
      o OPERATIONAL SERVICES: OOZIE, AMBARI
      o OS, Cloud, VM, Appliance
  • 20. Hadoop 2.0: The Enterprise Generation
      o 1.0: Architected for the Large Web Properties
      o 2.0: Architected for the Broad Enterprise
      o Single Platform, Multiple Use: BATCH, INTERACTIVE, ONLINE
      [Figure labels: Business Value; Big Data (Transactions, Interactions, Observations)]

      Enterprise Requirements     Hadoop 2.0 Features
      Mixed workloads             YARN
      Interactive Query           Hive on Tez
      Reliability                 Full Stack HA
      Point in time Recovery      Snapshots
      Multi Data Center           Disaster Recovery
      ZERO downtime               Rolling Upgrades
      Security                    Knox Gateway
  • 21. HDP: Enterprise Hadoop 2.0 Distribution
      Hortonworks Data Platform (HDP), Enterprise Hadoop:
      o The ONLY 100% open source and complete distribution
      o Enterprise grade, proven and tested at scale
      o Ecosystem endorsed to ensure interoperability
      [Figure: HORTONWORKS DATA PLATFORM (HDP) stack; recoverable labels listed below]
      o HADOOP CORE: HDFS, YARN*, MAP REDUCE, TEZ*, OTHER
      o PLATFORM SERVICES (Enterprise Readiness): High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
      o DATA SERVICES: HIVE & HCATALOG, PIG, HBASE
      o LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS, KNOX*
      o OPERATIONAL SERVICES: OOZIE, AMBARI, FALCON*
      o OS/VM, Cloud, Appliance
  • 22. Current Architecture in my Project
      [Figure: layered architecture; recoverable labels listed below]
      o Source System: Oracle, DB2, SQL Server, Flat file/.csv, XML, Other Systems
      o Integration Layer: Data Profiling, Data Extraction, Data Quality, Data Transformation, Data Loading, Metadata Management, Scheduling
      o Enterprise Data Warehouse Layer: Landing, Staging, Enterprise Data Warehouse
      o Semantic Layer: Semantic/Mart
      o Presentation Layer: SAP BO Universe, SAP BO Reports
      o Also shown: Metadata Management, Data Governance, Data Quality
      o Annotations: Get the data using Sqoop; use Hive external & managed tables
  • 23. Lambda Architecture
      The Lambda Architecture aims to satisfy the need for a robust system that is fault-tolerant against both hardware failures and human mistakes, that can serve a wide range of workloads and use cases, and in which low-latency reads and updates are required. The resulting system should be linearly scalable, and it should scale out rather than up.
      1. All data entering the system is dispatched to both the batch layer and the speed layer for processing.
      2. The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing the batch views.
      3. The serving layer indexes the batch views so that they can be queried in a low-latency, ad hoc way.
      4. The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
      5. Any incoming query can be answered by merging results from batch views and real-time views.
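The five points above can be sketched with a toy event counter. This is a single-process illustration under simplifying assumptions, not a production design: `LambdaCounter` and its methods are hypothetical names, the "views" are in-memory counters, and the batch run recomputes from the whole master dataset in one step.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture: counts events per key."""

    def __init__(self) -> None:
        self.master_dataset: list[str] = []  # append-only raw events (batch layer)
        self.batch_view: Counter = Counter()     # precomputed, served for queries
        self.realtime_view: Counter = Counter()  # recent events only (speed layer)

    def ingest(self, event: str) -> None:
        """Point 1: every event goes to both the batch and the speed layer."""
        self.master_dataset.append(event)
        self.realtime_view[event] += 1

    def run_batch(self) -> None:
        """Points 2 and 4: recompute the batch view from all raw data,
        after which the speed layer no longer needs the absorbed events."""
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        """Point 5: merge the batch view with the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    system = LambdaCounter()
    for event in ["click", "view", "click"]:
        system.ingest(event)
    system.run_batch()
    system.ingest("click")        # arrives after the last batch run
    print(system.query("click"))  # batch part plus recent part
```

Because the master dataset is never modified, a bug in the view logic can be fixed and the batch view simply recomputed, which is where the tolerance to human mistakes comes from.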