SlideShare a Scribd company logo
1 of 10
• The classic tool for processing line-oriented data is awk==> without hadoop
• Although Hadoop is best known for MapReduce and its distributed filesystem (HDFS,
renamed from NDFS)
• In earlier versions of Hadoop, there was a single site configuration file
for the Common, HDFS, and MapReduce components, called hadoopsite.xml.
• No horizontal scalability(Single NameNode)
• JobTracker is single Point of availability
• Perform only MapReduce Jobs
• Tight coupling
• Problems with resource utilization
• Hive can execute a SQL query as a series of
MapReduce jobs, performance is the issue
• No Sharing Between jobs
• Not good for online low latency job
• Can't run anything other than Java
• HDFS 1.0
• Supports Horizontle Scalability(Multiple NameNode)
• Highly Available
• Perform Multiple Jobs(other sub-projects)
• YARN decouples it
• Well utilization
• Good query performance
• Backward compatible
• HDFS 2.0
• Beyond Java
2.0 Features1.0 Limitations
HDFS
Persistent Data Structures
The namespaceID is a unique identifier for the filesystem namespace, which is created
when the namenode is first formatted. The clusterID is a unique identifier for the
HDFS cluster as a whole; this is important for HDFS federation,
where a cluster is made up of multiple namespaces and each namespace
is managed by one namenode. The blockpoolID is a unique identifier for the
block pool containing all the files in the namespace managed by this namenode
The in_use.lock file is a lock file that the namenode uses to lock the storage directory.
This prevents another namenode instance from running at the same time with (and
possibly corrupting) the same storage directory.
1.0 2.0
1.0 Hadoop FileSystem 2.0
Speculative Execution(Reasons to Turn off)
1.0
On a busy cluster, speculative execution can reduce overall throughput, since redundant
tasks are being executed in an attempt to bring down the execution time for a
single job. For this reason, some cluster administrators prefer to turn it off on the cluster
and have users explicitly turn it on for individual jobs. when speculative execution could be overly aggressive in
scheduling speculative tasks.
2.0
There is a good case for turning off speculative execution for reduce tasks, since any
duplicate reduce tasks have to fetch the same map outputs as the original task, and this can
significantly increase network traffic on the cluster
Another reason for turning off speculative execution is for nonidempotent tasks(no additional effect if it is called
more than once with the same input parameters).
However, in many cases it is possible to write tasks to be idempotent and use an OutputCommitter to
promote the output to its final location when the task succeeds
1.0
2.0
Configuration Files
1.0
2.0
Security
1.0
One area that hasn’t yet been addressed in the security work is encryption: neither RPC
nor block transfers are encrypted. HDFS blocks are not stored in an encrypted form.
2.0
Various parts of Hadoop can be configured to encrypt network data, including RPC
(hadoop.rpc.protection), HDFS block transfers (dfs.encrypt.data.transfer), the
MapReduce shuffle (mapreduce.shuffle.ssl.enabled), and the web UIs
(hadokop.ssl.enabled). Work is ongoing to encrypt data “at rest,” too, so that HDFS
blocks can be stored in encrypted form, for example
Upgrades(compatability)
1.0
All pre-1.0 Hadoop components have very rigid version compatibility requirements.
Only components from the same release are guaranteed to be compatible with each
other, which means the whole system—from daemons to clients—has to be upgraded
simultaneously, in lockstep. This necessitates a period of cluster downtime.
Version 1.0 of Hadoop promises to loosen these requirements so that, for example,
older clients can talk to newer servers (within the same major release number). In later
releases, rolling upgrades may be supported, which would allow cluster daemons to be
upgraded in phases, so that the cluster would still be available to clients during the
upgrade.
Minor releases (e.g., from 1.0.x to 1.1.0) and point releases (e.g., from 1.0.1
to 1.0.2) should not break compatibility.
2.0
API compatibility, data compatibility, and wire compatibility
1 Api Compatability:
API compatibility concerns the contract between user code and the published Hadoop APIs, such as the Java
MapReduce APIs. Major releases are allowed to break API compatibility, so user programs
may need to be modified and recompiled.
2 Data Compatability
Data compatibility concerns persistent data and metadata formats, such as the format in which the HDFS namenode
stores its persistent data. The formats can change across minor or major releases, but the change is transparent to
users because the upgrade will automatically migrate the data. There may be some restrictions about upgrade paths,
and these are covered in the release notes. For example, it may be necessary to upgrade via an intermediate release
rather than upgrading directly to the later final release in one step
3 Wire Compatability
Wire compatibility concerns the interoperability between clients and servers via wire protocols such as RPC and
HTTP. The rule for wire compatibility is that the client must have the same major release number as the
server,but may differ in its minor or point release number (e.g., client version 2.0.2 will work with server 2.0.1 or
2.1.0, but not necessarily with server 3.0.0).
2.0 Features
• Flume
• Apache Crunch is a higher-level API for writing MapReduce pipelines.
• Spark
• Parquet File Format

More Related Content

What's hot

Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmDataWorks Summit
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0Heiko Loewe
 
Set Up & Operate Tungsten Replicator
Set Up & Operate Tungsten ReplicatorSet Up & Operate Tungsten Replicator
Set Up & Operate Tungsten ReplicatorContinuent
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoopChiou-Nan Chen
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPavan Deolasee
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Shi Shao Feng
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopDataWorks Summit
 
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree Ashnikbiz
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)
SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)
SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)Gary Jackson MBCS
 
Oracle HA, DR, data warehouse loading, and license reduction through edge app...
Oracle HA, DR, data warehouse loading, and license reduction through edge app...Oracle HA, DR, data warehouse loading, and license reduction through edge app...
Oracle HA, DR, data warehouse loading, and license reduction through edge app...Continuent
 
RACAttack 12c Advanced Lab: Server Pools and Policy-managed databases
RACAttack 12c Advanced Lab: Server Pools and Policy-managed databasesRACAttack 12c Advanced Lab: Server Pools and Policy-managed databases
RACAttack 12c Advanced Lab: Server Pools and Policy-managed databasesLudovico Caldara
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
 
Microsoft SharePoint Disaster Recovery to Azure
Microsoft SharePoint Disaster Recovery to AzureMicrosoft SharePoint Disaster Recovery to Azure
Microsoft SharePoint Disaster Recovery to AzureDavid J Rosenthal
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Manish Chopra
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicasenissoz
 

What's hot (20)

Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0
 
Set Up & Operate Tungsten Replicator
Set Up & Operate Tungsten ReplicatorSet Up & Operate Tungsten Replicator
Set Up & Operate Tungsten Replicator
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL Cluster
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed Hadoop
 
MYSQL
MYSQLMYSQL
MYSQL
 
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
 
Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)
SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)
SAP HANA System Replication (HSR) versus SAP Replication Server (SRS)
 
Oracle HA, DR, data warehouse loading, and license reduction through edge app...
Oracle HA, DR, data warehouse loading, and license reduction through edge app...Oracle HA, DR, data warehouse loading, and license reduction through edge app...
Oracle HA, DR, data warehouse loading, and license reduction through edge app...
 
RACAttack 12c Advanced Lab: Server Pools and Policy-managed databases
RACAttack 12c Advanced Lab: Server Pools and Policy-managed databasesRACAttack 12c Advanced Lab: Server Pools and Policy-managed databases
RACAttack 12c Advanced Lab: Server Pools and Policy-managed databases
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
Microsoft SharePoint Disaster Recovery to Azure
Microsoft SharePoint Disaster Recovery to AzureMicrosoft SharePoint Disaster Recovery to Azure
Microsoft SharePoint Disaster Recovery to Azure
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
 

Viewers also liked

SEMINAR BAHASA NUR FADILAH
SEMINAR BAHASA NUR FADILAHSEMINAR BAHASA NUR FADILAH
SEMINAR BAHASA NUR FADILAHDhita Candra
 
I'm not american, i'm canadian!
I'm not american, i'm canadian!I'm not american, i'm canadian!
I'm not american, i'm canadian!clanmort
 
Where are you from
Where are you fromWhere are you from
Where are you fromclanmort
 
Women's in Open Source(Mozilla)
Women's in Open Source(Mozilla)Women's in Open Source(Mozilla)
Women's in Open Source(Mozilla)khansara9419
 
Boek Heleen
Boek HeleenBoek Heleen
Boek HeleenIdaklas
 
Yeni blogger
Yeni bloggerYeni blogger
Yeni bloggerrabiakk
 
Motivation and winning
Motivation and winningMotivation and winning
Motivation and winningSlide2theLeft
 
SEMINAR BAHASA_DHITA CANDRA PUSPITA
SEMINAR BAHASA_DHITA CANDRA PUSPITASEMINAR BAHASA_DHITA CANDRA PUSPITA
SEMINAR BAHASA_DHITA CANDRA PUSPITADhita Candra
 
Présentation avant projet itinérances mars 2013
Présentation  avant projet itinérances mars 2013Présentation  avant projet itinérances mars 2013
Présentation avant projet itinérances mars 2013escuelasconviviales2013
 
BizSmart Regain Business Direction
BizSmart Regain Business DirectionBizSmart Regain Business Direction
BizSmart Regain Business DirectionWAKSTER Limited
 
File 6B therewas
File 6B therewasFile 6B therewas
File 6B therewasclanmort
 

Viewers also liked (20)

Xhotels slides show
Xhotels slides showXhotels slides show
Xhotels slides show
 
SEMINAR BAHASA NUR FADILAH
SEMINAR BAHASA NUR FADILAHSEMINAR BAHASA NUR FADILAH
SEMINAR BAHASA NUR FADILAH
 
Shiva j asthana
Shiva j asthanaShiva j asthana
Shiva j asthana
 
I'm not american, i'm canadian!
I'm not american, i'm canadian!I'm not american, i'm canadian!
I'm not american, i'm canadian!
 
Where are you from
Where are you fromWhere are you from
Where are you from
 
Diderot filosofo
Diderot filosofoDiderot filosofo
Diderot filosofo
 
Mapa
MapaMapa
Mapa
 
Incredible india
Incredible indiaIncredible india
Incredible india
 
Women's in Open Source(Mozilla)
Women's in Open Source(Mozilla)Women's in Open Source(Mozilla)
Women's in Open Source(Mozilla)
 
Boek Heleen
Boek HeleenBoek Heleen
Boek Heleen
 
Ejercicio 1
Ejercicio 1Ejercicio 1
Ejercicio 1
 
Yeni blogger
Yeni bloggerYeni blogger
Yeni blogger
 
Motivation and winning
Motivation and winningMotivation and winning
Motivation and winning
 
TITANIC
TITANICTITANIC
TITANIC
 
FDI
FDIFDI
FDI
 
SEMINAR BAHASA_DHITA CANDRA PUSPITA
SEMINAR BAHASA_DHITA CANDRA PUSPITASEMINAR BAHASA_DHITA CANDRA PUSPITA
SEMINAR BAHASA_DHITA CANDRA PUSPITA
 
Présentation avant projet itinérances mars 2013
Présentation  avant projet itinérances mars 2013Présentation  avant projet itinérances mars 2013
Présentation avant projet itinérances mars 2013
 
BizSmart Regain Business Direction
BizSmart Regain Business DirectionBizSmart Regain Business Direction
BizSmart Regain Business Direction
 
File5 pe
File5 peFile5 pe
File5 pe
 
File 6B therewas
File 6B therewasFile 6B therewas
File 6B therewas
 

Similar to Hadoop 1.0 vs 2.0: A comparison of core features and limitations

Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010BOSC 2010
 
Oozie & sqoop by pradeep
Oozie & sqoop by pradeepOozie & sqoop by pradeep
Oozie & sqoop by pradeepPradeep Pandey
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorialvinayiqbusiness
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop GuideSimplilearn
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperScott Gray
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 

Similar to Hadoop 1.0 vs 2.0: A comparison of core features and limitations (20)

Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Cppt
CpptCppt
Cppt
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
Oozie & sqoop by pradeep
Oozie & sqoop by pradeepOozie & sqoop by pradeep
Oozie & sqoop by pradeep
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Unit 5
Unit  5Unit  5
Unit 5
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_Whitepaper
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Hadoop 1.0 vs 2.0: A comparison of core features and limitations

  • 1. • The classic tool for processing line-oriented data is awk==> without hadoop • Although Hadoop is best known for MapReduce and its distributed filesystem (HDFS, renamed from NDFS) • In earlier versions of Hadoop, there was a single site configuration file for the Common, HDFS, and MapReduce components, called hadoopsite.xml.
  • 2. • No horizontal scalability(Single NameNode) • JobTracker is single Point of availability • Perform only MapReduce Jobs • Tight coupling • Problems with resource utilization • Hive can execute a SQL query as a series of MapReduce jobs, performance is the issue • No Sharing Between jobs • Not good for online low latency job • Can't run anything other than Java • HDFS 1.0 • Supports Horizontle Scalability(Multiple NameNode) • Highly Available • Perform Multiple Jobs(other sub-projects) • YARN decouples it • Well utilization • Good query performance • Backward compatible • HDFS 2.0 • Beyond Java 2.0 Features1.0 Limitations
  • 3. HDFS Persistent Data Structures The namespaceID is a unique identifier for the filesystem namespace, which is created when the namenode is first formatted. The clusterID is a unique identifier for the HDFS cluster as a whole; this is important for HDFS federation, where a cluster is made up of multiple namespaces and each namespace is managed by one namenode. The blockpoolID is a unique identifier for the block pool containing all the files in the namespace managed by this namenode The in_use.lock file is a lock file that the namenode uses to lock the storage directory. This prevents another namenode instance from running at the same time with (and possibly corrupting) the same storage directory. 1.0 2.0
  • 5. Speculative Execution(Reasons to Turn off) 1.0 On a busy cluster, speculative execution can reduce overall throughput, since redundant tasks are being executed in an attempt to bring down the execution time for a single job. For this reason, some cluster administrators prefer to turn it off on the cluster and have users explicitly turn it on for individual jobs. when speculative execution could be overly aggressive in scheduling speculative tasks. 2.0 There is a good case for turning off speculative execution for reduce tasks, since any duplicate reduce tasks have to fetch the same map outputs as the original task, and this can significantly increase network traffic on the cluster Another reason for turning off speculative execution is for nonidempotent tasks(no additional effect if it is called more than once with the same input parameters). However, in many cases it is possible to write tasks to be idempotent and use an OutputCommitter to promote the output to its final location when the task succeeds
  • 8. Security 1.0 One area that hasn’t yet been addressed in the security work is encryption: neither RPC nor block transfers are encrypted. HDFS blocks are not stored in an encrypted form. 2.0 Various parts of Hadoop can be configured to encrypt network data, including RPC (hadoop.rpc.protection), HDFS block transfers (dfs.encrypt.data.transfer), the MapReduce shuffle (mapreduce.shuffle.ssl.enabled), and the web UIs (hadokop.ssl.enabled). Work is ongoing to encrypt data “at rest,” too, so that HDFS blocks can be stored in encrypted form, for example
  • 9. Upgrades(compatability) 1.0 All pre-1.0 Hadoop components have very rigid version compatibility requirements. Only components from the same release are guaranteed to be compatible with each other, which means the whole system—from daemons to clients—has to be upgraded simultaneously, in lockstep. This necessitates a period of cluster downtime. Version 1.0 of Hadoop promises to loosen these requirements so that, for example, older clients can talk to newer servers (within the same major release number). In later releases, rolling upgrades may be supported, which would allow cluster daemons to be upgraded in phases, so that the cluster would still be available to clients during the upgrade. Minor releases (e.g., from 1.0.x to 1.1.0) and point releases (e.g., from 1.0.1 to 1.0.2) should not break compatibility. 2.0 API compatibility, data compatibility, and wire compatibility 1 Api Compatability: API compatibility concerns the contract between user code and the published Hadoop APIs, such as the Java MapReduce APIs. Major releases are allowed to break API compatibility, so user programs may need to be modified and recompiled. 2 Data Compatability Data compatibility concerns persistent data and metadata formats, such as the format in which the HDFS namenode stores its persistent data. The formats can change across minor or major releases, but the change is transparent to users because the upgrade will automatically migrate the data. There may be some restrictions about upgrade paths, and these are covered in the release notes. For example, it may be necessary to upgrade via an intermediate release rather than upgrading directly to the later final release in one step 3 Wire Compatability Wire compatibility concerns the interoperability between clients and servers via wire protocols such as RPC and HTTP. The rule for wire compatibility is that the client must have the same major release number as the server,but may differ in its minor or point release number (e.g., client version 2.0.2 will work with server 2.0.1 or 2.1.0, but not necessarily with server 3.0.0).
  • 10. 2.0 Features • Flume • Apache Crunch is a higher-level API for writing MapReduce pipelines. • Spark • Parquet File Format