Research on Scheduling Scheme
for Hadoop clusters
By: Jiong Xie, FanJun Meng, HaiLong Wang, HongFang Pan, JinHong Cheng, Xiao Qin
04/22/14 CSC 8710
Outline
• What is Hadoop?
• Hadoop Characteristics
• Hadoop Objectives
• Big Data Challenges
• Hadoop Architecture
• What is the predictive scheduling and prefetching mechanism?
• Hadoop Issues
• Hadoop Scheduler
• PSP Scheduler
• Conclusion
Goal
• Design a prefetching mechanism that solves the data movement problem in MapReduce and improves performance.
What is Hadoop?
• Hadoop is an open source software framework for handling large amounts of data and processing them on clusters of commodity hardware.
Characteristics
• It is a framework of tools
- Not a single program, as some people think
• Open source tools.
• Distributed under the Apache License.
• Linux-based tools.
• It works on a distributed model
- Not one big powerful computer, but numerous low-cost computers.
Objectives
• Hadoop supports running applications on Big Data.
• Therefore, Hadoop addresses Big Data challenges.
[Diagram: Hadoop supports running applications on Big Data]
Big Data Challenges
Why Do We Need Hadoop?
• A powerful computer can process data only up to the point where the quantity of data exceeds the computer's capacity.
• Beyond that point, we need a tool like Hadoop to deal with the issue.
• Hadoop uses a different strategy to deal with data.
Hadoop Functionality
• Hadoop breaks the data into smaller pieces and distributes them equally across different nodes to be processed at the same time.
• Similarly, Hadoop divides the computation equally among the nodes.
• The results are combined and then sent back to the application.
Hadoop Functionality
[Diagram: the input Big Data is divided equally among the nodes, each node performs its computation, the results are combined, and the combined result is returned]
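The split, compute, and combine flow on these slides can be sketched in a few lines of Python. This is only an illustration, not Hadoop's API: the word-count job, the function names (`split_evenly`, `count_words`, `run_job`), and the thread pool standing in for cluster nodes are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_evenly(records, num_nodes):
    """Divide the input into num_nodes chunks whose sizes differ by at most one."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_words(chunk):
    """Per-node computation (hypothetical job): count the words in one chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def run_job(records, num_nodes=3):
    """Split the data, process every chunk in parallel, then combine the results."""
    chunks = split_evenly(records, num_nodes)
    # Each worker thread plays the role of one node in the cluster.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_results = list(pool.map(count_words, chunks))
    combined = Counter()
    for partial in partial_results:
        combined.update(partial)
    return combined
```

Calling `run_job(["big data", "big cluster"], num_nodes=2)` returns one combined count per word, regardless of which simulated node processed which line.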
Architecture
• Hadoop consists of two main components:
– MapReduce: divides the workload into smaller pieces
– File System (HDFS): accounts for component failure and keeps a directory of all the tasks
• Other projects provide additional functionality:
• Pig
• Hive
• HBase
• Flume
• Mahout
• Oozie
• Sqoop
[Diagram: Hadoop is made up of MapReduce and the File System (HDFS)]
Architecture
• Slave computers consist of 2 components:
- Task Tracker: processes the given task; it represents the MapReduce component.
- Data Node: manages the piece of the task that has been given to the Task Tracker; it represents HDFS.
Architecture
• The master computer consists of 4 components:
- Job Tracker: belongs to the MapReduce component; it breaks the task into smaller pieces and divides them equally among the Task Trackers.
- Task Tracker: processes the given task.
- Name Node: keeps an index of all the tasks.
- Data Node: manages the piece of the task that has been given to the Task Tracker.
Architecture
Fault Tolerance for Data
• By default, Hadoop keeps three copies of each file, and each copy is given to a different node.
• If any Task Tracker fails, the Job Tracker detects the failure and asks another Task Tracker to take over that job.
• The tables in the Name Node are also backed up on a different computer, which is why the enterprise version of Hadoop keeps two masters: one is the working master and the other is the backup master.
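As a rough sketch of the two fault-tolerance ideas on this slide, the Python below places three copies of each block on distinct nodes and moves the tasks of a failed Task Tracker to the surviving ones. The function names, the random placement, and the round-robin reassignment are assumptions made for illustration; Hadoop's real placement and recovery policies are more involved.

```python
import random


def place_replicas(block_ids, nodes, copies=3, seed=0):
    """Assign each block to `copies` distinct nodes (hypothetical random policy)."""
    if len(nodes) < copies:
        raise ValueError("need at least as many nodes as copies")
    rng = random.Random(seed)
    # rng.sample never repeats a node, so every copy lands on a different node.
    return {block: rng.sample(nodes, copies) for block in block_ids}


def reassign_tasks(assignments, failed_tracker, live_trackers):
    """Move every task of the failed tracker to a surviving tracker, round-robin."""
    survivors = [t for t in live_trackers if t != failed_tracker]
    if not survivors:
        raise RuntimeError("no live task tracker left")
    updated = dict(assignments)
    orphaned = [task for task, tracker in assignments.items()
                if tracker == failed_tracker]
    for i, task in enumerate(orphaned):
        updated[task] = survivors[i % len(survivors)]
    return updated
```

With three copies on different nodes, losing any single node still leaves two readable copies of every block it held.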
Scalability Cost
• The cost of scaling is linear: to increase the speed, increase the number of computers.
Predictive Scheduling and Prefetching
• A predictive scheduling and prefetching (PSP) mechanism is implemented on Hadoop to improve performance.
• Predictive scheduler:
- A flexible task scheduler that predicts the most appropriate task trackers for the next data.
• Prefetching module:
– Forces the preload worker threads to start loading data into the node's main memory before the current task finishes. It relies on estimated task times.
PSP
• Factors that make PSP possible:
- Underutilization of CPU.
- Importance of MapReduce performance
- The storage availability in HDFS
- Interaction between the nodes
Hadoop’s Issue
• In the current MapReduce model, all tasks are managed by the master node, so the computation nodes must ask the master node to assign each new task to be processed.
• The master node tells the computing nodes what the next task is and where it is located.
• CPU time is wasted while the computation node communicates with the master node.
Hadoop’s Issue
• The original Hadoop assigns tasks randomly, from local or remote disk, to the computation node whenever the data is required.
• The CPUs of the computing nodes do not start processing until all the input data has been loaded into main memory.
• This hurts Hadoop’s performance.
Prefetching
• It forces the preload worker threads to start loading data from the local disk into the node's main memory before the current task finishes.
• The waiting time is reduced, so the next task can be processed on time.
• This improves the performance of the MapReduce system.
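The overlap between loading and computing described on this slide can be sketched as follows. It is a toy model rather than the PSP implementation: `load_block` and `process` are hypothetical stand-ins that only sleep, and a single-thread executor plays the role of the preload worker thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_block(block_id, load_seconds):
    """Stand-in for reading one block from local disk into main memory."""
    time.sleep(load_seconds)
    return f"data-{block_id}"


def process(data, compute_seconds):
    """Stand-in for the map computation on data that is already in memory."""
    time.sleep(compute_seconds)
    return data.upper()


def run_with_prefetch(block_ids, load_seconds=0.05, compute_seconds=0.05):
    """Process blocks in order while a preload worker fetches the next block."""
    results = []
    if not block_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as preload_worker:
        # The first block has nothing to overlap with, so load it directly.
        current = load_block(block_ids[0], load_seconds)
        for i in range(len(block_ids)):
            future = None
            if i + 1 < len(block_ids):
                # Start loading the next block before the current task finishes.
                future = preload_worker.submit(
                    load_block, block_ids[i + 1], load_seconds)
            results.append(process(current, compute_seconds))
            if future is not None:
                current = future.result()
    return results
```

When loading and computing take similar time, each load is hidden behind the previous computation, so only the first load is paid for in full.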
Hadoop Scheduler
• In the original Hadoop scheduler, the job tracker includes a task scheduler module that assigns tasks to the different task trackers.
• Task trackers periodically send a heartbeat to the job tracker.
• The job tracker checks the heartbeats and sends tasks to the available task trackers.
• The scheduler assigns tasks to the nodes through the same heartbeat message protocol.
• It assigns tasks randomly and mispredicts stragglers in many cases.
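A minimal sketch of the heartbeat-driven assignment described above, assuming a simplified job tracker that hands out a random pending task for each free slot a task tracker reports. The `JobTracker` class, its `heartbeat` method, and the `free_slots` parameter are hypothetical names for this example, not Hadoop's classes.

```python
import random


class JobTracker:
    """Toy job tracker that assigns pending tasks in reply to heartbeats."""

    def __init__(self, tasks, seed=None):
        self.pending = list(tasks)
        self.assigned = {}  # task -> id of the task tracker running it
        self.rng = random.Random(seed)

    def heartbeat(self, tracker_id, free_slots):
        """Handle one heartbeat and reply with tasks for the free slots."""
        reply = []
        for _ in range(min(free_slots, len(self.pending))):
            # Random choice mirrors the random assignment noted on the slide.
            task = self.pending.pop(self.rng.randrange(len(self.pending)))
            self.assigned[task] = tracker_id
            reply.append(task)
        return reply
```

Because a task tracker only receives work in the reply to its own heartbeat, a busy tracker that reports zero free slots is simply skipped until its next heartbeat.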
Predictive Scheduler
• A predictive scheduler is built by designing a prediction algorithm and integrating it with the original Hadoop.
• The predictive scheduler predicts stragglers and finds the appropriate data blocks.
• The prediction decisions are made by a prediction module during the prefetching stage.
Hadoop Function
Launching Process
• Three basic steps to launch the tasks:
- Copy the job from the shared file system to the job tracker’s file system, along with all the required files.
- Create a local directory for the task and un-jar the contents of the jar into that directory.
- Copy the task to the task tracker to be processed.
Launching Process
• In PSP, all of the preceding steps are monitored by the prediction module, which predicts three events:
- The finish time of the task currently being processed.
- The tasks that are going to be assigned to the task trackers.
- The launch time of the pending tasks.
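One simple way to predict these events, shown here purely as an assumption since the slides do not give the prediction formula, is to extrapolate each running task's finish time from its progress so far and to launch each pending task on the tracker predicted to free up next. The function names and the constant-progress-rate model are hypothetical.

```python
def estimate_finish_time(start_time, now, progress):
    """Extrapolate the finish time, assuming a constant progress rate.

    `progress` is the fraction of the task already completed, in (0, 1].
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    elapsed = now - start_time
    return start_time + elapsed / progress


def predict_launches(running, pending, now):
    """Pair each pending task with the tracker predicted to become free next.

    `running` maps tracker id -> (start_time, progress) of its current task.
    Returns a list of (pending_task, tracker_id, predicted_launch_time).
    """
    finish_times = sorted(
        (estimate_finish_time(start, now, progress), tracker)
        for tracker, (start, progress) in running.items()
    )
    return [(task, tracker, finish)
            for task, (finish, tracker) in zip(pending, finish_times)]
```

For example, a task that started at time 0 and is half done at time 10 is predicted to finish at time 20, which then becomes the predicted launch time of the next pending task on that tracker.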
Prefetching
• These three issues must be addressed:
- When to prefetch
- What to prefetch
- How much to prefetch
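The three questions can be made concrete with a small decision function. The policy below is an assumed one, not taken from the slides: prefetch the upcoming blocks in order (what), take as many as fit in free memory (how much), and start once the current task's predicted remaining time is no longer than the time needed to load them (when). All names and parameters are hypothetical.

```python
def plan_prefetch(remaining_seconds, candidate_blocks,
                  free_memory_bytes, disk_bytes_per_second):
    """Return the block ids to preload now, or [] if it is too early.

    `candidate_blocks` is an ordered list of (block_id, size_in_bytes)
    for the tasks predicted to run next on this node.
    """
    chosen, used = [], 0
    # What and how much: take upcoming blocks in order until memory is full.
    for block_id, size in candidate_blocks:
        if used + size > free_memory_bytes:
            break
        chosen.append(block_id)
        used += size
    if not chosen:
        return []
    # When: start as soon as loading would take at least the remaining time,
    # so the data is in memory around the moment the current task finishes.
    load_seconds = used / disk_bytes_per_second
    return chosen if remaining_seconds <= load_seconds else []
```

Starting too early ties up memory that the current task may need, while starting too late leaves the CPU idle, which is why the timing depends on the predicted finish time.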
Conclusion
• The proposed predictive scheduling and prefetching (PSP) mechanism aims to enhance Hadoop performance.
• The prediction module predicts the data blocks that will be accessed by the computing nodes in a cluster.
• The prefetching module preloads this future data into the cache of the nodes.
• Applied on 10 nodes, it reduces execution time by up to 28%, and by 19% on average.
• It increases the overall throughput and the I/O utilization.

Suggested Algorithm to improve Hadoop's performance.

  • 1. Research on Scheduling Scheme for Hadoop clusters By: Jiong Xiea,b, FanJun Mengc, HaiLong Wangc, HongFang Panb, JinHong Chengb, Xiao Qina 04/22/14 1CSC 8710
  • 2. Outlines • What is Hadoop? • Hadoop Characterstics • Hadoop Objectives • Big Data Challenges • Hadoop Architecture • What is the predictive schedule and prefetching mechanism ? • Hadoop Issues • Hadoop Scheduler • PSP Scheduler • Conclustion 04/22/14 2CSC 8710
  • 3. Goal • Designing prefetching mechanism to solve the data moving problem in mapReducing and to improve the performance. 04/22/14 CSC 8710 3
  • 4. What is Hadoop? • Hadoop is an open source software framework that is used to deal with the large amount of data and to process them on clusters of commodity hardware. 04/22/14 4CSC 8710
  • 5. Characteristics • It is a framework of tools - Not a particular program as some people think • Open source tools. • Distributed under apache license . • Linux based tools. • It works on a distributed models - Not one big powerful computer, but numerous low cost computers. 04/22/14 5CSC 8710
  • 6. objectives • Hadoop supports running of application on Big Data. • Therefore, Hadoop addresses Big Data challenges. Hadoop Running application on Big Datasupports 04/22/14 6CSC 8710
  • 8. Why Do We need Hadoop? • Powerful computer can process data until some point when the quantity of data becomes larger than the ability of the computer. • Now, we need Hadoop tool to deal with this issue. • Hadoop uses different strategy to deal with data. 04/22/14 8CSC 8710
  • 9. Hadoop Functionality • Hadoop breaks up the data into smaller pieces and distribute them equally on different nodes to be processed at the same time. • Similarly, Hadoop divides the computation into the nodes equally. • Results are combined all together then sent again to the application 04/22/14 9CSC 8710
  • 10. Hadoop Functionality Node Node Big Data Node Combined Result Dividing the data equally computation Returning the result Input data Combining the result 04/22/14 10CSC 8710
  • 11. Architecture • Hadoop consists of two main components: – MapReduce: divides the workload into smaller pieces – File System (HDFS): accounts for component failure, and it keeps directory for all the tasks – There are other projects provide additional functionality: • Pig • Hive • HBase • Flume • Mahout • Oozie • Scoop MapReduce File System HDFS Hadoop 04/22/14 11CSC 8710
  • 12. Architecture • Slave computers consist of 2 components: - Task Tracker: to process the given task, and it represents the mapReduce component. - Data Node: to manage the piece of task that has been give to the task tracker, and it represents HDFS. 04/22/14 12CSC 8710
  • 13. Architecture • The master computer consists of 4 components: - Job Tracker: It works under mapReduce component so it breaks up the task into smaller pieces and divides them equally on the Task Trackers. - Task Tracker: to process the given task. - Name Node: It is responsible to keep an index of all the tasks. - Data Node: to manage the piece of task that has been give to the task tracker. 04/22/14 13CSC 8710
  • 15. Fault Tolerance for Data • By default, Hadoop keeps three copies of each file, and each copy is given to a different node. • If any one of the Task Trackers fails, the Job Tracker detects the failure and asks another Task Tracker to take over that job. • The tables in the Name Node are also backed up on a different computer, which is why the enterprise version of Hadoop keeps two masters: one is the working master and the other is the backup master.
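The two fault-tolerance ideas on this slide, three copies on three different nodes and reassigning a failed Task Tracker's work, can be sketched as follows. This is a simplified illustration, not how HDFS or the Job Tracker is implemented: real HDFS placement also considers racks, and the function names `place_replicas` and `reassign_tasks` are assumptions made for the example.

```python
import random

REPLICATION = 3  # three copies, as stated on the slide


def place_replicas(nodes, rng=random):
    """Pick REPLICATION distinct nodes to hold the copies (random placement assumed)."""
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    return rng.sample(nodes, REPLICATION)


def reassign_tasks(assignments, failed, live_trackers):
    """Return a new task -> tracker map with the failed tracker's tasks moved.

    Orphaned tasks are handed to the surviving trackers in round-robin order.
    """
    survivors = [t for t in live_trackers if t != failed]
    if not survivors:
        raise RuntimeError("no live task tracker left to take over")
    updated = dict(assignments)
    orphaned = sorted(task for task, tracker in assignments.items() if tracker == failed)
    for i, task in enumerate(orphaned):
        updated[task] = survivors[i % len(survivors)]
    return updated
```

Because each copy lands on a different node, losing one node still leaves two readable copies, and `reassign_tasks` shows the Job Tracker's side of recovery in miniature.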
  • 16. Scalability Cost • The scalability cost is linear: to increase the speed, increase the number of computers.
  • 17. Predictive Scheduling and Prefetching • A predictive scheduling and prefetching (PSP) mechanism is implemented on Hadoop to improve performance. • Predictive scheduler: - A flexible task scheduler that predicts the most appropriate Task Trackers for the next data. • Prefetching module: – Responsible for forcing the preload worker threads to start loading data into the node's main memory before the current task finishes. It depends on the estimated time.
  • 18. PSP • Factors that make PSP possible: - Underutilization of CPU. - Importance of MapReduce performance - The storage availability in HDFS - Interaction between the nodes
  • 19. Hadoop’s Issue • In the current MapReduce model, all tasks are managed by the master node, so the computation nodes ask the master node to assign a new task to process. • The master node tells the computing nodes what the next task is and where it is located. • CPU time is wasted while the computation node communicates with the master node.
  • 20. Hadoop’s Issue • The original Hadoop assigns tasks randomly from local or remote disk to the computation node whenever the data is required. • The CPU of a computing node does not start processing until all the input data is loaded into main memory. • This hurts Hadoop’s performance.
  • 21. Prefetching • It forces the preload worker threads to start loading data from the local disk into the node's main memory before the current task finishes. • The waiting time is reduced, so the next task can start processing on time. • This improves the performance of the MapReduce system.
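The overlap that prefetching aims for, loading the next block while the current one is still being processed, can be illustrated with a background thread. This is a minimal sketch under stated assumptions rather than the PSP implementation: `load_block` is a stand-in for a disk read, `process` is a placeholder computation, and all names are hypothetical.

```python
import threading


def load_block(block_id):
    """Stand-in for reading one block from local disk into memory."""
    return f"data-for-{block_id}"


def process(data):
    """Placeholder for the task's computation on an in-memory block."""
    return data.upper()


def _preload(block_id, holder):
    """Worker-thread body: load the next block and park it in holder."""
    holder["data"] = load_block(block_id)


def run_with_prefetch(block_ids):
    """Process blocks in order, preloading block i+1 while block i is processed."""
    results = []
    if not block_ids:
        return results
    current = load_block(block_ids[0])  # the first block cannot be overlapped
    for i in range(len(block_ids)):
        holder, worker = {}, None
        if i + 1 < len(block_ids):
            worker = threading.Thread(target=_preload, args=(block_ids[i + 1], holder))
            worker.start()  # loading starts before the current task finishes
        results.append(process(current))
        if worker is not None:
            worker.join()  # normally already done, so little or no waiting
            current = holder["data"]
    return results
```

When loading and processing take comparable time, the `join` call returns almost immediately, which is the reduced waiting time the slide describes.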
  • 22. Hadoop Scheduler • In the original Hadoop scheduler, the Job Tracker includes the task scheduler module, which assigns tasks to the different Task Trackers. • Task Trackers periodically send heartbeats to the Job Tracker. • The Job Tracker checks the heartbeats and sends tasks to the available Task Trackers. • The scheduler assigns tasks randomly to the nodes via the same heartbeat message protocol. • It assigns tasks randomly and mispredicts stragglers in many cases.
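The heartbeat exchange described on this slide can be modelled as a toy class. It is a sketch of the idea only, not Hadoop's scheduler: the class name `JobTracker`, the `heartbeat` signature, and the use of a free-slot count are assumptions made for illustration.

```python
import random


class JobTracker:
    """Toy job tracker that hands out pending tasks in reply to heartbeats."""

    def __init__(self, pending_tasks, rng=None):
        self.pending = list(pending_tasks)
        self.assigned = {}  # task -> id of the tracker it was sent to
        self.rng = rng or random.Random()

    def heartbeat(self, tracker_id, free_slots):
        """Handle one heartbeat and reply with the tasks the tracker should run.

        Tasks are drawn at random from the pending list, mirroring the random
        assignment the slide attributes to the original scheduler.
        """
        reply = []
        for _ in range(min(free_slots, len(self.pending))):
            task = self.pending.pop(self.rng.randrange(len(self.pending)))
            self.assigned[task] = tracker_id
            reply.append(task)
        return reply
```

A tracker reporting two free slots receives at most two tasks in the reply, and once the pending list is empty every heartbeat gets an empty reply.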
  • 23. Predictive Scheduler • The predictive scheduler is built by designing a prediction algorithm that is integrated with the original Hadoop. • The predictive scheduler predicts stragglers and finds the appropriate data blocks. • The prediction decisions are made by a prediction module during the prefetching stage.
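One way to picture a predictive choice, as opposed to a random one, is to rank Task Trackers by data locality and predicted availability. The slides do not give the actual prediction algorithm, so the rule below (prefer trackers that already hold the block, then pick the one predicted to be free earliest) and the name `pick_tracker` are assumptions for illustration only.

```python
def pick_tracker(block_locations, predicted_free_at, block_id):
    """Choose a tracker for the task that reads block_id.

    block_locations: block id -> set of trackers holding that block locally.
    predicted_free_at: tracker id -> predicted time at which it becomes free.
    """
    holders = block_locations.get(block_id, ())
    local = [t for t in predicted_free_at if t in holders]
    candidates = local or list(predicted_free_at)  # fall back to any tracker
    # Earliest predicted free time wins; the tracker id breaks ties.
    return min(candidates, key=lambda t: (predicted_free_at[t], t))
```

With this rule a slower local tracker can still be chosen over an idle remote one, trading a short wait for avoiding a data transfer.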
  • 25. Launching Process • Three basic steps to launch a task: - Copying the job from the shared file system to the Job Tracker’s file system, along with all the required files. - Creating a local directory for the task and un-jarring the contents of the JAR into that directory. - Copying the task to the Task Tracker to be processed.
  • 26. Launching Process • In PSP, all of the preceding steps are monitored by the prediction module, which predicts three events: - The finish time of the task currently being processed. - The tasks that are going to be assigned to the Task Trackers. - The launch time of the pending tasks.
  • 27. Prefetching • These three issues must be addressed: - When to prefetch. - What to prefetch. - How much to prefetch.
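The "when" and "how much" questions can be made concrete with simple arithmetic. The slides do not state the formulas PSP uses, so everything below is an assumption for illustration: the finish time is extrapolated linearly from task progress, prefetching starts just early enough for the load to complete by then, and the amount is capped by free memory.

```python
def estimate_finish_time(start_time, progress, now):
    """Extrapolate the finish time from the fraction of the task done so far."""
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    elapsed = now - start_time
    return start_time + elapsed / progress


def prefetch_start_time(finish_time, block_bytes, disk_bytes_per_sec):
    """When: the latest start that still finishes loading by finish_time."""
    return finish_time - block_bytes / disk_bytes_per_sec


def blocks_to_prefetch(pending_block_sizes, free_memory_bytes):
    """How much: take pending blocks in order while they fit in free memory."""
    chosen, used = [], 0
    for size in pending_block_sizes:
        if used + size > free_memory_bytes:
            break
        chosen.append(size)
        used += size
    return chosen
```

For example, a task that is half done after 10 seconds is estimated to finish at 20 seconds, and a block that takes 2 seconds to load should start loading no later than 18 seconds.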
  • 28. Conclusion • A predictive scheduling and prefetching (PSP) mechanism is proposed that aims to enhance Hadoop performance. • The prediction module predicts the data blocks to be accessed by the computing nodes in a cluster. • The prefetching module preloads this future set of data into the cache of the nodes. • Applied on a 10-node cluster, it reduced execution time by up to 28%, and by 19% on average. • It increases the overall throughput and the I/O utilization.
  • 29. Resources • http://ac.els-cdn.com/S1877050913005668/1-s2.0-S1877050913005668-main.pdf?_tid=00e2b8e8-8d59-11e3-be92-00000aacb362&acdnat=1391490095_5f34abbe9f98d3b8a0978b2464478da1 • http://blog.vitria.com/bid/87945/Big-Data-Analytics-Challenges-Facing-All-Communications-Service-Providers • http://blog.raremile.com/hadoop-demystified/ • http://namitkabra.wordpress.com/category/etl/page/2/ • http://odbms.org/download/Pro%20Hadoop%20Ch.%201.pdf • http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf • http://wiki.apache.org/hadoop/Defining%20Hadoop • https://engineering.purdue.edu/~ychu/ee673/Projects.F11/detectstraggeler_finalrpt.pdf