Cloud Platforms and Programming Models in the Cloud
Vahid Amiri
Cloud Computing Laboratory, Amirkabir University
Aban 1391 (October-November 2012)
First National Workshop on Cloud Computing
Vahidamiry.ir
Anatomy of a Cloud
• Infrastructure as a Service: Data Centers, Clusters, Storage, Other Grids/Clouds; Virtualization; VM Management & Deployment (Amazon S3, EC2; OpenNebula, Eucalyptus)
• Platform as a Service: Web 2.0 Interface; Programming API; Scripting & Programming Languages (Google AppEngine, Microsoft Azure, Manjrasoft Aneka)
• Software as a Service: Google Apps (Gmail, Docs,…), Salesforce.com
• Deployment: Public Cloud, Private Cloud
The Next Revolution in IT
• Cloud Computing
• Subscribe
• Use
• $ - pay for what you use, based on QoS
• Classical Computing
Example cloud-based deployment of an application
Platform as a Service (PaaS)
• Platform as a Service (PaaS) cloud systems provide a software execution environment on which application services can run.
• The environment is not just a pre-installed operating system; it is also integrated with a programming-language-level platform.
• Users of PaaS clouds don't need to handle resource management or allocation concerns such as automatic scaling and load balancing.
Common PaaS Scenario
[Figure: a Scheduler, four Executors, and internet connections]
Programming / Deployment Model
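// A unit of work that the platform can run on any executor.
public class DumbTask : ITask
{
    …
    public void Execute()
    {
        ……
    }
}

// Client side: create n tasks and submit each one to the application.
for (int i = 0; i < n; i++)
{
    …
    DumbTask task = new DumbTask();
    app.SubmitExecution(task);
}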
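The same submit-and-execute pattern can be sketched in Python. This is only an illustration: the `DumbTask` class, its `execute` method, the `run_tasks` helper, and the use of a local thread pool as a stand-in for the platform's scheduler and executors are all assumptions, not the API of any particular PaaS.

```python
from concurrent.futures import ThreadPoolExecutor


class DumbTask:
    """One independent unit of work (hypothetical stand-in for an ITask)."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self):
        # Placeholder workload: square the task id.
        return self.task_id * self.task_id


def run_tasks(n, executors=4):
    """Submit n tasks to a pool that plays the role of scheduler and executors."""
    with ThreadPoolExecutor(max_workers=executors) as pool:
        futures = [pool.submit(DumbTask(i).execute) for i in range(n)]
        # Collect results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_tasks(5))
```

The caller never chooses which executor runs a task; that decision stays with the pool, which is the point of the PaaS task model.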
PaaS Providers
PaaS provider | Programming Environments | Infrastructure
Google AppEngine | Python, Java and Go | Google Data Center
Azure | .Net (Microsoft Visual Studio) | Microsoft Data Centers
Force.com | Apex Programming and Java | Salesforce Data Center
Heroku | Ruby, Java, Python and Scala | Amazon EC2 and S3
Hadoop | MapReduce Model (Java, Python) | Private Cloud - Elastic MapReduce
AppScale | Java, Python | Private Cloud
• Google App Engine lets you run your web applications on Google's infrastructure.
• With App Engine, there are no servers to maintain: you just upload your application, and it's ready to serve your users.
Google AppEngine
• Full support for common web technologies
• Program in Java, Go, or Python
• Automatic scaling, load balancing
• Scheduled tasks & queues
• Persistent storage
• Sandboxing
Google App Engine Architecture
Storing data:
• App Engine Datastore: a NoSQL datastore
• Google Cloud SQL: RDBMS-based databases (MySQL)
• Google Cloud Storage: a storage service for objects and files up to terabytes in size
App Engine Services
• Mail
• Memcache
• Image Manipulation
• Full Text Search API
• Google Cloud Storage API
• Datastore API
• Blobstore API
GAE Pricing
PaaS Advantages
• Seemingly infinite compute resources, available on demand
• Pay-per-use billing
• Reduced costs due to dynamic resource provisioning
• Scalability - No need to plan for peak load
• Easy management
• Software versioning and upgrading
• Elastic
• Only use what you need
Scalability
• Energy
• Utilization
• $$$
[Figure: resources, static solution vs. cloud-based solution]
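The utilization and cost gap between a static and a cloud-based solution can be made concrete with a small calculation. The demand curve and price below are hypothetical numbers chosen for illustration, and the model assumes a static deployment must be sized for its peak hour; it ignores energy and any minimum billing increments.

```python
def static_cost(demand, price_per_server_hour):
    """Cost when capacity is fixed at the peak for every hour."""
    peak = max(demand)
    return peak * len(demand) * price_per_server_hour


def elastic_cost(demand, price_per_server_hour):
    """Cost when each hour is billed only for the servers actually used."""
    return sum(demand) * price_per_server_hour


def utilization(demand):
    """Fraction of the statically provisioned capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))


if __name__ == "__main__":
    # Hypothetical demand: servers needed in each hour of one day.
    demand = [2] * 8 + [10] * 4 + [4] * 12
    price = 0.5  # hypothetical price per server-hour
    print(static_cost(demand, price))
    print(elastic_cost(demand, price))
    print(round(utilization(demand), 2))
```

With this made-up curve the statically sized deployment pays for 240 server-hours while only 104 are used, which is the kind of gap the slide's comparison points at.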
Risks
• Privacy
• Who accesses your data?
• Security
• How much do you trust your provider?
• What about recovery, tracing, and data integrity?
• Political and legal issues
• Who owns the data?
• Who uses your personal data?
• Government
• Where is your data?
• Amazon Availability Zones
• Vendor lock-in
Hadoop Platform
• Google papers
• The Google File System - 2003
• MapReduce: Simplified Data Processing on Large Clusters - 2004
• A framework for storing & processing petabytes of data using commodity hardware and storage
• Hadoop partitions data and computation across many (thousands of) hosts and executes application computations in parallel close to their data.
Hadoop clusters
• Yahoo has ~20,000 machines running Hadoop
• Largest clusters are currently 3,000 nodes
• Data load: 30-50 TB/day
Hadoop Project
Hadoop projects
• HDFS: A distributed filesystem that runs on large clusters of commodity machines
• MapReduce: A distributed data processing model
• HBase: A distributed, column-oriented database
• Hive: A distributed data warehouse. Hive manages data stored in HDFS and provides a query language based on SQL
• Pig: A data flow language and execution environment for exploring very large datasets
Hadoop Characteristics
• Commodity HW + Horizontal scaling
• Add inexpensive servers
• Storage servers and their disks are not assumed to be highly reliable and available
• Use replication across servers to deal with unreliable storage/servers
• Support for moving computation close to data
• Automatic re-execution on failure/distribution
• Metadata-data separation - simple design
• Storage scales horizontally
• Metadata scales vertically (today)
Components
• Distributed File System
• HDFS
• Distributed Processing Framework
• Map/Reduce
Hadoop Distributed File System (HDFS)
HDFS Architecture
• Master-Slave Architecture
• HDFS master: “NameNode”
• Manages all filesystem metadata
• Maps each file name to its list of blocks and their locations
• Collects block reports from DataNodes on block locations
• Re-replicates missing blocks
• Controls read/write access to files
• Manages block replication
• HDFS slaves: “DataNodes”
• Notify the NameNode of the block IDs they hold
• Serve read/write requests from clients
• Perform replication tasks when instructed by the NameNode
• Rack-aware
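The metadata duties listed above (file-to-block mapping, block reports, spotting blocks that need re-replication) can be sketched as a toy in-memory model. The class and method names are illustrative assumptions and do not correspond to Hadoop's actual classes or RPC interfaces.

```python
from collections import defaultdict


class NameNode:
    """Toy in-memory model of NameNode metadata (illustrative only)."""

    def __init__(self, replication=3):
        self.replication = replication
        self.file_blocks = {}                    # file name -> list of block IDs
        self.block_locations = defaultdict(set)  # block ID -> DataNode names

    def add_file(self, name, block_ids):
        self.file_blocks[name] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """Record which blocks a DataNode says it currently holds."""
        # Forget this node's previous report, then apply the new one.
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, name):
        """Return (block ID, sorted holders) pairs for a file, in block order."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[name]]

    def under_replicated(self):
        """Blocks of known files with fewer reported replicas than the target."""
        return sorted(
            b
            for blocks in self.file_blocks.values()
            for b in blocks
            if len(self.block_locations[b]) < self.replication
        )
```

A short usage example: after `nn = NameNode(replication=2)`, `nn.add_file("/logs/a", ["b1", "b2"])`, `nn.block_report("dn1", ["b1", "b2"])` and `nn.block_report("dn2", ["b1"])`, calling `nn.under_replicated()` reports `b2` as the block that needs another copy.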
Replica Management
• The placement of replicas is critical to HDFS data reliability and read/write performance.
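One way to see why placement matters is a rack-aware policy for three replicas. The sketch below assumes the policy commonly documented as the HDFS default (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack); the slide itself does not spell out a policy, and the function name and topology format are hypothetical.

```python
def place_replicas(writer, topology):
    """Choose three DataNodes for a block.

    topology maps rack name -> list of node names; writer must be one of the nodes.
    Picks are deterministic (first candidate in sorted order) to keep the sketch
    testable; a real placement policy would also weigh load and free space.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]

    # Replica 1: the writer's own node, so the first write is local.
    first = writer

    # Replica 2: a node on a different rack, so losing a whole rack
    # cannot lose every copy of the block.
    remote_racks = sorted(r for r in topology if r != local_rack and topology[r])
    if not remote_racks:
        raise ValueError("need at least two non-empty racks")
    remote_rack = remote_racks[0]
    second = sorted(topology[remote_rack])[0]

    # Replica 3: a different node on the same remote rack, which keeps
    # the third copy's transfer inside one rack.
    others = sorted(n for n in topology[remote_rack] if n != second)
    if not others:
        raise ValueError("remote rack needs at least two nodes")
    third = others[0]

    return [first, second, third]
```

The trade-off illustrated here is between reliability (copies on two racks) and write cost (only one cross-rack transfer per block).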
HDFS cluster
MapReduce framework
MapReduce Model
• Developing MapReduce based Applications
• Define map and reduce operations
• Provide the data
• Run the MapReduce engine
• MapReduce library does most of the hard work for us!
• Parallelization
• Fault Tolerance
• Data Distribution
• Load Balancing
Map and Reduce
• Map()
• Map workers read in contents of corresponding input partition
• Process a key/value pair to generate intermediate key/value pairs
• Reduce()
• Merge all intermediate values associated with the same key
• e.g. <key, [value1, value2, ..., valueN]>
• Output of the user's reduce function is written to an output file on the global file system
[Figure: input data; map & reduce; MapReduce engine; Map & Reduce network]
MapReduce Example
• Input (three splits): "the quick brown fox"; "the fox ate the mouse"; "how now brown cow"
• Map output, split 1: the, 1; quick, 1; brown, 1; fox, 1
• Map output, split 2: the, 1; fox, 1; the, 1; ate, 1; mouse, 1
• Map output, split 3: how, 1; now, 1; brown, 1; cow, 1
• Result, reducer 1: brown, 2; fox, 2; how, 1; now, 1; the, 3
• Result, reducer 2: ate, 1; cow, 1; mouse, 1; quick, 1
Example: Counting Words
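The word-counting example can be written as a small single-process sketch in Python. It mimics the map, group-by-key, and reduce steps in memory; the function names are illustrative assumptions, and a real Hadoop job would run the same logic distributed across many nodes.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key: <key, [value1, value2, ...]>."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce: merge all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over the input lines in one process."""
    intermediate = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(w, c) for w, c in shuffle(intermediate).items())


if __name__ == "__main__":
    splits = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
    print(word_count(splits))
```

Only `map_words` and `reduce_counts` are problem-specific; the grouping step is what the MapReduce library would normally provide.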
MapReduce Components
• Master-Slave architecture
• JobTracker
• Accepts jobs submitted by users
• Assigns Map and Reduce tasks to TaskTrackers
• Makes all scheduling decisions
• Schedules tasks on nodes close to data
• Monitors task and TaskTracker status, re-executes tasks upon failure
• TaskTracker
• Asks for new tasks, executes, monitors, reports status
• Runs Map and Reduce tasks upon instruction from the JobTracker
• Manages storage and transmission of intermediate output
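The "schedule tasks on nodes close to data" rule can be illustrated with a simplified assignment routine. This is a sketch under stated assumptions: the function name, the slot model, and the two-level preference (data-local node first, any free node otherwise) are simplifications, not the JobTracker's actual scheduler.

```python
def assign_tasks(tasks, free_slots):
    """Assign map tasks to TaskTrackers, preferring trackers that hold the data.

    tasks: dict task ID -> set of nodes holding that task's input block.
    free_slots: dict TaskTracker node -> number of free task slots.
    Returns dict task ID -> (node, is_data_local); tasks that find no
    free slot are left out and would wait for a later round.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for task_id in sorted(tasks):
        local = sorted(n for n in tasks[task_id] if slots.get(n, 0) > 0)
        anywhere = sorted(n for n, free in slots.items() if free > 0)
        if local:
            node, is_local = local[0], True
        elif anywhere:
            node, is_local = anywhere[0], False
        else:
            continue  # no capacity left in this round
        slots[node] -= 1
        assignment[task_id] = (node, is_local)
    return assignment
```

A task assigned with `is_data_local` set to `False` would have to read its input over the network, which is the cost the locality preference tries to avoid.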
HDFS and MapReduce Cluster
Amazon Elastic MapReduce
Private Cloud
• Hadoop and Eucalyptus integration
• A Hadoop cluster can be built from virtual machines created by Eucalyptus
[Figure: physical nodes 1 to 7, each running a hypervisor under the Infrastructure Manager; VMs 1 to 27 run on top; the Distributed File System / Platform Manager layer spans the VMs, with one DFS-M and several DFS-N nodes, and one Master with Slaves 1 to 9, …]
Case study - Evolutionary algorithms
• In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm:
• Genetic algorithm
• Populations
• Fitness Function
• Mutation
• Crossover
Genetic Algorithm Diagram
MapReduce Model
[Figure: Initial population, Map, Intermediate Data, Reduce]
Job Shop Scheduling Problem
10*10 problem
Program Model
Initial population (input to Map):
1 | 0 0,020123011
1 | 0 0,310103022
1 | 0 0,120321302
2 | 0 0,310223012
2 | 0 0,320103012
2 | 0 0,220321301
Intermediate Data (output of Map, input to Reduce):
1 | 12 1,020123011
1 | 17 1,310103022
1 | 20 1,120321302
2 | 21 1,310223012
2 | 10 1,320103012
2 | 19 1,220321301
Next Generation (output of Reduce):
1 | 0 0,310103022
1 | 0 0,310103022
1 | 17 1,120321302
2 | 0 0,310223012
2 | 0 0,320103012
2 | 0 0,220321301
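One generation of a genetic algorithm expressed as a map step and a reduce step can be sketched as follows. Everything here is an assumption made for illustration: the digit-sum fitness is a placeholder (a real job-shop fitness would score a schedule), the leading key is treated as a sub-population ID, and the selection, crossover, and mutation rules are generic choices rather than the ones used in the talk.

```python
import random


def fitness(chromosome):
    """Hypothetical fitness: the sum of the digits in the chromosome."""
    return sum(int(d) for d in chromosome)


def map_evaluate(record):
    """Map: attach a fitness value to one (group, chromosome) record."""
    group, chromosome = record
    return group, (fitness(chromosome), chromosome)


def crossover(a, b, rng):
    """One-point crossover of two equal-length chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.1):
    """Replace each gene with a random digit 0-3 with probability `rate`."""
    return "".join(rng.choice("0123") if rng.random() < rate else g for g in chromosome)


def reduce_breed(group, scored, rng):
    """Reduce: keep the fitter half of a group and refill it with offspring."""
    ranked = [c for _, c in sorted(scored, reverse=True)]
    survivors = ranked[: max(2, len(ranked) // 2)]
    children = []
    while len(survivors) + len(children) < len(ranked):
        a, b = rng.sample(survivors, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return [(group, c) for c in survivors + children]


def next_generation(population, seed=0):
    """Run one map + reduce round over (group, chromosome) records."""
    rng = random.Random(seed)
    grouped = {}
    for group, scored in map(map_evaluate, population):
        grouped.setdefault(group, []).append(scored)
    return [rec for g in sorted(grouped) for rec in reduce_breed(g, grouped[g], rng)]
```

Because fitness evaluation is independent per individual it parallelizes naturally in the map phase, while selection and breeding need a whole group at once and therefore sit in the reduce phase.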
Benchmarks
Amirkabir Supercomputer
Setup Cluster
• Cores: 7 * 16 = 112 cores
• RAM: 7 * 32 GB = 224 GB
• Hard disk: 7 * 500 GB
System Configuration
• Infrastructure Management: 1 core + 8 GB RAM
• Platform Management: 2 cores + 4 GB RAM
• Slaves: 48 * 1 core (2 GB RAM)
Thank you!!
Improved model
Population chart with the number of generations fixed at 500
[Figure: time per iteration (in seconds, 0 to 80) vs. population (10,000 to 100,000) for Hadoop, Hop, and Haloop]
Generation chart with the population fixed at 50,000
[Figure: time (seconds, 0 to 200,000) vs. generation (500 to 4,000) for Hadoop, Hop, and Haloop]
Chart of the effect of the number of reducers with the population fixed at 20,000
[Figure: time per iteration (in seconds, 0 to 1,200) vs. number of reducers (4 to 128) for Hadoop, Hop, and Haloop]
More Related Content

What's hot

Hadoop
HadoopHadoop
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
Microsoft TechNet - Belgium and Luxembourg
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedIn
DataWorks Summit
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
Romeo Kienzler
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
Andrew Brust
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
datastack
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
Thanh Nguyen
 
Payment Gateway Live hadoop project
Payment Gateway Live hadoop projectPayment Gateway Live hadoop project
Payment Gateway Live hadoop projectKamal A
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production SuccessAllen Day, PhD
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
WANdisco Plc
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
harithakannan
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
Sascha Dittmann
 
Ravi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introductionRavi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introduction
Ravi namboori
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
Arjen de Vries
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
Flavio Vit
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
Asis Mohanty
 

What's hot (20)

Hadoop
HadoopHadoop
Hadoop
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedIn
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Payment Gateway Live hadoop project
Payment Gateway Live hadoop projectPayment Gateway Live hadoop project
Payment Gateway Live hadoop project
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Ravi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introductionRavi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introduction
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 

Similar to سکوهای ابری و مدل های برنامه نویسی در ابر

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Deanna Kosaraju
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
Bikas Saha
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the move
Codemotion
 
REDSHIFT - Amazon
REDSHIFT - AmazonREDSHIFT - Amazon
REDSHIFT - Amazon
Douglas Bernardini
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
York University
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
Tung Nguyen
 
Azure basics
Azure basicsAzure basics
Azure basics
Jitendra Soni
 
eHarmony in the Cloud
eHarmony in the CloudeHarmony in the Cloud
eHarmony in the Cloud
Craig Dickson
 
Microservices - Is it time to breakup?
Microservices - Is it time to breakup? Microservices - Is it time to breakup?
Microservices - Is it time to breakup?
Dave Nielsen
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
hansen3032
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
Zohar Elkayam
 
25 snowflake
25 snowflake25 snowflake
25 snowflake
剑飞 陈
 
Cheetah:Data Warehouse on Top of MapReduce
Cheetah:Data Warehouse on Top of MapReduceCheetah:Data Warehouse on Top of MapReduce
Cheetah:Data Warehouse on Top of MapReduce
Tilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
bigdatagurus_meetup
 
Machine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh PoduskaMachine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh Poduska
Data Con LA
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
Satish Mohan
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 

Similar to سکوهای ابری و مدل های برنامه نویسی در ابر (20)

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the move
 
Hadoop
HadoopHadoop
Hadoop
 
REDSHIFT - Amazon
REDSHIFT - AmazonREDSHIFT - Amazon
REDSHIFT - Amazon
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Azure basics
Azure basicsAzure basics
Azure basics
 
eHarmony in the Cloud
eHarmony in the CloudeHarmony in the Cloud
eHarmony in the Cloud
 
Microservices - Is it time to breakup?
Microservices - Is it time to breakup? Microservices - Is it time to breakup?
Microservices - Is it time to breakup?
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
 
25 snowflake
25 snowflake25 snowflake
25 snowflake
 
Cheetah:Data Warehouse on Top of MapReduce
Cheetah:Data Warehouse on Top of MapReduceCheetah:Data Warehouse on Top of MapReduce
Cheetah:Data Warehouse on Top of MapReduce
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Machine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh PoduskaMachine Learning on Distributed Systems by Josh Poduska
Machine Learning on Distributed Systems by Josh Poduska
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 

Recently uploaded

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 

Recently uploaded (20)

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 

سکوهای ابری و مدل های برنامه نویسی در ابر

  • 1. ‫ابر‬‫در‬‫ی‬ ‫نویس‬‫نامه‬‫ر‬‫ب‬‫های‬‫مدل‬‫و‬‫ی‬‫ابر‬‫سکوهای‬ ‫ی‬‫امیر‬ ‫وحید‬ ‫امی‬‫دانشگاه‬ ‫ی‬‫ابر‬‫ایانش‬‫ر‬‫مایشگاه‬‫ز‬‫آ‬‫رکبیر‬ ‫آبان‬1391 ‫ی‬‫ابر‬ ‫ایانش‬‫ر‬ ‫ملی‬ ‫کارگاه‬ ‫اولین‬ 1 Vahid amiri Vahidamiry.ir
  • 2. Anatomy of a Cloud Data Centers Clusters Storage Other Grids/Clouds Virtualization VM Management & Deployment Amazon S3, EC2 OpenNebula, Eucalyptus Web 2.0 Interface Programming API Scripting & Programming Languages Google AppEngine Microsoft Azure Manjrasoft Aneka Google Apps (Gmail, Docs,…) Salesforce.com Public Cloud Private Cloud Infrastructure as a Service Platform as a Service Software as a Service 2
  • 3. The Next Revolution in IT • Cloud Computing • Subscribe • Use • $ - pay for what you use, based on QoS • Classical Computing 3
  • 4. Example cloud-based deployment of an application 4
  • 5. Platform as a Service (PaaS) • Platform as a Service (PaaS) cloud systems provide a software execution environment that application services can run on • The environment is not just a pre-installed operating system but is also integrated with a programming- language-level platform • PaaS clouds’ users don’t need to take care of the resource management or allocation problems such as automatic scaling and load balancing. 5
  • 6. 6 Common PaaS Scenario Executor Scheduler Executor Executor Executor internet internet Programming / Deployment Model public DumbTask: ITask { … public void Execute() { …… } } for(int i=0; i<n; i++) { … DumbTask task = new DumbTask(); app.SubmitExecution(task); }
  • 7. PaaS Providers PaaS provider Programming Environments Infrastructure Google AppEngine Python, Java and Go Google Data Center Azure .Net (Microsoft Visual Studio) Microsoft Data Centers Force.com Apex Programming and Java Saleforce Data Center Heroku Ruby, Java, Python and Scala Amazon EC2 and S3 Hadoop MapReduce Model(Java, Python) Private Cloud- Elastic MapReduce AppScale Java, Python Private Cloud 7
  • 8. • Google App Engine lets you run your web applications on Google's infrastructure • With App Engine, there are no servers to maintain: You just upload your application, and it's ready to serve your users. 8
  • 9. Google AppEngine • Full support for common web technologies • Program in Java, Go, or Python • Automatic scaling, load balancing • Scheduled tasks & queues • Persistent storage • Sandboxing 9
  • 10. Google App Engine Architecture 10
  • 11. storing data: • App Engine Datastore • NOSql Datastore • Google Cloud SQL • RDBMS Based Databases (MySQL) • Google Cloud Storage • provides a storage service for objects and files up to terabytes in size 11
  • 12. App Engine Services • Mail • Memcache • Image Manipulation • Full Text Search API • Google Cloud Storage API • Datastore API • Blobstore API 12
  • 14. PaaS Advantages • Infinite compute resource available on demand • Pay per use basis • Reduced costs due to dynamic resource provisioning • Scalability - No need to plan for peak load • Easy management • Software versioning and upgrading • Elastic • Only use what you need 14
  • 15. Scalability • Energy • Utilization • $$$ Static Solution Cloud based solution Resources 15
  • 16. Risks • Privacy • Who access your data? • Security • How much you trust your provider? • What about recovery, tracing, and data integrity? • Political and legal issues • Who owns the data? • Who uses your personal data? • Government • Where is your data? • Amazon Availability Zones • Lock-in to vendor 16
  • 17. Hadoop Platform • Based on Google papers • The Google File System (2003) • MapReduce: Simplified Data Processing on Large Clusters (2004) • A framework for storing and processing petabytes of data using commodity hardware and storage • Hadoop partitions data and computation across many (thousands of) hosts and executes application computations in parallel, close to their data.
  • 18. Hadoop clusters • Yahoo has ~20,000 machines running Hadoop • The largest clusters are currently 3,000 nodes • Load: 30-50 TB/day
  • 20. Hadoop projects • HDFS: a distributed filesystem that runs on large clusters of commodity machines • MapReduce: a distributed data processing model • HBase: a distributed, column-oriented database • Hive: a distributed data warehouse that manages data stored in HDFS and provides a query language based on SQL • Pig: a data flow language and execution environment for exploring very large datasets
  • 21. Hadoop Characteristics • Commodity hardware + horizontal scaling • Add inexpensive servers • Storage servers and their disks are not assumed to be highly reliable and available • Use replication across servers to deal with unreliable storage/servers • Support for moving computation close to data • Automatic re-execution on failure/distribution • Metadata-data separation: simple design • Storage scales horizontally • Metadata scales vertically (today)
  • 22. Components • Distributed file system: HDFS • Distributed processing framework: MapReduce
  • 23. Hadoop Distributed File System (HDFS)
  • 24. HDFS Architecture • Master-slave architecture • HDFS master: “NameNode” • Manages all filesystem metadata • Maps file names to block lists and block locations • Collects block reports from DataNodes on block locations • Re-replicates missing blocks • Controls read/write access to files • Manages block replication • HDFS slaves: “DataNodes” • Notify the NameNode about the block IDs they hold • Serve read/write requests from clients • Perform replication tasks upon instruction by the NameNode • Rack-aware
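The NameNode bookkeeping listed above (file name to block list, block to locations, block reports, detection of missing replicas) can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Hadoop's implementation: the class `NameNodeSketch` and all of its method names are hypothetical, and the default replication target of 3 is used only as a familiar value.

```python
from collections import defaultdict


class NameNodeSketch:
    """Toy in-memory model of NameNode metadata (not Hadoop's real classes)."""

    def __init__(self, replication=3):
        self.replication = replication
        self.file_blocks = {}                    # file name -> ordered block IDs
        self.block_locations = defaultdict(set)  # block ID -> DataNode names

    def add_file(self, name, block_ids):
        self.file_blocks[name] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """Record which blocks a DataNode says it holds."""
        # A new report replaces whatever this node reported earlier.
        for nodes in self.block_locations.values():
            nodes.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def under_replicated(self):
        """Return {block ID: missing replica count} for blocks below target."""
        missing = {}
        for blocks in self.file_blocks.values():
            for block_id in blocks:
                have = len(self.block_locations[block_id])
                if have < self.replication:
                    missing[block_id] = self.replication - have
        return missing
```

In this sketch the result of `under_replicated()` is what a master would turn into replication instructions for the DataNodes; choosing target nodes (for example by rack) is deliberately left out.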
  • 25. Replica Management • The placement of replicas is critical to HDFS data reliability and read/write performance.
  • 28. MapReduce Model • Developing MapReduce-based applications • Define map and reduce operations • Provide the data • Run the MapReduce engine • The MapReduce library does most of the hard work for us! • Parallelization • Fault tolerance • Data distribution • Load balancing
  • 29. Map and Reduce • Map() • Map workers read in the contents of the corresponding input partition • Processes a key/value pair to generate intermediate key/value pairs • Reduce() • Merges all intermediate values associated with the same key • e.g. <key, [value1, value2, ..., valueN]> • Output of the user's reduce function is written to an output file on the global file system (diagram labels: input data, map & reduce, MapReduce engine, network)
  • 30. MapReduce Example
        Input: "the quick brown fox" | "the fox ate the mouse" | "how now brown cow"
        Map 1 output: the, 1; quick, 1; brown, 1; fox, 1
        Map 2 output: the, 1; fox, 1; the, 1; ate, 1; mouse, 1
        Map 3 output: how, 1; now, 1; brown, 1; cow, 1
        Reduce 1 result: brown, 2; fox, 2; how, 1; now, 1; the, 3
        Reduce 2 result: ate, 1; cow, 1; mouse, 1; quick, 1
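The word-count example on this slide can be reproduced in a few lines of plain Python. This is a single-process sketch of the map, group-by-key, and reduce steps, not Hadoop's API; the function names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are illustrative assumptions.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit (word, 1) for each word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the engine does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, group by word, then reduce each group."""
    intermediate = [pair for line in lines for pair in map_fn(line)]
    return dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    slide_input = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
    print(word_count(slide_input))
```

Only `map_fn` and `reduce_fn` are application code; in a real deployment the grouping step, the parallelism, and the recovery from failures are handled by the MapReduce engine.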
  • 32. MapReduce Components • Master-slave architecture • JobTracker • Accepts jobs submitted by users • Assigns Map and Reduce tasks to TaskTrackers • Makes all scheduling decisions • Schedules tasks on nodes close to data • Monitors task and TaskTracker status, re-executes tasks upon failure • TaskTracker • Asks for new tasks, executes, monitors, reports status • Runs Map and Reduce tasks upon instruction from the JobTracker • Manages storage and transmission of intermediate output
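"Schedules tasks on nodes close to data" can be illustrated with a small greedy assignment. This is a simplified sketch, not the JobTracker's real scheduling algorithm: the function `assign_tasks`, its arguments, and the one-slot-per-tracker assumption are all hypothetical.

```python
def assign_tasks(task_locations, free_trackers):
    """Greedy assignment that prefers a tracker already holding the task's data.

    task_locations: {task ID: set of node names holding its input block}
    free_trackers: list of node names with one free slot each (assumption)
    """
    free = list(free_trackers)
    assignment = {}
    pending = []
    # First pass: data-local assignments.
    for task, nodes in task_locations.items():
        local = next((n for n in free if n in nodes), None)
        if local is not None:
            assignment[task] = local
            free.remove(local)
        else:
            pending.append(task)
    # Second pass: remaining tasks take any free tracker (remote read).
    for task in pending:
        if not free:
            break
        assignment[task] = free.pop(0)
    return assignment
```

Tasks that cannot be placed stay unassigned in this sketch; a real scheduler would retry them when a tracker next asks for work.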
  • 33. HDFS and MapReduce Cluster
  • 35. Private Cloud • Hadoop and Eucalyptus integration • A Hadoop cluster can be built on virtual machines created by Eucalyptus (diagram: Physical Node1 … Physical Node7, each with a hypervisor, under the Infrastructure Manager; VM 1 … VM 27 carry the distributed file system roles DFS-M and DFS-N and the platform manager roles Master and Slave1 …)
  • 36. Case study: evolutionary algorithms • In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation: a generic population-based metaheuristic optimization algorithm • Genetic algorithm • Populations • Fitness function • Mutation • Crossover
  • 39. Job Shop Scheduling Problem
  • 41. Program Model (records as shown in the slide's diagram)
        Initial population (input to Map): 1 | 0 0,020123011; 1 | 0 0,310103022; 1 | 0 0,120321302; 2 | 0 0,310223012; 2 | 0 0,320103012; 2 | 0 0,220321301
        Intermediate data (Map output, input to Reduce): 1 | 12 1,020123011; 1 | 17 1,310103022; 1 | 20 1,120321302; 2 | 21 1,310223012; 2 | 10 1,320103012; 2 | 19 1,220321301
        Next generation (Reduce output): 1 | 0 0,310103022; 1 | 0 0,310103022; 1 | 17 1,120321302; 2 | 0 0,310223012; 2 | 10 1,320103012; 2 | 0 0,220321301
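One way to read this program model is that Map evaluates each individual and Reduce builds the next generation per subpopulation. The sketch below follows that reading under several assumptions that the slides do not confirm: the leading `1`/`2` field is treated as a subpopulation ("island") key, the fitness function is a placeholder rather than a job-shop makespan, and selection, one-point crossover, and point mutation are generic choices. Each island is assumed to hold at least two individuals.

```python
import random
from collections import defaultdict


def fitness(chromosome):
    # Placeholder objective (assumption): a real job-shop run would compute makespan.
    return sum(int(gene) for gene in chromosome)


def map_phase(population):
    """Map: evaluate each (island, chromosome) record independently."""
    return [(island, (fitness(chrom), chrom)) for island, chrom in population]


def reduce_phase(island, scored, rng):
    """Reduce: build one island's next generation from its scored members."""
    scored = sorted(scored)                      # lower fitness is better here
    parents = [chrom for _, chrom in scored[: max(2, len(scored) // 2)]]
    children = []
    while len(children) < len(scored):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))           # one-point crossover
        child = list(a[:cut] + b[cut:])
        if rng.random() < 0.1:                   # occasional point mutation
            child[rng.randrange(len(child))] = str(rng.randrange(4))
        children.append((island, "".join(child)))
    return children


def next_generation(population, seed=0):
    """Run one Map step and one Reduce step per island."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for island, scored in map_phase(population):
        groups[island].append(scored)
    result = []
    for island in sorted(groups):
        result.extend(reduce_phase(island, groups[island], rng))
    return result
```

Because fitness evaluation is independent per individual, the Map step parallelizes trivially; the Reduce step needs all members of one island together, which is why the island key is used for grouping in this sketch.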
  • 44. Setup Cluster • Cores: 7 * 16 = 112 cores • RAM: 7 * 32 GB = 224 GB • Hard disk: 7 * 500 GB
  • 45. System Configuration • Infrastructure management: 1 core + 8 GB RAM • Platform management: 2 cores + 4 GB RAM • Slaves: 48 * 1 core (2 GB RAM) (diagram: same physical node, hypervisor, VM, and DFS layout as on slide 35)
  • 48. Population chart with a fixed value of 500 generations (plot: time per iteration in seconds, 0 to 80, vs. population, 10,000 to 100,000; series: hadoop, hop, haloop)