Cloud Platforms and Programming Models in the Cloud

Vahid Amiri
Cloud Computing Laboratory, Amirkabir University
Aban 1391 (October-November 2012)
First National Workshop on Cloud Computing
Vahidamiry.ir

Anatomy of a Cloud
• Software as a Service: Google Apps (Gmail, Docs,…), Salesforce.com
• Platform as a Service: Web 2.0 Interface, Programming API, Scripting & Programming Languages; Google AppEngine, Microsoft Azure, Manjrasoft Aneka
• Infrastructure as a Service: Virtualization, VM Management & Deployment; Amazon S3, EC2, OpenNebula, Eucalyptus
• Underlying resources: Data Centers, Clusters, Storage, Other Grids/Clouds
• Deployment: Public Cloud, Private Cloud

The Next Revolution in IT
• Cloud Computing
• Subscribe
• Use
• $ - pay for what you use, based on QoS
• Classical Computing

Example cloud-based deployment of an application

Platform as a Service (PaaS)
• Platform as a Service (PaaS) cloud systems provide a
software execution environment that application services
can run on
• The environment is not just a pre-installed operating
system but is also integrated with a programming-
language-level platform
• PaaS users do not have to handle resource management or allocation themselves; the platform takes care of concerns such as automatic scaling and load balancing.

Common PaaS Scenario
[Figure: a Scheduler and several Executors, connected over the internet]
Programming / Deployment Model
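// A task type; the platform runs Execute() on an executor
public class DumbTask : ITask
{
    …
    public void Execute()
    {
        ……
    }
}

// Client side: create n tasks and submit each one for execution
for (int i = 0; i < n; i++)
{
    …
    DumbTask task = new DumbTask();
    app.SubmitExecution(task);
}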
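The submit-and-execute pattern above can be approximated in Python. This is a hedged analogy rather than the API of any particular PaaS: the `DumbTask` class, its `execute` method, the `run_tasks` helper, and the use of `ThreadPoolExecutor` as a stand-in for the scheduler and its executors are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class DumbTask:
    """Hypothetical unit of work; a real platform defines its own task interface."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self):
        # Placeholder computation standing in for the elided task body.
        return self.task_id * self.task_id


def run_tasks(n, workers=4):
    """Submit n independent tasks and collect the results in submission order."""
    # The pool plays the role of the scheduler; its worker threads play the executors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(DumbTask(i).execute) for i in range(n)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_tasks(5))
```

The client code only creates and submits tasks; where and when each task runs is left to the pool, which is the division of labor the scenario describes.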

PaaS Providers
PaaS provider | Programming Environments | Infrastructure
Google AppEngine | Python, Java and Go | Google Data Center
Azure | .Net (Microsoft Visual Studio) | Microsoft Data Centers
Force.com | Apex Programming and Java | Salesforce Data Center
Heroku | Ruby, Java, Python and Scala | Amazon EC2 and S3
Hadoop | MapReduce Model (Java, Python) | Private Cloud - Elastic MapReduce
AppScale | Java, Python | Private Cloud

• Google App Engine lets you run your web applications on
Google's infrastructure
• With App Engine, there are no servers to maintain: You
just upload your application, and it's ready to serve your
users.

Google AppEngine
• Full support for common web technologies
• Program in Java, Go, or Python
• Automatic scaling, load balancing
• Scheduled tasks & queues
• Persistent storage
• Sandboxing

Google App Engine Architecture

Storing data:
• App Engine Datastore: a NoSQL datastore
• Google Cloud SQL: an RDBMS-based database service (MySQL)
• Google Cloud Storage: a storage service for objects and files up to terabytes in size

App Engine Services
• Mail
• Memcache
• Image Manipulation
• Full Text Search API
• Google Cloud Storage API
• Datastore API
• Blobstore API

PaaS Advantages
• Seemingly infinite compute resources available on demand
• Pay-per-use pricing
• Reduced costs due to dynamic resource provisioning
• Scalability: no need to plan for peak load
• Easy management
• Software versioning and upgrading
• Elasticity
• Use only what you need
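The cost argument behind "no need to plan for peak load" can be made concrete with a little arithmetic. The sketch below compares a static deployment sized for the peak hour with an elastic one that follows demand. The `hourly_demand` figures are hypothetical, not measured data, and the function names are introduced here only for illustration.

```python
def static_server_hours(demand):
    """Server-hours when capacity is fixed at the peak demand for every hour."""
    return max(demand) * len(demand)


def elastic_server_hours(demand):
    """Server-hours when capacity follows demand hour by hour."""
    return sum(demand)


def utilization(demand):
    """Fraction of the statically provisioned capacity that is actually used."""
    return sum(demand) / static_server_hours(demand)


# Hypothetical demand (servers needed in each hour); not measured data.
hourly_demand = [2, 2, 3, 10, 4, 3]

if __name__ == "__main__":
    print(static_server_hours(hourly_demand))
    print(elastic_server_hours(hourly_demand))
    print(round(utilization(hourly_demand), 2))
```

With a single short peak, most of the statically provisioned capacity sits idle, which is the gap that pay-per-use pricing is meant to close.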

Scalability
[Figure: resources under a static solution versus a cloud-based solution, compared on energy, utilization, and cost ($$$)]

Risks
• Privacy
• Who accesses your data?
• Security
• How much do you trust your provider?
• What about recovery, tracing, and data integrity?
• Political and legal issues
• Who owns the data?
• Who uses your personal data?
• Government
• Where is your data?
• Amazon Availability Zones
• Vendor lock-in

Hadoop Platform
• Google papers
• The Google File System - 2003
• MapReduce: Simplified Data Processing on Large Clusters - 2004
• A framework for storing and processing petabytes of data using commodity hardware and storage
• Hadoop partitions data and computation across many (thousands of) hosts and executes application computations in parallel close to their data.

Hadoop clusters
• Yahoo has ~20,000 machines running Hadoop
• The largest clusters are currently 3000 nodes
• Load: 30-50 TB/day

Hadoop projects
• HDFS : A distributed filesystem that runs on large clusters of
commodity machines
• MapReduce : A distributed data processing model
• HBase : A distributed, column-oriented database.
• Hive : A distributed data warehouse. Hive manages data
stored in HDFS and provides a query language based on SQL
• Pig : A data flow language and execution environment for
exploring very large datasets

Hadoop Characteristics
• Commodity HW + Horizontal scaling
• Add inexpensive servers
• Storage servers and their disks are not assumed to be highly reliable and available
• Use replication across servers to deal with unreliable storage/servers
• Support for moving computation close to data
• Automatic re-execution on failure/distribution
• Metadata-data separation - simple design
• Storage scales horizontally
• Metadata scales vertically (today)

Components
• Distributed File System
• HDFS
• Distributed Processing Framework
• Map/Reduce

Hadoop Distributed File System- HDFS

HDFS Architecture
• Master-Slave Architecture
• HDFS Master “NameNode”
• Manages all filesystem metadata
• Maps each file name to its list of blocks and their locations
• Collects block reports from DataNodes on block locations
• Re-replicates missing blocks
• Controls read/write access to files
• Manages block replication
• HDFS Slaves “DataNodes”
• Notify the NameNode of the block IDs they hold
• Serve read/write requests from clients
• Perform replication tasks upon instruction by the NameNode
• Rack-aware
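The NameNode's bookkeeping can be sketched as two in-memory maps. The class below is a toy model, not Hadoop's implementation: the method names (`add_file`, `block_report`, `datanode_lost`, `locate`, `under_replicated`) and the block and node names in the tests are assumptions made for illustration.

```python
from collections import defaultdict


class NameNode:
    """Toy metadata server: it tracks where blocks live and never stores file data."""

    def __init__(self, replication=3):
        self.replication = replication
        self.file_blocks = {}                     # file name -> ordered list of block IDs
        self.block_locations = defaultdict(set)   # block ID -> DataNodes holding a replica

    def add_file(self, name, block_ids):
        self.file_blocks[name] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """A DataNode reports the full set of block IDs it currently stores."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def datanode_lost(self, datanode):
        """Forget every replica held by a DataNode that stopped reporting."""
        self.block_report(datanode, [])

    def locate(self, name):
        """Return (block ID, sorted holders) pairs that a client would read from."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[name]]

    def under_replicated(self):
        """Blocks of known files with fewer replicas than the target."""
        return sorted(
            b
            for blocks in self.file_blocks.values()
            for b in blocks
            if len(self.block_locations[b]) < self.replication
        )
```

When a DataNode disappears, the blocks returned by `under_replicated` are the ones the NameNode would instruct the remaining DataNodes to copy.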

Replica Management
• The placement of replicas is critical to HDFS data reliability and read/write performance.
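The slide does not spell out a placement policy. As an illustration, the sketch below follows the commonly documented HDFS default for three replicas: the first on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The `place_replicas` function and the rack and node names are hypothetical.

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick three DataNodes for one block.

    topology maps rack name -> list of DataNode names (hypothetical layout).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer  # replica 1: the writer's own node
    remote_racks = [
        rack for rack in topology
        if rack != rack_of[first] and len(topology[rack]) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")
    remote = rng.choice(remote_racks)
    # Replicas 2 and 3: two distinct nodes on one remote rack.
    second, third = rng.sample(topology[remote], 2)
    return [first, second, third]
```

Spreading replicas over two racks survives the loss of a whole rack, while keeping two of the three copies on one rack limits cross-rack write traffic.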

MapReduce Model
• Developing MapReduce based Applications
• Define map and reduce operations
• Provide the data
• Run the MapReduce engine
• MapReduce library does most of the hard work for us!
• Parallelization
• Fault Tolerance
• Data Distribution
• Load Balancing
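The three development steps can be sketched with a single-process stand-in for the engine. Everything the library normally provides (parallelization, fault tolerance, data distribution, load balancing) is deliberately left out. The `run_mapreduce` function and the example job (longest word per first letter) are assumptions made for illustration, not part of Hadoop.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the engine: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):      # map phase
            groups[key].append(value)          # shuffle: collect the values of each key
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


# Step 1: define the map and reduce operations (hypothetical job).
def map_first_letter(word):
    yield word[0], word


def reduce_longest(key, words):
    return max(words, key=len)


# Step 2: provide the data.
words = ["map", "reduce", "master", "run", "mapreduce"]

# Step 3: run the engine.
if __name__ == "__main__":
    print(run_mapreduce(words, map_first_letter, reduce_longest))
```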

Map and Reduce
• Map()
• Map workers read in contents of corresponding input partition
• Process a key/value pair to generate intermediate key/value pairs
• Reduce()
• Merge all intermediate values associated with the same key
• e.g. <key, [value1, value2,..., valueN]>
• Output of the user's reduce function is written to an output file on the global file system
[Figure: input data and the map & reduce functions feed the MapReduce engine, which drives the Map & Reduce network]

MapReduce Example
Input (three splits):
• the quick brown fox
• the fox ate the mouse
• how now brown cow
Map output (one Map task per split):
• the, 1; quick, 1; brown, 1; fox, 1
• the, 1; fox, 1; the, 1; ate, 1; mouse, 1
• how, 1; now, 1; brown, 1; cow, 1
Reduce output (result, two Reduce tasks):
• brown, 2; fox, 2; how, 1; now, 1; the, 3
• ate, 1; cow, 1; mouse, 1; quick, 1
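The word count example can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: it runs in one process and uses a single reducer, whereas the slide partitions the keys across two Reduce tasks. The function names are introduced here for illustration.

```python
from collections import Counter
from itertools import chain


def map_words(line):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce: sum the 1s emitted for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


splits = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
mapped = [map_words(line) for line in splits]           # one map task per split
word_counts = reduce_counts(chain.from_iterable(mapped))

if __name__ == "__main__":
    for word in sorted(word_counts):
        print(word, word_counts[word])
```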

MapReduce Components
• Master-Slave architecture
• JobTracker
• Accepts jobs submitted by users
• Assigns Map and Reduce tasks to TaskTrackers
• Makes all scheduling decisions
• Schedules tasks on nodes close to data
• Monitors task and TaskTracker status, re-executes tasks upon failure
• TaskTracker
• Asks for new tasks, executes, monitors, reports status
• Runs Map and Reduce tasks upon instruction from the JobTracker
• Manages storage and transmission of intermediate output
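Two of the JobTracker's duties, scheduling tasks close to their data and re-executing failed tasks, can be sketched as follows. This is a simplified greedy model, not Hadoop's scheduler: `assign_tasks`, `run_with_retries`, and the block and host names are hypothetical, and each host is assumed to have one free slot.

```python
def assign_tasks(block_hosts, free_trackers):
    """Assign one map task per block, preferring a TaskTracker that holds the block.

    block_hosts: block ID -> set of hosts storing that block (hypothetical layout).
    free_trackers: hosts with a free slot, one task per host in this sketch.
    Returns block ID -> chosen host; blocks left out must wait for a free slot.
    """
    free = list(free_trackers)
    assignment = {}
    # First pass: data-local assignments.
    for block, hosts in block_hosts.items():
        local = [host for host in free if host in hosts]
        if local:
            assignment[block] = local[0]
            free.remove(local[0])
    # Second pass: remaining blocks go to any free host and read their data remotely.
    for block in block_hosts:
        if block not in assignment and free:
            assignment[block] = free.pop(0)
    return assignment


def run_with_retries(task, hosts, max_attempts=3):
    """Re-execute a failed task on the next host in the list."""
    for attempt in range(max_attempts):
        host = hosts[attempt % len(hosts)]
        try:
            return task(host)
        except RuntimeError:
            continue  # treat the attempt as failed and try another host
    raise RuntimeError("task failed on every attempt")
```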

Private Cloud
• Hadoop and Eucalyptus integration
• A Hadoop cluster can be built from virtual machines created by Eucalyptus
[Figure: physical nodes 1 to 7, each running a hypervisor under the Infrastructure Manager, host VM 1 to VM 27; the Distributed File System / Platform Manager layer runs a Master (DFS-M) and Slaves 1 to 9 … (DFS-N) on these VMs]

Case study - Evolutionary algorithms
• In artificial intelligence, an evolutionary algorithm (EA) is
a subset of evolutionary computation, a generic
population-based metaheuristic optimization algorithm:
• Genetic algorithm
• Populations
• Fitness Function
• Mutation
• Crossover
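The four ingredients listed above fit together as in the minimal genetic algorithm below. It is a generic sketch, not the algorithm used in the case study: the OneMax fitness function, truncation selection, the parameter defaults, and all function names are assumptions chosen to keep the example short.

```python
import random


def fitness(chromosome):
    """Toy fitness function (OneMax): the number of 1 genes. Higher is better."""
    return sum(chromosome)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def evolve(pop_size=20, length=16, generations=30, rate=0.02, seed=0):
    """Evolve a population of bit lists and return the fittest individual."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]            # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children                  # parents survive unchanged
    return max(population, key=fitness)
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.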

MapReduce Model
[Figure: initial population, Map, intermediate data, Reduce]

Job Shop Scheduling Problem

Program Model
Initial population (input to Map):
1 | 0 0,020123011
1 | 0 0,310103022
1 | 0 0,120321302
2 | 0 0,310223012
2 | 0 0,320103012
2 | 0 0,220321301
Intermediate data (output of Map):
1 | 12 1,020123011
1 | 17 1,310103022
1 | 20 1,120321302
2 | 21 1,310223012
2 | 10 1,320103012
2 | 19 1,220321301
Next generation (output of Reduce):
1 | 0 0,310103022
1 | 0 0,310103022
1 | 17 1,120321302
2 | 0 0,310223012
2 | 10 1,320103012
2 | 0 0,220321301
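One generation of this program model can be sketched in Python. The record layout is read here as `key | fitness flag,chromosome`, where the flag marks whether the fitness has been computed; that reading is an assumption, as is everything else in the block. The slides do not give the job shop fitness function, so `placeholder_fitness` is a stand-in (higher is treated as better), and the breeding rule in `reduce_breed` is one simple choice among many.

```python
import random
from collections import defaultdict


def parse(line):
    """Split 'key | fitness flag,chromosome' into its four fields (assumed layout)."""
    key, rest = line.split(" | ")
    head, chromosome = rest.split(",")
    fitness, flag = head.split()
    return key, int(fitness), int(flag), chromosome


def placeholder_fitness(chromosome):
    # Hypothetical stand-in; the slides do not give the job shop fitness function.
    return sum(int(gene) for gene in chromosome)


def map_evaluate(line):
    """Map: score an unevaluated individual and mark it as evaluated."""
    key, fitness, flag, chromosome = parse(line)
    if flag == 0:
        fitness, flag = placeholder_fitness(chromosome), 1
    return key, (fitness, flag, chromosome)


def reduce_breed(key, individuals, rng):
    """Reduce: keep the best individual, replace the rest with unevaluated offspring."""
    ranked = sorted(individuals, reverse=True)
    best = ranked[0]
    lines = [f"{key} | {best[0]} 1,{best[2]}"]
    for other in ranked[1:]:
        point = rng.randrange(1, len(best[2]))
        child = best[2][:point] + other[2][point:]       # single-point crossover
        lines.append(f"{key} | 0 0,{child}")
    return lines


def next_generation(lines, seed=0):
    """Run one map/reduce round over the population, grouped by key."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for key, value in map(map_evaluate, lines):
        groups[key].append(value)
    return [out for key in sorted(groups) for out in reduce_breed(key, groups[key], rng)]
```

Feeding the output of `next_generation` back in as the next input gives the iterative loop: new offspring arrive with flag 0, so the following Map phase evaluates only them.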

Setup Cluster
• Cores: 7 * 16 = 112 cores
• RAM: 7 * 32 G = 224 G
• Hard disk: 7 * 500 G = 3500 G

System Configuration
• Infrastructure Management: 1 core + 8 G RAM
• Platform Management: 2 cores + 4 G RAM
• Slaves: 48 * 1 core (2 G RAM)
[Figure: physical nodes 1 to 7, each running a hypervisor under the Infrastructure Manager, host VM 1 to VM 27; the Distributed File System / Platform Manager layer runs a Master (DFS-M) and Slaves 1 to 9 … (DFS-N) on these VMs]

Improved model

[Figure: time per iteration (in seconds) versus population size (10000 to 100000), with the number of generations fixed at 500; series: Hadoop, HOP, HaLoop]

[Figure: time (in seconds) versus number of generations (500 to 4000), with the population fixed at 50000; series: Hadoop, HOP, HaLoop]

[Figure: effect of the number of reducers (4 to 128) on time per iteration (in seconds), with the population fixed at 20000; series: Hadoop, HOP, HaLoop]
