SlideShare a Scribd company logo
1 of 44
Cloud Computing Lecture #1
What is Cloud Computing?
(and an intro to parallel/distributed processing)
Jimmy Lin
The iSchool
University of Maryland
Wednesday, September 3, 2008
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States
See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet,
Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License)
Source: http://www.free-pictures-photos.com/
The iSchool
University of Maryland
What is Cloud Computing?
1. Web-scale problems
2. Large data centers
3. Different models of computing
4. Highly-interactive Web applications
The iSchool
University of Maryland
1. Web-Scale Problems
 Characteristics:
 Definitely data-intensive
 May also be processing intensive
 Examples:
 Crawling, indexing, searching, mining the Web
 “Post-genomics” life sciences research
 Other scientific data (physics, astronomers, etc.)
 Sensor networks
 Web 2.0 applications
 …
The iSchool
University of Maryland
How much data?
 Wayback Machine has 2 PB + 20 TB/month (2006)
 Google processes 20 PB a day (2008)
 “all words ever spoken by human beings” ~ 5 EB
 NOAA has ~1 PB climate data (2007)
 CERN’s LHC will generate 15 PB a year (2008)
640K ought to be
enough for anybody.
Maximilien Brice, © CERN
Maximilien Brice, © CERN
The iSchool
University of Maryland
There’s nothing like more data!
s/inspiration/data/g;
(Banko and Brill, ACL 2001)
(Brants et al., EMNLP 2007)
The iSchool
University of Maryland
What to do with more data?
 Answering factoid questions
 Pattern matching on the Web
 Works amazingly well
 Learning relations
 Start with seed instances
 Search for patterns on the Web
 Using patterns to find more instances
Who shot Abraham Lincoln? → X shot Abraham Lincoln
Birthday-of(Mozart, 1756)
Birthday-of(Einstein, 1879)
Wolfgang Amadeus Mozart (1756 - 1791)
Einstein was born in 1879
PERSON (DATE –
PERSON was born in DATE
(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
The iSchool
University of Maryland
2. Large Data Centers
 Web-scale problems? Throw more machines at it!
 Clear trend: centralization of computing resources in large
data centers
 Necessary ingredients: fiber, juice, and space
 What do Oregon, Iceland, and abandoned mines have in
common?
 Important Issues:
 Redundancy
 Efficiency
 Utilization
 Management
Source: Harper’s (Feb, 2008)
Maximilien Brice, © CERN
The iSchool
University of Maryland
Key Technology: Virtualization
Hardware
Operating System
App App App
Traditional Stack
Hardware
OS
App App App
Hypervisor
OS OS
Virtualized Stack
The iSchool
University of Maryland
3. Different Computing Models
 Utility computing
 Why buy machines when you can rent cycles?
 Examples: Amazon’s EC2, GoGrid, AppNexus
 Platform as a Service (PaaS)
 Give me nice API and take care of the implementation
 Example: Google App Engine
 Software as a Service (SaaS)
 Just run it for me!
 Example: Gmail
“Why do it yourself if you can pay someone to do it for you?”
The iSchool
University of Maryland
4. Web Applications
 A mistake on top of a hack built on sand held together by
duct tape?
 What is the nature of software applications?
 From the desktop to the browser
 SaaS == Web-based applications
 Examples: Google Maps, Facebook
 How do we deliver highly-interactive Web-based
applications?
 AJAX (asynchronous JavaScript and XML)
 For better, or for worse…
The iSchool
University of Maryland
What is the course about?
 MapReduce: the “back-end” of cloud computing
 Batch-oriented processing of large datasets
 Ajax: the “front-end” of cloud computing
 Highly-interactive Web-based applications
 Computing “in the clouds”
 Amazon’s EC2/S3 as an example of utility computing
The iSchool
University of Maryland
Amazon Web Services
 Elastic Compute Cloud (EC2)
 Rent computing resources by the hour
 Basic unit of accounting = instance-hour
 Additional costs for bandwidth
 Simple Storage Service (S3)
 Persistent storage
 Charge by the GB/month
 Additional costs for bandwidth
 You’ll be using EC2/S3 for course assignments!
The iSchool
University of Maryland
This course is not for you…
 If you’re not genuinely interested in the topic
 If you’re not ready to do a lot of programming
 If you’re not open to thinking about computing in new ways
 If you can’t cope with uncertainly, unpredictability, poor
documentation, and immature software
 If you can’t put in the time
Otherwise, this will be a richly rewarding course!
Source: http://davidzinger.wordpress.com/2007/05/page/2/
The iSchool
University of Maryland
Cloud Computing Zen
 Don’t get frustrated (take a deep breath)…
 This is bleeding edge technology
 Those W$*#T@F! moments
 Be patient…
 This is the second first time I’ve taught this course
 Be flexible…
 There will be unanticipated issues along the way
 Be constructive…
 Tell me how I can make everyone’s experience better
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
The iSchool
University of Maryland
Things to go over…
 Course schedule
 Assignments and deliverables
 Amazon EC2/S3
The iSchool
University of Maryland
Web-Scale Problems?
 Don’t hold your breath:
 Biocomputing
 Nanocomputing
 Quantum computing
 …
 It all boils down to…
 Divide-and-conquer
 Throwing more hardware at the problem
Simple to understand… a lifetime to master…
The iSchool
University of Maryland
Divide and Conquer
“Work”
w1 w2 w3
r1 r2 r3
“Result”
“worker” “worker” “worker”
Partition
Combine
The iSchool
University of Maryland
Different Workers
 Different threads in the same core
 Different cores in the same CPU
 Different CPUs in a multi-processor system
 Different machines in a distributed system
The iSchool
University of Maryland
Choices, Choices, Choices
 Commodity vs. “exotic” hardware
 Number of machines vs. processor vs. cores
 Bandwidth of memory vs. disk vs. network
 Different programming models
The iSchool
University of Maryland
Flynn’s Taxonomy
Instructions
Single (SI) Multiple (MI)
Data
Multiple(M
SISD
Single-threaded
process
MISD
Pipeline
architecture
SIMD
Vector Processing
MIMD
Multi-threaded
Programming
Single(SD)
The iSchool
University of Maryland
SISD
D D D D D D D
Processor
Instructions
The iSchool
University of Maryland
SIMD
D0
Processor
Instructions
D0D0 D0 D0 D0
D1
D2
D3
D4
…
Dn
D1
D2
D3
D4
…
Dn
D1
D2
D3
D4
…
Dn
D1
D2
D3
D4
…
Dn
D1
D2
D3
D4
…
Dn
D1
D2
D3
D4
…
Dn
D1
D2
D3
D4
…
Dn
D0
The iSchool
University of Maryland
MIMD
D D D D D D D
Processor
Instructions
D D D D D D D
Processor
Instructions
The iSchool
University of Maryland
Memory Typology: Shared
Memory
Processor
Processor Processor
Processor
The iSchool
University of Maryland
Memory Typology: Distributed
MemoryProcessor MemoryProcessor
MemoryProcessor MemoryProcessor
Network
The iSchool
University of Maryland
Memory Typology: Hybrid
Memory
Processor
Network
Processor
Memory
Processor
Processor
Memory
Processor
Processor
Memory
Processor
Processor
The iSchool
University of Maryland
Parallelization Problems
 How do we assign work units to workers?
 What if we have more work units than workers?
 What if workers need to share partial results?
 How do we aggregate partial results?
 How do we know all the workers have finished?
 What if workers die?
What is the common theme of all of these problems?
The iSchool
University of Maryland
General Theme?
 Parallelization problems arise from:
 Communication between workers
 Access to shared resources (e.g., data)
 Thus, we need a synchronization system!
 This is tricky:
 Finding bugs is hard
 Solving bugs is even harder
The iSchool
University of Maryland
Managing Multiple Workers
 Difficult because
 (Often) don’t know the order in which workers run
 (Often) don’t know where the workers are running
 (Often) don’t know when workers interrupt each other
 Thus, we need:
 Semaphores (lock, unlock)
 Conditional variables (wait, notify, broadcast)
 Barriers
 Still, lots of problems:
 Deadlock, livelock, race conditions, ...
 Moral of the story: be careful!
 Even trickier if the workers are on different machines
The iSchool
University of Maryland
Patterns for Parallelism
 Parallel computing has been around for decades
 Here are some “design patterns” …
The iSchool
University of Maryland
Master/Slaves
slaves
master
The iSchool
University of Maryland
Producer/Consumer Flow
CP
P
P
C
C
CP
P
P
C
C
The iSchool
University of Maryland
Work Queues
CP
P
P
C
C
shared queue
W W W W W
The iSchool
University of Maryland
Rubber Meets Road
 From patterns to implementation:
 pthreads, OpenMP for multi-threaded programming
 MPI for clustering computing
 …
 The reality:
 Lots of one-off solutions, custom code
 Write you own dedicated library, then program with it
 Burden on the programmer to explicitly manage everything
 MapReduce to the rescue!
 (for next time)

More Related Content

Viewers also liked (7)

Chinese food box
Chinese food boxChinese food box
Chinese food box
 
Assembly Information Management System
Assembly Information Management SystemAssembly Information Management System
Assembly Information Management System
 
Toy Boxes.
Toy Boxes.Toy Boxes.
Toy Boxes.
 
Evolution of computers
Evolution of computersEvolution of computers
Evolution of computers
 
New Jersey photos
New Jersey photosNew Jersey photos
New Jersey photos
 
Bookmarks
BookmarksBookmarks
Bookmarks
 
Cathexis therapeutic imagery
Cathexis therapeutic imageryCathexis therapeutic imagery
Cathexis therapeutic imagery
 

Similar to Session1

ccna course 2
ccna course 2ccna course 2
ccna course 2
S Sridhar
 
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data TutorialESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
eswcsummerschool
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
butest
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7
CS, NcState
 

Similar to Session1 (20)

ccna course 2
ccna course 2ccna course 2
ccna course 2
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunami
 
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data TutorialESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
 
The Hidden Web, XML and the Semantic Web: A Scientific Data Management Perspe...
The Hidden Web, XML and the Semantic Web: A Scientific Data Management Perspe...The Hidden Web, XML and the Semantic Web: A Scientific Data Management Perspe...
The Hidden Web, XML and the Semantic Web: A Scientific Data Management Perspe...
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
A New Year in Data Science: ML Unpaused
A New Year in Data Science: ML UnpausedA New Year in Data Science: ML Unpaused
A New Year in Data Science: ML Unpaused
 
ppt
pptppt
ppt
 
Big Data Analytics V2
Big Data Analytics V2Big Data Analytics V2
Big Data Analytics V2
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future Tense
 
Debunking "Purpose-Built Data Systems:": Enter the Universal Database
Debunking "Purpose-Built Data Systems:": Enter the Universal DatabaseDebunking "Purpose-Built Data Systems:": Enter the Universal Database
Debunking "Purpose-Built Data Systems:": Enter the Universal Database
 
Parallel Computing 2007: Overview
Parallel Computing 2007: OverviewParallel Computing 2007: Overview
Parallel Computing 2007: Overview
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
Above the Clouds: A View From Academia
Above the Clouds: A View From AcademiaAbove the Clouds: A View From Academia
Above the Clouds: A View From Academia
 
Industry Vs Curriculum Talk Mec
Industry Vs Curriculum Talk MecIndustry Vs Curriculum Talk Mec
Industry Vs Curriculum Talk Mec
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely heading
 
Ibm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 shortIbm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 short
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
 

Recently uploaded

Goa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service Available
Goa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service AvailableGoa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service Available
Goa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service Available
Call Girls Mumbai
 
Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...
Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...
Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...
Klinik kandungan
 
2024 May - Clearbit Integration with Hubspot - Greenville HUG.pptx
2024 May - Clearbit Integration with Hubspot  - Greenville HUG.pptx2024 May - Clearbit Integration with Hubspot  - Greenville HUG.pptx
2024 May - Clearbit Integration with Hubspot - Greenville HUG.pptx
Boundify
 

Recently uploaded (20)

Home Furnishings Ecommerce Platform Short Pitch 2024
Home Furnishings Ecommerce Platform Short Pitch 2024Home Furnishings Ecommerce Platform Short Pitch 2024
Home Furnishings Ecommerce Platform Short Pitch 2024
 
Goa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service Available
Goa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service AvailableGoa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service Available
Goa Call Girls Just Call 👉📞9731111159 Top Class Call Girl Service Available
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business Potential
 
Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...
Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...
Jual obat aborsi Hongkong ( 085657271886 ) Cytote pil telat bulan penggugur k...
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptx
 
JIND CALL GIRL ❤ 8272964427❤ CALL GIRLS IN JIND ESCORTS SERVICE PROVIDE
JIND CALL GIRL ❤ 8272964427❤ CALL GIRLS IN JIND ESCORTS SERVICE PROVIDEJIND CALL GIRL ❤ 8272964427❤ CALL GIRLS IN JIND ESCORTS SERVICE PROVIDE
JIND CALL GIRL ❤ 8272964427❤ CALL GIRLS IN JIND ESCORTS SERVICE PROVIDE
 
Berhampur Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Service Avai...
Berhampur Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Service Avai...Berhampur Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Service Avai...
Berhampur Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Service Avai...
 
Pimpri Chinchwad Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Servi...
Pimpri Chinchwad Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Servi...Pimpri Chinchwad Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Servi...
Pimpri Chinchwad Call Girl Just Call♥️ 8084732287 ♥️Top Class Call Girl Servi...
 
MEHSANA 💋 Call Girl 9827461493 Call Girls in Escort service book now
MEHSANA 💋 Call Girl 9827461493 Call Girls in  Escort service book nowMEHSANA 💋 Call Girl 9827461493 Call Girls in  Escort service book now
MEHSANA 💋 Call Girl 9827461493 Call Girls in Escort service book now
 
Falcon Invoice Discounting: The best investment platform in india for investors
Falcon Invoice Discounting: The best investment platform in india for investorsFalcon Invoice Discounting: The best investment platform in india for investors
Falcon Invoice Discounting: The best investment platform in india for investors
 
2024 May - Clearbit Integration with Hubspot - Greenville HUG.pptx
2024 May - Clearbit Integration with Hubspot  - Greenville HUG.pptx2024 May - Clearbit Integration with Hubspot  - Greenville HUG.pptx
2024 May - Clearbit Integration with Hubspot - Greenville HUG.pptx
 
Managerial Accounting 5th Edition by Stacey Whitecotton test bank.docx
Managerial Accounting 5th Edition by Stacey Whitecotton test bank.docxManagerial Accounting 5th Edition by Stacey Whitecotton test bank.docx
Managerial Accounting 5th Edition by Stacey Whitecotton test bank.docx
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
 
Moradia Isolada com Logradouro; Detached house with patio in Penacova
Moradia Isolada com Logradouro; Detached house with patio in PenacovaMoradia Isolada com Logradouro; Detached house with patio in Penacova
Moradia Isolada com Logradouro; Detached house with patio in Penacova
 
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
 
Top Quality adbb 5cl-a-d-b Best precursor raw material
Top Quality adbb 5cl-a-d-b Best precursor raw materialTop Quality adbb 5cl-a-d-b Best precursor raw material
Top Quality adbb 5cl-a-d-b Best precursor raw material
 
Buy gmail accounts.pdf buy Old Gmail Accounts
Buy gmail accounts.pdf buy Old Gmail AccountsBuy gmail accounts.pdf buy Old Gmail Accounts
Buy gmail accounts.pdf buy Old Gmail Accounts
 
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptxQSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
 
Ital Liptz - all about Itai Liptz. news.
Ital Liptz - all about Itai Liptz. news.Ital Liptz - all about Itai Liptz. news.
Ital Liptz - all about Itai Liptz. news.
 
Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...
Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...
Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...
 

Session1

  • 1. Cloud Computing Lecture #1 What is Cloud Computing? (and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Wednesday, September 3, 2008 This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License)
  • 3. The iSchool University of Maryland What is Cloud Computing? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications
  • 4. The iSchool University of Maryland 1. Web-Scale Problems  Characteristics:  Definitely data-intensive  May also be processing intensive  Examples:  Crawling, indexing, searching, mining the Web  “Post-genomics” life sciences research  Other scientific data (physics, astronomers, etc.)  Sensor networks  Web 2.0 applications  …
  • 5. The iSchool University of Maryland How much data?  Wayback Machine has 2 PB + 20 TB/month (2006)  Google processes 20 PB a day (2008)  “all words ever spoken by human beings” ~ 5 EB  NOAA has ~1 PB climate data (2007)  CERN’s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody.
  • 8. The iSchool University of Maryland There’s nothing like more data! s/inspiration/data/g; (Banko and Brill, ACL 2001) (Brants et al., EMNLP 2007)
  • 9. The iSchool University of Maryland What to do with more data?  Answering factoid questions  Pattern matching on the Web  Works amazingly well  Learning relations  Start with seed instances  Search for patterns on the Web  Using patterns to find more instances Who shot Abraham Lincoln? → X shot Abraham Lincoln Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) Wolfgang Amadeus Mozart (1756 - 1791) Einstein was born in 1879 PERSON (DATE – PERSON was born in DATE (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
  • 10. The iSchool University of Maryland 2. Large Data Centers  Web-scale problems? Throw more machines at it!  Clear trend: centralization of computing resources in large data centers  Necessary ingredients: fiber, juice, and space  What do Oregon, Iceland, and abandoned mines have in common?  Important Issues:  Redundancy  Efficiency  Utilization  Management
  • 13. The iSchool University of Maryland Key Technology: Virtualization Hardware Operating System App App App Traditional Stack Hardware OS App App App Hypervisor OS OS Virtualized Stack
  • 14. The iSchool University of Maryland 3. Different Computing Models  Utility computing  Why buy machines when you can rent cycles?  Examples: Amazon’s EC2, GoGrid, AppNexus  Platform as a Service (PaaS)  Give me nice API and take care of the implementation  Example: Google App Engine  Software as a Service (SaaS)  Just run it for me!  Example: Gmail “Why do it yourself if you can pay someone to do it for you?”
  • 15. The iSchool University of Maryland 4. Web Applications  A mistake on top of a hack built on sand held together by duct tape?  What is the nature of software applications?  From the desktop to the browser  SaaS == Web-based applications  Examples: Google Maps, Facebook  How do we deliver highly-interactive Web-based applications?  AJAX (asynchronous JavaScript and XML)  For better, or for worse…
  • 16. The iSchool University of Maryland What is the course about?  MapReduce: the “back-end” of cloud computing  Batch-oriented processing of large datasets  Ajax: the “front-end” of cloud computing  Highly-interactive Web-based applications  Computing “in the clouds”  Amazon’s EC2/S3 as an example of utility computing
  • 17. The iSchool University of Maryland Amazon Web Services  Elastic Compute Cloud (EC2)  Rent computing resources by the hour  Basic unit of accounting = instance-hour  Additional costs for bandwidth  Simple Storage Service (S3)  Persistent storage  Charge by the GB/month  Additional costs for bandwidth  You’ll be using EC2/S3 for course assignments!
  • 18. The iSchool University of Maryland This course is not for you…  If you’re not genuinely interested in the topic  If you’re not ready to do a lot of programming  If you’re not open to thinking about computing in new ways  If you can’t cope with uncertainly, unpredictability, poor documentation, and immature software  If you can’t put in the time Otherwise, this will be a richly rewarding course!
  • 20. The iSchool University of Maryland Cloud Computing Zen  Don’t get frustrated (take a deep breath)…  This is bleeding edge technology  Those W$*#T@F! moments  Be patient…  This is the second first time I’ve taught this course  Be flexible…  There will be unanticipated issues along the way  Be constructive…  Tell me how I can make everyone’s experience better
  • 25. The iSchool University of Maryland Things to go over…  Course schedule  Assignments and deliverables  Amazon EC2/S3
  • 26. The iSchool University of Maryland Web-Scale Problems?  Don’t hold your breath:  Biocomputing  Nanocomputing  Quantum computing  …  It all boils down to…  Divide-and-conquer  Throwing more hardware at the problem Simple to understand… a lifetime to master…
  • 27. The iSchool University of Maryland Divide and Conquer “Work” w1 w2 w3 r1 r2 r3 “Result” “worker” “worker” “worker” Partition Combine
  • 28. The iSchool University of Maryland Different Workers  Different threads in the same core  Different cores in the same CPU  Different CPUs in a multi-processor system  Different machines in a distributed system
  • 29. The iSchool University of Maryland Choices, Choices, Choices  Commodity vs. “exotic” hardware  Number of machines vs. processor vs. cores  Bandwidth of memory vs. disk vs. network  Different programming models
  • 30. The iSchool University of Maryland Flynn’s Taxonomy Instructions Single (SI) Multiple (MI) Data Multiple(M SISD Single-threaded process MISD Pipeline architecture SIMD Vector Processing MIMD Multi-threaded Programming Single(SD)
  • 31. The iSchool University of Maryland SISD D D D D D D D Processor Instructions
  • 32. The iSchool University of Maryland SIMD D0 Processor Instructions D0D0 D0 D0 D0 D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D0
  • 33. The iSchool University of Maryland MIMD D D D D D D D Processor Instructions D D D D D D D Processor Instructions
  • 34. The iSchool University of Maryland Memory Typology: Shared Memory Processor Processor Processor Processor
  • 35. The iSchool University of Maryland Memory Typology: Distributed MemoryProcessor MemoryProcessor MemoryProcessor MemoryProcessor Network
  • 36. The iSchool University of Maryland Memory Typology: Hybrid Memory Processor Network Processor Memory Processor Processor Memory Processor Processor Memory Processor Processor
  • 37. The iSchool University of Maryland Parallelization Problems  How do we assign work units to workers?  What if we have more work units than workers?  What if workers need to share partial results?  How do we aggregate partial results?  How do we know all the workers have finished?  What if workers die? What is the common theme of all of these problems?
  • 38. The iSchool University of Maryland General Theme?  Parallelization problems arise from:  Communication between workers  Access to shared resources (e.g., data)  Thus, we need a synchronization system!  This is tricky:  Finding bugs is hard  Solving bugs is even harder
  • 39. The iSchool University of Maryland Managing Multiple Workers  Difficult because  (Often) don’t know the order in which workers run  (Often) don’t know where the workers are running  (Often) don’t know when workers interrupt each other  Thus, we need:  Semaphores (lock, unlock)  Conditional variables (wait, notify, broadcast)  Barriers  Still, lots of problems:  Deadlock, livelock, race conditions, ...  Moral of the story: be careful!  Even trickier if the workers are on different machines
  • 40. The iSchool University of Maryland Patterns for Parallelism  Parallel computing has been around for decades  Here are some “design patterns” …
  • 41. The iSchool University of Maryland Master/Slaves slaves master
  • 42. The iSchool University of Maryland Producer/Consumer Flow CP P P C C CP P P C C
  • 43. The iSchool University of Maryland Work Queues CP P P C C shared queue W W W W W
  • 44. The iSchool University of Maryland Rubber Meets Road  From patterns to implementation:  pthreads, OpenMP for multi-threaded programming  MPI for clustering computing  …  The reality:  Lots of one-off solutions, custom code  Write you own dedicated library, then program with it  Burden on the programmer to explicitly manage everything  MapReduce to the rescue!  (for next time)