CPSC433/533: Computer Networks
  • 1. CS435/535: Internet-Scale Applications http://zoo.cs.yale.edu/classes/cs435/ 1/12/2009
  • 2. Outline
    • What are Internet-scale applications?
    • Administration
  • 3. Outline
    • What are Internet-scale applications?
  • 4. Internet-Scale: Large Network
  • 5. Growth of the Internet in Terms of Number of Hosts
    • Number of Hosts on the Internet:
    • Aug. 1981 213
    • Oct. 1984 1,024
    • Dec. 1987 28,174
    • Oct. 1990 313,000
    • Jul. 1993 1,776,000
    • Jul. 1996 19,540,000
    • Jul. 1999 56,218,000
    • Jul. 2004 285,139,000
    • Jul. 2005 353,284,000
    • Jul. 2007 489,774,000
    • Jul. 2008 570,937,000
    • Jul. 2009 681,064,000
    (Figure: CAIDA router-level view)
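The host counts above grow roughly exponentially, which is easier to see as a compound annual growth rate and a doubling time. A minimal Python sketch using only the slide's numbers; treating each sample as taken at the start of its month is an assumption, since the slide gives only month and year:

```python
import math

# (year, month, hosts) as listed on the slide above
HOSTS = [
    (1981, 8, 213),
    (1984, 10, 1_024),
    (1987, 12, 28_174),
    (1990, 10, 313_000),
    (1993, 7, 1_776_000),
    (1996, 7, 19_540_000),
    (1999, 7, 56_218_000),
    (2004, 7, 285_139_000),
    (2005, 7, 353_284_000),
    (2007, 7, 489_774_000),
    (2008, 7, 570_937_000),
    (2009, 7, 681_064_000),
]


def years_between(a, b):
    """Elapsed time in years between two (year, month, hosts) samples."""
    return (b[0] + (b[1] - 1) / 12) - (a[0] + (a[1] - 1) / 12)


def annual_growth(a, b):
    """Compound annual growth rate of the host count from sample a to sample b."""
    return (b[2] / a[2]) ** (1 / years_between(a, b)) - 1


def doubling_time(rate):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log1p(rate)


if __name__ == "__main__":
    for a, b in zip(HOSTS, HOSTS[1:]):
        r = annual_growth(a, b)
        print(f"{a[0]}-{b[0]}: {r:7.1%} per year, doubling every {doubling_time(r):.1f} years")
```

Comparing intervals this way shows how much faster the host count grew before 2000 than after.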
  • 6. Internet Physical Infrastructure
    • Residential access
      • Cable
      • Fiber
      • DSL
      • Wireless
    • Campus access, e.g.,
      • Ethernet
      • Wireless
    • The Internet is a network of heterogeneous networks
    • Each individually administered network is called an Autonomous System (AS)
    (Figure: a backbone ISP interconnecting several ISPs)
  • 7. Abilene I2 Backbone http://weathermap.grnoc.iu.edu/abilene_jpg.html
  • 8. Qwest Backbone Map http://www.qwest.com/largebusiness/enterprisesolutions/networkMaps/preloader.swf
  • 9. AT&T Global Backbone IP Network (from http://www.business.att.com)
  • 10. AT&T USA Backbone Map (from the AT&T web site)
  • 11. Internet Diameter
  • 12. Internet-Scale: Large User Base
  • 13. User Base of Large Internet Applications in U.S. (June 2009)
    Source: comScore Media Metrix (http://ir.comscore.com/releasedetail.cfm?releaseid=398136)
    Total U.S. - Home, Work and University Locations
    Rank  Unique Visitors (000)  Property
    ----  ---------------------  --------
                        193,896  Total Internet: Total Audience
       1                156,871  Google Sites
       2                154,097  Yahoo! Sites
       3                127,454  Microsoft Sites
       4                106,467  AOL LLC
       5                 84,567  Fox Interactive Media
       6                 77,031  FACEBOOK.COM
       7                 73,041  Ask Network
       8                 71,020  eBay
       9                 63,178  Amazon Sites
      10                 60,692  Wikimedia Foundation Sites
      11                 56,554  Apple Inc.
      12                 54,223  Glam Media
      13                 51,575  Viacom Digital
      14                 50,841  Turner Network
      15                 50,341  CBS Interactive
      16                 46,832  craigslist, inc.
      17                 44,789  New York Times Digital
      18                 41,751  Weather Channel, The
      19                 38,120  Adobe Sites
      20                 34,865  Comcast Corporation
      21                 33,436  Verizon Communications Corporation
      22                 33,358  Wal-Mart
      23                 31,582  AT&T Interactive Network
      24                 31,362  Disney Online
      25                 28,938  Demand Media
      26                 28,367  Superpages.com Network
      27                 27,058  Expedia Inc
      28                 26,964  The Mozilla Organization
      29                 26,284  Target Corporation
      30                 26,245  WordPress
      31                 26,163  Answers.com Sites
      32                 25,479  Bank of America
      33                 24,528  Photobucket.com LLC
      34                 24,032  AT&T, Inc.
      35                 24,022  Gorilla Nation
      36                 22,828  United Online, Inc
      37                 22,374  Everyday Health
      38                 22,334  Break Media
      39                 21,704  CareerBuilder LLC
      40                 21,202  NBC Universal
      41                 20,984  ESPN
      42                 20,635  NetShelter Technology Media
      43                 20,594  iVillage.com: The Womens Network
      44                 20,465  Weatherbug Property
      45                 20,211  JPMorgan Chase Property
      46                 20,111  TWITTER.COM*
      47                 19,918  Real.com Network
      48                 19,607  EA Online
      49                 19,298  Gannett Sites
      50                 19,293  Time Warner - Excluding AOL
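Reach, the share of the total U.S. audience that visited a property, is a simple ratio against the "Total Audience" row. A small Python sketch; the handful of properties chosen here is arbitrary:

```python
# Unique visitors in thousands, June 2009, copied from the comScore table above
TOTAL_AUDIENCE = 193_896
PROPERTIES = {
    "Google Sites": 156_871,
    "Yahoo! Sites": 154_097,
    "Microsoft Sites": 127_454,
    "FACEBOOK.COM": 77_031,
    "TWITTER.COM*": 20_111,
}


def reach(unique_visitors, total=TOTAL_AUDIENCE):
    """Fraction of the total U.S. Internet audience that visited a property."""
    return unique_visitors / total


if __name__ == "__main__":
    for name, visitors in PROPERTIES.items():
        print(f"{name:16} {reach(visitors):6.1%}")
    # The same person can visit many properties, so reach figures are not additive.
    top_three = sum(reach(v) for v in list(PROPERTIES.values())[:3])
    print(f"sum of top three: {top_three:.0%}")
```

The top three properties alone sum to well over 100%, a reminder that these audiences overlap heavily.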
  • 14. Internet-Scale: Can Be Data/Processing Intensive
  • 15. How Much Data? (1 PB = 1000 TB; 1 EB = 1000 PB)
  • 16. How Much Data?
    • Wayback Machine has 2 PB + 20 TB/month (2006)
    • NOAA has ~1 PB climate data (2007)
    • Google processes 20 PB a day (2008)
    • Internet traffic 5-8 EB (Dec. 2008)
    • Size of World’s digital content 500 EB (May 2009)
    “640K ought to be enough for anybody.”
    (1 PB = 1000 TB; 1 EB = 1000 PB; see http://en.wikipedia.org/wiki/Exabyte)
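The slide's decimal units make back-of-the-envelope arithmetic straightforward. A small Python sketch using two of the figures above; it assumes constant rates, which real growth is not:

```python
# Decimal (SI) units, matching the slide: 1 PB = 1000 TB, 1 EB = 1000 PB
TB = 10**12  # bytes
PB = 1000 * TB
EB = 1000 * PB


def months_to_grow(amount, rate_per_month):
    """Months needed to add `amount` bytes at a constant monthly ingest rate."""
    return amount / rate_per_month


def volume_per_year(rate_per_day, days=365):
    """Total bytes handled in a year at a constant daily rate."""
    return rate_per_day * days


if __name__ == "__main__":
    # Wayback Machine (2006): 2 PB, growing 20 TB/month -> months until it doubles
    print(months_to_grow(2 * PB, 20 * TB))  # 100.0
    # Google (2008): 20 PB/day -> exabytes per year
    print(volume_per_year(20 * PB) / EB)  # 7.3
```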
  • 17. Processing Examples
    • Crawling, indexing, searching, mining the Web
    • Ecommerce transactions
    • Software as service
  • 18. Internet-Scale: Large System Scale
  • 19. Servers
    • Internet-scale problem? Throw more machines at it!
      • From tiny end users (called P2P)
      • From giant data centers (called data center applications)
  • 20. Large Data Centers
    • A trend: centralization of computing resources in large data centers
    • Necessary ingredients: fiber, juice, and space
      • What do Oregon, Iceland, and abandoned mines have in common?
    • Major design point: scale out, not scale up
  • 21. Source: Harper’s (Feb. 2008)
  • 22. Maximilien Brice, © CERN
  • 23. Internet-Scale: Evolving Computing Model
  • 24. Evolving Computing Models
    • Do it yourself (build your own data centers)
    • Utility computing
      • Why buy machines when you can rent cycles?
      • Examples: Amazon’s EC2, GoGrid, AppNexus
    • Platform as a Service (PaaS)
      • Give me a nice API and take care of the implementation
      • Example: Google App Engine
    • Software as a Service (SaaS)
      • Just run it for me!
      • Examples: Gmail, MS Exchange, MS Office Online
  • 25. Internet-Scale: Likely Web-Based
  • 26. Web-based Applications
    • The Internet infrastructure has better support for HTTP than other protocols
    • A trend of software applications:
      • From the desktop to the browser
      • SaaS == Web-based applications
      • Examples: Google Maps/Docs, Facebook
    • How do we deliver highly-interactive Web-based applications?
      • AJAX (asynchronous JavaScript and XML)
      • For better, or for worse…
  • 27. Internet-Scale: Software/Platform Architecture Matters
  • 28. Programming Architecture Matters
    • Performance vs. software extensibility
  • 29. Software Architecture Matters
    • It all boils down to…
      • Divide-and-conquer
      • Throwing more hardware at the problem as the problem grows bigger
  • 30. Divide and Conquer
    (Figure: “Work” is partitioned into w1, w2, w3; each piece goes to a “worker”, which produces r1, r2, r3; the partial results are combined into the “Result”.)
    It is simple to state, hard to master…
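A minimal Python sketch of the partition/worker/combine pattern in the figure. The word-count task and the helper names (`partition`, `worker`, `combine`) are illustrative assumptions, not part of the slide. Threads stand in for workers; because of CPython's global interpreter lock, CPU-bound work would need processes or separate machines to actually run in parallel:

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n):
    """Split `work` into n contiguous, nearly equal chunks (w1, w2, ..., wn)."""
    size, extra = divmod(len(work), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks


def worker(chunk):
    """Hypothetical task: count the words in one chunk of lines (produces r_i)."""
    return sum(len(line.split()) for line in chunk)


def combine(results):
    """Merge the partial results r1..rn into the final result."""
    return sum(results)


def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)
    # Threads stand in for workers; they could equally be processes or machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, chunks))
    return combine(results)


if __name__ == "__main__":
    lines = ["the quick brown fox", "jumps over", "the lazy dog", "again"]
    print(divide_and_conquer(lines))  # 10
```

The hard parts the slide alludes to live outside this sketch: choosing a partition that balances load, and a combine step that is cheap relative to the work itself.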
  • 31. Different Workers
    • Where are the workers?
      • Different threads in the same core
      • Different cores in the same CPU
      • Different CPUs in a multi-processor system
      • Different machines in a distributed system
    • Many design issues
      • Which worker does what?
      • How do the workers communicate/coordinate?
      • What if some workers die or are separated from others?
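One common answer to the last question is to re-run a failed worker's share of the work on another worker. A hedged Python sketch that simulates crashes (the `WorkerDied` exception and the failure probability are hypothetical) and reassigns failed chunks in later rounds:

```python
import random
from concurrent.futures import ThreadPoolExecutor


class WorkerDied(Exception):
    """Simulated worker crash (hypothetical, for illustration only)."""


def flaky_worker(chunk, dies):
    """Sum one chunk, unless this attempt has been chosen to fail."""
    if dies:
        raise WorkerDied()
    return sum(chunk)


def run_with_reassignment(chunks, fail_prob=0.3, max_rounds=5, seed=0):
    """Run every chunk; hand chunks whose worker died to a fresh worker next round."""
    rng = random.Random(seed)
    results = [None] * len(chunks)
    pending = list(range(len(chunks)))
    for _ in range(max_rounds):
        if not pending:
            break
        # Decide failures up front so a given seed always behaves the same way.
        dies = {i: rng.random() < fail_prob for i in pending}
        with ThreadPoolExecutor() as pool:
            futures = {i: pool.submit(flaky_worker, chunks[i], dies[i]) for i in pending}
        failed = []
        for i, fut in futures.items():
            try:
                results[i] = fut.result()
            except WorkerDied:
                failed.append(i)
        pending = failed
    if pending:
        raise RuntimeError(f"chunks {pending} still failing after {max_rounds} rounds")
    return results
```

Re-running a chunk is only safe here because the task is idempotent; work with side effects needs more care, such as deduplicating results.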
  • 32. Example Architecture: Three Tiered Architecture
    • Stateless frontend
    • Soft state middle tier containing application logic and common services
    • Backend persistent storage
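The division of responsibility between the three tiers can be sketched in a few lines of Python. The class names, the profile example, and the write-through cache policy are assumptions for illustration; a real deployment would put each tier on separate machines:

```python
class Backend:
    """Persistent storage tier; a dict stands in for a real database here."""

    def __init__(self):
        self._rows = {}
        self.reads = 0  # counts trips to the persistent tier

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)

    def put(self, key, value):
        self._rows[key] = value


class MiddleTier:
    """Application logic plus a soft-state cache that can be rebuilt at any time."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def get_profile(self, user):
        if user in self.cache:
            return self.cache[user]
        profile = self.backend.get(user)
        if profile is not None:
            self.cache[user] = profile
        return profile

    def update_profile(self, user, profile):
        self.backend.put(user, profile)  # write through, so the backend stays authoritative
        self.cache[user] = profile

    def restart(self):
        self.cache.clear()  # losing soft state costs latency, not data


def frontend(request, middle):
    """Stateless frontend: everything it needs arrives with the request."""
    op, user, *rest = request
    if op == "GET":
        return middle.get_profile(user)
    if op == "PUT":
        middle.update_profile(user, rest[0])
        return "ok"
    raise ValueError(f"unknown operation {op!r}")
```

Because the frontend keeps no state, any frontend replica can serve any request; because the middle tier's cache is only soft state, restarting it costs extra backend reads but loses no data.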
  • 33. Platform Matters
    • “Developers who have worked at the small scale might be asking themselves why we need to bother with “platform design” when we could just use some kind of out-of-the-box solution. For small-scale applications, this can be a great idea. We save time and money up front and get a working and serviceable application. The problem comes at larger scales—there are no off-the-shelf kits that will allow you to build something like Amazon or Friendster. While building similar functionality might be fairly trivial, making that functionality work for millions of products, millions of users, and without spending far too much on hardware requires us to build something highly customized and optimized for our exact needs. There’s a good reason why the largest applications on the Internet are all bespoke creations: no other approach can create massively scalable applications within a reasonable budget.”
    http://www.evontech.com/symbian/55.html
  • 34. Outline
    • What are Internet-scale applications?
    • Course administration
  • 35. Personnel
    • Instructors
      • Michael Fischer <[email_address]>
        • AKW 409
      • Y. Richard Yang <yry@cs.yale.edu>
        • AKW 308A
        • office hours
          • TTh 4:00-5:00 or by appointment
    • Teaching assistant (grader)
      • Ye Wang
  • 36. What are the Goals of this Course?
    • Learn design principles and techniques of:
      • Large-scale Internet applications;
      • Infrastructure supporting such applications
    • See how the principles and techniques apply and adapt in real world:
      • Real examples from DNS/Email/Web, Akamai, Amazon (Dynamo, AWS), Google (Google cluster, GFS, BigTable, Chubby, AppEngine), Microsoft (Live Mesh, Azure), PPLive
  • 37. What Will We Cover?
    • Background on Internet/DNS/Email/Web
    • Basic abstraction/design for high-performance client/server
      • multithreading, asynchronous I/O, SEDA
      • adaptive applications (e.g., playout buffer)
    • Web service oriented architecture
      • Interactivity (AJAX)
    • Server scaling
      • Load balancing (HTTP local/Akamai global example)
      • Cloud integration
      • Over-provisioning and capacity planning
    • Servent design (servers contributed by end hosts, i.e., P2P)
    • Application/network infrastructure integration and interface
    • Tiered architecture and middle layer design
      • Storage/state management (e.g., DDS, memcached, GFS, BigTable, Dynamo, and PRS) [1.5 weeks]
      • Transaction/
    • Higher level programming models
      • MapReduce/Hadoop
      • Dryad
    • Data center design
    • Debugging, deployment and diagnosis
  • 38. References
    • We will use resources available from the Internet
    • Resources will be posted online
      • http://zoo.cs.yale.edu/classes/cs435
  • 39. What Do You Need To Do?
    • Please return the class background survey at the end of the class
      • help us determine your background
      • help us determine the depth, topics, and assignments
      • suggest topics that you want to be covered (if you think of a topic later, please send us email)
    • Your workload
      • homework assignments
      • one exam
      • a project
  • 40. Grading
    • Subject to change after we know more about your background
    • What you realize/learn is more important than the grades!
    • Assignments: 30%
    • Exam: 15%
    • Class participation: 10%
    • Project: 45%
  • 41. Amazon Web Services
    • Elastic Compute Cloud (EC2)
      • Rent computing resources by the hour
      • Basic unit of accounting = instance-hour
      • Additional costs for bandwidth
    • Simple Storage Service (S3)
      • Persistent storage
      • Charge by the GB/month
      • Additional costs for bandwidth
    • Simple Queue Service (SQS)
    • Virtual Private Cloud (VPC)
    • You’ll be using Zoo/Amazon AWS for course assignments
    • Projects can use AppEngine/Azure or AWS!
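The accounting units above (instance-hours, GB-months, bandwidth) combine into a simple cost estimate. A sketch in Python; the prices are hypothetical placeholders rather than actual AWS rates, lumping all bandwidth into one per-GB charge is a simplification, and rounding partial hours up to whole instance-hours is an assumption:

```python
import math
from dataclasses import dataclass


@dataclass
class Prices:
    """HYPOTHETICAL unit prices in USD; substitute the currently published rates."""
    instance_hour: float     # EC2, per instance-hour
    storage_gb_month: float  # S3, per GB stored for a month
    transfer_gb: float       # data transfer, per GB


def monthly_cost(prices, instances, hours_each, stored_gb, transferred_gb):
    """Rough monthly bill: compute + storage + bandwidth."""
    # Assumption: each instance's usage is rounded up to whole instance-hours.
    instance_hours = instances * math.ceil(hours_each)
    compute = instance_hours * prices.instance_hour
    storage = stored_gb * prices.storage_gb_month
    bandwidth = transferred_gb * prices.transfer_gb
    return {
        "compute": compute,
        "storage": storage,
        "bandwidth": bandwidth,
        "total": compute + storage + bandwidth,
    }


if __name__ == "__main__":
    p = Prices(instance_hour=0.10, storage_gb_month=0.15, transfer_gb=0.10)
    # Two instances used for 9.5 hours each, 100 GB in storage, 50 GB transferred
    print(monthly_cost(p, instances=2, hours_each=9.5, stored_gb=100, transferred_gb=50))
```

Keeping the three terms separate makes it easy to see which resource dominates the bill before deciding whether renting beats buying.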
  • 42. This Course is not for you…
    • If you’re not genuinely interested in the topic
    • If you’re not open to thinking about computing in new ways
    • If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software
    • Otherwise, this could be an interesting learning experience.
  • 43. Internet Application Zen
    • Don’t get frustrated (take a deep breath)…
      • Those W$*#T@F! moments
    • Be patient…
      • This is the second first time we are teaching this course
    • Be flexible…
      • There will be unanticipated issues along the way
    • Be constructive…
      • Tell us how we can make everyone’s experience better
  • 44. Source: http://davidzinger.wordpress.com/2007/05/page/2/