• Like
Farms, Fabrics and Clouds
Upcoming SlideShare
Loading in...5
×

Farms, Fabrics and Clouds

  • 1,478 views
Uploaded on

Trends in server-side, clustered, datacentred and cloud computing, including a look at what classic assumptions are no longer valid in these worlds.

Trends in server-side, clustered, datacentred and cloud computing, including a look at what classic assumptions are no longer valid in these worlds.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,478
On Slideshare
0
From Embeds
0
Number of Embeds
1

Actions

Shares
Downloads
114
Comments
0
Likes
3

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • 1/14/2004 This is a presentation initially given at Bristol University; its about trends in server-side computing.

Transcript

  • 1. Steve Loughran Julio Guijarro HP Laboratories, Bristol, UK November 2008 Farms, Fabrics and Clouds [email_address] [email_address]
  • 2.
      • Researcher at HP Laboratories
      • Area of interest: Deployment
      • Author of Ant in Action
    Steve Loughran
  • 3. Julio Guijarro
      • Researcher at HP Laboratories
      • Area of interest: Deployment
      • In charge of OSS release
      • http://smartfrog.org/
  • 4.
    • How to host big applications across distributed resources
      • Automatically
      • Repeatably
      • Dynamically
      • Correctly
      • Securely
    • How to manage them from installation to removal
    • How to make dynamically allocated servers useful
    Our research - see smartfrog.org
  • 5. Who had breakfast this morning? Question
  • 6. Who harvested wheat or corn, or killed an animal for that breakfast? Question
  • 7. Farms provide food. It is somebody else's problem
  • 8. Who is wearing clothes they wove or knitted themselves? Question
  • 9. Provisioning of clothing -fabrics- is outsourced It is somebody else's problem
  • 10. Future applications are on the Web
    • Web Browser, AJAX clients
    • Richer: Flash, XUL, Silverlight
    • "… as a Service "
    • Lots of code running in the server
    • Unpredictable demand
    • Data mining/analysis problems
  • 11. Old world installation: single server Single web server, Single DB RAID filestore -SPOF -limitations of scale
  • 12. yesterday: clustering Multiple web servers, Replicated DB RAID Network filestore Load-balancing router -Cost -Complexity -Limitations of scale Maintains the illusion of a single server
  • 13. Now: server farms 500 web servers, Distributed filestore Rented storage & CPU Scales up No capital outlay Agile infrastructure
  • 14. Tomorrow? grid fabric. 50000 servers
  • 15. Application architectures and deployment problems change radically in this world
  • 16. Application architectures September 2008
  • 17. Application architectures
    • ROA/REST
    • Virtualized
    • MapReduce
    • Shards
    • Tuple-spaces
    • XMPP
  • 18. Virtualization
  • 19. Why?
    • Save on hardware (and power, space)‏
    • Dynamically move running servers
    • Demand creation of new images
    • Testing complex system configurations
    • Redistributing entire machine image
    • 'virtual appliance'
  • 20. Assumptions that are now invalid
    • Systems have a long lifespan
    • It is slow/expensive to create a new system
    • It is expensive to duplicate one
    • Systems can/should be managed by hand
    • Clocks proceed at the same rate
    • Physical RAM doesn’t get swapped out
    • Running machines can't be moved/cloned
    • Virtualization is only for testing.
  • 21. Server Farms http://www.linuxjournal.com/
  • 22. Assumptions that are now invalid
    • System failure is an unusual event
    • 100% availability can be achieved
    • Data is always near the server
    • You need physical access to the servers
    • Databases are the best form of storage
    • You need millions of $/£/€ to play
  • 23. Who has the servers?
    • Yahoo!, Google, MSN, Amazon, eBay: services
    • MMORPG Game Vendors: World of Warcraft, Second Life
    • EU Grid: Scientists
    • HP, IBM, Sun: rent to companies (some resold) -focus on CPU performance for enterprise
    • Amazon: rent to anyone with an Amazon account -focus on startups
  • 24. Amazon EC2
    • Pay as you go Virtual Machine Hosting
    • No persistent storage other than S3 filestore -uses HTTP GET/PUT/DELETE operations
    • $0.10 per CPU/hour
    • Resold OS images for more (RedHat, Windows)‏
    • Rent static IP addresses for failover/balancing
    • New: RAID-like storage
  • 25. Host Amazon EC2 S3 Storage AMI (Xen VM)‏ AMI (Xen VM)‏ /mnt Host AMI (Xen VM)‏ AMI (Xen VM)‏ Public Internet /mnt /mnt /mnt Fast (free) network free access; slow initial read time pay per GET; per megabyte $ $ $ $ $
  • 26. Demo
  • 27. EC2 Limitations
    • Can't talk to peers using public IP addresses
    • Persistent file system is a premium extra
    • Most addresses are dynamic
    • No managed redundancy/restart
    • No multicast IP
    • No movement of VMs off high-traffic racks
  • 28. Amazon S3
    • Multiple geo-located data storage
    • No limits on size
    • Cost of write is high (guarantee of written remotely)‏
    • Read is cheap; may be out of date
    • Cost: Low
    • S3 is a global file system that any project can afford
  • 29. Amazon S3 Charges
    • S3 sets the limit on costs for reliable data storage over the network
    • For Amazon, indexing and writes are the big costs…small files are the enemy
    Storage $0.15/GB/month Upload $0.10 per GB - all data transfer in Download $0.18 per GB - first 10 TB / month data transfer out $0.16 per GB - next 40 TB / month data transfer out $0.13 per GB - data transfer out / month over 50 TB Requests $0.01 per 1,000 PUT or LIST $0.01 per 10,000 GET or HEAD $0 DELETE
  • 30. MapReduce Commodity data processing for commodity data
  • 31. Assumptions that are now invalid
    • Terabyte datasets are hard to work with
    • Code runs on a single machine
    • Sequential code is better than parallel code
    • RAID hardware is the best way to store data
    • Databases are better than filesystems
    • Low-value data isn't worth collecting even if you don't have a use for it now
  • 32. Shards
  • 33. Assumptions that are now invalid
    • A single farm needs to scale to infinity
    • You need to provide 100% availability to 100% of users
    • You have to roll out simultaneous updates to the application, changes to the DB schema, globally
  • 34. XMPP post extends GoogleChatClientWorkflow { to "smartfrog.two@gmail.com"; login "smartfrog.two"; password xmpp.password; message "hello, world"; }
  • 35. Assumptions that are now invalid
    • You can't send message to a laptop that moves around behind a firewall.
    • You need to build your own monitoring infrastructure.
    • Blocking RPC is a good metaphor for long-haul communications.
    • You can't send messages to your server farm from your phone
    • IT doesn't have their eyes on your protocol
  • 36. Problems for us farmers
    • Power management
    • Predictive disk failure management
    • Load balancing for availability, power
    • File management
    • Billing
    • Routing
    • Security/Isolation
    • How will this change server hardware?
    • Managing/Configuring Machine Images
    • Diagnostics when things go wrong
  • 37. “ Agile” Routers
    • Handle hundreds to thousands of (concurrent) change requests/second
    • Integrate with billing
    • Managed throttling to specific hosts
    • Propagation of state to peer rooters
    • 'agile' DHCP -short leases; mobile
    • Monitored bandwidth may trigger VM migration
  • 38. “ Agile” Operating Systems
    • Design for VM-only use
    • Limited functionality
    • Limited lifespan
    • Fully configurable before initial boot
    • Adapt to changes in surrounding environment
    • Viable licensing model