Farms, Fabrics and Clouds

Steve Loughran, Julio Guijarro. HP Laboratories, Bristol, UK. November 2008

Steve Loughran: Researcher at HP Laboratories. Area of interest: Deployment. Author of Ant in Action.

Julio Guijarro: Researcher at HP Laboratories. Area of interest: Deployment. In charge of OSS release: http://smartfrog.org/

Our research (see smartfrog.org): How to host big applications across distributed resources: automatically, repeatably, dynamically, correctly, securely. How to manage them from installation to removal. How to make dynamically allocated servers useful.

Question: Who had breakfast this morning?

Question: Who harvested wheat or corn, or killed an animal for that breakfast?

Farms provide food. It is somebody else's problem

Question: Who is wearing clothes they wove or knitted themselves?

Provisioning of clothing (fabrics) is outsourced. It is somebody else's problem.

Future applications are on the Web Web Browser, AJAX clients Richer: Flash, XUL, Silverlight "… as a Service " Lots of code running in the server Unpredictable demand Data mining/analysis problems

Old world installation: single server Single web server, Single DB RAID filestore -SPOF -limitations of scale

Yesterday: clustering Multiple web servers, Replicated DB RAID Network filestore Load-balancing router -Cost -Complexity -Limitations of scale Maintains the illusion of a single server

Now: server farms 500 web servers, Distributed filestore Rented storage & CPU Scales up No capital outlay Agile infrastructure

Tomorrow? grid fabric. 50000 servers

Application architectures and deployment problems change radically in this world

Application architectures September 2008

Application architectures ROA/REST Virtualized MapReduce Shards Tuple-spaces XMPP

Why? Save on hardware (and power, space) Dynamically move running servers Demand creation of new images Testing complex system configurations Redistributing entire machine image 'virtual appliance'

Assumptions that are now invalid Systems have a long lifespan It is slow/expensive to create a new system It is expensive to duplicate one Systems can/should be managed by hand Clocks proceed at the same rate Physical RAM doesn’t get swapped out Running machines can't be moved/cloned Virtualization is only for testing.

Server Farms http://www.linuxjournal.com/

Assumptions that are now invalid System failure is an unusual event 100% availability can be achieved Data is always near the server You need physical access to the servers Databases are the best form of storage You need millions of $/£/€ to play
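Why failure stops being an unusual event at farm scale can be shown with a little arithmetic. The sketch below assumes every server fails independently with the same probability on a given day; the 0.1% daily figure is a made-up illustrative number, not one from the talk. The server counts are the 500 and 50000 mentioned on the earlier slides, and the function names are hypothetical.

```python
def p_any_failure(servers: int, p_fail: float) -> float:
    """Probability that at least one of `servers` machines fails,
    assuming independent failures with per-machine probability p_fail."""
    return 1.0 - (1.0 - p_fail) ** servers


def expected_failures(servers: int, p_fail: float) -> float:
    """Expected number of failed machines over the same period."""
    return servers * p_fail


P_DAILY = 0.001  # hypothetical per-server daily failure probability

for n in (1, 500, 50000):
    print(n, p_any_failure(n, P_DAILY), expected_failures(n, P_DAILY))
```

Under these assumptions a single server rarely fails on a given day, while a 50000-server fabric should expect some failures every day, so the software has to treat failure as routine.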

Who has the servers? Yahoo!, Google, MSN, Amazon, eBay: services MMORPG Game Vendors: World of Warcraft, Second Life EU Grid: Scientists HP, IBM, Sun: rent to companies (some resold) -focus on CPU performance for enterprise Amazon: rent to anyone with an Amazon account -focus on startups

Amazon EC2 Pay as you go Virtual Machine Hosting No persistent storage other than S3 filestore -uses HTTP GET/PUT/DELETE operations $0.10 per CPU/hour Resold OS images for more (RedHat, Windows) Rent static IP addresses for failover/balancing New: RAID-like storage

[Diagram: Amazon EC2 and S3 Storage. Each host runs AMIs (Xen VMs), each with a /mnt volume; a fast (free) network and the public Internet connect them. Labels: free access, slow initial read time; pay per GET, per megabyte.]

EC2 Limitations Can't talk to peers using public IP addresses Persistent file system is a premium extra Most addresses are dynamic No managed redundancy/restart No multicast IP No movement of VMs off high-traffic racks

Amazon S3 Multiple geo-located data storage No limits on size Cost of write is high (guaranteed to be written remotely) Read is cheap; may be out of date Cost: Low S3 is a global file system that any project can afford

Amazon S3 Charges
S3 sets the limit on costs for reliable data storage over the network
For Amazon, indexing and writes are the big costs… small files are the enemy
Storage: $0.15/GB/month
Upload (all data transfer in): $0.10 per GB
Download (data transfer out): $0.18 per GB for the first 10 TB/month; $0.16 per GB for the next 40 TB/month; $0.13 per GB over 50 TB/month
Requests: $0.01 per 1,000 PUT or LIST; $0.01 per 10,000 GET or HEAD; $0 DELETE
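The price list above can be turned into a rough monthly bill. This is a minimal sketch using only the 2008 prices quoted on the slide; the function and constant names are hypothetical, and treating 1 TB as 1000 GB is an assumption made to keep the tier boundaries simple.

```python
STORAGE_PER_GB_MONTH = 0.15
UPLOAD_PER_GB = 0.10
PUT_LIST_PER_1000 = 0.01
GET_HEAD_PER_10000 = 0.01

GB_PER_TB = 1000  # assumption: decimal terabytes

# (tier size in GB, price per GB); None means "everything above".
DOWNLOAD_TIERS = [
    (10 * GB_PER_TB, 0.18),
    (40 * GB_PER_TB, 0.16),
    (None, 0.13),
]


def download_cost(gb: float) -> float:
    """Apply the tiered data-transfer-out prices to `gb` gigabytes."""
    cost = 0.0
    remaining = gb
    for size, price in DOWNLOAD_TIERS:
        if remaining <= 0:
            break
        chunk = remaining if size is None else min(remaining, size)
        cost += chunk * price
        remaining -= chunk
    return cost


def monthly_cost(stored_gb, uploaded_gb, downloaded_gb,
                 puts_or_lists=0, gets_or_heads=0) -> float:
    """Total monthly charge in dollars; DELETE requests are free."""
    return (stored_gb * STORAGE_PER_GB_MONTH
            + uploaded_gb * UPLOAD_PER_GB
            + download_cost(downloaded_gb)
            + puts_or_lists / 1000 * PUT_LIST_PER_1000
            + gets_or_heads / 10000 * GET_HEAD_PER_10000)
```

The request charges show one way small files hurt the customer too: uploading roughly 1 GB as a million small objects costs $10 in PUT requests against $0.10 for the data transfer itself.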

MapReduce Commodity data processing for commodity data
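For readers who have not met the MapReduce pattern, here is a minimal single-process sketch of the idea: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step folds each group. It is a toy word count, not Hadoop's or Google's implementation, and every name in it is hypothetical. A real framework runs the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: sum all the counts emitted for one word."""
    return sum(values)


def map_reduce(inputs, mapper, reducer):
    """Run mapper over every input, group by key, then reduce each group."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    docs = ["farms fabrics clouds", "clouds and more clouds"]
    print(map_reduce(docs, map_words, reduce_counts))
```

Because each map call sees one input and each reduce call sees one key, both steps can be spread over a farm without the code itself changing.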

Assumptions that are now invalid Terabyte datasets are hard to work with Code runs on a single machine Sequential code is better than parallel code RAID hardware is the best way to store data Databases are better than filesystems Low-value data isn't worth collecting even if you don't have a use for it now

Assumptions that are now invalid A single farm needs to scale to infinity You need to provide 100% availability to 100% of users You have to roll out simultaneous updates to the application, changes to the DB schema, globally

XMPP
post extends GoogleChatClientWorkflow {
  to "smartfrog.two@gmail.com";
  login "smartfrog.two";
  password xmpp.password;
  message "hello, world";
}

Assumptions that are now invalid You can't send a message to a laptop that moves around behind a firewall. You need to build your own monitoring infrastructure. Blocking RPC is a good metaphor for long-haul communications. You can't send messages to your server farm from your phone. IT doesn't have their eyes on your protocol.

Problems for us farmers Power management Predictive disk failure management Load balancing for availability, power File management Billing Routing Security/Isolation How will this change server hardware? Managing/Configuring Machine Images Diagnostics when things go wrong

“Agile” Routers Handle hundreds to thousands of (concurrent) change requests/second Integrate with billing Managed throttling to specific hosts Propagation of state to peer routers 'agile' DHCP -short leases; mobile Monitored bandwidth may trigger VM migration

“Agile” Operating Systems Design for VM-only use Limited functionality Limited lifespan Fully configurable before initial boot Adapt to changes in surrounding environment Viable licensing model
