Distributed-ness: Distributed computing & the clouds

Comments

  • rcoup (Robert Coup), 10 months ago
    By 'will return errors' I meant 'it’ll return 500 errors to post/get requests occasionally', not 'it’ll corrupt your data' :)

Notes on slide 1










image: http://www.flickr.com/photos/erichews/2639564244/








image: http://www.flickr.com/photos/anshul/2313406717/


image: http://www.flickr.com/photos/sarkasmo/428860683/





Awesome Web 2.0 site selling hats for cats. In addition to the store...


Facebook app, online games, design-your-own hats, story writing with automatic creation of cat videos from your story, forums, blogs - you name it…


image: http://www.bride.net/wp-content/uploads/2008/02/cat-in-the-hat.gif


image: http://www.flickr.com/photos/abear23/1444321123/





What happens when we get a bit bigger, and we start wanting more than one of anything? When we get load spikes and need 6 or 12 App servers, or 10 Workers rather than 2?


1 worker or 20 workers should be the same to the client, and it should just work if the worker dies mid-process.
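
One way to get the "it should just work if the worker dies" behaviour is a queue that leases tasks instead of handing them over for good. This is similar in spirit to the visibility timeout offered by queues such as Amazon SQS. The sketch below is an in-memory illustration only; the class name LeasedQueue, its methods and the task name are all made up for this example.

```python
import time
from collections import deque


class LeasedQueue:
    """In-memory queue: a claimed task goes back on the queue if not acked in time."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.pending = deque()  # tasks waiting for a worker
        self.leased = {}        # task -> time at which its lease expires

    def put(self, task):
        self.pending.append(task)

    def claim(self, now=None):
        """Hand out the next task, first re-queuing any whose lease has expired."""
        now = time.monotonic() if now is None else now
        for task, expiry in list(self.leased.items()):
            if expiry <= now:
                del self.leased[task]
                self.pending.append(task)
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.leased[task] = now + self.lease_seconds
        return task

    def ack(self, task):
        """The worker finished, so forget the task for good."""
        self.leased.pop(task, None)


queue = LeasedQueue(lease_seconds=10)
queue.put("resize-photo-1")
first = queue.claim(now=0)    # worker A takes the task, then dies
second = queue.claim(now=11)  # lease expired, so worker B gets the same task
queue.ack(second)
```

The client only ever calls put(); whether 1 or 20 workers call claim() makes no difference to it. The price is that a slow worker can have its task handed out twice, so tasks need to be safe to repeat.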


What components? background tasks, sessions,


REST services are a great example here - search


image: http://www.flickr.com/photos/ysgellery/3103708893/


Polling is a concern with many systems
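
Where polling cannot be avoided, a common way to soften it is to wait longer after each empty poll. A minimal sketch, assuming a hypothetical fetch() callable that returns a message or None; the function name and default timings are illustrative, not taken from any particular queue service.

```python
import time


def poll_with_backoff(fetch, initial=0.5, maximum=30.0, max_polls=10, sleep=time.sleep):
    """Call fetch() until it returns something, doubling the wait after each empty poll."""
    delay = initial
    for _ in range(max_polls):
        message = fetch()
        if message is not None:
            return message
        sleep(delay)
        delay = min(delay * 2, maximum)  # cap the wait so latency stays bounded
    return None


# Usage with a fake queue that is empty twice and then has a job.
answers = iter([None, None, "job-7"])
waits = []
result = poll_with_backoff(lambda: next(answers), sleep=waits.append)
```

Backoff trades latency for fewer wasted requests; push-style messaging (such as the XMPP approach in the slides) avoids the trade-off altogether.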



image: http://www.flickr.com/photos/90001203@N00/172506278/






image: http://www.flickr.com/photos/75166820@N00/221373872/


image: http://www.flickr.com/photos/livinginmonrovia/85868861/


image: http://www.businessballs.com/project.htm





image: http://www.flickr.com/photos/donsolo/166981992/


Really easy with Twisted to add an SSH/Telnet shell


image: http://www.flickr.com/photos/dwstucke/6045801/




image: http://www.flickr.com/photos/whatknot/12974821/


$1.80/GB/year
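
The yearly figure is just the monthly storage price from slide 32 times twelve. A small sketch of the arithmetic, using only the prices quoted on that slide; the function names and the 100 GB / 500 GB example volumes are made up for illustration.

```python
from decimal import Decimal

# Prices as quoted on slide 32 (US dollars).
STORAGE_PER_GB_MONTH = Decimal("0.15")
TRANSFER_IN_PER_GB = Decimal("0.10")
TRANSFER_OUT_PER_GB = Decimal("0.17")


def yearly_storage_cost(gigabytes):
    """Cost of keeping this many GB stored for twelve months."""
    return STORAGE_PER_GB_MONTH * 12 * gigabytes


def first_year_cost(stored_gb, downloaded_gb):
    """Upload everything once, store it for a year, and serve downloaded_gb back out."""
    return (TRANSFER_IN_PER_GB * stored_gb
            + yearly_storage_cost(stored_gb)
            + TRANSFER_OUT_PER_GB * downloaded_gb)


per_gb_year = yearly_storage_cost(1)   # 0.15 * 12
example = first_year_cost(100, 500)    # hypothetical: 100 GB stored, 500 GB served
```

Decimal is used rather than float so the money sums stay exact.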


Keys can be up to 1KB, and values (objects) can be up to 5GB


APIs for every language mean it's easy to incorporate into offline applications as well


Clever access control allows you to delegate authorization


Eventual consistency means that when your PUT request returns, it’ll be in at least 2 datacenters. But it might not be replicated across all of S3 yet, so an immediate GET request might return a not-found error. Likewise with 2 concurrent writes, it’ll take a while for (a random) one to win.
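
In practice this means client code should expect a read straight after a write to fail now and then, and retry. A minimal sketch, assuming a hypothetical get(key) callable; the two exception classes are stand-ins for whatever not-found and HTTP 500 errors a real storage client raises, and none of this is a real S3 API.

```python
import time


class NotFoundYet(Exception):
    """Stand-in for a not-found response from a store that has not caught up yet."""


class TransientError(Exception):
    """Stand-in for an occasional HTTP 500 from the store."""


def get_with_retry(get, key, attempts=5, delay=0.2, sleep=time.sleep):
    """Retry get(key) on not-found or transient errors, doubling the wait each time."""
    for attempt in range(attempts):
        try:
            return get(key)
        except (NotFoundYet, TransientError):
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            sleep(delay)
            delay *= 2


# Usage with a fake store that fails twice before the object shows up.
calls = {"n": 0}


def flaky_get(key):
    calls["n"] += 1
    if calls["n"] == 1:
        raise NotFoundYet(key)
    if calls["n"] == 2:
        raise TransientError(key)
    return b"hat design bytes"


waits = []
data = get_with_retry(flaky_get, "designs/hat.png", sleep=waits.append)
```

The attempt limit matters: retrying a not-found forever would hide keys that really do not exist.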







image: http://www.flickr.com/photos/phantomkitty/259379993/





Generating 60 million map tiles in a few hours for $40
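
To get a feel for what $40 buys, the hourly price range from slide 38 (US$0.10 to US$0.80 per instance-hour) can be turned into instance-hours. This is a rough sketch only: it assumes the whole budget went on compute, and the 4-hour wall-clock figure is a guess at "a few hours", not a number from the talk.

```python
import math


def instance_hours(budget_cents, hourly_rate_cents):
    """How many instance-hours a budget buys at a given hourly rate."""
    return budget_cents / hourly_rate_cents


def instances_needed(total_instance_hours, wall_clock_hours):
    """Instances that must run in parallel to use those hours in the given time."""
    return math.ceil(total_instance_hours / wall_clock_hours)


BUDGET_CENTS = 4000            # the $40 from the note
cheapest = instance_hours(BUDGET_CENTS, 10)   # at US$0.10/hour
priciest = instance_hours(BUDGET_CENTS, 80)   # at US$0.80/hour

ASSUMED_HOURS = 4              # assumption: "a few hours" taken as 4
small_fleet = instances_needed(cheapest, ASSUMED_HOURS)
large_fleet = instances_needed(priciest, ASSUMED_HOURS)
```

Either way the job is only practical because the instances can be switched on for a few hours and then switched off again.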


Video encoding


Facebook apps with 300K users signing up in 24 hours








So now we can have as many App servers as needed, and as many Workers as needed











Used for indexing the web.


Geo-example: finding the closest servo (service station) for any point on a road network














Distributed-ness: Distributed computing & the clouds - Presentation Transcript

  1. Distributed-ness Robert Coup, Koordinates http://rob.coup.net.nz/ robert.coup@koordinates.com
  2. Me
  3. What is it? “Distributed computing deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime.” - Wikipedia http://en.wikipedia.org/wiki/Distributed_computing
  4. What is it? Application architecture Independent components Dynamic resourcing
  5. What is it? Distributed computing is not scaling Distributed computing can help you scale There are easier ways to scale short-term
  6. Easier ways
  7. Distributed problems Break up my big problem into small chunks which can be worked on in parallel or asynchronously.
  8. Distributed applications Single application with a bunch of components Inter-dependency Components “load-up” differently
  9. catinthehat.biz “great hats for your cats”
  10. Little catinthehat.biz Load balancer Cache Media App Storage DB Worker
  11. Bigger catinthehat.biz Load balancer Cache Media App App App + Cache Media DB DB Worker Storage + Worker +
  12. Talking
  13. Talking Components of a distributed app need to talk
  14. Talking Components of a distributed app need to talk But should have minimal knowledge of each other
  15. Talking Components of a distributed app need to talk But should have minimal knowledge of each other
  16. Talking Components of a distributed app need to talk But should have minimal knowledge of each other Just like in code modules! “Decoupling”
  17. Messaging Point to point: Needs configuration Web services
  18. Messaging Queues: Publish-subscribe Amazon SQS Lots of others http://aws.amazon.com/sqs/
  19. Messaging Peer-to-peer Jabber / XMPP Persistent connections Presence
  20. Jabber at Koordinates Brainz manages the work Kerrow does the work
  21. Jabber @ Koordinates Data imports have 20-25 inter-related tasks “Package” defines the dependencies and input data
  22. Task Packages
  23. Task Packages Kerrows & Brainzs connect via XMPP
  24. Task Packages Kerrows & Brainzs connect via XMPP Brainz publishes tasks via PubSub
  25. Task Packages Kerrows & Brainzs connect via XMPP Brainz publishes tasks via PubSub Kerrow negotiates for tasks, then does them
  26. Task Packages Kerrows & Brainzs connect via XMPP Brainz publishes tasks via PubSub Kerrow negotiates for tasks, then does them Brainz notified on task completion/error
  27. Task Packages Kerrows & Brainzs connect via XMPP Brainz publishes tasks via PubSub Kerrow negotiates for tasks, then does them Brainz notified on task completion/error If Kerrows go offline, tasks are re-assigned
  28. Bots Via IM, we can connect to Brainz/Kerrow Ask for status, cancel, new tasks, … And it can message us: errors, info
  29. Live Status Keep a live eye on what's going on Danga apps have terminal consoles (telnet) Otherwise you’re debugging via logs
  30. Bigger catinthehat.biz Load balancer Cache Media App App App + Cache Media DB DB Worker Storage Worker + +
  31. Storage Dump files Get them back Reliably Quickly In bulk Backups
  32. Amazon S3 - Simple Storage Service Unlimited storage Cheap! US$0.15 / GB / month US$0.10 / GB in & US$0.17 / GB out http://aws.amazon.com/s3/
  33. Amazon S3 Not a hard disk or filesystem Data is organised into namespaces (buckets) hatdesigns.catinthehat.biz Within that: key-value pairs Access via HTTP Authentication / access-control Open source version - mogilefs http://www.danga.com/mogilefs/
  34. Amazon S3 - downsides Eventual consistency 99.99% reliable = 1/10K requests fail Will return errors
  35. Amazon S3 - uses Uses for catinthehat.biz? Customer photos of hats on cats Customer hat designs Story videos Manufacturing design files Backups
  36. Bigger catinthehat.biz Load balancer Cache Media App App App + Cache Media DB DB Worker Storage Worker + S3 +
  37. Compute power Supply & demand Supply costs Demand is hard to manage
  38. Amazon EC2 - Elastic Compute Cloud Virtual servers on demand From US$0.10 - US$0.80 / hour Linux & Windows, 1-8 cores, 1.7-15GB memory, 160GB-1.7TB local storage, 32/64bit Permanent storage from US$0.10 / GB / month http://aws.amazon.com/ec2/
  39. Amazon EC2 Turn capacity on & off at will Ideal for batch processing Ideal for dynamic loads
  40. Amazon EC2 Not cheapest - US$70+/month for static server Instances can be terminated at any time! Organise configuration - Puppet, RightScale, Scalr Need an app that is architected to handle it http://slicehost.com/ http://puppet.reductivelabs.com/ http://www.rightscale.com/ http://code.google.com/p/scalr/
  41. Amazon EC2 - uses Uses for catinthehat.biz? converting customer designs creating story videos application servers
  42. Bigger catinthehat.biz Load balancer Cache Media App App App + EC2 Cache Media DB DB Worker Storage Worker + EC2 S3 +
  43. Google AppEngine Auto-scaling web applications Google hosts and runs Access to BigTable, Image/Email/Cache/HTTP APIs Restricted Python environment Free to get started http://code.google.com/appengine/
  44. Google AppEngine Still in beta, no way of buying “extra” capacity No offline/background processing Time limits on requests No file storage Datastore isn’t SQL Lock-in
  45. Google AppEngine Uses at catinthehat.biz? Facebook application? Prototypes?
  46. MapReduce Map Phase: Take a problem; chop it up into chunks; distribute chunks to lots of workers to do. Reduce Phase: Combine all the answers to the chunks to get the real answer. http://en.wikipedia.org/wiki/MapReduce
  47. MapReduce Small atomic chunks of work Run across acres of machines on masses of data Easy to write (although problems need to “fit”) Can be chained together Open source versions - Hadoop, others http://en.wikipedia.org/wiki/MapReduce http://hadoop.apache.org/
  48. MapReduce Use at catinthehat.biz? Find most popular non-English words in user stories:
  49. MapReduce Use at catinthehat.biz? Find most popular non-English words in user stories:
        def map(document):
            for word in document:
                if not isEnglishWord(word):
                    yield (word, 1)
  50. MapReduce Use at catinthehat.biz? Find most popular non-English words in user stories:
        def map(document):
            for word in document:
                if not isEnglishWord(word):
                    yield (word, 1)
        def reduce(word, partialCounts):
            return sum(partialCounts)
  51. So De-couple application components Figure out a messaging strategy Monitor your apps live Vertical scaling is cheaper short-term
  52. So On demand storage (S3) & compute power (EC2) Google App Engine for simple apps Lots of tools available
  53. “If you never did, you should. These things are fun, and fun is good.” - Dr. Seuss
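
The map and reduce functions on slides 49 and 50 need a driver to group the mapped pairs by word before reducing. The sketch below runs the whole thing in a single process to show the shape of the computation; a real framework such as Hadoop would spread the map and reduce calls across machines. The tiny ENGLISH_WORDS set, the renamed functions and the sample stories are all stand-ins made up for this example.

```python
from collections import defaultdict

# Toy stand-in for a real dictionary lookup.
ENGLISH_WORDS = {"the", "cat", "wore", "a", "hat", "and", "my", "loves", "his"}


def is_english_word(word):
    return word.lower() in ENGLISH_WORDS


def map_document(document):
    """Map phase: emit (word, 1) for every non-English word in one story."""
    for word in document.split():
        if not is_english_word(word):
            yield (word.lower(), 1)


def reduce_counts(word, partial_counts):
    """Reduce phase: add up the partial counts collected for one word."""
    return sum(partial_counts)


def run_mapreduce(documents):
    """Single-process driver: map every document, group by word, then reduce."""
    grouped = defaultdict(list)
    for document in documents:
        for word, count in map_document(document):
            grouped[word].append(count)
    return {word: reduce_counts(word, counts) for word, counts in grouped.items()}


stories = ["the cat wore a chapeau", "my gato loves his chapeau"]
counts = run_mapreduce(stories)
most_popular = max(counts, key=counts.get)
```

Because each map call sees only one story and each reduce call sees only one word, both phases can be handed out to as many workers as are available.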

Robert Coup, 10 months ago


Discussion on distributed apps and the cloud resour…

CC Attribution-ShareAlike License
