Distributed-ness: Distributed computing & the clouds


Discussion of distributed apps and the cloud resources available to support them, including the XMPP/Jabber-based messaging system we use at Koordinates. Part of the seminar series for the Wellington Summer of Code programme.

Published in: Technology









  • image: http://www.flickr.com/photos/erichews/2639564244/







  • image: http://www.flickr.com/photos/anshul/2313406717/

  • image: http://www.flickr.com/photos/sarkasmo/428860683/




  • Awesome Web 2.0 site selling hats for cats. In addition to the store...

  • Facebook app, online games, design-your-own hats, story writing with automatic creation of cat videos from your story, forums, blogs - you name it…

  • image: http://www.bride.net/wp-content/uploads/2008/02/cat-in-the-hat.gif

  • image: http://www.flickr.com/photos/abear23/1444321123/




  • What happens when we get a bit bigger, and we start wanting more than one of anything? When we get load spikes and need 6 or 12 App servers, or 10 Workers rather than 2?

  • 1 worker or 20 workers should look the same to the client, and things should keep working if a worker dies mid-process.

  • Which components? Background tasks, sessions, …

  • REST services are a great example here, e.g. search

  • image: http://www.flickr.com/photos/ysgellery/3103708893/

  • Polling is a concern with many systems
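Polling and publish-subscribe both come up on the messaging slides. Here is a minimal sketch of the push side, built on a toy in-process broker. Each subscriber registers a callback once, and the broker calls every callback on each publish, so no consumer has to poll. `Broker`, its methods and the `hat-designs` topic are hypothetical illustrations. They are not the API of Amazon SQS or of any XMPP PubSub library.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical in-process broker, standing in for a real messaging service.
Handler = Callable[[str], None]


class Broker:
    def __init__(self) -> None:
        # topic name -> callbacks registered for that topic
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # Push the message to every current subscriber; nobody polls.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # how many subscribers received it


received: List[str] = []
audit: List[str] = []
broker = Broker()
broker.subscribe("hat-designs", received.append)
broker.subscribe("hat-designs", audit.append)
delivered = broker.publish("hat-designs", "design-42 uploaded")
```

The publisher only knows the topic name, not who is listening. That is the decoupling the "Talking" slides ask for.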


  • image: http://www.flickr.com/photos/90001203@N00/172506278/






  • image: http://www.flickr.com/photos/75166820@N00/221373872/

  • image: http://www.flickr.com/photos/livinginmonrovia/85868861/

  • image: http://www.businessballs.com/project.htm




  • image: http://www.flickr.com/photos/donsolo/166981992/

  • Twisted makes it really easy to add an SSH/Telnet shell

  • image: http://www.flickr.com/photos/dwstucke/6045801/


  • image: http://www.flickr.com/photos/whatknot/12974821/

  • $1.80/GB/year

  • Keys can be up to 1KB, and values (objects) can be up to 5GB

  • APIs for every language mean it’s easy to incorporate into offline applications as well

  • Clever access control allows you to delegate authorization

  • Eventual consistency means that when your PUT request returns, the object is stored in at least 2 datacenters. It might not be replicated across all of S3 yet, though, so an immediate GET request might return a not-found error. Likewise, with 2 concurrent writes to the same key, it’ll take a while for one of them (a random one) to win.
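Under eventual consistency, a GET issued straight after a PUT can come back not-found. The downsides slide also notes that 1 in 10K requests fail anyway. Both point to clients that retry. Below is a minimal sketch of retry with exponential backoff. It assumes a toy in-memory store that hides a new write for a few reads. `EventuallyConsistentStore`, `NotFound` and `get_with_retry` are hypothetical names and are not boto or the S3 API. The `sleep` parameter can be swapped out so the example runs instantly.

```python
import time
from typing import Callable, Dict, List


class NotFound(Exception):
    """Raised when a key is not (yet) visible to readers."""


class EventuallyConsistentStore:
    """Toy, in-memory stand-in for an S3-like store; not the S3 API.

    put() is acknowledged at once, but the value only becomes readable
    after `lag` further get() attempts, mimicking replication delay.
    """

    def __init__(self, lag: int) -> None:
        self._lag = lag
        self._data: Dict[str, bytes] = {}
        self._pending: Dict[str, int] = {}  # key -> failed reads still to come

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._pending[key] = self._lag

    def get(self, key: str) -> bytes:
        remaining = self._pending.get(key, 0)
        if remaining > 0:
            self._pending[key] = remaining - 1
            raise NotFound(key)  # written, but not replicated here yet
        if key not in self._data:
            raise NotFound(key)  # never written
        return self._data[key]


def get_with_retry(store: EventuallyConsistentStore, key: str,
                   attempts: int = 5, base_delay: float = 0.1,
                   sleep: Callable[[float], None] = time.sleep) -> bytes:
    """GET, retrying NotFound with exponential backoff (0.1s, 0.2s, 0.4s...)."""
    for attempt in range(attempts):
        try:
            return store.get(key)
        except NotFound:
            if attempt == attempts - 1:
                raise  # give up: the key may genuinely be missing
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


delays: List[float] = []
store = EventuallyConsistentStore(lag=2)
store.put("hatdesigns/tophat.png", b"tophat-bytes")
value = get_with_retry(store, "hatdesigns/tophat.png", sleep=delays.append)
```

A real client would probably retry transient server errors in the same loop and cap the total wait.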





  • image: http://www.flickr.com/photos/phantomkitty/259379993/




  • Generating 60 million map tiles in a few hours for $40

  • Video encoding

  • Facebook apps with 300K users signing up in 24 hours
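The prices quoted on the S3 and EC2 slides make a quick back-of-the-envelope check easy. US$0.15/GB/month is the US$1.80/GB/year in the speaker notes. An always-on instance at US$0.10/hour comes to about US$72/month, which is the "US$70+" on the EC2 slide. A small sketch of that arithmetic follows. It uses `decimal` so the money sums aren't subject to float rounding. The helper names and the usage figures are hypothetical. The prices are the 2008 ones quoted in the talk, not current AWS pricing.

```python
from decimal import Decimal

# 2008 prices as quoted on the slides (US$); current AWS pricing differs.
S3_STORAGE_PER_GB_MONTH = Decimal("0.15")
S3_TRANSFER_IN_PER_GB = Decimal("0.10")
S3_TRANSFER_OUT_PER_GB = Decimal("0.17")
EC2_SMALL_PER_HOUR = Decimal("0.10")  # cheapest instance on the EC2 slide


def s3_monthly_cost(stored_gb: int, in_gb: int, out_gb: int) -> Decimal:
    """One month of S3, using only the storage and transfer prices above."""
    return (stored_gb * S3_STORAGE_PER_GB_MONTH
            + in_gb * S3_TRANSFER_IN_PER_GB
            + out_gb * S3_TRANSFER_OUT_PER_GB)


def ec2_monthly_cost(hourly_rate: Decimal, hours: int = 24 * 30) -> Decimal:
    """An instance left running all month (a 30-day month is assumed)."""
    return hourly_rate * hours


# Hypothetical catinthehat.biz month: 100 GB of hat photos stored,
# 10 GB uploaded, 50 GB served.
photos_bill = s3_monthly_cost(stored_gb=100, in_gb=10, out_gb=50)   # 24.50
static_server = ec2_monthly_cost(EC2_SMALL_PER_HOUR)                # 72.00
storage_per_gb_year = S3_STORAGE_PER_GB_MONTH * 12                  # 1.80
```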







  • So now we can have as many App servers as needed, and as many Workers as needed










  • Used for indexing the web.

  • Geo-example: finding the closest servo (service station) for any point on a road network
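The word-count example on the MapReduce slides calls `isEnglishWord`, which the slides don't define. Here is a runnable single-process sketch of the same map/shuffle/reduce flow. A tiny hypothetical vocabulary stands in for a real dictionary. The functions are named `map_phase` and `reduce_phase` so they don't shadow Python's built-in `map`. The explicit `shuffle` step stands in for the grouping that a framework such as Hadoop performs between the two phases, across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a real dictionary lookup (the slides' isEnglishWord).
ENGLISH_WORDS = {"the", "cat", "in", "hat", "a", "wore", "and", "said"}


def is_english_word(word: str) -> bool:
    return word.lower() in ENGLISH_WORDS


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every non-English word in one story.
    for word in document.split():
        if not is_english_word(word):
            yield (word.lower(), 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework would between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, partial_counts: List[int]) -> int:
    return sum(partial_counts)


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    return {word: reduce_phase(word, counts)
            for word, counts in shuffle(intermediate).items()}


stories = [
    "the cat wore a chapeau",
    "the cat said bonjour",
    "a chapeau and a sombrero",
]
counts = map_reduce(stories)
top = max(counts.items(), key=lambda kv: kv[1])  # most popular non-English word
```

Each `map_phase` call only touches one document, and each `reduce_phase` call only touches one key. That independence is what lets the real thing spread the work across many machines.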













    1. Distributed-ness: Robert Coup, Koordinates, http://rob.coup.net.nz/, robert.coup@koordinates.com
    2. Me
    3. What is it? “Distributed computing deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime.” - Wikipedia, http://en.wikipedia.org/wiki/Distributed_computing
    4. What is it? Application architecture; independent components; dynamic resourcing
    5. What is it? Distributed computing is not scaling. Distributed computing can help you scale. There are easier ways to scale short-term.
    6. Easier ways
    7. Distributed problems: Break up my big problem into small chunks which can be worked on in parallel or asynchronously.
    8. Distributed applications: A single application with a bunch of components; inter-dependency; components “load up” differently
    9. catinthehat.biz: “great hats for your cats”
    10. Little catinthehat.biz [architecture diagram: load balancer, cache, media, app, storage, DB, worker]
    11. Bigger catinthehat.biz [architecture diagram: load balancer, two caches, two media servers, three app servers, two DBs, two workers, storage]
    12-16. Talking: Components of a distributed app need to talk, but should have minimal knowledge of each other. Just like in code modules! “Decoupling”
    17. Messaging: Point to point; needs configuration; web services
    18. Messaging: Queues; publish-subscribe; Amazon SQS; lots of others. http://aws.amazon.com/sqs/
    19. Messaging: Peer-to-peer; Jabber / XMPP; persistent connections; presence
    20. Jabber at Koordinates: Brainz manages the work; Korrew does the work
    21. Jabber @ Koordinates: Data imports have 20-25 inter-related tasks; a “package” defines the dependencies and input data
    22-27. Task Packages: Kerrows & Brainzs connect via XMPP; Brainz publishes tasks via PubSub; Kerrow negotiates for tasks, then does them; Brainz is notified on task completion/error; if Kerrows go offline, tasks are re-assigned
    28. Bots: Via IM, we can connect to Brainz/Kerrow and ask for status, cancel, new tasks, … And it can message us: errors, info
    29. Live Status: Keep a live eye on what’s going on. Danga apps have terminal consoles (telnet); otherwise you’re debugging via logs
    30. Bigger catinthehat.biz [architecture diagram: load balancer, two caches, two media servers, three app servers, two DBs, two workers, storage]
    31. Storage: Dump files; get them back reliably, quickly, in bulk; backups
    32. Amazon S3 - Simple Storage Service: Unlimited storage. Cheap! US$0.15 / GB / month; US$0.10 / GB in & US$0.17 / GB out. http://aws.amazon.com/s3/
    33. Amazon S3: Not a hard disk or filesystem. Data is organised into namespaces (buckets), e.g. hatdesigns.catinthehat.biz; within that, key-value pairs. Access via HTTP, with authentication / access control. Open source version: MogileFS, http://www.danga.com/mogilefs/
    34. Amazon S3 - downsides: Eventual consistency. 99.99% reliable = 1 in 10K requests fail; will return errors
    35. Amazon S3 - uses: Uses for catinthehat.biz? Customer photos of hats on cats; customer hat designs; story videos; manufacturing design files; backups
    36. Bigger catinthehat.biz [architecture diagram: load balancer, two caches, two media servers, three app servers, two DBs, two workers, storage, plus S3]
    37. Compute power: Supply & demand. Supply costs; demand is hard to manage
    38. Amazon EC2 - Elastic Compute Cloud: Virtual servers on demand, from US$0.10 to US$0.80 / hour. Linux & Windows, 1-8 cores, 1.7-15 GB memory, 160 GB-1.7 TB local storage, 32/64-bit. Permanent storage from US$0.10 / GB / month. http://aws.amazon.com/ec2/
    39. Amazon EC2: Turn capacity on & off at will. Ideal for batch processing; ideal for dynamic loads
    40. Amazon EC2: Not the cheapest: US$70+/month for a static server. Instances can be terminated at any time! Organise configuration: Puppet, RightScale, Scalr. You need an app that is architected to handle it. http://slicehost.com/ http://puppet.reductivelabs.com/ http://www.rightscale.com/ http://code.google.com/p/scalr/
    41. Amazon EC2 - uses: Uses for catinthehat.biz? Converting customer designs; creating story videos; application servers
    42. Bigger catinthehat.biz [architecture diagram: load balancer, two caches, two media servers, three app servers plus EC2, two DBs, two workers plus EC2, storage, S3]
    43. Google AppEngine: Auto-scaling web applications that Google hosts and runs. Access to BigTable and Image/Email/Cache/HTTP APIs. Restricted Python environment. Free to get started. http://code.google.com/appengine/
    44. Google AppEngine: Still in beta, with no way of buying “extra” capacity. No offline/background processing. Time limits on requests. No file storage. Datastore isn’t SQL. Lock-in
    45. Google AppEngine: Uses at catinthehat.biz? Facebook application? Prototypes?
    46. MapReduce: Map phase: take a problem, chop it up into chunks, and distribute the chunks to lots of workers to do. Reduce phase: combine all the answers to the chunks to get the real answer. http://en.wikipedia.org/wiki/MapReduce
    47. MapReduce: Small atomic chunks of work. Run across acres of machines on masses of data. Easy to write (although problems need to “fit”). Can be chained together. Open source versions: Hadoop, others. http://en.wikipedia.org/wiki/MapReduce http://hadoop.apache.org/
    48-50. MapReduce: Use at catinthehat.biz? Find the most popular non-English words in user stories:
           def map(document):
               for word in document:
                   if not isEnglishWord(word):
                       yield (word, 1)

           def reduce(word, partialCounts):
               return sum(partialCounts)
    51. So: De-couple application components. Figure out a messaging strategy. Monitor your apps live. Vertical scaling is cheaper short-term.
    52. So: On-demand storage (S3) & compute power (EC2); Google App Engine for simple apps; lots of tools available
    53. “If you never did, you should. These things are fun, and fun is good.” - Dr. Seuss
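The task-package slides describe Brainz handing out 20-25 inter-related tasks and re-assigning them when a Kerrow drops off. The sketch below models only that bookkeeping, in-process and with no XMPP. The class, method, worker and task names are hypothetical, not Koordinates' actual code. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) so a task is only released once everything it depends on has completed.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Optional, Set


class Coordinator:
    """Hypothetical Brainz-style coordinator (in-process, no XMPP)."""

    def __init__(self, package: Dict[str, Set[str]]) -> None:
        # package maps each task to the set of tasks it depends on.
        self._sorter = TopologicalSorter(package)
        self._sorter.prepare()  # raises graphlib.CycleError on a cycle
        self._ready: List[str] = []          # released but unassigned tasks
        self._assigned: Dict[str, str] = {}  # task -> worker doing it
        self.completed: List[str] = []

    def request_task(self, worker: str) -> Optional[str]:
        # Release any tasks whose dependencies have all completed.
        self._ready.extend(self._sorter.get_ready())
        if not self._ready:
            return None  # nothing runnable until in-flight tasks finish
        task = self._ready.pop(0)
        self._assigned[task] = worker
        return task

    def task_done(self, task: str) -> None:
        del self._assigned[task]
        self._sorter.done(task)
        self.completed.append(task)

    def worker_offline(self, worker: str) -> None:
        # Re-queue the lost worker's in-flight tasks for someone else.
        lost = [t for t, w in self._assigned.items() if w == worker]
        for task in lost:
            del self._assigned[task]
            self._ready.append(task)

    def finished(self) -> bool:
        return not self._sorter.is_active()


# Hypothetical import package: each task lists its prerequisites.
package = {
    "download": set(),
    "load": {"download"},
    "reproject": {"load"},
    "index": {"load"},
    "tile": {"reproject"},
}

coord = Coordinator(package)
t1 = coord.request_task("kerrow-1")   # only "download" is runnable
coord.task_done(t1)
t2 = coord.request_task("kerrow-1")   # "load" is now released
coord.worker_offline("kerrow-1")      # kerrow-1 vanishes mid-task
t3 = coord.request_task("kerrow-2")   # "load" is handed to kerrow-2
coord.task_done(t3)
while not coord.finished():
    task = coord.request_task("kerrow-2")
    assert task is not None  # single worker: something is always runnable
    coord.task_done(task)
```

The worker never needs to know about the dependency graph or about other workers. It just asks for something to do. This is why 1 worker or 20 can look the same from the outside.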
