Introduction to Distributed HTC and overlay systems - OSG User School 2014

378 views

Published on

Lecture about Distributed HTC and overlay systems, given at the OSG User Scool 2014.

https://twiki.opensciencegrid.org/bin/view/Education/OSGUserSchool2014

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
378
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
2
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Introduction to Distributed HTC and overlay systems - OSG User School 2014

  1. 1. Introduction to Distributed HTC and overlay systems Tuesday morning session Igor Sfiligoi <isfiligoi@ucsd.edu> University of California San Diego
  2. 2. 2014 OSG User School DHTC and Overlays Logistical reminder • It is OK to ask questions - During the lecture - During the demos - During the exercises - During the breaks • If I don't know the answer, I will find someone who likely does
  3. 3. 2014 OSG User School DHTC and Overlays High Throughput Computing • Yesterday you were introduced to HTC - Often called batch system computing - A paradigm that emphasizes maximizing the amount of useful computing over long periods of time Scheduler Jobs Batch slots
  4. 4. 2014 OSG User School DHTC and Overlays Local HTC • What you have really experienced so far is local HTC • i.e. computing on dedicated cluster of dedicated resources - Managed by a single admin group - Co-located in a single location
  5. 5. 2014 OSG User School DHTC and Overlays Is there anything else? • As you might expect, there are several insulated “local HTC” clusters installed around the world • And there are non-HTC systems out there, too
  6. 6. 2014 OSG User School DHTC and Overlays Is there anything else? • As you might expect, there are several insulated “local HTC” clusters installed around the world • And there are non-HTC systems out there, too And you point is …???
  7. 7. 2014 OSG User School DHTC and Overlays Just local HTC • You moved from a single PC - O(1) cores • To a local HTC cluster - Say, O(100) cores daily avg Great! I can have my results back in 1/100th of the time
  8. 8. 2014 OSG User School DHTC and Overlays Just local HTC • You moved from a single PC - O(1) cores • To a local HTC cluster - Say, O(100) cores daily avg • But is it fast enough? It will still take me over a month to get the results back! The result of the 10 body simulation is very promising! I want to run a 100 body one!
  9. 9. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do?
  10. 10. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool - i.e. better priority compared to the other users of the same pool There are likely 100s of users But you better be doing something really important
  11. 11. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool • Pay to get more resources bought into the pool - Great for long term needs  If you can afford it - But will not help you in the short term Getting new machines installed can take months!
  12. 12. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool • Pay to get more resources bought • Get the needed resources somewhere else - i.e. not locally
  13. 13. 2014 OSG User School DHTC and Overlays How to get more? • If you find out that you are resource constrained - What do you do? • Beg for a larger share of the local pool • Pay to get more resources bought • Get the needed resources somewhere else - i.e. not locally This is what this lecture is all about!
  14. 14. 2014 OSG User School DHTC and Overlays Distributed HTC • A computing paradigm that aims at maximizing useful computation using any available resource located anywhere on the planet • As a corollary - Compute resources are owned and operated by several independent groups This place is huge!
  15. 15. 2014 OSG User School DHTC and Overlays Implications of distributed computing • We will be dealing with multiple independent compute systems - That do not know about each other They are owned by several independent groups, remember?
  16. 16. 2014 OSG User School DHTC and Overlays Implications of distributed computing • We will be dealing with multiple independent compute systems - That do not know about each other • No global HTC scheduler out of the box Scheduler
  17. 17. 2014 OSG User School DHTC and Overlays Implications of distributed computing • We will be dealing with multiple independent compute systems - That do not know about each other • No global HTC scheduler out of the box - Will have to stitch them all together
  18. 18. 2014 OSG User School DHTC and Overlays The naïve way • The simplest way is to partition your jobs and submit a subset to each cluster Scheduler Scheduler Scheduler
  19. 19. 2014 OSG User School DHTC and Overlays The naïve way • The simplest way is to partition your jobs and submit a subset to each cluster • The drawbacks - You may need multiple user accounts - You may need several different tools - Doing a good partitioning is hard  Want proportional to the resources you will get - And what if those CPUs are not in a HTC setup at all?
  20. 20. 2014 OSG User School DHTC and Overlays The naïve way • The simplest way is to partition your jobs and submit a subset to each cluster • The drawbacks - You may need multiple user accounts - You may need several different tools - Doing a good partitioning hard  Want proportional to the resources you will get - And what if those CPUs are not in a HTC setup at all? What's the alternative???
  21. 21. 2014 OSG User School DHTC and Overlays Use an overlay system • Use a systems that looks and feels like a regular HTC to users - But has compute nodes all over the world I like the sound of this!
  22. 22. 2014 OSG User School DHTC and Overlays Use an overlay system • Use a systems that looks and feels like a regular HTC to users - But has compute nodes all over the world I like the sound of this! But... didn't you just say there was no such thing???
  23. 23. 2014 OSG User School DHTC and Overlays Use an overlay system • Use a systems that looks and feels like a regular HTC to users - But has compute nodes all over the world But... didn't you just say there was no such thing??? I said “out of the box” But one can create one.
  24. 24. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • No single person cannot manage all the existing resources Scheduler
  25. 25. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • But we can lease a subset of them - We discuss the how later
  26. 26. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • But we can lease a subset of them • And instantiate a HTC system on them Scheduler
  27. 27. 2014 OSG User School DHTC and Overlays Creating a dynamic overlay sys • But we can lease a subset of them • And instantiate a HTC system on them - Now we can schedule user jobs on them - Only “our” resources are considered Scheduler Jobs
  28. 28. 2014 OSG User School DHTC and Overlays DHTC through an overlay sys • Just another HTC system - Well, almost - More details in a few slides Scheduler Jobs Cool!Cool!Cool!Cool!
  29. 29. 2014 OSG User School DHTC and Overlays Overlay system ownership • Setting up an overlay a major task - Comparable with installing a dedicated cluster • Long term maintenance is also costly • Not something a final user would want to do
  30. 30. 2014 OSG User School DHTC and Overlays Typical overlay sys operators • Existing HTC admins, e.g. - The UW HTC cluster can “overflow” into OSG - The UCSD operates one for local users • Scientific communities, e.g. - The CMS LHC experiment • The Open Science Grid itself - With the OSG Connect More on this later today
  31. 31. 2014 OSG User School DHTC and Overlays Is DHTC really just HTC? • Even with overlays, there are some differences between DHTC and HTC • With or without overlays, the core reasons are: - Multiple independent HW operators - Not all resources are co-located
  32. 32. 2014 OSG User School DHTC and Overlays The multiple owners problem • In the “Grid” world, the resource owner decides which Operating System, which OS services and which libraries to install - A way smaller problem in the “Cloud” world - But most of the current DHTC landscape is based on the Grid paradigm • Different clusters likely configured differently
  33. 33. 2014 OSG User School DHTC and Overlays The multiple owners problem • In the “Grid” world, the resource owner decides which Operating System, which OS services and which libraries to install - A way smaller problem in the “Cloud” world - But most of the current DHTC landscape is based on the Grid paradigm • Different clusters likely configured differently • Even if you could get your pet library/service installed on some clusters, you cannot expect to get it installed everywhere
  34. 34. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies in your compute jobs  Make them self-contained  Adapt to the running environment (e.g. multiple binaries, one per platform)  Do not use licensed software
  35. 35. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies in your compute jobs  Make them self-contained  Adapt to the running environment (e.g. multiple binaries, one per platform)  Do not use licensed software Sounds like a lot of work!
  36. 36. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies - Use only a subset of the resources  Restrict where your jobs can run  Your job will of course take longer to finish  May still get you the result sooner
  37. 37. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies - Use only a subset of the resources  Restrict where your jobs can run  Your job will of course take longer to finish  May still get you the result sooner So long for DHTC!
  38. 38. 2014 OSG User School DHTC and Overlays Heterogeneity • DHTC systems are way more heterogeneous than “local HTC” ones • Two ways to approach this: - Minimize external dependencies - Use only a subset of the resources  Restrict where your jobs can run  Your job will of course take longer to finish  May still get you the result sooner So long for DHTC! Unfortunately, those are the only two alternatives.
  39. 39. 2014 OSG User School DHTC and Overlays No shared file system • As a side effect, you cannot expect a globally shared file system - It's just “yet another user requested service” • You will have to deal with data explicitly - Either using the HTC scheduler capabilities (remember yesterday's lecture) - Or, embed file transfer to and from permanent storage into your own jobs • More details tomorrow
  40. 40. 2014 OSG User School DHTC and Overlays The location problem • Nodes in different locations need a way to talk to each other - This is what networks are for • If your computation is mostly about CPU cycles, with little input and output data - Node location is not an issue at all • If you have lots of data - Remember, throughput is typically inversely proportional with the distance
  41. 41. 2014 OSG User School DHTC and Overlays The data problem • Transferring large amounts of data over Wide Area Network can take a lot of time • You should try to compute close to the data source and/or sink - Network-wise - More about this tomorrow
  42. 42. 2014 OSG User School DHTC and Overlays Bottom line • For simple computation, DHTC is very similar to HTC • As soon as you require any complex setup for your jobs, you are in for a rough ride - This includes large datasets
  43. 43. 2014 OSG User School DHTC and Overlays Infrastructure considerations • DHTC is likely to give you access to many more compute slots • Which is mostly a good thing - You get your results faster • But could crash your HTC system, e.g. - Can your storage system handle more data traffic? - Can the job scheduling system handle an order of magnitude more nodes?
  44. 44. 2014 OSG User School DHTC and Overlays Infrastructure considerations • DHTC is likely to give you access to many more compute slots • Which is mostly a good thing - You get your results faster • But could crash your HTC system, e.g. - Can your storage system handle more data traffic? - Can the job scheduling system handle an order of magnitude more nodes? Hopefully not something final users should deal with but it is good to keep in mind.
  45. 45. 2014 OSG User School DHTC and Overlays Questions? • Questions? Comments? - Feel free to ask me questions later: Igor Sfiligoi <isfiligoi@ucsd.edu> • Upcoming sessions - Where to get the needed resources - glideinWMS – the OSG overlay software - Hands-on exercises - Tour
  46. 46. 2014 OSG User School DHTC and Overlays Beware the power Courtesy of fanpop.com
  47. 47. 2014 OSG User School DHTC and Overlays Copyright statement • Some images contained in this presentation are the copyrighted property of ToonClipart. • As such, these images are being used under a license from said entities, and may not be copied or downloaded without explicit permission from ToonClipart.

×