High Performance Computing - Cloud Point of View

Slide deck from Moscow CloudCamp 2012

Transcript of "High Performance Computing - Cloud Point of View"

  1. High Performance Computing: Cloud Point of View
     Alexey Ragozin, alexey.ragozin@gmail.com, Apr 2012
  2. Massive parallel computing
     I/O bound workload
     • Data mining / machine learning / indexing
     • Focus: do not move data, process it in place
     CPU bound
     • Complex simulations / complex math models
     • Focus: keep all cores busy
     Latency bound
     • Physical process simulations (e.g. weather forecast)
     • Focus: minimize communication latencies
  3. CPU bound tasks
     Stream-like workload
     • Independent tasks
     • Random continuous stream of tasks
     • E.g. video conversion, crawling
     Structured batch jobs
     • A single batch is split into subtasks for parallel execution
     • Tasks may have data dependencies on each other
     • Tasks may be generated during batch execution
     • E.g. portfolio risk calculation
  4. Handling a task stream in the cloud
     Incoming tasks land in a task queue; workers in the pool poll tasks from
     the queue; a controller adjusts the pool size based on queue metrics.
     A simple pattern. Exploits the "elasticity" of the cloud. Cost effective.
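Below is a minimal, in-process Java sketch of this pattern. The queue, the scaling rule (one worker per 10 queued tasks, capped at 50), and the poll timeout are illustrative assumptions; in a real cloud deployment the queue would be a managed service (e.g. Amazon SQS or Azure Queue) and the controller would start and stop VM instances rather than threads.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ElasticWorkerPool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger desiredWorkers = new AtomicInteger(1);
    private final AtomicInteger liveWorkers = new AtomicInteger(0);

    public void submit(Runnable task) {
        queue.add(task);
    }

    // Controller: sample queue depth and grow the pool toward the target.
    // One worker per 10 queued tasks, capped at 50 (thresholds illustrative).
    public void controllerTick() {
        int target = Math.min(50, Math.max(1, queue.size() / 10));
        desiredWorkers.set(target);
        while (liveWorkers.get() < target) {
            liveWorkers.incrementAndGet();
            new Thread(this::workerLoop).start();
        }
    }

    // Worker: poll tasks; retire when the controller shrinks the pool.
    private void workerLoop() {
        try {
            while (true) {
                int live = liveWorkers.get();
                if (live > desiredWorkers.get()
                        && liveWorkers.compareAndSet(live, live - 1)) {
                    return; // pool shrunk by one; this worker retires
                }
                Runnable task = queue.poll(1, TimeUnit.SECONDS);
                if (task != null) task.run();
            }
        } catch (InterruptedException e) {
            liveWorkers.decrementAndGet();
        }
    }
}
```

Calling controllerTick() periodically (e.g. from a timer) is what gives the pool its elastic behavior: the worker count tracks queue depth in both directions.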
  5. Structured batch jobs in the cloud
     Batches are usually more sporadic, e.g. end-of-day risk calculations.
     Tasks may have cross dependencies, so the scheduler should be "cloud-aware".
     Supplying tasks with data:
     • data delivery delay is critical
     • the worker pool is generally very large
     • data sets can also be very large
  6. Data delivery strategy
     Push approach
     • the scheduler controls data delivery
     • a worker expects data to be available locally
     • more opportunities for optimization
     • complex
     Pull approach
     • a worker pulls the required data from a central service
     • the scheduler is unaware of data sets
     • requires a scalable data service
     • much simpler
  7. What kind of data do we have?
     Working set
     • the working set is divided between jobs
     • each portion of the working set is processed by a single job
     • jobs often produce the working set for the next computation stage
     Reference data
     • exactly the same data shared by multiple/all jobs
     • usually a static data set
  8. Data distribution problem
     Working set
     • Spiky workload, especially at the start
     • Hard to predict where each piece of data will be required
     • Caching is ineffective
     Reference data set
     • A naïve approach produces a huge volume of redundant transfers, so smart caching is required
     • Spiky workload
  9. Private grid practice
     Diagram: an HPC grid backed either by an RDBMS / data warehouse or by a data grid.
  10. Data grid, what is it?
      • Key/value storage
      • Data distributed across a cluster of servers
      • RAM is usually used as storage
      • Redundant copies provide a level of fault tolerance / durability
      • No single point of failure
      • Automatic rebalancing of data when servers are added to or removed from the grid
      • Capacity and throughput scale linearly
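The sketch below shows what consuming such a grid typically looks like. The DataGrid interface is a hypothetical composite of 2012-era products (Oracle Coherence, Hazelcast, GridGain and the like), not any vendor's actual API; the LocalGrid stand-in keeps the example runnable in a single process.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical data-grid client API (illustrative; not a real vendor API). */
interface DataGrid<K, V> {
    V get(K key);
    void put(K key, V value);
    Map<K, V> getAll(Set<K> keys);   // one network round trip for many keys
    void putAll(Map<K, V> entries);  // bulk load, e.g. a batch's working set
}

/** In-process stand-in; a real grid would partition entries across servers
 *  and keep redundant copies of each entry for fault tolerance. */
class LocalGrid<K, V> implements DataGrid<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
    public V get(K key) { return store.get(key); }
    public void put(K key, V value) { store.put(key, value); }
    public Map<K, V> getAll(Set<K> keys) {
        Map<K, V> result = new HashMap<>();
        for (K k : keys) {
            V v = store.get(k);
            if (v != null) result.put(k, v);
        }
        return result;
    }
    public void putAll(Map<K, V> entries) { store.putAll(entries); }
}
```

Typical HPC usage: the scheduler bulk-loads the working set with putAll before the batch starts, and each worker fetches only the keys assigned to its tasks with getAll.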
  11. Data services for cloud HPC
      • Block storage service: Azure Drive / Amazon EBS
        – lack of shared access to data
      • Key/value storage: Azure Tables / Amazon SimpleDB
        – pricing: volume + usage
      • Blob store: Azure Blobs / Amazon S3
        – pricing: volume + transactions
        – good read scalability
  12. Use cases for caching
      Avoid storing data in the cloud
      • Upload data once per batch and cache it in the cloud
      Reduce storage cost by reducing the number of operations
      Save I/O bandwidth for shared data
      • Edge caching
      • Routing overlays
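For the reference-data case, the caching itself can be as simple as cache-aside: check local RAM first and hit the paid-per-transaction store only on a miss. A minimal Java sketch, where the fetch function (e.g. a blob download) is supplied by the caller:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside for shared reference data: fetch from the (paid-per-
 *  transaction) remote store only on a miss; later tasks on the same
 *  worker are served from local RAM. The fetch function is assumed to
 *  wrap whatever storage service is in use. */
class ReferenceDataCache {
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> fetch; // e.g. key -> blob download

    ReferenceDataCache(Function<String, byte[]> fetch) {
        this.fetch = fetch;
    }

    byte[] get(String key) {
        // computeIfAbsent: at most one concurrent download per key per worker
        return cache.computeIfAbsent(key, fetch);
    }
}
```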
  13. Routing overlays
      • Each node knows (communicates with) only a subset of the network.
      • A request to an unknown node is routed via the neighbor closest to the destination.
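The heart of such an overlay is the greedy routing step: among the neighbors a node knows, forward the request to the one whose ID is closest to the destination. The sketch below uses a Kademlia-style XOR distance purely as an example metric; real overlays differ in how node IDs and neighbor tables are chosen.

```java
import java.util.List;

/** One greedy routing step in a structured overlay: among the neighbors
 *  this node knows, pick the one closest to the destination ID. */
class OverlayNode {
    final long id;
    final List<OverlayNode> neighbors; // only a small subset of the network

    OverlayNode(long id, List<OverlayNode> neighbors) {
        this.id = id;
        this.neighbors = neighbors;
    }

    /** Next hop for a request addressed to destId. */
    OverlayNode nextHop(long destId) {
        OverlayNode best = this;
        long bestDist = id ^ destId; // XOR distance (Kademlia-style example)
        for (OverlayNode n : neighbors) {
            long d = n.id ^ destId;
            if (Long.compareUnsigned(d, bestDist) < 0) {
                best = n;
                bestDist = d;
            }
        }
        return best; // == this means we are the closest node we know of
    }
}
```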
  14. Task stealing
      Task stealing is an alternative scheduling approach.
      Task stealing is widely used for in-process multi-core concurrency.
      Why use it for cluster task scheduling?
      • Stochastic and adaptive
      • Can use cost models that account for internal cloud topology
      • Decently solves the data delivery problem without additional caching
      • Unproven for cluster computation, so far
  15. Task stealing
      Worker 1:
      • The work backlog is organized as a stack
      • Tasks are generated recursively (fork)
      • Top of stack: fine-grained tasks
      • Bottom of stack: coarse-grained tasks
      • Execution takes from the top of the stack
      • Stealing takes from the bottom of the stack
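In-process, this is exactly the discipline implemented by Java's ForkJoinPool: forked subtasks go onto the owning worker's deque, the owner takes fine-grained work from one end, and idle workers steal coarse-grained work from the other. A standard recursive-sum example:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Recursive array sum on ForkJoinPool. fork() pushes the subtask onto this
 *  worker's deque; idle workers steal from the opposite, coarse-grained end. */
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {          // fine-grained: compute directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                           // push left half; stealable
        return right.compute() + left.join();  // work on right half ourselves
    }
}

// Usage: long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
```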
  16. Task stealing
      Diagram: Worker 1 forks tasks onto its stack and processes them from the
      top; Worker 2 steals a coarse-grained task from the bottom of Worker 1's
      stack, processes it, and reports it done.
  17. I/O bound workload in the cloud
      Dawn of Map/Reduce:
      • high-bandwidth interconnects are expensive
      • network storage is expensive
      • cheap servers and local processing keep costs low
      "Cloud" reality:
      • network bandwidth is cheap
      • disks are already "networked"
      • RAM is abundant
  18. Hadoop is cloud unfriendly
      Assume I have a 50-node Hadoop cluster in the cloud.
      What will I gain by adding another 50 nodes?
      • Not much, until they are populated with data.
      What if I shut these 50 down afterward?
      • The effort to populate them with data will be wasted.
      Hadoop couples execution and storage services together: you have to pay
      for both even if you need only one.
  19. How should cloud M/R look?
      • Use a cloud storage service as persistent storage
      • Streaming M/R processing
      • Aggressive use of memory for intermediate data
      Peregrine, a storeless M/R framework: http://peregrine_mapreduce.bitbucket.org/
      Spark, an in-memory M/R framework: http://www.spark-project.org/
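As a toy illustration of the in-memory direction (not Peregrine's or Spark's actual API), here is a word count in plain Java where the map output and the shuffle groupings live entirely in RAM rather than in intermediate files:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy in-memory map/reduce (word count): map emits words, shuffle groups
 *  them by key, reduce counts each group - all in RAM, no intermediate
 *  files. Illustrative only. */
class InMemoryWordCount {
    static Map<String, Long> count(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // map phase
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(                     // shuffle + reduce
                        w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Prints be=2, not=1, or=1, to=2 (map iteration order may vary)
        System.out.println(count(Stream.of("to be or not to be")));
    }
}
```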
  20. Looking into the future
      Highly anticipated features:
      • Scheduler as a Service (e.g. Azure HPC)
      • Simple middleware for organizing caches and routing overlays;
        existing solutions are far from simple
      • Cloud-friendly map/reduce frameworks
  21. Thank you
      http://aragozin.blogspot.com - my articles
      Alexey Ragozin, alexey.ragozin@gmail.com