
High Performance Computing in Our Everydays

Transcript of "High Performance Computing in Our Everydays"

  1. High Performance Computing in Our Everydays
       Peter Wittek, Swedish School of Library and Information Science, University of Borås
       10/10/11
  2. Outline
       1. What Is New in HPC?
       2. Supporting Frameworks
       3. Computational Requirements of Digital Libraries
       4. A Workflow in Cloud HPC
       5. Experimental Results
       6. Open Issues
       7. Conclusions
  3. Cloud HPC (What Is New in HPC?)
       • Cloud computing: think of it as a utility
       • E.g., you get to use 10 small computer instances for $0.82 an hour
       • Your computer instances do not necessarily correspond to actual computers
       • Virtualization
       • Demo: ReactOS
       • Latest contestant in cloud computing: HPC
       • Not ordinary computer instances
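The pricing quoted on the Cloud HPC slide can be turned into a rough cost estimate. This is a minimal sketch that assumes pricing scales linearly with instance count and hours; the function names and the 20-instance, 3-hour job are hypothetical, and real providers may round usage up or charge for storage and transfer separately.

```python
from decimal import Decimal

# Rate quoted on the slide: 10 small instances for $0.82 an hour.
QUOTED_RATE = Decimal("0.82")
QUOTED_INSTANCES = 10


def hourly_rate_per_instance():
    """Price of one instance for one hour, derived from the quoted bundle."""
    return QUOTED_RATE / QUOTED_INSTANCES


def job_cost(instances, hours):
    """Estimated cost in USD, assuming strictly linear pricing (an assumption)."""
    return hourly_rate_per_instance() * instances * hours


# Hypothetical job: 20 instances running for 3 hours.
cost = job_cost(instances=20, hours=3)
print(f"Estimated cost: ${cost}")
```

Decimal arithmetic is used here so that the currency values are not distorted by binary floating-point rounding.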
  4. Massive Parallelism (What Is New in HPC?)
       [Figure: Floating-Point Operations per Second for the CPU and GPU]
  5. Massive Parallelism (What Is New in HPC?)
       [Figure: CPU and GPU architecture diagrams; labels: Control, ALU, Cache, DRAM]
       • Streaming hardware
       • Explicit memory management
  6. Massive Parallelism (What Is New in HPC?)
       • Parallel versus distributed computing
       • Distributed nodes do not share the memory:
           • Connected through network;
           • Calculations may run in a parallel fashion;
           • Other nodes do not see what one node has computed;
           • Nodes may fail.
  7. Why You Should Care (What Is New in HPC?)
       • Digital libraries and HPC?
       • No need for upfront investment;
       • Go beyond full-text search;
       • Machine learning;
       • Pattern matching;
       • Social media and graph mining;
       • You can define a new field
       • Freedom
  8. Why Is Distributed Computing Hard? (Supporting Frameworks)
       • Take an example: creating an inverted index
       • An inverted index is at the core of search engines
       • A simple example:
           term1: (doc1,freq11), (doc5,freq51)
           term2: (doc1,freq12), (doc3,freq32), (doc6,freq62)
       • Naïve approach to parallelize:
           • Have an indexer at each node;
           • Distribute documents to nodes;
           • Let nodes broadcast the lists (Message Passing Interface, MPI).
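The naive scheme on the inverted-index slide can be sketched in a single process. This is only an illustration: the two "nodes" are simulated with ordinary function calls rather than MPI, tokenization is a plain whitespace split, and the function names and sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Index one node's documents: term -> list of (doc_id, frequency)."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        counts = Counter(docs[doc_id].lower().split())
        for term, freq in counts.items():
            index[term].append((doc_id, freq))
    return dict(index)


def merge_indexes(partials):
    """Combine the posting lists that each node would broadcast."""
    merged = defaultdict(list)
    for partial in partials:
        for term, postings in partial.items():
            merged[term].extend(postings)
    return {term: sorted(postings) for term, postings in merged.items()}


# Hypothetical sample collection.
docs = {"doc1": "cloud hpc cloud", "doc3": "hpc gpu", "doc5": "cloud gpu gpu"}

# Distribute documents to two simulated nodes, index locally, then merge.
node_a = build_index({"doc1": docs["doc1"], "doc3": docs["doc3"]})
node_b = build_index({"doc5": docs["doc5"]})
index = merge_indexes([node_a, node_b])

for term in sorted(index):
    print(term, index[term])
```

Even in this toy form, the merge step hints at the difficulty: every node has to exchange its lists with the others, and nothing here handles a node that fails halfway through.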
  9. MapReduce (Supporting Frameworks)
       • Published in 2004 by Google researchers
       • Since then it has become widespread in data-intensive processing
       • Core idea: keep things simple, you can do two things:
           • Map: Send out chunks of data and then do something on them
           • Reduce: Collect chunks of data and do something on them while collecting
       • Intermediate data structure: key-value pairs
       • The framework should also take care of the mundane tasks, such as failing nodes, network latency, etc.
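The map and reduce roles described on the MapReduce slide can be illustrated with a tiny in-memory stand-in. This is a sketch under strong assumptions: it runs sequentially in one process, it does none of the failure handling or data distribution a real framework provides, and `map_reduce`, `count_mapper`, and `count_reducer` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_reduce(chunks, mapper, reducer):
    """Run mapper over each chunk, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for chunk in chunks:
        # Map: each chunk yields intermediate key-value pairs.
        for key, value in mapper(chunk):
            groups[key].append(value)
    # Reduce: collapse the values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(chunk):
    """Emit (word, 1) for every word in a chunk of text."""
    for word in chunk.split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


totals = map_reduce(["a b a", "b c"], count_mapper, count_reducer)
print(totals)
```

The user only writes the mapper and the reducer; grouping by key sits between the two phases and is the part a framework would carry out across the network.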
  10. A MapReduce Inverted Indexer (Supporting Frameworks)
       • The task is: formulate your problem in MapReduce terms
       • Map: gets a chunk of text. Emits:
           • Key: term
           • Value: document id and corresponding frequency
       • Reduce: Merges by key
       • There might be a different number of map and reduce tasks
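The formulation on the inverted-indexer slide maps directly onto two small functions. The sketch below assumes each chunk is one whole document passed as a `(doc_id, text)` pair and simulates the framework's grouping step with a dictionary; `index_mapper`, `index_reducer`, and `run_indexer` are hypothetical names, not part of any particular MapReduce implementation.

```python
from collections import Counter, defaultdict


def index_mapper(chunk):
    """Map: emit key = term, value = (document id, frequency in that document)."""
    doc_id, text = chunk
    for term, freq in Counter(text.lower().split()).items():
        yield term, (doc_id, freq)


def index_reducer(term, postings):
    """Reduce: merge all postings emitted for one term into a sorted list."""
    return sorted(postings)


def run_indexer(chunks):
    """Sequential stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for chunk in chunks:
        for term, posting in index_mapper(chunk):
            grouped[term].append(posting)
    return {term: index_reducer(term, postings) for term, postings in grouped.items()}


# Hypothetical sample documents.
chunks = [("doc1", "map reduce map"), ("doc2", "reduce"), ("doc3", "map index")]
inverted = run_indexer(chunks)

for term in sorted(inverted):
    print(term, inverted[term])
```

Because the grouping is by term, the number of reduce tasks is independent of the number of map tasks, which matches the last point on the slide.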
  11. Another MapReduce Example (Supporting Frameworks)
       • Sometimes it is worth bypassing the reduce phase
       • Then we do not need to emit key-value pairs at all
       • Distributed GPU random projection
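A map-only job of the kind this slide mentions can be sketched with random projection. The version below is a CPU-only, pure-Python stand-in, not the GPU implementation the slide refers to: it assumes a dense Gaussian projection matrix regenerated from a shared seed on every node, and the function names, dimensions, and sample vectors are hypothetical.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Gaussian random matrix; a shared seed lets every node build the same one."""
    rng = random.Random(seed)
    scale = out_dim ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project_chunk(chunk, matrix):
    """Map task: project each vector in the chunk; nothing is emitted by key."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in matrix]
            for vec in chunk]


matrix = make_projection(in_dim=6, out_dim=2, seed=42)

# Two chunks of 6-dimensional vectors, as if sent to two different nodes.
chunks = [
    [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]],
    [[1, 1, 0, 0, 0, 0]],
]

# Each chunk is processed independently; there is no reduce phase.
projected = [project_chunk(chunk, matrix) for chunk in chunks]
print(projected)
```

Since each output vector depends only on its own input vector and the shared matrix, the results can be written out directly by the map tasks, which is why the reduce phase and the key-value pairs can be dropped.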
  12. Exploiting GPU Resources (Supporting Frameworks)
       • Low-level frameworks: CUDA and OpenCL
       • They certainly do not make GPUs much friendlier
       • Higher-level libraries: BLAS, cuSPARSE
       • As long as you know maths...
  13. Overcoming GPU Obstacles (Supporting Frameworks)
       • GPU MapReduce
       • Academic projects: Mars, GPMR
       • GPU-aware MapReduce: extend existing frameworks
       • Develop extensive middleware
  14. Digital Preservation (Computational Requirements of Digital Libraries)
       • Future-proofing document collections
       • Emulation
       • Migration
       • Workflows are often tremendously compute-intensive
  15. Machine Learning and Advanced Services (Computational Requirements of Digital Libraries)
       • Digital collections and social networks
       • A step towards digital curation
       • SaaS approach to digital curation
       • Indexing by Lucene/Nutch
       • Collection-level metadata extraction by Mahout
  16. A Middleware Architecture (A Workflow in Cloud HPC)
       [Figure: middleware architecture diagram; recoverable labels: Support Services (Document processes, Context search, Data mining), MapReduce Engine, Policy Enforcement, Archival Storage Interface, Middleware, Grid or Cloud Storage, Grid or Cloud Computing]
       • A middleware to make adoption by DL practitioners easier
       • Moving towards computational science
  17. Cost (Experimental Results)
       [Figure: Comparison of average cost of computations with different collection sizes. Y axis: Average Cost in USD (0 to 0.08); X axis: Number of Processing Cores (1, 4, 10, 20, 40, 80); series: collection sizes 100, 1000, 10000]
  18. Running time (Experimental Results)
       [Figure: Comparison of running times with different collection sizes. Y axis: Running Time (Mins) (0 to 8000); X axis: Number of Processing Cores (1, 4, 10, 20, 40, 80); series: collection sizes 100, 1000, 10000]
  19. Obstacles to Adoption (Open Issues)
       • Persistence and high-reliability MapReduce
       • Not just a technological issue
       • Service-level agreement
       • Particularly problematic
       • Another EU FP7 project working on it: SLA@SOI
       • Niche for alternative cloud providers
       • Difficulty of integration
  20. Acknowledgment (Conclusions)
       • Work has been funded by Sustaining Heritage Access through Multivalent ArchiviNg (SHAMAN), an EU FP7 large integrated project. http://shaman-ip.eu/shaman/
       • Additional funding has been received from Amazon Web Services. http://aws.amazon.com/
  21. Summary (Conclusions)
       • Cloud and HPC: a solution looking for a problem
       • Digital libraries
       • Computational requirements
       • Expertise
       • Complexity and integration
       • Contact: peterwittek@acm.org