libHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservation Presentation Transcript

  • 1. libHPC: Software sustainability and reuse through metadata preservation
    Jeremy Cohen, John Darlington, Brian Fuchs
    London e-Science Centre / Department of Computing, Imperial College London
    David Moxey, Chris Cantwell, Pavel Burovskiy, Spencer Sherwin
    Department of Aeronautics, Imperial College London
    Neil Chue Hong
    Software Sustainability Institute, University of Edinburgh
    First Workshop on Maintainable Software Practices in e-Science, Chicago
    Tuesday 9th October 2012
  • 2. Introduction
    •  Decision making – building scientific software can be hard
    •  Abstraction – hide the complexity
    •  Efficiency – achieve the performance
    •  Aim for a universal technology that spans all application domains, machines, metrics
    •  Coordination forms – a different approach to task specification
    •  Components – encapsulated building blocks
    [Figure: three axes. Applications: Num. Intensive, Data Intensive, Bioinformatics, CFD. Machines: Cluster, Cloud, Multi-core, GPU, FPGA. Metrics: Cost, Time, Energy.]
  • 3. Information and decisions
    Why is software development and re-use hard?
    •  A particular piece of code is the result of many development decisions
    •  Developers invest significant knowledge about the task to be solved
    …however…
    •  Decisions made by developers cannot be reconstructed from the code
    •  Loss of original information and structure invested by developer(s)
  • 4. Information and decisions
    Understanding code structure, the options available and the decisions made during development is important for:
    •  Portability; optimisation on different architectures
    •  Long-term sustainability
    Need an explicit representation of decisions and alternatives:
    •  Decision tree used to represent this (structure)
    •  Metadata used to annotate decision tree (information)
    •  Modifications can be made to the decision tree (based on metadata analysis), which can then be mapped to modified code
  • 5. Information and decisions
    e.g. code that uses a solver:
    •  Many options to select suitable solver – abstract components
    •  Choice dependent on problem being addressed, parameters, etc.
    •  Represent solver choice on a tree of component alternatives: leaf nodes are concrete implementations, higher-level nodes are abstract
    [Figure: tree of solver alternatives; nodes are labelled with Matrix and Vector inputs and a Vector output]
    Linear Solver
      LU
        Sequential LU
        Parallel LU (OpenMP)
        Parallel LU (MPI)
      Jacobi
        Sequential Jacobi
        Parallel Jacobi (UPC)
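The solver tree on this slide can be sketched in Python as a small data structure. This is an illustration only, not libHPC code: the `Node` class, the `select` helper and the `model` metadata key are hypothetical names chosen for the example, and only the node names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    metadata: dict = field(default_factory=dict)   # hypothetical annotations
    children: list = field(default_factory=list)   # empty list => concrete leaf

    def leaves(self):
        """Return all concrete implementations below this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Abstract nodes at the top, concrete implementations at the leaves.
tree = Node("Linear Solver", children=[
    Node("LU", children=[
        Node("Sequential LU", {"model": "sequential"}),
        Node("Parallel LU (OpenMP)", {"model": "OpenMP"}),
        Node("Parallel LU (MPI)", {"model": "MPI"}),
    ]),
    Node("Jacobi", children=[
        Node("Sequential Jacobi", {"model": "sequential"}),
        Node("Parallel Jacobi (UPC)", {"model": "UPC"}),
    ]),
])


def select(node, **required):
    """Return names of concrete implementations whose metadata matches."""
    return [leaf.name for leaf in node.leaves()
            if all(leaf.metadata.get(k) == v for k, v in required.items())]


sequential_options = select(tree, model="sequential")
mpi_options = select(tree, model="MPI")
```

Keeping the alternatives and their annotations in one structure is what would let a tool revisit a decision later, for example when moving to a machine where only some implementations apply.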
  • 6. Abstractions
    → Encapsulation
       Encapsulate functions as components (reuse)
       Allow alternatives
    → Functional properties
       Referentially transparent → Encapsulation
       Church-Rosser → Alternative behaviours
  • 7. Abstractions – alternative behaviours
    i.e. Church-Rosser: either reduction order reaches the same result
    (4 + 3) – (2 + 1)  →  7 – (2 + 1)  →  7 – 3  →  4
    (4 + 3) – (2 + 1)  →  (4 + 3) – 3  →  7 – 3  →  4
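The two reduction paths on this slide can be reproduced with a short Python sketch. The tuple encoding of expressions and the `step` and `normal_form` helpers are assumptions made for illustration; they are not part of the presentation.

```python
from operator import add, sub

# An expression is either a number or a tuple (op, left, right).
expr = (sub, (add, 4, 3), (add, 2, 1))


def step(e, left_first):
    """Perform one reduction step, choosing which side to reduce first."""
    if not isinstance(e, tuple):
        return e
    op, left, right = e
    left_done = not isinstance(left, tuple)
    right_done = not isinstance(right, tuple)
    if left_done and right_done:
        return op(left, right)
    if (left_first and not left_done) or right_done:
        return (op, step(left, left_first), right)
    return (op, left, step(right, left_first))


def normal_form(e, left_first):
    """Keep reducing until only a number remains."""
    while isinstance(e, tuple):
        e = step(e, left_first)
    return e


left_result = normal_form(expr, left_first=True)    # reduces (4 + 3) first
right_result = normal_form(expr, left_first=False)  # reduces (2 + 1) first
```

Because the components have no side effects, the order of evaluation is free, which is the property that allows independent sub-expressions to be run in parallel.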
  • 8. Application flow and specification
    We represent application elements using two techniques:
    •  Data processing – core code that forms application building blocks
       → Components (first-order functions)
    •  Control flow, orchestration
       → Higher-order functions
       → Coordination Forms, e.g. Pipe, Parallel, Map / Reduce, …
  • 9. Coordination Forms
    •  A functional/mathematical approach to job specification
    •  Based on work by Darlington, et al.
       J. Darlington, Y. Guo, H. W. To and J. Yang. Functional skeletons for parallel coordination. In Proceedings of EURO-PAR ’95 Parallel Processing, LNCS 966/1995, pp. 55-66, 1995. Springer Berlin/Heidelberg.
    •  Applied to components – define application flow
    •  May be:
       •  General – applicable to most applications – e.g. PIPE, PAR
       •  Iterative patterns – e.g. FARM, ITERATE
       •  Domain-specific higher-level forms – e.g. Monte Carlo
       •  Extensible – new patterns can be introduced
  • 10. Coordination Forms
    •  A given form may have multiple underlying implementations
       •  E.g. PAR may provide sequential, multi-threaded and MPI parallel implementations
    •  Forms aim to be as lightweight as possible
       •  They result in code that can be run
       •  They intelligently glue together component building blocks
    •  PIPE as an example – functions f1 to fn with initial input a:
       PIPE [ f1, f2, …, fn ] a = (f1 ∘ f2 ∘ … ∘ fn) a = f1(f2(…(fn(a))))
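A minimal sketch of one form with several underlying implementations, assuming a `PAR` that takes a list of components and a matching list of argument tuples. The `implementation` keyword and the `IMPLEMENTATIONS` registry are hypothetical, and an MPI variant is omitted to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def par_sequential(components, inputs):
    """Run each component on its own argument tuple, one after another."""
    return [f(*args) for f, args in zip(components, inputs)]


def par_threaded(components, inputs):
    """Run each component on its own argument tuple in a thread pool."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, *args) for f, args in zip(components, inputs)]
        return [future.result() for future in futures]


# Hypothetical registry: the caller names a form, the framework picks the code.
IMPLEMENTATIONS = {"sequential": par_sequential, "threads": par_threaded}


def PAR(components, inputs, implementation="sequential"):
    """Dispatch to the selected implementation; results keep input order."""
    return IMPLEMENTATIONS[implementation](components, inputs)
```

Both variants return results in the order of the component list, so a job specification written against `PAR` does not change when the implementation is swapped.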
  • 11. Coordination Forms – Implementation
    •  Prototype implementation in Python
    •  Class wrappers for component and parameter metadata – concrete implementation code selectable
    PIPE – Compose a series of components in the order specified
        PIPE([component list], initial input)
        Additional parameters can be added in component list
    PAR – Run a series of components independently (perhaps in parallel)
        PAR([component list], [(input1), (input2), …, (inputn)])
    E.g. for components add, multiply, divide: 2 * ( (245+34) / (6+8) )
        PIPE([(multiply, 2), divide, PAR([add,add],[(245,34),(6,8)])])
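The arithmetic example can be approximated with the sketch below. It is not the libHPC prototype and rests on several assumptions: components are applied right to left as in the PIPE definition, a tuple entry such as `(multiply, 2)` appends its extra parameters after the piped value, a tuple result is unpacked into positional arguments, and the PAR result is passed as the initial input instead of appearing inside the component list.

```python
from operator import add, mul as multiply, truediv as divide


def PAR(components, inputs):
    """Run each component on its own argument tuple (sequentially here)."""
    return tuple(f(*args) for f, args in zip(components, inputs))


def PIPE(components, initial_input):
    """Apply components right to left: PIPE([f1, f2], a) == f1(f2(a))."""
    value = initial_input
    for entry in reversed(components):
        # Assumption: a tuple entry is (component, extra parameters...).
        if isinstance(entry, tuple):
            func, extra = entry[0], entry[1:]
        else:
            func, extra = entry, ()
        # Assumption: a tuple value is unpacked into positional arguments.
        piped = value if isinstance(value, tuple) else (value,)
        value = func(*piped, *extra)
    return value


sums = PAR([add, add], [(245, 34), (6, 8)])     # the two independent additions
result = PIPE([(multiply, 2), divide], sums)    # 2 * ((245 + 34) / (6 + 8))
```

Wrapping each component with metadata, as the slide describes, would let the framework choose a concrete implementation behind each name without changing this specification.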
  • 12. Bioinformatics: Genome Read Pre-Processing/Mapping
    Input files:
    •  Short Read Set (Paired) – single FASTQ file – reads from sequencing machine
    •  Reference Genome – FASTA file
    [Figure: workflow. FASTQ split produces SR_1 and SR_2; bwa index produces FASTA file + index file; bwa aln runs on each of SR_1 and SR_2; bwa sampe generates the alignment (paired ended) as a SAM file; samtools import produces a BAM file; samtools sort produces a sorted BAM file; samtools index produces the OUTPUT]
    ((sr1,sr2), u) = PAR([fastq_split, bwa_index],
                         [(short_read_file, None, None), (ref_genome_file,)])
    (v, w) = PAR([bwa_aln, bwa_aln],
                 [(ref_genome_file, sr1, None),
                  (ref_genome_file, sr2, None)])
    result = PIPE([samtools_index, samtools_sort,
                   (samtools_import, ref_genome_file),
                   bwa_sampe],
                  [ref_genome_file, [v,w], [sr1, sr2], None])
  • 13. LibHPC Project
    •  LibHPC
       •  Two year project under EPSRC HPC Software Programme
       •  Imperial College London (Computing (LeSC), Aeronautics, ICT)
       •  SSI, Edinburgh
    •  Implementing/demonstrating framework with main supporting application (Nektar++) + other exemplars
  • 14. [Figure: framework architecture diagram. Recoverable labels: Example; High-level Application Description / Job Specification (Co-ordination Forms, DSLs, etc.); Job Specification Analysis/Processing; Optimising; Software Component Library & Metadata; Resource Discovery & Metadata; Domain-specific Application Support Libraries; FEM Codes; Hardware Resources]
  • 15. Nektar++ - Hybrid Assembly
    •  Nektar++ operates on matrices based on input mesh
    •  Each element of input mesh is mapped to an (elemental) matrix
    •  There are two matrix assembly strategies:
       •  Local
       •  Global
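The difference between the two strategies can be illustrated with a toy finite element example. This is a generic sketch, not Nektar++ code: the two-element 1D mesh, the elemental matrices and the names `global_apply` and `local_apply` are all assumptions chosen for the example. It applies the same operator to a vector either through one assembled global matrix or element by element.

```python
# Two 1D elements sharing one node: global degrees of freedom 0-1-2.
local_to_global = [[0, 1], [1, 2]]

# Illustrative elemental matrices (1D stiffness matrices for unit-length elements).
elemental = [[[1.0, -1.0], [-1.0, 1.0]],
             [[1.0, -1.0], [-1.0, 1.0]]]
n_global = 3


def global_apply(x):
    """Global strategy: assemble one global matrix, then multiply."""
    A = [[0.0] * n_global for _ in range(n_global)]
    for mat, dofs in zip(elemental, local_to_global):
        for i, gi in enumerate(dofs):
            for j, gj in enumerate(dofs):
                A[gi][gj] += mat[i][j]
    return [sum(A[i][j] * x[j] for j in range(n_global)) for i in range(n_global)]


def local_apply(x):
    """Local strategy: multiply element by element, then scatter-add."""
    y = [0.0] * n_global
    for mat, dofs in zip(elemental, local_to_global):
        x_local = [x[g] for g in dofs]
        for i, gi in enumerate(dofs):
            y[gi] += sum(mat[i][j] * x_local[j] for j in range(len(dofs)))
    return y


x = [1.0, 2.0, 4.0]
y_global = global_apply(x)
y_local = local_apply(x)
```

Both routes give the same vector, so the choice between them is a performance decision of the kind the decision tree and its metadata are meant to record.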
  • 16. Nektar++ - Hybrid Assembly
    [Figure: matrix diagrams for Local Assembly and Global Assembly]
  • 17. Nektar++ - Hybrid Assembly
    [Figure: matrix diagram for Hybrid Assembly]
  • 18. Thank You