General Introduction to technologies that will be seen in the school
Published in: Education

  • Yellow – gLite, Green – externally supported components, gLite consortium
  • Transcript

    • 1. Introduction to Themes and Technologies
      Per Öster
      <per.oster@csc.fi>
      CSC – IT Center for Science Ltd
      Finland
    • 2. CSC at a glance
      • Founded in 1970 as a technical support unit for Univac 1108
      • Reorganized as a company, CSC - Scientific Computing Ltd., in 1993
      • All shares to the Ministry of Education of Finland in 1997
      • Operates on a non-profit principle
      • Facilities in Espoo, close to the Otaniemi community (of 15,000 students and 16,000 technology professionals)
      • Staff 170
      • Turnover 2008: 19.6 million euros
    • Themes of the First Week
    • 9. Themes of the Second Week
    • 10. The Acronyms
    • 11. Principles of service-oriented architecture
      Principles of high-throughput computing
      Principles of distributed data management
      Principles of job submission and execution management
      Principles of using distributed and high performance systems
      Higher level APIs: OGSA-DAI, SAGA and metadata management
      Workflows
    • 13. 1. Principles of job submission and execution management
      Vision
      UNiform Interface to COmputing REsources (UNICORE)
      seamless, secure, and intuitive
      History
      08/1997 – 12/2002: UNICORE and UNICORE Plus projects
      Initial development started in two German projects funded by the German ministry of education and research (BMBF)
      Continuation in different EU projects since 2002
      Open Source community development since summer 2004
    • 14. http://www.unicore.eu
      UNICORE 6 Guiding Principles, Implementation Strategies
      Open source under BSD license with software hosted on SourceForge
      Standards-based: OGSA-conformant, WS-RF 1.2 compliant
      Open, extensible Service-Oriented Architecture (SOA)
      Interoperable with other Grid technologies
      Seamless, secure and intuitive following a vertical end-to-end approach
      Mature Security: X.509, proxy and VO support
      Workflow support tightly integrated while being extensible for different workflow languages and engines for domain-specific usage
      Application integration mechanisms on the client, services and resource level
      Variety of clients: graphical, command-line, API, portal, etc.
      Quick and simple installation and configuration
      Support for many operating systems (Windows, MacOS, Linux, UNIX) and batch systems (LoadLeveler, Torque, SLURM, LSF, OpenCCS)
      Implemented in Java to achieve platform-independence
    • 15. UNICORE architecture (diagram; labels recovered from the figure)
      Scientific clients and applications: URC (Eclipse-based rich client), HiLA (programming API), UCC (command-line client), Portal (e.g. GridSphere)
      Web service stack: X.509, Proxies, SOAP, WS-RF, WS-I, JSDL
      Gateway (authentication): central Gateway, Gateway – Site 1, Gateway – Site 2
      Central services running in WS-RF hosting environments: Service Registry, Workflow Engine, Service Orchestrator, CIS Info Service (OGSA-RUS, UR, GLUE 2.0)
      Grid services hosting (per site): UNICORE WS-RF hosting environment, UNICORE Atomic Services, OGSA-* (OGSA-ByteIO, OGSA-BES, JSDL, HPC-P, OGSA-RUS, UR), UVOS VO Service
      Job incarnation (per site): XNJS, IDB
      Authorization (per site): XUUDB, XACML entity (X.509, XACML, SAML, Proxies)
      Target System Interface (per site): DRMAA, Local RMS (e.g. Torque, LL, LSF, etc.), USpace
      Data transfer to external storages: External Storage (GridFTP, Proxies)
      http://www.unicore.eu
    • 16. http://www.unicore.eu
      Workflows in UNICORE
      Two-layer architecture for scalability
      Workflow engine
      Based on the Shark open-source XPDL engine
      Pluggable, domain-specific workflow languages
      Service orchestrator
      Job execution and monitoring
      Callback to workflow engine
      Brokering based on pluggable strategies
      Clients
      GUI client based on Eclipse
      Command-line submission of workflows is also possible
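The two-layer split described on this slide can be sketched roughly as follows. This is only an illustration of the idea (a workflow engine that hands jobs to a service orchestrator, which brokers them with a pluggable strategy and calls back when they finish). All class and function names are hypothetical and are not the UNICORE API; job execution is simulated.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout; this is NOT the UNICORE API.
Job = Dict[str, str]
BrokerStrategy = Callable[[Job, List[str]], str]


def round_robin_strategy() -> BrokerStrategy:
    """Return a brokering strategy that cycles through the available sites."""
    counter = {"next": 0}

    def choose(job: Job, sites: List[str]) -> str:
        site = sites[counter["next"] % len(sites)]
        counter["next"] += 1
        return site

    return choose


class ServiceOrchestrator:
    """Lower layer: picks a site, runs the job, reports back via callback."""

    def __init__(self, sites: List[str], strategy: BrokerStrategy):
        self.sites = sites
        self.strategy = strategy  # pluggable brokering strategy

    def submit(self, job: Job, on_done: Callable[[Job, str], None]) -> None:
        site = self.strategy(job, self.sites)
        # A real orchestrator would execute and monitor the job remotely;
        # here execution is simulated and the callback fires immediately.
        on_done(job, site)


class WorkflowEngine:
    """Upper layer: walks the workflow and hands jobs to the orchestrator."""

    def __init__(self, orchestrator: ServiceOrchestrator):
        self.orchestrator = orchestrator
        self.log: List[str] = []

    def _job_finished(self, job: Job, site: str) -> None:
        # Callback from the orchestrator to the workflow engine.
        self.log.append(f"{job['name']} finished on {site}")

    def run(self, workflow: List[Job]) -> List[str]:
        for job in workflow:  # strictly sequential workflow, for simplicity
            self.orchestrator.submit(job, self._job_finished)
        return self.log


engine = WorkflowEngine(
    ServiceOrchestrator(["site1", "site2"], round_robin_strategy())
)
result = engine.run([{"name": "prepare"}, {"name": "simulate"}, {"name": "analyse"}])
```

Swapping `round_robin_strategy` for another function with the same signature changes the brokering without touching either layer, which is the point of keeping the strategy pluggable.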
    • 18. High-Throughput Computing
      Large amount of tasks that can be executed independently
      Parameter Studies
      Monte Carlo or Stochastic Methods
      Genome Sequencing (matching)
      Analysis of LHC data
      :
      Starting from this
      Looking for this
      (1 in 10^13)
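As a small illustration of "a large amount of tasks that can be executed independently", the sketch below runs several Monte Carlo estimates of pi, each with its own seed and no communication between tasks. It is a toy under stated assumptions: the function names are made up for this example, and a local thread pool merely stands in for the pool of machines a high-throughput system would use.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def estimate_pi(seed: int, samples: int = 20_000) -> float:
    """One independent task: a Monte Carlo estimate of pi from its own seed."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_independent_tasks(seeds: Iterable[int]) -> List[float]:
    """Farm the tasks out to workers; order of completion does not matter."""
    # The thread pool is only a stand-in for remote workers. In CPython it
    # gives no CPU speed-up, but the task structure is the same.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(estimate_pi, seeds))


estimates = run_independent_tasks(range(8))
mean_estimate = sum(estimates) / len(estimates)
```

Because each task depends only on its seed, any task can be rerun alone on any machine, which is what makes this kind of workload a good fit for high-throughput systems.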
    • 19. 2. Principles of high-throughput computing
      Vision
      Condor provides high-throughput computing in a variety of environments
      Local dedicated clusters (machine rooms)
      Local opportunistic (desktop) computers
      Grid environments: can submit jobs to other systems
      Can run workflows of jobs
      Can run parallel jobs
      Independently parallel (lots of single jobs)
      Tightly coupled (such as MPI)
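For the "lots of single jobs" case, Condor jobs are described in a submit description file. The helper below is a minimal sketch that only generates such a file as text. The function name and the file names (`analyse.sh`, `out.*`, `err.*`, `jobs.log`) are assumptions for illustration, and nothing is actually submitted.

```python
def make_submit_description(executable: str, n_jobs: int) -> str:
    """Build the text of a Condor submit description file for n_jobs runs.

    $(Process) is expanded by Condor to 0..n_jobs-1, so every job gets its
    own argument and its own output and error files.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",  # hypothetical program name
        "arguments = $(Process)",
        "output = out.$(Process)",
        "error = err.$(Process)",
        "log = jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


submit_text = make_submit_description("analyse.sh", 100)
```

The resulting text would typically be saved to a file and handed to `condor_submit`.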
    • 20. 2. Principles of high-throughput computing
      History and Activity
      Established in 1985
      Distributed computing research performed by a team of ~35 faculty, full-time staff and students who:
      face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment,
      are involved in national and international collaborations,
      interact with users in academia and industry,
      maintain and support a distributed production environment (more than 5000 CPUs at UW),
      educate and train students.
    • 21. Condor Project: Main Threads of Activities
      Distributed Computing Research – develop and evaluate new concepts, frameworks and technologies
      Develop and maintain Condor; support our users
      More on next slide
      The Open Science Grid (OSG) – build and operate a national High Throughput Computing infrastructure
      The Grid Laboratory Of Wisconsin (GLOW) – build, maintain and operate a distributed computing and storage infrastructure on the UW campus
      The NSF Middleware Initiative (NMI) - Develop, build and operate a national Build and Test facility powered by Metronome (ETICS-II)
    • 23. Web Services
      Technologies shown: XML, DCE, RPC, DCOM, RMI, CORBA
      “Web services has dramatically reduced the programming and management cost of publishing and receiving information”
      Jim Gray, Microsoft Research
      EMBRACE – 4yr EU project to establish services for the bioinformatics community
    • 24. 3. Principles of service-oriented architectures
      Vision
      Provide the fundamental components to get the grid working
      History
      Starting point in I-WAY, a distributed high-performance network demonstrated at the SuperComputing '95 conference and exhibition
    • 25. …14 Years Later
      4 major versions
      Components to address the original problems
      Many new fields
      recent hot topics: service oriented science, virtualization
      Diverse application areas
      recently: lots of bioinformatics and medical apps
      others include: earthquakes, particle physics, earth sciences
    • 26. Globus Software now – many components (diagram; labels recovered from the figure)
      Globus Projects: OGSA-DAI, GT4, MPICH-G2, Data Rep, Replica Location, Java Runtime, MyProxy, Delegation, GridWay, GridFTP, MDS4, CAS, C Runtime, GSI-OpenSSH, Incubator Mgmt, Reliable File Transfer, GRAM, Python Runtime, C Sec, GT4 Docs
      Incubator Projects: Cog WF, GAARDS, VirtWkSp, MEDICUS, Metrics, OGRO, GDTE, UGP, GridShib, Dyn Acct, Gavia JSC, DDM, LRMA, HOC-SA, PURSE, Introduce, WEEP, Gavia MS, SGGC, ServMark, Others...
      Component categories: Security, Execution Mgmt, Info Services, Common Runtime, Data Mgmt, Other
    • 28. 4. Principles of distributed data management
    • 29. EGEE Project Overview
      17000 users
      136000 LCPUs (cores)
      25 PB disk
      39 PB tape
      12 million jobs/month (+45% in a year)
      268 sites (+5% in a year)
      48 countries (+10% in a year)
      162 VOs (+29% in a year)
      Technical Status - Steven Newhouse - EGEE-III First Review 24-25 June 2009
    • 30. Middleware Supporting HTC
      Archeology
      Astronomy
      Astrophysics
      Civil Protection
      Comp. Chemistry
      Earth Sciences
      Finance
      Fusion
      Geophysics
      High Energy Physics
      Life Sciences
      Multimedia
      Material Sciences
      History of gLite
      • Development started in 2004
      • Entered production in May 2006
      • Middleware distribution of EGEE
      Supported End-user Activity
      • 13,000 end-users in 112 VOs
      • +44% users in a year
      • 23 core VOs
      • A core VO has >10% of usage within its science cluster
    • gLite Middleware (diagram; labels recovered from the figure)
      Legend: EGEE Maintained Components, External Components
      Service groups: User Access, Information Services, General Services, Security Services, Compute Element, Storage Element, Physical Resources
      Components: User Interface, Virtual Organisation Membership Service, Workload Management Service, Logging & Bookkeeping Service, Hydra, BDII, Proxy Server, AMGA, File Transfer Service, LHC File Catalogue, SCAS, CREAM, LCG-CE, Disk Pool Manager, Authz. Service, BLAH, MON, LCAS & LCMAPS, dCache, Worker Node, gLExec
    • 37. The Computing “Eco-system”
      • Scientific need for all tiers!
      TIER 1: Large-scale HPC centers (capability computing)
      TIER 2: National/regional centers, Grid collaboration (capacity computing)
      TIER 3: Local centers
      TIER 4: Personal/office computing
    • 38. 5. Principles of using distributed and high performance systems
      ARC middleware (Advanced Resource Connector)
      open source out-of-the-box Grid solution software which enables production quality computational and data Grids (released in May 2002)
      development is coordinated by NDGF
      emphasis is put on scalability, stability, reliability and performance
      builds upon standard OS solutions: OpenLDAP, OpenSSL, SASL and Globus Toolkit
      adds services not provided by Globus
      extends or completely replaces some Globus components
    • 39. NorduGrid collaboration*
      • a community around open source Grid middleware: ARC
      national Grids (e.g. M-grid, SweGrid, NorGrid), users also outside the Nordic countries
      real users, real applications
      implemented a production Grid system that has been working non-stop since May 2002
      open for anyone to participate
      * http://www.nordugrid.org/monitor
    • 40. M-grid – the Finnish Material Sciences Grid
      • joint project between seven Finnish universities, Helsinki Institute of Physics and CSC
      partners are laboratories and departments and not university IT centers
      not limited by the field of research, used for a wide range of physical, chemical and nanoscience applications
      • jointly funded by the Academy of Finland and the participating universities
      • first large initiative to put Grid middleware into production use in Finland
      • goal: throughput computing capacity mainly for the needs of physics and chemistry researchers
      • opened to all CSC customers in Nov 2005
    • Grids at CSC (HPC and Grids in Practice)
      • HP CP4000BL ProLiant Cluster
      • 2176 processor cores
      • 5 TB memory
      • 11 TF peak performance
      • InfiniBand interconnect
      gLite on HP cluster
      ARC on HP cluster
      • Cray XT4/XT5
      • 10960 computing cores
      • 11.092 TB
      • computing peak power 100.8 TF
      • Final configuration Q3/2008
      UNICORE on Cray MPP
    • 53. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      OGSA-DAI Vision
      Enable the sharing of data resources for collaboration, supporting:
      Data access - access to structured data in distributed heterogeneous data resources.
      Data transformation e.g. expose data in schema X to users as data in schema Y.
      Data integration e.g. expose multiple databases to users as a single virtual database
      Data delivery - delivering data to where it's needed by the most appropriate means e.g. web service, e-mail, HTTP, FTP, GridFTP
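The "data transformation" and "data integration" points above can be illustrated with a small, self-contained sketch. It does not use OGSA-DAI. Instead it uses Python's built-in `sqlite3` module, with two in-memory databases standing in for two distributed data resources. The table names, column names and values are invented for the example.

```python
import sqlite3


def build_virtual_database() -> sqlite3.Connection:
    """Expose two separate databases to the user as one virtual table."""
    conn = sqlite3.connect(":memory:")
    # Two independent in-memory databases stand in for two remote sites.
    conn.execute("ATTACH DATABASE ':memory:' AS site_a")
    conn.execute("ATTACH DATABASE ':memory:' AS site_b")

    # The sites use different schemas (and different temperature units).
    conn.execute("CREATE TABLE site_a.samples (id INTEGER, temp_c REAL)")
    conn.execute("CREATE TABLE site_b.measurements (sample_id INTEGER, temp_k REAL)")
    conn.executemany("INSERT INTO site_a.samples VALUES (?, ?)",
                     [(1, 20.0), (2, 25.0)])
    conn.executemany("INSERT INTO site_b.measurements VALUES (?, ?)",
                     [(3, 300.0)])

    # Integration + transformation: one view, one schema, one unit.
    # A TEMP view is used because it may reference any attached database.
    conn.execute("""
        CREATE TEMP VIEW all_samples AS
        SELECT id, temp_c, 'site_a' AS source FROM site_a.samples
        UNION ALL
        SELECT sample_id AS id, temp_k - 273.15 AS temp_c, 'site_b' AS source
        FROM site_b.measurements
    """)
    return conn


conn = build_virtual_database()
rows = conn.execute(
    "SELECT id, temp_c, source FROM all_samples ORDER BY id"
).fetchall()
```

A client querying `all_samples` sees a single schema and does not need to know that the rows come from two databases with different layouts.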
    • 54. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      OGSA-DAI History
      The OGSA-DAI project started in February 2002 as part of the UK e-Science Grid Core Program
      Is today part of OMII-UK, a partnership between:
      OMII, The University of Southampton
      myGrid, The University of Manchester
      OGSA-DAI, The University of Edinburgh
    • 55. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      Vision of a Simple API for Grid Applications - SAGA
      Provide simple programmatic interface that is widely-adopted, usable and available for enabling applications for the grid
      Simplicity: easy to use, install, administer and maintain
      Uniformity: provides support for different application programming languages as well as consistent semantics and style for different Grid functionality
      Scalability: contains mechanisms for the same application (source) code to run on a variety of systems ranging from laptops to HPC resources
      Genericity: adds support for different grid middleware, even concurrent ones
      Modularity: provides a framework that is easily extendable
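The uniformity, genericity and modularity goals can be sketched as one submission call in front of interchangeable back-end adaptors. The code below is a hypothetical illustration of that design idea only. It is not the SAGA API, and every name in it (`JobAdaptor`, `register`, `submit`, the `local` and `dryrun` URL schemes) is an assumption made for this example.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type
from urllib.parse import urlparse


class JobAdaptor(ABC):
    """Common interface every back-end (middleware) adaptor must offer."""

    @abstractmethod
    def run(self, argv: List[str]) -> str:
        ...


ADAPTORS: Dict[str, Type[JobAdaptor]] = {}


def register(scheme: str) -> Callable[[Type[JobAdaptor]], Type[JobAdaptor]]:
    """Class decorator: plug a new adaptor in under a URL scheme."""
    def wrap(cls: Type[JobAdaptor]) -> Type[JobAdaptor]:
        ADAPTORS[scheme] = cls
        return cls
    return wrap


@register("local")
class LocalAdaptor(JobAdaptor):
    """Runs the command on this machine and returns its standard output."""

    def run(self, argv: List[str]) -> str:
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout.strip()


@register("dryrun")
class DryRunAdaptor(JobAdaptor):
    """Stands in for a remote middleware: only reports what it would do."""

    def run(self, argv: List[str]) -> str:
        return "would run: " + " ".join(argv)


def submit(resource_url: str, argv: List[str]) -> str:
    """Same call for every back-end; the URL scheme selects the adaptor."""
    scheme = urlparse(resource_url).scheme
    if scheme not in ADAPTORS:
        raise ValueError(f"no adaptor registered for scheme {scheme!r}")
    return ADAPTORS[scheme]().run(argv)


command = [sys.executable, "-c", "print('hello grid')"]
local_output = submit("local://localhost", command)
dry_output = submit("dryrun://some-cluster", ["echo", "hi"])
```

Application code calls only `submit`; supporting another middleware means registering one more adaptor class, with no change to the callers.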
    • 56. 6. Higher level APIs: OGSA-DAI, SAGA and metadata management (S-OGSA)
      Metadata management: Make metadata Princess in the kingdom of Semantic Web
    • 58. 7. Workflows
      Organize your work, e.g.:
      Gather initial data
      Pre-processing of data
      Define computing job(s)
      Initiate job(s)
      Gather results
      Post-processing of results
      :
      Repeat
      During the school you will learn how to do this in different ways with the systems studied. It can also be done with dedicated workflow systems: Taverna, P-GRADE Portal, …
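The steps listed on this slide can be written down as a plain script before any workflow system is involved. The sketch below is a toy under stated assumptions: the function names mirror the steps, the data and the "computing job" (squaring a number) are invented, and everything runs locally.

```python
from typing import Callable, List


def gather_initial_data() -> List[float]:
    return [1.0, 2.0, 3.0, 4.0]  # made-up input data


def preprocess(data: List[float]) -> List[float]:
    return [x for x in data if x > 1.0]  # e.g. drop unusable values


def define_jobs(data: List[float]) -> List[Callable[[], float]]:
    # One job per data item; each job here just squares its input.
    return [lambda x=x: x * x for x in data]


def run_jobs(jobs: List[Callable[[], float]]) -> List[float]:
    # Initiate the jobs and gather their results (sequentially, locally).
    return [job() for job in jobs]


def postprocess(results: List[float]) -> float:
    return sum(results)


def run_workflow() -> float:
    """Chain the steps: each stage consumes the previous stage's output."""
    data = gather_initial_data()
    cleaned = preprocess(data)
    jobs = define_jobs(cleaned)
    results = run_jobs(jobs)
    return postprocess(results)


workflow_result = run_workflow()
```

A workflow system takes over exactly this chaining, plus scheduling the jobs on remote resources and repeating the cycle when needed.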
    • 59. Motivations for developing P-GRADE portal
      P-GRADE portal should
      Give an answer for all the questions of an e-scientist
      Hide the complexity of the underlying grid middlewares
      Provide a high-level graphical user interface that is easy-to-use for e-scientists
      Support many different grid programming approaches (see Morris Riedel’s talk):
      Simple Scripts & Control (sequential and MPI job execution)
      Scientific Application Plug-ins (based on GEMLCA)
      Complex Workflows
      Parameter sweep applications: both on job and workflow level
      Interoperability: transparent access to grids based on different middleware technology
      Support three levels of parallelism
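A job-level parameter sweep amounts to generating one job per combination of parameter values. The following sketch shows that expansion. The function name and the parameters (`temperature`, `pressure`) are made up for illustration and are not part of P-GRADE.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_sweep(parameters: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Return one parameter set per job: the Cartesian product of all values."""
    names = sorted(parameters)  # fixed order so the result is reproducible
    value_lists = [parameters[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


jobs = expand_sweep({"temperature": [280, 300, 320], "pressure": [1.0, 2.0]})
```

Each dictionary in `jobs` describes one independent run, so the whole sweep can be submitted as a batch of unrelated jobs.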
    • 60. Short History of P-GRADE portal
      Parallel Grid Application and Development Environment
      Initial development started in the Hungarian SuperComputing Grid project in 2003
      It has been continuously developed since 2003
      Detailed information:
      http://portal.p-grade.hu/
      Open Source community development since January 2008:
      https://sourceforge.net/projects/pgportal/
    • 61. Integrating Practical
      Principles of service-oriented architecture
      Principles of high-throughput computing
      Principles of distributed data management
      Principles of job submission and execution management
      Principles of using distributed and high performance systems
      Higher level APIs: OGSA-DAI, SAGA and metadata management
      Workflows