Dissertation defense
Dissertation title and final project: Data source registration in the Virtual Laboratory. The subject of the thesis and related project was to integrate EGEE/WLCG data sources into GridSpace Virtual Laboratory (http://gs.cyfronet.pl/).
Poster presentation entitled Integrating EGEE Storage Services with the Virtual Laboratory:
http://www.plgrid.pl/en/pr_materials/posters
Dissertation available at http://virolab.cyfronet.pl/trac/vlvl#MasterofScienceThesesrelatedtoViroLab

    Presentation Transcript

    • Data source registration in the Virtual Laboratory
      Marek Pomocka
      Major: applied computer science
      Specialisation: computer techniques in science and technology
      Faculty of Physics and Applied Computer Science,
      AGH University of Science and Technology
      Supervisor: Marian Bubak, Ph.D.
      Consultants: Piotr Nowakowski, M.Sc.
      Daniel Harężlak, M.Sc.
      Master’s thesis defense, November 13, 2009
    • Outline
      Introduction to Grid technologies and Virtual Laboratories
      Motivation and Objectives
      Conceptual view of the solution
      Challenges and solutions
      Applications
      Future work
      Summary
      References
    • Grid technologies and Virtual Laboratories
    • Grid is a distributed computing architecture with cross-organizational access, providing nontrivial quality of service for participating actors.
    • Notable applications include
      High-energy physics (LHC)
      Weather forecasting
      Natural disaster modelling
      Complex parameter studies in biomedicine and biochemistry
      Digital image archives
    • Grid is a computer infrastructure
      dedicated to conducting in-silico research,
      created by many partners (e.g. TASK, PCSS, ICM, WCSS, CYFRONET)
      who share supercomputers, computer clusters, storage and research instruments
    • to create a common space for e-Science.
    • Grid users are Virtual Organizations (VOs), which are dynamic by their nature
      VO approach simplifies access management
      [Figure labels: CYFRONET, PSNC]
    • Examples of Grids
      EGEE, DEISA
      TeraGrid
      Open Science Grid
    • Virtual Laboratories (VLs) supply higher-level services and abstract low-level details related to Grid service invocations, security, etc. away from end users.
      [Figure: layers Virtual Laboratory, Grid middleware, Grid infrastructure]
      Many VLs endeavor to be general-purpose in-silico (or virtual) experiment design and execution environments,
      e.g. GridSpace Virtual Laboratory.
    • Others are often designed for a specific purpose,
      such as remote access to scientific instruments (e.g. VLAB),
      supporting research in meteorology (LEAD),
      or research and decision support in virology (ViroLab).
    • Virtual experiments in VLs are expressed using script-based languages (e.g. in GridSpace, Athena, Geodise)
        if (condition) then
          …
        else
          …
        end
      … or using workflow languages (e.g. in VL-e, VLAB, myExperiment, myGrid/Taverna, Kepler, Triana, Pegasus).
      VLs made Grids available to non-computer scientists.
      [Figure: Users, Virtual Laboratory, Grid]
    • Motivation and Objectives
    • Hello, I’m a chemist. I use the Gaussian program and work mostly with files. I’d like to use Grids, but the Grid filesystem is far too complex for me.
      ... the security system is complicated too.
      Yes, I do agree. We won’t use Grids until there is an easy way of using Grid file catalogues from virtual experiments.
    • Objectives
      The objective of the dissertation is to meet these needs by enabling access to LFC (LCG File Catalog) data sources from GridSpace scripts while concealing most interactions with the Grid Security Infrastructure (GSI).
      This goal entails several other objectives:
      Data Source Registry (DSR) reorganization
      Integration with the GridSpace Engine (GSEngine)
      Extending the DSR EPE plug-in
      [Figure labels: DAC2, GSEngine, LFC DS]
    • Conceptual view of the solution
    • Challenges and solutions
    • Not to compromise GSEngine portability
      Windows
      Linux
      Scientific Linux 4 (SL4)
      UNIX
      Mac OS X
      Solution: isolation of platform-dependent code into a remote service
      [Figure: GScript LFC integration. Components: GSEngine, LFC connector, LFC client library, LFC DS Server; split into platform-independent and platform-dependent parts]
    • Serve multiple users utilizing inherently single-user gLite libraries.
      Solution: ChemPo command wrappers; each command is run in a new JVM with a prepared UNIX environment.
      [Figure: LFC DS Server (server JVM) with Worker 1 JVM (Cert1, Key1) and Worker 2 JVM (Cert2, Key2)]
      Instead of a permanent place for credentials (e.g. ~/.globus/), use temporary files and specify their paths dynamically in the UNIX environment of the created JVM processes.
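The per-user isolation above can be sketched in Ruby, the language of the GridSpace script examples in this talk. This is a hypothetical illustration, not the ChemPo wrapper code: each command runs in a fresh child process (a Ruby subprocess here, where the real system starts a JVM), and the user's certificate and key are written to temporary files whose paths are passed through X509_USER_CERT and X509_USER_KEY, the standard Globus environment variables. The method name run_with_credentials and the probe command are assumptions.

```ruby
require "tempfile"
require "open3"
require "rbconfig"

# Hypothetical sketch (not the ChemPo wrapper code): run one command in a new
# child process whose environment points at this user's temporary credentials.
def run_with_credentials(cert_pem, key_pem, command)
  # Tempfile creates files with mode 0600, so other local users cannot read them.
  cert = Tempfile.new("usercert")
  key  = Tempfile.new("userkey")
  cert.write(cert_pem)
  cert.close
  key.write(key_pem)
  key.close

  # Standard Globus/gLite variables; only the child process sees these values.
  env = { "X509_USER_CERT" => cert.path, "X509_USER_KEY" => key.path }
  stdout, stderr, status = Open3.capture3(env, *command)
  [stdout, stderr, status.success?]
ensure
  # Remove the temporary credentials as soon as the command has finished.
  cert&.close!
  key&.close!
end

# Usage: a stand-in command that just prints the certificate it was given.
probe = [RbConfig.ruby, "-e", 'print File.read(ENV["X509_USER_CERT"])']
out_a, _, ok_a = run_with_credentials("CERT-1", "KEY-1", probe)
out_b, _, ok_b = run_with_credentials("CERT-2", "KEY-2", probe)
puts out_a, out_b
```

Because the credential paths live only in each child's environment, concurrent requests from different users never see each other's files; the cost, as with one JVM per command, is process start-up time.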
    • Enabling access to Grid files without downloading them to the GSEngine machine
      First, download the file to the LFC DS Server. Then, stream it to the client.
      Grid File Access Library (GFAL)
      ChemPo command wrappers do not support such a mode of operation (streaming to the client).
      Vice versa for sending a file to the Grid, i.e. stream the file to the LFC DS Server, then send it to the Grid.
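The two-hop transfer through the LFC DS Server can be sketched as follows. This is a hypothetical Ruby illustration: the block passed by the caller stands in for the Grid transfer that the server would perform with gLite tools, the client connection is any IO-like object, and the names relay_download and relay_upload are invented for the example.

```ruby
require "tempfile"
require "stringio"

# Hypothetical sketch of the LFC DS Server relay. The caller's block stands in
# for the Grid transfer (e.g. a gLite copy command run on the server).

# Download: Grid -> server disk (hop 1), then server disk -> client (hop 2).
def relay_download(client_io)
  Tempfile.create("lfc-staging") do |staged|
    staged.close
    yield staged.path                    # hop 1: fetch the Grid file to local disk
    File.open(staged.path, "rb") do |f|
      IO.copy_stream(f, client_io)       # hop 2: stream it to the client
    end
  end                                    # the staging file is deleted here
end

# Upload: client -> server disk (hop 1), then server disk -> Grid (hop 2).
def relay_upload(client_io)
  Tempfile.create("lfc-staging") do |staged|
    IO.copy_stream(client_io, staged)    # hop 1: receive the client's stream
    staged.close
    yield staged.path                    # hop 2: send the staged file to the Grid
  end
end

# Usage with in-memory stand-ins for the Grid and the client connection.
grid = { "mpomocka/test_file" => "hello from the Grid\n" }
client = StringIO.new
relay_download(client) { |path| File.write(path, grid["mpomocka/test_file"]) }

relay_upload(StringIO.new("uploaded by the client\n")) do |path|
  grid["mpomocka/uploaded_file"] = File.read(path)
end
```

The GSEngine side only ever handles a stream, never a local copy of the Grid file, while staging on the server's disk keeps whole files out of the server's memory.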
    • Streaming representation in GridSpace scripts
      Solution: the user receives a modified version of the Ruby IO object; a file is retrieved from the Grid during object initialization and sent to the Grid on the file close operation.
      Reading a Grid file
        # Block form: the file is closed automatically after the block
        ds.open("mpomocka/test_file", "r") do |file|
          file.each { |line| puts line }
        end
        # Explicit form
        f = ds.open("mpomocka/test_file", :r)
        f.each { |line| puts line }
        f.close
      Writing to a Grid file
        # The content is sent to the Grid when the file is closed
        f = ds.open("mpomocka/test_file", :write)
        f.puts "First line of the file test_file"
        f.puts "Second line of the file test_file"
        f.close
      Alternatively
        ds.open("mpomocka/test_file", :w) do |f|
          f.puts "Another way to write to a file"
          f.puts "Note that close is not necessary"
        end
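A minimal Ruby sketch of such a modified IO object is shown below, assuming StringIO as the base class and a plain Hash standing in for the Grid storage reached through the LFC DS Server. GridFile and FakeDataSource are hypothetical names, not the GridSpace classes, and for brevity the content is buffered in memory rather than streamed.

```ruby
require "stringio"

# Hypothetical sketch: an IO-like object that fetches the whole Grid file when
# it is created (read mode) and sends its content to the Grid when it is
# closed (write mode). A Hash stands in for the Grid storage.
class GridFile < StringIO
  WRITE_MODES = %w[w write].freeze

  def initialize(store, path, mode = "r")
    @store = store
    @path = path
    @writing = WRITE_MODES.include?(mode.to_s)
    if @writing
      super(+"", "w")
    else
      super(store.fetch(path).dup, "r") # raises KeyError if the file is missing
    end
  end

  def close
    return nil if closed?
    @store[@path] = string.dup if @writing # the upload happens on close
    super
  end
end

# Stand-in for the `ds` object used in the script examples above.
class FakeDataSource
  attr_reader :files

  def initialize(files = {})
    @files = files
  end

  # Like File.open: with a block, the file is closed (and uploaded) afterwards.
  def open(path, mode = "r")
    file = GridFile.new(@files, path, mode)
    return file unless block_given?

    begin
      yield file
    ensure
      file.close
    end
  end
end

ds = FakeDataSource.new
f = ds.open("mpomocka/test_file", :write)
f.puts "First line of the file test_file"
f.puts "Second line of the file test_file"
f.close

lines = []
ds.open("mpomocka/test_file", "r") { |file| file.each { |line| lines << line } }
```

Overriding close (and calling it from an ensure clause in the block form) is what makes "close is not necessary" hold: leaving the block always triggers the upload.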
    • Need for a descriptive and intuitive API
      Method names mimicking Ruby file operations, e.g. exist?, file?
      Descriptive names, e.g. create_directory instead of mkdir
      DAC2 LFC DS methods (method name: aliases)
      createDirectory(parent, child): create_directory
      createDirectory(path): create_directory
      delete(path): delete_file, deleteFile
      deleteFile(filename)
      directory?(filename): isDirectory, is_directory
      exist?(path): exist, exists, exist?
      file?(path): isFile, is_file
      getFile(filename): get_file
      getSize(path): size, size?, get_size
      listFiles(path): list_files
      openFile(path, mode, &b): open, open_file
      storeFile(payload, filename): store_file
      zero?(path)
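The method/alias table can be reproduced in Ruby with alias_method, which binds several names to one implementation. The sketch below is hypothetical: LfcApiSketch, its Hash-backed storage and the chosen subset of rows are assumptions made for illustration, not the DAC2 code.

```ruby
# Hypothetical sketch: one implementation per operation, several public names,
# following a few rows of the DAC2 LFC DS method/alias table.
class LfcApiSketch
  # Canonical method => aliases (subset of the table above).
  ALIASES = {
    directory?: %i[isDirectory is_directory],
    file?: %i[isFile is_file],
    getSize: %i[size get_size],
    listFiles: %i[list_files]
  }.freeze

  # entries: path => file content (String) or :directory (stand-in for LFC)
  def initialize(entries)
    @entries = entries
  end

  def directory?(filename)
    @entries[filename] == :directory
  end

  def file?(path)
    @entries[path].is_a?(String)
  end

  def getSize(path)
    file?(path) ? @entries[path].bytesize : 0
  end

  # Immediate children of a directory.
  def listFiles(path)
    prefix = path.end_with?("/") ? path : "#{path}/"
    @entries.keys.select { |p| p.start_with?(prefix) && !p.delete_prefix(prefix).include?("/") }
  end

  # Define every alias after the canonical methods exist.
  ALIASES.each do |method_name, names|
    names.each { |name| alias_method name, method_name }
  end
end

api = LfcApiSketch.new(
  "mpomocka" => :directory,
  "mpomocka/test_file" => "abc"
)
```

Keeping the aliases in one table, as the slide does, keeps the Java-style (isFile) and Ruby-style (is_file, file?) spellings in sync with a single implementation.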
    • Secure communication
      Security: tunnelling (simpler) vs. Transport Layer Security (need to manage keystores)
      Credentials management
      Proxy certificate generation: Java CoG Kit
      Data Source Registry: credentials are stored in DSR; credentials can be set static, i.e. shared with other authenticated users
    • Proxy is generated automatically during initialization
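The credential rules on this slide can be sketched as a small lookup. This is a hypothetical Ruby illustration: Credential and CredentialResolver are invented names, the precedence (a user's own credentials before shared static ones) is an assumption, and the generator block stands in for proxy creation with the Java CoG Kit.

```ruby
# Hypothetical sketch: DSR keeps credentials per data source; "static"
# credentials are shared with any authenticated user; a proxy is generated
# when the data source is initialized. The generator block stands in for
# Java CoG Kit proxy creation.
Credential = Struct.new(:owner, :cert, :key, :static)

class CredentialResolver
  def initialize(credentials, &proxy_generator)
    @credentials = credentials
    @proxy_generator = proxy_generator
  end

  # Assumed precedence: the user's own credentials, then shared static ones.
  def credential_for(user)
    @credentials.find { |c| c.owner == user } ||
      @credentials.find(&:static) ||
      raise(ArgumentError, "no credentials available for #{user}")
  end

  # Called during data source initialization.
  def init_data_source(user)
    @proxy_generator.call(credential_for(user))
  end
end

generated = []
resolver = CredentialResolver.new([
  Credential.new("alice", "alice-cert", "alice-key", false),
  Credential.new("admin", "shared-cert", "shared-key", true)
]) do |cred|
  generated << cred.cert
  "proxy(#{cred.cert})"
end

alice_proxy = resolver.init_data_source("alice") # own credentials
bob_proxy = resolver.init_data_source("bob")     # falls back to static ones
```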
    • Information needs: the previous DSR structure did not allow storing information about LFC data sources or gLite credentials.
      Solution:
      [Figure: DSR schema change; tables shown: RelationalDataSources, DataSources, LFCDataSources, LFCCertData, LFCDSConnections]
      Also changes to the DSR access modules of DAC2 and the DSR EPE Plug-in.
    • GUI for registering a data source of the new type
      Created as a new form in the EPE DSR Plug-in
      In addition, some new DSR access methods were created in the DSR EPE Plug-in.
    • Selection of distributed computing approach
    • Exchanging large files: how to avoid OutOfMemoryErrors?
      Solution: employ the RMIIO library (RemoteInputStream[Server] and RemoteOutputStream[Server] classes)
      [Figure: downloading a file to the client]
    • [Figure: sending a file from the client to the server]
      Additional benefits of using RMIIO:
      Compressed socket-based communication
      Automatic retry
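Two of the ideas behind RMIIO, bounded-memory chunked transfer and automatic retry, can be sketched in Ruby. RMIIO itself is a Java library and this is not its API: send_in_chunks, CHUNK_SIZE, MAX_ATTEMPTS and TransientTransferError are hypothetical names, and compression is left out.

```ruby
require "stringio"

# Hypothetical sketch of two RMIIO ideas, re-expressed in Ruby: data moves in
# fixed-size chunks, so memory use is bounded by CHUNK_SIZE rather than by the
# file size, and a chunk whose transfer fails transiently is retried.
CHUNK_SIZE = 8 * 1024
MAX_ATTEMPTS = 3

class TransientTransferError < StandardError; end

# Reads `source` chunk by chunk and hands each chunk (with its offset) to the
# block that performs the actual send. Returns the number of bytes sent.
def send_in_chunks(source)
  offset = 0
  while (chunk = source.read(CHUNK_SIZE))
    attempts = 0
    begin
      attempts += 1
      yield chunk, offset
    rescue TransientTransferError
      retry if attempts < MAX_ATTEMPTS
      raise
    end
    offset += chunk.bytesize
  end
  offset
end

# Usage: a receiver that fails once on the second chunk.
received = +""
failures_left = { CHUNK_SIZE => 1 }
sent = send_in_chunks(StringIO.new("x" * 20_000)) do |chunk, offset|
  if failures_left.fetch(offset, 0).positive?
    failures_left[offset] -= 1
    raise TransientTransferError, "simulated network glitch"
  end
  received << chunk
end
```

Retrying per chunk rather than per file means a glitch late in a multi-gigabyte transfer costs one chunk, not the whole file.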
    • Solution scales linearly
      [Figure: download and upload times for files up to 2Gb, tested locally on the ChemPo server]
    • Applications
      PL-Grid: Polish Infrastructure for Information Science Support in the European Research Space
      Chemistry Portal (ChemPo)
    • Future work
      Finer-grained security
      Pseudo memory-mapped file API (Pseudo MMAP)
    • Summary
    • LFC DS Server
      LFC DS client Java library
      New DAC2 API
      DAC2 LFC connector
      DAC2 LFC DS methods (method name: aliases)
      createDirectory(parent, child): create_directory
      createDirectory(path): create_directory
      delete(path): delete_file, deleteFile
      deleteFile(filename)
      directory?(filename): isDirectory, is_directory
      …
    • Automated and transparent handling of Grid credentials
      Extended EPE DSR Plug-in
      Reorganized DSR Schema
    • References
      [1] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. Poster presented as part of the Cracow Grid Workshop ’09, Krakow, Poland, 12–14 October 2009.
      [2] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. In M. Bubak, M. Turała, and K. Wiatr, editors, Proceedings of Cracow Grid Workshop – CGW’09, October 2009, Krakow, Poland. ACC Cyfronet AGH, to appear.
      [3] L. Abadie et al., Grid-Enabled Standards-based Data Management. In Mass Storage Systems and Technologies, 2007 (MSST 2007), 24th IEEE Conference on, pages 60–71, Sept. 2007.
      [4] M. Bubak et al., Virtual Laboratory for Collaborative Applications. In M. Cannataro, editor, Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.
      [5] M. Assel et al., A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In T. Solomonides et al., editors, Healthgrid Research, Innovation and Business Case: Proceedings of HealthGrid 2009, Studies in Health Technology and Informatics, vol. 147, IOS Press, ISSN 0926-9630, pages 51–61.
      [6] M. Malawski, T. Bartynski, and M. Bubak, Invocation of operations from script-based grid applications. Future Generation Computer Systems, in press (accepted manuscript), 2009.