Dissertation defense


Dissertation title and final project: Data source registration in the Virtual Laboratory. The subject of the thesis and related project was to integrate EGEE/WLCG data sources into GridSpace Virtual Laboratory (http://gs.cyfronet.pl/).
Poster presentation entitled Integrating EGEE Storage Services with the Virtual Laboratory:
http://www.plgrid.pl/en/pr_materials/posters
Dissertation available at http://virolab.cyfronet.pl/trac/vlvl#MasterofScienceThesesrelatedtoViroLab



  1. Data source registration in the Virtual Laboratory
      Marek Pomocka
      Major: applied computer science
      Specialisation: computer techniques in science and technology
      Faculty of Physics and Applied Computer Science, AGH University of Science and Technology
      Supervisor: Marian Bubak, Ph.D.
      Consultants: Piotr Nowakowski, M.Sc.; Daniel Harężlak, M.Sc.
      Master’s thesis defense, November 13, 2009
  2. Outline
      Introduction to Grid technologies and Virtual Laboratories
      Motivation and Objectives
      Conceptual view onto the solution
      Challenges and solutions
      Applications
      Future work
      Summary
      References
  3. Grid technologies and Virtual Laboratories
  4. Grid is a distributed computing architecture with cross-organizational access, providing nontrivial quality of service for participating actors.
  5. Notable applications include
      High-energy physics (LHC)
      Weather forecasting
      Natural disaster modelling
      Complex parameter studies in biomedicine and biochemistry
      Digital image archives
  6. Grid is a computer infrastructure dedicated to conducting in-silico research, created by many partners who share supercomputers, computer clusters, storage and research instruments
      (Figure labels: TASK, PCSS, ICM, WCSS, CYFRONET)
  7. ... to create a common space for e-Science
  8. Grid users are Virtual Organizations (VOs), which are dynamic by their nature
      The VO approach simplifies access management
      (Figure labels: CYFRONET, PSNC)
  9. Examples of Grids
      EGEE, DEISA
      TeraGrid
      Open Science Grid
  10. Virtual Laboratories (VLs) supply higher-level services and abstract low-level details related to Grid service invocations, security etc. away from end users.
      Many VLs endeavor to be general-purpose in-silico (or virtual) experiment design and execution environments, e.g. GridSpace Virtual Laboratory.
      (Figure layers: Virtual Laboratory, Grid middleware, Grid infrastructure)
  11. Others are often designed for a specific purpose, such as
      remote access to scientific instruments (e.g. VLAB)
      supporting research in meteorology (LEAD)
      research and decision support in virology (ViroLab)
  12. Virtual experiments in VLs are expressed using script-based languages (e.g. in GridSpace, Athena, Geodise)
          if (condition) then
            …
          else
            …
          end
      … or using workflow languages (e.g. in VL-e, VLAB, myExperiment, myGrid Taverna, Kepler, Triana, Pegasus)
      VLs made Grids available to non-computer scientists.
      (Figure labels: Users, Virtual Laboratory, Grid)
  13. Motivation and Objectives
  14. “Hello, I’m a chemist. I use the Gaussian program and work mostly with files. I’d like to use Grids, but the filesystem is far too complex for me.”
      “... the security system is complicated too.”
      “Yes, I do agree. We won’t use Grids until there is an easy way of using Grid file catalogues from virtual experiments.”
  15. Objectives
      The objective of the dissertation is to meet these needs by enabling access to LFC data sources from GridSpace scripts while concealing most interactions with the Grid Security Infrastructure (GSI).
      This goal entails several other objectives:
      Data Source Registry reorganization
      Integration with GridSpace Engine
      Extending the DSR EPE plug-in
      (Figure labels: DAC2, GSEngine, LFC DS)
  16. Conceptual view onto the solution
  17. Challenges and solutions
  18. Challenge: do not compromise GSEngine portability
      Platforms: Windows, Linux, Scientific Linux 4 (SL4), UNIX, Mac OS X
      Solution: isolation of platform-dependent code into a remote service
      (Figure labels: GScript LFC integration, GSEngine, LFC connector, LFC client library, LFC DS Server; regions marked "platform independent" and "platform dependent")
  19. Challenge: serve multiple users with gLite libraries that are inherently single-user.
      Solution: ChemPo command wrappers. Each command is run in a new JVM with a prepared UNIX environment.
      Instead of a permanent place for credentials (e.g. ~/.globus/), use temporary files and specify their paths dynamically in the UNIX environment of the created JVM processes.
      (Figure labels: LFC DS Server (Server JVM); Worker 1 JVM, Cert1, Key1; Worker 2 JVM, Cert2, Key2)
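The per-user environment idea from slide 19 can be sketched in Ruby, the language of the GridSpace script examples in this deck. This is only an illustration, not the ChemPo wrapper itself: the helper name `run_with_credentials` is hypothetical, a child Ruby process stands in for the worker JVM, and the variable names `X509_USER_CERT` and `X509_USER_KEY` are the ones conventionally read by Globus-based tools; whether the thesis used exactly these names is an assumption.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Runs one command in a fresh child process whose environment points at
# temporary, per-user credential files instead of a fixed ~/.globus/ path.
# The variable names are an assumption (conventional Globus names).
def run_with_credentials(cert_pem, key_pem, command)
  Tempfile.create("cert") do |cert|
    Tempfile.create("key") do |key|
      cert.write(cert_pem)
      cert.flush
      key.write(key_pem)
      key.flush
      env = { "X509_USER_CERT" => cert.path, "X509_USER_KEY" => key.path }
      # The env hash applies only to the child process, so concurrent
      # calls for different users do not interfere with each other.
      output, status = Open3.capture2(env, *command)
      raise "command failed: #{status.exitstatus}" unless status.success?
      output
    end
  end # both temporary files are deleted when the blocks exit
end

# Stand-in for a gLite command: a child process that prints the
# certificate file it was pointed at.
PRINT_CERT = [RbConfig.ruby, "-e", 'print File.read(ENV["X509_USER_CERT"])'].freeze

puts run_with_credentials("CERT-OF-USER-1", "KEY-OF-USER-1", PRINT_CERT)
puts run_with_credentials("CERT-OF-USER-2", "KEY-OF-USER-2", PRINT_CERT)
```

Because the paths live only in the child's environment and the files are removed afterwards, no credential has to sit in a shared, permanent location on the server.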
  20. Challenge: enabling access to Grid files without downloading them to the GSEngine machine
      First, download the file to the LFC DS Server. Then, stream it to the client.
      Grid File Access Library (GFAL)
      ChemPo command wrappers do not support such a mode of operation (streaming to the client)
      Vice versa for sending a file to the Grid, i.e. stream the file to the LFC DS Server, then send it to the Grid.
  21. Streaming representation in GridSpace scripts
      Solution: the user receives a modified version of the Ruby IO object. Sending a file to the Grid happens on the file close operation, while retrieving a file from the Grid happens during object initialization.
      Reading a Grid file
          ds.open("mpomocka/test_file", "r") do |file|
            file.each {|line| puts line}
          end

          f = ds.open("mpomocka/test_file", :r)
          f.each {|line| puts line}
          f.close
      Writing to a Grid file
          f = ds.open("mpomocka/test_file", :write)
          f.puts "First line of the file test_file"
          f.puts "Second line of the file test_file"
          f.close
      Alternatively
          ds.open("mpomocka/test_file", :w) do |f|
            f.puts "Another way to write to a file"
            f.puts "Note that close is not necessary"
          end
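The behaviour described on slide 21 (download during initialization, upload on close) can be sketched with a `StringIO` subclass. This is a minimal illustration under stated assumptions, not the DAC2 implementation: `FakeGridStore`, `GridFile` and `FakeDataSource` are hypothetical names, and an in-memory Hash stands in for the Grid.

```ruby
require "stringio"

# Hypothetical in-memory stand-in for remote Grid storage.
class FakeGridStore
  def initialize
    @files = {}
  end

  def download(path)
    @files.fetch(path) { raise Errno::ENOENT, path }
  end

  def upload(path, data)
    @files[path] = data
  end
end

# IO-like handle: a read handle fetches the file when it is created,
# a write handle buffers locally and uploads when it is closed.
class GridFile < StringIO
  def initialize(store, path, mode)
    @store = store
    @path = path
    @writing = %w[w write].include?(mode.to_s)
    super(@writing ? "".dup : store.download(path).dup)
  end

  def close
    @store.upload(@path, string) if @writing && !closed?
    super
  end
end

# Hypothetical data source object offering the open call from the slide.
class FakeDataSource
  def initialize(store)
    @store = store
  end

  def open(path, mode = :r)
    file = GridFile.new(@store, path, mode)
    return file unless block_given?
    begin
      yield file
    ensure
      file.close # the block form closes (and uploads) automatically
    end
  end
end

ds = FakeDataSource.new(FakeGridStore.new)
ds.open("mpomocka/test_file", :w) { |f| f.puts "First line of the file test_file" }
ds.open("mpomocka/test_file", "r") { |f| f.each { |line| puts line } }
```

Subclassing an existing IO-like class keeps methods such as `each` and `puts` available without reimplementing them, which is one plausible reason for handing the user a modified IO object.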
  22. Need for a descriptive and intuitive API
      Mimicking Ruby file operations, e.g. exist?, file?
      Descriptive names, e.g. create_directory instead of mkdir
      DAC2 LFC DS methods

      Method name                      Aliases
      createDirectory(parent,child)    create_directory
      createDirectory(path)            create_directory
      delete(path)                     delete_file, deleteFile
      deleteFile(filename)
      directory?(filename)             isDirectory, is_directory
      exist?(path)                     exist, exists, exist?
      file?(path)                      isFile, is_file
      getFile(filename)                get_file
      getSize(path)                    size, size?, get_size
      listFiles(path)                  list_files
      openFile(path, mode, &b)         open, open_file
      storeFile(payload, filename)     store_file
      zero?(path)
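One way to offer several spellings of the same operation in Ruby is `alias_method`. The sketch below only illustrates that pattern: `FakeLfcConnector` is a hypothetical class backed by a Hash, it covers a subset of the methods in the table, and how DAC2 actually defines its aliases is not stated in the slides.

```ruby
# Hypothetical connector backed by a Hash of path => content;
# directories are marked with the symbol :dir.
class FakeLfcConnector
  def initialize
    @entries = { "/" => :dir }
  end

  def createDirectory(path)
    @entries[path] = :dir
    true
  end

  def storeFile(payload, filename)
    @entries[filename] = payload
    true
  end

  def exist?(path)
    @entries.key?(path)
  end

  def directory?(path)
    @entries[path] == :dir
  end

  def file?(path)
    exist?(path) && !directory?(path)
  end

  def getSize(path)
    file?(path) ? @entries[path].bytesize : 0
  end

  # Alias names taken from the slide's table.
  alias_method :create_directory, :createDirectory
  alias_method :store_file, :storeFile
  alias_method :exist, :exist?
  alias_method :exists, :exist?
  alias_method :isDirectory, :directory?
  alias_method :is_directory, :directory?
  alias_method :isFile, :file?
  alias_method :is_file, :file?
  alias_method :size, :getSize
  alias_method :get_size, :getSize
end

lfc = FakeLfcConnector.new
lfc.create_directory("/grid/demo")
lfc.store_file("hello", "/grid/demo/greeting.txt")
puts lfc.is_directory("/grid/demo")
puts lfc.size("/grid/demo/greeting.txt")
```

Both the Java-style and Ruby-style names resolve to the same method body, so script authors can use whichever convention they already know.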
  23. Security
      Secure communication
      Tunnelling is simpler
      Transport Layer Security
      Need to manage keystores
      Credentials management
      Proxy certificate generation: Java CoG Kit
      Data Source Registry
      Credentials are stored in DSR
      Credentials can be set static, i.e. shared with other authenticated users
  24. Proxy generated automatically during initialization
  25. Information needs: the previous DSR structure did not enable storage of LFC data source information or gLite credentials.
      Solution: reorganized schema
      (Figure labels: RelationalDataSources, DataSources, LFCDataSources, LFCCertData, LFCDSConnections)
      Also changes to DAC2 and DSR EPE Plug-in DSR access modules.
  26. GUI for registering a data source of a new type
      Created as a new form in the EPE DSR Plug-in
      In addition, some new DSR access methods were created in the DSR EPE Plug-in.
  27. Selection of distributed computing approach
  28. Exchanging large files: how to avoid OutOfMemory errors?
      Solution: employ the RMIIO library (RemoteInputStream[Server] and RemoteOutputStream[Server] classes)
      Figure illustrates downloading a file to the client
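The principle behind avoiding out-of-memory failures is to move data in bounded chunks instead of loading whole files. The sketch below shows that principle in Ruby for consistency with the script examples in this deck. It is not RMIIO and does not model its remote streams: `copy_in_chunks` and `CHUNK_SIZE` are hypothetical names, and the 64 KiB chunk size is an arbitrary choice.

```ruby
require "stringio"

CHUNK_SIZE = 64 * 1024 # maximum number of bytes held in memory at once

# Copies everything from source to sink through one reusable buffer and
# returns the number of bytes copied. Memory use is bounded by chunk_size
# regardless of how large the source is.
def copy_in_chunks(source, sink, chunk_size = CHUNK_SIZE)
  total = 0
  buffer = String.new(capacity: chunk_size)
  # read(length, outbuf) refills the same buffer and returns nil at EOF.
  while source.read(chunk_size, buffer)
    sink.write(buffer)
    total += buffer.bytesize
  end
  total
end

source = StringIO.new("x" * 200_000) # stands in for a large Grid file
sink = StringIO.new("".b)            # stands in for the client side
puts copy_in_chunks(source, sink)
```

Any object with `read(length, buffer)` and `write` works here, so the same loop applies to files and sockets as well as the in-memory streams used for the demonstration.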
  29. Figure: sending a file from client to server
      Additional benefits of using RMIIO:
      Compressed socket-based communication
      Automatic retry
  30. Solution scales linearly
      Figure: download and upload times up to 2Gb when tested locally on the ChemPo server
  31. Applications
      PL-Grid: Polish Infrastructure for Information Science Support in the European Research Space
      Chemistry Portal (ChemPo)
  32. Future work
      Finer-grained security
      Pseudo memory-mapped file API (Pseudo MMAP)
  33. Summary
  34. LFC DS Server
      LFC DS client Java library
      New DAC2 API
      DAC2 LFC connector
      DAC2 LFC DS methods

      Method name                      Aliases
      createDirectory(parent,child)    create_directory
      createDirectory(path)            create_directory
      delete(path)                     delete_file, deleteFile
      deleteFile(filename)
      directory?(filename)             isDirectory, is_directory
      ….
  35. Automated and transparent handling of Grid credentials
      Extended EPE DSR Plug-in
      Reorganized DSR Schema
  36. References
      [1] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. Poster presented as part of the Cracow Grid Workshop ’09, Krakow, Poland, 12-14 October 2009.
      [2] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop – CGW’09, October 2009, Krakow, Poland. ACC-Cyfronet AGH. To appear.
      [3] Lana Abadie et al., Grid-Enabled Standards-based Data Management. In Mass Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE Conference on, pages 60–71, Sept. 2007.
      [4] Marian Bubak et al., Virtual Laboratory for Collaborative Applications. In M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.
      [5] Matthias Assel et al., A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In Tony Solomonides et al. (Ed.), Healthgrid Research, Innovation and Business Case; Proceedings of HealthGrid 2009, Studies in Health Technology and Informatics, vol. 147, IOS Press, ISSN 0926-9630, pp. 51-61.
      [6] M. Malawski, T. Bartynski, and M. Bubak, "Invocation of operations from script-based grid applications," Future Generation Computer Systems, vol. In Press, Accepted Manuscript, 2009.
