Technical status of the project by Bob Jones


  1. 1. Technical Status of the Project Bob Jones
  2. 2. Overview <ul><li>Testbed status </li></ul><ul><li>Application status </li></ul><ul><li>Project retreat </li></ul><ul><ul><li>Issues and actions for software process, current (EDG 1.2) and future releases </li></ul></ul><ul><li>Tutorials </li></ul><ul><li>Summary </li></ul>
  3. 3. Testbed Status <ul><li>Application testbed </li></ul><ul><ul><li>Running EDG 1.2 on 5 core sites (and a couple of others) </li></ul></ul><ul><ul><ul><li>Since first week of August (several months later than initially planned) </li></ul></ul></ul><ul><ul><li>Users guide and release notes available (installation guide will come later) </li></ul></ul><ul><ul><li>Being used for application tests </li></ul></ul><ul><ul><li>Current “show-stopper” issues found: </li></ul></ul><ul><ul><ul><li>Long job status problem (can’t retrieve output) </li></ul></ul></ul><ul><ul><ul><li>Long file transfers problem (20 mins limit) </li></ul></ul></ul><ul><li>Development testbed </li></ul><ul><ul><li>Testing urgent updates to EDG 1.2 </li></ul></ul><ul><ul><ul><li>More recent beta release of Globus 2 </li></ul></ul></ul><ul><ul><ul><li>Various fixes for data management chain </li></ul></ul></ul>
  4. 4. Application Status <ul><li>WP8: High Energy Physics </li></ul><ul><ul><li>LHC experiments doing tests now </li></ul></ul><ul><ul><li>ATLAS task force </li></ul></ul><ul><li>WP9: Earth Observation </li></ul><ul><ul><li>Installation of EDG 1.2 at ESA done </li></ul></ul><ul><ul><li>Testing to start in September </li></ul></ul><ul><li>WP10: Biology </li></ul><ul><ul><li>Initial tests made with EDG 1.2 </li></ul></ul><ul><li>Overall comments: </li></ul><ul><ul><li>General confusion about how best to use data mgmt tools </li></ul></ul><ul><ul><li>Software not yet stable enough and insufficient diagnostics information available </li></ul></ul><ul><ul><li>Too difficult to configure </li></ul></ul><ul><ul><li>Concern that EDG 1.2 in its current configuration will not scale easily to ~40 sites </li></ul></ul>
  5. 5. ATLAS Task Force <ul><li>Task force with ATLAS & EDG people (led by Oxana Smirnova) </li></ul><ul><ul><li>http://cern.ch/smirnova/atlas-edg </li></ul></ul><ul><li>ATLAS is eager to use Grid tools for the Data Challenges </li></ul><ul><ul><li>ATLAS Data Challenges are already on the Grid (NorduGrid, iVDGL) </li></ul></ul><ul><ul><li>The DC1/phase2 (to start in October) is expected to be done mostly using the Grid tools </li></ul></ul><ul><li>By September 16 (ATLAS SW week) evaluate the usability of EDG for the DC tasks </li></ul><ul><li>The task: to process 5 input partitions of the Dataset 2000 at the EDG Testbed + one non-EDG site (Karlsruhe) </li></ul><ul><li>Intensive activity has meant they could process some partitions, but problems with long-running jobs are still an issue </li></ul><ul><li>Data Management chain is proving difficult to use and sometimes unreliable </li></ul><ul><li>Need to clarify policy for distribution/installation of applications software </li></ul><ul><li>On-going activity with very short timescale: highest priority task </li></ul>
  6. 6. Project Retreat <ul><li>Project retreat held last week (27 & 28 August) at Chavannes </li></ul><ul><li>~45 participants </li></ul><ul><ul><li>work package managers, architecture group, quality group, applications groups, mware experts, representatives from LCG, DataTAG, Globus & Condor </li></ul></ul><ul><li>Agenda and material on the web: </li></ul><ul><ul><li>http://documents.cern.ch/age?a021130 </li></ul></ul><ul><ul><li>Photos by Jeff Templon http://www.nikhef.nl/~templon/chavannes/index.html </li></ul></ul><ul><li>3 sessions addressing the most important aspects of the project's current work: </li></ul><ul><ul><li>Software Release Process </li></ul></ul><ul><ul><li>Release 1.2 </li></ul></ul><ul><ul><li>Testbed 2 </li></ul></ul>
  7. 7. Software Process <ul><li>Over-simplification of the current situation: </li></ul><ul><ul><li>Mware groups develop software in isolation </li></ul></ul><ul><ul><li>ITeam assembles it as best it can </li></ul></ul><ul><ul><li>Site managers are asked to install it </li></ul></ul><ul><ul><li>Application groups are asked to test it </li></ul></ul><ul><li>Problems: </li></ul><ul><li>No place for the mware groups to integrate software before delivering it to the ITeam </li></ul><ul><li>Inadequate software testing – leads to installation/configuration/execution faults </li></ul><ul><li>We are running blind – no way to control or reliably plan software delivery </li></ul>
  8. 8. Software Process: Autobuild <ul><li>A release manager will be nominated with overall responsibility for ensuring the procedure is followed </li></ul><ul><li>Make autobuild tools the basis of the daily work of the mware groups and ITeam </li></ul><ul><ul><li>Nightly build from CVS repository for all software </li></ul></ul><ul><ul><ul><li>Problems must be fixed ASAP – checked by Quality Group reps </li></ul></ul></ul><ul><ul><li>Mware groups give ITeam CVS tags instead of RPMs </li></ul></ul><ul><ul><ul><li>Tagged software must be documented </li></ul></ul></ul><ul><ul><li>Mware groups must perform and supply unit tests </li></ul></ul><ul><ul><ul><li>Integrated with nightly build </li></ul></ul></ul><ul><li>Tagged software that fails integration or testing, or is inadequately documented, will be rejected </li></ul><ul><ul><li>Mware group is responsible for fixing it </li></ul></ul>
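The acceptance rule on the Autobuild slide (a tag is rejected if it fails integration, fails testing, or lacks documentation) can be illustrated with a small sketch. This is not the project's autobuild tooling; the class `TaggedComponent`, the functions `rejection_reasons` and `triage`, and the component and tag names in the usage note are all hypothetical, and the three boolean fields stand in for whatever the nightly build would actually report.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TaggedComponent:
    """Hypothetical record for one CVS tag handed to the ITeam."""
    name: str               # illustrative component name
    cvs_tag: str            # tag delivered instead of RPMs
    builds: bool            # assumed result of the nightly build (integration)
    unit_tests_pass: bool   # assumed result of the supplied unit tests
    documented: bool        # assumed result of a documentation check


def rejection_reasons(component: TaggedComponent) -> List[str]:
    """Return every reason the tag would be rejected (empty list = accepted)."""
    reasons = []
    if not component.builds:
        reasons.append("fails integration")
    if not component.unit_tests_pass:
        reasons.append("fails testing")
    if not component.documented:
        reasons.append("inadequately documented")
    return reasons


def triage(components: List[TaggedComponent]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split tags into accepted names and rejected names with their reasons."""
    accepted: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for component in components:
        reasons = rejection_reasons(component)
        if reasons:
            # The mware group that delivered the tag is responsible for fixing it.
            rejected[component.name] = reasons
        else:
            accepted.append(component.name)
    return accepted, rejected
```

For example, calling `triage` with one component whose three checks all pass and another whose unit tests fail returns the first name in the accepted list and the second in the rejected mapping, together with its reasons, so the responsible mware group can see what to fix.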
  9. 9. Software Process: Quality Group <ul><li>Recently formed Quality Group, convened by Gabriel Zaquine, is responsible for ensuring quality issues are addressed within the WPs </li></ul><ul><ul><li>Ensure unit test plans are complete and followed </li></ul></ul><ul><ul><li>Follow up on problems reported via Bugzilla & nightly builds </li></ul></ul><ul><ul><li>Organise running of code checking tools on all EDG software </li></ul></ul><ul><ul><li>Agree on adopted project developer guidelines etc. </li></ul></ul><ul><li>http://eu-datagrid.web.cern.ch/eu-datagrid/QAG/default.htm </li></ul>
  10. 10. Software Process: Testing <ul><li>Strengthen the Testing Group </li></ul><ul><ul><li>Identify leader and a small number of full-time testers </li></ul></ul><ul><ul><li>Assemble and maintain test suite integrated with autobuild tools </li></ul></ul><ul><li>Automate installation and configuration of software releases </li></ul><ul><ul><li>To permit automatic testing, we need to be able to automatically install & configure a release on a pre-defined small example site </li></ul></ul><ul><ul><li>Needs improvements by mware WPs to simplify and complete installation & configuration of their sw </li></ul></ul><ul><ul><li>Site managers have a good overview of how to do this </li></ul></ul><ul><ul><ul><li>Need to clarify the work involved during this conference </li></ul></ul></ul><ul><li>Set up certificate testbed </li></ul><ul><ul><li>Used for testing activities </li></ul></ul><ul><ul><li>Involves several sites </li></ul></ul>
  11. 11. Technical Management <ul><li>Architecture group documenting testbed 2 architecture </li></ul><ul><ul><li>draft: http://doc.cern.ch/archive/electronic/other/agenda/a021130/a021130s4t1/TB2Arch_v0_1.doc </li></ul></ul><ul><ul><li>Meets once a month (next meeting tomorrow) </li></ul></ul><ul><li>Project Tech. Board addresses deliverables and relationships with other projects </li></ul><ul><ul><li>Meets once per quarter (next meeting 2nd October @ CERN) </li></ul></ul><ul><ul><li>http://documents.cern.ch/AGE/current/displayLevel.php?fid=3l131 </li></ul></ul><ul><li>Need more frequent technical management forum </li></ul><ul><ul><li>Authority to make technical & architectural decisions affecting sw development in WPs </li></ul></ul><ul><ul><li>Include WP managers, chaired by the Technical Coordinator </li></ul></ul><ul><ul><ul><li>Can call on mware experts according to needs of the themed agenda </li></ul></ul></ul><ul><ul><li>Meets frequently to ensure issues are addressed rapidly </li></ul></ul><ul><ul><ul><li>Associate with the WP managers' weekly meeting </li></ul></ul></ul><ul><li>Relationship with Architecture Group and Project Tech. Board needs to be clarified </li></ul>
  12. 12. Testbed Support <ul><li>Strengthen user support group </li></ul><ul><ul><li>Ensure people involved have sufficient knowledge of the software </li></ul></ul><ul><ul><li>Emphasis on the accuracy and usefulness of the responses provided </li></ul></ul><ul><ul><ul><li>Tools used for support are a secondary issue </li></ul></ul></ul><ul><ul><li>Federate with equivalent groups from other projects </li></ul></ul><ul><ul><li>Provides support on the application testbed </li></ul></ul><ul><li>Clarify & document procedures </li></ul><ul><ul><li>Creating a new CA (CA group) </li></ul></ul><ul><ul><ul><li>Need to reduce time involved (currently 3 months) </li></ul></ul></ul><ul><ul><li>Site Installation (site managers & ITeam) </li></ul></ul><ul><ul><ul><li>Steps for system manager and requirements for a site to join the testbed </li></ul></ul></ul><ul><ul><li>Creating & Managing a Virtual Organisation (site managers & ITeam) </li></ul></ul><ul><ul><ul><li>Steps involved and tasks of a VO manager </li></ul></ul></ul>
  13. 13. Release Development [Diagram; labels only: CVS; Autobuild; Nightly build; Incremental improvements from mware WPs; Releases 1.2, 2.0, 2.x; Branch; Patches; Continuous support; Current “show-stoppers” fixed; Application tests satisfied; Migrate sites; Changes foreseen for 1.3 & 1.4 become “incremental improvements”]
  14. 14. Incremental Steps from EDG 1.2 <ul><li>Fix “show-stoppers” for application groups – mware WPs (continuous) </li></ul><ul><li>Build EDG 1.2.x with autobuild tools – ITeam </li></ul><ul><li>Integrate testing framework and limited automatic tests with autobuild tools – testing group </li></ul><ul><li>Automatic installation & configuration procedure for pre-defined site (can’t auto test without it) </li></ul><ul><li>Start autobuild server for RH 7.2 and attempt build of release 1.2 – Yannick Patois </li></ul><ul><li>New LCFG – WP4 </li></ul><ul><li>GridFTP server access to MSS – WP5 </li></ul><ul><li>Giggle & Reptor – WP2 </li></ul><ul><li>LCAS with dynamic plug-in modules – WP4 </li></ul><ul><li>NetworkCost Function – WP7 </li></ul><ul><li>Integrate mapcentre (NorduGrid?) and R-GMA – WP3 </li></ul><ul><li>GLUE modified info providers/consumers – WP1,4,5 </li></ul><ul><li>Res. Broker – WP1 </li></ul><ul><li>LCFG for RH 7.2 – WP4 </li></ul><ul><li>Integration with Condor as batch system – WP4 </li></ul>What do we do about: space mgmt, VOMS, slashgrid? End Sept 2002. Expect this list to be discussed/updated this week.
  15. 15. EDG Tutorial <ul><li>DAY1 </li></ul><ul><li>Tutorial introduction </li></ul><ul><li>Introduction to Grid computing and overview of the DataGrid project </li></ul><ul><li>Security </li></ul><ul><li>Testbed overview </li></ul><ul><li>Job Submission </li></ul><ul><li>lunch </li></ul><ul><li>hands-on exercises: job submission </li></ul><ul><li>DAY2 </li></ul><ul><li>Data Management </li></ul><ul><li>LCFG, fabric mgmt & sw distribution & installation </li></ul><ul><li>Applications and Use cases </li></ul><ul><li>Future Directions </li></ul><ul><li>lunch </li></ul><ul><li>hands-on exercises: data mgmt </li></ul>The tutorials are aimed at users wishing to "gridify" their applications using EDG software and are organized over 2 full consecutive days. http://hep-proj-grid-tutorials.web.cern.ch/hep-proj-grid-tutorials/dry.asp user: griduser, passwd: tutorials123
  16. 16. Tutorial rehearsal <ul><li>Rehearsal at CERN, 29 & 30 August </li></ul><ul><ul><li>19 participants (members of project or closely related) to check material & approach </li></ul></ul><ul><li>Lessons learnt </li></ul><ul><ul><li>Can’t cover as much material as we hoped (goes too fast) </li></ul></ul><ul><ul><ul><li>Explain why, not just how </li></ul></ul></ul><ul><ul><ul><li>Avoid details – can read them from references afterwards </li></ul></ul></ul><ul><ul><li>Need as many helpers as possible for hands-on exercises </li></ul></ul><ul><ul><li>Participants have difficulties with certificate management </li></ul></ul><ul><ul><ul><li>All participants must have a certificate ready for them and be in the same VO </li></ul></ul></ul><ul><li>Generated a lot of enthusiasm in the participants and EDG people doing the hands-on </li></ul><ul><ul><li>Found genuine bugs during hands-on exercises </li></ul></ul><ul><ul><li>Recommend mware WPs send developers to help with hands-on exercises </li></ul></ul><ul><ul><li>New project people should follow the tutorial </li></ul></ul><ul><li>Thanks to: </li></ul><ul><ul><ul><li>Mario Reale, Elisabetta Ronchieri, Akos Frohner, Erwin Laure, Peter Kunszt, Antony Wilson, Steve Fisher, Maite Barroso Lopez, Owen Synge, Emanuele Leonardi, Steve Traylen, Frank Bonnassieux, Christophe Jacquet, Sophie Nicoud, Karin Burghauser & CERN training people </li></ul></ul></ul>
  17. 17. Tutorial Schedule <ul><li>CERN School of Computing, Naples, 23-27 September </li></ul><ul><ul><li>80 participants. Hands-on exercises only (presentations by Carl Kesselman & Ian Foster) </li></ul></ul><ul><ul><li>ALL EDG people attending should do exercises first and help others at the school </li></ul></ul><ul><li>CERN, October 3 & 4 </li></ul><ul><li>NeSC, Edinburgh, December </li></ul><ul><ul><li>Dates still moving. Maximum 30 participants (more for the presentations) </li></ul></ul><ul><li>We could accommodate more sites in December, January etc. </li></ul><ul><ul><li>Sites must provide support and handle logistics </li></ul></ul><ul><ul><ul><li>Organisers/helpers must attend tutorial at another site first </li></ul></ul></ul><ul><li>The tutorial does represent some load on the testbed (own VO & cert. creation) </li></ul><ul><li>For the future </li></ul><ul><ul><li>Hands-on exercises are a test suite – automate and run with the nightly checks </li></ul></ul><ul><ul><li>The material must be kept up to date with each public release of the software </li></ul></ul><ul><ul><ul><li>We need to nominate, for each chapter of the tutorial, a person responsible for keeping its slides and exercises up to date </li></ul></ul></ul>
  18. 18. Summary <ul><li>Addressing the serious bugs found by the application groups on the testbed is the task with the highest priority </li></ul><ul><li>Testing activities need more resources </li></ul><ul><li>Testbed support is becoming a more important task </li></ul><ul><li>Future releases must continue to address the needs of the application groups </li></ul><ul><li>We need to clarify the following points during this conference: </li></ul><ul><ul><li>Autobuild status & how to automate installation & configuration </li></ul></ul><ul><ul><li>Contents of the test suites </li></ul></ul><ul><ul><li>Release plans until the next EU review </li></ul></ul><ul><li>In short: </li></ul><ul><ul><ul><li>What we are doing is right, we are just going about it in a sloppy manner </li></ul></ul></ul><ul><ul><ul><li>Need to go one step at a time and ensure each step works </li></ul></ul></ul>
