STAR Grid Activities, OSG and Beyond
  • Note: Ruth discussed with us sending a team at UIC to install and debug the OSG software stack and data transfer components. We did not close the loop on this, hence this note indicating where OSG could help. UCM issue: we cannot run production in STAR without immediate monitoring (we have known that for 10 years); otherwise, one bad apple and the science goes south. We cannot trade good science for more resources, hence the need to bring in an application monitoring component like UCM ASAP.
  • EGEE and interoperability will be an issue for STAR, especially as we expand to China … Our partners in Europe never told us to “keep off”. This is perhaps a political issue rather than an interoperability issue; if so, it may be an OSG EB issue.
  • Transcript

    • 1. STAR Grid Activities, OSG and Beyond. D. Olson (a), for the STAR Collaboration. The STAR Grid Team: W. Betts (b), L. Didenko (b), T. Freeman (c), P. Jakl (b), L. Hajdu (b), E. Hjort (a), K. Keahey (c), J. Lauret (b), D. Olson (a), A. Rose (a), I. Sakrejda (a), A. Sim (a). Affiliations: (a) LBNL, (b) BNL, (c) ANL
    • 2. Abstract
      • We will present the ongoing grid efforts of the STAR experiment within the Open Science Grid (OSG) and beyond, as well as the integration of resources in Europe, Asia and South America. STAR is a founding member of the OSG Consortium and has several functioning resources on OSG: its main facilities at BNL/RCF and LBNL/NERSC, as well as the universities Wayne State and Birmingham. Additional resources are in the process of connecting to OSG. Numerous distributed resources used by STAR collaborators employ grid or grid-inspired technologies. Common examples are the use of grid job submission tools with SUMS, the standard STAR workload service, and the use of data handling and transfer tools across grids. To capitalize on the heterogeneity of resources while minimizing in-house platform support efforts, we are thoroughly investigating the dynamic deployment of a reliable data analysis framework, via a STAR-validated software stack in Xen virtual machines, leveraging advanced VM technologies and research from the CEDPS project.
    • 3. Contents
      • Background/History
      • Open Science Grid Deployments and Usage
      • Other Distributed Computing Usage
      • Asian Activities
      • Workload Scheduling (SUMS)
      • Virtualization & Cloud Computing
      • Conclusion
    • 4. Background/History
      • STAR has been participating in U.S. grid activities since the early days of the Particle Physics Data Grid (1999) and is a founding member of the Open Science Grid.
      • Starting with the involvement of LBNL and BNL, activities now also include collaborators at Wayne State, MIT, Univ. Chicago, Birmingham, São Paulo, Prague and ANL.
      • Additionally
        • SUN Grid, 2007
        • MIT Xgrid, 2006+
        • Xen, Amazon EC2, 2007+
    • 5. STAR Grid (figure: map of sites: PDSF/Berkeley Lab, Brookhaven National Lab, Fermilab, University of Birmingham, Wayne State University). 90% of STAR Grid resources are part of the Open Science Grid
    • 6. STAR is also reaching out to other grid resources and projects (figure labels: MIT Xgrid, SunGrid, NPI Czech Republic; interoperability/outreach, virtualization, VDT extension, SRM/DPM/EGEE)
    • 7. Resources used by STAR
      • 6 main dedicated sites (STAR software fully installed)
        • BNL Tier0
        • NERSC/PDSF Tier1
        • WSU (Wayne State University) Tier2
        • BHAM (Birmingham, England) Tier2
        • UIC (University of Illinois, Chicago) Tier2
        • Incoming: Prague Tier2
      • Other resources
        • FermiGrid: not STAR-dedicated; simulation production at the 10% level
        • SunGrid: commercial (free for STAR); event generation at the 1-2% level
        • MIT Xgrid cluster: mainly analysis; working on a Globus GK for Mac OS X
        • EC2 cluster (Amazon Elastic Compute Cloud): event generation for now; an exercise in Xen-based virtualization; 1-2% level
    • 8. BeStMan SRM (Berkeley Storage Manager)
      • SRM interface with caching for data transfer
      • We use it for bulk data transfer as well as for asynchronous data placement in the job workflow.
      • Expect to deploy BeStMan-Xrootd interface bestman /
    • 9. OSG usage (figure: usage in process hours per week)
    • 10. Proof of Principle Initial Successes and Benefits from OSG
      • Year 1 OSG Milestone for STAR: 
        • Migration of 80% or more of the simulation production to OSG based operation
      • Simulation production - 97% efficiency achieved
        • Exceeds expectations (we targeted a satisfactory level of 75% to 85% success)
      • Sites used are not necessarily STAR-dedicated (FermiGrid)
        • Especially: STAR received help from Fermi resources and the FNAL team in June 2007
          • several k CPU hours loaned on emergency request
          • as small as it seems, this help made the difference
        • This part of resource loan worked and is an important proof of principle of OSG benefit
      (Figure: efficiency of job execution via the OSG infrastructure, before and after resubmission.)
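The before/after resubmission comparison on this slide can be illustrated with a small calculation. This is a minimal sketch, not the STAR accounting method: it assumes every attempt fails independently with the same probability, and the function name and the example numbers are hypothetical rather than STAR measurements.

```python
def efficiency_after_resubmission(first_pass: float, resubmissions: int) -> float:
    """Fraction of jobs that eventually succeed.

    Assumption (not from the slides): each attempt succeeds independently
    with probability `first_pass`, and every failed job is resubmitted up
    to `resubmissions` times.
    """
    if not 0.0 <= first_pass <= 1.0:
        raise ValueError("first_pass must be between 0 and 1")
    if resubmissions < 0:
        raise ValueError("resubmissions must be non-negative")
    # A job is lost only if all (resubmissions + 1) attempts fail.
    return 1.0 - (1.0 - first_pass) ** (resubmissions + 1)


if __name__ == "__main__":
    # Illustrative numbers only.
    for attempts in range(3):
        print(attempts, round(efficiency_after_resubmission(0.85, attempts), 4))
```

Under this simplified model a single resubmission already moves a first-pass success rate in the targeted 75% to 85% band well above 90%, which is why resubmission matters so much for the overall figure.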
    • 11. Other grid/distributed activities
      • Xgrid at MIT
        • Adam Kocoloski, Michael Miller, Leve Hajdu
        • Mac OS X, 50 desktops
        • Scavenging spare cycles
        • STAR data analysis runs via SUMS, so users keep the same UI for analysis
        • Xgrid/Globus job manager in test
      • Prague, EGEE Tier2 site
        • Michal Zerola, Pavel Jakl
        • High-performance data transfer using multiple srmcp to DPM in Prague (next slide)
      • SUN Grid
        • Production of STAR Geant simulations on SUN utility computing resources.
    • 12. Data transfer to Prague: parallel srmcp to DPM storage element, 700 Mbps – 20 threads
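The parallel-transfer pattern on this slide (many concurrent `srmcp` copies) can be sketched as a thread pool. This is a hedged illustration, not the STAR transfer tooling: it assumes an `srmcp` client on the PATH that accepts a source URL and a destination URL, and the helper names (`run_srmcp`, `parallel_copy`, `failed`) are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

Transfer = Tuple[str, str]  # (source URL, destination URL)


def run_srmcp(source: str, destination: str) -> int:
    """Run one copy and return its exit code.

    Assumption: an `srmcp` client is installed and is invoked as
    `srmcp <source> <destination>`; site-specific options are omitted.
    """
    return subprocess.run(["srmcp", source, destination]).returncode


def parallel_copy(
    transfers: Iterable[Transfer],
    copy_one: Callable[[str, str], int] = run_srmcp,
    workers: int = 20,  # the slide quotes 20 threads
) -> List[Tuple[Transfer, int]]:
    """Run the copies concurrently and pair each transfer with its exit code."""
    items = list(transfers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so codes line up with items.
        codes = list(pool.map(lambda t: copy_one(*t), items))
    return list(zip(items, codes))


def failed(results: List[Tuple[Transfer, int]]) -> List[Transfer]:
    """Transfers with a non-zero exit code, e.g. candidates for a retry."""
    return [transfer for transfer, code in results if code != 0]
```

Passing `copy_one` as a parameter keeps the sketch testable without a storage element: a stand-in function can replace the real client. Threads are sufficient here because each worker just waits on an external process.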
    • 13. STAR Asian institutions
      • China
        • IHEP, Beijing (2)
        • Institute of Modern Physics, Lanzhou (6)
        • USTC, Hefei (14)
        • Shanghai Institute of Applied Physics (11)
        • Tsinghua University (9)
        • Institute of Particle Physics, Wuhan (12)
      • India
        • Institute of Physics, Bhubaneswar (4)
        • Indian Institute of Technology, Mumbai (5)
        • University of Jammu (15)
        • Panjab University (5)
        • University of Rajasthan (3)
        • Variable Energy Cyclotron Centre, Kolkata (14)
      • Korea
        • Pusan National University (4)
        • KISTI (in progress as CS collaborator)
    • 14. Asian Activities
      • Many collaborators in Asia
      • Planning for Tier2-like facility at PNU
      • Discussions with KISTI of possible Tier1-like facility for Asia region
      • Anxious to see how we can better interface/integrate with our Asian collaborators on computational aspects
    • 15. Gloriad
      • 10 Gb all the way through NY
      • Would allow for immediate full data transfer
      • In later years, would allow transfer of ½ of the dataset
        • Possibly more depending on Gloriad expansion
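Whether a link allows a "full" or "½" dataset transfer comes down to simple arithmetic on dataset size and usable bandwidth. The sketch below is illustrative only: the slides do not state dataset sizes, so the 100 TB figure and the `utilization` parameter are assumptions.

```python
def transfer_days(dataset_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move `dataset_tb` terabytes over a `link_gbps` link.

    Uses decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s).
    `utilization` is the fraction of the link actually usable (assumption).
    """
    if dataset_tb < 0 or link_gbps <= 0:
        raise ValueError("dataset_tb must be >= 0 and link_gbps must be > 0")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86400.0


if __name__ == "__main__":
    # Hypothetical 100 TB dataset: a full 10 Gbps path versus the
    # 0.7 Gbps rate quoted for the Prague srmcp transfers.
    print(round(transfer_days(100, 10), 2), "days at 10 Gbps")
    print(round(transfer_days(100, 0.7), 2), "days at 0.7 Gbps")
```

The same function shows how the transferable fraction shrinks as the yearly dataset grows while the link stays fixed.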
    • 16. SUMS
      • STAR Unified Meta Scheduler
      • A single user interface and framework for submitting to all STAR resources, local and grid flavors
      • Optimizes resource utilization
      25K jobs/day
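The idea of a single submission interface in front of several local and grid back ends can be sketched as follows. This is not the SUMS API or its scheduling policy: every class and method name here is hypothetical, and the "most free slots" rule is just one simple policy chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Job:
    name: str
    command: str


class Dispatcher(Protocol):
    """What the meta scheduler needs from any back end (hypothetical interface)."""

    def free_slots(self) -> int: ...
    def submit(self, job: Job) -> str: ...


class FakeSite:
    """Stand-in back end that just records jobs; a real one would wrap a
    local batch system or a grid submission tool."""

    def __init__(self, label: str, slots: int) -> None:
        self.label = label
        self.slots = slots
        self.accepted: List[Job] = []

    def free_slots(self) -> int:
        return self.slots - len(self.accepted)

    def submit(self, job: Job) -> str:
        self.accepted.append(job)
        return f"{self.label}:{job.name}"


class MetaScheduler:
    """One entry point; each job goes to the back end with the most free slots."""

    def __init__(self, dispatchers: Dict[str, Dispatcher]) -> None:
        self.dispatchers = dispatchers

    def submit(self, job: Job) -> str:
        # Ties go to the first back end in insertion order.
        best = max(self.dispatchers.values(), key=lambda d: d.free_slots())
        if best.free_slots() <= 0:
            raise RuntimeError("no free slots on any site")
        return best.submit(job)
```

Users only see `MetaScheduler.submit`, so adding a new resource type means adding one more dispatcher rather than teaching users a new tool.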
    • 17. Why Xen? Virtualization?
        • We can all do it …
      • BEYOND THAT, the reality
        • Complex experimental application codes
          • Developed over more than 10 years, by more than 100 scientists, comprises ~2 M lines of C++ and Fortran code
        • Require complex, customized environments
          • Rely on the right combination of compiler versions and available libraries
          • Dynamically load external libraries depending on the task to be performed
        • Environment validation
          • To ensure reproducibility and result uniformity across environments
          • Regression tests cannot be done on all OS flavors because of simple manpower considerations
    • 18. Why Xen? Virtualization?
      • Solution? Use Virtual Machines (Xen)
        • Bring your environment with you
        • Fast to deploy, enables short-term leasing
        • Excellent enforcement, performance isolation
        • Very good security isolation
        • Minimize experiment team’s efforts
      • Activity ↔ Development effort leveraged through the CEDPS SciDAC partner project
    • 19. Deploying OSG Cluster as Workspaces (figure: a VWS Service in front of twelve pool nodes). The cluster manager can deploy the gatekeeper and worker nodes in ~30 min: an OSG CE image as gatekeeper, plus worker node images with the application environment. The application workload is submitted to the cluster as to any other OSG CE. The cluster can be retired after the workload finishes, freeing resources for other applications.
    • 20. Virtual Machine activities
      • “Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.”
      • Work so far:
        • Xen image with OSG 0.6.0 CE on SL 4.4
        • Xen image with OSG 0.6.0 WN on SL 4.4
        • Use Globus Workspaces to deploy gatekeeper and worker nodes on EC2
        • Can launch a 100-node cluster in ~30 min.
        • Have run Hijing event generator simulations on EC2.
        • Have prepared a Xen image with the full STAR software environment on SL 4.4, currently being validated
      • Next steps:
        • Run event reconstruction of simulations on EC2 and Teraport cloud
    • 21. Accelerated display of a workflow job state (figure; panels: NERSC PDSF, ENC2, WSU; Y = job number, X = job state)
    • 22. VM image build/maintenance
      • We are working with rPath, Inc. in an SBIR project to use rBuilder to efficiently build and maintain OS and application images.
      • rBuilder, from the inventors of RPM
        • “rBuilder is the first and only development tool that simplifies and automates the creation of software appliances and virtual appliances. rBuilder combines powerful features with innovative packaging techniques to yield a repeatable appliance creation process.”
    • 23. Near term plans
      • We MUST prepare for real data production on OSG
        • And take ANY shortcut necessary to accomplish it BY 2009
          • The onset of DAQ1000, with a data acquisition rate one order of magnitude higher than today, will require additional resources for real-data processing
          • Virtualization appears to us as one development that helps to easily deploy and run a 2-million-line software framework for data mining
        • UCM job tracking (SBIR with Tech-X) is maturing
          • Essential to engage discussion on integration – we MUST monitor our application
      • We have to consolidate our sites
        • More resources are available in STAR but not fully used (BHAM, UIC for example)
          • We will ramp up infrastructure support to achieve this
          • We hope to leverage OSG efforts in the US (UIC for example)
        • We have efforts in integrating Mac OS-X resources from MIT
          • Initial work was uniquely started in STAR
          • Is there a path forward? Depends on priorities …
    • 24. Longer term needs
      • Requirements driven by demanding data processing
        • We will need to efficiently share resources
          • Concerned about what happens when LHC has ramped up data taking.
          • Will there be any cycles left to be had?
      • Additional
        • STAR is expanding its pool of sites
          • Interest in sites possibly shared through EGEE-OSG interoperability (especially China)
          • Hoping for help from OSG to understand policy as well as technology issues.
        • We believe virtualization is “a” path forward to
          • Simple deployment of experimental software
          • Allowing the experiment's software development team to concentrate on science and on supporting a minimal set of OS versions
          • Globus workload management needed
    • 25. Conclusion
      • STAR Grid usage is expanding geographically and functionally.
      • Upgrades at STAR and RHIC are driving a significant increase in computational needs beginning next year, which means we MUST push more workload onto the grid.
      • The emergence (and convergence?) of VM, cloud computing and grid makes a very powerful paradigm for scientific computing.
      • We want (and need) to have greater involvement with our Asia-Pacific colleagues, which is enabled by new trans-Pacific networks.