Cyberinfrastructure and its Role in Science

This presentation examines some of the challenges scientists face and describes various cyberinfrastructure technologies that help address these challenges. Example projects employing cyberinfrastructure technologies that we have worked on at the Grid Research Centre, including the GeoChronos project, are also presented. This presentation was given at the IAI International Wireless Sensor Networks Summer School held at the University of Alberta on July 6th, 2009.

Published in: Technology, Education

Cyberinfrastructure and its Role in Science

  1. 1. Cyberinfrastructure and its Role in Science Cameron Kiddle Research Fellow, Grid Research Centre Adjunct Assistant Professor, Department of Computer Science, University of Calgary Distributed Systems Architect, WestGrid
  2. 2. Outline <ul><li>Challenges </li></ul><ul><li>Cyberinfrastructure </li></ul><ul><li>Cyberinfrastructure Technologies </li></ul><ul><li>Examples </li></ul><ul><ul><li>ICE Force Project </li></ul></ul><ul><ul><li>Molecular Dynamics Simulations </li></ul></ul><ul><ul><li>GT4-based Grid for Canada </li></ul></ul><ul><ul><li>Fire Dynamics Simulator </li></ul></ul><ul><ul><li>Rendering on the Cloud </li></ul></ul><ul><ul><li>GeoChronos </li></ul></ul>IAI Summer School July 6, 2009 Cyberinfrastructure -
  3. 3. Collaboration Challenges <ul><li>Familiarity/awareness of collaboration tools </li></ul><ul><li>Keeping all interested parties in the loop </li></ul><ul><li>Finding related work and researchers </li></ul><ul><li>Keeping up to date with current research </li></ul><ul><li>Collaboration while working in the field </li></ul>
  4. 4. Data Challenges <ul><li>Acquisition of data </li></ul><ul><ul><li>Many different data sources </li></ul></ul><ul><ul><li>Large quantities of data </li></ul></ul><ul><ul><li>Different regulations/mechanisms for accessing data </li></ul></ul><ul><ul><li>Lack of automation </li></ul></ul><ul><ul><li>Finding the right data </li></ul></ul><ul><ul><li>Bandwidth constraints </li></ul></ul><ul><li>Managing data </li></ul><ul><ul><li>Scattered and unorganized data </li></ul></ul><ul><ul><li>Inadequate tools for recording/maintaining metadata </li></ul></ul><ul><ul><ul><li>Data without metadata is meaningless </li></ul></ul></ul><ul><ul><ul><li>Lack of suitable metadata standards </li></ul></ul></ul><ul><ul><ul><li>Validation of metadata </li></ul></ul></ul><ul><ul><li>Tracking provenance of data </li></ul></ul><ul><li>Pre-processing of data </li></ul><ul><ul><li>Raw data typically cannot be directly analyzed </li></ul></ul><ul><ul><li>Significant amount of time spent preparing data for analysis </li></ul></ul><ul><ul><li>Lack of automation </li></ul></ul>
  5. 5. Application Challenges <ul><li>Limited availability of computing resources </li></ul><ul><li>Access to and familiarity with heterogeneous computing resources </li></ul><ul><li>Fault tolerance and reliability </li></ul><ul><li>Access to software available in research lab while in field or other locations </li></ul><ul><li>Installing, configuring and updating software </li></ul><ul><li>System dependencies of software </li></ul><ul><li>Awareness and suitability of available software </li></ul><ul><li>Sharing applications and results </li></ul>
  6. 6. Cyberinfrastructure <ul><li>“Like the physical infrastructure of roads, bridges, power grids, telephone lines, and water systems that support modern society, "cyberinfrastructure" refers to the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor.” </li></ul><ul><li>Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, 2003. </li></ul>
  7. 7. Cyberinfrastructure Technologies <ul><li>Grid Computing </li></ul><ul><li>Cloud Computing </li></ul><ul><li>Virtualization </li></ul><ul><li>Web 2.0 / Social Networking </li></ul><ul><li>Web Portals / Scientific Gateways </li></ul><ul><li>Semantic Web </li></ul><ul><li>… </li></ul>
  8. 8. Grid Computing <ul><li>Many different definitions/uses </li></ul><ul><ul><li>computational grids, data grids, desktop grids, campus grids, sensor grids, access grids </li></ul></ul><ul><li>Coordinated sharing of heterogeneous resources across administrative domains </li></ul> [Figure: resources in Domain A, Domain B and Domain C shared by Virtual Organization X and Virtual Organization Y]
  9. 9. Grid Middleware <ul><li>The layer between users/applications and grid resources that glues everything together </li></ul><ul><li>Example grid middleware </li></ul><ul><ul><li>Globus Toolkit </li></ul></ul><ul><ul><ul><li>GT2 – pre-standards </li></ul></ul></ul><ul><ul><ul><li>GT4 – Web Services based </li></ul></ul></ul><ul><ul><li>UNICORE </li></ul></ul><ul><ul><li>gLite </li></ul></ul><ul><ul><li>ARC </li></ul></ul><ul><ul><li>NAREGI </li></ul></ul>
  10. 10. Key Grid Middleware Services <ul><li>Security Services </li></ul><ul><ul><li>Concerned with authentication, authorization, secure communication, … </li></ul></ul><ul><li>Information Services </li></ul><ul><ul><li>Provide information about resources, policy, services and applications to tools and users </li></ul></ul><ul><li>Data Management Services </li></ul><ul><ul><li>Manage movement and replication of data as well as metadata about data </li></ul></ul><ul><li>Execution Management Services </li></ul><ul><ul><li>Handle placement, provisioning and lifetime management of jobs and workflows </li></ul></ul>
  11. 11. Benefits of Grid Computing <ul><li>Easier access to more resources </li></ul><ul><ul><li>Users/organizations can share resources </li></ul></ul><ul><ul><li>Single sign-on </li></ul></ul><ul><ul><li>Common interface (hide heterogeneity) </li></ul></ul><ul><li>Improved data management </li></ul><ul><ul><li>Efficient file transfers </li></ul></ul><ul><ul><li>Abstraction of physical location of data </li></ul></ul><ul><li>Automated execution of jobs and workflows </li></ul>
  12. 12. Example Grid Projects
      <table>
      <tr><th>Name</th><th>Description</th></tr>
      <tr><td>LHC Computing Grid http://lcg.web.cern.ch/</td><td>data storage and analysis infrastructure for the high energy physics community using the Large Hadron Collider (LHC) at CERN (ATLAS Tier-1 site at TRIUMF in British Columbia)</td></tr>
      <tr><td>Network for Earthquake Engineering Simulation (NEES) http://www.nees.org/</td><td>a US national network of 15 facilities to study the impact of earthquakes on buildings, bridges, etc.</td></tr>
      <tr><td>Expanding GEOsciences on DEmand (EGEODE) http://www.egeode.org/</td><td>a virtual organization (VO) associated with EGEE that is dedicated to research in geoscience for both public and private industrial R&D and academic laboratories</td></tr>
      <tr><td>International Virtual Observatory Alliance (IVOA) http://www.ivoa.net/</td><td>development of standards and infrastructure to share and analyze astronomical archives from around the world</td></tr>
      </table>
  13. 13. Cloud Computing <ul><ul><li>Transparent access to scalable and dynamic services over the Internet </li></ul></ul><ul><li>Key features: </li></ul><ul><ul><li>Everything as a Service (EaaS) </li></ul></ul><ul><ul><li>Utility/On-demand </li></ul></ul><ul><ul><li>Accessibility/Transparency </li></ul></ul><ul><ul><li>Scalability </li></ul></ul><ul><ul><li>Virtualization </li></ul></ul>
  14. 14. Cloud Computing Solutions
  15. 15. Benefits of Cloud Computing <ul><li>Reduce capital, support and maintenance costs </li></ul><ul><ul><li>Pay only for what you use </li></ul></ul><ul><ul><li>Get access to more/fewer resources when needed </li></ul></ul><ul><li>Ready to use for users </li></ul><ul><ul><li>No more downloads, installations or updates </li></ul></ul><ul><li>Simplify and speed up software development </li></ul><ul><ul><li>Don’t have to support multiple platforms </li></ul></ul><ul><li>Application popularity and lifespan difficult to predict </li></ul><ul><ul><li>Scale applications according to user demand </li></ul></ul>
  16. 16. Cloud Computing Case Study: Application Popularity on Facebook <ul><li>Difficult to predict popularity and lifespan of applications </li></ul><ul><li>Facebook Application Growth </li></ul><ul><ul><li>Sep. 2007: ~3700 applications </li></ul></ul><ul><ul><li>Sep. 2008: ~39000 applications </li></ul></ul><ul><li>Facebook Application Popularity (Sep. 12, 2008) </li></ul><ul><ul><li>39181 applications </li></ul></ul><ul><ul><li>Active user data for 37155 apps </li></ul></ul><ul><ul><li>3 apps with more than 10 million active users </li></ul></ul><ul><ul><li>80% of apps with fewer than 1000 active users </li></ul></ul> [Figure: Monthly Active Users vs. Rank of Facebook Applications (September 12, 2008)]
  17. 17. Cloud Computing Case Study: Shrek (DreamWorks) <ul><li>Shrek (2001) – 5 million CPU render hours </li></ul><ul><li>Shrek 2 (2004) – 10 million CPU render hours </li></ul><ul><li>Shrek 3 (2007) – 20 million CPU render hours </li></ul> (Source: R. Rowe. DreamWorks Animation "Shrek the Third": Linux Feeds an Ogre. Linux Journal. June 5, 2007. http://www.linuxjournal.com/article/9653)
      <table>
      <tr><th>Time to Render</th><th>1 CPU</th><th>100 CPUs</th><th>10000 CPUs</th></tr>
      <tr><td>Shrek</td><td>571 years</td><td>5.7 years</td><td>21 days</td></tr>
      <tr><td>Shrek 2</td><td>1142 years</td><td>11.4 years</td><td>42 days</td></tr>
      <tr><td>Shrek 3</td><td>2283 years</td><td>22.8 years</td><td>83 days</td></tr>
      </table>
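The render times on this slide follow from dividing the total CPU render hours by the number of CPUs. The short Python sketch below reproduces that arithmetic. It assumes perfectly linear scaling and a 365-day year, and the function names are illustrative only; real render farms would not scale this cleanly.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year

def render_time_hours(cpu_hours, cpus):
    """Wall-clock hours, assuming the work divides evenly across all CPUs."""
    return cpu_hours / cpus

def describe(cpu_hours, cpus):
    """Format the wall-clock time in years when it is at least a year, else in days."""
    hours = render_time_hours(cpu_hours, cpus)
    if hours >= HOURS_PER_YEAR:
        return f"{hours / HOURS_PER_YEAR:.1f} years"
    return f"{hours / HOURS_PER_DAY:.0f} days"

# CPU render hours as quoted on the slide.
films = {"Shrek": 5_000_000, "Shrek 2": 10_000_000, "Shrek 3": 20_000_000}

for name, cpu_hours in films.items():
    print(name, [describe(cpu_hours, n) for n in (1, 100, 10_000)])
```

The point of the slide survives the simplification: only with thousands of CPUs does the wall-clock time drop from years to weeks, which is the kind of burst capacity an on-demand cloud can offer.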
  18. 18. Cloud Computing Case Study: Animoto <ul><li>Animoto ( http://animoto.com ) </li></ul><ul><ul><li>Produces professional quality videos from images </li></ul></ul><ul><ul><li>Runs on Amazon EC2 </li></ul></ul><ul><li>Popularity soared when promoted on Facebook </li></ul><ul><li>During the course of 4 days: </li></ul><ul><ul><li>Jumped from 8 to 450 renderings per minute </li></ul></ul><ul><ul><li>~20000 new users per hour </li></ul></ul><ul><ul><li>3500 instances running on Amazon EC2 at peak </li></ul></ul> (Source: D. Barker. You Need 3,500 Servers by When?! On-demand Enterprise . 2008.07.07)
  19. 19. Virtualization <ul><li>Can transform a single physical machine into multiple virtual machines (VMs), each with its own OS and software stack </li></ul><ul><li>Virtualization software </li></ul><ul><ul><li>Xen, KVM, VMware </li></ul></ul><ul><ul><li>Support allocation, deallocation, checkpointing and migration of VMs </li></ul></ul><ul><li>Benefits </li></ul><ul><ul><li>Custom environments (root access) </li></ul></ul><ul><ul><li>More efficient use of resources (consolidation) </li></ul></ul><ul><ul><li>System maintenance without disruption </li></ul></ul>
  20. 20. Web 2.0 – The “Social Web” <ul><li>Aimed at: </li></ul><ul><ul><li>Providing feature rich user environments </li></ul></ul><ul><ul><li>Making it easier for users to generate Web content </li></ul></ul><ul><ul><li>Improving online social connectivity </li></ul></ul><ul><li>Example Web 2.0 technologies </li></ul><ul><ul><li>Blogs (WordPress, TypePad) </li></ul></ul><ul><ul><li>Wikis (Wikipedia) </li></ul></ul><ul><ul><li>Mashups (HousingMaps, ChicagoCrime) </li></ul></ul><ul><ul><li>Widgets/Gadgets (iGoogle, Netvibes) </li></ul></ul><ul><ul><li>Social networks (Facebook, MySpace, YouTube) </li></ul></ul>
  21. 21. Social Networking Sites/Platforms
  22. 22. Web Portals / Scientific Gateways <ul><li>Aimed at providing a community of users access to computing resources through a common Web-based interface </li></ul><ul><li>Web portal development tools </li></ul><ul><ul><li>GridSphere (portlet based) </li></ul></ul><ul><ul><li>Web 2.0/Social Networking </li></ul></ul><ul><li>Examples </li></ul><ul><ul><li>TeraGrid Scientific Gateways (over 30 of them) </li></ul></ul><ul><ul><li>nanoHUB </li></ul></ul>
  23. 23. Semantic Web <ul><li>Aimed at representing knowledge, not just information </li></ul><ul><li>Connecting and relating data in a way understandable by machines </li></ul><ul><li>Semantic Web standards </li></ul><ul><ul><li>Resource Description Framework (RDF) </li></ul></ul><ul><ul><li>Web Ontology Language (OWL) </li></ul></ul>
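RDF expresses knowledge as subject-predicate-object statements that a program can query and combine. The sketch below illustrates that idea in plain Python, without an RDF library. The `ex:` names are made up for illustration and do not come from a real vocabulary, and the tiny pattern matcher stands in for a real query language such as SPARQL.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# The "ex:" identifiers are hypothetical, chosen only for this example.
triples = {
    ("ex:GeoChronos", "rdf:type", "ex:Platform"),
    ("ex:GeoChronos", "ex:uses", "ex:Elgg"),
    ("ex:GeoChronos", "ex:uses", "ex:Xen"),
    ("ex:Xen", "rdf:type", "ex:VirtualizationSoftware"),
    ("ex:Elgg", "rdf:type", "ex:SocialNetworkingSoftware"),
}

def match(pattern, store):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

def uses_of_type(subject, type_name, store):
    """Follow two relations: what the subject uses, filtered by the type of each thing."""
    used = [obj for _, _, obj in match((subject, "ex:uses", None), store)]
    return [u for u in used if (u, "rdf:type", type_name) in store]

print(uses_of_type("ex:GeoChronos", "ex:VirtualizationSoftware", triples))
```

Because the relationships are explicit data, a question such as "which virtualization software does this platform use?" can be answered by following links instead of parsing free text.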
  24. 24. Confederation Bridge ICE Force Monitoring Project <ul><li>Monitoring of forces on the Confederation Bridge </li></ul><ul><li>Data analyzed by civil engineering groups at University of Calgary and Carleton University </li></ul><ul><li>GRC developed solution to automate data management as part of a CANARIE AAP project </li></ul> ( http://www.confederationbridge.com )
  25. 25. ICE Force - Technologies Used <ul><li>Grid Middleware </li></ul><ul><ul><li>GT4 </li></ul></ul><ul><li>Data Management </li></ul><ul><ul><li>Proactive Data Management Service (PDMS) </li></ul></ul><ul><ul><ul><li>Data Transfer - GridFTP, RFT </li></ul></ul></ul><ul><ul><ul><li>Replication Management – RLS </li></ul></ul></ul><ul><ul><ul><li>Metadata Management - MCS </li></ul></ul></ul>
  26. 26. Molecular Dynamics Simulations (GROMACS) <ul><li>GROMACS </li></ul><ul><ul><li>Parallel molecular dynamics simulation application </li></ul></ul><ul><ul><li>Can simulate hundreds to millions of particles </li></ul></ul><ul><ul><li>Simulation runs can take days, weeks or months </li></ul></ul><ul><li>Issues with long running jobs </li></ul><ul><ul><li>Fault tolerance </li></ul></ul><ul><ul><li>Scheduler policy constraints </li></ul></ul> ( http://moose.bio.ucalgary.ca/ )
  27. 27. GROMACS - Grid Enabled Solution <ul><li>Automated grid enabled solution developed by GRC to manage GROMACS simulations as part of a CANARIE AAP project </li></ul><ul><li>Long jobs split into a series of shorter jobs </li></ul><ul><li>Automates checkpointing, migration and reconfiguration of jobs </li></ul>
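The idea of splitting a long job into a series of shorter, checkpointed jobs can be sketched as follows. This is a minimal illustration in Python and not the GRC implementation: `split_into_segments`, `run_segment` and `run_long_job` are hypothetical names, and the "simulation" only records progress instead of calling GROMACS.

```python
def split_into_segments(total_steps, max_steps_per_job):
    """Yield (start, end) step ranges, each short enough to fit one scheduler job."""
    start = 0
    while start < total_steps:
        end = min(start + max_steps_per_job, total_steps)
        yield start, end
        start = end

def run_segment(checkpoint, start, end):
    """Stand-in for one short job; returns the checkpoint it leaves behind."""
    # A real job would advance the simulation from `start` to `end` here.
    return {"step": end, "segments_run": checkpoint["segments_run"] + 1}

def run_long_job(total_steps, max_steps_per_job):
    """Run the whole simulation as a chain of short jobs linked by checkpoints."""
    checkpoint = {"step": 0, "segments_run": 0}
    for start, end in split_into_segments(total_steps, max_steps_per_job):
        # Each segment must resume exactly where the previous checkpoint stopped.
        assert checkpoint["step"] == start
        checkpoint = run_segment(checkpoint, start, end)
    return checkpoint

print(list(split_into_segments(10, 4)))
print(run_long_job(10, 4))
```

Because each segment starts from a saved checkpoint, a failed segment only costs the work since the last checkpoint, and each short job can stay within a scheduler's run-time limits or be submitted to a different resource.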
  28. 28. GROMACS - Portal
  29. 29. GROMACS - Technologies Used <ul><li>Grid Middleware </li></ul><ul><ul><li>GT4 </li></ul></ul><ul><li>Information Services </li></ul><ul><ul><li>WS MDS </li></ul></ul><ul><li>Data Management </li></ul><ul><ul><li>PDMS (GridFTP, RFT, RLS, MCS) </li></ul></ul><ul><li>Execution Management </li></ul><ul><ul><li>Custom system (Condor-G, WS GRAM) </li></ul></ul><ul><li>Portal </li></ul><ul><ul><li>GridSphere </li></ul></ul>
  30. 30. Web Service based Grid Environment for Canada <ul><li>Established a GT4-based grid environment from resources across Canada (CANARIE CIIP) </li></ul>
  31. 31. GT4-based Grid - Model Schemas <ul><li>Models developed to describe systems, applications and scheduler policy (GRC Model Schema) </li></ul> [Figure: System Model Class Diagram]
  32. 32. GT4-based Grid – Viewing Resource Information <ul><li>Used WebMDS, a customizable Web based interface for viewing resource information published by WS MDS </li></ul>
  33. 33. GT4-based Grid - Technologies Used <ul><li>Grid Middleware </li></ul><ul><ul><li>GT4 </li></ul></ul><ul><li>Data Management </li></ul><ul><ul><li>GridFTP, RFT </li></ul></ul><ul><li>Information Services </li></ul><ul><ul><li>GRC Model Schema, WS MDS, WebMDS </li></ul></ul><ul><li>Execution Management </li></ul><ul><ul><li>Condor-G, WS GRAM </li></ul></ul>
  34. 34. Example: Fire Simulation <ul><li>Developed a comprehensive environment for the Fire Dynamics Simulator (FDS) as part of a collaborative project between GRC and HP Labs </li></ul><ul><li>Deployed on HP Labs Data Centre at University of Calgary </li></ul><ul><li>Initial focus of project </li></ul><ul><ul><li>Leverage Web 2.0 technologies </li></ul></ul><ul><ul><li>Explore use of virtualization in a utility/cloud computing environment </li></ul></ul>
  35. 35. Fire Simulation - Technologies Used <ul><li>User level </li></ul><ul><ul><li>Web 2.0/social networking technology (Facebook) </li></ul></ul><ul><li>Service provider level </li></ul><ul><ul><li>LAMP environment (Linux, Apache, MySQL, Perl/Python/PHP) </li></ul></ul><ul><ul><li>Simulation (FDS, Condor) </li></ul></ul><ul><ul><li>Visualization (Smokeview, VNC) </li></ul></ul><ul><li>Resource (utility) provider level </li></ul><ul><ul><li>Cloud computing technology (ASPEN) </li></ul></ul><ul><ul><li>Virtual machine technology (Xen) </li></ul></ul>
  36. 36. Example: Rendering on the Cloud <ul><li>GRC created an on-demand cloud rendering service for EDM Studio </li></ul><ul><li>Cybera Pilot Project </li></ul><ul><li>Technologies used: </li></ul><ul><ul><li>Cloud computing technology (ASPEN) </li></ul></ul><ul><ul><li>Virtual machine technology (Xen) </li></ul></ul><ul><ul><li>Social networking technology (Ning/Elgg) </li></ul></ul>
  37. 37. GeoChronos <ul><li>An on-line platform </li></ul><ul><ul><li>For: </li></ul></ul><ul><ul><ul><li>Earth Observation Scientists </li></ul></ul></ul><ul><ul><li>Facilitating: </li></ul></ul><ul><ul><ul><li>Collaboration between scientists </li></ul></ul></ul><ul><ul><ul><li>Data access, management and sharing </li></ul></ul></ul><ul><ul><ul><li>Application access, management and sharing </li></ul></ul></ul><ul><ul><li>Leveraging: </li></ul></ul><ul><ul><ul><li>Web 2.0 / social networking technologies (Elgg) </li></ul></ul></ul><ul><ul><ul><li>Semantic Web technologies (RDF, OWL) </li></ul></ul></ul><ul><ul><ul><li>Cloud computing and virtualization technologies (ASPEN, Xen) </li></ul></ul></ul>
  38. 38. GeoChronos - Collaboration <ul><li>Social networking portal </li></ul><ul><ul><li>Elgg-based (elgg.org) </li></ul></ul><ul><li>Social networking services </li></ul><ul><ul><li>Blogs </li></ul></ul><ul><ul><li>Tags </li></ul></ul><ul><ul><li>Media/document sharing </li></ul></ul><ul><ul><li>Wikis </li></ul></ul><ul><ul><li>Friends/contacts </li></ul></ul><ul><ul><li>Groups </li></ul></ul><ul><ul><li>Discussions </li></ul></ul><ul><ul><li>Message boards </li></ul></ul><ul><ul><li>Calendars </li></ul></ul><ul><ul><li>Status </li></ul></ul><ul><ul><li>News Feeds </li></ul></ul> http://geochronos.org/
  39. 39. GeoChronos - Data <ul><li>Data Acquisition </li></ul><ul><ul><li>Automated acquisition of data from sensors (ground, airborne, satellite) or third parties </li></ul></ul><ul><li>Data Storage </li></ul><ul><ul><li>Store, share, browse and search data </li></ul></ul><ul><ul><ul><li>e.g., spectral library </li></ul></ul></ul><ul><li>Data Processing </li></ul><ul><ul><li>Automated data workflows </li></ul></ul><ul><ul><ul><li>e.g., mosaic, reproject and subset MODIS data </li></ul></ul></ul>
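An automated data workflow of the kind mentioned on this slide chains processing steps so that the output of one feeds the next. The Python sketch below shows only that chaining, with a simple processing history attached to the result. It is not GeoChronos code: the three step functions are hypothetical stand-ins that manipulate a small record instead of real MODIS rasters, and the tile names, target projection and bounds are example values.

```python
def mosaic(tiles):
    """Stand-in: combine several tiles into one dataset record."""
    return {"tiles": sorted(tiles), "projection": "sinusoidal", "history": ["mosaic"]}

def reproject(dataset, projection):
    """Stand-in: record a change of map projection."""
    return {**dataset, "projection": projection,
            "history": dataset["history"] + ["reproject"]}

def subset(dataset, bounds):
    """Stand-in: record a spatial subset given as (south, west, north, east)."""
    return {**dataset, "bounds": bounds,
            "history": dataset["history"] + ["subset"]}

def run_workflow(tiles, projection, bounds):
    """Run the three steps in order; each step consumes the previous step's output."""
    data = mosaic(tiles)
    data = reproject(data, projection)
    return subset(data, bounds)

result = run_workflow(["h11v03", "h10v03"], "geographic", (49.0, -120.0, 60.0, -110.0))
print(result["history"])
```

Keeping the step history with the data is one simple way to address the provenance-tracking challenge raised earlier in the talk.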
  40. 40. GeoChronos - Applications <ul><li>Interactive Application Service (IAS) </li></ul><ul><ul><li>On-line, on-demand access to scientific applications </li></ul></ul><ul><ul><li>Share application sessions and data with other users </li></ul></ul><ul><ul><li>Access control to applications </li></ul></ul><ul><li>Batch Processing Service </li></ul><ul><ul><li>Batch processing environment for longer running data processing tasks or simulations </li></ul></ul><ul><ul><li>For use directly by individual users or as part of automated data workflows </li></ul></ul>
  41. 41. GeoChronos - Project Team Principal Investigators: Dr. Arturo Sanchez-Azofeifa (University of Alberta); Dr. John Gamon (University of Alberta); Dr. Benoit Rivard (University of Victoria); Dr. Rob Simmonds (University of Calgary). Other team groups shown: Project Coordination, Platform Development, Domain Scientists.
  42. 42. GeoChronos - Virtual Organization
  43. 43. Contact Information Cameron Kiddle [email_address] http://pages.cspc.ucalgary.ca/~kiddlec/ http://grid.ucalgary.ca/
