Cyberenvironments @ NCSA: Supporting Community-scale Science. Jim Myers, Associate Director, Collaborative Technologies, NCSA
Beyond Cyberinfrastructure. CyberInfrastructure commonly refers to infrastructure (networks, compute, and data resources) plus the middleware (grid) that links those resources together and presents them in a uniform, standard way. CyberEnvironments is a term NCSA has coined to describe the complete end-to-end solution: it integrates shared and custom cyberinfrastructure into a process-oriented framework that allows the community and researchers to focus on their research, not on accessing and managing the CI. A CyberCommunity is a distributed group of people (a virtual organization) with common goals and shared knowledge. Sizes range from a few individuals to interdisciplinary or international groups. These groups can include researchers, policy makers, responders, educators, and citizens, and often have a long-term identity and purpose.
Cyberenvironments: enable researchers to tackle more, and more complex, challenges, leading to enhanced production of knowledge and enhanced application of that knowledge to understanding our world, developing solutions, and making informed decisions.
The Systems Science Revolution. Research spans multiple disciplines/sub-disciplines; coordination through community resources; bi-directional flow/feedback of information; partial results being combined to produce new knowledge; experiment/theory/model comparisons; multiscale optimizations; rapid evolution; high complexity; resources will be distributed, with multiple curators.
End to end: scientific progress is limited by the manual processes: data discovery, translation, experiment setup, group coordination, tool integration, training, feature extraction, data interpretation, acceptance of new models/tools, dissemination of best practices, interdisciplinary communication, data production, processing power, data transfer/storage.
Round-Trip Information Logistics. Desktop applications accessing remote resources; individuals publishing to communities and accessing reference information, best practices, etc.; unique capabilities linked into end-to-end community processes; inter-community connectivity; evolving at the speed of science. Diagram labels: Individual, Unique capabilities, High Performance Resources, Desktop, Community, End-to-end processes.
Key Issues. How do we build a system before the parts are done? How do we evolve the system to keep it current? How do we convey knowledge as well as tools to end users? How do we coordinate without centralizing? Technology Responses: workflow; ability to integrate independent web services; ability to hide workflow behind applications; rich metadata; tracking provenance; context-based data discovery; distributed data stores; data translation/data virtualization; Cyberenvironments; engineering view of cutting-edge science; collaboration capabilities; 'publication' (exposing work to groups & the public); streams/events/feature management; core domain services, e.g. GIS.
NCSA Processes. Analysis of science and engineering processes across many disciplines; identification of challenges and appropriate design responses; research/technology roadmaps. Integrated project teams (IPTs) taking leadership roles within specific communities, with strong partners, to develop Cyberenvironments/CI: producing pilot/production capabilities; advancing technologies along roadmaps. Backed by: 20 years of experience in user/community engagement; leadership roles in cutting-edge Cyberenvironment projects in many disciplines; strong R&D efforts in Environments/Grid/Viz/Knowledge Discovery, ...; central role in national/global cyberinfrastructure definition/development.
Combustion: a Multi-scale Chemical Science Challenge. Want a systems-science approach to address complex problems: new knowledge is assimilated from different data, tools, and disciplines at each scale; real-time bi-directional information flow; multiple applications for the same information. But: normal publication is slow and lossy; data has different formats and hidden dependencies; standardization is hard to do up-front; multi-scale information is complex, and its pedigree and context matter. Need lighter-weight, flexible, adaptive mechanisms for sharing data across groups and communities.
CMCS Portal: CHEF (Sakai precursor); SAM; basic data/metadata management; metadata extraction; data translations; additional portlets (metadata view/search, provenance graph, e-notebook, chemistry apps); email notifications.
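The provenance graph named on the CMCS Portal slide can be pictured with a small sketch. This is not the CMCS or SAM implementation, which the slides do not describe; it only shows the general idea of recording which items a data item was derived from and walking those links back. The item names and the `lineage` function are hypothetical.

```python
from collections import deque

# Hypothetical provenance records (not CMCS data): each derived item
# maps to the items it was directly derived from.
DERIVED_FROM = {
    "reaction_model_v2": ["reaction_model_v1", "thermo_table"],
    "reaction_model_v1": ["raw_measurements"],
    "thermo_table": ["raw_measurements", "qm_reference"],
}


def lineage(item, derived_from):
    """Return every upstream item reachable from `item`.

    Items are listed in breadth-first order and each appears once,
    even when several derivation paths lead to it.
    """
    seen = []
    queue = deque(derived_from.get(item, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(derived_from.get(current, []))
    return seen


if __name__ == "__main__":
    for ancestor in lineage("reaction_model_v2", DERIVED_FROM):
        print(ancestor)
```

A walk like this is what lets a reader of a published model ask which raw measurements and reference data it ultimately rests on.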
CMCS Pilot Science Groups. DNS (Jackie Chen, David Leahy): feature detection & tracking in DNS data. HCCI University Consortium (Bill Pitz): Homogeneous Charge Compression Ignition. PrIMe (led by Michael Frenklach): development and publishing of chemical reaction models. Real Fuels Project (Wing Tsang, Tom Allison): lead real fuels chemistry at NIST. IUPAC (led by Branko Ruscic): develop and publish validated thermochemical data. Quantum Chemistry (Theresa Windus): QM reference data.
Community Curation of Data:  Quantum Chemistry Basis Sets
MAEViz Cyberenvironment: Consequence-Based Risk Management, Mid-America Earthquake Center. Engineering view of MAE Center research; portal-based collaboration environment; distributed data/metadata sources; builds on NEESgrid technologies. Stages shown: Hazard Definition, Inventory Selection, Fragility Models, Damage Prediction, Decision Support.
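The five stages named on the MAEViz slide (hazard definition, inventory selection, fragility models, damage prediction, decision support) can be read as a chained workflow. The sketch below is only an illustration of that chaining, not MAEViz code. Every function name, the `FRAGILITY` table, and all numbers are invented for the example, and real fragility models are far richer than a lookup table.

```python
from dataclasses import dataclass


@dataclass
class Building:
    name: str
    structure_type: str
    value: float  # replacement value, arbitrary units (hypothetical)


# Invented damage fractions per structure type and shaking level.
FRAGILITY = {
    "wood": {"moderate": 0.125, "strong": 0.25},
    "masonry": {"moderate": 0.25, "strong": 0.5},
}

BUILDINGS = [
    Building("A", "wood", 400.0),
    Building("B", "masonry", 100.0),
    Building("C", "masonry", 400.0),
]


def define_hazard():
    """Stage 1: choose a scenario shaking level (fixed here for the sketch)."""
    return "strong"


def select_inventory(buildings, structure_types):
    """Stage 2: keep only the buildings of interest."""
    return [b for b in buildings if b.structure_type in structure_types]


def predict_damage(inventory, hazard):
    """Stages 3 and 4: apply the fragility table to get expected loss."""
    return {b.name: FRAGILITY[b.structure_type][hazard] * b.value for b in inventory}


def support_decision(expected_loss):
    """Stage 5: rank buildings by expected loss, largest first."""
    return sorted(expected_loss, key=expected_loss.get, reverse=True)


if __name__ == "__main__":
    hazard = define_hazard()
    inventory = select_inventory(BUILDINGS, {"wood", "masonry"})
    losses = predict_damage(inventory, hazard)
    print(support_decision(losses))
```

Keeping each stage as its own function mirrors the point made elsewhere in the deck: independent services can be swapped or updated without rebuilding the whole chain.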
NEESgrid UIUC (portal screenshot): http://neespop.ce.uiuc.edu:9271/chef/portal/group/NEESgridUIUC/page/default.psml/js_pane/P-f16a0kkk Narutoshi Nakata. Project Name: UIUC_ShakeTableExperiment
Environmental Observatories. NCSA, including CAC, is involved in the development of CI for a number of environmental communities: CUAHSI (Consortium of Universities for the Advancement of Hydrologic Sciences Inc.) for hydrology; NEON (National Ecological Observatory Network) for ecology; LOOKING (Laboratory for the Ocean Observatory Knowledge Integration Grid); CLEANER (Collaborative Large Scale Engineering Analysis Network for Environmental Research) for environmental engineering; LTER (U.S. Long-Term Ecological Research Network), investigating ecological processes over long temporal and broad spatial scales.
Long Term Ecological Research (LTER). Established 1980 (25 years). 26 research sites & 1 support site (LNO): North America, Arctic/Antarctica, Puerto Rico/Tahiti. Five core areas of study: primary plant production, organism population studies, movement of organic matter, movement of inorganic matter, disturbance patterns. Questions are being asked at the regional, national, and global scale.
LTER Pilot Study: portal user interface; single sign-on; data discovery; secure data staging; data audit trail; data analysis via HPC system.
Large Synoptic Survey Telescope (LSST). A new telescope located in Chile: 8.4 m diameter mirror, 10 sq. degrees FOV, 3 GPixel camera; images the available sky every 3 days; first light: January 2012. Science mission: observe the time-varying sky; dark energy and the accelerating universe; comprehensive census of Solar System objects; study optical transients; create a galactic map. The LSST collaboration: currently about a dozen institutions, including 3 DOE labs. Schedule: D&D phase 2004-2007 (funded by NSF grant, private money, in-kind contributions); construction 2007-2012 (funded by NSF & DOE); operation 2012-. NCSA team headed by Ray Plante: 4 FTEs from NCSA, 2 FTEs from UIUC, 3 FTEs from NSF. Data generation rate: 30 TB/night, 6 PB/year. Total disk storage: 18 PB. Nominal computing required: 20+ Tflops. Site-to-archive network bandwidth: 2.5 Gbits/s. Processing latency for real-time alerts: ~60 secs.
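The LSST data figures can be related to one another with some back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which the slide does not specify. The derived quantities are simple quotients of the slide's numbers, not figures stated by the LSST project.

```python
# Figures quoted on the LSST slide; decimal units are an assumption.
TB = 10**12  # bytes
PB = 10**15  # bytes

NIGHTLY_BYTES = 30 * TB       # data generation rate per night
YEARLY_BYTES = 6 * PB         # data generation rate per year
DISK_BYTES = 18 * PB          # total disk storage
LINK_BITS_PER_SECOND = 2.5e9  # site-to-archive network bandwidth


def implied_nights_per_year():
    """Nights of data per year implied by the yearly and nightly rates."""
    return YEARLY_BYTES / NIGHTLY_BYTES


def years_of_data_on_disk():
    """How many years of output the quoted disk total would hold."""
    return DISK_BYTES / YEARLY_BYTES


def hours_to_move_one_night():
    """Time to send one night's data over the link at its full quoted rate."""
    seconds = NIGHTLY_BYTES * 8 / LINK_BITS_PER_SECOND
    return seconds / 3600


if __name__ == "__main__":
    print(f"implied nights/year: {implied_nights_per_year():.0f}")
    print(f"years of data on disk: {years_of_data_on_disk():.0f}")
    print(f"hours to move one night: {hours_to_move_one_night():.1f}")
```

Under these assumptions the yearly figure corresponds to roughly 200 nights of data, and the disk total to about three years of output. The slide does not say how the nightly volume relates to what actually crosses the site-to-archive link, so the transfer time is only a rough scale.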
LEAD: Mesoscale weather is VERY DYNAMIC, but our tools, cyber environments, research methodologies, and learning modalities are VERY STATIC. Getting even static capability is an enormous challenge due to the complexity of the tools and the primitive information technology infrastructures used to link them.
Cyberenvironments Architecture Perspective (diagram labels): Community CyberEnvironments; Security; Applications; Services (HPC, Instrument, Analysis, …); Core Services; Orchestration; Scientific Content/Process Mgmt Services; Collaborative Services; E-Science Services; Data Mgmt; Analytics; Visualization; Stream Mgmt; Community Knowledge Services; instruments; sensor nets.
Key concepts. Lightweight environment frameworks: portlet/plug-in models; contextualized collaboration capabilities. Distributed Scientific Content & Process Mgmt / Semantics: tracking provenance; metadata; context-based data discovery, translation, virtualization; base for knowledge services. Workflow/Services: ability to integrate independent web services and manage complexities of CI; application/process-oriented interface (schema/ontology-driven). Visual Analytics: identification of features/patterns from one domain in terms of another… Streaming/steering/event-driven science: marshaling additional sensors for interesting phenomena; on-demand simulation. Living Cyberenvironments: end-to-end, e.g. engineering view of cutting-edge science; community managed/evolved; science lifecycle support (research, publication, curation, …).
Mosaic and Cyberenvironments. Mosaic: by the early 1990s, the internet had a wealth of resources, but they were inaccessible to most scientists; hyperlinking and document formatting did nothing new except lower the barriers to information access. Cyberenvironments: by the early 2000s, the internet and grid had a wealth of interactive resources, but they were inaccessible to most scientists; Cyberenvironments will lower barriers to orchestrating these resources.
SNAC: My Position Statement. Cyberenvironments have unsolved issues. How do we discover data, services, and best practices without hierarchical management (organization → virtual organizations; disciplines → system science)? How do we structure large systems projects so they succeed? Can we identify communities who are 'cyber-ready'? Can we suggest technologies based on community structure?
SNAC: My Position Statement (2). Cyberenvironments will be a rich resource for network research: computer-mediated communication; workflow; e-notebooks/annotation services; computer-mediated model translation.
