The e-Science Vision: Enabling New Science through Innovative Integrated Technology Solutions
The Mission
• To spearhead the exploitation of e-Science technologies throughout STFC programmes, the research communities they support, and the national science and engineering base.
• To “e-enable” the STFC facilities.
The Vision
• An increasingly sophisticated infrastructure supporting innovative exploitation of data from the full range of STFC facilities, integrated into national and international activities.
• Improved use of computation and data management in areas with little historic engagement but growing needs.
• Exploiting emerging technologies to further enhance UK capabilities.
Better science...
• accelerate the research process; improve traceability and reproducibility
• meet the challenges posed by increasing data volumes
• improve cost-effectiveness and quality
• encourage collaboration and knowledge exchange
• enable researchers to tackle more of the world’s grand challenges
• improve the long-term exploitation of research outputs
• bridge facilities and users
The UK e-Infrastructure for e-Science
[Diagram labels include JET and ESRF]
The Road to Net-centricity from an Applications Perspective
WEB-enabled
• An application that requests, and is given access to, services and/or resources via an HTTP request
• The application may have been created before the Web existed
• Leverages prior investment to make data or an application available quickly
• Can use a simple HTML web interface or a full web-service interface
• Limited by the data/functions exposed in the original design
WEB service
• Typically built from the ground up to run over the Web
• Uses industry standards to provide a means of interoperating between different software applications; runs on a variety of platforms and/or frameworks
• Can be combined in a loosely coupled way to achieve complex operations: simple services can interact with each other to deliver sophisticated value-added services
• Quality of service and value-added capabilities can be documented as Service Level Agreements (SLAs)
Non-Web era
• System typically designed as closed and standalone
• Tightly coupled, engineered interfaces
• Data transfer via FTP/file transfer
• Data specific to the system application
[Diagram labels: Data Transfer; Make Data Available; Deconstruct & Reconstruct; WEB-Services / Compose; Wrappers; XML/HTTP; SOAP/UDDI/WSDL; QoS & SLAs; "Reach", "Volume", "Efficient & Flexible", "Agility & Speed"]
Strategy
• Expertise in systems, applications and information management
• Develop and support the integrated e-infrastructure required by researchers:
  - focused on exploiting the full lifecycle of scientific data
  - developed through science-led projects
  - user-focused and standards-based, acknowledging constraints from national and international collaborations and Government priorities
• Direct contributions to projects and activities, e.g. LHC, ISIS, DLS, CLF…
• Competitive and technology-push R&D to inform and support future programmes:
  - Grid infrastructures for the UK and Europe
  - Information management in a distributed, heterogeneous environment
  - Long-term data curation
  - Advanced analysis and visualisation
• Leverage investment through provision of services to partner organisations
• Engage nationally and internationally
• Take expert advice: the e-Science Advisory Board
e-Science Advisory Board
External:
• Dr Daron Green, BT
• Prof. David Ingram, UCL
• Dr David Williams, CERN
• Dr Jerzy Graff, BMT
• Dr Graham Cameron, EBI
• Prof. Malcolm Atkinson, NeSC
• Prof. Alex Gray, Cardiff
• Prof. Andy Lawrence, ROE
• Prof. Carole Goble, Manchester
• Prof. Paul Jeffreys, Oxford
Internal:
• Neil Geddes
• John Gordon
• Prof. Keith Jeffery
• Prof. Paul Durham
e-Science in 2001
• CCLRC e-Science Centre: ~8 people
• 10 projects covering astronomy, particle physics and computing
• £1M p.a.
• e-Science Industry Day, February 2001
e-Science in 2007
• Over 100 staff in the e-Science Centre
• £11M income in 2006/07
• Projects in HEP, astronomy, biomedical simulation, environmental science, nanotechnology and materials science
• UK leadership in grid infrastructure
• European leadership in data curation
Some e-Science facts and figures
• 113 staff
• 28 female (8 in the Library)
• 23 fixed-term
Department Overview
The STFC e-Science Centre is using leading-edge IT to deliver new science:
• Management and exploitation of large-scale scientific data
• High-quality scientific computing services
• Support for collaborative working
• Collaborative R&D
• Sharing expertise: technology transfer
Based on core skills:
• Collaborative tools
• Data analysis and computation
• Data storage
• Data management
Conclusion
Strong personal belief in the opportunities from ICT.
Specific opportunities for STFC:
• Exploit experience in grand challenges like LHC and IPCC
• Encourage collaboration across STFC facilities
• Build on our unique position to lead developments internationally
• Leverage the deployed infrastructure for wider UK benefit
• Meet the ICT expectations of modern researchers
• Use the above to stimulate innovation and support science research
Achieving these requires:
• Living close to the technology edge
• Providing technological expertise and vision
• Managing technology push and user pull
• Active research expertise
“Innovate or die” (anon.)
GridPP, LCG and EGEE
CCLRC e-Science Centre:
• LHC Tier-1
• Regional Operations Centre (UK+I)
• Coordinator of the National Grid Service
• Partner in other grid deployments
Facilities e-Infrastructure
• Diamond synchrotron; ISIS neutron and muon facility; Vulcan laser facility
• Physical facilities provide data for the information infrastructure: record data, store data, search data, share data
• Integrated system for DLS demonstrated February 2007
• ISIS 20-year back catalogue available online
Environmental Science: multi-disciplinary environmental science programmes
• Molecular studies of pollutants and radiation damage
• Data integration resources
CCLRC provides technological support:
• Data management infrastructure
• Grid computing
• Data and information standardisation (CML, CSML)
Bio-Medical Sciences
Data management in post-genomic biology:
• Integrated Systems Biology Centre
• High-throughput experiments
• Preparations for biomedical use of DLS/ISIS ...
Biomedical simulation and integrated systems biology:
• Integrative Biology
• Data-sharing infrastructure
• Data integration and visualisation
• The Ontogenesis Network
Materials and Nanotechnology
Characterisation of materials structure and properties:
• e-Science technology for real-time analysis of experiments
• Ability to run, manage and integrate the results of hundreds of distinct calculations
• Advanced visualisation for better analysis of results
• Long-lasting archives of scientific results with easy access for scientists
• Ability to share results easily when required
International Synchrotron and Neutron Data Infrastructure
European data infrastructure:
• Support UK developments and drive standard access Europe-wide
• Develop our position as a good host and develop access for UK researchers
• Encourage and influence development of infrastructure
[Diagram labels: Access to Scientific Data; Grid Infrastructure; ESFRI/e-IRG e-infrastructure?; Who?]
Summary of STFC implementation of IB Grid services and applications for Integrative Biology
• A prototype IB grid with server-side visualisation to handle extremely large datasets (100 MB per small experiment) generated on HPCx and other NGS clusters
• Interfaces to grid job management and SRB built on CoolGraphics, Meshalyser and Matlab, plus a standalone C++ GUI for IB services
• Control panels of specific application packages are deployed on the desktop, while the functional core executes on the NGS (data encoding and decoding)
• Results are sent to the desktop as well as to display walls
Summary of STFC implementation of IB Grid services and applications for Integrative Biology (continued)
• Implementation of soft-tissue cancer models on the grid (including parallelisation), with embedded computational steering
• Implementation of real-time 3D image reconstruction using the visualisation cluster
• Data: MRI and histopathology images of the heart; in-vivo cancer image data (for statistics on histopathology); arterial stent tomography data from ESRF
[Figure captions: Schematic of stent in artery; Stent image to geometry reconstruction; Processed image with tumour cells and blood vessels highlighted; Result from edge detection; Screenshot of real-time 3D image reconstruction, halfway through (the STFC visualisation cluster is used and the image sent to a remote desktop)]
SKOS Phase 2 (2005-06)
• W3C Semantic Web Best Practices and Deployment (SWBPD) Working Group: HP, IBM, Boeing, Adobe; Universities of Maryland, Stanford, Manchester and Amsterdam
• Task force to further develop SKOS, led by Alistair Miles (STFC)
Digital Curation research activities
David Giaretta
Director of the CASPAR Project and Associate Director, UK Digital Curation Centre
Outline
• Background
• OAIS
• CASPAR and DCC
• Future research and projects
• Summary
Digital Preservation…
Easy to do… as long as you can provide money forever.
Easy to test claims about tools… as long as you live a long time.
OAIS (ISO 14721)
• Open Archival Information System Reference Model
• Referenced in just about any serious work on digital preservation
• Development hosted by CCSDS Panel 2
• 5-year ISO review under way: minor corrections and updates, no major changes; revised version due early 2008; chaired by DG
OAIS Functional Entities
• SIP = Submission Information Package
• AIP = Archival Information Package
• DIP = Dissemination Information Package
[Diagram: the PRODUCER submits SIPs to Ingest; Ingest passes AIPs to Archival Storage and descriptive information to Data Management; the CONSUMER sends queries and orders to Access, which returns result sets and delivers DIPs; Preservation Planning and Administration support the whole archive, which operates under MANAGEMENT]
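The package flow between the functional entities can be illustrated with a toy pipeline. This is a minimal sketch under our own assumptions, not an OAIS-conformant archive: the class and method names (`SIP`, `AIP`, `DIP`, `Archive.ingest`, `query`, `order`) are hypothetical, Preservation Planning and Administration are omitted, and the "fixity" check is simply a SHA-256 checksum recorded at ingest.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what a Producer hands to Ingest."""
    content: bytes
    description: dict


@dataclass
class AIP:
    """Archival Information Package: what Archival Storage keeps."""
    aip_id: str
    content: bytes
    fixity: str  # SHA-256 checksum recorded at ingest
    description: dict


@dataclass
class DIP:
    """Dissemination Information Package: what Access delivers to a Consumer."""
    content: bytes
    description: dict


@dataclass
class Archive:
    storage: dict = field(default_factory=dict)    # Archival Storage: aip_id -> AIP
    catalogue: dict = field(default_factory=dict)  # Data Management: aip_id -> descriptive info

    def ingest(self, sip: SIP) -> str:
        """Ingest: build an AIP from the SIP, store it and register its descriptive info."""
        fixity = hashlib.sha256(sip.content).hexdigest()
        aip_id = fixity[:12]  # toy identifier derived from the checksum
        self.storage[aip_id] = AIP(aip_id, sip.content, fixity, dict(sip.description))
        self.catalogue[aip_id] = dict(sip.description)
        return aip_id

    def query(self, **criteria) -> list:
        """Consumer query answered from Data Management as a result set of AIP ids."""
        return [aid for aid, desc in self.catalogue.items()
                if all(desc.get(k) == v for k, v in criteria.items())]

    def order(self, aip_id: str) -> DIP:
        """Consumer order: Access retrieves the AIP, checks fixity and returns a DIP."""
        aip = self.storage[aip_id]
        if hashlib.sha256(aip.content).hexdigest() != aip.fixity:
            raise ValueError(f"fixity check failed for {aip_id}")
        return DIP(aip.content, dict(aip.description))


if __name__ == "__main__":
    archive = Archive()
    aid = archive.ingest(SIP(b"example payload", {"facility": "ISIS", "year": 2007}))
    for hit in archive.query(facility="ISIS"):
        print(archive.order(hit).content)
```

Deriving the identifier from the checksum keeps the sketch short but means identical payloads collide; a real archive would mint independent identifiers.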
CASPAR Project (http://www.casparpreserves.eu)
• EU FP6 Integrated Project
• Total spend approx. €16M (€8.8M from the EU)
• Started April 2006, for 42 months
• David Giaretta is Co-ordinator
CASPAR Aims
Produce tools and techniques to support digital preservation and make it easier to share the cost. They:
• must be relatively easy to use
• must have a low “buy-in” in terms of effort required for adoption
• must avoid requiring wholesale change of everyone else’s systems
• must be decentralised and reproducible, so that they can live on after the formal end of the CASPAR project
• must be “preservable”
• must be open: open source, open standards
We cannot do everything, so we work closely with other projects.
CASPAR information flow architecture
[Diagram labels: Rep Info; Virtualisation]
How do we capture the Representation Information?
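In OAIS terms, Representation Information is itself data that may need further Representation Information to be understood; the recursion stops where the Designated Community's knowledge base already suffices. A minimal sketch of working out what must be captured, assuming a hypothetical registry `depends_on` that records what each item needs in order to be interpreted (the item names in the example are illustrative only):

```python
from collections import deque


def required_rep_info(data_object, depends_on, knowledge_base):
    """List the Representation Information that must be captured for `data_object`.

    depends_on     -- maps an item to the Representation Information needed to
                      interpret it (a hypothetical registry)
    knowledge_base -- items the Designated Community is assumed to understand
    Traversal is breadth-first; it stops descending wherever an item is already
    in the knowledge base, and each item is visited at most once (so cycles are safe).
    """
    needed = []
    seen = {data_object}
    queue = deque(depends_on.get(data_object, []))
    while queue:
        item = queue.popleft()
        if item in seen:
            continue
        seen.add(item)
        if item in knowledge_base:
            continue  # already understood: no need to capture it or recurse
        needed.append(item)
        queue.extend(depends_on.get(item, []))
    return needed


if __name__ == "__main__":
    registry = {
        "dataset.dat": ["file format spec", "calibration notes"],
        "file format spec": ["HDF5 spec", "English"],
        "calibration notes": ["English"],
        "HDF5 spec": ["English"],
    }
    print(required_rep_info("dataset.dat", registry, {"English"}))
```

Enlarging the knowledge base shrinks what has to be captured, which is why the definition of the Designated Community matters so much in practice.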
Overview
• Environmental drivers
• Technology drivers
• Revolution
• e-Science Centre’s role: environment, technology
• Activities: now and future
e-Science Centre’s role
Environment:
• Co-located at STFC with BADC and NEODC
• IPCC Data Distribution Centre
• NERC DataGrid
• Background in environmental science
Technology:
• Standards (ISO, OGC)
• Architecture
• Expertise in ‘Grid’ technologies
• Information modelling
Activities – current
MOTIIVE (EU FP7, http://www.motiive.net)
• ISO 19109: General Feature Model, cf. an object metamodel: feature types, attributes, operations, associations
• ISO 19110: Feature cataloguing
  - Feature Catalogue ≡ ‘semantics repository’
  - a powerful operational component in an SDI
  - inheritance: semantic re-use
  - behaviour: service binding
• Developing a candidate implementation: mapping between ebRIM and ISO 19110
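The two catalogue ideas on this slide, inheritance as semantic re-use and behaviour as service binding, can be sketched as a tiny registry. This is an illustration under our own assumptions, not the ISO 19110 schema or an ebRIM mapping: `FeatureType`, `FeatureCatalogue`, the example feature types and the `example.org` endpoint are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FeatureType:
    """A feature type: attributes, operations and an optional supertype."""
    name: str
    supertype: Optional["FeatureType"] = None
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute name -> value type
    operations: Dict[str, str] = field(default_factory=dict)  # operation name -> bound service endpoint

    def all_attributes(self) -> Dict[str, str]:
        """Own attributes plus inherited ones; a subtype may override a supertype's attribute."""
        inherited = self.supertype.all_attributes() if self.supertype else {}
        return {**inherited, **self.attributes}

    def resolve_operation(self, op: str) -> str:
        """Find the service bound to `op`, searching up the inheritance chain."""
        t: Optional[FeatureType] = self
        while t is not None:
            if op in t.operations:
                return t.operations[op]
            t = t.supertype
        raise KeyError(f"{self.name} has no operation {op!r}")


class FeatureCatalogue:
    """Registry of feature types: a toy 'semantics repository'."""

    def __init__(self) -> None:
        self._types: Dict[str, FeatureType] = {}

    def register(self, feature_type: FeatureType) -> None:
        self._types[feature_type.name] = feature_type

    def get(self, name: str) -> FeatureType:
        return self._types[name]


if __name__ == "__main__":
    site = FeatureType("MeasurementSite", attributes={"location": "Point"},
                       operations={"getData": "http://example.org/observations"})
    gauge = FeatureType("TideGauge", supertype=site, attributes={"datum": "string"})
    catalogue = FeatureCatalogue()
    for ft in (site, gauge):
        catalogue.register(ft)
    tg = catalogue.get("TideGauge")
    print(tg.all_attributes(), tg.resolve_operation("getData"))
```

The subtype declares only what is new; everything else, including which service answers `getData`, is found by walking up to the supertype.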
Activities – current
INSPIRE (http://www.ec-gis.org/inspire)
• Selected by the EC to co-develop statutory Implementing Rules on data specifications:
  - D2.3: Scoping of themes
  - D2.5: Generic Conceptual Model
  - D2.6: Draft Methodology
  - D2.7: Encoding
• Ocean/atmosphere/met themes: CSML is the leading candidate
• Liaising with DEFRA on UK transposition/implementation
Activities – current
Standards
• ISO: member of BSI IST/36; ISO 19111-2: Parametric coordinates; representing NERC interests (SLA)
• OGC: ‘Observations and Measurements’ model; GML; KML; OGC documents 06-160r1, 07-112, 07-083
CCLRC Data Portal
At the time, CCLRC had 1 World Data Centre, 5 National Data Centres and 10 minor community-based data centres. The portal would make them all accessible.
[Diagram: the core Data Portal section connects through wrappers to facility sections (ISIS, JAERI, DLS, ... Facility N), each holding its own local data and local metadata]
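The wrapper arrangement in the diagram is essentially the adapter pattern: each facility keeps its own metadata schema, and a wrapper translates local records into a common shape the portal can query. A minimal sketch, assuming hypothetical local schemas and a common record with `title` and `keywords` fields (none of these names come from the actual portal):

```python
from typing import Callable, Dict, Iterable, List

CommonRecord = Dict[str, object]  # hypothetical shape: {"title": str, "keywords": list of str}


class FacilityWrapper:
    """Adapts one facility's local metadata store to the portal's common query interface."""

    def __init__(self, facility: str, local_records: Iterable[dict],
                 to_common: Callable[[dict], CommonRecord]) -> None:
        self.facility = facility
        self._records = list(local_records)
        self._to_common = to_common  # translation from the local schema to the common one

    def search(self, keyword: str) -> List[CommonRecord]:
        """Case-insensitive keyword match against the translated records."""
        results = []
        for rec in self._records:
            common = self._to_common(rec)
            if keyword.lower() in (k.lower() for k in common["keywords"]):
                results.append({**common, "facility": self.facility})
        return results


class DataPortal:
    """Fans a query out to every facility wrapper and merges the results."""

    def __init__(self, wrappers: Iterable[FacilityWrapper]) -> None:
        self.wrappers = list(wrappers)

    def search(self, keyword: str) -> List[CommonRecord]:
        hits: List[CommonRecord] = []
        for w in self.wrappers:
            hits.extend(w.search(keyword))
        return hits


if __name__ == "__main__":
    # Two facilities with deliberately different (made-up) local schemas.
    isis = FacilityWrapper("ISIS", [{"exp_title": "Zeolite study", "kw": "zeolite;diffraction"}],
                           lambda r: {"title": r["exp_title"], "keywords": r["kw"].split(";")})
    dls = FacilityWrapper("DLS", [{"name": "Protein crystal", "tags": ["Diffraction", "protein"]}],
                          lambda r: {"title": r["name"], "keywords": r["tags"]})
    print(DataPortal([isis, dls]).search("diffraction"))
```

Adding a facility then means writing one translation function, not changing the portal or the facility's own systems.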
CCLRC (now Core) Scientific Metadata Model
Metadata Object: Topic; Study Description; Access Conditions; Data Location; Data Description; Related Material
• Keywords providing an index on what the study is about
• Provenance: what the study is, who did it and when
• Conditions of use: who can access the data, and how
• Detailed description of the organisation of the data into datasets and files
• Locations providing navigation to where the data for the study can be found
• References into the literature and community providing context about the study
Today used by other e-Science projects (e.g. MyGrid), facilities (e.g. ISIS, DLS, CLF, Lab-in-a-Cell) and internationally (e.g. SNS, CLS, Australia).
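The six components of the metadata object can be pictured as nested records. This is a sketch only, with field names of our own choosing rather than the published schema; for brevity it folds Data Description and Data Location together by giving each file in a dataset its location. The sample values in the usage are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    keywords: List[str]  # index on what the study is about


@dataclass
class StudyDescription:
    name: str
    investigators: List[str]  # provenance: who did it
    start_date: str           # provenance: when (ISO 8601 string for simplicity)


@dataclass
class AccessConditions:
    allowed_users: List[str]  # who may access the data
    conditions: str           # how the data may be used


@dataclass
class DataFile:
    name: str
    location: str  # where the file can be found, e.g. a URL


@dataclass
class Dataset:
    name: str
    files: List[DataFile] = field(default_factory=list)


@dataclass
class MetadataObject:
    topic: Topic
    study: StudyDescription
    access: AccessConditions
    datasets: List[Dataset] = field(default_factory=list)       # data description + data location
    related_material: List[str] = field(default_factory=list)  # references into the literature

    def can_access(self, user: str) -> bool:
        return user in self.access.allowed_users

    def file_locations(self) -> List[str]:
        """Flatten datasets into the list of places the study's data can be found."""
        return [f.location for ds in self.datasets for f in ds.files]


if __name__ == "__main__":
    m = MetadataObject(
        topic=Topic(["zeolite", "neutron diffraction"]),
        study=StudyDescription("Example study", ["A. Researcher"], "2007-01-15"),
        access=AccessConditions(["A. Researcher"], "illustrative conditions only"),
        datasets=[Dataset("raw", [DataFile("run1.raw", "file:///data/run1.raw")])],
    )
    print(m.can_access("A. Researcher"), m.file_locations())
```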
Storage Resource Broker: virtualising the users’ data
• First SRB installation outside SDSC
• Distribution version and installation guidelines
• Making SRB ‘Grid-aware’ through Grid security
• Licensing
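"Virtualising the users' data" means users address files by a logical name, and the broker decides which physical copy to serve. The toy model below illustrates that idea only; it does not use or imitate the real SRB client API, and the class name, resource names and paths are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class LogicalNamespace:
    """Toy storage virtualisation: logical paths map to physical replicas on named resources."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[Tuple[str, str]]] = {}  # logical path -> [(resource, physical path)]

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        """Record one more physical replica of a logical file."""
        self._replicas.setdefault(logical_path, []).append((resource, physical_path))

    def locate(self, logical_path: str, available_resources: Iterable[str]) -> Tuple[str, str]:
        """Return the first registered replica held on a currently available resource."""
        available = set(available_resources)
        for resource, physical in self._replicas.get(logical_path, []):
            if resource in available:
                return resource, physical
        raise FileNotFoundError(logical_path)

    def list_collection(self, prefix: str) -> List[str]:
        """List logical paths under a collection, independent of where the bytes live."""
        prefix = prefix.rstrip("/") + "/"
        return sorted(p for p in self._replicas if p.startswith(prefix))


if __name__ == "__main__":
    ns = LogicalNamespace()
    ns.register("/home/user/run1.raw", "disk-cache", "/cache/a1b2")
    ns.register("/home/user/run1.raw", "tape-archive", "/tape/vol7/a1b2")
    print(ns.locate("/home/user/run1.raw", {"tape-archive"}))
```

If the disk cache is down, the same logical name still resolves to the tape copy; the user never sees the change.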
ISIS 20-Year Back Catalogue
The catalogue holds 93,000 studies and 1.87 million data files, with 870,000 distinct keywords categorising the data.
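At this scale there are on average about 20 files per study (1.87 million / 93,000 ≈ 20.1), so keyword search is what makes the catalogue navigable. A minimal inverted-index sketch, assuming keywords are matched case-insensitively and multi-keyword queries mean "all of these"; the study IDs below are made up:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_keyword_index(studies: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each distinct keyword (case-insensitive) to the set of study IDs it categorises."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for study_id, keywords in studies.items():
        for kw in keywords:
            index[kw.strip().lower()].add(study_id)
    return index


def search(index: Dict[str, Set[str]], *keywords: str) -> List[str]:
    """Return the sorted IDs of studies tagged with every given keyword."""
    sets = [index.get(kw.lower(), set()) for kw in keywords]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    idx = build_keyword_index({
        "RB001": ["Zeolite", "diffraction"],
        "RB002": ["diffraction", "polymer"],
        "RB003": ["polymer"],
    })
    print(search(idx, "diffraction", "polymer"))
```

Lookup cost depends on the number of studies per keyword, not on the total number of files, which is the point of indexing at study level.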
What we aim to provide with the e-Infrastructure
• Enabling users to get rapid access to their current and past data, related experiments, publications etc., leading to improved analysis through more complete information
• Creating a powerful, long-lasting scientific knowledge resource
Protecting our valuable assets: Data Curation
2 PhD and 1 MSc studentships with the Universities of Reading and Manchester on:
• Long-Term Metadata Management and Quality Assurance (Arif Shaon)
• The Usage of Semantic Technologies for Long-Term Preservation (Kaixuan Wang)
Future work
Dr Robert McGreevy, ISIS: integrating data from disparate sources into topic centres.
Challenges: data presentation and integration; trust; encouraging use of data from unfamiliar sources.

Editor's Notes

  • Slide 2: Spearhead the exploitation of e-Science technologies throughout the Science and Technology Facilities Council’s (STFC) programmes and communities. Support and enhance the STFC science programme. Bring rapid advances in computing, computer science and information technology to bear on the major challenges in science and engineering. Stimulate and support new ICT innovations for the benefit of the UK science and engineering base. Provide state-of-the-art ICT infrastructure to support UK scientists and engineers.