Ticer summer school_24_aug06

  • Climateprediction: Altruistic computing, outreach to schools, multiple runs of simpler models then statistics of distribution of results illustrating model uncertainty / sensitivity. Genie: again value of making it possible to share data sources, submit jobs easily as ensembles that explore a parameter space contributing to shared data that is then analysed / visualised. Each an example of significant behavioural change propagating among the scientists studying the Natural Environment.
  • Note different teams model different aspects of the heart. Their geographic distribution shown on the next slide.
  • This series of examples provides motivation and shows the kind of multi-site, multi-team, multi-discipline collaboration. BRIDGES: Biomedical Research Informatics Delivered by Grid Enabled Services. NeSC (Edinburgh and Glasgow) and IBM. www.brc.dcs.gla.ac.uk/projects/bridges Supporting project for the CFG project, Cardiovascular Functional Genomics. Generating data on hypertension. Rat, Mouse, Human genome databases. Variety of tools used: BLAST, BLAT, Gene Prediction, visualisation, … Variety of data sources and formats: Microarray data, genome DBs, project partner research data, medical records, … Aim is an integrated infrastructure supporting data federation and security.
  • Idea: multiple medical regions’ radiographers collaborate for training, comparator data, and perhaps back-up / load sharing. This would also provide a sufficiently large pool of data to enable epidemiology on a sufficient scale to study rarer syndromes and presentations. It is difficult to anticipate constraints and impediments arising from existing working practices, e.g. regional variations in description of lesions, worries about loss of personal contact with patients and process, worries about loss of jobs, …
  • BLAST: gene sequence comparison. In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing biological sequences, such as the amino-acid sequences of different proteins or DNA sequences. CHARMM: molecular dynamics. CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a force field for molecular dynamics, as well as the name of the molecular dynamics simulation package associated with this force field. http://www.charmm.org/
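The BLAST note above concerns local sequence alignment. As an illustration, here is a minimal Smith-Waterman local-alignment scorer, the exact dynamic-programming method that BLAST approximates with fast heuristics; the scoring values (match=+2, mismatch=-1, gap=-1) are illustrative only, not BLAST's defaults.

```python
# Minimal Smith-Waterman local alignment score: the kind of sequence
# comparison BLAST approximates heuristically at much larger scale.
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best

print(local_alignment_score("ACACACTA", "AGCACACA"))  # → 12
```

Real BLAST avoids filling the full matrix by seeding on short exact word matches and extending them, which is what makes genome-scale searches feasible.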
  • VLBA: Very Long Baseline Array – linked radio telescopes. NRAO: National Radio Astronomy Observatory.
  • This shows the National Centre, 8 regional centres and 2 laboratories in blue. That is the original set-up at the start of UK e-Science, August 2001. NeSC is jointly run by Edinburgh & Glasgow Universities. In 2003 several smaller centres were added (vermilion). The e-Science Institute is run by the National e-Science Centre. It runs a programme of events and hosts visiting international researchers. It was established in 2001. The Open Middleware Infrastructure Institute was established in 2004 to provide support and direction for Grid middleware developed in the UK. It is based at the University of Southampton. The Grid Operations Support Centre was established in 2004. The Digital Curation Centre was established in 2004 by the Universities of Edinburgh and Glasgow, the UK Online Library Network at the University of Bath, and the Central Laboratories at Daresbury and Rutherford. Its job is to provide advice on curating scientific data and on preserving digital media, formats, and access software. Edinburgh is one of the 4 founders of the Globus Alliance (Sept 2003), which will take responsibility for the future of the Globus Toolkit. The other founders are: Chicago University (Argonne National Lab), University of Southern California, Los Angeles (Information Sciences Institute) and the PDC, Stockholm, Sweden. The EU EGEE project (Enabling Grids for E-Science in Europe) is establishing a common framework for Grids in Europe. The UK e-Science programme has several connections with EGEE. NeSC leads the training component for the whole of Europe.

Ticer summer school_24_aug06: Presentation Transcript

  • Ticer Summer School, Thursday 24th August 2006. Dave Berry & Malcolm Atkinson, National e-Science Centre, Edinburgh. www.nesc.ac.uk
  • Digital Libraries, Grids & E-Science
        • What is E-Science?
        • What is Grid Computing?
        • Data Grids
          • Requirements
          • Examples
          • Technologies
        • Data Virtualisation
        • The Open Grid Services Architecture
        • Challenges
  • What is e-Science?
  • What is e-Science?
    • Goal: to enable better research in all disciplines
    • Method: Develop collaboration supported by advanced distributed computation
      • to generate, curate and analyse rich data resources
        • From experiments, observations, simulations & publications
        • Quality management, preservation and reliable evidence
      • to develop and explore models and simulations
        • Computation and data at all scales
        • Trustworthy, economic, timely and relevant results
      • to enable dynamic distributed collaboration
        • Facilitating collaboration with information and resource sharing
        • Security, trust, reliability, accountability, manageability and agility
  • climate prediction .net and GENIE
    • Largest climate model ensemble
    • >45,000 users, >1,000,000 model years
    [Figure: response of the Atlantic circulation to freshwater forcing]
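The climateprediction.net approach above (many runs of a simple model, then statistics over the distribution of results) can be sketched in miniature. The "model" here is a hypothetical zero-dimensional relaxation toward an equilibrium, not any real climate code; the feedback range sampled is illustrative.

```python
# Toy ensemble in the climateprediction.net style: run one copy of a
# simple model per parameter value, then summarise the spread of outcomes
# to expose the model's sensitivity to that parameter.
import statistics

def run_model(feedback, steps=100, forcing=1.0):
    """Relax a temperature anomaly toward its equilibrium, forcing/feedback."""
    temp = 0.0
    for _ in range(steps):
        temp += 0.1 * (forcing - feedback * temp)
    return temp

# Ensemble: one run per sampled feedback strength (0.5 .. 1.5).
ensemble = [run_model(feedback=f / 10.0) for f in range(5, 16)]
print(f"mean={statistics.mean(ensemble):.2f} stdev={statistics.stdev(ensemble):.2f}")
```

The point of the real projects is the same: the *distribution* across the ensemble, not any single run, is the scientific output.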
  • Integrative Biology
    • Tackling two Grand Challenge research questions:
      • What causes heart disease?
      • How does a cancer form and grow?
    • Together these diseases cause 61% of all UK deaths
    Building a powerful, fault-tolerant Grid infrastructure for biomedical science. Enabling biomedical researchers to use distributed resources such as high-performance computers, databases and visualisation tools to develop coupled multi-scale models of how these killer diseases develop.
  • BRIDGES: Biomedical Research Informatics Delivered by Grid Enabled Services. Portal: http://www.brc.dcs.gla.ac.uk/projects/bridges/ SyntenyGrid Service (BLAST)
  • eDiaMoND: Screening for Breast Cancer. From 1 Trust to many Trusts: collaborative working, audit capability, epidemiology
    • Other Modalities
    • MRI
    • PET
    • Ultrasound
    Better access to case information and digital tools. Supplement mentoring with access to digital training cases and sharing of information across clinics. Provided by eDiaMoND project: Prof. Sir Mike Brady et al. [Diagram: the eDiaMoND Grid linking screening (2ndary capture or FFD, X-rays and case information), digital reading (SMF, CAD, temporal comparison), assessment/symptomatic work and biopsy, radiology reporting systems and letters, electronic patient records, and training (managing and performing training on cases, SMF, CAD, 3D images).]
  • E-Science Data Resources
    • Curated databases
      • Public, institutional, group, personal
    • Online journals and preprints
    • Text mining and indexing services
    • Raw storage (disk & tape)
    • Replicated files
    • Persistent archives
    • Registries
  • EBank Slide from Jeremy Frey
  • Biomedical data – making connections Slide provided by Carole Goble: University of Manchester 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg
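The sequence fragment above is in the numbered, space-grouped layout used by genome databases. Before handing such text to a tool like BLAST, the raw nucleotide string has to be recovered; a small parser (illustrative, not from any named toolkit) might look like:

```python
# Strip line-offset coordinates and whitespace from a numbered,
# space-grouped sequence listing, keeping only the bases.
def parse_numbered_sequence(text):
    bases = []
    for line in text.strip().splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            fields = fields[1:]          # drop the leading coordinate
        bases.extend(fields)
    return "".join(bases).lower()

raw = """12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt"""
seq = parse_numbered_sequence(raw)
print(len(seq), seq[:10])  # → 120 acatttctac
```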
  • Using Workflows to Link Services
    • Describe the steps in a Scripting Language
    • Steps performed by Workflow Enactment Engine
    • Many languages in use
      • Trade off: familiarity & availability
      • Trade off: detailed control versus abstraction
    • Incrementally develop correct process
      • Sharable & Editable
      • Basis for scientific communication & validation
      • Valuable IPR asset
    • Repetition is now easy
      • Parameterised explicitly & implicitly
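The idea above, steps described in a scripting language and run by an enactment engine, so that parameterised repetition is easy, can be sketched minimally. The step functions and the tiny `enact` engine here are hypothetical stand-ins for real services (e.g. a data-retrieval step feeding an analysis step):

```python
# A workflow is an ordered list of named steps; a minimal "enactment
# engine" threads data through them. Re-running with new parameters
# is then a one-liner, which is the point of sharable workflows.
def fetch(params):
    return list(range(params["n"]))        # pretend data retrieval

def analyse(data, params):
    return sum(data) * params["scale"]     # pretend computation

def enact(workflow, params):
    """Run each step in order, passing its output to the next step."""
    data = None
    for step in workflow:
        data = step(data, params) if data is not None else step(params)
    return data

workflow = [fetch, analyse]
# Repetition is now easy: the same editable workflow, new parameters.
results = [enact(workflow, {"n": n, "scale": 2}) for n in (3, 5)]
print(results)  # → [6, 20]
```

Because the workflow is a plain data structure, it can be stored, shared and edited, which is what makes it a basis for scientific communication and validation.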
  • Workflow Systems (workflow language / enactment engine: comments)
    • Shell scripts / Shell + OS: common, but not often thought of as workflow; depends on context, e.g. NFS across all sites
    • Perl / Perl runtime: popular in bioinformatics; similar context dependence; distribution has to be coded
    • Java / JVM: popular target because of JVM ubiquity; similar dependence; distribution has to be coded
    • BPEL / BPEL enactment: OASIS standard for industry, coordinating use of multiple Web Services; low-level detail; tools
    • Scufl / Taverna: EBI, OMII-UK & MyGrid http://taverna.sourceforge.net/index.php
    • VDT / Pegasus, Chimera & DAGman: high-level abstract formulation of workflows, automated mapping towards executable forms, cached result re-use
    • Kepler / Kepler: BIRN, GEON & SEEK http://kepler-project.org/
  • Workflow example
    • Taverna in MyGrid http://www.mygrid.org.uk/
    • “ allows the e-Scientist to describe and enact their experimental processes in a structured, repeatable and verifiable way”
    • GUI
    • Workflow language
    • Enactment engine
  • Notification: Pub/Sub for laboratory data using a broker, ultimately delivered over GPRS. Comb-e-chem: Jeremy Frey
  • Relevance to Digital Libraries
    • Similar concerns
      • Data curation & management
      • Metadata, discovery
      • Secure access (AAA +)
      • Provenance & data quality
      • Local autonomy
      • Availability, resilience
    • Common technology
      • Grid as an implementation technology
  • What is Grid computing?
  • What is a Grid?
    • A grid is a system consisting of
      • Distributed but connected resources and
      • Software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives
    [Diagram: resources ranging from handhelds, workstations and servers to clusters, supercomputers and data centres, plus databases, web servers, printers and licenses.] Source: Hiro Kishimoto, GGF17 keynote, May 2006
  • Virtualizing Resources: web services present common interfaces over the resource-specific and type-specific interfaces of storage, sensors, applications, information and computers. (Hiro Kishimoto: keynote, GGF17)
  • Ideas and Forms
    • Key ideas
      • Virtualised resources
      • Secure access
      • Local autonomy
    • Many forms
      • Cycle stealing
      • Linked supercomputers
      • Distributed file systems
      • Federated databases
      • Commercial data centres
      • Utility computing
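The first key idea above, virtualised resources behind a common interface, can be sketched as follows. The class and method names are illustrative only, not taken from any real grid toolkit:

```python
# Heterogeneous back-ends (storage, compute, ...) exposed through one
# common interface, so clients never touch type-specific details.
from abc import ABC, abstractmethod

class Resource(ABC):
    @abstractmethod
    def describe(self) -> str: ...

class StorageResource(Resource):
    def __init__(self, capacity_tb):
        self.capacity_tb = capacity_tb
    def describe(self):
        return f"storage: {self.capacity_tb} TB"

class ComputeResource(Resource):
    def __init__(self, cores):
        self.cores = cores
    def describe(self):
        return f"compute: {self.cores} cores"

# A client iterates over the grid using only the common interface.
grid = [StorageResource(300), ComputeResource(1024)]
print([r.describe() for r in grid])
```

Local autonomy is preserved because each provider implements the interface however it likes; secure access would sit in front of these calls.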
  • Grid Middleware: grid middleware services (brokering, registry, data, job-submit, compute, printer and application services) advertise and notify over virtualized resources such as CPUs. (Hiro Kishimoto: keynote, GGF17)
  • Key Drivers for Grids
    • Collaboration
      • Expertise is distributed
      • Resources (data, software licences) are location-specific
      • Necessary to achieve critical mass of effort
      • Necessary to raise sufficient resources
    • Computational Power
      • Rapid growth in number of processors
      • Powered by Moore’s law + device roadmap
      • Challenge to transform models to exploit this
    • Deluge of Data
      • Growth in scale: Number and Size of resources
      • Growth in complexity
      • Policy drives greater data availability
  • Minimum Grid Functionalities
    • Supports distributed computation
      • Data and computation
      • Over a variety of
        • hardware components (servers, data stores, …)
        • Software components (services: resource managers, computation and data services)
      • With regularity that can be exploited
        • By applications
        • By other middleware & tools
        • By providers and operations
      • It will normally have security mechanisms
        • To develop and sustain trust regimes
  • Grid & Related Paradigms Source: Hiro Kishimoto GGF17 Keynote May 2006
    • Distributed Computing
    • Loosely coupled
    • Heterogeneous
    • Single Administration
    • Cluster
    • Tightly coupled
    • Homogeneous
    • Cooperative working
    • Grid Computing
    • Large scale
    • Cross-organizational
    • Geographical distribution
    • Distributed Management
    • Utility Computing
    • Computing “services”
    • No knowledge of provider
    • Enabled by grid technology
  • Motives for Grids
  • Why use / build Grids?
    • Research Arguments
      • Enables new ways of working
      • New distributed & collaborative research
      • Unprecedented scale and resources
    • Economic Arguments
      • Reduced system management costs
      • Shared resources → better utilisation
      • Pooled resources → increased capacity
      • Load sharing & utility computing
      • Cheaper disaster recovery
  • Why use / build Grids?
    • Operational Arguments
      • Enable autonomous organisations to
        • Write complementary software components
        • Set up, run & use complementary services
        • Share operational responsibility
        • General & consistent environment for Abstraction, Automation, Optimisation & Tools
    • Political & Management Arguments
      • Stimulate innovation
      • Promote intra-organisation collaboration
      • Promote inter-enterprise collaboration
  • Grids In Use: E-Science Examples
    • Data sharing and integration
      • Life sciences, sharing standard data-sets, combining collaborative data-sets
      • Medical informatics, integrating hospital information systems for better care and better science
      • Sciences, high-energy physics
    • Capability computing
      • Life sciences, molecular modeling, tomography
      • Engineering, materials science
      • Sciences, astronomy, physics
    • High-throughput, capacity computing for
      • Life sciences: BLAST, CHARMM, drug screening
      • Engineering: aircraft design, materials, biomedical
      • Sciences: high-energy physics, economic modeling
    • Simulation-based science and engineering
      • Earthquake simulation
    Source: Hiro Kishimoto GGF17 Keynote May 2006
  • Data Requirements
  • Database Growth (slide provided by Richard Baldock: MRC HGU Edinburgh). PDB: 33,367 protein structures. EMBL DB: 111,416,302,701 nucleotides.
  • Requirements: User’s viewpoint
    • Find Data
      • Registries & Human communication
    • Understand data
      • Metadata description, Standard / familiar formats & representations, Standard value systems & ontologies
    • Data Access
      • Find how to interact with data resource
      • Obtain permission (authority)
      • Make connection
      • Make selection
    • Move Data
      • In bulk or streamed (in increments)
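The last requirement, moving data in bulk or streamed in increments, is easy to illustrate. In this sketch a local byte buffer stands in for a remote data resource; the point is that streaming keeps memory use bounded regardless of data size:

```python
# Bulk transfer reads everything at once; streaming moves fixed-size
# chunks, so memory use stays bounded however large the source is.
import io

def move_bulk(src, dst):
    dst.write(src.read())                  # one big read

def move_streamed(src, dst, chunk_size=4):
    while chunk := src.read(chunk_size):   # bounded increments
        dst.write(chunk)

payload = b"nucleotide records or survey rows..."
bulk, streamed = io.BytesIO(), io.BytesIO()
move_bulk(io.BytesIO(payload), bulk)
move_streamed(io.BytesIO(payload), streamed)
print(bulk.getvalue() == streamed.getvalue())  # → True
```

Grid transfer tools such as GridFTP work in the streamed style, often over several parallel channels.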
  • Requirements: User’s viewpoint 2
    • Transform Data
      • To format, organisation & representation required for computation or integration
    • Combine data
      • Standard database operations + operations relevant to the application model
    • Present results
      • To humans: data movement + transform for viewing
      • To application code: data movement + transform to the required format
      • To standard analysis tools, e.g. R
      • To standard visualisation tools, e.g. Spitfire
  • Requirements: Owner’s viewpoint
    • Create Data
      • Automated generation, Accession Policies, Metadata generation
      • Storage Resources
    • Preserve Data
      • Archiving
      • Replication
      • Metadata
      • Protection
    • Provide Services with available resources
      • Definition & implementation: costs & stability
      • Resources: storage, compute & bandwidth
  • Requirements: Owner’s viewpoint 2
    • Protect Services
      • Authentication, Authorisation, Accounting, Audit
      • Reputation
    • Protect data
      • Comply with owner requirements – encryption for privacy, …
    • Monitor and Control use
      • Detect and handle failures, attacks, misbehaving users
      • Plan for future loads and services
    • Establish case for Continuation
      • Usage statistics
      • Discoveries enabled
  • Examples of Grid-based Data Management
  • Large Hadron Collider
    • The most powerful instrument ever built to investigate elementary particle physics
    • Data Challenge:
      • 10 Petabytes/year of data
      • 20 million CDs each year!
    • Simulation, reconstruction, analysis:
      • LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors
  • Composing Observations in Astronomy. Data and images courtesy Alex Szalay, Johns Hopkins
    • No. & sizes of data sets as of mid-2002, grouped by wavelength
    • 12 waveband coverage of large areas of the sky
    • Total about 200 TB data
    • Doubling every 12 months
    • Largest catalogues near 1B objects
  • GODIVA Data Portal
    • Grid for Ocean Diagnostics, Interactive Visualisation and Analysis
    • Daily Met Office Marine Forecasts and gridded research datasets
    • National Centre for Ocean Forecasting
    • ~3Tb climate model datastore via Web Services
    • Interactive Visualisations inc. Movies
    • ~ 30 accesses a day worldwide
    • Other GODIVA software produces 3D/4D Visualisations reading data remotely via Web Services
    Online Movies www.nerc-essc.ac.uk/godiva
  • GODIVA Visualisations
    • Unstructured Meshes
    • Grid Rotation/Interpolation
    • GeoSpatial Databases v. Files
    • (Postgres, IBM, Oracle)
    • Perspective 3D Visualisation
    • Google maps viewer
  • NERC Data Grid
    • The DataGrid focuses on federation of NERC Data Centres
    • Grid for data discovery, delivery and use across sites
    • Data can be stored in many different ways (flat files, databases…)
    • Strong focus on Metadata and Ontologies
    • Clear separation between discovery and use of data.
    • Prototype focussing on Atmospheric and Oceanographic data
    www.ndg.nerc.ac.uk
  • Global In-flight Engine Diagnostics. Distributed Aircraft Maintenance Environment (DAME): Leeds, Oxford, Sheffield & York; Jim Austin. 100,000 aircraft, 0.5 GB/flight, 4 flights/day: 200 TB/day. Now BROADEN. Significant in getting the Boeing 787 engine contract. [Diagram: in-flight data flows via ground stations and a global network (e.g. SITA) to the DS&S Engine Health Center data centre and airline maintenance centres, with alerts via internet, e-mail and pager.]
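The data rate quoted above multiplies out exactly as stated (using 1 TB = 1000 GB):

```python
# Quick check of the DAME data-volume arithmetic.
aircraft = 100_000
gb_per_flight = 0.5
flights_per_day = 4

tb_per_day = aircraft * gb_per_flight * flights_per_day / 1000
print(tb_per_day)  # → 200.0
```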
  • Data Grid Technologies
  • Storage Resource Manager (SRM)
    • http://sdm.lbl.gov/srm-wg/
    • de facto & written standard in physics, …
    • Collaborative effort
      • CERN, FNAL,  JLAB, LBNL and RAL
    • Essential bulk file storage
      • (pre) allocation of storage
        • abstraction over storage systems
      • File delivery / registration / access
      • Data movement interfaces
        • E.g. gridFTP
    • Rich function set
      • Space management, permissions, directory, data transfer & discovery
  • Storage Resource Broker (SRB)
    • http://www.sdsc.edu/srb/index.php/Main_Page
    • SDSC developed
    • Widely used
      • Archival document storage
      • Scientific data: bio-sciences, medicine, geo-sciences, …
    • Manages
      • Storage resource allocation
        • abstraction over storage systems
      • File storage
      • Collections of files
      • Metadata describing files, collections, etc.
      • Data transfer services
  • Condor Data Management
    • Stork
      • Manages File Transfers
      • May manage reservations
    • Nest
      • Manages Data Storage
      • C.f. GridFTP with reservations
        • Over multiple protocols
  • Globus Tools and Services for Data Management
    • GridFTP
      • A secure, robust, efficient data transfer protocol
    • The Reliable File Transfer Service (RFT)
      • Web services-based, stores state about transfers
    • The Data Access and Integration Service (OGSA-DAI)
      • Service to access to data resources, particularly relational and XML databases
    • The Replica Location Service (RLS)
      • Distributed registry that records locations of data copies
    • The Data Replication Service
      • Web services-based, combines data replication and registration functionality
    Slides from Ann Chervenak
  • RLS in Production Use: LIGO
    • Laser Interferometer Gravitational Wave Observatory; currently uses RLS servers at 10 sites
      • Contain mappings from 6 million logical files to over 40 million physical replicas
    • Used in customized data management system: the LIGO Lightweight Data Replicator System (LDR)
      • Includes RLS, GridFTP, custom metadata catalog, tools for storage management and data validation
    Slides from Ann Chervenak
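The core service RLS provides, a registry mapping each logical file name (LFN) to the physical locations (PFNs) of its replicas, can be sketched in a few lines. A real RLS distributes this index across sites; here an in-memory dictionary stands in for it, and the LFN/PFN strings are invented for illustration:

```python
# Minimal replica catalog: logical file name -> set of physical replicas.
from collections import defaultdict

class ReplicaCatalog:
    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that a physical copy of lfn exists at pfn."""
        self._replicas[lfn].add(pfn)

    def lookup(self, lfn):
        """Return all known physical replicas of a logical file."""
        return sorted(self._replicas.get(lfn, set()))

catalog = ReplicaCatalog()
catalog.register("lfn://ligo/H1-strain.gwf", "gsiftp://siteA/data/H1.gwf")
catalog.register("lfn://ligo/H1-strain.gwf", "gsiftp://siteB/data/H1.gwf")
print(catalog.lookup("lfn://ligo/H1-strain.gwf"))
```

Systems like LDR then layer transfer (GridFTP), metadata and validation on top of this mapping.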
  • RLS in Production Use: ESG
    • Earth System Grid: Climate modeling data (CCSM, PCM, IPCC)
    • RLS at 4 sites
    • Data management coordinated by ESG portal
    • Datasets stored at NCAR
      • 64.41 TB in 397,253 total files
      • 1230 portal users
    • IPCC Data at LLNL
      • 26.50 TB in 59,300 files
      • 400 registered users
      • Data downloaded: 56.80 TB in 263,800 files
      • Avg. 300GB downloaded/day
      • 200+ research papers being written
    Slides from Ann Chervenak
  • gLite Data Management
    • FTS
      • File Transfer Service
    • LFC
      • Logical file catalogue
    • Replication Service
      • Accessed through LFC
    • AMGA
      • Metadata services
  • Data Management Services
    • FiReMan catalog
      • Resolves logical filenames (LFN) to physical location of files and storage elements
      • Oracle and MySQL versions available
      • Secure services
      • Attribute support
      • Symbolic link support
      • Deployed on the Pre-Production Service and DILIGENT testbed
    • gLite I/O
      • Posix-like access to Grid files
      • Castor, dCache and DPM support
      • Has been used for the BioMedical Demo
      • Deployed on the Pre-Production Service and the DILIGENT testbed
    • AMGA MetaData Catalog
      • Used by the LHCb experiment
      • Has been used for the BioMedical Demo
  • File Transfer Service
    • Reliable file transfer
    • Full scalable implementation
      • Java Web Service front-end, C++ Agents, Oracle or MySQL database support
      • Support for Channel, Site and VO management
      • Interfaces for management and statistics monitoring
    • GridFTP (gsiftp), SRM and SRM-copy support
    • Multi-VO support
  • Commercial Solutions
    • Vendors include:
      • Avaki
      • Data Synapse
    • Benefits & costs
      • Well packaged and documented
      • Support
      • Can be expensive
        • But look for academic rates
  • Data Virtualisation
  • Data Integration Strategies
    • Use a Service provided by a Data Owner
    • Use a scripted workflow
    • Use data virtualisation services
      • Arrange that multiple data services have common properties
      • Arrange federations of these
      • Arrange access presenting the common properties
      • Expose the important differences
      • Support integration accommodating those differences
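The virtualisation strategy above, arrange that multiple data services present common properties, then query the federation as one, can be sketched as follows. The sources, field names and adapter functions are hypothetical:

```python
# Each federation member adapts its own source to one common record
# shape, so a query runs across the combined set without callers
# knowing where any record lives or how it was stored.
def adapt_hospital(rows):          # this source stores age in years
    return [{"id": r[0], "age": r[1]} for r in rows]

def adapt_registry(rows):          # this source stores birth year
    return [{"id": r["patient"], "age": 2006 - r["born"]} for r in rows]

federation = [
    adapt_hospital([("p1", 63), ("p2", 41)]),
    adapt_registry([{"patient": "p3", "born": 1950}]),
]

def query(federation, predicate):
    """Run one query against every member of the federation."""
    return [rec for member in federation for rec in member if predicate(rec)]

over_50 = query(federation, lambda rec: rec["age"] > 50)
print([rec["id"] for rec in over_50])  # → ['p1', 'p3']
```

Real systems (e.g. OGSA-DAI-based federations) add dynamic access versus warehousing, schema description and distributed query planning on top of this basic adapt-then-query pattern.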
  • Data Virtualisation Services
    • Form a federation
      • Set of data resources – incremental addition
      • Registration & description of collected resources
      • Warehouse data or access dynamically to obtain updated data
      • Virtual data warehouses – automating division between collection and dynamic access
    • Describe relevant relationships between data sources
      • Incremental description + refinement / correction
    • Run jobs, queries & workflows against combined set of data resources
      • Automated distribution & transformation
    • Example systems
      • IBM’s Information Integrator
      • GEON, BIRN & SEEK
      • OGSA-DAI is an extensible framework for building such systems
  • Virtualisation variations
    • Extent to which homogeneity obtained
      • Regular representation choices – e.g. units
      • Consistent ontologies
      • Consistent data model
      • Consistent schema – integrated super-schema
      • DB operations supported across federation
      • Ease of adding federation elements
      • Ease of accommodating change as federation members change their schema and policies
      • Drill through to primary forms supported
  • OGSA-DAI
    • http://www.ogsadai.org.uk
    • A framework for data virtualisation
    • Wide use in e-Science
      • BRIDGES, GEON, CaBiG, GeneGrid, MyGrid, BioSimGrid, e-Diamond, IU RGRBench, …
    • Collaborative effort
      • NeSC, EPCC, IBM, Oracle, Manchester, Newcastle
    • Querying of data resources
      • Relational databases
      • XML databases
      • Structured flat files
    • Extensible activity documents
      • Customisation for particular applications
  • OGF Open Grid Services Architecture
  • The Open Grid Services Architecture
    • An open, service-oriented architecture (SOA)
      • Resources as first-class entities
      • Dynamic service/resource creation and destruction
    • Built on a Web services infrastructure
    • Resource virtualization at the core
    • Build grids from small number of standards-based components
      • Replaceable, coarse-grained
      • e.g. brokers
    • Customizable
      • Support for dynamic, domain-specific content…
      • … within the same standardized framework
    Hiro Kishimoto: Keynote GGF17
  • OGSA Capabilities
    • Security
      • Cross-organizational users
      • Trust nobody
      • Authorized access only
    • Information Services
      • Registry
      • Notification
      • Logging/auditing
    • Execution Management
      • Job description & submission
      • Scheduling
      • Resource provisioning
    • Data Services
      • Common access facilities
      • Efficient & reliable transport
      • Replication services
    • Self-Management
      • Self-configuration
      • Self-optimization
      • Self-healing
    • Resource Management
      • Discovery
      • Monitoring
      • Control
    [Diagram: OGSA capability “profiles” layered on a Web services foundation.] Hiro Kishimoto: keynote, GGF17
  • Basic Data Interfaces
    • Storage Management
      • e.g. Storage Resource Management (SRM)
    • Data Access
      • ByteIO
      • Data Access & Integration (DAI)
    • Data Transfer
      • Data Movement Interface Specification (DMIS)
      • Protocols (e.g. GridFTP)
    • Replica management
    • Metadata catalog
    • Cache management
    Hiro Kishimoto: Keynote GGF17
  • Challenges
  • The State of the Art
    • Many successful Grid & E-Science projects
      • A few examples shown in this talk
    • Many Grid systems
      • All largely incompatible
      • Interoperation talks under way
    • Standardisation efforts
      • Mainly via the Open Grid Forum
      • A merger of the GGF & EGA
    • Significant user investment required
      • Few “out of the box” solutions
  • Technical Challenges
    • Issues you can’t avoid
      • Lack of Complete Knowledge (LOCK)
      • Latency
      • Heterogeneity
      • Autonomy
      • Unreliability
      • Scalability
      • Change
    • A Challenging goal
      • balance technical feasibility
      • against virtual homogeneity, stability and reliability
      • while remaining affordable, manageable and maintainable
  • Areas “In Development”
    • Data provenance
    • Quality of Service
      • Service Level Agreements
    • Resource brokering
      • Across all resources
    • Workflow scheduling
      • Co-scheduling
    • Licence management
    • Software provisioning
      • Deployment and update
    • Other areas too!
  • Operational Challenges
    • Management of distributed systems
      • With local autonomy
    • Deployment, testing & monitoring
    • User training
    • User support
    • Rollout of upgrades
    • Security
      • Distributed identity management
      • Authorisation
      • Revocation
      • Incident response
  • Grids as a Foundation for Solutions
    • The grid per se doesn’t provide
      • Supported e-Science methods
      • Supported data & information resources
      • Computations
      • Convenient access
    • Grids help providers of these, via
      • International & national secure e-Infrastructure
      • Standards for interoperation
      • Standard APIs to promote re-use
    • But Research Support must be built
      • Application developers
      • Resource providers
  • Collaboration Challenges
    • Defining common goals
    • Defining common formats
      • E.g. schemas for data and metadata
    • Defining a common vocabulary
      • E.g. for metadata
    • Finding common technology
      • Standards should help, eventually
    • Collecting metadata
      • Automate where possible
  • Social Challenges
    • Changing cultures
      • Rewarding data & resource sharing
      • Require publication of data
    • Taking the first steps
      • If everyone shares, everyone wins
      • The first people to share must not lose out
    • Sustainable funding
      • Technology must persist
      • Data must persist
  • Summary & Conclusions
  • Summary
    • E-Science exploits distributed computing resources to enable new discoveries, new collaborations and new ways of working
    • Grid computing is an enabling technology for e-Science.
    • Many successful projects exist
    • Many challenges remain
  • UK e-Science: CeSC (Cambridge), e-Science Institute, EGEE, ChinaGrid, Grid Operations Support Centre, National Centre for e-Social Science, National Institute for Environmental e-Science, Globus Alliance, Digital Curation Centre, Open Middleware Infrastructure Institute
  • Questions & Comments