SlideShare a Scribd company logo
1 of 73
Ticer Summer School
Thursday 24th
August 2006
Dave Berry & Malcolm Atkinson
National e-Science Centre, Edinburgh
www.nesc.ac.uk
Digital Libraries, Grids & E-ScienceDigital Libraries, Grids & E-Science
What is E-Science?
What is Grid Computing?
Data Grids
Requirements
Examples
Technologies
Data Virtualisation
The Open Grid Services Architecture
Challenges
What is e-Science?What is e-Science?
• Goal: to enable better research in all disciplines
• Method: Develop collaboration supported by
advanced distributed computation
– to generate, curate and analyse rich data resources
• From experiments, observations, simulations & publications
• Quality management, preservation and reliable evidence
– to develop and explore models and simulations
• Computation and data at all scales
• Trustworthy, economic, timely and relevant results
– to enable dynamic distributed collaboration
• Facilitating collaboration with information and resource sharing
• Security, trust, reliability, accountability, manageability and agility
climateprediction.net and GENIE
• Largest climate model
ensemble
• >45,000 users, >1,000,000
model years
10K2K
Response of Atlantic
circulation to freshwater
forcing
6
Courtesy of David Gavaghan &
IB Team
Integrative Biology
Tackling two Grand Challenge research
questions:
• What causes heart disease?
• How does a cancer form and grow?
Together these diseases cause 61% of all UK
deaths
Building a powerful, fault-tolerant Grid
infrastructure for biomedical science
Enabling biomedical researchers to use
distributed resources such as high-performance
computers, databases and visualisation tools to
develop coupled multi-scale models of how
these killer diseases develop.
BBiomedicaliomedical RResearchesearch IInformaticsnformatics DDelivered byelivered by GGridrid
EEnablednabled SServiceservices
Glasgow
Edinburgh
Leicester
Oxford
London
Netherlands
Publically Curated Data
Private
data
Private
data
Private
data
Private
data
Private
data
Private
data
CFG Virtual
Organisation
Ensembl
MGI
HUGO
OMIM
SWISS-PROT
…
DATA
HUB
RGD
Synteny
Grid
Service
blast
+
Portal
http://www.brc.dcs.gla.ac.uk/projects/bridges/
eDiaMoND: Screening for Breast CancereDiaMoND: Screening for Breast Cancer
1 Trust  Many Trusts
Collaborative Working
Audit capability
Epidemiology
Other Modalities
-MRI
-PET
-Ultrasound
Better access to
Case information
And digital tools
Supplement Mentoring
With access to digital
Training cases and sharing
Of information across
clinics
Letters
Radiology reporting
systems
eDiaMoND
Grid
2ndary Capture
Or FFD
Case Information
X-Rays and
Case Information
Digital
Reading
SMF
Case and
Reading Information
CAD Temporal Comparison
Screening
Electronic
Patient Records
Assessment/ Symptomatic
Biopsy
Case and
Reading Information
Symptomatic/Assessment
Information
Training
Manage Training Cases
Perform Training
SMF CAD 3D Images
Patients
Provided by eDiamond project: Prof. Sir Mike Brady et al.
E-Science Data ResourcesE-Science Data Resources
• Curated databases
– Public, institutional, group, personal
• Online journals and preprints
• Text mining and indexing services
• Raw storage (disk & tape)
• Replicated files
• Persistent archives
• Registries
• …
TICER Summer School, August 24th 2006
©
10
EBank
Slide
from
Jeremy
Frey
TICER Summer School, August 24th 2006
©
11
Biomedical data – making
connections
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat
ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag
tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct
cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg
ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt
gcctgttttt ttttaattgg
Slide provided by Carole Goble: University of Manchester
Using Workflows to Link ServicesUsing Workflows to Link Services
• Describe the steps in a Scripting Language
• Steps performed by Workflow Enactment Engine
• Many languages in use
– Trade off: familiarity & availability
– Trade off: detailed control versus abstraction
• Incrementally develop correct process
– Sharable & Editable
– Basis for scientific communication & validation
– Valuable IPR asset
• Repetition is now easy
– Parameterised explicitly & implicitly
Workflow SystemsWorkflow Systems
Language WF Enact. Comments
Shell
scripts
Shell + OS Common but not often thought of as WF. Depend
on context, e.g. NFS across all sites
Perl Perl
runtime
Popular in bioinformatics. Similar context
dependence – distribution has to be coded
Java JVM Popular target because JVM ubiquity – similar
dependence – distribution has to be coded
BPEL BPEL
Enactment
OASIS standard for industry – coordinating use of
multiple Web Services – low level detail - tools
Taverna Scufl EBI, OMII-UK & MyGrid
http://taverna.sourceforge.net/index.php
VDT /
Pegasus
Chimera &
DAGman
High-level abstract formulation of workflows,
automated mapping towards executable forms,
cached result re-use
Kepler Kepler BIRN, GEON & SEEK
http://kepler-project.org/
TICER Summer School, August 24th 2006
©
14
Workflow example
 Taverna in MyGrid http://www.mygrid.org.uk/
 “allows the e-Scientist to describe and enact their
experimental processes in a structured, repeatable
and verifiable way”
 GUI
 Workflow
language
 Enactment
engine
TICER Summer School, August 24th 2006
©
15
Pub/Sub for Laboratory data
using a broker and ultimately
delivered over GPRS
Notification
Comb-e-chem: Jeremy Frey
Relevance to Digital LibrariesRelevance to Digital Libraries
• Similar concerns
– Data curation & management
– Metadata, discovery
– Secure access (AAA +)
– Provenance & data quality
– Local autonomy
– Availability, resilience
• Common technology
– Grid as an implementation technology
TICER Summer School, August 24th 2006 18
What is a Grid?
LicenseLicense
PrinterPrinter
A grid is a system consisting of
− Distributed but connected resources and
− Software and/or hardware that provides and manages logically
seamless access to those resources to meet desired objectives
A grid is a system consisting of
− Distributed but connected resources and
− Software and/or hardware that provides and manages logically
seamless access to those resources to meet desired objectives
R2AD
DatabaseDatabase
Web
server
Web
server
Data CenterCluster
Handheld Supercomputer
Workstation
Server
Source: Hiro Kishimoto GGF17 Keynote May 2006
TICER Summer School, August 24th 2006 19
Virtualizing Resources
Resources
Web
services
AccessAccess
StorageStorage SensorsSensors ApplicationsApplications InformationInformationComputersComputers
Resource-specific InterfacesResource-specific Interfaces
Common Interfaces
Type-specific interfaces
Hiro Kishimoto: Keynote GGF17
Ideas and FormsIdeas and Forms
• Key ideas
– Virtualised resources
– Secure access
– Local autonomy
• Many forms
– Cycle stealing
– Linked supercomputers
– Distributed file systems
– Federated databases
– Commercial data centres
– Utility computing
TICER Summer School, August 24th 2006 21
Grid Middleware
Virtualized
resources
Grid
middleware
services
Brokering
Service
Brokering
Service
Registry
Service
Registry
Service
Data
Service
Data
Service
CPU
Resource
CPU
Resource Printer
Service
Printer
Service
Job-Submit
Service
Job-Submit
Service
Compute
Service
Compute
Service
Notify
Advertise
Application
Service
Application
Service
Hiro Kishimoto: Keynote GGF17
Key Drivers for GridsKey Drivers for Grids
• Collaboration
– Expertise is distributed
– Resources (data, software licences) are location-specific
– Necessary to achieve critical mass of effort
– Necessary to raise sufficient resources
• Computational Power
– Rapid growth in number of processors
– Powered by Moore’s law + device roadmap
– Challenge to transform models to exploit this
• Deluge of Data
– Growth in scale: Number and Size of resources
– Growth in complexity
– Policy drives greater data availability
Minimum Grid FunctionalitiesMinimum Grid Functionalities
• Supports distributed computation
– Data and computation
– Over a variety of
• hardware components (servers, data stores, …)
• Software components (services: resource managers,
computation and data services)
– With regularity that can be exploited
• By applications
• By other middleware & tools
• By providers and operations
– It will normally have security mechanisms
• To develop and sustain trust regimes
TICER Summer School, August 24th 2006 24
Source: Hiro Kishimoto GGF17 Keynote May 2006
Grid & Related Paradigms
Utility Computing
• Computing “services”
• No knowledge of provider
• Enabled by grid technology
Utility Computing
• Computing “services”
• No knowledge of provider
• Enabled by grid technology
Distributed Computing
• Loosely coupled
• Heterogeneous
• Single Administration
Distributed Computing
• Loosely coupled
• Heterogeneous
• Single Administration
Cluster
• Tightly coupled
• Homogeneous
• Cooperative working
Cluster
• Tightly coupled
• Homogeneous
• Cooperative working
Grid Computing
• Large scale
• Cross-organizational
• Geographical distribution
• Distributed Management
Grid Computing
• Large scale
• Cross-organizational
• Geographical distribution
• Distributed Management
Why use / build Grids?Why use / build Grids?
• Research Arguments
– Enables new ways of working
– New distributed & collaborative research
– Unprecedented scale and resources
• Economic Arguments
– Reduced system management costs
– Shared resources ⇒ better utilisation
– Pooled resources ⇒ increased capacity
– Load sharing & utility computing
– Cheaper disaster recovery
Why use / build Grids?Why use / build Grids?
• Operational Arguments
– Enable autonomous organisations to
• Write complementary software components
• Set up run & use complementary services
• Share operational responsibility
• General & consistent environment for
Abstraction, Automation, Optimisation & Tools
• Political & Management Arguments
– Stimulate innovation
– Promote intra-organisation collaboration
– Promote inter-enterprise collaboration
TICER Summer School, August 24th 2006 28
Grids In Use: E-Science Examples
• Data sharing and integration
− Life sciences, sharing standard data-sets,
combining collaborative data-sets
− Medical informatics, integrating hospital information
systems for better care and better science
− Sciences, high-energy physics
• Data sharing and integration
− Life sciences, sharing standard data-sets,
combining collaborative data-sets
− Medical informatics, integrating hospital information
systems for better care and better science
− Sciences, high-energy physics
• Capability computing
− Life sciences, molecular modeling, tomography
− Engineering, materials science
− Sciences, astronomy, physics
• Capability computing
− Life sciences, molecular modeling, tomography
− Engineering, materials science
− Sciences, astronomy, physics
• High-throughput, capacity computing for
− Life sciences: BLAST, CHARMM, drug screening
− Engineering: aircraft design, materials, biomedical
− Sciences: high-energy physics, economic modeling
• High-throughput, capacity computing for
− Life sciences: BLAST, CHARMM, drug screening
− Engineering: aircraft design, materials, biomedical
− Sciences: high-energy physics, economic modeling
• Simulation-based science and engineering
− Earthquake simulation
• Simulation-based science and engineering
− Earthquake simulation
Source: Hiro Kishimoto GGF17 Keynote May 2006
PDB 33,367 Protein structuresEMBL DB 111,416,302,701 nucleotides
Database GrowthDatabase Growth
Slide provided by Richard Baldock: MRC HGU Edinburgh
Requirements: User’s viewpointRequirements: User’s viewpoint
• Find Data
– Registries & Human communication
• Understand data
– Metadata description, Standard / familiar formats &
representations, Standard value systems & ontologies
• Data Access
– Find how to interact with data resource
– Obtain permission (authority)
– Make connection
– Make selection
• Move Data
– In bulk or streamed (in increments)
Requirements: User’s viewpoint 2Requirements: User’s viewpoint 2
• Transform Data
– To format, organisation & representation
required for computation or integration
• Combine data
– Standard database operations + operations relevant to
the application model
• Present results
– To humans: data movement + transform for viewing
– To application code: data movement + transform to the
required format
– To standard analysis tools, e.g. R
– To standard visualisation tools, e.g. Spitfire
Requirements: Owner’s viewpointRequirements: Owner’s viewpoint
• Create Data
– Automated generation, Accession Policies, Metadata
generation
– Storage Resources
• Preserve Data
– Archiving
– Replication
– Metadata
– Protection
• Provide Services with available resources
– Definition & implementation: costs & stability
– Resources: storage, compute & bandwidth
Requirements: Owner’s viewpoint 2Requirements: Owner’s viewpoint 2
• Protect Services
– Authentication, Authorisation, Accounting, Audit
– Reputation
• Protect data
– Comply with owner requirements – encryption for privacy,
…
• Monitor and Control use
– Detect and handle failures, attacks, misbehaving users
– Plan for future loads and services
• Establish case for Continuation
– Usage statistics
– Discoveries enabled
Large Hadron ColliderLarge Hadron Collider
• The most powerful
instrument ever built to
investigate elementary
particle physics
• Data Challenge:
– 10 Petabytes/year of data
– 20 million CDs each year!
• Simulation, reconstruction,
analysis:
– LHC data handling requires
computing power equivalent
to ~100,000 of today's fastest
PC processors
Composing Observations in AstronomyComposing Observations in Astronomy
Data and images courtesy Alex Szalay, John Hopkins
No. & sizes of data sets as of mid-2002,
grouped by wavelength
• 12 waveband coverage of large
areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
GODIVA Data Portal
• Grid for Ocean Diagnostics, Interactive
Visualisation and Analysis
• Daily Met Office Marine Forecasts and
gridded research datasets
• National Centre for Ocean Forecasting
• ~3Tb climate model datastore via Web
Services
• Interactive Visualisations inc. Movies
• ~ 30 accesses a day worldwide
• Other GODIVA software produces
3D/4D Visualisations reading data
remotely via Web Services
Online Movies
www.nerc-essc.ac.uk/godiva
GODIVA Visualisations
• Unstructured Meshes
• Grid Rotation/Interpolation
• GeoSpatial Databases v. Files
(Postgres, IBM, Oracle)
• Perspective 3D Visualisation
• Google maps viewer
NERC Data Grid
• The DataGrid focuses on federation of
NERC Data Centres
• Grid for data discovery, delivery and use
across sites
• Data can be stored in many different
ways (flat files, databases…)
• Strong focus on Metadata and
Ontologies
• Clear separation between discovery and
use of data.
• Prototype focussing on Atmospheric and
Oceanographic data
www.ndg.nerc.ac.uk
Global In-flight Engine DiagnosticsGlobal In-flight Engine Diagnostics
in-flight data
airline
maintenance centre
ground
station
global network
eg SITA
internet, e-mail, pager
DS&S Engine Health Center
data centre
Distributed Aircraft Maintenance Environment: Leeds, Oxford, Sheffield &York, Jim Austin
100,000 aircraft
0.5 GB/flight
4 flights/day
200 TB/day
Now BROADEN
Significant in
getting Boeing
787 engine
contract
Storage Resource Manager (SRM)Storage Resource Manager (SRM)
• http://sdm.lbl.gov/srm-wg/
• de facto & written standard in physics, …
• Collaborative effort
– CERN, FNAL, JLAB, LBNL and RAL
• Essential bulk file storage
– (pre) allocation of storage
• abstraction over storage systems
– File delivery / registration / access
– Data movement interfaces
• E.g. gridFTP
• Rich function set
– Space management, permissions, directory, data transfer
& discovery
Storage Resource Broker (SRB)Storage Resource Broker (SRB)
• http://www.sdsc.edu/srb/index.php/Main_Page
• SDSC developed
• Widely used
– Archival document storage
– Scientific data: bio-sciences, medicine, geo-sciences, …
• Manages
– Storage resource allocation
• abstraction over storage systems
– File storage
– Collections of files
– Metadata describing files, collections, etc.
– Data transfer services
Condor Data ManagementCondor Data Management
• Stork
– Manages File Transfers
– May manage reservations
• Nest
– Manages Data Storage
– C.f. GridFTP with reservations
• Over multiple protocols
TICER Summer School, August 24th 2006 46
Globus Tools and Services
for Data Management
GridFTP
A secure, robust, efficient data transfer protocol
The Reliable File Transfer Service (RFT)
Web services-based, stores state about transfers
The Data Access and Integration Service (OGSA-DAI)
Service to access to data resources, particularly relational and
XML databases
The Replica Location Service (RLS)
Distributed registry that records locations of data copies
The Data Replication Service
Web services-based, combines data replication and
registration functionality
Slides from Ann Chervenak
TICER Summer School, August 24th 2006 47
RLS in Production Use: LIGO
Laser Interferometer Gravitational Wave Observatory
Currently use RLS servers at 10 sites
Contain mappings from 6 million logical files to over 40
million physical replicas
Used in customized data management system: the
LIGO Lightweight Data Replicator System (LDR)
Includes RLS, GridFTP, custom metadata catalog, tools for
storage management and data validation
Slides from Ann Chervenak
TICER Summer School, August 24th 2006 48
RLS in Production Use: ESG
Earth System Grid: Climate
modeling data (CCSM, PCM,
IPCC)
RLS at 4 sites
Data management
coordinated by ESG portal
Datasets stored at NCAR
64.41 TB in 397253 total files
1230 portal users
IPCC Data at LLNL
26.50 TB in 59,300 files
400 registered users
Data downloaded: 56.80 TB
in 263,800 files
Avg. 300GB downloaded/day
200+ research papers being
written
Slides from Ann Chervenak
TICER Summer School, August 24th 20062nd
EGEE Review, CERN - 49
Enabling Grids for E-sciencE
INFSO-RI-508833
gLite Data Management
• FTS
– File Transfer Service
• LFC
– Logical file catalogue
• Replication Service
– Accessed through LFC
• AMGA
– Metadata services
TICER Summer School, August 24th 20062nd
EGEE Review, CERN - 50
Enabling Grids for E-sciencE
INFSO-RI-508833
Data Management Services
• FiReMan catalog
– Resolves logical filenames (LFN) to physical location of files and storage elements
– Oracle and MySQL versions available
– Secure services
– Attribute support
– Symbolic link support
– Deployed on the Pre-Production Service and DILIGENT testbed
• gLite I/O
– Posix-like access to Grid files
– Castor, dCache and DPM support
– Has been used for the BioMedical Demo
– Deployed on the Pre-Production Service and the DILIGENT testbed
• AMGA MetaData Catalog
– Used by the LHCb experiment
– Has been used for the BioMedical Demo
Medical Data Management 3
Enabling Grids for E-sciencE
ClientClient
Medical Data Management
Application
MDM Client LibraryMDM Client Library
Grid CatalogsGrid Catalogs
MetadataMetadata
Catalog (AMGA)Catalog (AMGA)
Medical
Imager
EncryptionEncryption
KeystoreKeystore (Hydra)(Hydra)
File CatalogFile Catalog
(Fireman)(Fireman)
SRM DICOMSRM DICOM
MDM TriggerMDM Trigger
GridFTPGridFTP
gLitegLite I/OI/O
Trigger:
• Retrieve DICOM files
from imager.
• Register file in
Fireman
• gLite EDS client:
Generate encryption
keys and store them
in Hydra
• Register Metadata in
AMGA
Client Library:
• Lookup file through Metadata
(AMGA)
• Use gLite EDS client:
• Retrieve file through gLite I/O
• Retrieve encryption Key from
Hydra
• Decrypt data
• Serve it up to the application
TICER Summer School, August 24th 20062nd
EGEE Review, CERN - 51
Enabling Grids for E-sciencE
INFSO-RI-508833
File Transfer Service
• Reliable file transfer
• Full scalable implementation
– Java Web Service front-end, C++ Agents, Oracle or MySQL database support
– Support for Channel, Site and VO management
– Interfaces for management and statistics monitoring
• Gsiftp, SRM and SRM-copy support
• Support for MySQL and Oracle
• Multi-VO support
• GridFTP and SRM copy support
Commercial SolutionsCommercial Solutions
• Vendors include:
– Avaki
– Data Synapse
• Benefits & costs
– Well packaged and documented
– Support
– Can be expensive
• But look for academic rates
Data Integration StrategiesData Integration Strategies
• Use a Service provided by a Data Owner
• Use a scripted workflow
• Use data virtualisation services
– Arrange that multiple data services have common
properties
– Arrange federations of these
– Arrange access presenting the common
properties
– Expose the important differences
– Support integration accommodating those
differences
Data Virtualisation ServicesData Virtualisation Services
• Form a federation
– Set of data resources – incremental addition
– Registration & description of collected resources
– Warehouse data or access dynamically to obtain updated data
– Virtual data warehouses – automating division between collection and
dynamic access
• Describe relevant relationships between data sources
– Incremental description + refinement / correction
• Run jobs, queries & workflows against combined set of data
resources
– Automated distribution & transformation
• Example systems
– IBM’s Information Integrator
– GEON, BIRN & SEEK
– OGSA-DAI is an extensible framework for building such systems
Virtualisation variationsVirtualisation variations
• Extent to which homogeneity obtained
– Regular representation choices – e.g. units
– Consistent ontologies
– Consistent data model
– Consistent schema – integrated super-schema
– DB operations supported across federation
– Ease of adding federation elements
– Ease of accommodating change as federation
members change their schema and policies
– Drill through to primary forms supported
OGSA-DAIOGSA-DAI
• http://www.ogsadai.org.uk
• A framework for data virtualisation
• Wide use in e-Science
– BRIDGES, GEON, CaBiG, GeneGrid, MyGrid,
BioSimGrid, e-Diamond, IU RGRBench, …
• Collaborative effort
– NeSC, EPCC, IBM, Oracle, Manchester, Newcastle
• Querying of data resources
– Relational databases
– XML databases
– Structured flat files
• Extensible activity documents
– Customisation for particular applications
TICER Summer School, August 24th 2006 59
The Open Grid Services Architecture
• An open, service-oriented architecture (SOA)
− Resources as first-class entities
− Dynamic service/resource creation and destruction
• Built on a Web services infrastructure
• Resource virtualization at the core
• Build grids from small number of standards-based
components
− Replaceable, coarse-grained
− e.g. brokers
• Customizable
− Support for dynamic, domain-specific content…
− …within the same standardized framework
Hiro Kishimoto: Keynote GGF17
TICER Summer School, August 24th 2006 60
OGSA Capabilities
Security
• Cross-organizational users
• Trust nobody
• Authorized access only
Security
• Cross-organizational users
• Trust nobody
• Authorized access only
Information Services
• Registry
• Notification
• Logging/auditing
Information Services
• Registry
• Notification
• Logging/auditing
Execution Management
• Job description & submission
• Scheduling
• Resource provisioning
Execution Management
• Job description & submission
• Scheduling
• Resource provisioning
Data Services
• Common access facilities
• Efficient & reliable transport
• Replication services
Data Services
• Common access facilities
• Efficient & reliable transport
• Replication services
Self-Management
• Self-configuration
• Self-optimization
• Self-healing
Self-Management
• Self-configuration
• Self-optimization
• Self-healing
Resource Management
• Discovery
• Monitoring
• Control
Resource Management
• Discovery
• Monitoring
• Control
OGSAOGSA
OGSA “profiles”OGSA “profiles”
Web services foundationWeb services foundation
Hiro Kishimoto: Keynote GGF17
TICER Summer School, August 24th 2006 61
Basic Data Interfaces
• Storage Management
− e.g. Storage Resource
Management (SRM)
• Storage Management
− e.g. Storage Resource
Management (SRM)
• Data Access
− ByteIO
− Data Access & Integration
(DAI)
• Data Access
− ByteIO
− Data Access & Integration
(DAI)
• Data Transfer
− Data Movement Interface
Specification (DMIS)
− Protocols (e.g. GridFTP)
• Data Transfer
− Data Movement Interface
Specification (DMIS)
− Protocols (e.g. GridFTP)
• Replica management
• Metadata catalog
• Cache management
• Replica management
• Metadata catalog
• Cache management
Hiro Kishimoto: Keynote GGF17
The State of the ArtThe State of the Art
• Many successful Grid & E-Science projects
– A few examples shown in this talk
• Many Grid systems
– All largely incompatible
– Interoperation talks under way
• Standardisation efforts
– Mainly via the Open Grid Forum
– A merger of the GGF & EGA
• Significant user investment required
– Few “out of the box” solutions
Technical ChallengesTechnical Challenges
• Issues you can’t avoid
– Lack of Complete Knowledge (LOCK)
– Latency
– Heterogeneity
– Autonomy
– Unreliability
– Scalability
– Change
• A Challenging goal
– balance technical feasibility
– against virtual homogeneity, stability and reliability
– while remaining affordable, manageable and maintainable
Areas “In Development”Areas “In Development”
• Data provenance
• Quality of Service
– Service Level Agreements
• Resource brokering
– Across all resources
• Workflow scheduling
– Co-sheduling
• Licence management
• Software provisioning
– Deployment and update
• Other areas too!
Operational ChallengesOperational Challenges
• Management of distributed systems
– With local autonomy
• Deployment, testing & monitoring
• User training
• User support
• Rollout of upgrades
• Security
– Distributed identity management
– Authorisation
– Revocation
– Incident response
Grids as a Foundation for SolutionsGrids as a Foundation for Solutions
• The grid per se doesn’t provide
– Supported e-Science methods
– Supported data & information resources
– Computations
– Convenient access
• Grids help providers of these, via
– International & national secure e-Infrastructure
– Standards for interoperation
– Standard APIs to promote re-use
• But Research Support must be built
– Application developers
– Resource providers
Collaboration ChallengesCollaboration Challenges
• Defining common goals
• Defining common formats
– E.g. schemas for data and metadata
• Defining a common vocabulary
– E.g. for metadata
• Finding common technology
– Standards should help, eventually
• Collecting metadata
– Automate where possible
Social ChallengesSocial Challenges
• Changing cultures
– Rewarding data & resource sharing
– Require publication of data
• Taking the first steps
– If everyone shares, everyone wins
– The first people to share must not lose out
• Sustainable funding
– Technology must persist
– Data must persist
SummarySummary
• E-Science exploits distributed computing
resource to enable new discoveries, new
collaborations and new ways of working
• Grid is an enabling technology for e-science.
• Many successful projects exist
• Many challenges remain
Globus Alliance
CeSC (Cambridge)
Digital
Curation
Centre
e-Science
Institute
UK e-ScienceUK e-Science
EGEE,
ChinaGri
Grid
Operations
Support
Centre
National
Centre for
e-Social
Science
National
Institute
for
Environmental
e-Science
Open
Middleware
Infrastructure
Institute
Ticer summer school_24_aug06

More Related Content

What's hot

Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
NDS Relevant Update from the NIH Data Science (ADDS) Office
NDS Relevant Update from the NIH Data Science (ADDS) OfficeNDS Relevant Update from the NIH Data Science (ADDS) Office
NDS Relevant Update from the NIH Data Science (ADDS) OfficePhilip Bourne
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014iedadata
 
EGU 2018 Ian McHarg Lecture
EGU 2018 Ian McHarg LectureEGU 2018 Ian McHarg Lecture
EGU 2018 Ian McHarg LectureKerstin Lehnert
 
Elixir at de.nbi meeting
Elixir at de.nbi meetingElixir at de.nbi meeting
Elixir at de.nbi meetingNiklas Blomberg
 
Creating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant ApplicationCreating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant ApplicationHistoric Environment Scotland
 
Enabling Scalable Integration of Diverse Data
Enabling Scalable Integration of Diverse DataEnabling Scalable Integration of Diverse Data
Enabling Scalable Integration of Diverse DataGlobus
 
Big Data is today: key issues for big data - Dr Ben Evans
Big Data is today: key issues for big data - Dr Ben EvansBig Data is today: key issues for big data - Dr Ben Evans
Big Data is today: key issues for big data - Dr Ben EvansARDC
 
091020 E Research Otago
091020 E Research Otago091020 E Research Otago
091020 E Research OtagoNick Jones
 
151111 tryggve-nordic biobank
151111 tryggve-nordic biobank151111 tryggve-nordic biobank
151111 tryggve-nordic biobankanttipursula
 
SGCI and Globus: Partners for Acceleration of Science
SGCI and Globus: Partners for Acceleration of ScienceSGCI and Globus: Partners for Acceleration of Science
SGCI and Globus: Partners for Acceleration of ScienceGlobus
 
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data ManagementD4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data ManagementBlue BRIDGE
 
Federation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudFederation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudOpenStack
 
Tryggve support for-research
Tryggve support for-researchTryggve support for-research
Tryggve support for-researchanttipursula
 
Supporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystemSupporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystemJisc
 
ODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open Data
ODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open DataODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open Data
ODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open DataMartin Kaltenböck
 
The Climate Tagger - a tagging and recommender service for climate informatio...
The Climate Tagger - a tagging and recommender service for climate informatio...The Climate Tagger - a tagging and recommender service for climate informatio...
The Climate Tagger - a tagging and recommender service for climate informatio...Martin Kaltenböck
 

What's hot (20)

Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
NDS Relevant Update from the NIH Data Science (ADDS) Office
NDS Relevant Update from the NIH Data Science (ADDS) OfficeNDS Relevant Update from the NIH Data Science (ADDS) Office
NDS Relevant Update from the NIH Data Science (ADDS) Office
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
 
EGU 2018 Ian McHarg Lecture
EGU 2018 Ian McHarg LectureEGU 2018 Ian McHarg Lecture
EGU 2018 Ian McHarg Lecture
 
CODATA, Open Science Policies and Capacity Building by Simon Hodson
CODATA, Open Science Policies and Capacity Building by Simon HodsonCODATA, Open Science Policies and Capacity Building by Simon Hodson
CODATA, Open Science Policies and Capacity Building by Simon Hodson
 
Elixir at de.nbi meeting
Elixir at de.nbi meetingElixir at de.nbi meeting
Elixir at de.nbi meeting
 
Creating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant ApplicationCreating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant Application
 
Enabling Scalable Integration of Diverse Data
Enabling Scalable Integration of Diverse DataEnabling Scalable Integration of Diverse Data
Enabling Scalable Integration of Diverse Data
 
Big Data is today: key issues for big data - Dr Ben Evans
Big Data is today: key issues for big data - Dr Ben EvansBig Data is today: key issues for big data - Dr Ben Evans
Big Data is today: key issues for big data - Dr Ben Evans
 
091020 E Research Otago
091020 E Research Otago091020 E Research Otago
091020 E Research Otago
 
151111 tryggve-nordic biobank
151111 tryggve-nordic biobank151111 tryggve-nordic biobank
151111 tryggve-nordic biobank
 
SGCI and Globus: Partners for Acceleration of Science
SGCI and Globus: Partners for Acceleration of ScienceSGCI and Globus: Partners for Acceleration of Science
SGCI and Globus: Partners for Acceleration of Science
 
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data ManagementD4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
 
Federation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudFederation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research Cloud
 
Tryggve support for-research
Tryggve support for-researchTryggve support for-research
Tryggve support for-research
 
Supporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystemSupporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystem
 
DIRISA for Open Data and Open Science/Anwar Vahed
DIRISA for Open Data and Open Science/Anwar VahedDIRISA for Open Data and Open Science/Anwar Vahed
DIRISA for Open Data and Open Science/Anwar Vahed
 
ODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open Data
ODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open DataODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open Data
ODI Node Vienna: Best Practise Beispiele für: Open Innovation mittels Open Data
 
COBWEB, AIP-6, and Access Management Federations
COBWEB, AIP-6, and Access Management FederationsCOBWEB, AIP-6, and Access Management Federations
COBWEB, AIP-6, and Access Management Federations
 
The Climate Tagger - a tagging and recommender service for climate informatio...
The Climate Tagger - a tagging and recommender service for climate informatio...The Climate Tagger - a tagging and recommender service for climate informatio...
The Climate Tagger - a tagging and recommender service for climate informatio...
 

Similar to Ticer summer school_24_aug06

Grid computing by vaishali sahare [katkar]
Grid computing by vaishali sahare [katkar]Grid computing by vaishali sahare [katkar]
Grid computing by vaishali sahare [katkar]vaishalisahare123
 
Graham Pryor
Graham PryorGraham Pryor
Graham PryorEduserv
 
Shared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK researchShared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK researchMartin Hamilton
 
A collaborative approach to "filling the digital preservation gap" for Resear...
A collaborative approach to "filling the digital preservation gap" for Resear...A collaborative approach to "filling the digital preservation gap" for Resear...
A collaborative approach to "filling the digital preservation gap" for Resear...Jenny Mitcham
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Research Data Shared Service update at DPC
Research Data Shared Service update at DPCResearch Data Shared Service update at DPC
Research Data Shared Service update at DPCJisc RDM
 
AusCover Earth Observation Services and Data Cubes
AusCover Earth Observation Services and Data CubesAusCover Earth Observation Services and Data Cubes
AusCover Earth Observation Services and Data CubesTERN Australia
 
BeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN sessionBeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN sessionNick Jones
 
Science DMZ as a Service: Creating Science Super- Facilities with GENI
Science DMZ as a Service: Creating Science Super- Facilities with GENIScience DMZ as a Service: Creating Science Super- Facilities with GENI
Science DMZ as a Service: Creating Science Super- Facilities with GENIUS-Ignite
 
Grid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locationsGrid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locationsDibyadip Das
 
Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster LEARN Project
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
 
Research Cyberinfrastructure at UCSD - David Minor - RDAP12
Research Cyberinfrastructure at UCSD - David Minor - RDAP12Research Cyberinfrastructure at UCSD - David Minor - RDAP12
Research Cyberinfrastructure at UCSD - David Minor - RDAP12ASIS&T
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.pptNileshkuGiri
 
Virtual Research Environments supporting tailor-made data management service...
Virtual Research Environments supporting tailor-made data management service...Virtual Research Environments supporting tailor-made data management service...
Virtual Research Environments supporting tailor-made data management service...Blue BRIDGE
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 

Similar to Ticer summer school_24_aug06 (20)

Grid computing by vaishali sahare [katkar]
Grid computing by vaishali sahare [katkar]Grid computing by vaishali sahare [katkar]
Grid computing by vaishali sahare [katkar]
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
Sgci esip-7-20-18
Sgci esip-7-20-18Sgci esip-7-20-18
Sgci esip-7-20-18
 
Shared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK researchShared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK research
 
Information Systems
Information SystemsInformation Systems
Information Systems
 
A collaborative approach to "filling the digital preservation gap" for Resear...
A collaborative approach to "filling the digital preservation gap" for Resear...A collaborative approach to "filling the digital preservation gap" for Resear...
A collaborative approach to "filling the digital preservation gap" for Resear...
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Research Data Shared Service update at DPC
Research Data Shared Service update at DPCResearch Data Shared Service update at DPC
Research Data Shared Service update at DPC
 
AusCover Earth Observation Services and Data Cubes
AusCover Earth Observation Services and Data CubesAusCover Earth Observation Services and Data Cubes
AusCover Earth Observation Services and Data Cubes
 
BeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN sessionBeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN session
 
Grid computing
Grid computingGrid computing
Grid computing
 
Science DMZ as a Service: Creating Science Super- Facilities with GENI
Science DMZ as a Service: Creating Science Super- Facilities with GENIScience DMZ as a Service: Creating Science Super- Facilities with GENI
Science DMZ as a Service: Creating Science Super- Facilities with GENI
 
Grid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locationsGrid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locations
 
Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
Research Cyberinfrastructure at UCSD - David Minor - RDAP12
Research Cyberinfrastructure at UCSD - David Minor - RDAP12Research Cyberinfrastructure at UCSD - David Minor - RDAP12
Research Cyberinfrastructure at UCSD - David Minor - RDAP12
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.ppt
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
Virtual Research Environments supporting tailor-made data management service...
Virtual Research Environments supporting tailor-made data management service...Virtual Research Environments supporting tailor-made data management service...
Virtual Research Environments supporting tailor-made data management service...
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 

Recently uploaded

CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 

Recently uploaded (20)

CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

Ticer summer school_24_aug06

  • 1. Ticer Summer School Thursday 24th August 2006 Dave Berry & Malcolm Atkinson National e-Science Centre, Edinburgh www.nesc.ac.uk
  • 2. Digital Libraries, Grids & E-ScienceDigital Libraries, Grids & E-Science What is E-Science? What is Grid Computing? Data Grids Requirements Examples Technologies Data Virtualisation The Open Grid Services Architecture Challenges
  • 3.
  • 4. What is e-Science?What is e-Science? • Goal: to enable better research in all disciplines • Method: Develop collaboration supported by advanced distributed computation – to generate, curate and analyse rich data resources • From experiments, observations, simulations & publications • Quality management, preservation and reliable evidence – to develop and explore models and simulations • Computation and data at all scales • Trustworthy, economic, timely and relevant results – to enable dynamic distributed collaboration • Facilitating collaboration with information and resource sharing • Security, trust, reliability, accountability, manageability and agility
  • 5. climateprediction.net and GENIE • Largest climate model ensemble • >45,000 users, >1,000,000 model years 10K2K Response of Atlantic circulation to freshwater forcing
  • 6. 6 Courtesy of David Gavaghan & IB Team Integrative Biology Tackling two Grand Challenge research questions: • What causes heart disease? • How does a cancer form and grow? Together these diseases cause 61% of all UK deaths Building a powerful, fault-tolerant Grid infrastructure for biomedical science Enabling biomedical researchers to use distributed resources such as high-performance computers, databases and visualisation tools to develop coupled multi-scale models of how these killer diseases develop.
  • 7. BBiomedicaliomedical RResearchesearch IInformaticsnformatics DDelivered byelivered by GGridrid EEnablednabled SServiceservices Glasgow Edinburgh Leicester Oxford London Netherlands Publically Curated Data Private data Private data Private data Private data Private data Private data CFG Virtual Organisation Ensembl MGI HUGO OMIM SWISS-PROT … DATA HUB RGD Synteny Grid Service blast + Portal http://www.brc.dcs.gla.ac.uk/projects/bridges/
  • 8. eDiaMoND: Screening for Breast CancereDiaMoND: Screening for Breast Cancer 1 Trust  Many Trusts Collaborative Working Audit capability Epidemiology Other Modalities -MRI -PET -Ultrasound Better access to Case information And digital tools Supplement Mentoring With access to digital Training cases and sharing Of information across clinics Letters Radiology reporting systems eDiaMoND Grid 2ndary Capture Or FFD Case Information X-Rays and Case Information Digital Reading SMF Case and Reading Information CAD Temporal Comparison Screening Electronic Patient Records Assessment/ Symptomatic Biopsy Case and Reading Information Symptomatic/Assessment Information Training Manage Training Cases Perform Training SMF CAD 3D Images Patients Provided by eDiamond project: Prof. Sir Mike Brady et al.
  • 9. E-Science Data ResourcesE-Science Data Resources • Curated databases – Public, institutional, group, personal • Online journals and preprints • Text mining and indexing services • Raw storage (disk & tape) • Replicated files • Persistent archives • Registries • …
  • 10. TICER Summer School, August 24th 2006 © 10 EBank Slide from Jeremy Frey
  • 11. TICER Summer School, August 24th 2006 © 11 Biomedical data – making connections 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg Slide provided by Carole Goble: University of Manchester
  • 12. Using Workflows to Link ServicesUsing Workflows to Link Services • Describe the steps in a Scripting Language • Steps performed by Workflow Enactment Engine • Many languages in use – Trade off: familiarity & availability – Trade off: detailed control versus abstraction • Incrementally develop correct process – Sharable & Editable – Basis for scientific communication & validation – Valuable IPR asset • Repetition is now easy – Parameterised explicitly & implicitly
  • 13. Workflow SystemsWorkflow Systems Language WF Enact. Comments Shell scripts Shell + OS Common but not often thought of as WF. Depend on context, e.g. NFS across all sites Perl Perl runtime Popular in bioinformatics. Similar context dependence – distribution has to be coded Java JVM Popular target because JVM ubiquity – similar dependence – distribution has to be coded BPEL BPEL Enactment OASIS standard for industry – coordinating use of multiple Web Services – low level detail - tools Taverna Scufl EBI, OMII-UK & MyGrid http://taverna.sourceforge.net/index.php VDT / Pegasus Chimera & DAGman High-level abstract formulation of workflows, automated mapping towards executable forms, cached result re-use Kepler Kepler BIRN, GEON & SEEK http://kepler-project.org/
  • 14. TICER Summer School, August 24th 2006 © 14 Workflow example  Taverna in MyGrid http://www.mygrid.org.uk/  “allows the e-Scientist to describe and enact their experimental processes in a structured, repeatable and verifiable way”  GUI  Workflow language  Enactment engine
  • 15. TICER Summer School, August 24th 2006 © 15 Pub/Sub for Laboratory data using a broker and ultimately delivered over GPRS Notification Comb-e-chem: Jeremy Frey
  • 16. Relevance to Digital LibrariesRelevance to Digital Libraries • Similar concerns – Data curation & management – Metadata, discovery – Secure access (AAA +) – Provenance & data quality – Local autonomy – Availability, resilience • Common technology – Grid as an implementation technology
  • 17.
  • 18. TICER Summer School, August 24th 2006 18 What is a Grid? LicenseLicense PrinterPrinter A grid is a system consisting of − Distributed but connected resources and − Software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives A grid is a system consisting of − Distributed but connected resources and − Software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives R2AD DatabaseDatabase Web server Web server Data CenterCluster Handheld Supercomputer Workstation Server Source: Hiro Kishimoto GGF17 Keynote May 2006
  • 19. TICER Summer School, August 24th 2006 19 Virtualizing Resources Resources Web services AccessAccess StorageStorage SensorsSensors ApplicationsApplications InformationInformationComputersComputers Resource-specific InterfacesResource-specific Interfaces Common Interfaces Type-specific interfaces Hiro Kishimoto: Keynote GGF17
  • 20. Ideas and FormsIdeas and Forms • Key ideas – Virtualised resources – Secure access – Local autonomy • Many forms – Cycle stealing – Linked supercomputers – Distributed file systems – Federated databases – Commercial data centres – Utility computing
  • 21. TICER Summer School, August 24th 2006 21 Grid Middleware Virtualized resources Grid middleware services Brokering Service Brokering Service Registry Service Registry Service Data Service Data Service CPU Resource CPU Resource Printer Service Printer Service Job-Submit Service Job-Submit Service Compute Service Compute Service Notify Advertise Application Service Application Service Hiro Kishimoto: Keynote GGF17
  • 22. Key Drivers for GridsKey Drivers for Grids • Collaboration – Expertise is distributed – Resources (data, software licences) are location-specific – Necessary to achieve critical mass of effort – Necessary to raise sufficient resources • Computational Power – Rapid growth in number of processors – Powered by Moore’s law + device roadmap – Challenge to transform models to exploit this • Deluge of Data – Growth in scale: Number and Size of resources – Growth in complexity – Policy drives greater data availability
  • 23. Minimum Grid FunctionalitiesMinimum Grid Functionalities • Supports distributed computation – Data and computation – Over a variety of • hardware components (servers, data stores, …) • Software components (services: resource managers, computation and data services) – With regularity that can be exploited • By applications • By other middleware & tools • By providers and operations – It will normally have security mechanisms • To develop and sustain trust regimes
  • 24. TICER Summer School, August 24th 2006 24 Source: Hiro Kishimoto GGF17 Keynote May 2006 Grid & Related Paradigms Utility Computing • Computing “services” • No knowledge of provider • Enabled by grid technology Utility Computing • Computing “services” • No knowledge of provider • Enabled by grid technology Distributed Computing • Loosely coupled • Heterogeneous • Single Administration Distributed Computing • Loosely coupled • Heterogeneous • Single Administration Cluster • Tightly coupled • Homogeneous • Cooperative working Cluster • Tightly coupled • Homogeneous • Cooperative working Grid Computing • Large scale • Cross-organizational • Geographical distribution • Distributed Management Grid Computing • Large scale • Cross-organizational • Geographical distribution • Distributed Management
  • 25.
  • 26. Why use / build Grids?Why use / build Grids? • Research Arguments – Enables new ways of working – New distributed & collaborative research – Unprecedented scale and resources • Economic Arguments – Reduced system management costs – Shared resources ⇒ better utilisation – Pooled resources ⇒ increased capacity – Load sharing & utility computing – Cheaper disaster recovery
  • 27. Why use / build Grids?Why use / build Grids? • Operational Arguments – Enable autonomous organisations to • Write complementary software components • Set up run & use complementary services • Share operational responsibility • General & consistent environment for Abstraction, Automation, Optimisation & Tools • Political & Management Arguments – Stimulate innovation – Promote intra-organisation collaboration – Promote inter-enterprise collaboration
  • 28. TICER Summer School, August 24th 2006 28 Grids In Use: E-Science Examples • Data sharing and integration − Life sciences, sharing standard data-sets, combining collaborative data-sets − Medical informatics, integrating hospital information systems for better care and better science − Sciences, high-energy physics • Data sharing and integration − Life sciences, sharing standard data-sets, combining collaborative data-sets − Medical informatics, integrating hospital information systems for better care and better science − Sciences, high-energy physics • Capability computing − Life sciences, molecular modeling, tomography − Engineering, materials science − Sciences, astronomy, physics • Capability computing − Life sciences, molecular modeling, tomography − Engineering, materials science − Sciences, astronomy, physics • High-throughput, capacity computing for − Life sciences: BLAST, CHARMM, drug screening − Engineering: aircraft design, materials, biomedical − Sciences: high-energy physics, economic modeling • High-throughput, capacity computing for − Life sciences: BLAST, CHARMM, drug screening − Engineering: aircraft design, materials, biomedical − Sciences: high-energy physics, economic modeling • Simulation-based science and engineering − Earthquake simulation • Simulation-based science and engineering − Earthquake simulation Source: Hiro Kishimoto GGF17 Keynote May 2006
  • 29.
  • 30. PDB 33,367 Protein structuresEMBL DB 111,416,302,701 nucleotides Database GrowthDatabase Growth Slide provided by Richard Baldock: MRC HGU Edinburgh
  • 31. Requirements: User’s viewpointRequirements: User’s viewpoint • Find Data – Registries & Human communication • Understand data – Metadata description, Standard / familiar formats & representations, Standard value systems & ontologies • Data Access – Find how to interact with data resource – Obtain permission (authority) – Make connection – Make selection • Move Data – In bulk or streamed (in increments)
  • 32. Requirements: User’s viewpoint 2Requirements: User’s viewpoint 2 • Transform Data – To format, organisation & representation required for computation or integration • Combine data – Standard database operations + operations relevant to the application model • Present results – To humans: data movement + transform for viewing – To application code: data movement + transform to the required format – To standard analysis tools, e.g. R – To standard visualisation tools, e.g. Spitfire
  • 33. Requirements: Owner’s viewpointRequirements: Owner’s viewpoint • Create Data – Automated generation, Accession Policies, Metadata generation – Storage Resources • Preserve Data – Archiving – Replication – Metadata – Protection • Provide Services with available resources – Definition & implementation: costs & stability – Resources: storage, compute & bandwidth
  • 34. Requirements: Owner’s viewpoint 2Requirements: Owner’s viewpoint 2 • Protect Services – Authentication, Authorisation, Accounting, Audit – Reputation • Protect data – Comply with owner requirements – encryption for privacy, … • Monitor and Control use – Detect and handle failures, attacks, misbehaving users – Plan for future loads and services • Establish case for Continuation – Usage statistics – Discoveries enabled
  • 35.
  • 36. Large Hadron ColliderLarge Hadron Collider • The most powerful instrument ever built to investigate elementary particle physics • Data Challenge: – 10 Petabytes/year of data – 20 million CDs each year! • Simulation, reconstruction, analysis: – LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors
  • 37. Composing Observations in AstronomyComposing Observations in Astronomy Data and images courtesy Alex Szalay, John Hopkins No. & sizes of data sets as of mid-2002, grouped by wavelength • 12 waveband coverage of large areas of the sky • Total about 200 TB data • Doubling every 12 months • Largest catalogues near 1B objects
  • 38. GODIVA Data Portal • Grid for Ocean Diagnostics, Interactive Visualisation and Analysis • Daily Met Office Marine Forecasts and gridded research datasets • National Centre for Ocean Forecasting • ~3Tb climate model datastore via Web Services • Interactive Visualisations inc. Movies • ~ 30 accesses a day worldwide • Other GODIVA software produces 3D/4D Visualisations reading data remotely via Web Services Online Movies www.nerc-essc.ac.uk/godiva
  • 39. GODIVA Visualisations • Unstructured Meshes • Grid Rotation/Interpolation • GeoSpatial Databases v. Files (Postgres, IBM, Oracle) • Perspective 3D Visualisation • Google maps viewer
  • 40. NERC Data Grid • The DataGrid focuses on federation of NERC Data Centres • Grid for data discovery, delivery and use across sites • Data can be stored in many different ways (flat files, databases…) • Strong focus on Metadata and Ontologies • Clear separation between discovery and use of data. • Prototype focussing on Atmospheric and Oceanographic data www.ndg.nerc.ac.uk
  • 41. Global In-flight Engine DiagnosticsGlobal In-flight Engine Diagnostics in-flight data airline maintenance centre ground station global network eg SITA internet, e-mail, pager DS&S Engine Health Center data centre Distributed Aircraft Maintenance Environment: Leeds, Oxford, Sheffield &York, Jim Austin 100,000 aircraft 0.5 GB/flight 4 flights/day 200 TB/day Now BROADEN Significant in getting Boeing 787 engine contract
  • 42.
  • 43. Storage Resource Manager (SRM)Storage Resource Manager (SRM) • http://sdm.lbl.gov/srm-wg/ • de facto & written standard in physics, … • Collaborative effort – CERN, FNAL, JLAB, LBNL and RAL • Essential bulk file storage – (pre) allocation of storage • abstraction over storage systems – File delivery / registration / access – Data movement interfaces • E.g. gridFTP • Rich function set – Space management, permissions, directory, data transfer & discovery
  • 44. Storage Resource Broker (SRB)Storage Resource Broker (SRB) • http://www.sdsc.edu/srb/index.php/Main_Page • SDSC developed • Widely used – Archival document storage – Scientific data: bio-sciences, medicine, geo-sciences, … • Manages – Storage resource allocation • abstraction over storage systems – File storage – Collections of files – Metadata describing files, collections, etc. – Data transfer services
  • 45. Condor Data ManagementCondor Data Management • Stork – Manages File Transfers – May manage reservations • Nest – Manages Data Storage – C.f. GridFTP with reservations • Over multiple protocols
  • 46. TICER Summer School, August 24th 2006 46 Globus Tools and Services for Data Management GridFTP A secure, robust, efficient data transfer protocol The Reliable File Transfer Service (RFT) Web services-based, stores state about transfers The Data Access and Integration Service (OGSA-DAI) Service to access to data resources, particularly relational and XML databases The Replica Location Service (RLS) Distributed registry that records locations of data copies The Data Replication Service Web services-based, combines data replication and registration functionality Slides from Ann Chervenak
  • 47. TICER Summer School, August 24th 2006 47 RLS in Production Use: LIGO Laser Interferometer Gravitational Wave Observatory Currently use RLS servers at 10 sites Contain mappings from 6 million logical files to over 40 million physical replicas Used in customized data management system: the LIGO Lightweight Data Replicator System (LDR) Includes RLS, GridFTP, custom metadata catalog, tools for storage management and data validation Slides from Ann Chervenak
  • 48. TICER Summer School, August 24th 2006 48 RLS in Production Use: ESG Earth System Grid: Climate modeling data (CCSM, PCM, IPCC) RLS at 4 sites Data management coordinated by ESG portal Datasets stored at NCAR 64.41 TB in 397253 total files 1230 portal users IPCC Data at LLNL 26.50 TB in 59,300 files 400 registered users Data downloaded: 56.80 TB in 263,800 files Avg. 300GB downloaded/day 200+ research papers being written Slides from Ann Chervenak
  • 49. TICER Summer School, August 24th 20062nd EGEE Review, CERN - 49 Enabling Grids for E-sciencE INFSO-RI-508833 gLite Data Management • FTS – File Transfer Service • LFC – Logical file catalogue • Replication Service – Accessed through LFC • AMGA – Metadata services
  • 50. TICER Summer School, August 24th 20062nd EGEE Review, CERN - 50 Enabling Grids for E-sciencE INFSO-RI-508833 Data Management Services • FiReMan catalog – Resolves logical filenames (LFN) to physical location of files and storage elements – Oracle and MySQL versions available – Secure services – Attribute support – Symbolic link support – Deployed on the Pre-Production Service and DILIGENT testbed • gLite I/O – Posix-like access to Grid files – Castor, dCache and DPM support – Has been used for the BioMedical Demo – Deployed on the Pre-Production Service and the DILIGENT testbed • AMGA MetaData Catalog – Used by the LHCb experiment – Has been used for the BioMedical Demo Medical Data Management 3 Enabling Grids for E-sciencE ClientClient Medical Data Management Application MDM Client LibraryMDM Client Library Grid CatalogsGrid Catalogs MetadataMetadata Catalog (AMGA)Catalog (AMGA) Medical Imager EncryptionEncryption KeystoreKeystore (Hydra)(Hydra) File CatalogFile Catalog (Fireman)(Fireman) SRM DICOMSRM DICOM MDM TriggerMDM Trigger GridFTPGridFTP gLitegLite I/OI/O Trigger: • Retrieve DICOM files from imager. • Register file in Fireman • gLite EDS client: Generate encryption keys and store them in Hydra • Register Metadata in AMGA Client Library: • Lookup file through Metadata (AMGA) • Use gLite EDS client: • Retrieve file through gLite I/O • Retrieve encryption Key from Hydra • Decrypt data • Serve it up to the application
  • 51. TICER Summer School, August 24th 20062nd EGEE Review, CERN - 51 Enabling Grids for E-sciencE INFSO-RI-508833 File Transfer Service • Reliable file transfer • Full scalable implementation – Java Web Service front-end, C++ Agents, Oracle or MySQL database support – Support for Channel, Site and VO management – Interfaces for management and statistics monitoring • Gsiftp, SRM and SRM-copy support • Support for MySQL and Oracle • Multi-VO support • GridFTP and SRM copy support
  • 52. Commercial SolutionsCommercial Solutions • Vendors include: – Avaki – Data Synapse • Benefits & costs – Well packaged and documented – Support – Can be expensive • But look for academic rates
  • 53.
  • 54. Data Integration StrategiesData Integration Strategies • Use a Service provided by a Data Owner • Use a scripted workflow • Use data virtualisation services – Arrange that multiple data services have common properties – Arrange federations of these – Arrange access presenting the common properties – Expose the important differences – Support integration accommodating those differences
  • 55. Data Virtualisation ServicesData Virtualisation Services • Form a federation – Set of data resources – incremental addition – Registration & description of collected resources – Warehouse data or access dynamically to obtain updated data – Virtual data warehouses – automating division between collection and dynamic access • Describe relevant relationships between data sources – Incremental description + refinement / correction • Run jobs, queries & workflows against combined set of data resources – Automated distribution & transformation • Example systems – IBM’s Information Integrator – GEON, BIRN & SEEK – OGSA-DAI is an extensible framework for building such systems
  • 56. Virtualisation variationsVirtualisation variations • Extent to which homogeneity obtained – Regular representation choices – e.g. units – Consistent ontologies – Consistent data model – Consistent schema – integrated super-schema – DB operations supported across federation – Ease of adding federation elements – Ease of accommodating change as federation members change their schema and policies – Drill through to primary forms supported
  • 57. OGSA-DAIOGSA-DAI • http://www.ogsadai.org.uk • A framework for data virtualisation • Wide use in e-Science – BRIDGES, GEON, CaBiG, GeneGrid, MyGrid, BioSimGrid, e-Diamond, IU RGRBench, … • Collaborative effort – NeSC, EPCC, IBM, Oracle, Manchester, Newcastle • Querying of data resources – Relational databases – XML databases – Structured flat files • Extensible activity documents – Customisation for particular applications
  • 58.
  • 59. TICER Summer School, August 24th 2006 59 The Open Grid Services Architecture • An open, service-oriented architecture (SOA) − Resources as first-class entities − Dynamic service/resource creation and destruction • Built on a Web services infrastructure • Resource virtualization at the core • Build grids from small number of standards-based components − Replaceable, coarse-grained − e.g. brokers • Customizable − Support for dynamic, domain-specific content… − …within the same standardized framework Hiro Kishimoto: Keynote GGF17
  • 60. TICER Summer School, August 24th 2006 60 OGSA Capabilities Security • Cross-organizational users • Trust nobody • Authorized access only Security • Cross-organizational users • Trust nobody • Authorized access only Information Services • Registry • Notification • Logging/auditing Information Services • Registry • Notification • Logging/auditing Execution Management • Job description & submission • Scheduling • Resource provisioning Execution Management • Job description & submission • Scheduling • Resource provisioning Data Services • Common access facilities • Efficient & reliable transport • Replication services Data Services • Common access facilities • Efficient & reliable transport • Replication services Self-Management • Self-configuration • Self-optimization • Self-healing Self-Management • Self-configuration • Self-optimization • Self-healing Resource Management • Discovery • Monitoring • Control Resource Management • Discovery • Monitoring • Control OGSAOGSA OGSA “profiles”OGSA “profiles” Web services foundationWeb services foundation Hiro Kishimoto: Keynote GGF17
  • 61. TICER Summer School, August 24th 2006 61 Basic Data Interfaces • Storage Management − e.g. Storage Resource Management (SRM) • Storage Management − e.g. Storage Resource Management (SRM) • Data Access − ByteIO − Data Access & Integration (DAI) • Data Access − ByteIO − Data Access & Integration (DAI) • Data Transfer − Data Movement Interface Specification (DMIS) − Protocols (e.g. GridFTP) • Data Transfer − Data Movement Interface Specification (DMIS) − Protocols (e.g. GridFTP) • Replica management • Metadata catalog • Cache management • Replica management • Metadata catalog • Cache management Hiro Kishimoto: Keynote GGF17
  • 62.
  • 63. The State of the ArtThe State of the Art • Many successful Grid & E-Science projects – A few examples shown in this talk • Many Grid systems – All largely incompatible – Interoperation talks under way • Standardisation efforts – Mainly via the Open Grid Forum – A merger of the GGF & EGA • Significant user investment required – Few “out of the box” solutions
  • 64. Technical ChallengesTechnical Challenges • Issues you can’t avoid – Lack of Complete Knowledge (LOCK) – Latency – Heterogeneity – Autonomy – Unreliability – Scalability – Change • A Challenging goal – balance technical feasibility – against virtual homogeneity, stability and reliability – while remaining affordable, manageable and maintainable
  • 65. Areas “In Development”Areas “In Development” • Data provenance • Quality of Service – Service Level Agreements • Resource brokering – Across all resources • Workflow scheduling – Co-sheduling • Licence management • Software provisioning – Deployment and update • Other areas too!
  • 66. Operational ChallengesOperational Challenges • Management of distributed systems – With local autonomy • Deployment, testing & monitoring • User training • User support • Rollout of upgrades • Security – Distributed identity management – Authorisation – Revocation – Incident response
  • 67. Grids as a Foundation for SolutionsGrids as a Foundation for Solutions • The grid per se doesn’t provide – Supported e-Science methods – Supported data & information resources – Computations – Convenient access • Grids help providers of these, via – International & national secure e-Infrastructure – Standards for interoperation – Standard APIs to promote re-use • But Research Support must be built – Application developers – Resource providers
  • 68. Collaboration ChallengesCollaboration Challenges • Defining common goals • Defining common formats – E.g. schemas for data and metadata • Defining a common vocabulary – E.g. for metadata • Finding common technology – Standards should help, eventually • Collecting metadata – Automate where possible
  • 69. Social ChallengesSocial Challenges • Changing cultures – Rewarding data & resource sharing – Require publication of data • Taking the first steps – If everyone shares, everyone wins – The first people to share must not lose out • Sustainable funding – Technology must persist – Data must persist
  • 70.
  • 71. SummarySummary • E-Science exploits distributed computing resource to enable new discoveries, new collaborations and new ways of working • Grid is an enabling technology for e-science. • Many successful projects exist • Many challenges remain
  • 72. Globus Alliance CeSC (Cambridge) Digital Curation Centre e-Science Institute UK e-ScienceUK e-Science EGEE, ChinaGri Grid Operations Support Centre National Centre for e-Social Science National Institute for Environmental e-Science Open Middleware Infrastructure Institute

Editor's Notes

  1. Climateprediction: Altruistic computing, outreach to schools, multiple runs of simpler models then statistics of distribution of results illustrating model uncertainty / sensitivity. Genie: again value of making it possible to share data sources, submit jobs easily as ensembles that explore a parameter space contributing to shared data that is then analysed / visualised. Each an example of significant behavioural change propagating among the scientists studying the Natural Environment.
  2. Note different teams model different aspects of the heart. Their geographic distribution shown on the next slide.
  3. These series of examples allow motivation + show the kind of multi-site, multi-team, multi-discipline collaboration. Biomedical Research Informatics Delivered by Grid Enabled Services NeSC (Edinburgh and Glasgow) and IBM www.brc.dcs.gla.ac.uk/projects/bridges Supporting project for CFG project Cardiovascular Functional Genomics. T Generating data on hypertension Rat, Mouse, Human genome databases Variety of tools used BLAST, BLAT, Gene Prediction, visualisation, … Variety of data sources and formats Microarray data, genome DBs, project partner research data, medical records, … Aim is integrated infrastructure supporting Data federation Security
  4. Idea: multiple medical regions’ radiographers collaborate for training, comparator data, and perhaps back up / load sharing. This would also provide an sufficiently large pool of data to enable epidemiology on a sufficient scale to study rarer syndromes and presentations. Difficult to anticipate constraints and impediments from existing working practices, e.g. regional variations in description of lesions, worries about loss of personal contact with patients and process, worries about loss of jobs, …
  5. BLAST: Gene sequencing In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing biological sequences, such as the amino-acid sequences of different proteins or the DNA sequences. CHARMM – molecular dynamics CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a force field for molecular dynamics as well as the name for the molecular dynamics simulation package associated with this force field. http://www.charmm.org/
  6. <number> VLBA Very Large Base Array - linked radio telescopes NRAO - National Radio Astronomy Observatory
  7. This shows the National Centre, 8 regional centres and 2 laboratories in blue That is the original set up at the start of UK e-Science August 2001 NeSC is jointly run by Edinburgh & Glasgow Universities In 2003 several smaller centres were added (vermilion) The e-Science Institute is run by the National e-Science Centre. It runs a programme of events and hosts visiting international researchers. It was established in 2001. The Open Middleware Infrastructure Institute was established in 2004, to provide support and direction for Grid middleware developed in the UK. It is based at the University of Southampton. The Grid Operations Support Centre was established in 2004. The Digital Curation Centre was established in 2004 by the Universities of Edinburgh and Glasgow, the UK Online Library Network at the University of Bath, and the Central Laboratories at Daresbury and Rutherford. It’s job is to provide advice on curating scientific data and on preserving digital media, formats, and access software. Edinburgh is one of the 4 founders of the Globus Alliance (Sept 2003) which will take responsibility for the future of the Globus Toolkit. The other founders are: Chicago University (Argonne National Lab), University of Southern California, Los Angeles (Information Sciences Institute) and the PDC, Stockholm, Sweden The EU EGEE project (Enabling Grids for E-Science in Europe) is establishing a common framework for Grids in Europe. The UK e-Science programme has several connections with EGEE. NeSC leads the training component for the whole of Europe.