E-Science: distributed scientific computing in practice.
Hector Quintero Casanova, University of Edinburgh

Distributed Scientific Computing?
Also known as e-Science. According to Dr. John Taylor, it has 2 dimensions:
– Global collaboration effort
  • Cross-organisational effort demanded.
  • Technical and formal differences are likely.
– Infrastructure that will enable it
  • Middleware hides differences and complexities.
  • Aims at seamless, instant access to resources.
  • Much like a utility. Hence, the grid.

Current state of affairs
Shift to data: find a hypothesis for a pattern
– Cosmology: dark flow in WMAP data.
Emphasis depends on area of application:
– Astronomy: uniform data access
  • Data and its correct annotation. E.g. VO
– Particle Physics: universal job submission
  • Processing of jobs. E.g. JDL
– Biology: workflow
  • Research activity is model-based. E.g. myExperiment

Current state of affairs
Differences in emphasis are reflected in the tools:
– Astronomy: analysis of data
  • Multiple approaches ⇒ extensive user interaction.
– Biology: workflow design
  • Mainly deciding the order of steps ⇒ some user interaction.
– Particle Physics: job submission
  • Define job and submit ⇒ minimal user interaction.
Scientific research is also conducted in the arts:
– E-science is also applied to them
  • E-Dance project: annotation of choreography videos.
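The biology case above, where workflow design is mainly a matter of deciding order, can be sketched as a dependency graph that is sorted topologically. This is a minimal Python sketch under stated assumptions, not how myExperiment or any particular workflow tool works: the step names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical analysis steps. Each step maps to the set of steps
# that must finish before it can start.
workflow = {
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "build_tree": {"align"},
    "annotate": {"align"},
    "report": {"build_tree", "annotate"},
}


def execution_order(steps):
    """Return one valid order in which the steps can be run."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(workflow)
print(order)
```

The scientist only states which step depends on which; the ordering follows from that. Steps with no mutual dependency (here `build_tree` and `annotate`) may appear in either order.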

Challenges: semantics
Transition from annotation to semantics:
– Biology very advanced. E.g. Gene Ontology
  • Describe experimental models.
– In Astronomy not so easy despite rich metadata
  • Problems such as description of units.
Semantics leading to over-standardisation?
– Not yet, since scientists still play a big role.
– Common model of knowledge could limit creativity.
  • Thinking processes shaped by common framework.
Balance between standardisation & flexibility
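The units problem mentioned for astronomy can be illustrated with a small sketch: two datasets may label the same quantity differently, and values cannot be compared until the labels are mapped to a shared vocabulary. This is only an illustration; the alias table and the `normalise` helper are hypothetical and do not correspond to any existing standard or library.

```python
# Hypothetical alias table: unit labels as they might appear in metadata,
# mapped to (canonical unit, factor to convert into the canonical unit).
# Labels are matched exactly because case carries meaning in unit symbols
# (for example, "mJy" is milli while "MJy" would be mega).
UNIT_ALIASES = {
    "Jy": ("Jy", 1.0),
    "jansky": ("Jy", 1.0),
    "mJy": ("Jy", 1e-3),
    "millijansky": ("Jy", 1e-3),
}


def normalise(value, unit_label):
    """Convert a value to the canonical unit, or fail on an unknown label."""
    key = unit_label.strip()
    if key not in UNIT_ALIASES:
        raise ValueError(f"unrecognised unit label: {unit_label!r}")
    canonical, factor = UNIT_ALIASES[key]
    return value * factor, canonical


# Two records describing flux with different labels become comparable.
print(normalise(0.25, "jansky"))
print(normalise(250.0, "mJy"))
```

A free-text label such as "flux units" raises an error instead of being guessed at, which is the point: without agreed semantics, someone has to maintain a table like this by hand.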

Challenges: politics
Politics does affect scientific decisions:
– Astronomy: TAP protocol
  • Compromise between US and UK.
  • Each side implements the options it wants.
– In effect, 2 flavours of TAP available:
  • Organisations: which TAP to implement?
  • Undermines standard access to data.
– Similar situation with CORBA ended in failure.
Solution: avoid compromises. But would that hold things up?
Balance: robustness of standards vs. advancement

Challenges: collaboration
Focus still on sharing and not on collaborating:
– Astronomy: uniform access to data
  • Data can be shared.
  • No platform to exchange views on that data.
– Exception: myExperiment. Caters only for biologists.
Also, targeted collaboration during development.
– Developers should actively engage with scientists.
– Example: evolution of EDIKT project.

EDIKT
At first, generic solutions that then found applications
– Holistic approach to e-science problems.
  • General solutions: BinX and Eldas.
  • Specific applications: AstroBinX and BioDAS.
Change: active engagement with would-be users.
– Regular talks involving developers & researchers.
– Embedded developer: specific to research activity.
Example: ECDF portal.
– Draws on experience with RAPID.
– Fidelity to scientists' requirements: command-line look.

Future
E-science has just started: multi-disciplinary science
– New challenges cover wider areas of knowledge.
– Example: effects of climate change on migration
  • Climate change is complex already.
  • Couple that with sociology and geography. Nice!
Will push for more standards and collaboration
– Semantics would ease establishment of correlations.
– Example: social unrest and increased temperature.
E-science begging for funding? Hope not.