Ilya Zaslavsky, David Valentine, Amarnath Gupta, Stephen Richard, Tanu Malik
Presentation given in the afternoon Architecture Forum Session on Day 1, June 24 at the EarthCube All-Hands Meeting
AHM 2014: Enterprise Architecture for Transformative Research and Collaboration Across Geosciences
1. EarthCube Conceptual Design: Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
http://workspace.earthcube.org/transformative-research-collaboration
ILYA ZASLAVSKY, DAVID VALENTINE, AMARNATH GUPTA
San Diego Supercomputer Center/UCSD
STEPHEN RICHARD
Arizona Geological Survey
TANU MALIK
University of Chicago
2. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
The Science Enterprise
• Ask questions
• Collect information
• Formulate hypotheses
• Test hypotheses to determine which (if any) provide a satisfactory answer
• Document, curate, and disseminate data and results
…. AND INCREASINGLY:
• Integrate data, analyses,
models across domains
• Collaborate: leverage pooled expertise and resources
increasing amount of data produced in modern science. LSDMA
bridges the gap between data production and data analysis using
a novel approach by combining specific community support and
generic, cross-community development. In the Data Life Cycle
Labs (DLCL), experts from the data domain work closely with
scientific groups of selected research domains in joint R&D, in
which community-specific data life cycles are iteratively
optimized, data and metadata formats are defined and
standardized, simple access and use are established, and data
and scientific insights are preserved in long-term, openly
accessible archives.
Keywords: data management, data life cycle, data intensive
computing, data analysis, data exploration, LSDMA, support, data
infrastructure
I. INTRODUCTION
Today data is knowledge – data exploration has become the
4th pillar in modern science besides experiment, theory, and
simulation as postulated by Jim Gray in 2007 [1]. Rapidly
increasing data rates in experiments, measurements and
simulation are limiting the speed of scientific production in
various research communities and the gap between the
generated data and data entering the data life cycle (cf. Fig. 1) is
widening. By providing high performance data management
components, analysis tools, computing resources, storage and
services it is possible to address this challenge but the
realization of a data intensive infrastructure at institutes and
universities is usually time consuming and always expensive.
The introduced “Large Scale Data Management and Analysis”
(LSDMA) project extends the services for research of the
Helmholtz Association of research centers in Germany with
community-specific Data Life Cycle Laboratories (DLCL). The
LSDMA project, initiated at the Karlsruhe Institute of
Technology (KIT), builds on the familiarity with supporting
local scientists at a computer center, the knowledge of running
the Grid Computing Centre Karlsruhe (GridKa) [2] as the
German Tier 1 hub in the World Wide LHC Computing
infrastructure [3], the Large Scale Data Facility (LSDF) [4] and
the experience with the very successful Simulation Labs [5]
that specialize in supporting HPC users.
Figure 1. The scientific data life cycle
3. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Design Framework:
Federation of Systems
Research enterprise includes subsystems at the project, program and
agency level, many of which are independent of NSF
• Requirements are a moving target
• Emergent behavior is to be expected
• Technology is constantly changing
• Community governance within constraints of funding agencies
• Evolutionary process and adaptation:
• Lots of variation; Mechanism to select ‘fittest’; Composability
• Technology must foster delegation of responsibilities and communication:
• Promote self-organization, Cultivate ideas, Maintain feedback between
subsystems
• Reliability: responsiveness, robustness, correctness
• Identity of system is based on shared goals and practices
4. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Communication loops
[Diagram labels: Bottom-up Studies; Top-down Studies; Cross-Domain Scientists; Trends and Patterns; Data interoperability best practices; Scientific Governance; Success stories; Technical Governance; Data Providers; Feasibility; Priorities; Strategies; Data Products; Options; Costs; Problems and issues; Related work; Questions and clarifications (appears twice)]
5. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Communication metrics
7. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Converging on reference architecture semantics
Analysis of existing building blocks, and their variability
Component
System
Function
Description
Interfaces
Implementation
Steward Organization
Availability
Reference
Developing cross-domain vocabularies, connecting domain models
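The inventory columns listed on this slide (Component through Reference) can be read as a record type for describing a building block. The following is a minimal sketch of that reading, not the project's actual schema: the class name `BuildingBlock`, the helper `interfaces_by_function`, the field types, and every example value are assumptions made for illustration. Grouping interfaces by function is one simple way to look at the variability among existing building blocks.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BuildingBlock:
    """One inventory row; field names mirror the slide's column labels."""
    component: str
    system: str
    function: str
    description: str = ""
    interfaces: List[str] = field(default_factory=list)
    implementation: str = ""
    steward_organization: str = ""
    availability: str = ""
    reference: str = ""


def interfaces_by_function(blocks: List[BuildingBlock]) -> Dict[str, List[str]]:
    """Collect the distinct interfaces offered for each function.

    A function served through many different interfaces is a candidate
    for convergence; one served through a single interface is not.
    """
    seen: Dict[str, set] = {}
    for block in blocks:
        seen.setdefault(block.function, set()).update(block.interfaces)
    return {function: sorted(names) for function, names in seen.items()}


# Hypothetical entries: none of these names come from the presentation.
EXAMPLE_BLOCKS = [
    BuildingBlock("catalog-a", "System A", "Discovery",
                  interfaces=["interface-x", "interface-y"]),
    BuildingBlock("catalog-b", "System B", "Discovery",
                  interfaces=["interface-y"]),
    BuildingBlock("engine-c", "System C", "Workflows",
                  interfaces=["interface-z"]),
]
```

Calling `interfaces_by_function(EXAMPLE_BLOCKS)` groups the three hypothetical entries under "Discovery" and "Workflows", which makes it easy to see where several components already share an interface.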
8. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Requirements Process
Workshop Summaries
Surveys
Architecture Designs
Analyze what worked
Incorporate social technologies
Inventory CI building blocks
9. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Concerns
Hitting the right level of granularity in the design
Identifying necessary communication channels
Accounting for all key perspectives
Fixing the scope and technologies
Balancing current and future requirements
Harmonizing technical and social subsystems and managing
interactions between them
Uneven standardization and convergence across domains
and functional components
Constructing a self-organizing plug-and-play system
Inventorying building blocks
10. Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Summary
System is defined by:
Specifications for interfaces and interchange formats (the gateways)
Definition of key functional components at an abstract level
Discovery, Workflows, Data processing, annotation, documentation
Technology needs to support
Communication between subsystems (people and machines)
Collection of metrics required to assess what is working (selection
of the fittest)
Assembly of components