  • Baseline system (Preliminary Design) The development of CRPN will extend the current grid infrastructure developed and deployed at Mississippi State University [Haupt2002, Haupt2003], which is used to support the NEESport [Haupt2005a], SPURport [Haupt2005b] and DMEFS [Haupt2001] systems, as schematically shown in Figure 3. The most prominent features of the baseline system are high-level grid services, or façades, that capture common patterns of accessing the Grid resources. For example, the Data Services combine metadata, data access, replica locator and file transfer services, and provide a simple interface for retrieving files of known signature but unknown location. The user selects a file by searching the metadata repository (a keyword-based query), and the Data Services seamlessly determine its location (i.e., URL) from the Replica Locator. The URL is then passed to the File Transfer Service, which starts streaming the data to the user. The User Space is another application of the Data Services that allows the user to store information on his or her activities (i.e., provenance), such as job descriptors and values of parameters used for a particular run. Provenance is critical for linking the model parameters with the model output datasets and thus achieving convergence of models and data [ESMF]. Finally, the Job Submission Services (cf. Fig. 4) invoke several base services to stage the data, retrieve the information needed to generate the GRAM RSL request, set up the environment on the target machine, submit the job, and register themselves as a target for GRAM notifications, so that they can forward status changes to the Job Monitoring Service. The Job Monitoring Service provides access to all jobs submitted through the portal, their status and their descriptors. Through the job descriptor, the user has access to all results generated by the job.
The Grid Services are implemented as WSDL/WSRF services, invoked by Service Providers running inside the GridSphere portal server for the Web Browser-based clients or Service Providers added as modules to stand-alone applications such as Abacus and Matlab.
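The Data Services façade described in these notes can be sketched as follows. This is a minimal illustration only, not the CRPN implementation: the class names, method names, file id and `gsiftp://` URL are all hypothetical, and the three base services are replaced by in-memory stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical metadata repository: file id -> descriptive keywords."""
    records: dict[str, set[str]] = field(default_factory=dict)

    def search(self, keyword: str) -> list[str]:
        # Keyword-based query: ids of all files tagged with the keyword.
        return sorted(fid for fid, kws in self.records.items() if keyword in kws)


@dataclass
class ReplicaLocator:
    """Hypothetical replica locator: file id -> URLs of its replicas."""
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def locate(self, file_id: str) -> str:
        # Take the first registered replica; a real service could rank them.
        return self.replicas[file_id][0]


class FileTransfer:
    """Stand-in for the File Transfer Service; it only records requests."""

    def __init__(self) -> None:
        self.started: list[str] = []

    def stream(self, url: str) -> str:
        self.started.append(url)
        return f"streaming {url}"


class DataServices:
    """Facade: the caller never sees the replica URL or the transfer service."""

    def __init__(self, catalog: MetadataCatalog, locator: ReplicaLocator,
                 transfer: FileTransfer) -> None:
        self.catalog = catalog
        self.locator = locator
        self.transfer = transfer

    def find(self, keyword: str) -> list[str]:
        return self.catalog.search(keyword)

    def retrieve(self, file_id: str) -> str:
        url = self.locator.locate(file_id)   # signature known, location unknown
        return self.transfer.stream(url)     # hand the URL to the transfer service


# Illustrative data only.
catalog = MetadataCatalog({"run42.nc": {"wrf", "precipitation"}})
locator = ReplicaLocator({"run42.nc": ["gsiftp://storage.example.org/data/run42.nc"]})
services = DataServices(catalog, locator, FileTransfer())

file_id = services.find("wrf")[0]
result = services.retrieve(file_id)
```

The point of the pattern is that the user deals with two calls (`find`, `retrieve`) while the lookup of the physical location stays inside the façade.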
  • System Capabilities and the CRPN Environments (based on NSF LEAD approach) CRPN comprises a “complex array of services, applications, interfaces, and local and remote computing, networking and storage resources – so-called environments – that can be used as stand-alone resources or linked together in workflows” for rapid-prototyping applications. This approach to RPC, based on the NSF LEAD, would provide RPC users with an array of capabilities including simplified and scripted access to data, transforming and translating the data into different formats and projections, visualization of the data and results, running the models, and performing complex analysis of the results. The complexity of the proposed RPC architecture necessitates multiple viewpoints in order to comprehend its functionality and its extensibility to accommodate changes. Hence, a number of diagrams are provided to depict the needs of users, systems engineers and computer scientists, and other stakeholders. At the fundamental level of functionality, CRPN enables users to accomplish the following:
    • Query for and Acquire a wide variety of information including but not limited to observational data sets (including real time streams) and gridded model output stored on local and remote servers, definitions of and interrelationships among meteorological quantities, and the status of an IT resource or workflow.
    • Simulate and Predict using numerical atmospheric models, particularly the Weather Research and Forecast (WRF) model system now being developed by a number of organizations. The WRF can be run in a variety of modes ranging from basic (e.g., single vertical profiles of temperature, wind and humidity in a horizontally homogeneous domain) to very complex (full physics, terrain, and inhomogeneous initial conditions in single forecast or ensemble mode).
    • Assimilate data by combining observations, under imposed dynamical constraints, with background information to create a 3D atmospheric gridded analysis. CRPN is envisioned to support the WRF-SI, and possibly the ARPS Data Assimilation System (ADAS) in collaboration with SPoRT, and will incorporate the WRF 3DVAR, as necessary.
    • Analyze and Mine observational data and model output to obtain quantitative information about spatio-temporal relationships among fields, processes, and features.
    • Visualize observational and remotely sensed data and model output in 1D, 2D and 3D.
    [Figure labels: Authorization; Authentication; Notification; Monitoring; Workflow; Security; ESMF; GCMD; THREDDS; ESML; Ontology; Query; Stream; Host environment; GPIR; Execution description; Application description; Transcoder]


  • Rapid Prototyping Capability for Earth-Sun System Sciences: Preliminary Design. Robert J. Moorhead, Mississippi State University
  • Approach
      • Formulate architectures and develop baseline capacities that integrate applied sciences systems tools into configurations to support efficient evaluation of the prospects of integrating research results from NASA’s Earth observation systems (with emphasis on spacecraft instruments on missions recently launched or planned for near-term launch) and associated Earth system models
        • systems engineering tools
        • enterprise architecture tools
        • information visualization and analysis tools
        • uncertainty characterization tools
        • performance assessment tools
      • “NASA Earth Science and Space Systems benefiting Society: Evolving Systems Engineering Capacity,” presentation by Ron Birk, August 24, 2005, SSC
  • System Scope
    • Reduce the amount of time that has typically been required to consider the utility of new or future data streams on model outcomes.
    • Systematically evaluate research capabilities in a simulated operational environment in order to evaluate components and/or configurations that could be considered for verification, validation, and benchmarking for transition from research to operations and/or into an integrated system solution (ISS).
    • Figure 1 illustrates the interface between the RPC and external systems that include the SN and ISS components of NASA’s Earth Science Application Plan.
  • RPC Interface
  • System Context
    • The RPC will provide the capability to integrate and provide access to the tools needed to evaluate the use of a wide variety of current and future NASA sensors and research results, model outputs, and knowledge, collectively referred to as “resources”.
    • The resources are assumed to be geographically distributed, so the RPC will support location transparency for them.
  • [Figure labels: RPC node; Local and remote computing and storage facilities; Remote data providers; Model configuration; Input data sets configuration; Experiment design and execution; Analysis; System administration and maintenance]
  • System modes and states
    • Before an experiment can be performed (a particular model using a particular data source) two conditions must be satisfied.
      • First, the model must be installed at some computing facility accessible to RPC users, and configured to run;
      • Second, the data must be configured so that it can be used by the model. The data configuration may involve developing tools for the data conversions (format translations, subsetting, deriving values of variables not included in the original data products, geo-processing, etc.).
    • From the point of view of performing a particular experiment and analysis, the RPC can be in two distinct states:
      • ready for the experiment and analysis by end users
      • requiring action of specialists for installing and configuring the model and its data
    • During its life cycle, new resources and tools will be integrated with the RPC node, increasing the repertoire of experiments and analyses that can be performed.
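The two preconditions and the two states on this slide can be sketched as a small bookkeeping class. This is an illustrative assumption about how the state could be derived; `RpcNode`, `state_for`, `integrate` and the data-source name `new_sensor_product` are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExperimentState(Enum):
    READY = "ready for the experiment and analysis by end users"
    NEEDS_SPECIALIST = "requires specialists to install and configure the model and its data"


@dataclass
class RpcNode:
    """Hypothetical record of what has been integrated with the node."""
    installed_models: set[str] = field(default_factory=set)
    # (model, data source) pairs for which the data has been configured
    configured_data: set[tuple[str, str]] = field(default_factory=set)

    def state_for(self, model: str, source: str) -> ExperimentState:
        # Both preconditions must hold: model installed, data configured for it.
        if model in self.installed_models and (model, source) in self.configured_data:
            return ExperimentState.READY
        return ExperimentState.NEEDS_SPECIALIST

    def integrate(self, model: str, source: str) -> None:
        # Work done by the specialists; it enlarges the set of runnable experiments.
        self.installed_models.add(model)
        self.configured_data.add((model, source))


node = RpcNode()
before = node.state_for("WRF", "new_sensor_product")
node.integrate("WRF", "new_sensor_product")
after = node.state_for("WRF", "new_sensor_product")
```

Note that the state is a property of a particular (model, data) pairing rather than of the node as a whole, which matches the slide's phrase "from the point of view of performing a particular experiment".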
  • [Figure: Major Categories of Experiments. Panels: “Different sources” and “Different models”. Elements shown: numerical model; numerical model 1; numerical model 2; model results; analysis.]
  • Capabilities Required
    • Discovery, semantic understanding, secure access, and transport mechanisms for data products available from known data providers (Science Data Manager)
    • Data assimilation and geo-processing tools for all data transformations needed to match a given data product (or products) to the model input requirements, and support for organizing the data processing into workflows built from reusable and interoperable modules, including both the workflow specification mechanisms and the workflow enacting engine (Interoperable Geo-processing Environment)
  • Capabilities Required (cont.)
    • Model management:
      • Catalog of available models, model metadata catalog (including input and output model requirements), and mechanisms for integrating new models with RPC
      • Mechanisms for creating runtime environments; data staging (in and out); job scheduling, remote execution, and monitoring
      • Mechanisms for storing model outputs together with metadata and provenance information (all information needed to recreate the output data set); the metadata necessary to enable search and discovery of model outputs
    • Tools for model output analysis (including visualizations), tools for quantitatively comparing model outputs, and tools for model benchmarking (Performance Metrics Workbench)
  • Major System Constraints
    • Only models and data made available to RPC users and integrated with the RPC node can be used to perform experiments.
    • Installation and/or integration of models, as well as integration and geo-processing of data, must be performed by the respective specialist, and the time needed to accomplish that task will depend on the complexity of the particular model and data set(s).
    • Running a model may take a long time, depending on the complexity and configuration of the model. The experiments will not necessarily be performed in real time.
  • User Categories
    • System administrators – responsible for deployment, configuration, and maintenance of the system, and its users (for access control purposes)
    • Application specialists – responsible for installation and configuration of the model on computational systems accessible to the RPC users, and integrating these models with the RPC (which includes definition of the input and output data requirements)
    • Data processing specialists – responsible for the development and the deployment of the tools for data transformations
    • Domain specialists – responsible for defining, configuring (creating workflows for data processing, setting model parameters, etc.), and executing experiments
    • Analysts – domain specialists performing the data analysis
  • Assumptions and Dependencies
    • The RPC will depend on data and models provided by third parties.
    • Access to remote computational and storage facilities will be controlled according to policies established by the facility owners (stakeholders).
    • It is assumed that these policies will allow RPC users to submit and monitor jobs on these systems, which may require penetrating firewalls.
    • It is possible that the access privileges will be different for different users, depending on organizational membership, nationality, or other factors beyond the control of the RPC system developers.
  • Operational Scenario Summary
    • Design of experiment – identification of models and data sets to be used
    • Assessment whether the models and data are currently integrated with the RPC node
    • Filing requests with model and data specialists, as needed; the specialists issue a notification when the models and data are available
    • Configuration of the experiment (setting the model parameters, configuring the data, e.g., ROI, timeframe, etc.)
    • Asynchronous run and monitoring of the model
    • Analysis
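The "asynchronous run and monitoring" step can be illustrated with the Python standard library. This is only a sketch under stated assumptions: `run_model` is a hypothetical stand-in for a long-running numerical model, and a local thread pool takes the place of remote job submission.

```python
from concurrent.futures import ThreadPoolExecutor


def run_model(params: dict) -> dict:
    # Stand-in for a long-running model; here it only averages its inputs.
    values = params["values"]
    return {"status": "completed", "mean": sum(values) / len(values)}


notifications: list[str] = []

with ThreadPoolExecutor(max_workers=1) as pool:
    # Submit without blocking; the caller is free to do other work meanwhile.
    future = pool.submit(run_model, {"values": [1.0, 2.0, 3.0]})
    # Register for a status-change notification instead of polling.
    future.add_done_callback(lambda f: notifications.append(f.result()["status"]))
    outcome = future.result()  # block only when the result is actually needed

# Leaving the `with` block waits for the worker, so the callback has run by now.
```

Because experiments are not necessarily performed in real time, the notification callback is the more representative part of this sketch; the blocking `future.result()` call is there only to keep the example self-contained.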
  • Physical Issues
    • The RPC node will be installed on a dedicated, stand-alone system consisting of standard commercially available computing nodes, data storage, and hosting middleware servers.
    • Core RPC modular capabilities (SDM, IGE, MM, PMW) will be executed on separate computing nodes.
    • The RPC node will be complemented with remote resources – high performance computing and storage facilities as needed by the models to be used in the experiments.
    • The RPC node can be moved from one geographical location to another.
    • Access to the remote resources will require standard internet connections.
  • System Performance Characteristics
    • The primary goal of the RPC node is to provide the capability to rapidly prototype the assimilation of new or future NASA data products and/or model derived data streams into model applications that have generated demonstrable scientific results of merit and stakeholder interest.
    • However, there is no established benchmark to quantitatively specify what “rapid” means. The reference point is the current practice – manual configuration of data and models, whereas the expectation is that the RPC approach will considerably speed up the process, in particular for repeated experiments, after the baseline data and models are set up.
    • However, the initial phase – setting the baseline data and models – may prove to be time consuming as it will involve model integration, data acquisition and simulation, and the development of new components for geoprocessing the data.
  • System Performance Characteristics (cont.)
    • “Rapid Prototyping” performance benefits will be best realized through the reusability of configured geoprocessing tasks to provide model-ready input data to a model that has been fully integrated into the RPC.
    • It is this “reuse” capability that will enable the rapid evaluation of new data types.
    • By associating existing geoprocessing workflows with new data types, the rapid assimilation of next-generation data into configured models should be readily achievable.
  • Policy and Regulation
    • As the RPC develops into a viable simulation system, it is expected that activities requiring RPC resources will be requested and coordinated among those selecting an RPC for evaluation, the RPC team conducting a specific evaluation, and RPC developers who will be required to maintain and evolve the RPC to support requirements for integrating new model applications, data products, and geoprocessing tasks.
    • As the RPC evolves to meet new or changing requirements, configuration management practices, version control, and developmental practices will be followed to ensure that capabilities in development will be isolated from operational RPC capabilities.
  • Policy and Regulation (cont.)
    • Simply stated, development activities, testing, and integration of new functionalities into the RPC should be “contained” through the use of segregated physical or virtual systems that may be isolated from the operational instance of the RPC.
    • As new capabilities mature through development processes, configuration “check-in” procedures will be followed to ensure the orderly integration of the new “proven” capabilities.
    • It is likely that such activities will involve proactive participation of an RPC technical working group.
  • System Interfaces
    • The RPC node has 5 categories of users, each requiring a dedicated interface.
    • In addition, the RPC interacts with two classes of external systems: data providers and remote computing and storage facilities.
    • Each interface is described in the remaining slides.
  • System Administrator Interface
    • The administrator interface must support the administrator tasks:
      • registering and de-registering users and assigning roles
      • maintaining the user credentials needed to access remote resources
      • monitoring the system status and usage
      • backing up and restoring data and software; recovery from faults
      • deployment of new software components and services
  • Model Specialist Interface
    • The model specialist is responsible for deploying and integrating the models into the RPC environment.
    • The models can be installed either locally on RPC node hardware and/or at a remote computing facility.
    • To integrate the model with RPC the specialist must “register” the model, that is, generate a metadata record that describes the model in terms of its functionality, the runtime requirements (location of the executable, environmental variables, the structure of the working directory, etc.), model parameters, and definition of the input and output datasets.
    • The model specialist interface must thus support the registration of new models and editing of the metadata of the existing models.
    • In addition, the model specialist interface must provide support for the testing of the correctness of the model deployment.
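A model registration record of the kind described on this slide might look like the following sketch. The field names, the `ModelCatalog` class and the example values are hypothetical; the presentation lists what the record must describe but does not define a schema.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    functionality: str
    executable: str                 # location of the executable
    environment: dict[str, str]     # environment variables
    working_dir_layout: list[str]   # structure of the working directory
    parameters: dict[str, float]    # model parameters and default values
    input_datasets: list[str]
    output_datasets: list[str]


class ModelCatalog:
    """Supports the two operations named on the slide: register and edit."""

    def __init__(self) -> None:
        self._records: dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        if record.name in self._records:
            raise ValueError(f"model {record.name!r} is already registered")
        self._records[record.name] = record

    def update(self, name: str, **changes) -> ModelRecord:
        # Edit the metadata of an existing model, rejecting unknown fields.
        record = self._records[name]
        for key, value in changes.items():
            if not hasattr(record, key):
                raise AttributeError(f"unknown metadata field {key!r}")
            setattr(record, key, value)
        return record

    def names(self) -> list[str]:
        return sorted(self._records)


model_catalog = ModelCatalog()
model_catalog.register(ModelRecord(
    name="toy_model",                        # placeholder, not a real model
    functionality="demonstration only",
    executable="/opt/models/toy/bin/run",    # hypothetical path
    environment={"TOY_HOME": "/opt/models/toy"},
    working_dir_layout=["input/", "output/", "logs/"],
    parameters={"time_step_s": 60.0},
    input_datasets=["surface_temperature"],
    output_datasets=["forecast_grid"],
))
updated = model_catalog.update("toy_model", parameters={"time_step_s": 30.0})
```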
  • Data Specialist Interface
    • The data specialist identifies the data providers and designs the geo-processing procedure for transforming the original data product to match the model input data requirements.
    • The design of the geo-processing may require the development and deployment of software components to perform specified tasks.
    • The data specialist interface must provide support for:
      • searching data products from known data providers
      • assessing the structure and syntax of available data products
      • assessing the model input data requirements
      • discovering and evaluating the geo-processing modules already integrated with the RPC node
      • integrating new geo-processing modules within the RPC node
      • composing the geo-processing process from available components
      • testing of the correctness of the geo-processing procedure
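Composing a geo-processing procedure from reusable components can be sketched as plain function composition. The record layout (`lat`, `temp_k`), the two example modules and the `compose` helper are assumptions made for illustration; they stand in for whatever module interface the RPC node would actually define.

```python
from functools import reduce
from typing import Callable

Record = dict[str, float]
Step = Callable[[list[Record]], list[Record]]


def subset_latitude(lat_min: float, lat_max: float) -> Step:
    """Subsetting module: keep only records inside a latitude band."""
    def step(records: list[Record]) -> list[Record]:
        return [r for r in records if lat_min <= r["lat"] <= lat_max]
    return step


def kelvin_to_celsius(records: list[Record]) -> list[Record]:
    """Derive a variable (temp_c) that the original product does not carry."""
    return [{**r, "temp_c": round(r["temp_k"] - 273.15, 2)} for r in records]


def compose(*steps: Step) -> Step:
    """Chain modules left to right into a single reusable workflow."""
    def workflow(records: list[Record]) -> list[Record]:
        return reduce(lambda data, step: step(data), steps, records)
    return workflow


# The composed workflow is itself a Step, so it can be reused for new data.
prepare = compose(subset_latitude(30.0, 35.0), kelvin_to_celsius)

sample = [
    {"lat": 33.4, "temp_k": 300.15},
    {"lat": 41.0, "temp_k": 280.15},
]
model_ready = prepare(sample)
```

Because `prepare` has the same signature as its parts, an existing workflow can be applied unchanged to a new data product that shares the record layout, which is the reuse property the performance slides rely on.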
  • Domain Specialist Interface
    • To support the design and execution of experiments, the domain specialist interface must support:
      • Discovery of available models and data through the RPC facilities
      • Receiving and filing requests for new models and data
      • Configuring experiments by
        • Connecting a particular model with particular data
        • Setting the model parameters
        • Configuring datasets (region of interest, timeframe, etc.)
      • Submitting models for execution
      • Monitoring the model progress
      • Controlling the model execution (e.g., aborting it, if needed)
      • Verifying that the model completed successfully (e.g., by examining a log file generated by the model, running a test application, etc.)
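The experiment configuration step (connecting a model with data, setting parameters, and choosing a region of interest and timeframe) could be captured in a single validated record. This is a sketch only; `ExperimentConfig`, its fields and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical experiment description; not a format defined by the RPC."""
    model: str
    data_source: str
    parameters: dict[str, float]
    region: tuple[float, float, float, float]  # (west, south, east, north), degrees
    start: date
    end: date

    def problems(self) -> list[str]:
        """Return configuration errors; an empty list means ready to submit."""
        found = []
        west, south, east, north = self.region
        if not (west < east and south < north):
            found.append("region of interest is empty")
        if self.end < self.start:
            found.append("timeframe ends before it starts")
        return found


good = ExperimentConfig(
    model="WRF",
    data_source="example_product",          # placeholder data source name
    parameters={"time_step_s": 60.0},
    region=(-92.0, 30.0, -88.0, 35.0),
    start=date(2006, 1, 1),
    end=date(2006, 1, 7),
)
bad = ExperimentConfig(
    model="WRF",
    data_source="example_product",
    parameters={},
    region=(-88.0, 30.0, -92.0, 35.0),      # west and east swapped
    start=date(2006, 1, 7),
    end=date(2006, 1, 1),
)
```

Checking the configuration before submission matters here because a misconfigured run may occupy remote resources for a long time before the error becomes visible.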
  • Analyst Interface
    • The analyst analyses the experiment outcome. The analyst interface must:
      • Allow queries of the output data databases to find the model outputs of interest
      • Provide access to model outputs
      • Provide access to model provenance (when and in what circumstances the model has been run, e.g., what input data sets have been used, the values of the model parameters, etc.)
      • Provide access to tools (visualizations or otherwise) enabling access to the results of the experiments
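Querying stored outputs by their provenance might look like the sketch below. The `OutputRecord` fields and the `find_outputs` helper are illustrative assumptions, and an in-memory list stands in for the output database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputRecord:
    """Hypothetical stored output together with its provenance."""
    output_id: str
    model: str
    run_at: str                                  # ISO 8601 timestamp of the run
    input_datasets: tuple[str, ...]
    parameters: tuple[tuple[str, float], ...]    # (name, value) pairs


def find_outputs(records: list[OutputRecord],
                 model: Optional[str] = None,
                 input_dataset: Optional[str] = None) -> list[OutputRecord]:
    """Filter outputs by producing model and/or by an input data set used."""
    hits = []
    for rec in records:
        if model is not None and rec.model != model:
            continue
        if input_dataset is not None and input_dataset not in rec.input_datasets:
            continue
        hits.append(rec)
    return hits


# Illustrative records only.
store = [
    OutputRecord("out-1", "WRF", "2006-01-05T12:00:00Z",
                 ("product_a",), (("time_step_s", 60.0),)),
    OutputRecord("out-2", "WRF", "2006-01-06T12:00:00Z",
                 ("product_b",), (("time_step_s", 30.0),)),
    OutputRecord("out-3", "LIS", "2006-01-06T18:00:00Z",
                 ("product_a",), ()),
]

wrf_runs = find_outputs(store, model="WRF")
used_product_a = find_outputs(store, input_dataset="product_a")
```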
  • Data Provider Interface
    • The RPC must define interfaces that allow acceptance of data streams coming from data providers.
  • Remote Resources Interface
    • The RPC must define interfaces for invoking Grid services, such as allocating and monitoring remote resources, accepting notifications about status changes (e.g., a job has completed), and transferring data between the RPC node and remote resources as well as between remote resources.
    • Defined interfaces must support delegation of user credentials to satisfy the access control requirements and policies of the remote resources.
  • The End. Backup slides follow.
  • The baseline system. This four-tier architecture follows OGSA recommendations.
  • [Figure: Functional computational capabilities of the RPC system. Labels: Evaluations leading to new understanding & ideas for ISS; MyRPC; LIS; IGE; and the components listed below.]
    • Authorization
    • Authentication
    • Notification
    • Monitoring
    • Workflow
    • Security
    • ESMF
    • GCMD
    • THREDDS
    • ESML
    • Ontology
    • Query
    • MyRPC
    • Host environment
    • GPIR
    • Execution description
    • Application description
    • Grid enabled OGC Services
    • WorldWinds
  • [Figure: Service-oriented architecture for the Computational RPC Node, based on NSF LEAD (Droegemeier et al., 2006). Labels: RPC Portal; MyRPC; GCMD; WRF, HSPF; LIS, RAMS; DAACs; CLASS; Evaluation; ESMF, GEOLEM; OGC Services.]
  • [Figure: Systems framework for CRPN, consisting of interacting subsystems in the secure and stable RPC computational grid, based on NSF LEAD (Droegemeier et al., 2006). Labels: CRPN; WRF; ESMF; IGE; GCMD; MyRPC workspace; LIS; WorldWinds.]