NOAA GEO-IDE Plan


  1. NOAA Global Earth Observation Integrated Data Environment (GEO-IDE) Concept of Operations, Version 3.3, 13 September 2006. Prepared by: U.S. Department of Commerce, National Oceanic and Atmospheric Administration (NOAA), Data Management Committee (DMC), Data Management Integration Team (DMIT)
  2. NOAA Global Earth Observation Integrated Data Environment CONOPS
     Contents
     Preface
     Executive Summary
     1. Introduction
        1.1. Goals
        1.2. Benefits
        1.3. Why Now?
        1.4. Risks
        1.5. Present situation
        1.6. Document organization
     2. Scope
     3. Vision and principles
        3.1. Vision
        3.2. Data Management Principles
     4. Approach
        4.1. Introduction
        4.2. Proposed Model: Service-Oriented Architecture
        4.3. Web Services in NOAA
        4.4. Basis for development
        4.5. Development Approach
        4.6. Key Development Strategies
     5. Governance Structure and Program Control
        5.1. Background
        5.2. Governance
        5.3. Management Components
     6. Towards a Service-Oriented Architecture
        6.1. Data access and use
        6.2. Web Services
     7. The NOAA GEO-IDE Standards Process
        7.1. Background
        7.2. General Principles for the Standards Process
        7.3. Related standards processes
        7.4. Process for adoption of NOAA GEO-IDE standards
        7.5. Proposed process for defining an initial set of standards
     8. NOAA Guide on Integrated Information Management
        8.1. Data management policies
        8.2. NOAA-wide standards
        8.3. Registry of data management software (applications and tools)
        8.4. Data management planning template
     9. Priorities for Action
     Appendices
        Acronyms
        Membership of NOAA DMIT
  3. Preface
     In 1992, Congress ordered that NOAA biennially assess the adequacy of its environmental data and information systems. Of particular concern are the interfaces to these systems. NOAA data systems and those of other Federal agencies with environmental responsibilities should facilitate integration and interpretation of data from different sources.
     Partly in response to the latest assessment, and partly driven by the advantages offered by new capabilities in information technology, the NOAA Data Management Committee (DMC), in October 2004, called for development of an integrated NOAA data management plan. The NOAA Data Management Integration Team (DMIT) was convened to address this task. The membership of DMIT was selected to broadly represent data management activities and requirements within NOAA. The individuals on DMIT were selected for their experience and technical insight.
     This document is the work of DMIT. A separate document, the GEO-IDE Implementation Plan, describes the actions, responsibilities, and milestones needed to guide implementation of the GEO-IDE data integration strategy described in this document.
  4. Executive Summary
     To carry out its mission, NOAA must understand and address the complexity of many environmental problems and answer questions addressing contemporary societal needs. To do this, NOAA must be able to successfully integrate information from all of its goal areas and exchange data with partners in the national US-Global Earth Observation System (US-GEO) and the international Global Earth Observation System of Systems (GEOSS). Today, integration of data from multiple sources or disciplines is difficult and expensive, which can lead to missing or wrong answers to important societal questions. With its Global Earth Observation Integrated Data Environment (GEO-IDE) as its contribution to US-GEO, NOAA will be able to provide easier and more cost-effective access to all of its data and information. NOAA will be better able to provide timely and accurate answers to important scientific questions and will better serve its customers: Federal, state and local governments, academia, the private sector and, ultimately, the American people.
     NOAA’s GEO-IDE is envisioned as a “system of systems” – a framework that provides effective and efficient integration of NOAA’s many quasi-independent systems, which individually address diverse mandates in areas of resource management, weather forecasting, safe navigation, disaster response, and coastal mapping, among others. NOAA Line Offices will retain a high level of independence in many of their data management decisions, encouraging innovation in pursuit of their missions, but will participate in a well-ordered, standards-based data and information infrastructure that will allow users to easily locate, acquire, integrate and utilize NOAA data and information.
     The NOAA GEO-IDE will make NOAA products available in multiple formats and communication protocols, utilizing current information technology standards, where they are mature, and best practices, where accepted standards are still evolving. NOAA data and products will be described by comprehensive metadata that conforms to national and international standards. NOAA observing systems and collection, assimilation, quality control and modeling centers will provide their data and metadata in accordance with established NOAA GEO-IDE standards.
     NOAA GEO-IDE will strive to take full advantage of the opportunities presented by internet technology to make access to environmental data and information as easy and effective as today’s Web access to digital documents. It will also improve efficiency and reduce costs by bridging the barriers between existing, independent “stove pipe” systems and integrating the data management activities of all NOAA programs. It will do this through a federated approach, where the individual components retain a measure of responsibility and authority within the context of an overarching systematic set of goals, principles, and objectives.
     Many NOAA information systems are critical to the national interest and we must ensure that improved integration and efficiency are achieved with minimal impact on these legacy systems and no interruption in essential services. Any changes in legacy systems that are needed for them to fully participate in GEO-IDE will be through the development of new interfaces to those systems and should not impact their basic capabilities.
     GEO-IDE will fundamentally depend upon standards and it is essential that these be thorough, documented, and supported standards with demonstrated benefits. To ensure
  5. these standards are embraced and accepted across NOAA, an open and inclusive “standards process” for nominating, evaluating, and implementing NOAA GEO-IDE standards is proposed. The standards process will define what standards are adopted, when they become effective, and how the organization will build up to and support the implementation of those standards.
     To ensure standards are effectively applied across all of NOAA, project managers and developers must understand and use them. A NOAA Guide on Integrated Information Management will be developed to serve as a single reference point for NOAA data management policies and guidelines, an inventory of data systems, and relevant NOAA, national and international standards (in all stages of the NOAA approval process).
     To achieve its goals, this plan recommends an incremental approach with continued operation of existing systems and standards while gradually improving integration through an evolutionary process of pilot projects and iterative improvement. Learning from and working with existing data integration initiatives, GEO-IDE will make use of standards, standard tools, and lessons learned.
     GEO-IDE aims to retain existing systems as much as possible while building a software infrastructure that links these systems together. This software infrastructure, called a Service-Oriented Architecture, is a style of systems design based on using loosely coupled connections among independent programs to create scalable, extensible, interoperable, reliable, and secure systems. Service-based architectures have been proven to solve interoperability problems including integrating systems developed in various programming languages, running on different computing environments and developed by autonomous groups at different times.
     These architectures make it practical to adapt and connect existing systems quickly for accomplishing new tasks and to benefit from highly evolved and still useful “legacy” applications.
     Good governance is critical to the successful implementation of GEO-IDE. A higher level administrative structure that provides a suitable context for the governance of GEO-IDE already exists: the NOAA Observing System Council and Data Management Committee (DMC). The DMC established the Data Management Integration Team (DMIT) to develop the GEO-IDE Concept of Operations and provide expertise and advice on the near-term (5-year) actions needed to implement this plan. DMIT includes representatives from all NOAA line offices and goals. To ensure synergy and effective coordination with IOOS DMAC activities, all NOAA members of the DMAC Steering Team are also members of DMIT.
     Many of the governance issues will require a detailed understanding of information technology and an expanded structure is recommended to oversee implementation. A number of GEO-IDE implementation teams will be assembled to define the detailed architecture and coordinate development of specific Web services. These teams will be guided by a full-time project manager, hired to oversee implementation of GEO-IDE.
     Realization of the GEO-IDE vision will take years. Implementation will be pursued through a number of concurrent activities, following a spiral, iterative development approach. A companion document, the GEO-IDE Implementation Plan, defines specific actions, responsibilities, and milestones needed to implement GEO-IDE over the next ten years. It calls for implementation to begin with the following high-priority activities:
     1. Establish the GEO-IDE project management structure.
     2. Secure funding to support GEO-IDE activities.
  6.
     3. Identify major information management systems in NOAA.
     4. Evaluate, adopt and adapt information management standards within NOAA and publicize them via an on-line NOAA Guide to Integrated Information Management.
     5. Define a NOAA-wide, service-oriented Web architecture.
     6. Test the feasibility of utilizing a “data typing” approach to NOAA data and refine the categorization of data types used throughout NOAA.
     7. Develop/acquire technical knowledge and skills.
     8. Identify technologies for implementation of the SOA, define core Web services needed and implement these services via pilot projects.
     9. Investigate new technologies to support the NOAA mission.
  7. 1. Introduction
     NOAA’s mission is “To understand and predict changes in the Earth’s environment and conserve and manage coastal and marine resources to meet our Nation’s economic, social, and environmental needs.” To carry out this mission, NOAA must be able to successfully integrate information from all of its goal areas to understand and address the complexity of many environmental problems and answer questions that are important to contemporary societal needs. Furthermore, NOAA must be able to exchange data with partners in the national US-Global Earth Observation System (US-GEO) and the international Global Earth Observation System of Systems (GEOSS).
     With the Global Earth Observation Integrated Data Environment (GEO-IDE) as its contribution to US-GEO, NOAA will provide easier and more cost-effective access to all of its data and information. NOAA will ensure its data and products are collected and managed in accordance with policies, procedures and standards that support and enhance integration and conform to NOAA Administrative Order (NAO) 212-15. These activities will ensure that society can access and use high quality, complete, and integrated information needed to support critical environmental and societal decisions.
     [Figure 1.1 - An integrated, whole-system view is needed for coordinated and efficient operations. Left panel, Discipline-Specific View (atmospheric, land surface, ocean and space observations and data systems): current systems are program-specific, focused, and individually efficient, but incompatible, not integrated, and isolated from one another and from the wider environmental community. Right panel, Whole-System View: coordinated, efficient, integrated, interoperable.]
     Over the past decade the advent of the Web and its attendant search engines has greatly improved access to documents and text that are available online.
     However, this revolution in access to documents has highlighted how far we have to go to improve access to digital data. Web search engines cannot extract information from digital data holdings and no single standard guides the transfer of digital data and products over the Internet. Instead, as was true with documents before the Web, digital data are indexed and cataloged by many different sources and maintained and supplied in a multitude of
  8. formats. It is difficult and inefficient to locate data and hard to make effective use of data that are retrieved.
     1.1. Goals
     One goal of NOAA GEO-IDE is to take full advantage of the opportunities presented by internet technology to make access to environmental data and information as easy and effective as access to digital documents over the Web is today. Just as the Internet and Web browsers interoperate to make the location of documents nearly irrelevant, so should the process of locating datasets and individual elements from datasets be made effortless. Once located, analysis and visualization programs should be able to easily access, analyze and integrate data from many sources, regardless of their location or the underlying data storage techniques in use.
     Another important goal is to improve efficiency and reduce costs by bridging the barriers between existing, independent “stove pipe” systems and integrating the data management activities of all NOAA programs, while avoiding a fully centralized approach. A federated approach, where the individual components retain a measure of responsibility and authority within the context of an overarching systematic set of goals, principles and objectives, is likely more achievable and cost-effective. Many NOAA information systems are critical to the national interest and we must ensure that improved integration and efficiency are achieved with minimal impact on these legacy systems and no interruption in essential services.
     To achieve these goals, this concept of operations recommends capitalizing on on-going data management initiatives and continued operation of existing systems and standards while gradually improving integration through an evolutionary process of pilot projects and iterative improvement. It aims to retain existing systems as much as possible while building a software infrastructure that links these systems together.
     This software infrastructure, called a Service-Oriented Architecture, is a style of systems design based on using loosely coupled connections among independent programs to create scalable, extensible, interoperable, reliable, and secure systems.
     Through the GEO Integrated Data Environment NOAA will:
     • Identify and address gaps in existing data management systems.
     • Create interoperability across data types, disciplines, space and time scales, etc.
     • Develop and adopt standards for data access protocols and data formats.
     • Develop and adopt standards for terminology, units and quantity names.
     • Improve integration of measurements, data, and products.
     • Define a Data Management Architecture to integrate existing systems and provide a framework in which to meet needs of future data systems.
     • Improve the efficiency of NOAA business by eliminating barriers to information access and reducing duplication through development and implementation of a Service-Oriented Architecture.
     • Make it possible for the vision of US-GEO and GEOSS to succeed.
     1.2. Benefits
     The NOAA GEO Integrated Data Environment will enhance our ability to integrate observations and products, improve quality control, modeling and dissemination and standardize discovery and access to NOAA data and products. This will greatly expand the effectiveness of in-discipline areas (e.g. research, marine forecasts, storm forecasts,
  9. disaster planning, disaster management, etc.) as well as allow improved use of information to address multi-disciplinary societal issues. It will enable access to data and information across various NOAA goals, programs and observing systems in timely, scientifically valid, and user-friendly ways.
     Information from a variety of societal theme areas must be successfully integrated to address the complexity of many environmental problems. Consider what is needed to understand the societal impacts of sea level change along our coasts. Information from diverse areas including weather, climate, disasters, water resources, ocean resources, and ecosystems, as illustrated in Table 1.1, must be successfully integrated to address this problem.
     Table 1.1 - Examples of how sea level integrates across theme areas.
     Theme Areas | Important Observables | Time-scales of interest
     Disaster reduction | Hurricanes and Tsunamis | Multiple time scales
     Human Health | Safety | Episodic
     Climate | Sea ice extent & land ice/ocean heat content | Weekly to decadal/annual
     Water Resources | Land water withdrawals/Coastal water tables | Decadal/Annual
     Weather | Storms (winds/waves) and Storm surges | Daily to weekly
     Ocean Resources | Sea level & detailed coastal elevations | Annual/Decadal
     Agriculture & Land-Use | Coastal relief & infrastructure | Century/Decadal
     Ecosystems | Coastal flora and fauna | Annual to decadal
     As another example, the measurement and analysis of drought has many time and space scale dependencies that affect all of the societal theme areas. In this example full integration would address common observing, data, and analysis needs as applied to every one of our theme areas. Table 1.2 provides some examples of the kinds of data and information that would need to be integrated to address drought across themes.
     Table 1.2 - Examples of how drought integrates across the themes
     Societal Benefit Areas | Important Observables | Time-scales of Interest
     Human health | Water availability/quality | Daily to seasonal
     Energy | Reservoir and lake water levels | Monthly
     Climate | Boundary conditions | Weekly to decadal
     Water resources | Ground water and lake levels/water quality | Seasonal to decadal
     Weather | Circulation, water vapor | Daily to weekly
     Ocean resources | River flow | Monthly
     Agriculture | Soil moisture | Weekly
     Ecosystems | Water availability/quality | Weekly to decadal
     Development and implementation of the Service-Oriented Architecture described in this document will improve the efficiency and effectiveness of data and information management systems within NOAA. This approach has a proven record of solving interoperability problems which include integrating systems developed in various programming languages, running in different environments on heterogeneous compute platforms, and developed by independent groups in autonomous organizational units at different times. It provides a means to improve integration and interoperability and can lead to a great increase in the reuse of software across NOAA.
  10. NOAA’s GEO-IDE will improve the application within NOAA of standards and best practices defined in related plans such as the Integrated Ocean Observing System (IOOS) Data Management and Communications (DMAC) Plan and the Integrated Earth Observation Data Management Plan. This will lead to improved integration of information systems within NOAA and interoperability of NOAA systems with those of other government agencies and the wider commercial and academic communities. These improvements will, in turn, help NOAA make better use of holdings of external data and information in fulfilling its mission.
      1.3. Why Now?
      Congress, in U.S. Code Title 15, Section 1537 (1) and Section 1537 (2), ordered that at least biennially the Secretary of Commerce shall complete an assessment of the adequacy of the environmental data and information systems of NOAA. In conducting such an assessment, the Secretary shall take into consideration the need for (among others):
      • The development of effective interfaces among the environmental data and information systems of NOAA and other appropriate departments and agencies.
      • The integration and interpretation of data from different sources to produce information that can be used by decision makers in developing policies that effectively respond to national and global environmental concerns.
      Improved integration of data management activities is critical to the success of US-GEO. As noted in the Interagency Working-group for Global Earth Observations (IWGEO) Integrated Earth Observation System (IEOS) Draft Strategic Plan (pgs 60-61):
      The U.S. needs a comprehensive and integrated data management and communications strategy to effectively integrate the wide variety of Earth observations across disciplines, institutions, and temporal and spatial scales. There are three urgent needs for data management:
      • New observation systems will lead to a 100-fold increase in Earth observation data.
      • Individual agencies’ current data management systems are challenged to adequately process current data streams.
      • The U.S. Integrated Earth Observation System, linking the observations and users of multiple agencies, compounds these challenges.
      Data management is a necessary first step in achieving the synergistic benefits from the U.S. Integrated Earth Observation System.
      Uncoordinated development leads to inefficiencies, incompatibilities, and duplication of effort. Increased efficiency is needed to handle the expected exponential increase in data volumes that will occur over the next decade. To cope with this unprecedented increase in the volume of data to be managed, NOAA must begin to improve coordination and integration of its data management activities now.
      Several plans have recently been developed that include reference to the need for improving integration and interoperability of systems that manage Earth observation-related data. These include the:
      • IOOS Data Management and Communications Plan (DMAC);
      • IWGEO Strategic Plan for the U.S. Integrated Earth Observation System;
  11. • IWGEO Integrated Earth Observation System Data Management Plan;
      • Strategic Direction for NOAA’s Integrated Environmental Observing and Data Management Systems;
      • Chief Financial Officer (CFO) Request: NOAA’s Integrated Environmental Observation and Data Management Program; and
      • NOAA’s Environmental Data Management: Integrating the Pieces.
      The opportunities presented by improving the interoperability of geospatial data have also been recognized by the educational and commercial sectors, and in many areas industry and academia are leading the way. There are several initiatives now underway that address issues directly related to locating, sharing, use, and integration of NOAA data. Among the most significant are the activities of the Open Geospatial Consortium (OGC) and World Wide Web Consortium (W3C), the Federal Geographic Data Committee (FGDC), continued development and evolution of national and international metadata standards, the spreading adoption of the Open-source Project for a Network Data Access Protocol (OPeNDAP), and the development of the E-Gov Geospatial One-Stop Portal – an interagency geospatial data resource.
      1.4. Risks
      Continuing to develop systems in an uncoordinated manner will lead to further incompatibilities and will further isolate NOAA programs from each other and from the wider environmental community. This will increase the difficulty in integrating information between programs and hamper NOAA’s ability to address important multi-disciplinary cross-goal societal issues, e.g., coastal erosion, water resources, etc. The development and institutionalization of isolated islands of data systems that evolve independently may make future integration very expensive and isolate communities of users.
      There are also risks in adopting a vision this ambitious.
      Key risks are:
      • While the basic technologies are sound, there are risks in utilizing new technologies that have not been applied to NOAA data systems, its high volume of data, and the requirement to conform to NOAA security policies and to work harmoniously with current data systems and network architectures.
      • Many attempts to build, apply and adhere to standards for data and metadata have failed due to a lack of uniform commitment to the process. The risk is that insufficient management and financial support will be applied to the standards necessary for GEO-IDE to be successful.
      The likelihood of success in developing and implementing an integrated data environment can be increased by setting realistic goals, adopting current best practices for software engineering and project management, and by maintaining agility to make necessary mid-course corrections.
      1.5. Present situation
      Existing NOAA information systems have been developed to meet diverse sets of requirements. In general, these systems have been developed by individual programs to meet specific needs and are, thus, focused in their approach and efficient at what they do. The multiplicity of systems operated for different programs has, however, resulted in incompatibilities, inefficiencies, duplication of effort and higher overall costs for NOAA as
  12. a whole. Even with systems connected to the same network, incompatible protocols and interfaces are an effective barrier to interoperability.
      As illustrated in Figure 1.2, a multitude of observing and data processing systems contribute data to support NOAA goals. Many of these systems are operated by NOAA, while others are operated by partner agencies that make their information available to NOAA or depend upon NOAA for long-term data archival. Data from these systems are encoded in many different formats and transmitted via a variety of communication systems and protocols. The amount, quality and format of metadata pertaining to these systems vary widely. Application of environmental data to multi-disciplinary problems is hampered by lack of agreed-upon and implemented standards needed to effectively identify, acquire, and correctly use all of the relevant data.
      [Figure 1.2 - Present situation: connectivity is limited and users must know where to access information. Data and products are available through incompatible interfaces and formats, and services from multiple centers cannot be easily combined. Figure label: Multiple Inconsistent Sources, Formats, Protocols and Terminology.]
      1.6. Document organization
      This document is organized as follows:
      Chapter 1 Introduction - An overview of the goals, benefits and risks associated with the ideas presented in this document and other background material.
      Chapter 2 Scope - The types of data, products and information management systems that are and are not covered by GEO-IDE.
      Chapter 3 Vision and Principles - The vision for a NOAA Integrated Data Environment as well as a set of data management principles applicable to all NOAA environmental information systems.
      Chapter 4 Approach - The technical and software development approaches recommended to implement the vision.
      Chapter 5 Governance Structure and Program Control - An outline of the current organizational structure for oversight and coordination of NOAA-wide
  13. 13. NOAA GEO Integrated Data Environment CONOPS 7 observations and data management activities. It also includes principles to guide program management efforts and decision making and a proposed organizational structure for implementation of GEO-IDE. Chapter 6 Towards a Service-Oriented Architecture - Identification of specific items that are needed to achieve GEO-IDE. Chapter 7 The NOAA GEO-IDE Standards Process - A proposed process for nomination, evaluation and implementation of information management standards for scientific and environmental data within NOAA. Chapter 8 NOAA Guide on Integrated Information Management - An outline for an on- line NOAA guide, to help NOAA program and project managers implement information systems that conform to the GEO-IDE vision. Chapter 9 Priorities for action - Priorities for action over the next 3 years (2007-2009). 7
  14. 14. NOAA GEO Integrated Data Environment CONOPS 8 2. Scope This concept of operations defines a vision for applying a consistent set of principles, policies, and standards to the design, development, evolution, and operation of NOAA’s data management systems. GEO-IDE shall facilitate convergence towards an integrated system that is aligned with NOAA’s mission, goals and programs and is responsive to their requirements. The NOAA Observing System Council (NOSC) has agreed that data management is defined by two coordinated activities: data management services and data stewardship. Together they constitute a comprehensive end-to-end process for movement of data and information from observing systems to data users. This process includes: data acquisition; quality control; validation; reprocessing; cataloging, documenting, storing and archiving the acquired data; and retrieving and disseminating the various data versions.  Data Management Services include adherence to agreed-upon standards; ingesting data, developing collections, and creating products; maintaining data bases; ensuring permanent, secure archival; migrating services to emerging technologies; providing both user-friendly and machine-interoperable access; and assisting users and responding to user feedback.  Data Stewardship consists of the application of rigorous analyses and oversight to ensure that data sets meet the needs of users. This includes documenting measurement practices and processing practices (metadata); providing feedback on observing system performance; validation of data sets; reprocessing (incorporate new data, apply new algorithms, perform bias corrections, integrate/blend data sets from different sources or observing systems); and recommending corrective action for errant or non-optimal operations. Given the above definition, data management encompasses a wide range of information management functions, as shown below in Table 2.1. 
The boundaries between communication, data management and data processing systems can be ambiguous and subject to interpretation. Making optimal use of NOAA’s data management systems for a variety of NOAA program requirements, while balancing the disparate, and sometimes contradictory, requirements placed upon them is a constant challenge.

Table 2.1 - Data Management Functions

Data acquisition
• Initial collection of raw data values
• Collection/creation of metadata
• Downlink and telemetry are not covered by this GEO-IDE

Ingest
• Transmission (Internet, private networks, satellite, media, etc.)
• Collection and storage of metadata
• Performance monitoring (observing, computing, communications, etc.)

Data Processing
• Data representation (format)
• Quality control (e.g., detect missing data, check value limits, compare with neighbors)
• Quality assurance (e.g., data validation, compliance with Data Quality Act)
• Model/data intercomparison
• Aggregation in space and time
• Assimilation
• Modeling
• Production of products (charts, data records, warnings, forecasts, imagery, statistics, geodatabases, Internet mapping services, etc.)
• Analysis (means and extremes, trends, climate indicators, discontinuity and bias determination, statistical analyses, etc.)
• Reprocessing

Access
• Data discovery/catalogs
• Query - interactive browse or via intermediary personnel
• Data selection, extraction and translation
• Delivery of data, metadata and services (via telecommunications or media)
• Mapping and map services
• Visualization

As envisioned in the 2005 Report to Congress, an important focus of data management should be to ensure that NOAA data is easily shared within NOAA, with GEOSS participants and other user communities. This GEO-IDE concept of operations articulates the roles, methods, and standards to ensure that NOAA data are interoperable and easily transferred between these diverse communities of users. It establishes a process for identifying standards, policies and recommended tools to enable integration between independent systems that perform each of the data management functions identified in Table 2.1.

GEO-IDE focuses on the future state (e.g., 5 to 10 years) of integrated NOAA data management. It provides the building blocks for a smooth evolution from the status quo to an integrated system of systems. GEO-IDE describes a framework for how on-going and new data management initiatives (e.g., Comprehensive Large Array-data Stewardship System (CLASS), IOOS DMAC, Advanced Weather Information Processing System (AWIPS) modernization, etc.) should be developed to maximize data integration. Through GEO-IDE, NOAA will be able to identify, endorse or develop standards and protocols to effectively migrate legacy systems toward a common vision.
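As an illustration of the quality control row in Table 2.1 (detect missing data, check value limits, compare with neighbors), the following is a minimal sketch in Python. The function name, the flag labels, and the thresholds are assumptions chosen for the example; they are not drawn from any NOAA system, and operational quality control is considerably more elaborate.

```python
import math


def _is_usable(value, lower, upper):
    """True when a value is present and inside the allowed limits."""
    return value is not None and not math.isnan(value) and lower <= value <= upper


def quality_control(values, lower, upper, max_neighbor_diff):
    """Return one flag per value: 'missing', 'out_of_range',
    'suspect_neighbor', or 'ok'. All names here are hypothetical."""
    flags = []
    for i, value in enumerate(values):
        if value is None or math.isnan(value):
            flags.append("missing")          # missing-data check
        elif not (lower <= value <= upper):
            flags.append("out_of_range")     # value-limit check
        else:
            # Neighbor check: compare only against adjacent usable values.
            neighbors = [
                values[j]
                for j in (i - 1, i + 1)
                if 0 <= j < len(values) and _is_usable(values[j], lower, upper)
            ]
            if neighbors and all(abs(value - n) > max_neighbor_diff for n in neighbors):
                flags.append("suspect_neighbor")
            else:
                flags.append("ok")
    return flags


if __name__ == "__main__":
    # Made-up temperature series in degrees Celsius, for illustration only.
    series = [12.1, 12.3, None, 99.0, 12.4, 12.5, 25.0, 12.6, 12.7]
    print(quality_control(series, lower=-5.0, upper=40.0, max_neighbor_diff=5.0))
```

A value is marked suspect only when it disagrees with every usable neighbor, so a single spike does not cause the good values beside it to be flagged.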
GEO-IDE defines and prioritizes specific actions to pursue, and proposes responsibilities to implement integrated data management capabilities. With respect to numerical modeling, the input and output of models are within the scope of GEO-IDE (e.g. data and file formats, communication systems and protocols, metadata, documentation and ultimate disposition of output products). While it is important to retain model source code, data inputs and outputs, etc., GEO-IDE does not address the way in which technical or scientific model algorithms are developed. GEO-IDE is concerned with environmental and geospatial data and information obtained or generated from worldwide sources to support NOAA's mission. This is consistent with NAO 212-15, which includes the following definitions for these data. Environmental Data - recorded observations and measurements of the physical, chemical, biological, geological, or geophysical properties or conditions of the oceans, atmosphere, space environment, sun, and solid 9
  16. 16. NOAA GEO Integrated Data Environment CONOPS 10 earth, as well as correlative data and related documentation or metadata. Media, including voice recordings and photographs, may be included. Geospatial Data - information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth. This information may be derived from, among other things, remote sensing, mapping, and surveying technologies. Statistical data may be included in this definition at the discretion of the collecting agency. Thus GEO-IDE covers, for example, the following types of data, information and services:  Chemical, geological, or geophysical properties or conditions of the oceans  Chemical and physical properties of the atmosphere, space environment and sun  Geological and geophysical properties of the solid earth  Paleoclimatological and other proxy records  Ecological and biological properties and conditions of the oceans  Socio-economic data collected for or associated with a NOAA mission  Information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth GEO-IDE does NOT cover data or data management requirements for administrative support systems, such as finance, personnel, acquisition or facilities management. 10
  17. 17. NOAA GEO Integrated Data Environment CONOPS 11 3. Vision and principles 3.1.Vision NOAA’s GEO-IDE is envisioned as a “system of systems” – a framework that provides effective and efficient integration of NOAA’s many quasi-independent systems, which individually address diverse mandates in areas of resource management, weather forecasting, safe navigation, disaster response, coastal mapping, etc. NOAA line organizations will retain a high level of independence in many of their data management decisions, encouraging innovation in pursuit of their missions, but will participate in a well-ordered, standards-based data and information infrastructure. NOAA’s GEO-IDE will provide friendly and flexible mechanisms to locate and access data and data products. It will address the needs of many classes of users including private industry, students and educators, researchers, government agencies, and the American public. It will also foster a community of private sector, value-added information product providers to address the needs of specialized groups. GEO-IDE will make NOAA products available in multiple formats and communication protocols, utilizing current information technology standards where they are mature, and best practices where accepted standards are still evolving. NOAA data and products will be described by comprehensive metadata that conform to national and international standards. Descriptions of products and data will be available over the Internet and searchable via standardized data discovery portals. NOAA observing systems and collection, assimilation, quality control and modeling centers will provide their data and metadata in accordance with established NOAA GEO-IDE standards. When a user feels a need for personal assistance, the GEO-IDE Web portals will guide the user to contact points – email help desks and telephone-based guidance. NOAA’s GEO-IDE will be a component of US-GEO. 
It will provide users with integrated access to data and information from other systems within US-GEO, smoothly integrating across Federal Agency, public-private, and inter-disciplinary boundaries. GEO-IDE will contribute data into US-GEO in accordance with US-GEO and GEOSS data and information standards and protocols. The planning of GEO-IDE will emphasize a sustained, close collaboration with and leadership within US-GEO and GEOSS. GEO-IDE will satisfy the diverse requirements of operations, research, monitoring, and archives. It will provide reliable discovery and delivery of data from measurement subsystems to operational modeling centers and to users and the delivery of computer- generated (model) information. For users who have time-critical requirements, collection, transmission, processing and delivery of this information will be done in real time. GEO-IDE will enable research systems to provide data to operational centers, when this is deemed appropriate. It will ensure that all appropriate data flow seamlessly into and out of secure, long-term archive facilities. GEO-IDE will provide a continuous, vigorous outreach process to identify and remedy difficulties encountered by any users. Through its governance mechanisms GEO-IDE will assure a continual assessment of changing user requirements and emerging technological solutions. Continual innovation will be a hallmark of GEO-IDE. 11
To illustrate the benefits that GEO-IDE will offer to users, consider the following scenario:

Today, NOAA struggles to provide its environmental information to customers in a way that makes it easy to locate, acquire, and use. For example, a customer who wishes to study coastal erosion and its impact on estuarine ecosystems must first locate several research websites within NOAA, including NWS weather Web pages, NESDIS/NCDC climate Web pages, and the NOS NowCoast. Because there is no single comprehensive gateway to all NOAA data, the customer would probably need to discuss requirements with customer service representatives from several organizations. Once the relevant information has been located, each NOAA organization would likely have a different process to follow in order to acquire the data. Once obtained, the data would likely be in inconsistent formats, using inconsistent parameter names, units, and quality control. The documentation available to describe each dataset would vary widely. These problems are exacerbated if the customer needs data and information related to several scientific disciplines, such as meteorology, oceanography and ecosystems.

Under NOAA’s GEO-IDE the steps to address the customer needs described above will look radically different. The customer’s favorite application or a standard Web browser will provide access to all NOAA (and related) data and information through a single interface. The interface provides intuitive tools to locate data that may be of interest, allowing for refined searches based upon geographic region, date, discipline, parameters of interest and a host of other descriptive information (metadata). The data discovery server responds to requests within moments, with an assurance that it has comprehensively searched NOAA’s data holdings and has identified all information that matches the request.
The customer is then able to read descriptions of the data and browse images and visualizations in order to quickly evaluate the data and arrive at the subset of interest. All of the desired data, products and information can then be obtained in a manner compatible with preferred analysis tools and using standard terms and units. There is no need for awareness of the physical location of the data or the manner in which it is managed – the data subsets that are of interest are delivered in a ready-to- use manner. Thus, all information can be easily combined and analyzed without regard to its source. Comprehensive information about the data (metadata) is available to aid in understanding the corrections, adjustments and other processing applied to the data. Customers can also benefit from services that are not available today. For example, data subscription services and application-supported data discovery services would allow the use of relevant data without a customer having to explicitly discover and access it. The customer is provided with information on how to contact a NOAA expert for additional help if any problems or questions arise. The data environment to be created by NOAA GEO-IDE is outlined in Figure 3.1. 12
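The refined search in the scenario above (by geographic region, date, discipline, and parameter) can be sketched as a filter over metadata records. This is a minimal sketch: the record fields, the `discover` function, and the catalog entries are all hypothetical and stand in for whatever metadata standard and discovery service GEO-IDE eventually adopts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; the field names are assumptions."""
    title: str
    discipline: str
    parameters: frozenset
    bbox: tuple  # (west, south, east, north) in degrees
    start: date
    end: date


def _boxes_overlap(a, b):
    """True when two (west, south, east, north) boxes intersect.
    Boxes that cross the dateline are not handled in this sketch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(catalog, bbox=None, start=None, end=None, discipline=None, parameter=None):
    """Return the records that satisfy every criterion that was supplied."""
    hits = []
    for record in catalog:
        if bbox is not None and not _boxes_overlap(record.bbox, bbox):
            continue
        if start is not None and record.end < start:
            continue
        if end is not None and record.start > end:
            continue
        if discipline is not None and record.discipline != discipline:
            continue
        if parameter is not None and parameter not in record.parameters:
            continue
        hits.append(record)
    return hits


# Invented catalog entries, for illustration only.
EXAMPLE_CATALOG = [
    DatasetRecord("Estuary water level gauges", "oceanography",
                  frozenset({"water_level", "salinity"}),
                  (-77.5, 36.5, -75.0, 39.5), date(1995, 1, 1), date(2006, 6, 30)),
    DatasetRecord("Coastal wind analyses", "meteorology",
                  frozenset({"wind_speed"}),
                  (-80.0, 30.0, -70.0, 42.0), date(2000, 1, 1), date(2006, 6, 30)),
    DatasetRecord("Pacific buoy sea temperature", "oceanography",
                  frozenset({"sea_surface_temperature"}),
                  (-160.0, 10.0, -120.0, 50.0), date(1990, 1, 1), date(2006, 6, 30)),
]
```

For example, `discover(EXAMPLE_CATALOG, bbox=(-77.0, 37.0, -76.0, 38.0), discipline="oceanography")` returns only the estuary record, because the Pacific record lies outside the requested region and the wind record belongs to a different discipline.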
  19. 19. NOAA GEO Integrated Data Environment CONOPS 13 Figure 3.1 – Architectural vision GEO-IDE depends fundamentally upon standards, and it is essential that these be thorough, documented, and supported standards with demonstrated benefits. To ensure these standards are embraced and accepted across NOAA, an open and inclusive “standards process” for nominating, evaluating, and implementing NOAA GEO-IDE standards will be adopted. The standards process will define what standards are adopted, when they become effective, and how the organization will build up to and support the implementation of those standards. The GEO-IDE governance infrastructure will assure that all parts of NOAA receive the training and support they need to successfully and usefully implement GEO-IDE standards. 3.2.Data Management Principles Effective realization of this vision requires all NOAA data management systems to consistently follow a set of standard data management principles. Recommended principles are described below, including how they will be applied within NOAA: 1. Commitment and leadership: Information is a strategic asset and information management must be a key component of every environmental data and information program. This ethic must be reflected in a corporate culture, embraced throughout the organization that recognizes data as a corporate resource. 13
  20. 20. NOAA GEO Integrated Data Environment CONOPS 14 NOAA management will be visible advocates for development and implementation of NOAA-wide information management investments, policies, and procedures. All NOAA employees and contractors are stakeholders in the integrated information management vision, and will strive to help the organization develop and implement policies and practices for achieving it. NOAA will establish mechanisms for ongoing communication, coordination and training to ensure that all its data producers have the knowledge and resources needed to implement NOAA data management policies. 2. Stewardship: People who take observations or produce data and information are stewards of these data, not owners. These data must be collected, produced, documented, transmitted and maintained with the accuracy, timeliness and reliability needed to meet the needs of all users. NOAA will strive to meet the requirements of all users in planning, developing, and implementing its data management systems. NOAA will endeavor to make the most of every observation it takes and data product it produces. 3. Long-term preservation: Irreplaceable observations, data products of lasting value, and associated metadata must be preserved. This information must be well- documented and maintained so that it is available to and independently understandable by users, now and in the future. NOAA will ensure all data, products of enduring value, and associated metadata are well documented and maintained in suitable archives. NOAA, in concert with its users and partners, will establish criteria and procedures to guide the acquisition, documentation, retention, and purging of data to ensure important and irreplaceable information is maintained for posterity. 4. Requirements-driven: It is essential that providers and users of data and products play an active role in defining the constantly evolving requirements that drive the development and evolution of data management systems. 
NOAA understands that it has unrealized potential for the use of its data and information. NOAA will work with its growing and increasingly diverse set of data providers and users to determine present and future environmental requirements and applications and to continuously improve its relationship with both groups. NOAA will establish a vigorous outreach process to involve both groups and to help identify where improvements are needed. NOAA will foster development of a value-added “market”, in which others may readily produce information products tailored to particular groups. 5. Discovery and access: Freedom of access, mechanisms that facilitate discovery, timely delivery, use and interpretation of data and products (directories, browse capabilities, metadata, mapping, visualization, etc.) are essential (while following relevant policies and regulations). NOAA will develop information systems and tools to facilitate discovery, use, and interpretation of data and products by its users. It will work with its partners in government, academia, and industry to make sure its data are available and accessible to all, while respecting any data confidentiality agreements. NOAA will ensure timely access to data and products necessary to support operational and research requirements. 14
  21. 21. NOAA GEO Integrated Data Environment CONOPS 15 6. Standards and practices: Appropriate use of information technologies, widely shared standards, and integration approaches are vital to facilitate collection, management, discovery, dissemination, and access services for environmental data and products. This will ensure interoperability among providers, systems, and users. Effective application of standards and best practices contribute to the development of systems that are interoperable, efficient, reliable, scalable, and adaptable. NOAA subscribes to the value of, and need for, corporate standards, but also recognizes the need for flexibility so that individual creativity in getting jobs done is enhanced by the use of standards. NOAA will define a process for standards adoption that is open and inclusive, and fosters buy-in by all stakeholders. Existing information technology and scientific standards will be favored. NOAA data and information will be consistent to the extent that implementation at each level, and across units, is compatible and mutually supportive. 7. Quality: Data, products and information should be of a quality sufficient to meet the requirements of society and to support sound decision-making. NOAA will strive, as a commonly understood, accepted, and supported goal, to bring quality information to people and processes inside and outside of NOAA. NOAA, together with partner agencies and institutions will strive to ensure its environmental information is of the highest possible quality within reasonable cost. The quality of NOAA data and products will be evaluated, fully characterized, and documented. 8. Cooperation and coordination: Environmental and scientific data management is a task of global scope – a whole that should be much bigger than the sum of its parts. It is only by participating in a global community of integrated data management that each organization can realize the potential of its data to the betterment of humankind. 
NOAA will actively participate in and commit to utilizing data management solutions that are compatible and interoperable with data systems utilized by international partners; by other US Agencies; by private sector data suppliers and users; by the research community; and by end users at all levels of US society.

9. Security: Data, information, and products must be preserved and protected from unintended or malicious modification, unauthorized use, or inadvertent disclosure. NOAA will ensure that its data management systems comply with all applicable federal security policies. It will ensure the integrity of its stored and transmitted data and will protect data, networks and services from unauthorized use or attack.
  22. 22. NOAA GEO Integrated Data Environment CONOPS 16 4. Approach 4.1.Introduction NOAA is a diverse organization with many quasi-independent information systems, which individually address mandates in areas of resource management, weather forecasting, and safe navigation, among many others. These systems are critical to the national interest and GEO-IDE must ensure that its goals of improved integration and efficiency are achieved with minimal adverse impact to the functioning of legacy systems and no interruption in essential services. The direct approach to this problem would be to develop an entirely new NOAA-wide environmental information system and to replace existing systems wholesale. Once the new system was completed, after a period of parallel operations, legacy systems would be turned off and the new system would assume their functions. Given the diverse requirements of NOAA programs and the large number of existing systems, such an approach would be extremely difficult and costly, and the risk of failure would be unacceptably high. The preferred approach is to capitalize on on-going data management initiatives (e.g. the IOOS DMAC; the Fisheries Information System; the NOAA National Operational Model Archive and Distribution System (NOMADS); Global Earth Observations System of Systems; etc.) and continued operation of existing systems and standards (e.g. AWIPS, CLASS, Family of Services, etc.) while gradually improving integration through an evolutionary process of pilot projects and iterative improvement. GEO-IDE will take advantage of useful and mature existing systems, while building a software infrastructure that links these and other new systems together into an integrated framework. 4.2.Proposed Model: Service-Oriented Architecture As identified in Section 3, the vision for the GEO-IDE is one of cooperative integration. 
The goal of integration is to retain existing systems as much as possible while building a software infrastructure that links them together. The approach proposed to implement such a vision is through the development of a software infrastructure called a Service- Oriented Architecture (SOA). SOAs are a style of system-of-systems integration based on using loosely coupled connections among independent systems to create a scalable, extensible, interoperable, reliable, and secure framework. SOAs have been proven to solve interoperability problems which include integrating systems developed in various programming languages, running in different environments on heterogeneous computer platforms, and developed by independent groups in autonomous organizational units at different times. SOAs are built on a software technology called web services (Figure 4.1). In the figure, a Service Provider is the system offering a service. A Discovery Service (sometimes known as a service broker) is a well-known repository for information about other services. The Service Requestor is the system requesting to discover and use a particular service. The Publish connector line indicates a provider registering its services. The Find connector line represents queries made to discover details (where, and how to communicate). The Interact line represents the communications between the requestor and provider needed to obtain the requested service. 16
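The Publish, Find, and Interact roles described above can be sketched in a few lines of Python. This is an in-process illustration only: the class and function names are hypothetical, a plain Python callable stands in for a network endpoint, and the provider returns placeholder values rather than real data.

```python
class DiscoveryService:
    """Hypothetical in-memory registry playing the service broker role."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, description, endpoint):
        """Publish: a provider registers how its service can be reached."""
        self._entries[name] = {
            "name": name,
            "description": description,
            "endpoint": endpoint,
        }

    def find(self, keyword):
        """Find: a requestor queries the registry by keyword."""
        needle = keyword.lower()
        return [
            entry for entry in self._entries.values()
            if needle in entry["name"].lower() or needle in entry["description"].lower()
        ]


def station_metadata_provider(station_id):
    """Service Provider: returns placeholder metadata for a station."""
    return {"station_id": station_id, "parameter": "air_temperature", "units": "degC"}


def request_service(discovery, keyword, **arguments):
    """Service Requestor: find a matching service, then interact with it."""
    matches = discovery.find(keyword)
    if not matches:
        raise LookupError(f"no service matches {keyword!r}")
    return matches[0]["endpoint"](**arguments)


if __name__ == "__main__":
    discovery = DiscoveryService()
    discovery.publish(
        "station-metadata",
        "Describes the parameter and units reported by a station",
        station_metadata_provider,
    )
    print(request_service(discovery, "units", station_id="KDEN"))
```

The requestor never refers to the provider directly; it learns the endpoint from the registry, which is the loose coupling that the interaction model relies on.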
[Figure 4.1 diagram labels: Discovery Service, Service Provider, Service Requestor; connectors: Publish, Find, Interact]
Figure 4.1 – The Web-Services Interaction Model

The World Wide Web Consortium (W3C) defines a Web service as “a software system designed to support interoperable machine-to-machine interaction over a network.” Services insulate applications from the underlying platform (hardware and operating system) required to accomplish the task. Web services can be simple (authorization, searching, naming, registration), or complex, combining multiple services into a composite service that encompasses the comprehensive requirements of an application. For example, a user wishing to run a forecast model could utilize a complex service composed of services to: (1) build initial and boundary condition files on one machine, (2) send the results to a service on another system where the model is run, (3) generate forecast products using a third service and (4) visualize the results with a fourth service.

Creating services to accomplish common tasks allows an organization to reduce the effort required to develop, port, and maintain its hardware and software systems. Composite services that make use of other services such as authentication and search reduce duplication of effort and increase software reuse, reliability, and security.

Service-based architectures address the two most important aspects of data management integration at NOAA: data sharing and application interoperability. Data sharing refers to standards and infrastructure to support sharing data that is stored in different formats, made available through different access methods, and provided by independent sources. The value of insights made possible by merging data from diverse sources within the same visualization or analysis justifies efforts to provide an infrastructure that supports data sharing across the NOAA enterprise and to external programs, organizations and users.
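The four-step forecast example above, in which a composite service chains simpler services, might be sketched as follows. Every function here is a hypothetical stand-in that only records what it would have produced; in a real deployment each step would be a call to a separate Web service, possibly on a different machine.

```python
from functools import reduce


def _advance(state, step, **outputs):
    """Return a new state recording the completed step and its outputs."""
    return {**state, **outputs, "steps": state.get("steps", []) + [step]}


def build_initial_conditions(state):
    # Step 1: build initial and boundary condition files.
    return _advance(state, "initial_conditions", ic_file=f"ic_{state['region']}.dat")


def run_model(state):
    # Step 2: run the model on the conditions produced in step 1.
    return _advance(state, "model_run", model_output=f"forecast_from_{state['ic_file']}")


def generate_products(state):
    # Step 3: derive forecast products from the model output.
    return _advance(state, "products", products=[f"chart_of_{state['model_output']}"])


def visualize(state):
    # Step 4: render each product.
    return _advance(state, "visualization", images=[p + ".png" for p in state["products"]])


def compose(*services):
    """Build a composite service that feeds each result to the next service."""
    def composite(request):
        return reduce(lambda state, service: service(state), services, request)
    return composite


forecast_service = compose(build_initial_conditions, run_model, generate_products, visualize)

if __name__ == "__main__":
    print(forecast_service({"region": "gulf"})["steps"])
```

Because each step depends only on the state handed to it, a single step can be replaced or relocated without changing the others, which is the reuse benefit the text attributes to composite services.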
Application interoperability refers to a framework that provides the ability for applications to communicate with and use Web services provided by other applications. A service-based infrastructure that allows independent programs to interoperate by communicating across a network will make it practical to build systems from reusable parts, to adapt and connect existing systems quickly for accomplishing new tasks, to benefit from highly evolved and still useful “legacy” applications, and to automate processes among different organizational units that currently require manual steps. Building such an infrastructure from scratch is not necessary, since off-the-shelf and open source implementations of Web service infrastructure are available and will soon be included in most software development environments.

Two approaches to Web services are currently supported by industry: SOAP and REST. SOAP and the Web Services Description Language (WSDL), combined, provide a way for service requestors and providers to exchange information through XML-formatted messages. These SOAP messages contain all the information needed to invoke a Web service through either a remote procedure call (RPC) or a Web service invocation.

REST (Representational State Transfer) is a model for Web services based solely on HTTP. REST assumes that HTTP specifications provide all of the capabilities necessary for Web services and that additional specifications, such as SOAP and WSDL, are not required. Any item can be made available (i.e. represented) at a URI and, subject to the necessary permissions, it can be manipulated using one of the simple operations defined within HTTP (GET, PUT, POST, and DELETE).

In setting GEO-IDE standards for an SOA, it is important to recognize that these two approaches both have advantages and drawbacks, and there is no reason to standardize a single architectural style when different services may require different styles. In some cases, it may be necessary to support both ways of accessing a service, to make it integrate well with development tools and to provide a capability to evolve.

4.3.Web Services in NOAA
Web services can be described as a thin layer built on top of existing NOAA data management systems in which functional capabilities to access these systems are made available to the applications that require them. Additional Web services can be developed and added to GEO-IDE where functional gaps exist or new capabilities are required. Figure 4.2 illustrates a conceptual SOA for GEO-IDE (note the services listed are only representative; a comprehensive list is provided in Section 6.2). Conceptually, users access the information infrastructure to perform tasks that make use of information resources. The users’ activities are supported by the fabric of the information infrastructure which may include shared hardware resources and long-term data archives.
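To make the idea of a thin service layer concrete, and to show the REST style from Section 4.2 in practice, the sketch below wraps a stand-in "legacy" store with HTTP GET, PUT, and DELETE operations using only the Python standard library. The store contents, the URI layout, and the helper names are assumptions for illustration; POST, authentication, and error handling for malformed input are omitted.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Stand-in for an existing data system; the contents are invented.
LEGACY_STORE = {"station-001": {"parameter": "air_temperature", "units": "degC"}}


class ResourceHandler(BaseHTTPRequestHandler):
    """Exposes each record in LEGACY_STORE as a resource at /datasets/<id>."""

    def _key(self):
        return self.path.strip("/").split("/")[-1]

    def _send(self, status, payload=None):
        body = json.dumps(payload).encode("utf-8") if payload is not None else b""
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        record = LEGACY_STORE.get(self._key())
        if record is None:
            self._send(404, {"error": "not found"})
        else:
            self._send(200, record)

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        LEGACY_STORE[self._key()] = json.loads(self.rfile.read(length))
        self._send(200, LEGACY_STORE[self._key()])

    def do_DELETE(self):
        existed = LEGACY_STORE.pop(self._key(), None) is not None
        self._send(204 if existed else 404)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server():
    """Serve on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ResourceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(method, url, payload=None):
    """Issue one HTTP request and return (status, decoded JSON body or None)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = Request(url, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    with opener.open(request) as response:
        body = response.read()
        return response.status, (json.loads(body) if body else None)
```

A client that knows only the URI and the HTTP verbs can read, create, and remove records; it needs no knowledge of how the underlying store is implemented, which is the property a thin service layer is meant to provide.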
Users may be of many types: operational forecasting centers, state environmental management agencies, fisheries managers, individual researchers, etc. The fabric of the infrastructure includes a set of components that support the use of the information resources. An important part of the fabric is one or more portals that provide the entrance point for users. Portals or Web-based graphical user interfaces permit users to locate and utilize distributed data or compute resources. One might envision data portals to access and utilize operational data, modeling portals to initialize and run weather or ocean models, and data management portals to monitor the state of NOAA’s data systems. These portals could utilize common services including registries (to locate data sources), metadata systems (for information about data content), and ontologies (to map name spaces into a common language) to locate the appropriate Web services that meet their needs. Information resources include datasets (e.g. observational data, processed data, model analyses, historical data collections), tools (e.g. quality control tools, analysis and visualization tools, open GIS software, software for generating derived datasets, event- detection software), numerical modeling modules (e.g. fully assimilative models to permit nowcasting, forecasting and data synthesis; model components that can be composed by users), and real-time data streams. Each resource is exposed within the organization as one or more services, e.g. as Web services, by which the resource is accessed or invoked. With this approach, only the way the service interface is described and accessed needs to be standardized, not the internals of the resource or the application in its local development environment. Thus, one “dataset” resource might be a NASA satellite image archive while another is a collection of ocean databases, some stored as flat files, some as relational databases etc, all accessible via data servers. 
Exposing the
  25. 25. former as a service might just involve writing a Web service implementation of image search and retrieval, while exposing the latter might involve setting up a server that runs a data access client that is essentially one or more Web services. Other services can be built to locate data through registries, to monitor and control critical systems so as to ensure they are available, and to ensure the timely delivery of data through the appropriate operational or research network.

Figure 4.2 – A conceptual diagram of a Service-Oriented Architecture that integrates NOAA data management systems

Two classes of users must be considered: those whose identities are managed and those whose identities are unmanaged. Users whose identities are unmanaged use a portal essentially like a public Web page. While a portal may or may not ask such users to provide some information about themselves, it does not authenticate their identity, manage security certificates or provide other secure access for them, or maintain a personal workspace for them. Users whose identities are managed first authenticate themselves with the portal to establish their identity, for example by using passwords, a secure electronic ID, or a biometric identification procedure. For these users, the portal manages the security certificates that control access to resources within the organization.

An emerging technology that requires further investigation for its application to GEO-IDE is grid computing. Its goal is ubiquitous computing, where computing is available on demand and users do not have to be concerned with where their tasks are running or where data reside. The most common analogy for grid computing is the electric power grid. Key to the success of the power grid has been the development of, and adherence to, standards (e.g. voltage, amps, cycles). As GEO-IDE progresses through pilots to a
  26. 26. distributed environment, developers should monitor advances in grid computing and take advantage of the tools and techniques that are developed.

4.4. Basis for development

Most of the primary standards required to build a service-based GEO-IDE are already available; NOAA will not need to create or define them. Web service standards are being adopted by the business and research communities as a means to build distributed interoperable systems. The two most widely used approaches to Web services, SOAP and REST, are based on industry standards. Extensions to Web service standards are being developed to provide task and resource management functions across heterogeneous computing environments.

NOAA can also leverage existing distributed data technologies that are being used to link data providers with users via services. For example, Open-source Project for a Network Data Access Protocol (OPeNDAP) servers have been deployed at numerous sites across NOAA to provide access to local, regional and global data sets on demand. These servers can provide information about the contents of model output or observational data, and can access and retrieve data for the requesting user or application. Other developments have been built on top of these services to provide added capabilities, including servers to visualize model output, to handle new data formats and format conversions, and to build data catalogs. NOAA projects utilizing these capabilities include NOMADS, which distributes model data; the Meteorological Assimilation Data Ingest System (MADIS), which provides point data; and the Live Access Server, which handles oceanographic and other data. Demonstrated success in deploying distributed servers in heterogeneous environments has led to their being considered for use in the operational AWIPS system.

These developments represent a good basis for building GEO-IDE; however, significant work remains to define a common language (e.g.
conventions, schemas, etc.) so that communicating processes can understand each other using the underlying standards. For example, XML schemas must be defined so that Web services can identify themselves in a common way to clients or other services requesting a service. Conventions will need to be established or adopted for time representation, parameter names, units, and data formats to facilitate information exchange among disparate distributed processes.

A management and architectural group, as described later in this document, will need to design and implement the SOA based on an analysis of current systems and of the future capabilities implied by anticipated program goals and requirements. Technical committees must adopt, adapt, and, if necessary, develop conventions and schemas that will be used to interpret requests and responses between communicating clients and Web services.

Four general classes of Web services are anticipated:

a. Operational Public Access Services: for public access to data, products and information services. Some examples include:
• Electronic-commerce capabilities where required.
• Subscription services so users can easily get the data they need when they need it. These could provide scheduled, event-triggered, or on-demand delivery mechanisms.
• Common format translation.
• Common coordinate transformation.
• Visualization services.
  27. 27. b. Operational Services: where security, timeliness, and reliability are paramount. Some examples include:
• Support for operational access to data (Warnings and Forecasts).
• Subscription service.
• Event notification service.
• Format conversion.

c. Scientific Services: where efficient and flexible discovery and access to data sets are required. Some examples include:
• Model initialization, invocation, and steering.
• Access to local online data, local offline data (Mass Store/Archive Services), remote online data (ftp, OPeNDAP, others), and remote offline data (remote Mass Store/Archive Service, OPeNDAP, others).
• Observing System Simulation Experiments.
• Scientific Data Stewardship procedures and Archival Provenance.

d. Commercial value-added services:

The responsibility of the design group will be both to identify and develop common services that satisfy the needs of the operational and scientific communities, and to provide individual specialized services where programmatic or mission-specific requirements demand them. The security of these systems will be addressed in their design.

A Notification and Data Subscription Service for Operations

A simple example of a Web service is subscription to a near real-time data stream of observations or model outputs. For example, an application for displaying regions susceptible to aircraft icing conditions might subscribe to a service providing meteorological parameters from observations and model outputs. The application would subscribe with a filter specifying the needed subsets of data and would include its own interface endpoint to which notifications would be sent. The notifications need not include the actual data, but merely a reference or query that could be used to access the data when available.
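The subscription pattern just described can be sketched in a few lines of Python. This is only an in-process illustration under stated assumptions: the class names, the parameter and region fields, and the example.invalid references are hypothetical, and a callable stands in for the subscriber's network endpoint. It is not a rendering of any particular notification standard.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Notification:
    """Announces that data is ready; carries a reference, not the data itself."""
    parameter: str
    region: str
    data_reference: str  # hypothetical URL or query the client can use later


@dataclass
class Subscription:
    matches: Callable[[Notification], bool]   # subscriber-supplied filter
    endpoint: Callable[[Notification], None]  # stand-in for a network endpoint


@dataclass
class NotificationService:
    """Hypothetical service that pushes notifications to matching subscribers."""
    subscriptions: List[Subscription] = field(default_factory=list)

    def subscribe(self, matches, endpoint) -> Subscription:
        sub = Subscription(matches, endpoint)
        self.subscriptions.append(sub)
        return sub

    def publish(self, notification: Notification) -> int:
        """Deliver to every subscriber whose filter matches; return the count."""
        delivered = 0
        for sub in self.subscriptions:
            if sub.matches(notification):
                sub.endpoint(notification)
                delivered += 1
        return delivered


def icing_filter(n: Notification) -> bool:
    # Assumed subset of interest for an aircraft-icing display application.
    return n.parameter in {"temperature", "relative_humidity"} and n.region == "CONUS"


inbox: List[Notification] = []  # what the icing application has been told about
service = NotificationService()
service.subscribe(icing_filter, inbox.append)

# The data provider announces two new products; only one passes the filter.
service.publish(Notification("temperature", "CONUS", "https://example.invalid/data?id=1"))
service.publish(Notification("wave_height", "CONUS", "https://example.invalid/data?id=2"))
```

The subscriber is called only when matching data appears, and it receives a reference that it can resolve when it actually needs the data, so no polling loop is required.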
Standards now exist for event-driven notifications as Web services, and off-the-shelf implementations of the necessary infrastructure are available that provide scalable data subscription services to applications. If such service interfaces were available for NOAA observational and model data, the current practice of polling an FTP directory every few seconds to see whether the desired data are available for download would no longer be necessary. Instead, event-driven notifications would offer a much more scalable way to give timely access to applications that need real-time information for more complex processing.

4.5. Development Approach

NOAA GEO-IDE must encourage relatively small exploratory projects to build necessary services, one component at a time, between currently non-interoperable systems, to support the specific operational priorities described above. The results of such projects could quantify the levels of effort required to fully tie each part of the overall data infrastructure together. If successful, each such project would achieve a significant innovation and create an important foundation for further interoperability. However, it would be a mistake to evaluate an SOA approach by merely connecting two applications through Web service interfaces. Any such two-party connection can usually be provided with less effort directly, without the extra overhead of an SOA infrastructure. The real
  28. 28. NOAA GEO Integrated Data Environment CONOPS 22 value of a SOA is its “network effect” that grows more rapidly than the number of services established. The GEO-IDE data management and implementation processes must take place concurrently and in an on-going iterative, spiral development approach, where managers, architects, developers, and users work together. The implementation of GEO-IDE will most likely have a lengthy transition period while necessary services are developed and implemented. While the GEO-IDE architecture is being developed, initial core services can be advanced and provide building blocks upon which the architecture will grow as both requirements and technologies change. Local database managers and staff programmers must be provided the guidance necessary to begin to build, or modify existing applications to a more generalized loosely coupled solution. This way, the entire NOAA community will begin to build the system from the bottom up, but in accordance with NOAA-wide principles and standards. Development of GEO-IDE will be based on an iterative, spiral-development process with the following stages: • Select and evaluate pilot projects that relate to both operational and research parts of NOAA – especially those that show promise toward high levels of interoperability. • Define standards, methods, schemas, security requirements, etc. necessary to interoperate within existing and emerging systems. • Implement using standards, and demonstrate portability and interoperability of approach. • Expand to new projects or capabilities and repeat the process. 4.6.Key Development Strategies Both initial and long-term key development activities of NOAA GEO-IDE include the identification of pilot programs that employ a community-based open architecture design and that have adopted GEO-IDE guiding principles. These pilots will not only provide a jump-start for initial investment analysis, but provide a working set of prototypes. 
Some key development strategies include: • Building upon self-describing formats. • Utilizing structural data typing to define, and classifying data and applications that require them. • Determining initial and then follow-on services needed. • Initiating pilot projects as recommended by the NOAA DMC that will advance or build the specific services discovering strengths and weaknesses of each. • Following industry-and community-driven standards as appropriate. 4.6.1.Structural data typing The GEO-IDE effort acknowledges that NOAA’s data systems are insufficiently integrated. This situation is a reflection of technology and management and decision- making strategies of the past that have tended to fragment data management, rather than to unify it. Lines of funding have traditionally been matched to observing system elements – satellites, ships, profilers, etc. – and data life cycle points – measurement, real-time applications, climate analysis, archive, etc. In the past, the observing system or function “owned” the data management specific to its system. Each observing system 22
  29. 29. NOAA GEO Integrated Data Environment CONOPS 23 element has therefore developed individualized approaches to data management, often involving the development of unique (and non-interoperable) data formats and protocols. Real-time data management strategies were devised with little thought to analysis or archive, and so on. Predictably these traditions have hindered the development of integrated data management. Communities of interest within data management are most naturally organized by structural type of data. The lines between these communities are drawn from the answers to key data management questions such as, what techniques are appropriate for searching for these data; for transporting (interchanging) these data; for visualizing or analyzing these data; and for storing or archiving these data? Communities of interest defined by structural data types provide a natural way to organize data management efforts and specify standards required for interoperability. For example, the kinds of standards, best practices, metadata, and access interfaces required for time-series data collections are similar for atmospheric, oceanic, hydrological, biological, or climate data. Traditional communities of interest defined by pattern of usage will continue to thrive of course, based upon scientific and societal goals. These communities will provide the requirements to an increasingly integrated data management community. For example, weather forecasters will continue to require synoptic access to observations; climate modelers will continue to view the same observations as time series. The role of the data management community will be to find unified solutions that address both of these usage patterns. Table 4.1 proposes an initial list of communities of interest based upon structural data types. In most cases the structural data types are the natural consequences of the manner in which the data are collected. 
For any given data stream there may be ambiguity about the structural data type under which it should be handled. As a general rule, the best way to resolve the ambiguity is to choose the most highly ordered data type that could describe the data. Table 4.1 is presented roughly in order from the most highly structured data types at the top to the least structured types at the bottom.

Table 4.1 – Structural Data Types

Structural Data Class: Grids (and collections of grids)
Descriptions and subclasses:
• rectilinear grids
• curvilinear grids
• finite element meshes
• “unstructured” grids (variable numbers of vertices)
Examples and further explanation:
• finite difference model outputs
• finite element model outputs
• gridded (binned) data products
• level 4 (gridded) satellite fields
• spherical harmonic spectral coefficients [1]

[1] In some cases, grids represent coordinate systems that are mathematically transformed from simple latitude-longitude-depth-time positions. Spherical harmonic spectral coefficients are an example of such.
  30. 30. Structural Data Class: Moving-sensor multidimensional fields (and collections of same)
Descriptions and subclasses:
• swaths
• radials
Examples and further explanation:
• satellite passes
• HF radar
• side-scan sonar
• weather radar

Structural Data Class: Time series (and collections of time series [2])
Descriptions and subclasses:
• time-ordered sequence of records [2] associated with a point in space or a more complex spatial feature
Examples and further explanation:
• ocean moored measurements [3]
• fish landings at a port
• stream flow records
• sun spot activity
• climate data (surface atmospheric stations)
• paleorecords from cores, corals, tree rings, …
• computed climate indices such as SOI

Structural Data Class: Profiles (and collections of profiles)
Descriptions and subclasses:
• height- or depth-ordered sequence of records [2] at a fixed (or approximately fixed) point in time and position in lat/long
Examples and further explanation:
• atmospheric soundings
• ocean casts
• profiling floats
• acoustic Doppler instruments (structural overlap with time series)

Structural Data Class: Trajectories (and collections of trajectories)
Descriptions and subclasses:
• time-ordered sequence of records [2] along a path through space
Examples and further explanation:
• underway ship measurements
• aircraft track data
• ocean surface drifters
• ocean AUV measurements

Structural Data Class: Geospatial Framework Data [4]
Descriptions and subclasses:
• lines
• polygonal regions
Examples and further explanation:
• shorelines
• fault lines
• marine boundaries
• map annotations
• continually operating reference stations (CORS)

[2] A “record” refers to one or more associated parameter values and associated metadata.
[3] Standards for time series need to consider small, time-dependent excursions in latitude, longitude, and depth. Cabled ocean moorings are an example of such.
[4] The “GIS perspective” must be a major focus in the discussion of all of the data classes listed in this table.
  31. 31. Structural Data Class: Point Data [5]
Descriptions and subclasses:
• scattered points
Examples and further explanation:
• tsunami or seismic occurrences
• species sightings
• geodetic control

Structural Data Class: Metadata
Descriptions and subclasses:
• “data about data” – context information needed for the interpretation of data
Examples and further explanation:
Like other data types, metadata has distinct requirements for storage, access, archival, and transport. Metadata content is a major focus of discussions within all of the data types. Metadata as a “data type” refers specifically to its unique requirements and properties with respect to archival, access, and transport.

4.6.2. Advancing integration through pilot projects

Pilot projects serve as a means both to evaluate current technologies and identify their weaknesses, and to begin the process of building and integrating NOAA’s data management systems. A test bed for data access and use is a fundamental building block for the development and implementation of many NOAA services. NOAA’s SOA must be implemented with both legacy and emerging systems, and pilot projects to address these needs are required. Service solutions, however, must be general enough to accommodate both existing and emerging standards.

The OPeNDAP data transport interface is recommended as a flexible basis for moving data between providers and users, permitting operations such as sub-setting, merging, formatting, and distributed data access. Adopting an OPeNDAP solution can reduce the workload of integrating a new data source or interacting with a new institution. The OPeNDAP technology is flexible: it permits each institution to work internally with its preferred format or a basket of formats, while still maintaining the goals of low-cost interaction with other institutions and ease of use.
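The sub-setting that OPeNDAP permits can be illustrated by the request URL a client would send. The sketch below assumes the DAP2 convention of appending a response suffix (such as .ascii) and a constraint expression of the form variable[start:stride:stop] for each dimension; the dataset URL, the variable name sst, and the function names are hypothetical placeholders, and any real server should be checked against its own documentation.

```python
from typing import Sequence, Tuple

# (start, stride, stop) index triple for one dimension; stop is inclusive.
Slice = Tuple[int, int, int]


def hyperslab(variable: str, slices: Sequence[Slice]) -> str:
    """Format a DAP2-style array subset such as sst[0:1:0][10:1:20]."""
    parts = []
    for start, stride, stop in slices:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid slice {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return variable + "".join(parts)


def subset_url(dataset_url: str, variable: str,
               slices: Sequence[Slice], response: str = "ascii") -> str:
    """Build a request URL that asks the server for only the needed subset."""
    return f"{dataset_url}.{response}?{hyperslab(variable, slices)}"


# Hypothetical dataset: first time step, rows 10-20, every second column 30-40.
EXAMPLE_URL = subset_url(
    "https://example.invalid/opendap/sst.nc",
    "sst",
    [(0, 1, 0), (10, 1, 20), (30, 2, 40)],
)
```

Because the index ranges travel in the request, the server can extract the subset before transmission and the client never downloads the full file. Some HTTP clients percent-encode the square brackets, which servers generally accept.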
OPeNDAP in itself does not solve all data access issues, because application-specific knowledge of semantic structure and metadata layers is still required. Semantic structures and metadata compatibility will require convergence on naming schemas (e.g. the Climate and Forecast (CF) convention). Other services, as outlined in Chapter 6 “Toward a SOA”, must be built step by step, with each new service adding to the overall architecture.

[5] As an organizing principle for data, “Point Data” is the lowest common denominator. Most structural data types are reducible to collections of points, though with a loss of essential semantics in most cases. For example, a grid may be represented as a collection of ordered tuples. Some types of measurements, for example tsunami occurrences or species sightings, naturally possess limited structure. For these measurements, the Point Data structure is the natural classification. Note that real-time delivery of data will generally remove time structure, so that, for example, a collection of time series may reduce to Point Data when accessed in real time.

Key to the success of pull technologies is demonstration of host-side data manipulation and sub-setting so that only needed data, not entire files, could
  32. 32. be retrieved. Thus the data transfer time and network bandwidth requirements could be minimized. Agencies and institutions benefit greatly from an emphasis on low cost of buy-in, e.g. keeping standards, protocols, and software components simple and lightweight enough to be adapted and deployed without a dedicated team of local information technology experts.

Within NOAA’s SOA, components need to be evaluated and merged where individual components provide one or more services to other services or to clients. These collections of components, which cut across NOAA goals, will adapt continuously to user requirements through an iterative, spiral software development approach.

There are several areas where pilot projects will enhance NOAA’s understanding of existing standards, permit NOAA to evaluate technologies that can be applied to data systems integration, speed the development of a systems architecture, and enhance the prospects for success of GEO-IDE. A list of perceived challenges and choices is given below, along with existing technologies that could be investigated and applied toward GEO-IDE. These include:
• Security: Explore security implications of Web services and methods to access proprietary data.
• Metadata: Apply proposed standards to NOAA data in order to identify and locate data.
• OGC: Investigate mechanisms to integrate OGC standards into NOAA data management systems.
• Data Transport: Explore data transport mechanisms to improve the movement of data across the network.
• Structural Data Typing: Categorize and build common mechanisms to access NOAA data for a specific community (e.g. Ocean Datasets). Extend to other communities when appropriate.
• Integration: Link CLASS and NOMADS under a common Web services infrastructure to support the discovery, access, and transport of data.
Recommended projects for application-specific use of semantic structure, client and server-side processing, standards advancement, and metadata resources include:
• GrADS Data Server (and its underlying “Anagram Server”)
• Live Access Server
• OGC Standards: Catalog, Web Services (OWS) [6], Coverage, Map, and Feature Services
• Earth Observing System Clearinghouse
• Global Change Master Directory (GCMD) Catalog Service
• Earth Observing Clearing House (ECHO)
• Open Abstract Data Distribution Environment (ADDE)
• THREDDS Data Server/Catalog Services

[6] For more information on OWS see http://portal.opengeospatial.org/files/?artifact_id=10380
  33. 33. NOAA GEO Integrated Data Environment CONOPS 27 5. Governance Structure and Program Control 5.1.Background Effective management of NOAA’s Data Management systems requires a strong governance structure and a well-defined process to ensure each program component is effectively monitored and appropriately managed. NOAA’s initiatives in the integration program, given the magnitude and breadth of the data management program, require many components operating in an integrated and synchronized manner. A well-defined governance process and structure will better ensure that planning and control processes are constructed, resources are used wisely, and measurable results will be delivered. Proper governance processes are crucial to how a program is managed. It will ensure that roles and responsibilities of all associated entities are clearly articulated; that the program is managed as a portfolio of projects, carefully selected according to clear, repeatable processes and objective criteria; that projects are well designed, properly implemented, and effectively managed; and the overall program performance is regularly assessed and evaluated. A well-executed governance process will help protect the data management program from being distracted from achieving program goals and objectives. Having a sound, proven process for managing the performance and outcomes associated with this program is the best insurance against these pressures. A well-defined governance process will ensure we provide clear assurances that technology investments are necessary, purposeful, and will result in demonstrated improvements in mission effectiveness and serve society’s needs. The NOAA Data Management Committee (DMC) was established by the NOSC to coordinate the development and implementation of data management policy across NOAA. 
The DMC addresses issues and opportunities that require coordination among the Goal Teams, Line Offices, and Data Centers to address data management responsibilities. The DMC’s objective is to provide clear guidance to NOAA on matters of data management and to provide the NOSC with the information it needs to bring about integrated data management within the NOAA Observing Systems Architecture. The DMC established the Data Management Integration Team (DMIT) to develop the GEO-IDE Concept of Operations and provide expertise and advice on the near-term (5- year) actions needed for implementation. DMIT includes representatives from all NOAA line offices and goals. To ensure synergy and effective coordination with IOOS DMAC activities, all NOAA members of the DMAC Steering Team are also members of DMIT. The relationships between these groups are illustrated in Figure 5.1. With respect to information management standards, the DMC recommends that GEO- IDE be identified as the NOAA coordination group for all NOAA interactions with external standards activities relating to scientific, geospatial or environmental data and information. 27
  34. 34. Figure 5.1 – GEO-IDE governance structure (organization chart; box labels: Under Secretary of Commerce for Oceans and Atmosphere; NOSC, NOAA Observing System Council; All Line Offices; All Goals; DMC, NOAA Data Management Committee; DMIT, Data Management Integration Team)

5.2. Governance

5.2.1. Guiding principles

The success of GEO-IDE depends on properly run and coordinated operations at the global level. If the program is to be successful, effective management is required at all levels. Since GEO-IDE is a global program that aims to produce consistent and comparable data sharing and standards for all NOAA Line Offices, we must establish standards and provide guidance to all levels of the organization. A governance process, by definition, requires discipline, consistency, collaboration, and communication. The principles below shall guide program management efforts, management decision-making, evaluation, and the pursuit of meaningful results:

Line office and goal team participation – DMIT strongly encourages full participation in the program by each line office and goal team. Business and technical expertise should be represented at all levels of the governance structure.

Transparency – All participants have a clear view into the governance process, program plans, project plans, business processes, and other elements and components of the program. Standard policies and procedures must be understood at all levels.

Proper representation – DMIT members represent both the program’s national needs and their own agency. DMIT members need to have a broad understanding of NOAA’s data management needs and also be sensitive to the customized solutions that practical situations and the true business needs of specific programs may require.
Committee members will wear a “big hat” when working on issues of national importance; “small hats” are appropriate when it is necessary to interpret the voice of the customer and to translate this into investment strategies that are relevant to all players.
