The National Virtual Observatory: Leveraging the
Astronomical Resources of the Nation for Astrophysics in
the 21st Century
A National Mandate:
The National Virtual Observatory (NVO) needs to provide a unified and
negotiated architecture to the nation’s unparalleled, long-term
investment in astronomical observations. In addition, the NVO should
support a negotiated architecture for the support of laboratory and
theoretical modeling results to facilitate the analysis of this same
astronomical data. The development of the NVO is complicated by the
heritage of existing programs and the legacy of existing systems. In
the interest of future scientists and their research programs, NASA
needs to lead a tactical and strategic effort to federate university,
industry, and government offices under a seamless multi-node program
dedicated to astrophysical research. The NVO should be this program.
In return, NASA’s drive toward federation will enable highly focused
new mission concepts, enhance the return of an investment in research
to the public and the nation’s educators, and develop a national market
place for analysis, visualization, and archival tools. The NVO will
provide a higher return on the nation’s astrophysical research
investment than is currently possible.
After many decades of working in space, the NASA vision of space
science missions divides into three phases: research, flight mission
development, and mission operations and analysis. This cycle has been
iteratively followed through a number of NASA observatory programs at
wavelengths spanning the gamma-ray, x-ray, and ultraviolet through
the infrared and sub-millimeter. Without NASA flight opportunities and
dedicated satellite platforms, observational astrophysical research at
these wavelengths would not have been possible. Following the launch
of SIRTF in the coming years, NASA will have completed its road map for
launching the nation’s great observatory program. NASA programs have
served as inspiration to the astronomical community and the nation and,
as a result, have reshaped our view of the universe and have greatly
influenced the practice of fundamental astrophysical research. NASA
must now lead an effort reaching beyond federal organizations, beyond
international boundaries, and into University and Government labs –
including (but not limited to) programs and facilities created by NASA
– to federate the nation’s astronomical archives, laboratory research,
and theoretical modeling efforts. The federation of the astrophysical
research community, organized by wavelength regimes, observational
analysis, and theoretical modeling, will, in addition to the benefits
above, provide a leveraged and growing legacy of NASA’s multi-decade
investment in the space sciences.
Federated systems are a natural by-product of evolutionary development.
Riegler [1] has modeled the lifecycle of space science missions as a
federated series of programs:
- Research: Gives rise to mission concepts and test key
concepts – theoretical studies, new instrumentation,
exploratory ground based and sub-orbital research.
- Flight Development: Mission studies, development and planning,
check out and testing.
- Operations and Data Analysis: Planned observations, data
interpretation, confirm or revise theoretical modeling.
As illustrated in the figure below, each phase flows naturally into the
one following. Operations and data analysis typically result in new
scientific hypotheses, which require new missions and new technology,
after which the cycle repeats.
[Figure 1 diagram not reproduced. Recoverable node labels: Observer's Plan, Observer's Proposal & Experiment, Archive, Analyzed Data, Data, Data Collection Plan, Mission Plan, Quick Look, Publication, Simulation, QLA.]
Figure 1. The iterative lifecycle of NASA science
missions: clear communications are essential among all
lifecycle phases.
What is evident from figure 1 is the need for clear communications
between all phases of the science mission lifecycle. Above all else,
the process of federation establishes clear channels of communication.
Many other systems are federated. For example, NASA space flight and
research centers, the US military (Army, Navy, Air Force, and Marines),
the US economy, the human body, and the internet are all federated
systems. All these examples have defined means of communication. As
communication improves, so does performance. In many cases, systematic
problems can be linked to problems in communication. Defects in human
DNA (mistaken communications) can produce the wrong amino acids and
have serious life threatening consequences. Clear and standardized
communications are key to the success of all federated systems.
In addition to communications, federated systems must have clear
guidelines for functional performance and expected deliverables. Each
component must satisfy a specific need of the overall federated system.
In the US economy, buyers and sellers compete for one another’s
business based upon negotiated offerings. When viewed from afar, the
market place is complex. But in an ideal economy, the solutions are
often optimal. ‘Market place’ standards can specify the federated
principles in any number of management, engineering, and scientific
disciplines. NASA must work to federate the existing astrophysical
‘market place’ to provide an efficient means by which research and
scientific data are clearly communicated among ‘market place’ producers
and consumers. These standards of the ‘market place’ should be defined
under NASA’s NVO program.
With a large number of astrophysics-specific missions in operation (9),
development (9), and study (10), along with a growing list of
previously archived science data (available from STScI, IRSA,
HEASARC, NSSDC, etc.), NASA is aware of the need to federate science
data visualization and archival activities. A broad series of reports
addresses many of the issues in a federated data and analysis research
center: the 1982 CODMAC report, the 1996 final report of the Task Group
on Science Data Management [2], the 1996 – 2000 Senior Review of
Astrophysics Mission Operations and Data Analysis Programs [3], the
1997 – 2000 meeting reports of the Astrophysics Working Group [4], and,
most recently, the 2001 – 2010 decadal review by the Astronomy and
Astrophysics Survey Committee [5]. Together, these documents
highlight the needs and requirements of the astrophysics
community in developing a federated data program. Given the
opportunity at hand, NASA can capitalize upon the current NVO call to
develop a similar federated structure for laboratory and theoretical
research programs. Many of these efforts are already federated to some
degree. NASA needs to join the existing components together and
create a functional ‘market place’ of astrophysics research which will
benefit the scientific community, the nation’s educators, and the
broader taxpaying public.
Specifically, the science community is asking for a unified approach for
science data centers and national archives to exchange data from a
large number of astrophysical missions. Many researchers find that the
large number of visualization tools required for archival research and
idea generation, proposal preparation and mission planning, and data
reduction and analysis are specific to either missions or science
centers. Astronomical software developers find that supporting the
documentation needs of the community, the programming needs of the
facility, and the algorithmic needs of the instrument teams requires a
duplication of effort in all three areas. Since the earliest reports on
data center management were issued, and even since 1995, the
connectivity and software modeling tools of the information science
community have permanently changed our world. Although most
astronomical data center developers (> 80%) use object oriented
languages (C/C++/JAVA/Visual Basic), almost half (45%) do not use object
oriented modeling methods (e.g. OMT, Booch, or UML) [6]. Of this
group, only a small fraction (2%) have a formal degree in computer
science although almost half (48%) have doctorates. The need for a
federated program to guide center developers in software management and
engineering practices should also be addressed within NASA’s NVO
program.
The issues of the NVO are not unique to NASA, nor are they unique to
science. NASA must work with university, government, and industrial
partners to support the development of a federated network of
astrophysics and information science centers to address the needs of
the science research community, educators, and the public.
A National Mandate and Plan for the Virtual Observatory:
The NASA vision of astrophysical research must drive the development of
the NVO. Creation of a successful federated system – moving forward
from existing subsystems – requires a vision, a mission, and a clear set
of values. The vision is a statement of the future – where the community
needs to go. The mission serves as a charter for the organization. The
values are the principles upon which the organization is based. And a
set of specific, measurable, realistic, and time-oriented goals is
needed to measure the progress in achieving these three. USRA suggests
the following.
The NASA NVO Vision:
The astronomical community needs to share a common set of values and
approaches for software management and development. These values are
reflected in the choice of tools and processes by which the community
develops applications whose strength and longevity arise from a
growing interactive network of scientists, engineers, and managers.
The NASA NVO Mission:
The NVO is chartered to provide the community with the recommended best
practices for the developments at hand. This not only spans ranges of
technology readiness (perhaps focusing on only the highest TR levels)
but also addresses the needs of the observational and theoretical
astrophysical community.
The NASA NVO values:
1. That open and competitive community wide participation between
University, Government, and Industrial research centers is essential to
the scientific, engineering, and management health of the NVO.
2. That "best practices", "lessons learned", “peer review”, and
"appropriate management infrastructures and insight" are the guiding
principles of the NVO.
3. That evolving developments in the field of software engineering
must sit side-by-side with established mechanical, optical, electrical,
and cryogenic disciplines within the country's astronomical community.
The NASA NVO Goals:
1. Immediately fund a study team to generate US community buy-in by
2002 on the broadened approach of the NVO. Formulate and issue an NVO
lead institution proposal call by 2002.
2. Have an NVO lead institution selected and funded by 2004.
3. Have the lead institution established as a central facility (like
the astrobiology center) and issue a proposal call for NVO development
by 2004.
4. Have funded six 'legacy type' proposals - with institutional leads
and partners to address the known needs of the community (e.g.
astronomical tool visualization, archiving practices, instrument
control, observational modeling and data reduction) by 2006.
5. Have each proposal meet a series of deliverables and milestones in
community workshops, continued education, product deliverables,
measures of community satisfaction, etc. each year between 2007 and
2010.
6. Re-assess the direction of the NVO and the current leadership in
2008. Assess the successes of proposal deliverables in 2010.
7. Extend, re-bid, or re-assess the success of the lead NVO institution by
2012.
8. Begin phase II of the NVO activity in 2014.
A USRA led effort within the NASA NVO program:
The Universities Space Research Association (USRA) with its present and
recent programs to operate major NASA facilities such as the
Stratospheric Observatory for Infrared Astronomy (SOFIA), the Research
Institute for Advanced Computer Science (RIACS), the Center for
Excellence in Space Data and Information Sciences (CESDIS), and the
Lunar and Planetary Institute (LPI) understands the issues of the NVO
in detail. Furthermore, USRA foresees the need to involve actively the
national university community in the NVO initiative at an early stage.
Clearly, the dynamic interaction of the research and education that
characterizes the university academic environment will be an essential
component of a successful NVO program. USRA, with its 85 member
institutions, has the expertise necessary to pull the university
community together in support of the NVO program.
As an example of this approach, USRA offers the successful
collaboration engineered to produce the design of the Data Cycle System
(DCS) for SOFIA. We describe how the USRA approach to teaming should
benefit the development and foundation of the NVO.
The SOFIA Data Cycle System Development:
The SOFIA DCS is designed for the general needs of an observatory and
instantiated for the specific needs of SOFIA. The DCS covers the
complete science cycle of the observatory. The cycle starts with
research into archives of previous observations and continues through
proposal submission, observation, and analysis planning. It includes
instrument scripting, data reduction, and analysis. It concludes with
interpretation and publication followed by archival storage of the
results. The DCS serves as a micro-economic prototype of the NVO
concept. The DCS architecture is engineered for distributed development,
with an ability for load sharing among networked systems and users. By
developing a system that enables fluid access to existing models and
data and that supports distributed development, modeling, and analysis,
the DCS will facilitate the planning and decision making process of the
observatory science programs. The work done to date on the development
of the SOFIA DCS architecture and operating prototype shows that the
core system has the capability to support the broader science cycle
planning and decision analysis tool needed to support observatory
operations.
[Figure 2 diagram not reproduced. Recoverable stage labels: Develop Proposal, Awareness, Publish, Review Proposal, Refine Proposal, Flight Planning, Scheduling, Analysis Planning, Observation Planning, Make Observations, Analyze Data, Scientific Interpretation.]
Figure 2 – The SOFIA Data Cycle System is based upon technologies
adopted as current industry standards, platform neutral, and
vendor neutral. Using a foundation of CORBA and XML, the
architecture is flexible to accommodate future growth and change.
An Overview of the USRA SOFIA Data Cycle System:
The scope of the DCS is to address the complete astronomical science
data cycle. The system has been in design and development since early
1999 and has passed through four successful reviews. USRA leads a team
in which the core architecture development is done by the Center for
Imaging Science at the Rochester Institute of Technology. The team
includes experts from the following organizations: GSFC, ARC, Sterling
Software, UCLA, U. Chicago, Cornell and IPAC. The core DCS architecture
design has been completed, and prototype software was written and
demonstrated to an external review team in November 2000.
The following items comprise the goals of the DCS as presented to the
SOFIA Science Council in December 1998:
- Complete end-to-end data cycle support maximizes scientific return
- Capitalize on lessons learned
- Ease of use
- Relentless observing
- Interfaces with a uniform look and feel
The goals of the DCS were summarized as: More photons + more
astronomers + more archival research = more scientific return
The gain in SOFIA scientific performance is accomplished by combining:
- Excellent community access
- Uniform interface for general investigator operations
- Uniform approach to facility instrument operations
- Uniform interface to data reduction tools
- Uniform production of data products for archival research
In developing the DCS requirements and the DCS core prototype, USRA
finds a clear correspondence to the needs of the NVO. The fact that
the design has reached a level of maturity that enables the
construction of a full prototype for testing by the summer of 2001
provides a synergistic opportunity with NVO.
An illustration of the DCS concept is shown in Figure 2. The goal is
to construct a system that enables fluent collaboration between the
scientists and the observatory to conduct the complete cycle of
observation planning, data collection, data reduction, analysis and
archiving. The DCS will enable the scientist to model the expected
results of an observation before the observation is made. It will be
possible to construct simulated data sets based on instrument models,
construct pipeline models based on existing modules written by
instrument team and science center experts, access supporting data from
archives, and simulate the observation and data reduction cycle. The
plan that is produced will include the flight plan, instrument scripts
for data acquisition, and the scripts for the data pipeline. The DCS
will support final reduction of the data, archiving of all raw and
final products, and archiving of publications.
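
As an illustration of the plan described above, the following is a minimal Python sketch of a record that bundles the three plan products (flight plan, instrument scripts, pipeline scripts) and serializes them to XML. The class name, element names, and sample values are assumptions made for this example; the paper does not specify the DCS plan format.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class ObservationPlan:
    """Hypothetical bundle of the three plan products named in the text."""
    target: str
    flight_plan: list = field(default_factory=list)         # ordered flight legs
    instrument_scripts: list = field(default_factory=list)  # data acquisition steps
    pipeline_scripts: list = field(default_factory=list)    # data reduction steps

    def to_xml(self) -> str:
        """Serialize the plan; tag names here are illustrative only."""
        root = ET.Element("plan", target=self.target)
        for tag, items in (
            ("flight", self.flight_plan),
            ("acquire", self.instrument_scripts),
            ("reduce", self.pipeline_scripts),
        ):
            section = ET.SubElement(root, tag)
            for item in items:
                ET.SubElement(section, "step").text = item
        return ET.tostring(root, encoding="unicode")


plan = ObservationPlan(
    target="M42",
    flight_plan=["leg-1"],
    instrument_scripts=["expose 30s"],
    pipeline_scripts=["flat-field", "coadd"],
)
plan_xml = plan.to_xml()
```

Keeping all three products in one document means the plan that was simulated is the same artifact that is later flown, executed, and reduced.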
Key Software Technologies and Architectures for the DCS:
Requirements for the SOFIA observatory include providing twenty years
of service to the astronomical community. As change is the only
constant in today’s computing industry, it is critically important that
the DCS not be tied to any single platform or vendor standard.
Therefore, the underlying technologies used in the DCS are those proven
to be industry standard, platform neutral, vendor neutral, and flexible
enough to accommodate future growth and change. Accordingly, the DCS
uses foundation technologies such as CORBA and XML.
Using CORBA and XML as its foundations, then, the Data Cycle System
embodies the “continuous improvement” paradigm. As new image reduction
algorithms are developed, as new telescope instruments are created, as
new ways of planning and executing observations are formulated, the DCS
can easily incorporate these improvements. Additionally, these
improvements are additions to, not replacements of, the software
components that make up the system. It is important to maintain the
data and algorithms used in previous practice as a reference when
evaluating potential new methodologies. Continuous improvement and
mechanisms that enable it have been a part of the DCS since its design.
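
The "additions, not replacements" idea can be sketched as an append-only registry in which a new algorithm version is stored next to the older ones, so that earlier practice stays available as a reference. This is a hedged Python illustration only; the registry class, the algorithm names, and the version strings are hypothetical and are not taken from the DCS design.

```python
from typing import Callable, Dict, List, Tuple

Algorithm = Callable[[List[float]], float]


class AlgorithmRegistry:
    """Append-only store: registering a new version never removes an old one."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, Algorithm]]] = {}

    def register(self, name: str, version: str, func: Algorithm) -> None:
        self._versions.setdefault(name, []).append((version, func))

    def latest(self, name: str) -> Algorithm:
        return self._versions[name][-1][1]

    def get(self, name: str, version: str) -> Algorithm:
        for ver, func in self._versions[name]:
            if ver == version:
                return func
        raise KeyError(f"{name} has no version {version}")

    def versions(self, name: str) -> List[str]:
        return [ver for ver, _ in self._versions[name]]


def mean_combine(samples: List[float]) -> float:
    return sum(samples) / len(samples)


def median_combine(samples: List[float]) -> float:
    ordered = sorted(samples)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


registry = AlgorithmRegistry()
registry.register("combine", "1.0", mean_combine)
registry.register("combine", "2.0", median_combine)  # an addition, not a replacement

samples = [1.0, 2.0, 9.0]
old_result = registry.get("combine", "1.0")(samples)  # earlier practice, kept for reference
new_result = registry.latest("combine")(samples)      # candidate improvement
```

Running both versions on the same input is what makes it possible to evaluate a proposed method against previous practice.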
The DCS is a federation of software modules, implemented using CORBA
and communicating via XML. The DCS federation is grouped into five
functional domains. These domains are Data Acquisition, Data Reduction,
Data Storage, User Interaction, and the Task Library. We show in Figure
3 the relationship of each domain. We describe the detail of all five
domains in Appendix A of this paper. The elements within each
functional domain, or subsystem, implement a well-defined set of
interfaces to the rest of the system. In this way, the implementation
details of a given component are not important to other software
components in the system; indeed, those details are not known. This
object-oriented encapsulation of functionality is instrumental in
smoothly connecting software components developed by potentially
different developers at different times together into a cohesive whole
called the Data Cycle System.
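
The encapsulation described above can be illustrated with a small Python sketch in which calling code depends only on an interface, while two unrelated implementations sit behind it. In the DCS the interfaces are CORBA interfaces; the Python abstract base class below is only a stand-in for that idea, and every class and function name is an assumption made for this example.

```python
from abc import ABC, abstractmethod


class DataStorage(ABC):
    """Hypothetical interface; callers depend on this, not on an implementation."""

    @abstractmethod
    def put(self, key: str, document: str) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> str:
        ...


class InMemoryStorage(DataStorage):
    """One implementation: a plain dictionary."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, key: str, document: str) -> None:
        self._items[key] = document

    def get(self, key: str) -> str:
        return self._items[key]


class AppendLogStorage(DataStorage):
    """A different implementation, written independently, behind the same interface."""

    def __init__(self) -> None:
        self._log = []

    def put(self, key: str, document: str) -> None:
        self._log.append((key, document))

    def get(self, key: str) -> str:
        # The most recent entry for a key wins.
        for logged_key, document in reversed(self._log):
            if logged_key == key:
                return document
        raise KeyError(key)


def archive_result(storage: DataStorage, key: str, document: str) -> str:
    """Client code: it never inspects which implementation it was given."""
    storage.put(key, document)
    return storage.get(key)
```

Either backend can be swapped in without changing `archive_result`, which is the property the text attributes to components developed by different developers at different times.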
The DCS is a flexible, capable, and powerful system that enables the
operation of an astronomical observatory. It is adaptable to future
improvements in instruments, computers, software, and the science needs
of astronomers.
DCS Management, Engineering, and Science leadership:
Within the USRA SOFIA program, the DCS development has designated
management, engineering, and science leads. Each lead is partnered
with a deputy to provide a solid management foundation during
development. The current positions are Bob Hovde (USRA) – Management
lead, Peter Sharer (Sterling Software) – Management Deputy, John
Graybeal (Sterling Software) – Engineering Lead, Bob Krzaczek (RIT) –
Engineering Deputy, Sean Casey (USRA) – Science Lead, and Joel Kastner
(RIT) – Science Deputy.
For the NVO white paper development, USRA involved participation from
Eric Becklin (USRA), Sean Casey (USRA), Ian Gatley (RIT), Joel Kastner
(RIT), Jacques L’Heureux (USRA), Bob Krzaczek (RIT), Barry Leiner
(USRA), Mark Morris (UCLA), Harvey Rhody (RIT), and Peter Sharer
(Sterling Software). USRA’s involvement in SOFIA, RIT’s involvement in
the South Pole astronomy program, and UCLA’s participation in Keck,
etc., combined with the joint experiences of the individuals involved,
provide the foundation for USRA to move forward as a strong lead within
NASA’s NVO program.
The NVO Development:
The NVO will touch on all parts of the process of scientific inquiry
and will be the central system through which all data and processing
resources are accessed. The NVO will necessarily be distributed, to
connect geographically distributed users and geographically distributed
data sources. Indeed, a major challenge is the distributed and
heterogeneous character of the basic environment. Kepner and McMahon
[7] make a case that the NVO address all of the following elements: (1)
Acquisition and storage of raw data; (2) Data reduction; (3)
Acquisition and storage of detected sources; and (4) Multi-sensor/multi-
temporal data mining. They note that “the NVO’s core data mining and
archive federation activities are heavily dependent on the underlying
data pipeline software necessary to translate the raw data into
scientifically relevant source detections.”
Furthermore, an open architecture is required for any data system
serving the NVO; the community must be able to contribute to the
growing body of data calibration and analysis knowledge, as expressed
in algorithms and/or code, that is available to all NVO users.
A definition of established roles within NVO is required. This should
necessarily include the assignments of:
A) Project Scientist – an astrophysicist with extensive experience in
the development of data and archive systems who would lead the top-
level science requirements definition for the NVO program based upon
input from an established science definition team and the community.
B) Project Engineer – an information technology expert who would
oversee the design, development, and implementation of an overall
architecture that satisfies the specified science requirements of the
program.
C) Project Manager – an individual experienced in leadership within
the science and engineering community, able to interact closely with NASA
Headquarters management, and able to guide science and
engineering definition and development efforts to meet the goals of the
NVO program.
The USRA track record of experience and accomplishments:
USRA is directly involved in center activities at the NASA Goddard,
Marshall, and Johnson Space Flight Centers as well as NASA’s lead
center for information technology, the Ames Research Center. USRA has a
successful track record of working closely with NASA centers and with
NASA Headquarters staff. USRA also has working relationships with external
NASA science centers – the Space Telescope Science Institute at Johns
Hopkins University and the SIRTF Science Center at the California
Institute of Technology. USRA is tied to NSF, other government
programs, and other non-profit organizations (e.g. AURA) through its
direct University membership consortium. Working teaming relationships
include USRA partnering with for-profit companies such as Raytheon
(SOFIA) and Lockheed (CSOC). In addition, USRA has a demonstrated
record of spinning off appropriate research developments as start-up
businesses within the country’s new economy (e.g. Scyld Computing
Corporation - the original Beowulf team providing Second Generation
Beowulf Clustering [8]).
USRA is available to provide strong leadership in organizing the
community to speak to NASA’s request for an NVO development roadmap.
USRA has done so before and is able to repeat past successes in support of
NASA’s requests.
References
[1] Guenter Riegler, “The Lifecycle of Space Science Missions,”
Research Division, Office of Space Science, NASA Headquarters,
September 13, 2000. http://www.aas.org/policy/NASALifeCycle.htm
[2] Jeffrey Linsky, “Final Report of the Task Group on Science Data
Management to the Office of Space Science”, October 23, 1996.
http://adc.gsfc.nasa.gov/~gass/linsky/report_available.html
[3] NASA Senior Review Reports:
2000 - http://spacescience.nasa.gov/codesr/results/SenRev00.PDF
1998 - http://spacescience.nasa.gov/codesr/results/SenRev98.PDF
1996 - http://spacescience.nasa.gov/codesr/results/SenRev96.PDF
[4] The "Astrophysics Working Group" (AWG) meeting reports:
http://www.astronomy.ohio-state.edu/~AWG/Meetings/
[5] The Decadal Report: Astronomy and Astrophysics in the New
Millennium, Astronomy and Astrophysics Survey Committee, National
Research Council - http://books.nap.edu/catalog/9839.html
[6] A. F. Granados, “A ‘Scientific’ Approach to Software Project
Management: Part II: Results of a Survey of Scientific Computing”, in
Astronomical Data Analysis Software and Systems IX, ASP Conference
Series, 2000, Vol 216, N. Manset, C. Veillet, and D. Crabtree, eds.
[7] Jeremy Kepner and Janice McMahon, “High Speed Interconnects and
Parallel Software Libraries: Enabling Technologies for the NVO,” in The
Evolution of Galaxies on Cosmological Timescales, ASP Conference
Series, 1999, J. E. Beckman and T. J. Mahoney, eds.
[8] Scyld Computing Corporation http://scyld.com/
Glossary:
CORBA, the Common Object Request Broker Architecture, is an industry
standard set of specifications that define how to distribute an object-
oriented software architecture across platforms and networks, and how
components of that architecture remotely inter-operate with each other.
It enables the communication between software objects that may be
located on multiple dissimilar machines and implemented in different
languages. CORBA is not a software product. Rather, it is a standard
maintained by the Object Management Group, a consortium of hundreds of
organizations. Software applications based on CORBA can, by definition,
communicate with each other despite being implemented on a variety of
vendor platforms.
XML, the Extensible Markup Language, is a remarkably flexible yet
simple method for representing and structuring information. Its primary
strength and advantage over other formats for information exchange is
its ability to be extended to accommodate future data structures.
Rather than implementing a fixed set of structures for information,
developers using XML can extend a given structure repertoire by taking
advantage of in-document definitions: an XML document can define its
structure along with the data it represents. This gives software
developers a very rich and powerful mechanism for both defining
information exchange now, as well as accommodating future enhancements
and extensions. The use of XML directly addresses problems associated
with heterogeneous data structures.
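
The extensibility claim can be made concrete with a short Python sketch: a reader written against an early document layout keeps working when a later layout adds a new element, because it looks up elements by name and ignores the rest. The element names and values are invented for this example, and the standard-library parser used here does not validate documents against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# Two generations of a hypothetical observation record.
V1_DOC = "<observation><target>M42</target><exposure>30</exposure></observation>"
V2_DOC = (
    "<observation><target>M42</target><exposure>30</exposure>"
    "<filter>K-band</filter></observation>"  # element added in a later revision
)


def read_observation(xml_text: str) -> dict:
    """Reader written for the first layout; elements it does not know are ignored."""
    root = ET.fromstring(xml_text)
    return {
        "target": root.findtext("target"),
        "exposure_s": float(root.findtext("exposure")),
    }


old_reader_on_v1 = read_observation(V1_DOC)
old_reader_on_v2 = read_observation(V2_DOC)
# A newer reader can pick up the added element without breaking older ones.
new_field = ET.fromstring(V2_DOC).findtext("filter")
```

A fixed binary record layout would typically need every reader updated at once; with named elements, readers and documents can evolve at different rates.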
Appendix A:
Below is a description of the domains which comprise the DCS core
prototype. The federation of modules provides a flexible architecture
to support the continuous improvement in SOFIA instruments, computers,
software, and the science driven needs of astronomers and the
observatory.
The Data Acquisition subsystem is responsible for rendering, or
translating, a planned observation into the idiosyncratic requirements
of a particular instrument. Rather than re-implementing observation
software for each instrument supported by the DCS, the system instead
maintains observation plans in a flexible neutral internal format. When
interacting with an instrument, the DCS applies a collection of
software components to gradually refine and translate that observation
into the format required by the instrument. This approach minimizes
redundant software development and maximizes re-use of code between
similar instrument science teams.
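
The gradual refinement described above can be sketched in Python as a chain of small translators applied in order to a neutral plan. The field names, the filter wheel positions, and the command syntax below are all made up for illustration; real instrument formats are not described in this paper.

```python
from typing import Callable, Dict, List

Plan = Dict[str, object]
Translator = Callable[[Plan], Plan]


def to_instrument_units(plan: Plan) -> Plan:
    """Convert neutral seconds into the milliseconds this pretend instrument uses."""
    out = dict(plan)
    out["exposure_ms"] = int(out.pop("exposure_s") * 1000)
    return out


def select_filter_wheel(plan: Plan) -> Plan:
    """Map a neutral filter name to a (made-up) wheel position."""
    wheel = {"J": 1, "H": 2, "K": 3}
    out = dict(plan)
    out["wheel_pos"] = wheel[out.pop("filter")]
    return out


def render_command(plan: Plan) -> Plan:
    """Produce the final instrument-specific command string."""
    out = dict(plan)
    out["command"] = f"EXPOSE {out['exposure_ms']} WHEEL {out['wheel_pos']}"
    return out


def translate(plan: Plan, steps: List[Translator]) -> Plan:
    """Apply each translator in order; every step returns a new plan."""
    for step in steps:
        plan = step(plan)
    return plan


neutral_plan = {"target": "M42", "exposure_s": 2.5, "filter": "K"}
rendered = translate(
    neutral_plan, [to_instrument_units, select_filter_wheel, render_command]
)
```

Supporting a second, similar instrument would then mean swapping or adding one translator rather than rewriting the whole chain, which is the re-use argument made in the text.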
The Data Reduction subsystem implements algorithm pipelines, converting
raw instrument data into useful science products. This reduction is
automatic; when raw data is introduced to the DCS, an appropriate
pipeline is immediately scheduled and executed at the earliest
opportunity. The DCS supports pipelines running on machines not only at
the SOFIA Science Center, but distributed across networks and even
reaching to a researcher’s desktop. Someone implementing a new pipeline
algorithm, or evaluating an improvement to an existing algorithm, can
easily use the DCS as a “test harness” for their software. All the data
and algorithms available to a general investigator are available to a
developer for testing and evaluation of improvements to the DCS.
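
A minimal Python sketch of the automatic reduction behaviour follows: raw data is queued for reduction the moment it is ingested, and the pipeline registered for its instrument runs when the scheduler next gets a chance. The scheduler class, the instrument name, and the two toy stages are assumptions for this example, and a real system would distribute the work across machines rather than run it in one process.

```python
from collections import deque


class ReductionScheduler:
    """Queues a reduction job whenever raw data is ingested."""

    def __init__(self):
        self._pipelines = {}   # instrument name -> ordered list of stages
        self._queue = deque()  # pending (data_id, instrument, raw) jobs
        self.products = {}     # data_id -> reduced product

    def register_pipeline(self, instrument, stages):
        self._pipelines[instrument] = list(stages)

    def ingest(self, data_id, instrument, raw):
        # Scheduling happens on arrival; no separate request is needed.
        self._queue.append((data_id, instrument, raw))

    def run_pending(self):
        # "Earliest opportunity" is modelled here as an explicit call.
        while self._queue:
            data_id, instrument, data = self._queue.popleft()
            for stage in self._pipelines[instrument]:
                data = stage(data)
            self.products[data_id] = data


def subtract_bias(pixels, bias=100):
    return [p - bias for p in pixels]


def clip_negative(pixels):
    return [max(p, 0) for p in pixels]


scheduler = ReductionScheduler()
scheduler.register_pipeline("demo-cam", [subtract_bias, clip_negative])
scheduler.ingest("obs-1", "demo-cam", [90, 150, 300])
scheduler.run_pending()
```

Because a pipeline is just an ordered list of stages, a developer evaluating a new stage can register an alternative pipeline and feed it the same raw data, which matches the "test harness" use described above.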
The Data Storage subsystem provides a repository of information to the
distributed components of the DCS. More than just “data” is maintained
in DCS Storage; experiments, observation plans, flight logs, reduction
pipelines, and proposals are all maintained within DCS Storage, along
with the expected archives of raw and reduced science data. All the
information in this subsystem is maintained in a neutral format that is
easily processed by software components within the DCS, and is easily
exchanged with external DCS customers and archives.
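
One way to picture a neutral storage format is a single envelope that wraps every kind of record, whether proposal, flight log, or science data, so that any component can read any record. The Python sketch below uses an XML envelope with invented element and attribute names; it is an assumption for illustration, not the DCS storage schema.

```python
import xml.etree.ElementTree as ET


def wrap_record(kind, record_id, fields):
    """Wrap any record in the same envelope, whatever its kind."""
    root = ET.Element("record", kind=kind, id=record_id)
    for name, value in fields.items():
        ET.SubElement(root, "field", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def unwrap_record(xml_text):
    """Recover (kind, id, fields); field values come back as strings."""
    root = ET.fromstring(xml_text)
    fields = {f.get("name"): f.text for f in root.findall("field")}
    return root.get("kind"), root.get("id"), fields


class NeutralStore:
    """Toy repository holding only envelope documents."""

    def __init__(self):
        self._docs = []

    def add(self, kind, record_id, fields):
        self._docs.append(wrap_record(kind, record_id, fields))

    def ids_of_kind(self, kind):
        return [rid for k, rid, _ in map(unwrap_record, self._docs) if k == kind]


store = NeutralStore()
store.add("proposal", "p-1", {"pi": "A. Observer", "hours": 4})
store.add("flight_log", "f-7", {"altitude_ft": 41000})
store.add("proposal", "p-2", {"pi": "B. Observer", "hours": 2})
proposal_ids = store.ids_of_kind("proposal")
```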
The User Interaction subsystem is a common interface to the
functionality provided by the DCS. In fact, the rest of the DCS needs
to provide no user interfaces; instead, all user interactions are
conducted through a common, yet customizable, interaction layer. In
this way, changes made to a user’s DCS experience (based on, perhaps,
spoken languages or skill levels) do not require changes in the rest of
the system software. The software that drives an instrument, for
example, is unaffected by the user’s preference of German over English,
or the user’s skill level in using the system.
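
The separation described above can be sketched in Python as instrument code that emits language-neutral events and an interaction layer that turns them into text for a given user. The message catalog, event names, and function names are hypothetical.

```python
# Interaction layer: the only place that knows about spoken languages.
MESSAGES = {
    "en": {"exposure_done": "Exposure complete: {seconds} s"},
    "de": {"exposure_done": "Belichtung abgeschlossen: {seconds} s"},
}


def run_exposure(seconds):
    """Stand-in for instrument software: returns a language-neutral event."""
    return {"event": "exposure_done", "seconds": seconds}


def present(event, language="en"):
    """Render an event for the user in the preferred language."""
    template = MESSAGES[language][event["event"]]
    return template.format(seconds=event["seconds"])


event = run_exposure(30)
english_text = present(event, "en")
german_text = present(event, "de")
```

Adding another language touches only `MESSAGES`; `run_exposure` is unchanged, which is the point the paragraph makes about the German and English example.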
The Task Library works closely with the User Interaction subsystem in
supporting the imaging scientist using the DCS. The Task Library
provides the perceived intelligence of the DCS, coordinating the
actions of the various system resources (acquisition, reduction,
storage) in ways that satisfy a user’s requests. A task may be simple
(“locate any data matching a specified target”) or complex (“conduct a
specified observation, reduce its data, store all intermediate and
final results, and email the observation’s authors the results of the
experiment when available”). Most importantly, the Task Library is
where past, present, and future “best practices” are embedded within
the DCS and made available to all of its users in a consistent, uniform
manner.
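
The simple and complex tasks quoted above can be sketched as plain Python functions that coordinate the acquisition, reduction, and storage subsystems. The stand-in subsystem classes and the key layout are assumptions for this illustration, and the notification step from the text is left out to keep the sketch self-contained.

```python
class FakeAcquisition:
    """Stand-in acquisition subsystem returning canned raw samples."""

    def observe(self, target):
        return [3, 1, 2]


class FakeReduction:
    """Stand-in reduction subsystem; sorting plays the role of a pipeline."""

    def reduce(self, raw):
        return sorted(raw)


class FakeStorage:
    """Stand-in storage subsystem backed by a dictionary."""

    def __init__(self):
        self.items = {}

    def save(self, key, value):
        self.items[key] = value


def locate(target, storage):
    """Simple task: find any stored data matching a target."""
    return sorted(k for k in storage.items if k.startswith(target + "/"))


def observe_and_archive(target, acquisition, reduction, storage):
    """Complex task: observe, reduce, and store intermediate and final results."""
    raw = acquisition.observe(target)
    storage.save(f"{target}/raw", raw)
    product = reduction.reduce(raw)
    storage.save(f"{target}/final", product)
    return product


storage = FakeStorage()
product = observe_and_archive("M42", FakeAcquisition(), FakeReduction(), storage)
found = locate("M42", storage)
```

Encoding the sequence once in a task, rather than in each user's own scripts, is how a "best practice" becomes available to every user in the same form.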
Appendix B:
We have included this section (Section 3 - PRINCIPLES OF SUCCESSFUL
MANAGEMENT OF SCIENTIFIC DATA by Linsky et al.) because of its
timeless statement of science software requirements.
“Through extensive case studies, the Committee on Data Management and
Computation (CODMAC) of the Space Science Board, National Research
Council, derived a suite of principles or operating guidelines for
maximizing the scientific utilization for space data (NRC, 1982). The
case studies involved large data centers such as the National Space
Science Data Center (NSSDC), data activities associated with missions,
and relatively small groups of individuals who were working on
generating new products from data long after mission operations had
ceased. The CODMAC principles and associated recommendations have been
used over the past 15 years to help guide data system development
within NASA, including establishing programs at NASA Headquarters,
implementation of well-defined data systems within missions that
include generation and validation of data products, and the recognition
that discipline oriented systems are best suited for maximizing the
scientific utilization of space data.
“CODMAC also provided detailed recommendations for distributed,
discipline-oriented data systems, focusing on expected data sets for
each space science discipline, probable growth in networking and
computational capabilities, and an examination of how each discipline-
oriented community will likely work with data (NRC, 1986)…
“The principles are listed here, and later in the Report we restate
these principles and our recommendations for successful space science
data management that flow from these principles and the new case
studies that we discuss later in this Report. The Task Group members
unanimously concur that the principles remain entirely valid today, and
they were used to help guide deliberations and recommendations.”
1. “SCIENTIFIC INVOLVEMENT. There should be active involvement of
scientists from inception to completion of space missions, projects,
and programs in order to assure production of, and access to,
high-quality data sets. Scientists should be involved in planning,
acquisition, processing, and archiving of data. Such involvement will
maximize the science return on both science-oriented and
applications-oriented missions and improve the quality of applications
data for applications users.”
2. “SCIENTIFIC OVERSIGHT. Oversight of scientific data-management
activities should be implemented through a peer-review process that
involves the user community.”
3. “DATA AVAILABILITY. Data should be made available to the scientific
user in a manner suitable to scientific research needs and have the
following characteristics:”
(a) “The data formats should strike a proper balance between
flexibility and economics of nonchanging record structure. They should
be designed for ease of use by the scientist. The ability to compare
diverse data sets in compatible form may be vital to a successful
research effort.”
(b) “Appropriate ancillary data should be supplied, as needed,
with the primary data.”
(c) “Data should be processed and distributed to users in a timely
fashion as required by the user community. This responsibility applies
to Principal Investigators and to NASA and other agencies involved in
data collection. Emphasis must be given to ensuring that data are
validated.”
(d) “Proper documentation should accompany all data sets that have
been validated and are ready for distribution or archival storage.”
4. “FACILITIES. A proper balance between cost and scientific
productivity should govern the data-processing and storage capabilities
provided to the scientist.”
5. “SOFTWARE. Special emphasis should be devoted to the acquisition or
production of structured, transportable, and adequately-documented
software.”
6. “SCIENTIFIC DATA STORAGE. Scientific data should be suitably
annotated and stored in a permanent and retrievable form. Data should
be purged only when deemed no longer needed by responsible scientific
overseers.”
7. “DATA SYSTEM FUNDING. Adequate financial resources should be set
aside early in each project to complete database management and
computational activities; these resources should be clearly protected
from loss due to overruns in costs in other parts of a given project.”

00 12-06 the national virtual observatory

  • 1. The National Virtual Observatory: Leveraging the Astronomical Resources of the Nation for Astrophysics in the 21st Century A National Mandate: The National Virtual Observatory (NVO) needs to provide a unified and negotiated architecture to the nation’s unparalleled, long-term investment in astronomical observations. In addition, the NVO should support a negotiated architecture for the support of laboratory and theoretical modeling results to facilitate the analysis of this same astronomical data. The development of the NVO is complicated by the heritage of existing programs and the legacy of existing systems. In the interest of future scientists and their research programs, NASA needs to lead a tactical and strategic effort to federate university, industry, and government offices under a seamless multi-node program dedicated to astrophysical research. The NVO should be this program. In return, NASA’s drive toward federation will enable highly focused new mission concepts, enhance the return of an investment in research to the public and the nation’s educators, and develop a national market place for analysis, visualization, and archival tools. The NVO will provide a higher return on the nation’s astrophysical research investment than is currently possible. After many decades of working in space, the NASA vision of space science missions divides into three phases: research, flight mission development, and mission operations and analysis. This cycle has been iteratively followed through a number of NASA observatory programs at wavelengths spanning the ultra-violet, gamma-ray, and x-ray and through the infrared and sub-millimeter. Without NASA flight opportunities and dedicated satellite platforms, observational astrophysical research at these wavelengths would not have been possible. Following the launch of SIRTF in the coming years, NASA will have completed its road map for launching the nation’s great observatory program. 
NASA programs have served as inspiration to the astronomical community and the nation and, as a result, have reshaped our view of the universe and have greatly influenced the practice of fundamental astrophysical research. NASA must now lead an effort reaching beyond federal organizations, beyond international boundaries, and into University and Government labs – including (but not limited to) programs and facilities created by NASA - to federate the nation’s astronomical archives, laboratory research, and theoretical modeling efforts. The federation of the astrophysical research community organized by wavelength regimes, observational analysis, and theoretical modeling will in addition to the benefits above provide a leveraged and growing legacy of NASA’s multi-decade investment in the space sciences. Federated systems are a natural by-product of evolutionary development. Riegler [1] has modeled the lifecycle of space science missions as a federated series of programs: - Research: Gives rise to mission concepts and test key concepts – theoretical studies, new instrumentation, exploratory ground based and sub-orbital research.
  • 2. - Flight Development: Mission studies, development and planning, check out and testing. - Operations and Data Analysis: Planned observations, data interpretation, confirm or revise theoretical modeling. As illustrated in the figure below, each phase flows naturally into the one following. Operations and data analysis typical result in new scientific hypotheses, which requires new missions, new technology, after which the cycle repeats. Observer's Plan Observer's Proposal & Experimen t Archive Analyzed Data Data Data Collectio n Plan Mission Plan Quick Look Publication Simulation QLA B' C F F D G G F G PR A' A C' B C' Figure 1. The iterative lifecycle of NASA science missions: clear communications are essential among all lifecycle phases. What is evident from figure 1 is the need for clear communications between all phases of the science mission lifecycle. Above all else, the process of federation establishes clear channels of communication. Many other systems are federated. For example, NASA space flight and research centers, the US military (Army, Navy, Air Force, and Marines), the US economy, the human body, and the internet are all federated systems. All these examples have defined means of communication. As communication improves, so does performance. In many cases, systematic problems can be linked to problems in communication. Defects in human DNA (mistaken communications) can produce the wrong amino acids and have serious life threatening consequences. Clear and standardized communications are key to the success of all federated systems. In addition to communications, federated systems must have clear guide- lines for functional performance and expected deliverables. Each must
  • 3. component must satisfy a specific need of the overall federated system. In the US economy, buyers and sellers compete for one another’s business based upon negotiated offerings. When viewed from a far, the market place is complex. But in an ideal economy, the solutions are often optimal. ‘Market place’ standard can specify the federated principles in any number of management, engineering, and scientific disciplines. NASA must work to federate the existing astrophysical ‘market place’ to provide an efficient means by which research and scientific data are clearly communicated among ‘market place’ producers and consumers. These standards of the ‘market place’ should be defined under the NASA’s NVO program. With a large number of astrophysics specific missions in operation (9), development (9), and study (10), along with a growing list of previously archived science data (available from ST ScI, IRSA, HEASEARC, NSSDC, etc.). NASA is aware of the need to federate science data visualization and archival activities. A broad series of report address many of the issues in a federated data and analysis research center: 1982 CODMAC report, 1996 final report of the Task Group on Science Data Management [2], the 1996 – 2000 Senior Review of Astrophysics Mission Operations and Data Analysis Programs [3], the 1997 – 2000 meeting reports of the Astrophysical Working Group [4], most recently the 2001 – 2010 decadal review by the Astronomy and Astrophysics Survey Committee [5]. A combination of these documents highlights either the needs or requirements of the astrophysics community in developing a federated data program. Given the opportunity at hand, NASA can capitalize upon the current NVO call to develop a similar federated structure for laboratory and theoretical research programs. Many of these efforts are already federated to some degree. 
NASA is required to join the existing components together and create a functional ‘market place’ of astrophysics research which will benefit the scientific community, the nations educators, and the broader tax paying public. In specific, the science community is asking for a unified approach for science data centers and national archives to exchange data from a large number of astrophysical missions. Many researchers find the large number of visualization tools required for archival research and idea generation, proposal preparation and mission planning, data reduction and analysis to be specific to either missions or science centers. Astronomical software developers find that to support the documentation needs of community, the programming needs of the facility, and the algorithmic needs of the instrument teams requires a duplicity of effort in all three areas. Since the earliest reports on data center management have been issued – even as early as 1995 – the connectivity and software modeling tools of the information science community have permanently changed our world. Although most astronomical data center developers (> 80%) use object oriented languages (C/C++/JAVA/Visual Basic) almost half (45%) do not use object oriented modeling methods (e.g. OMT, Booch, or UML) [6]. Of this group, only a small fraction (2%) have a formal degree in computer science although almost half (48%) have doctorates. The needed for a federated program to guide center developers in software management and engineering practices should also be contained within the NASA’s NVO program.
  • 4. The issues of the NVO are not unique to NASA, nor are they unique to science. NASA must work with university, government, and industrial partners to support the development of federated network of astrophysics and information science centers to address the needs of the science research community, educators, and the public. A National Mandate and Plan for the Virtual Observatory: The NASA vision of astrophysical research must drive the development of the NVO. Creation of a successful federate system – moving forward from existing subsystems – requires vision, a mission, a clear set of values. The vision is a statement of the future – where the community needs to go. The mission serves as a charter for the organization. The values are the principles upon which the organization is based. And a set of specific, measurable, realistic, and time oriented goals are needed to measure the progress in achieving these three. USRA suggests the following. The NASA NVO Vision: The astronomical community needs to chare a common set of values and approaches for software management and development. These values are reflected in the choice of tools and processes by which the community develops applications whose strength and longevity arises from a growing interactive network of scientists, engineers, and managers. The NASA NVO Mission: The NVO is chartered to provide the community with the recommended best practiced for the developments at hand. This spans not only ranges of technology readiness (perhaps focusing on only the highest TR levels) but also addressing the needs of the observational and theoretical astrophysical community. The NASA NVO values: 1. That open and competitive community wide participation between University, Government, and Industrial research centers is essential to the scientific, engineering, and management health of the NVO. 2. 
That "best practices", "lessons learned", “peer review”, and "appropriate management infrastructures and insight" are the guiding principles of the NVO. 3. That evolving developments in the field of software engineering must sit side-by-side with established mechanical, optical, electrical, and cryogenic disciplines with the country's astronomical community. The NASA NVO Goals: 1. Immediately fund a study team to generate US community buy-in by 2002 on the broadened approach of the NVO. Formulate and issue a NVO lead institution proposal call by 2002. 2. Have an NVO lead institution selected and funded by 2004.
  • 5. 3. Have the lead institution established as a central facility (like the astrobiology center) and issue a proposal call for NVO development by 2004. 4. Have funded six 'legacy type' proposals - with institutional leads and partners to address the known needs of the community (e.g. astronomical tool visualization, archiving practices, instrument control, observational modeling and data reduction) by 2006. 5. Have each proposal meet a series of deliverables and milestones in community workshops, continued education, product deliverables, measures community satisfaction, etc. each year between 2007 and 2010. 6. Re-assess the direction of the NVO and the current leadership in 2008. Assess the successes of proposal deliverables in 2010. 7. Extend, re-bid, re-asses the success of the lead NVO institution by 2012. 8. Begin phase II of the NVO activity in 2014. A USRA led effort within the NASA NVO program: The Universities Space Research Association (USRA) with its present and recent programs to operate major NASA facilities such as the Stratospheric Observatory for Infrared Astronomy (SOFIA), the Research Institute for Advanced Computer Science (RIACS), the Center for Excellence in Space Data and Information Sciences (CESDIS), and the Lunar and Planetary Institute (LPI) understands the issues of the NVO in detail. Furthermore, USRA foresees the need to involve actively the national university community in the NVO initiative at an early stage. Clearly, the dynamic interaction of the research and education that characterizes the university academic environment will be an essential component of the successful NVO program. USRA, with its 85 member institutions, has the expertise necessary to pull the university community together in support of the NVO program. As an example of this approach, USRA offers the successful collaboration engineered to produce the design of the Data Cycle System (DCS) for SOFIA. 
We describe how the USRA approach to teaming should benefit the development and foundation of the NVO. The SOFIA Data Cycle System Development: The SOFIA DCS is designed to general needs of an observatory and instantiated for the specific needs of SOFIA. The DCS covers the complete science cycle of the observatory. The cycle starts with research into archives of previous observations, continues through proposal submission, observation, and analysis planning. And includes instrument scripting, data reduction, and analysis. It concludes with interpretation and publication followed by archival storage of the results. The DCS serves as a micro-economic proto-type of the NVO concept. The DCS architecture is engineered as distributed development with an ability for load sharing among networked systems and users. By developing a system that enables fluid access to existing models and
  • 6. data and which support distributed development, modeling, and analysis, the DCS will facility the planning and decision making process of the observatory science programs. The work done to date on the development of the SOFIA DCS architecture and operating proto-type shows that the core system has the capability to support the broader science cycle Develop Proposal Awareness Publish Review Proposal Refine Proposal Flight Planning Scheduling Analysis Planning Observation Planning Make Observations Analyze Data Scientific Interpretation Figure 2 – The SOFIA Data Cycle System is based upon technologies adopted as current industry standards, platform neutral, and vendor neutral. Using a foundation of CORBA and XML, the architecture is flexible to accommodate future growth and change. planning and decision analysis tool needed to support observatory erations.op An Overview of the USRA SOFIA Data Cycle System: To scope of the DCS is to address the complete astronomical science data cycle. The system has been in design and development since early 1999 and has passed though four successful reviews. USRA leads a team in which the core architecture development is done by the Center for Imaging Science at the Rochester Institute of Technology. The team includes experts from the following organizations: GSFC, ARC, Sterling Software, UCLA, U. Chicago, Cornell and IPAC. The core DCS architecture design has been completed and prototype software has been written and demonstrated to an external review team in November, 2000. The following items comprise the goals of the DCS as presented to the SOFIA Science Council in December 1998:  Complete end-to-end data cycle support maximizes scientific return  Capitalize on lessons learned  Ease of use  Relentless observing  Interfaces with a uniform look and feel
  • 7. The goals of the DCS were summarized as: More photons + more astronomers + more archival research = more scientific return The gain in SOFIA scientific performance is accomplished by combining:  Excellent community access  Uniform interface for general investigator operations  Uniform approach to facility instrument operations  Uniform interface to data reduction tools  Uniform production of data products for archival research In developing the DCS requirements and the DCS core prototype, USRA finds a clear correspondence to the needs of the NVO. The fact that the design has reached a level of maturity that enables the construction of a full prototype for testing by the summer of 2001 provides a synergistic opportunity with NVO. An illustration of the DCS concept is shown in Figure 2. The goal is to construct a system that enables fluent collaboration between the scientists and the observatory to conduct the complete cycle of observation planning, data collection, data reduction, analysis and archiving. The DCS will enable the scientist to model the expected results of an observation before the observation is made. It will be possible to construct simulated data sets based on instrument models, construct pipeline models based on existing modules written by instrument team and science center experts, access supporting data from archives, and simulate the observation and data reduction cycle. The plan that is produced will include the flight plan, instrument scripts for data acquisition, and the scripts for the data pipeline. The DCS will support final reduction of the data, archiving of all raw and final products, and archiving of publications. Key Software Technologies and Architectures for the DCS: Requirements for the SOFIA observatory include providing twenty years of service to the astronomical community. 
As change is the only constant in today’s computing industry, it is critically important that the DCS not be tied to any single platform or vendor standard. Therefore, the underlying technologies used in the DCS are those proven to be industry standard, platform neutral, vendor neutral, and flexible enough to accommodate future growth and change. Therefore, the DCS utilizes foundation technologies such as CORBA and XML. Using CORBA and XML as its foundations, then, the Data Cycle System embodies the “continuous improvement” paradigm. As new image reduction algorithms are developed, as new telescope instruments are created, as new ways of planning and executing observations are formulated, the DCS can easily incorporate these improvements. Additionally, these improvements are additions to, not replacements of, the software components that make up the system. It is important to maintain the data and algorithms used in previous practice as a reference when evaluating potential new methodologies. Continuous improvement and mechanisms that enable it have been a part of the DCS since its design.
  • 8. The DCS is a federation of software modules, implemented using CORBA and communicating via XML. The DCS federation is grouped into five functional domains. These domains are Data Acquisition, Data Reduction, Data Storage, User Interaction, and the Task Library. We show in figure 3 the relationship of each domain. We describe the detail of all five domains in Appendix A. of this paper. The elements within each functional domain, or subsystem, implement a well-defined set of interfaces to the rest of the system. In this way, the implementation details of a given component are not important to other software components in the system; indeed, those details are not known. This object-oriented encapsulation of functionality is instrumental in smoothly connecting software components developed by potentially different developers at different times together into a cohesive whole called the Data Cycle System. The DCS is a flexible, capable, and powerful system that enables the operation of an astronomical observatory. It is adaptable to future improvements in instruments, computers, software, and the science needs of astronomers. DCS Management, Engineering, and Science leadership: Within USRA SOFIA program, the DCS development has designated management, engineering, and science leads. Each lead is partnered with a deputy to provide a solid management foundation during development. The current positions are Bob Hovde (USRA) – Management lead, Peter Sharer (Sterling Software) – Management Deputy, John Graybeal (Sterling Software) – Engineering Lead, Bob Krzaczek (RIT) – Engineering Deputy, Sean Casey (USRA) – Science Lead, and Joel Kastner (RIT) – Science Deputy. For the NVO white paper development, USRA involved participation from Eric Becklin (USRA), Sean Casey (USRA), Ian Gatley (RIT), Joel Kastner (RIT), Jacques L’Heureux (USRA), Bob Krzaczek(RIT), Barry Leiner (USRA), Mark Morris (UCLA), Harvey Rhody (RIT), and Peter Sharer (Sterling Software). 
USRA’s involvement in SOFIA, RIT’s involvement in the South Pole astronomy program, and UCLA’s participation in the Keck, etc. combined with the joint experiences of the individuals involved provide the foundation for URSA to move forward as a strong lead within NASA’s NVO program. The NVO Development: The NVO will touch on all parts of the process of scientific inquiry and will be the central system through which all data and processing resources are accessed. The NVO will necessarily be distributed, to connect geographically distributed users and geographically distributed data sources. Indeed, a major challenge is the distributed and heterogeneous character of the basic environment. Kepner and McMahon [7] make a case that the NVO address all of the following elements: (1) Acquisition and storage of raw data; (2) Data reduction; (3) Acquisition and storage of detected sources and (4) Multi-sensor/multi- temporal data mining. They note that “the NVO’s core data mining and archive federation activities are heavily dependent on the underlying data pipeline software necessary to translate the raw data into scientifically relevant source detections.”
  • 9. Furthermore an open architecture is required for any data system serving the NVO; the community must be able to contribute to the growing body of data calibration and analysis knowledge, as expressed in algorithms and/or code, that is available to all NVO users. A definition of established roles within NVO is required. This should necessarily include the assignments of: A) Project Scientist – an astrophysicist with extensive experience in the development of data and archive systems who would lead the top- level science requirements definition for the NVO program based upon input from an established science definition team and the community. B) Project Engineer – an information technology expert who would oversee the design, development, and implementation of an overall architecture that satisfies the specified science requirements of the program. C) Project Manager – an individual experienced in leadership within science and engineering community, able to interact closely with NASA head quarters management, and who is able to guide science and engineering definition and development efforts to meet the goals of the NVO program. The USRA track record of experience and accomplishments: USRA is directly involved in center activities at the NASA Goddard, Marshall, and Johnson Space Flight Centers as well as NASA’s lead center for information technology the Ames Research Center. USRA has a successful track record of working closely with NASA centers and with NASA quarters staff. USRA also has working relationships with external NASA science centers – the Space Telescope Science Institute at Johns Hopkins University and the SIRTF Science Center at the California Institute of Technology. USRA is tied to NSF, other government programs, and other non-profit organizations (e.g. AURA) through its direct University membership consortium. Working teaming relationships include USRA partnering with for-profit companies such as Raytheon (SOFIA) and Lockheed (CSOC). 
In addition, USRA has a demonstrated record of spinning off appropriate research developments as start-up businesses within the country’s new economy (e.g. Scyld Computing Corporation, the original Beowulf team, providing Second Generation Beowulf Clustering [8]). USRA is available to provide strong leadership in organizing the community to respond to NASA’s request for an NVO development roadmap. USRA has done so before and is able to repeat its past successes in support of NASA’s requests.

References
[1] Guenter Riegler, “The Lifecycle of Space Science Missions,” Research Division, Office of Space Science, NASA Headquarters, September 13, 2000. http://www.aas.org/policy/NASALifeCycle.htm
[2] Jeffrey Linsky, “Final Report of the Task Group on Science Data Management to the Office of Space Science,” October 23, 1996. http://adc.gsfc.nasa.gov/~gass/linsky/report_available.html
[3] NASA Senior Review Reports:
2000 - http://spacescience.nasa.gov/codesr/results/SenRev00.PDF
1998 - http://spacescience.nasa.gov/codesr/results/SenRev98.PDF
1996 - http://spacescience.nasa.gov/codesr/results/SenRev96.PDF
[4] The “Astrophysics Working Group” (AWG) meeting reports: http://www.astronomy.ohio-state.edu/~AWG/Meetings/
[5] The Decadal Report: Astronomy and Astrophysics in the New Millennium, Astronomy and Astrophysics Survey Committee, National Research Council. http://books.nap.edu/catalog/9839.html
[6] A. F. Granados, “A ‘Scientific’ Approach to Software Project Management: Part II: Results of a Survey of Scientific Computing,” in Astronomical Data Analysis Software and Systems IX, ASP Conference Series, 2000, Vol 216, N. Manset, C. Veillet, and D. Crabtree, eds.
[7] Jeremy Kepner and Janice McMahon, “High Speed Interconnects and Parallel Software Libraries: Enabling Technologies for the NVO,” in The Evolution of Galaxies on Cosmological Timescales, ASP Conference Series, 1999, J. E. Beckman and T. J. Mahoney, eds.
[8] Scyld Computing Corporation. http://scyld.com/

Glossary:
CORBA, the Common Object Request Broker Architecture, is an industry-standard set of specifications that define how to distribute an object-oriented software architecture across platforms and networks, and how components of that architecture remotely inter-operate with each other. It enables communication between software objects that may be located on multiple dissimilar machines and implemented in different languages. CORBA is not a software product. Rather, it is a standard maintained by the Object Management Group, a consortium of hundreds of organizations.
Software applications based on CORBA can, by definition, communicate with each other despite being implemented on a variety of vendor platforms.

XML, the Extensible Markup Language, is a flexible yet simple method for representing and structuring information. Its primary strength and advantage over other formats for information exchange is its ability to be extended to accommodate future data structures. Rather than implementing a fixed set of structures for information, developers using XML can extend a given structure repertoire by taking advantage of in-document definitions: an XML document can define its structure along with the data it represents. This gives software developers a rich and powerful mechanism both for defining information exchange now and for accommodating future enhancements and extensions. The use of XML directly addresses problems associated with heterogeneous data structures.

Appendix A:
Below is a description of the domains that comprise the DCS core prototype. The federation of modules provides a flexible architecture
to support the continuous improvement in SOFIA instruments, computers, software, and the science driven needs of astronomers and the observatory.

The Data Acquisition subsystem is responsible for rendering, or translating, a planned observation into the idiosyncratic requirements of a particular instrument. Rather than re-implementing observation software for each instrument supported by the DCS, the system instead maintains observation plans in a flexible neutral internal format. When interacting with an instrument, the DCS applies a collection of software components to gradually refine and translate that observation into the format required by the instrument. This approach minimizes redundant software development and maximizes re-use of code between similar instrument science teams.

The Data Reduction subsystem implements algorithm pipelines, converting raw instrument data into useful science products. This reduction is automatic; when raw data is introduced to the DCS, an appropriate pipeline is immediately scheduled and executed at the earliest opportunity. The DCS supports pipelines running on machines not only at the SOFIA Science Center, but distributed across networks and even reaching to a researcher’s desktop. Someone implementing a new pipeline algorithm, or evaluating an improvement to an existing algorithm, can easily use the DCS as a “test harness” for their software. All the data and algorithms available to a general investigator are available to a developer for testing and evaluation of improvements to the DCS.

The Data Storage subsystem provides a repository of information to the distributed components of the DCS. More than just “data” is maintained in DCS Storage; experiments, observation plans, flight logs, reduction pipelines, and proposals are all maintained within DCS Storage, along with the expected archives of raw and reduced science data.
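As a rough illustration of the Data Storage idea above, the following Python sketch keeps several kinds of records in a single repository. It is not the DCS implementation: the names `Record`, `Storage`, and the kind labels, as well as the choice of XML as the stored form, are assumptions made here for illustration, since the text does not specify a storage schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Kinds of information the text says are held in storage.
# The identifier spellings are hypothetical.
KINDS = frozenset({
    "experiment", "observation_plan", "flight_log",
    "reduction_pipeline", "proposal", "raw_data", "reduced_data",
})


@dataclass
class Record:
    kind: str
    record_id: str
    fields: dict = field(default_factory=dict)

    def to_xml(self) -> str:
        """Serialize the record to a small XML document (assumed layout)."""
        root = ET.Element("record", {"kind": self.kind, "id": self.record_id})
        for name, value in sorted(self.fields.items()):
            ET.SubElement(root, "field", {"name": name}).text = str(value)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Record":
        """Rebuild a record from its XML form; all values come back as strings."""
        root = ET.fromstring(text)
        fields = {f.get("name"): f.text or "" for f in root.findall("field")}
        return cls(root.get("kind"), root.get("id"), fields)


class Storage:
    """One repository for every kind of record, keyed by (kind, id)."""

    def __init__(self):
        self._documents = {}

    def put(self, record: Record) -> None:
        if record.kind not in KINDS:
            raise ValueError(f"unknown record kind: {record.kind}")
        # Only the serialized form is kept, so any component that can
        # parse the XML can read the record.
        self._documents[(record.kind, record.record_id)] = record.to_xml()

    def get(self, kind: str, record_id: str) -> Record:
        return Record.from_xml(self._documents[(kind, record_id)])

    def find(self, kind: str) -> list:
        """Return all records of one kind, ordered by id."""
        return [Record.from_xml(doc)
                for (k, _), doc in sorted(self._documents.items())
                if k == kind]


if __name__ == "__main__":
    store = Storage()
    store.put(Record("proposal", "P042", {"target": "M42"}))
    print(store.get("proposal", "P042").to_xml())
```

Keeping only the serialized text in the repository is a deliberate simplification: readers of the store depend on the document layout rather than on the Python classes that produced it.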
All the information in this subsystem is maintained in a neutral format that is easily processed by software components within the DCS, and is easily exchanged with external DCS customers and archives.

The User Interaction subsystem is a common interface to the functionality provided by the DCS. In fact, the rest of the DCS needs to provide no user interfaces; instead, all user interactions are conducted through a common, yet customizable, interaction layer. In this way, changes made to a user’s DCS experience (based on, perhaps, spoken languages or skill levels) do not require changes in the rest of the system software. The software that drives an instrument, for example, is unaffected by the user’s preference of German over English, or the user’s skill level in using the system.

The Task Library works closely with the User Interaction subsystem in supporting the imaging scientist using the DCS. The Task Library provides the perceived intelligence of the DCS, coordinating the actions of the various system resources (acquisition, reduction, storage) in ways that satisfy users’ requests. A task may be simple (“locate any data matching a specified target”) or complex (“conduct a specified observation, reduce its data, store all intermediate and final results, and email the observation’s authors the results of the experiment when available”). Most importantly, the Task Library is where past, present, and future “best practices” are embedded within
the DCS and made available to all of its users in a consistent, uniform manner.

Appendix B:
We have included this section (Section 3 - PRINCIPLES OF SUCCESSFUL MANAGEMENT OF SCIENTIFIC DATA by Linsky et al.) because of its timeless statement of science software requirements.

“Through extensive case studies, the Committee on Data Management and Computation (CODMAC) of the Space Science Board, National Research Council, derived a suite of principles or operating guidelines for maximizing the scientific utilization of space data (NRC, 1982). The case studies involved large data centers such as the National Space Science Data Center (NSSDC), data activities associated with missions, and relatively small groups of individuals who were working on generating new products from data long after mission operations had ceased. The CODMAC principles and associated recommendations have been used over the past 15 years to help guide data system development within NASA, including establishing programs at NASA Headquarters, implementation of well-defined data systems within missions that include generation and validation of data products, and the recognition that discipline oriented systems are best suited for maximizing the scientific utilization of space data.

“CODMAC also provided detailed recommendations for distributed, discipline-oriented data systems, focusing on expected data sets for each space science discipline, probable growth in networking and computational capabilities, and an examination of how each discipline-oriented community will likely work with data (NRC, 1986)… The principles are listed here, and later in the Report we restate these principles and our recommendations for successful space science data management that flow from these principles and the new case studies that we discuss later in this Report.
The Task Group members unanimously concur that the principles remain entirely valid today, and they were used to help guide deliberations and recommendations.”

1. “SCIENTIFIC INVOLVEMENT. There should be active involvement of scientists from inception to completion of space missions, projects, and programs in order to assure production of, and access to, high-quality data sets. Scientists should be involved in planning, acquisition, processing, and archiving of data. Such involvement will maximize the science return on both science-oriented and applications-oriented missions and improve the quality of applications data for applications users.”
2. “SCIENTIFIC OVERSIGHT. Oversight of scientific data-management activities should be implemented through a peer-review process that involves the user community.”
3. “DATA AVAILABILITY. Data should be made available to the scientific user in a manner suitable to scientific research needs and have the following characteristics:”
(a) “The data formats should strike a proper balance between flexibility and economics of nonchanging record structure. They should be designed for ease of use by the scientist. The ability to compare diverse data sets in compatible form may be vital to a successful research effort.”
(b) “Appropriate ancillary data should be supplied, as needed, with the primary data.”
(c) “Data should be processed and distributed to users in a timely fashion as required by the user community. This responsibility applies to Principal Investigators and to NASA and other agencies involved in data collection. Emphasis must be given to ensuring that data are validated.”
(d) “Proper documentation should accompany all data sets that have been validated and are ready for distribution or archival storage.”
4. “FACILITIES. A proper balance between cost and scientific productivity should govern the data-processing and storage capabilities provided to the scientist.”
5. “SOFTWARE. Special emphasis should be devoted to the acquisition or production of structured, transportable, and adequately-documented software.”
6. “SCIENTIFIC DATA STORAGE. Scientific data should be suitably annotated and stored in a permanent and retrievable form. Data should be purged only when deemed no longer needed by responsible scientific overseers.”
7. “DATA SYSTEM FUNDING. Adequate financial resources should be set aside early in each project to complete database management and computational activities; these resources should be clearly protected from loss due to overruns in costs in other parts of a given project.”