Overview of the Data Processing and Error Analysis System (DPEAS)

Slides 23 and 24 mention experience with HDF-EOS.

Source: http://hdfeos.org/workshops/ws04/presentations/Jones/000901%20DPEAS%20Overview%20-%20HDFEOS%20Workshop.ppt

  • DPEAS is one executable that propagates copies of itself within a network cluster of machines in a controlled fashion.
  • Transcript of "Overview of the Data Processing and Error Analysis System (DPEAS)"

    1. Overview of the Data Processing and Error Analysis System (DPEAS)
       Andrew S. Jones
       Colorado State University (CSU)
       Cooperative Institute for Research in the Atmosphere (CIRA)
       DOD Center for Geosciences / Atmospheric Research (CG/AR)
       Fort Collins, CO
       DOD Center for Geosciences / Atmospheric Research, Colorado State University
    2. What is it?
       • Data processing system for “large” data analysis tasks using common PCs
       • Features:
         • 2nd generation system (replaces an earlier system called PORTAL (Jones et al., 1995))
         • Parallel implementation
         • Web-based documentation and monitoring
         • Incorporates a Fortran interpreter for input tasks
         • Virtualized I/O subsystem (only memory-resident data structures are needed; data algorithms now function like a model)
         • Able to fail over to redundant hardware
         • Extensible User Module
         • Error Analysis code is still under development
         • Implemented on Windows NT/2000 OS
    3. What Does it Do?
       • Global merge capabilities for numerous data sets
       • Current system in operational use for 2+ years at CIRA
         • Current average operational throughput rate using 15 processors on 8 PCs is 17 TB/yr (47 GB/day)
         • Measured maximum throughput rate is 2.5 PB/yr (7.1 TB/day)
       • Simplifies
         • Powerful abstraction layers allow anyone to write parallel code
         • Virtual I/O subsystem reduces end-user code complexities
         • Users interact using a language most already know
       • Easily Scales
         • Limited process “cross-talk” improves scaling behavior
         • Tests have shown that a 2000 machine cluster is physically feasible
         • Basically… just add hardware
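The paired throughput figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below converts the daily rates to yearly rates; it assumes a 365-day year and binary unit steps (1 TB = 1024 GB, 1 PB = 1024 TB), neither of which the slide states, and the function name is purely illustrative.

```python
DAYS_PER_YEAR = 365
BINARY_STEP = 1024  # assumption: 1 TB = 1024 GB and 1 PB = 1024 TB


def per_year_in_next_unit(per_day: float) -> float:
    """Convert a daily rate to a yearly rate expressed one unit prefix higher."""
    return per_day * DAYS_PER_YEAR / BINARY_STEP


average_tb_per_year = per_year_in_next_unit(47.0)  # 47 GB/day  -> TB/yr
maximum_pb_per_year = per_year_in_next_unit(7.1)   # 7.1 TB/day -> PB/yr

print(f"average: {average_tb_per_year:.1f} TB/yr")
print(f"maximum: {maximum_pb_per_year:.2f} PB/yr")
```

Under these assumptions the results are about 16.8 TB/yr and 2.53 PB/yr, which round to the 17 TB/yr and 2.5 PB/yr quoted on the slide. With decimal unit steps (1000) the average still rounds to 17 TB/yr, but the maximum comes out nearer 2.6 PB/yr.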
    4. 10 Data Types Are Currently Supported
       • Reads and Writes HDF-EOS natively
       • GOES IMAGER (McIDAS)
       • NOAA AVHRR GAC and LAC (McIDAS)
       • NOAA AMSU-A and B (HDF-EOS)
       • DMSP SSM/I (Byte Stream)
       • DMSP SSM/T-2 (NGDC OIS)
       • DMSP OLS (NGDC OIS)
       • TRMM TMI and VIRS (HDF)
       • User extensible… (your format here)
    5. The Hardware
       [Diagram: storage view and processor view of the clusters. Legend: Primary, Backup, Wn = Worker, Mirrored Set]
       • Storage view labels: Primary, Backup, W1, W2; 66 GB, 240 GB, 240 GB, 240 GB
       • Operational cluster (24/7): Primary, Backup, W1, W2, W3; 9 processors, 3.0 GFlops, 2.25 GB RAM. Cluster summary: all ingest processes, most higher-level remapped products
       • Experimental cluster (nights only/7): W4, W5, W6; 6 processors, 2.5 GFlops, 2.5 GB RAM. Cluster summary: large global sectors
    6. Failover Mode
       [Diagram: the storage and processor views from the hardware slide, with the Primary marked X]
       • Failover steps (automated):
         1. Synchronize states
         2. Promote the Backup
       • Restore steps (manually initiated):
         1. Demote the Backup
         2. Restore Mirror Set
         3. Synchronize states
         4. Reactivate Primary
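The two step lists on the failover slide can be read as a small state machine. The following is only an illustrative sketch: the `Cluster` class, its fields, and the log strings are hypothetical and are not taken from DPEAS itself. It simply records the steps in the order the slide gives them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Toy model of a Primary/Backup pair (hypothetical, not DPEAS code)."""
    active: str = "Primary"
    mirror_intact: bool = True
    log: List[str] = field(default_factory=list)

    def fail_over(self) -> None:
        """Automated path, taken when the Primary is lost."""
        self.mirror_intact = False
        self.log.append("synchronize states")
        self.log.append("promote the Backup")
        self.active = "Backup"

    def restore(self) -> None:
        """Manually initiated path back to normal operation."""
        if self.active != "Backup":
            raise RuntimeError("restore only applies after a failover")
        self.log.append("demote the Backup")
        self.log.append("restore mirror set")
        self.mirror_intact = True
        self.log.append("synchronize states")
        self.log.append("reactivate Primary")
        self.active = "Primary"


cluster = Cluster()
cluster.fail_over()
cluster.restore()
print(cluster.active, cluster.log)
```

Keeping failover automatic and restoration manual, as the slide describes, means the system recovers service on its own but returns to the original configuration only once an operator has confirmed the Primary is sound.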
    7. Module Context
       [Diagram: DPEAS module context. Box labels in reading order:]
       • GUIs; Batch Job Client; Explorer; Command Line; Web Browser
       • Command Line Script; Command Shell Interpreter; DPEAS Input Script; Other Applications
       • DPEAS Data Processing Engine; Spawn Subtask; DPEAS Subtask; DPEAS Fortran Interpreter; Batch Job Service
       • Analysis Modules; DPEAS System State; User Modules; DPEAS HDF-EOS Virtual I/O Subsystem; Translation Modules; Output Modules
       • Callout: “This is DPEAS”
       • Internet Information Services; Operating System (Windows 2000)
    8. An example of a DPEAS input script file
    9. How DPEAS Starts
       • Program Start
       • DPEAS Initialization
       • Interpreting DPEAS script declarations
       • Interpreting DPEAS script executable statements
    10. How DPEAS Ends
       • Interpreting DPEAS script executable statements
       • DPEAS Summary
       • Program End
    11. How Are Spawned Input Scripts and Jobs Created?
       • All spawned DPEAS jobs run machine-generated DPEAS input scripts, which the data processing engine generates from the Master DPEAS input script. (The examples shown previously were examples of DPEAS machine-generated code.)
       • This is automated within DPEAS, and the user code goes along for the free ride since it is part of the DPEAS executable (it’s like meeting a friendly virus which helps to spread your code along with it).
    12. What Does DPEAS Parallelism Look Like?
       • Do loop contents are sent to other resources in parallel
       • The new jobs run the same “DPEAS.exe”, but execute only the subtask operations
       • Completed jobs allow additional jobs to start
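The pattern described on these two slides (generate one subtask script per loop iteration, run each as its own job, and start a new job whenever one finishes) can be sketched in Python. This is a rough analogy only: DPEAS spawns copies of its own executable across machines, whereas this sketch uses a local thread pool, and every name in it (`make_subtask_script`, `run_subtask`, `run_do_loop`, `max_jobs`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def make_subtask_script(loop_body: str, index: int) -> str:
    """Machine-generate the script for one iteration by fixing the loop index."""
    return f"i = {index}\n{loop_body}"


def run_subtask(script: str) -> int:
    """Stand-in for a spawned job: interpret only the subtask script."""
    scope: dict = {}
    exec(script, {}, scope)  # toy interpreter; the script must assign `result`
    return scope["result"]


def run_do_loop(loop_body: str, first: int, last: int, max_jobs: int = 3) -> List[int]:
    """Send each iteration of an inclusive do loop to the pool.

    At most `max_jobs` subtasks run at once; a finished job frees a slot
    so that another queued job can start.
    """
    scripts = [make_subtask_script(loop_body, i) for i in range(first, last + 1)]
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_subtask, scripts))


print(run_do_loop("result = i * i", 1, 6))
```

Because each generated script carries its own loop index and shares no state with the others, the iterations can run in any order, which is the property that limits process “cross-talk”.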
    13. The 3 Programming Steps to Add a User Routine to DPEAS
       1. Insert a program “hook”. The program hook makes the main DPEAS program aware of the existence of your wrapper routine.
       2. Create a wrapper routine. The wrapper routine tells the DPEAS Fortran interpreter how to parse and interact with your application subroutine arguments.
       3. Create an application routine. The application routine performs the “real” work. You can do anything you want within the application routine.
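The division of labour between hook, wrapper, and application routine can be illustrated with a toy dispatcher. The real DPEAS user module is Fortran 90 and its actual interfaces are not shown in this transcript, so everything below (the `HOOKS` table, `scale_and_offset`, and the `NAME(a, b, c)` call syntax) is an assumption made for illustration.

```python
from typing import Callable, Dict, List

# Table the toy interpreter consults to find user routines (hypothetical).
HOOKS: Dict[str, Callable[[List[str]], float]] = {}


# Step 3: the application routine does the "real" work on typed values.
def scale_and_offset(value: float, gain: float, offset: float) -> float:
    return value * gain + offset


# Step 2: the wrapper routine converts the interpreter's raw argument
# strings into the types the application routine expects.
def scale_and_offset_wrapper(args: List[str]) -> float:
    value, gain, offset = (float(a) for a in args)
    return scale_and_offset(value, gain, offset)


# Step 1: the program hook makes the interpreter aware of the wrapper.
HOOKS["SCALE_AND_OFFSET"] = scale_and_offset_wrapper


def interpret_call(statement: str) -> float:
    """Parse 'NAME(a, b, c)' and dispatch through the hook table."""
    name, _, rest = statement.partition("(")
    args = [a.strip() for a in rest.rstrip(")").split(",")]
    return HOOKS[name.strip().upper()](args)


print(interpret_call("scale_and_offset(2.0, 3.0, 1.0)"))
```

The application routine never sees script text, and the interpreter never needs to know the routine's argument types; only the wrapper has to change if the calling convention does.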
    14. How does the “User_Module.f90” relate to my DPEAS Input Scripts?
       [Flow diagram; labels as captured:]
       • Compile: User_Module.f90 (Program Hook, Wrapper Routine, Application Routine); Ordinary Fortran Compiler; "DPEAS.exe"
       • Interpret: "DPEAS.exe" interprets the DPEAS Input Script
       • Automated Parallelization Using Self-Replication: "DPEAS.exe" interprets the DPEAS Input Script Subtask; Return to Master
       • End
    15. User Example: The user’s application routine
       Using the virtual I/O data via pointers
       1. Find each MW channel
       2. Allocate a new output array data structure
       Your science code looks like this
    16. User Example: The results: Complete integration
       The new user routine is now fully integrated into DPEAS
    17. User Example: The output HDF-EOS file
    18. User Example: The output image representation
       150 GHz Effective Emissivity
       Calculated from: GOES-08 IMAGER, NOAA-15 AMSU-B
    19. User Example: Summary
       • Creates 2 new routines:
         • Wrapper routine
         • Application routine
       • Requires 25 lines of executable code:
         • Program hook: 2
         • Wrapper routine: 4
         • Application routine: 19
           • Variable assignments: 2
           • Science algorithm: 3
           • Virtual I/O library calls: 14 (using only 2 Virtual I/O library routines)
       • Small overhead for gaining massive parallelism capabilities!
    20. User Example: How complex would the user routine be, if written without the Virtual I/O library?
       • Creates 2 new routines:
         • Wrapper routine
         • Application routine
       • Requires 59 lines of executable code:
         • Program hook: 2
         • Wrapper routine: 4
         • Application routine: 53
           • Variable assignments: 2
           • Science algorithm: 3
           • HDF-EOS library calls: 48 (using 26 HDF-EOS library routines)
       • Answer: Without the DPEAS Virtual I/O library there would be:
         • 24 additional I/O routines called by the user (+1200%)
         • 34 additional lines of user code (+136%; the slide’s “+236%” is the ratio 59/25, not the increase)
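The percentages on this slide follow directly from the line and routine counts on the two comparison slides. The short calculation below uses only those counts; the variable names are illustrative.

```python
# Executable lines per component, with and without the Virtual I/O library.
with_vio = {"program hook": 2, "wrapper routine": 4, "application routine": 19}
without_vio = {"program hook": 2, "wrapper routine": 4, "application routine": 53}

lines_with = sum(with_vio.values())
lines_without = sum(without_vio.values())
extra_lines = lines_without - lines_with
extra_lines_pct = 100 * extra_lines / lines_with

# Distinct I/O library routines the user must call in each case.
routines_with, routines_without = 2, 26
extra_routines = routines_without - routines_with
extra_routines_pct = 100 * extra_routines / routines_with

print(f"lines: {lines_with} -> {lines_without} (+{extra_lines}, +{extra_lines_pct:.0f}%)")
print(f"I/O routines: {routines_with} -> {routines_without} "
      f"(+{extra_routines}, +{extra_routines_pct:.0f}%)")
```

Measured as an increase over the 25-line version, the 34 extra lines amount to 136%; the figure 236% corresponds to the ratio 59/25. The routine count is already stated as an increase: 24 extra routines over a base of 2 is 1200%.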
    21. User Example: Conclusions
       • Implementation Insights
         • Minimal amount of end-user code is required
         • The effort and resources involved are small (the DPEAS program recompiled in < 30 s on the user’s desktop)
       • Virtual I/O Insights
         • The DPEAS virtual I/O access method is less complex than traditional HDF-EOS file access methods
       • End user’s perspective
         • End users are protected from technical data format issues
         • End users can develop higher quality code by leveraging shared robust common modules
         • Scalability is greatly enhanced with little end user effort
    22. Summary
       • DPEAS can process large data sets in an efficient manner while maintaining centralized management controls and error handling behaviors
       • Parallelism of the code is automatic and runs on “cheap hardware”
       • Failover capabilities make the system more robust
       • User code is shielded from complexities of the system using software abstraction layers
       • Little training is needed since user interfaces are in a known scientific language
       • User modules access data directly from memory, which makes traditional file access methods obsolete while maintaining the needed file compatibility
    23. What did I learn about HDF-EOS in the process?
       • HDF-EOS is an excellent “universal” data format
       • It works for all satellite sensor types I have encountered to date (10+)
       • HDF-EOS requires serious software design before the implementation stage
       • It is my experience that “Time” information as a geo/time field for sectorizing is overrated, and it is likely to cause future software design headaches with the more complex sensors if it is encouraged to be the “norm”
    24. My 2 cents: How HDF-EOS could be made even better
       (Hopefully someone has already thought of these things, and this short list will be a reaffirmation)
       • Given that GOES data, for example, and other multi-detector sensors can have multiple times for each channel for the same geolocation position, and that in addition they can and do interrupt their sensor scans at any time…
       • Treat “Time” as a data attribute
       • Currently I associate “Time” and other associated arrays with their principal data array by nomenclature
       • It would be better to use data array attribute “groups”. Then “Time”, “Calibration”, and other associated arrays could be grouped with the data array through the data format.
    25. Why Data Attributes?
       • Many data channels have “associated” information
         • For example, it might be very meaningful to associate the min. and max. of a grid location with its mean value
         • It would be better if there were a standard way of showing that group association, so we don’t have to understand each other’s unique nomenclatures or “intent”, or resort to unusual “mixed” HDF/HDF-EOS data files
         • Data attributes should not be arbitrarily limited in scope, but should have full data type ranges
         • Units could also be incorporated through data attributes
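The attribute “group” idea can be pictured as a data array that carries its associated arrays with it, rather than relying on naming conventions to link them. The sketch below is a plain in-memory illustration, not an HDF-EOS API: the `DataArray` class, its fields, the role names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataArray:
    """A named grid with units and explicitly attached associated arrays."""
    name: str
    values: List[List[float]]
    units: str = ""
    associated: Dict[str, "DataArray"] = field(default_factory=dict)

    def attach(self, role: str, array: "DataArray") -> None:
        """Group another array with this one under a role such as 'Min' or 'Time'."""
        self.associated[role] = array


# Made-up 2x2 grid values, used only to show the grouping.
mean_tb = DataArray("Tb_mean", [[250.0, 252.5], [249.0, 251.0]], units="K")
mean_tb.attach("Min", DataArray("Tb_min", [[245.0, 248.0], [244.5, 247.0]], units="K"))
mean_tb.attach("Max", DataArray("Tb_max", [[255.0, 257.0], [253.5, 255.0]], units="K"))

for role, array in mean_tb.associated.items():
    print(role, array.name, array.units)
```

With the association held in the structure itself, a reader does not need to know that `Tb_min` belongs to `Tb_mean` by name alone, and because each associated array is a full array it can hold any data type and its own units.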
    26. The End
       jones@cira.colostate.edu
    27. Appendix
       The following series of slides shows how a user can easily modify DPEAS
       1. The user’s program hook
       2. … wrapper routine
       3. … application routine (using the virtual I/O data via pointers)
       4. Usage of the new user routine in a DPEAS input script file
       5. The Results: Complete Integration
    28. User Example: The user’s program hook
       2 lines of code
    29. User Example: The user’s wrapper routine
       4 lines of executable code
    30. User Example: The user’s application routine
       Using the virtual I/O data via pointers
       1. Find each MW channel
       2. Allocate a new output array data structure
       Your science code looks like this
    31. User Example: Usage of the new user routine in a DPEAS input script file
    32. User Example: The results: Complete integration
       The new user routine is now fully integrated into DPEAS
    33. Where Do I Find DPEAS?
       DPEAS Home Page: http://luna.cira.colostate.edu/DPEAS/DPEAS_frame.htm
       Please direct questions to jones@cira.colostate.edu