Digital Data Handling with
Modern Cyberinfrastructure

             Scott Teige
        steige@indiana.edu


            O...
Contents
•   The trend toward “born digital” data
•   The bad old days
•   The new days
•   Examples: the new, the old



...
Trends
• The US will produce 113 million medical images in the
  next year (CNN)
• CT and MRI scans are “born digital”
• P...
The Bad old days (~1992)




The bad old days, part 2
• Data written from the instrument to 8mm video tape (loss
  of ~5%)
• Tapes carried from DAQ com...
Almost there …
• USArray, locations of the transportable seismographs.




                                               ...
Almost there …
• Data written to a hard drive on the seismometer
• Data uplinked via cell phone or satellite to central lo...
A modern case
• The electron microscope in Simon Hall




A modern case
• Images are digitized by the instrument
• The digitized images are written directly to the Data
  Capacitor...
Infrastructure, The Data Capacitor

   • >300 TeraBytes




Infrastructure, HPSS

• >3 PetaBytes




Infrastructure, CPU Resources
     Big Red [TeraGrid System]
        30 TFLOPS IBM JS21 SuSE Cluster
        768 blades/30...
Infrastructure, Network

•   10 GigE to parts of campus, 1GigE to entire system
•   4x10GigE from BigRed to DC
•   48x1Gig...
What does this give you?




What does this give you? FAQ
• How much data can I have?
   • All of it, right now.
• Where is my data?
   • Everywhere.
•...
Acknowledgments
This material is based upon work supported by the National Science Foundation under
   Grant Numbers 01160...
Published in: Technology, Business
Transcript of "Switc Hpa"

  1. Digital Data Handling with Modern Cyberinfrastructure
     Scott Teige, steige@indiana.edu, October 2009
  2. Contents
     • The trend toward “born digital” data
     • The bad old days
     • The new days
     • Examples: the new, the old
  3. Trends
     • The US will produce 113 million medical images in the next year (CNN)
     • CT and MRI scans are “born digital”
     • Physics has a long tradition of digital data acquisition, which continues with, for example, the latest CERN experiments
     • Chemistry, Biology, Geology, Communication and Culture, Anthropology and Economics are also producing increasing amounts of data
     • Hard drives are down to $0.07 per GigaByte; 8GB thumb drives are given away as swag at conferences
  4. The bad old days (~1992)
  5. The bad old days, part 2
     • Data written from the instrument to 8mm video tape (loss of ~5%)
     • Tapes carried from DAQ computers to analysis computers
     • Tapes carried (courier) from instrument building to “storage” facility at BNL (Patty M. office bookshelves)
     • 2nd pass analysis on BNL mainframes (loss ~5%)
     • Tapes copied to DLT (loss ~10%)
     • … years pass …
     • DLT copied to HPSS (loss ~5%)
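The per-stage losses quoted on this slide add up to more than any single figure suggests. The sketch below compounds them, assuming (this is not stated on the slide) that each stage independently loses its quoted share of whatever data reached it. The stage labels and the function name are illustrative only.

```python
# Per-stage loss fractions as quoted on the slide (all approximate).
STAGE_LOSSES = [
    ("instrument to 8mm video tape", 0.05),
    ("2nd pass analysis on mainframes", 0.05),
    ("8mm tapes copied to DLT", 0.10),
    ("DLT copied to HPSS", 0.05),
]


def surviving_fraction(losses):
    """Fraction of the data left after all stages.

    Assumption: every stage loses its share of the data that reached it,
    so the surviving fractions multiply.
    """
    remaining = 1.0
    for _stage, loss in losses:
        remaining *= 1.0 - loss
    return remaining


if __name__ == "__main__":
    remaining = surviving_fraction(STAGE_LOSSES)
    print(f"surviving: {remaining:.1%}, lost: {1 - remaining:.1%}")
```

Under that assumption roughly 77% of the original data survives the whole chain, so close to a quarter is gone before any analysis from the archive begins.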
  6. Almost there …
     • USArray, locations of the transportable seismographs.
  7. Almost there …
     • Data written to a hard drive on the seismometer
     • Data uplinked via cell phone or satellite to central location
     • Researchers request specific portions of the data via web interface
     • Data sent via e-mail (small request) or hard drive to researcher (“large” request)
     • Once a year, or so, someone goes to the seismographs and retrieves the hard drives…
  8. A modern case
     • The electron microscope in Simon Hall
  9. A modern case
     • Images are digitized by the instrument
     • The digitized images are written directly to the Data Capacitor
     • The Data Capacitor appears as a local file system on the researcher's desktop computer, BigRed, Quarry and some other TeraGrid systems
     • The researcher does quality checks, tuning, optimization, etc. on a local workstation
     • CPU-intensive analysis is done on the large systems provided by IU or the TeraGrid
     • Data is archived daily to the HPSS (via high-bandwidth connection from DC to HPSS)
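Because the Data Capacitor is presented as a local file system, ordinary file APIs are enough to work with it. As a rough illustration of the daily archive step, the sketch below lists the files changed in the last 24 hours under a directory. It is not the actual archiving tool: the function names are hypothetical, and the copy into HPSS is deliberately left out.

```python
import time
from pathlib import Path


def files_modified_since(root, cutoff_epoch):
    """Return files under root modified at or after cutoff_epoch (seconds since epoch)."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime >= cutoff_epoch
    )


def daily_archive_candidates(root, now=None):
    """Files changed in the last 24 hours, i.e. what a daily archive pass would pick up."""
    now = time.time() if now is None else now
    return files_modified_since(root, now - 24 * 3600)
```

A call such as `daily_archive_candidates("/path/to/mount")` would be pointed at wherever the file system is mounted on a given machine; that path is a placeholder, not a real mount point.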
  10. Infrastructure, The Data Capacitor
      • >300 TeraBytes
  11. Infrastructure, HPSS
      • >3 PetaBytes
  12. Infrastructure, CPU Resources
      Big Red [TeraGrid System]
         30 TFLOPS IBM JS21 SuSE Cluster
         768 blades/3072 cores: 2.5 GHz PPC 970MP
         8GB Memory, 4 cores per blade
         Myrinet 2000
         LoadLeveler & Moab
      Quarry [Future TeraGrid System]
         7 TFLOPS IBM HS21 RHEL Cluster
         140 blades/1120 cores: 2.0 GHz Intel Xeon 5335
         8GB Memory, 8 cores per blade
         1Gb Ethernet (upgrading to 10Gb)
         PBS (Torque) & Moab
  13. Infrastructure, Network
      • 10 GigE to parts of campus, 1 GigE to entire system
      • 4x10 GigE from BigRed to DC
      • 48x1 GigE from Quarry to DC
      • 15x10 GigE from DC to HPSS
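To put the link counts in perspective, the sketch below estimates how long it would take to move 300 TeraBytes (the round figure from the Data Capacitor slide) over each aggregate link. It assumes decimal units, every link running at full line rate, and no protocol, disk or tape overhead, so the results are best-case lower bounds rather than measured numbers. The names are illustrative.

```python
GBIT_PER_S = 1e9   # bits per second in 1 Gb/s (decimal units assumed)
TERABYTE = 1e12    # bytes (decimal units assumed)

# Aggregate link rates in Gb/s, from the link counts on the slide.
LINKS_GBPS = {
    "BigRed to DC": 4 * 10,
    "Quarry to DC": 48 * 1,
    "DC to HPSS": 15 * 10,
}


def ideal_transfer_hours(num_bytes, aggregate_gbps):
    """Best-case hours to move num_bytes at the given aggregate line rate."""
    seconds = (num_bytes * 8) / (aggregate_gbps * GBIT_PER_S)
    return seconds / 3600


if __name__ == "__main__":
    for name, gbps in LINKS_GBPS.items():
        hours = ideal_transfer_hours(300 * TERABYTE, gbps)
        print(f"{name}: {gbps} Gb/s aggregate, {hours:.1f} h for 300 TB")
```

Even under these idealized assumptions, the DC to HPSS path needs a little over four hours for 300 TB, which is comfortably inside a daily archive cycle.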
  14. What does this give you?
  15. What does this give you? FAQ
      • How much data can I have?
         • All of it, right now.
      • Where is my data?
         • Everywhere.
      • Where can I analyze my data?
         • Anywhere.
      • How long can I keep my data?
         • Forever.
      • Is there a backup?
         • Yes, two of them.
  16. Acknowledgments
      This material is based upon work supported by the National Science Foundation under Grant Numbers 0116050 and 0521433. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).
      This work was supported in part by the Indiana Metabolomics and Cytomics Initiative (METACyt). METACyt is supported in part by Lilly Endowment, Inc.
      This work was supported in part by the Indiana Genomics Initiative. The Indiana Genomics Initiative of Indiana University is supported in part by Lilly Endowment, Inc.
      This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.