Mexico talk foster march 2012

Keynote talk at the 3rd International Conference on Supercomputing in Mexico: www.isum.mx. A great group of people!

A substantially revised version of a talk with the same title given on previous occasions.

  • Cyberinfrastructure: The distributed computer, information, and communication technologies [that] empower the modern scientific research endeavor [Atkins report]
  • Gap of >1000, AND many more systems as people jump on the bandwagon. Meanwhile, other resources [money, people] stay flat. Crisis. 10^5 in 6 years; 10 in 6 years
  • http://omicsmaps.com/
  • PI and a handful of students and staff
  • 80% of awards and 50% of grant $$ are < $350K
  • Lewis Carroll. End-to-end crisis
  • The answer cannot simply be more money. We lack both $$ and the people to spend $$ on.
  • Not (particularly) computing as a service, but the IT functions that researchers need to function. Include collaboration as a service.
  • Infrastructure will be provided by many (competitive: a race to the bottom). The interesting questions are: What is the platform? And what is the software?
  • Sequencing: at center X, move data to Y, analyze, load into Short Read Archive (?), share, …
  • But when we get to work, we go back in time 20 years
  • User Hub: profiles, identities. Group Hub: definitions, policies. Resource Hub: definitions, history.
  • With a high-speed network, one can show up. Not just in person, but also computationally.

    1. Accelerating data-driven discovery by outsourcing the mundane. Ian Foster. www.ci.anl.gov www.ci.uchicago.edu
    2. The data deluge
    3. The data deluge in biology: x10 in 6 years; x10^5 in 6 years
    4. Number of sequencing machines (http://omicsmaps.com/)
    5. Moore's Law for X-ray sources: 18 orders of magnitude in 5 decades! 12 orders of magnitude in 6 decades. Credit: Linda Young
    6. Exploding data volumes in astronomy: MACHO et al.: 1 TB; Palomar: 3 TB; 2MASS: 10 TB; GALEX: 30 TB; Sloan: 40 TB; Pan-STARRS: 40,000 TB; unlabeled: 100,000 TB
    7. Exploding data volumes in climate science: Climate Model Intercomparison Project (CMIP) of the IPCC, 2004: 36 TB; 2012: 2,300 TB
    8. Big science has been successful. OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010. LIGO: 1 PB data in last science run, distributed worldwide. ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs. Robust production solutions; substantial teams and expense; sustained, multi-year effort; application-specific solutions, built on common technology. All build on NSF OCI (& DOE)-supported Globus Toolkit software.
    9. Small science is struggling: more data, more complex data; ad-hoc solutions; inadequate software, hardware; data plan mandates
    10. Dark data in the long tail of science [Chart: awarded amount per award, $0 to $7,000,000. NSF grant awards, 2007 (Bryan Heidorn)]
    11. The challenge of staying competitive. "Well, in our country," said Alice … "you'd generally get to somewhere else — if you ran very fast for a long time, as we've been doing." "A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
    12. A crisis that demands new approaches • We have exceptional infrastructure for the 1% (e.g., supercomputers, Large Hadron Collider, …) • But not for the 99% (e.g., the vast majority of the 1.8M publicly funded researchers in the EU). We need new approaches to providing research cyberinfrastructure that: – reduce barriers to entry – are cheaper – are sustainable
    13. You can run a company from a coffee shop
    14. Because businesses outsource their IT. Software as a Service (SaaS): web presence; email (hosted Exchange); calendar; telephony (hosted VOIP); human resources and payroll; accounting; customer relationship mgmt
    15. And often their large-scale computing too. Software as a Service (SaaS): web presence; email (hosted Exchange); calendar; telephony (hosted VOIP); human resources and payroll; accounting; customer relationship mgmt. Infrastructure as a Service (IaaS): data analytics; content distribution
    16. Let's rethink how we provide research IT. Accelerate discovery and innovation worldwide by providing research IT as a service. Leverage the cloud to • provide millions of researchers with unprecedented access to powerful tools; • enable a massive shortening of cycle times in time-consuming research processes; and • reduce research IT costs dramatically via economies of scale
    17. grail.cs.washington.edu
    18. Cloud layers: Software as a Service (SaaS); Platform as a Service (PaaS); Infrastructure as a Service (IaaS)
    19. Common research data management steps • Dark Energy Survey • SBGrid structural biology consortium • Galaxy genomics • NCAR climate data applications • LIGO observatory • Land use change; economics
    20. Common research data management steps • Dark Energy Survey • SBGrid structural biology consortium • Galaxy genomics • NCAR climate data applications • LIGO observatory • Land use change; economics
    21. Scientific data delivery, 2012 [image label: 1980] • "[A] majority of users at BES facilities … physically transport data to a home institution using portable media … data volumes are going to increase significantly in the next few years (to 70 TB/day or more) – data must be transferred over the network" • "the effectiveness of data transfer middleware [is] not just on the transfer speed, but also the time and interruption to other work required to supervise and check on the success of large data transfers" • "It took two weeks and email traffic between network specialists at NERSC and ORNL, sys-admins at NERSC, … and combustion staff at ORNL and SNL to move 10 TB from NERSC to ORNL" • Major usability, productivity, performance problems [ESnet Network Requirements Workshops, 2007-2010]
    22. The challenge: moving big data easily. What should be trivial ("I need my data over there, at my _____" (supercomputing center, campus server, etc.)) can be painfully tedious and time-consuming ("GAAAH!%&@#&"): config issues; firewall issues; unexpected failure = manual retry
    23. [Globus Online picture]
    24. Globus Online: data transfer as SaaS • Reliable file transfer – Easy "fire-and-forget" transfers – Automatic fault recovery – High performance – Across multiple security domains • No IT required – Software as a Service (SaaS): no client software installation; new features automatically available – Consolidated support & troubleshooting – Works with existing GridFTP servers – Globus Connect solves the "last mile problem" • >4000 registered users, >3 petabytes moved • Recommended by XSEDE, NERSC, Blue Waters, and many campuses
    25. Dark Energy Survey use of Globus Online • The Dark Energy Survey receives 100,000 files each night in Illinois from the Blanco 4m telescope on Cerro Tololo • They transmit files to Texas for analysis … then move results back to Illinois • Process must be reliable, routine, and efficient • They outsource this task to Globus Online (Image credit: Roger Smith/NOAO/AURA/NSF)
    26. [image]
    27. [image]
    28. Integration with Earth System Grid: high-speed transfers; automated retries; works behind firewalls; credential management; transfer monitoring
    29. Globus Online under the covers: User Hub manages user identities and profiles; Group Hub manages groups and policies; Resource Hub for resource definitions
    30. Globus Online under the covers. Monitoring and control: auto-tuning of transfer parameters; detection & attempted correction of errors; manual intervention when required. User Hub manages user identities and profiles; Group Hub manages groups and policies; Resource Hub for resource definitions
    31. Globus Online under the covers. Monitoring and control: auto-tuning of transfer parameters; detection & attempted correction of errors; manual intervention when required. User Hub manages user identities and profiles; Group Hub manages groups and policies; Resource Hub for resource definitions. Reliable cloud-based infrastructure: EC2 for transfer management; S3 for system state; SimpleDB for lock management; replication across availability zones
    32. Globus Online under the covers. Monitoring and control: auto-tuning of transfer parameters; detection & attempted correction of errors; manual intervention when required. User Hub manages user identities and profiles; Group Hub manages groups and policies; Resource Hub for resource definitions. Reliable cloud-based infrastructure: EC2 for transfer management; S3 for system state; SimpleDB for lock management; replication across availability zones
    33. Towards "research IT as a service" • Dark Energy Survey • SBGrid structural biology consortium • Galaxy genomics • NCAR climate data applications • LIGO observatory • Land use change; economics
    34. Towards "research IT as a service": research data management as a service. SaaS: Globus Transfer, Globus Storage, Globus Collaborate, Globus Catalog, ... PaaS: Globus Integrate platform
    35. Globus Storage: for when you want to … • Place your data where you want • Access it from anywhere via different protocols (GridFTP, HTTP, WebDAV) • Update it, version it, and take snapshots • Share versions with who you want • Synchronize among locations [Diagram: a Globus Storage volume hosted by a commercial storage service provider, a campus research computing center, or a national center]
    36. Globus Collaborate: for when you want to join with a few or many people to: • Share documents • Track tasks • Send email • Share data • Do whatever. With: • Common groups • Delegated mgmt
    37. Globus Integrate: for when you want to write programs that access/manage user identities, profiles, groups, resources, and data … via REST APIs and command line programs. Status: Globus Transfer: in production use; service and Web UI enhancements continue. Globus Storage: early release available in March; generally available in Q3. Globus Collaborate: initial projects starting in March; early release sometime in Q3. Globus Integrate: Transfer API available; user profile and group APIs in alpha; APIs for Storage and Collaborate planned after app release. Also shown: Globus Connect, Globus Connect Multi User.
    38. Other innovative science SaaS projects
    39. Other innovative science SaaS projects
    40. Other innovative science SaaS projects
    41. Other innovative science SaaS projects
    42. Realizing the benefits of cloud services • Understand what services researchers really need • Acquire and sustain the expertise required to create and operate useful services • Incentivize those who produce services that are widely adopted • Provide excellent network connectivity
    43. On the importance of networks: "80 percent of success is showing up"
    44. Time required to move 10 terabytes [Chart: hours to transfer 10 TB (log scale, 0.01 to 10,000) vs. network speed in Mb/s (log scale, 10 to 1,000,000)]
    45. Time required to move 10 terabytes [Same chart, annotated: US R1 universities, 2 hours]
    46. Time required to move 10 terabytes [Same chart, annotated: US R1 universities, 2 hours; upgrade, 10 mins]
    47. Time required to move 10 terabytes [Same chart, annotated: Cinvestav Langebio, 1 month; US R1 universities, 2 hours; upgrade, 10 mins]
    48. A 21st C research cyberinfrastructure • To provide more capability for more people at less cost … • Create cloud-based services – Robust and universal – Economies of scale – Positive returns to scale • Via the creative use of – Aggregation ("cloud") – Federation ("grid") • Powered by networks [Diagram: small and medium laboratories (L) and projects (P), served by cloud services for research data management; collaboration, computation; and research administration]
    49. Questions for you • How much "dark data" exists in your field? How important is that data? • Can you quantify the scale, in your field, of – wasted resources due to duplicated effort – delays in research progress due to inadequate infrastructure? • If you could do one thing to accelerate adoption of advanced computing within your field, what would it be?
    50. Acknowledgments: Colleagues at UChicago and Argonne: Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, Rachana Ananthakrishnan, Raj Kettimuthu, and others listed at www.globusonline.org/about/goteam/. NSF Office of Cyberinfrastructure; DOE Office of Advanced Scientific Computing Research; National Institutes of Health
    51. For more information: Attend GlobusWorld in Chicago, April 10-12, 2012 • www.globusonline.org • Twitter: @globusonline; Globus Online on Facebook • Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70-73, 2011. • Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K. and Tuecke, S. Software as a Service for Data Scientists. Communications of the ACM, Feb. 2012.
    52. Thank you! foster@uchicago.edu, foster@anl.gov, www.globusonline.org, Twitter: @globusonline, @ianfoster
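
Slide 3 gives two growth figures over the same six years, and the speaker notes call the difference a "gap of >1000". As a quick check of that arithmetic, the sketch below converts each figure into an equivalent annual growth factor. It assumes smooth compound growth, which the slide does not claim (it shows only the six-year totals); `annual_factor` is an illustrative name.

```python
def annual_factor(total_growth: float, years: float) -> float:
    """Per-year growth factor that compounds to `total_growth` over `years`."""
    return total_growth ** (1.0 / years)

slow = annual_factor(10, 6)    # "x10 in 6 years"
fast = annual_factor(1e5, 6)   # "x10^5 in 6 years"
gap = 1e5 / 10                 # how far the two curves diverge in 6 years

print(f"{slow:.2f}x/year vs {fast:.2f}x/year; gap after 6 years: {gap:,.0f}x")
```

The two figures diverge by a factor of 10,000 over the period, which is consistent with the note's "gap of >1000".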
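
Slide 5's orders-of-magnitude figures are easier to compare as doubling times. The minimal sketch below assumes steady exponential growth across each period, which is a simplification the slide does not make; `doubling_time_years` is an illustrative helper.

```python
import math

def doubling_time_years(orders_of_magnitude: float, years: float) -> float:
    """Doubling time implied by growing 10**orders_of_magnitude-fold over `years`."""
    doublings = orders_of_magnitude * math.log2(10)   # number of doublings needed
    return years / doublings

xray = doubling_time_years(18, 50)     # "18 orders of magnitude in 5 decades"
second = doubling_time_years(12, 60)   # "12 orders of magnitude in 6 decades"
print(f"{xray * 12:.1f} months vs {second * 12:.1f} months")
```

The second figure works out to roughly 18 months, close to the classic Moore's Law doubling period. The slide itself does not say which technology that curve tracks.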
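
The two CMIP archive sizes on slide 7 imply a compound annual growth rate. A minimal sketch follows. It assumes smooth growth between 2004 and 2012, since only the two end points are given.

```python
import math

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two measurements."""
    return (end / start) ** (1.0 / years) - 1.0

rate = cagr(36, 2_300, 2012 - 2004)              # CMIP archive size, TB
doubling_years = math.log(2) / math.log1p(rate)  # time to double at that rate
print(f"about {rate:.0%} per year; doubles every {doubling_years:.1f} years")
```

Under that assumption, the archive grew by roughly two-thirds each year, doubling about every 16 months.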
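
Slides 22 and 24 contrast "unexpected failure = manual retry" with "fire-and-forget" transfers and automatic fault recovery. The sketch below illustrates the general idea: files that succeed are never re-sent, only failures are retried, and the delay between rounds grows exponentially. It is not Globus Online's implementation. `TransientError`, `transfer_all`, and `copy_one` are hypothetical names, and a production service might also resume partially copied files and verify integrity.

```python
import time
from typing import Callable, Dict, Iterable

class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a dropped connection)."""

def transfer_all(files: Iterable[str],
                 copy_one: Callable[[str], None],
                 max_rounds: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, int]:
    """Copy every file, retrying only the failures, with exponential backoff.

    Returns how many attempts each file took; raises RuntimeError if some
    files still fail after max_rounds rounds.
    """
    pending = list(files)
    attempts = {f: 0 for f in pending}
    for round_no in range(max_rounds):
        failed = []
        for f in pending:
            attempts[f] += 1
            try:
                copy_one(f)
            except TransientError:
                failed.append(f)                # remember it for the next round
        if not failed:
            return attempts
        pending = failed
        if round_no < max_rounds - 1:
            sleep(base_delay * 2 ** round_no)   # 1 s, 2 s, 4 s, ... between rounds
    raise RuntimeError(f"gave up on {len(pending)} file(s): {pending}")


# Usage: "b" fails once with a transient error, then succeeds.
calls: Dict[str, int] = {}

def flaky_copy(name: str) -> None:
    calls[name] = calls.get(name, 0) + 1
    if name == "b" and calls[name] == 1:
        raise TransientError("connection reset")

result = transfer_all(["a", "b", "c"], flaky_copy, sleep=lambda s: None)
print(result)  # {'a': 1, 'b': 2, 'c': 1}
```

Injecting `sleep` keeps the example testable without real waiting.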
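
Slide 29 and the speaker notes name three hubs: a User Hub (profiles, identities), a Group Hub (definitions, policies), and a Resource Hub (definitions, history). The minimal data-model sketch below shows one way such hubs might relate. All class names, fields, identities, the endpoint URL, and the access rule are assumptions for illustration, not the Globus Online schema.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class User:                                   # User Hub: profile + linked identities
    username: str
    profile: dict = field(default_factory=dict)
    identities: Set[str] = field(default_factory=set)

@dataclass
class Group:                                  # Group Hub: definition + policy
    name: str
    members: Set[str] = field(default_factory=set)
    policy: str = "members-only"              # hypothetical policy label

@dataclass
class Resource:                               # Resource Hub: definition + history
    name: str
    url: str
    history: List[str] = field(default_factory=list)

def can_use(user: User, group: Group, resource: Resource) -> bool:
    """Hypothetical rule: group members may use the resource; each check is logged."""
    allowed = user.username in group.members
    resource.history.append(f"{user.username}: {'granted' if allowed else 'denied'}")
    return allowed

alice = User("alice", identities={"alice@campus.example.edu", "alice@grid.example.org"})
team = Group("des-pipeline", members={"alice"})
dtn = Resource("campus-dtn", "gsiftp://dtn.example.org")
print(can_use(alice, team, dtn), dtn.history)   # True ['alice: granted']
```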
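
Slide 30 lists auto-tuning of transfer parameters. One plausible heuristic, shown here as a hypothetical sketch, picks GridFTP-style settings from the average file size: how many files are in flight at once, how many TCP streams each file uses, and how many requests are pipelined ahead without waiting for replies. The thresholds and values are invented for illustration and are not the settings Globus Online uses.

```python
from statistics import mean
from typing import Dict, Sequence

MB = 1024 ** 2

def tune(file_sizes: Sequence[int]) -> Dict[str, int]:
    """Choose transfer settings from the average file size (hypothetical thresholds).

    concurrency: files in flight at once
    parallelism: TCP streams per file
    pipelining:  requests sent ahead without waiting for replies
    """
    if not file_sizes:
        raise ValueError("no files to transfer")
    avg = mean(file_sizes)
    if avg < 50 * MB:      # many small files: per-file latency dominates
        return {"concurrency": 8, "parallelism": 2, "pipelining": 20}
    if avg < 250 * MB:     # medium files: balance the two
        return {"concurrency": 4, "parallelism": 4, "pipelining": 10}
    # few large files: more streams per file to fill a long, fat pipe
    return {"concurrency": 2, "parallelism": 8, "pipelining": 4}

print(tune([200_000] * 10_000))   # many ~200 KB files
print(tune([4 * 1024 ** 3] * 3))  # three 4 GiB files
```

Keying on file size reflects a general trade-off: small files are dominated by per-file latency, which concurrency and pipelining hide, while large files benefit from multiple streams per file.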
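
Slide 31 mentions SimpleDB for lock management. A common pattern for this job is a lease: a lock that expires unless it is renewed, so a crashed transfer manager cannot hold a task forever. The sketch below uses an in-memory dictionary as a stand-in for a store with conditional writes. It does not use the SimpleDB API, and `LeaseTable` and its methods are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

class LeaseTable:
    """Single-process stand-in for a store with conditional writes (hypothetical).

    A real deployment needs the check-and-write in acquire() to be one atomic
    operation in the shared store; this in-memory version is not thread-safe.
    """

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._rows: Dict[str, Tuple[str, float]] = {}   # task -> (owner, expires_at)
        self._clock = clock

    def acquire(self, task: str, owner: str, ttl: float) -> bool:
        now = self._clock()
        row = self._rows.get(task)
        # Conditional put: succeed only if the task is unlocked, the old lease
        # has expired, or the caller already holds it (renewal).
        if row is None or row[1] <= now or row[0] == owner:
            self._rows[task] = (owner, now + ttl)
            return True
        return False

    def release(self, task: str, owner: str) -> None:
        row = self._rows.get(task)
        if row is not None and row[0] == owner:
            del self._rows[task]


# Usage with a fake clock: worker A stalls, its lease expires, B takes over.
now = [0.0]
table = LeaseTable(clock=lambda: now[0])
print(table.acquire("transfer-42", "worker-A", ttl=30))  # True
print(table.acquire("transfer-42", "worker-B", ttl=30))  # False
now[0] = 31.0
print(table.acquire("transfer-42", "worker-B", ttl=30))  # True
```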
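
Slide 35 lists "update it, version it, and take snapshots". The toy model below illustrates those semantics under a hypothetical design: each write appends a new version, and a snapshot records the latest version number of every file without copying any data. `Volume` and its methods are illustrative names, not the Globus Storage interface.

```python
from typing import Dict, List

class Volume:
    """Hypothetical versioned storage volume: every write keeps old versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}
        self._snapshots: Dict[str, Dict[str, int]] = {}

    def write(self, path: str, data: bytes) -> int:
        self._versions.setdefault(path, []).append(data)
        return len(self._versions[path]) - 1        # new version number

    def read(self, path: str, version: int = -1) -> bytes:
        return self._versions[path][version]        # -1 means the latest version

    def snapshot(self, name: str) -> None:
        # Record the current version index of every file; no data is copied.
        self._snapshots[name] = {p: len(v) - 1 for p, v in self._versions.items()}

    def read_snapshot(self, name: str, path: str) -> bytes:
        return self._versions[path][self._snapshots[name][path]]


vol = Volume()
vol.write("run1/results.csv", b"v1")
vol.snapshot("as-submitted")                 # freeze what the paper used
vol.write("run1/results.csv", b"v2")
print(vol.read("run1/results.csv"))          # b'v2'
print(vol.read_snapshot("as-submitted", "run1/results.csv"))  # b'v1'
```

A snapshot stored as a map of version numbers makes sharing a specific version cheap, since no bytes are duplicated.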
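
Slide 36 lists common groups with delegated management. The sketch below shows one hypothetical version of that policy: the owner may appoint managers, and the owner or any manager may add members. The roles, the rules, and the group name are assumptions for illustration, not the Globus Collaborate model.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class Group:
    """Hypothetical group with an owner, delegated managers, and members."""
    name: str
    owner: str
    managers: Set[str] = field(default_factory=set)
    members: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.members.add(self.owner)

    def _can_manage(self, actor: str) -> bool:
        return actor == self.owner or actor in self.managers

    def delegate(self, actor: str, user: str) -> None:
        """Only the owner may appoint managers (assumed policy)."""
        if actor != self.owner:
            raise PermissionError(f"only {self.owner} can delegate in {self.name}")
        self.managers.add(user)
        self.members.add(user)

    def add_member(self, actor: str, user: str) -> None:
        """The owner or any manager may add members."""
        if not self._can_manage(actor):
            raise PermissionError(f"{actor} cannot manage {self.name}")
        self.members.add(user)


lab = Group("structural-bio-lab", owner="pi")
lab.delegate("pi", "postdoc")          # the PI hands off day-to-day management
lab.add_member("postdoc", "student")   # the postdoc adds a new student
try:
    lab.add_member("student", "visitor")
except PermissionError as err:
    print(err)  # student cannot manage structural-bio-lab
print(sorted(lab.members))  # ['pi', 'postdoc', 'student']
```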
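
Slides 44 to 47 plot the time needed to move 10 TB against network speed. The idealized arithmetic behind such a chart divides the data size in bits by the link rate. The sketch assumes decimal terabytes, a link running at its full rate, and no protocol overhead. The three speeds tried are assumptions chosen to see which rates match the chart's annotations, because the slides do not state them.

```python
def hours_to_move(terabytes: float, megabits_per_second: float) -> float:
    """Idealized time: decimal TB, link at full rate, no protocol overhead."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600

# Assumed link speeds (the slides do not state them).
for label, mbps in [("10 Gb/s", 10_000), ("100 Gb/s", 100_000), ("30 Mb/s", 30)]:
    h = hours_to_move(10, mbps)
    print(f"{label:>8}: {h:7.2f} hours = {h * 60:9.1f} minutes = {h / 24:5.1f} days")
```

At these idealized rates, 10 Gb/s takes about 2.2 hours, 100 Gb/s about 13 minutes, and 30 Mb/s about 31 days. These figures roughly match the "2 hours", "10 mins", and "1 month" annotations. Real transfers run slower than line rate, so the true link speeds behind the annotations may be somewhat higher.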
