Virtual Science in the Cloud

  1. Virtual Science in the Cloud: Roy Williams, California Institute of Technology
  2. The New Science: humans, clouds, sensors; beginner to expert; sharing; logins and access; click to code to workflow; personal storage; big data and replication; compute and scaling; software as component; interoperability; survey and event; control or autonomous
  3. Compute Services, Registry, Getting Data
  4. Service Oriented Architecture: (1) publish: the service places its service contract in a registry; (2) find: the client looks the service up in the registry; (3) bind: the client sends requests to the service and receives responses. Principle: Click or Code
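The publish, find, bind cycle can be sketched in a few lines of Python. This is a hypothetical in-process toy: `ServiceContract`, `Registry`, `bind_and_call`, and `toy_cone_search` are illustrative names, and a local Python callable stands in for a remote endpoint. A real VO registry exchanges XML resource records over the network.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceContract:
    """What a provider publishes: a name, a capability keyword, and an endpoint."""
    name: str
    capability: str
    endpoint: Callable[[dict], dict]  # a local callable standing in for a remote URL


@dataclass
class Registry:
    contracts: List[ServiceContract] = field(default_factory=list)

    def publish(self, contract: ServiceContract) -> None:
        """1. publish: the provider registers its service contract."""
        self.contracts.append(contract)

    def find(self, capability: str) -> List[ServiceContract]:
        """2. find: the client looks up services by capability."""
        return [c for c in self.contracts if c.capability == capability]


def bind_and_call(contract: ServiceContract, request: dict) -> dict:
    """3. bind: the client calls the endpoint and gets a response."""
    return contract.endpoint(request)


def toy_cone_search(request: dict) -> dict:
    """Hypothetical service that just echoes the requested position."""
    return {"status": "OK", "ra": request["ra"], "dec": request["dec"], "rows": []}


registry = Registry()
registry.publish(ServiceContract("ToyCone", "ConeSearch", toy_cone_search))
service = registry.find("ConeSearch")[0]
response = bind_and_call(service, {"ra": 180.0, "dec": -30.0})
print(response["status"])  # OK
```

The client never hard-codes the service: it only knows the capability it wants, which is what lets the same code work from a web form ("click") or a script ("code").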
  5. VO Data Services: Cone Search (radius + position returns a list of objects, encoded as VOTable); Simple Image Access Protocol; Simple Spectrum Access Protocol (spectra have subtleties, so the protocol is more complicated); Astronomical Data Query Language (ADQL) for database queries: core SQL functions plus astronomy-specific extensions such as sky region and crossmatch; Table Access Protocol (TAP), which exposes relational databases: what tables exist, what the table schema is, and "here is a query in ADQL"
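As a sketch of what these requests look like on the wire, the helpers below build a Simple Cone Search URL (position and search radius `RA`, `DEC`, `SR` in decimal degrees) and a synchronous TAP request carrying an ADQL cone query. The endpoint URLs, the table name `my_catalog`, and the assumption that the table has columns named `ra` and `dec` are all hypothetical; real services publish their actual base URLs and schemas in the registry.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra: float, dec: float, sr: float) -> str:
    """Simple Cone Search request: position and radius in decimal degrees."""
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode({"RA": ra, "DEC": dec, "SR": sr})


def adql_cone_query(table: str, ra: float, dec: float, radius: float, limit: int = 10) -> str:
    """ADQL selecting rows of `table` within `radius` degrees of (ra, dec).
    Assumes the table's position columns are called ra and dec."""
    return (
        f"SELECT TOP {limit} * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius}))"
    )


def tap_sync_url(tap_base: str, adql: str) -> str:
    """Synchronous TAP request that runs an ADQL query."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return tap_base.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical endpoints, for illustration only.
print(cone_search_url("https://example.org/cone", 180.0, -30.0, 0.1))
# https://example.org/cone?RA=180.0&DEC=-30.0&SR=0.1
query = adql_cone_query("my_catalog", 180.0, -30.0, 0.1)
print(tap_sync_url("https://example.org/tap", query))
```

The contrast mirrors the slide: cone search is a fixed three-parameter question, while TAP hands the server an arbitrary ADQL query, including the astronomy-specific `CONTAINS`/`CIRCLE` region functions.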
  6. VO Compute Services: Asynchronous (you may not get an immediate answer, just a place to check back); Security (expensive resources, big requests, sequestered data; strong, weak, or no security); Scalable (a graduated path to powerful computation and big data); Cloud store (VOSpace, sharable)
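"Just get a place to check back" is the asynchronous job pattern: submission returns at once with a job handle, and the client polls until the job finishes. A minimal in-memory sketch follows; `ToyAsyncService` and `wait_for` are hypothetical, the phase names are borrowed from the IVOA Universal Worker Service style but this is not a UWS implementation, and each poll simply stands in for elapsed time.

```python
import uuid


class ToyAsyncService:
    """Hypothetical asynchronous service: submit() returns a job ID immediately;
    the result only exists after the job has run."""

    def __init__(self, steps_to_finish: int = 3):
        self.steps_to_finish = steps_to_finish
        self.jobs = {}

    def submit(self, params: dict) -> str:
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"phase": "PENDING", "polls": 0, "params": params, "result": None}
        return job_id  # the "place to check back"

    def status(self, job_id: str) -> dict:
        job = self.jobs[job_id]
        job["polls"] += 1  # each poll simulates the passage of time
        if job["polls"] >= self.steps_to_finish:
            job["phase"] = "COMPLETED"
            job["result"] = sum(job["params"]["values"])
        else:
            job["phase"] = "EXECUTING"
        return {"phase": job["phase"], "result": job["result"]}


def wait_for(service: ToyAsyncService, job_id: str, max_polls: int = 10) -> dict:
    """Poll until the job completes; a real client would sleep between polls."""
    for _ in range(max_polls):
        report = service.status(job_id)
        if report["phase"] == "COMPLETED":
            return report
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


service = ToyAsyncService()
job = service.submit({"values": [1, 2, 3]})
print(wait_for(service, job))  # {'phase': 'COMPLETED', 'result': 6}
```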
  7. VO Registry: publish, find, bind. Registry metadata: descriptions of data collections, data delivery services, organizations, etc.; based on Dublin Core with astronomy-specific extensions; represented as an extensible XML schema. Contents are stored in Resource Registries, which exchange metadata records through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  8. Distributed Registry: Astrogrid, CfA, NCSA, CDS, ESO, STScI/JHU, NOAO, Caltech, HEASARC, JapanVO. Ongoing harvesting, March 07 (CfA, ESO, NOAO soon)
  9. Semantics & Search: identifiers (ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_374875-722); free tags (beard Fred pudding); controlled vocabulary, UCD (phot.flux;em.ir); controlled vocabulary interop (SKOS); ontology (Greek isA Man, Socrates isA Greek, therefore Socrates isA Man); data models ("Each sky position will have a circular positional error estimate ..."); text markup ("Outflows from <object>NGC 666</object> are irregular ..."); schema ("Columns are Magnitude, Position, Identifier, ..."); registry metadata forms (Full Registry: true; ManagedAuthorities: authority, nasa.heasarc); formal service description
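Of the levels listed above, the ontology is the one that supports mechanical inference. A minimal sketch, assuming `isA` is simply transitive and computing its closure by repeated joining; real ontology languages such as OWL distinguish instances from classes, which this toy ignores.

```python
from typing import Set, Tuple


def infer_isa(facts: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Transitive closure of (child, parent) isA facts."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # a isA b and b isA d  =>  a isA d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


facts = {("Greek", "Man"), ("Socrates", "Greek")}
derived = infer_isa(facts)
print(("Socrates", "Man") in derived)  # True
```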
  10. Cloud Based Tools: code & presentation; data
  12. Open SkyQuery.net, VO Astronomical Crossmatch Service: query builder; presentation; execution (query planning, query execution, workflow)
  13. skyalert.org (diagram): international authors (GCN Broker; microlensing, optical, radio, X-ray, and gamma transients) plus annotation from archives; events and annotation disseminated to subscribers (astronomers, amateurs, students) in real time with intelligence; follow-up scheduler driving telescopes
  16. Skyalert: push-based workflow, which can be cyclic; portfolio aggregation by citation; annotation as software components; the stream owner builds a template; Django, Python, jQuery; now 4 developers via SVN
  17. Skyalert Stream Registry (will be VO registry)
  18. Roles (skyalert.org diagram): 1. browse, by human or robot (query, human computing, WWT/Google); 2. subscribe, by human or robot; 3. author, by human or robot; 4. annotate (contributed software components; archive, mining). Connections: push, inject, web, portfolios db, IM/tweet/email/TCP, triggers, actions
  19. skyalert.org, alerts and event cascade (cyclic workflow graph). Trigger: CRTS["Geometry"]["Moon angle"] > 30 and SDSS["Photoprimary"]["g-magnitude"] < 18. Action: an annotator or a follow-up request; dynamically loads a module exposing run(triggerEvent, portfolio): <business logic>; can build an event and inject it recursively; send message
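The trigger/action split can be sketched as follows. This is an illustration, not Skyalert's code: the portfolio is assumed to be a nested dictionary keyed by source catalog, then annotation group, then value name, and `moon_and_bright`, `run`, and `dispatch` are hypothetical names. Skyalert loads action modules dynamically; here the action function is passed in directly.

```python
from typing import Callable, Dict, List, Tuple

# A portfolio maps a source catalog to its annotation groups and values.
Portfolio = Dict[str, Dict[str, Dict[str, float]]]
Trigger = Callable[[Portfolio], bool]
Action = Callable[[dict, Portfolio], List[str]]


def moon_and_bright(portfolio: Portfolio) -> bool:
    """The trigger on the slide: CRTS Moon angle > 30 and SDSS g-magnitude < 18."""
    try:
        return (portfolio["CRTS"]["Geometry"]["Moon angle"] > 30
                and portfolio["SDSS"]["Photoprimary"]["g-magnitude"] < 18)
    except KeyError:
        return False  # annotation not (yet) in the portfolio: do not fire


def run(trigger_event: dict, portfolio: Portfolio) -> List[str]:
    """Hypothetical action module; the business logic would go here."""
    return [f"follow-up request for {trigger_event['id']}"]


def dispatch(event: dict, portfolio: Portfolio,
             rules: List[Tuple[Trigger, Action]]) -> List[str]:
    """Run every action whose trigger fires on this portfolio."""
    messages: List[str] = []
    for trigger, action in rules:
        if trigger(portfolio):
            messages.extend(action(event, portfolio))
    return messages


portfolio = {
    "CRTS": {"Geometry": {"Moon angle": 45.0}},
    "SDSS": {"Photoprimary": {"g-magnitude": 17.2}},
}
print(dispatch({"id": "event-001"}, portfolio, [(moon_and_bright, run)]))
# ['follow-up request for event-001']
```

Because a trigger only reads the portfolio, an annotation arriving later (for example the SDSS cross-match) can make a previously silent trigger fire, which is how the cyclic cascade arises.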
  20. Skyalert-LSST (skyalert.org): test run for the LSST mobile app; data service from CRTS and Skyalert; gets a JSON event list via HTTP; LSST is building a Skyalert clone; Pasadena and Tucson both get events by Jabber/XMPP; "Unknown" is now a choice of Cataclysmic Variable, Supernova, Blazar Outburst, Active Galactic Nucleus Variability, UV Ceti Variable, Asteroid, Variable, Mira Variable, High Proper Motion Star, Comet, Eclipsing Variable, Gamma Ray Burst Afterglow, Microlensing, Nova, Planetary Microlensing, RR Lyrae Variable, Tidal Disruption Flare
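A client that "gets a JSON event list via HTTP" might look like the sketch below. The JSON layout (a top-level array of objects with `id` and `class` fields) is an assumption, as is the caller-supplied URL; the class labels are the choices listed on the slide, and any label outside that list is counted as "Unknown".

```python
import json
from collections import Counter
from typing import List
from urllib.request import urlopen

# Classification choices listed on the slide.
CLASSES = frozenset({
    "Cataclysmic Variable", "Supernova", "Blazar Outburst",
    "Active Galactic Nucleus Variability", "UV Ceti Variable", "Asteroid",
    "Variable", "Mira Variable", "High Proper Motion Star", "Comet",
    "Eclipsing Variable", "Gamma Ray Burst Afterglow", "Microlensing", "Nova",
    "Planetary Microlensing", "RR Lyrae Variable", "Tidal Disruption Flare",
})


def parse_events(text: str) -> List[dict]:
    """Parse a JSON event list; assumes a top-level array of objects."""
    events = json.loads(text)
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return events


def fetch_events(url: str) -> List[dict]:
    """Fetch the event list over HTTP (URL supplied by the caller)."""
    with urlopen(url, timeout=30) as resp:
        return parse_events(resp.read().decode("utf-8"))


def count_by_class(events: List[dict]) -> Counter:
    """Tally events by class; labels outside the controlled list become "Unknown"."""
    return Counter(e.get("class") if e.get("class") in CLASSES else "Unknown"
                   for e in events)


sample = ('[{"id": "e1", "class": "Supernova"}, {"id": "e2", "class": "Nova"},'
          ' {"id": "e3", "class": "Supernova"}, {"id": "e4"}]')
print(count_by_class(parse_events(sample)))
```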
  26. Tier1 and Tier2 Event Nodes, evolving in IVOA: brokering between nodes over Jabber/XMPP or raw socket; the registry holds stream definitions and event servers. Tier1: immediate forwarding (reliable? topology?). Tier2: authoring, distribution, subscription, repository, query, portfolio, registry, machine learning, substreams, etc.
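The division of labour between the tiers can be sketched as a tiny publish/subscribe network: a Tier1 node forwards every event immediately and keeps nothing, while Tier2 nodes keep a repository and notify their own subscribers per stream. `Tier1Node`, `Tier2Node`, and the stream name are hypothetical, and transport (XMPP or socket), reliability, and topology, the open questions on the slide, are ignored here.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Tier2Node:
    """Hypothetical Tier2 node: stores events and notifies subscribers per stream."""

    def __init__(self, name: str):
        self.name = name
        self.repository: List[dict] = []
        self.subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, stream: str, callback: Callable[[dict], None]) -> None:
        self.subscribers[stream].append(callback)

    def receive(self, event: dict) -> None:
        self.repository.append(event)  # kept for later query / portfolio building
        for callback in self.subscribers[event["stream"]]:
            callback(event)


class Tier1Node:
    """Hypothetical Tier1 node: immediate forwarding, no storage."""

    def __init__(self):
        self.peers: List[Tier2Node] = []

    def connect(self, node: Tier2Node) -> None:
        self.peers.append(node)

    def publish(self, event: dict) -> None:
        for node in self.peers:
            node.receive(event)


broker = Tier1Node()
node_a, node_b = Tier2Node("A"), Tier2Node("B")
broker.connect(node_a)
broker.connect(node_b)
inbox: List[dict] = []
node_a.subscribe("CRTS", inbox.append)
broker.publish({"stream": "CRTS", "id": "e1"})
broker.publish({"stream": "other", "id": "e2"})
print(len(inbox), len(node_b.repository))  # 1 2
```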
  28. NSF Teragrid: world's largest open distributed cyberinfrastructure; 11 resource provider sites, >2 Petaflop HPC and >27,000 CPUs, >3 Petabyte disk, >60 PB tape; fast network, visualization, experiments (VMs, GPUs, FPGAs); for US researchers and their collaborators through a national peer-review process
  29. Teragrid 2002 (diagram): the user reaches a login node; job submission and queueing (Condor, PBS, ...) onto 100s of nodes; purged /scratch on a parallel file system with parallel I/O; /home on a global file system with a metadata node. Tools: Unix, Globus, C++, ssh, files, MPI, PBS, make
  32. Architectures 2010: Science Gateway (no architecture!); node farm (Condor); parallel computing (message-passing MPI, shared memory); Graphics Processing Units (10^4 independent tiny threads); data intensive (flash memory, TG/UCSD; Graywulf, JHU/Pan-STARRS); immediate resources
  33. Science Gateways (slide courtesy of Nancy Wilkins-Diehr): Biology and Biomedicine Science Gateway; Open Life Sciences Gateway; The Telescience Project; Grid Analysis Environment (GAE); Neutron Science Instrument Gateway; TeraGrid Visualization Gateway, ANL; BIRN; Open Science Grid (OSG); Special PRiority and Urgent Computing Environment (SPRUCE); National Virtual Observatory (NVO); Arroyo Adaptive Optics; Linked Environments for Atmospheric Discovery (LEAD); Computational Chemistry Grid (GridChem); Computational Science and Engineering Online (CSE-Online); GEON (GEOsciences Network); Network for Earthquake Engineering Simulation (NEES); SCEC Earthworks Project; Network for Computational Nanotechnology and nanoHUB; GIScience Gateway (GISolve); Gridblast Bioinformatics Gateway; Earth Systems Grid; Astrophysical Data Repository (Cornell)
  34. GPU for molecular modelling
  35. Pan-STARRS PS1: user-facing side (SQL/CasJobs, workbench, privacy/share, stored queries) and compute; data valet workflows (load/validate, merge, crawl, replicate, log); data on head/slice nodes, hot/warm/cold; fault tolerance through multiple replication and a fault workflow; cost and energy carefully considered; future: Hadoop/MapReduce
  36. Cloud Supercomputing? Teragrid/Globus vs Cloud/Amazon MI: both are ways to get wholesale computing; both provide IaaS (Infrastructure as a Service); virtual machines are more popular than the CTSS stack. What about parallelism? I/O speed? GPUs? etc. Watch 3leaf and ScaleMP for these
  37. Science and Web 2.0: easy for groups to form and collaborate; integrates with the user workspace (iGoogle and OpenSocial), alongside other aspects of their lives; use existing tools (SlideShare, blogs, Google gadgets, Facebook, Google Wave, Flickr, YouTube); sharing workspace; electronic log; provenance; virtual data as "equivalent script"
  38. Science and Web 2.0: the server delivers only code; the browser makes the presentation; Ajax, Ajaj, and HTTP "long poll"; jQuery and Google toolkit (see WWT and GSky in Skyalert); "everything is a wiki", or a wave?; visible/editable by group(s)
  39. Adaptive Optics Gateway: adaptive optics simulations; 30-meter telescope; planet-finding coronagraph; 4-day run for 4 sec!; parallel parameter sweeps; proposed upgrade of the Palomar AO system to a 56x56 subaperture system
  44. Arroyo
  45. Arroyo Gateway Architecture (RW and J. Bunn): a retail webserver (Django, with job definitions and status in MySQL, and local space for results) in front of wholesale computing (remote space for results). (1) The user uses HTML/JS from the webserver to create a job definition. (2) A polling daemon sees the new job and makes local space for it. (3) The daemon starts the job on a compute resource and updates the job status. (4) It fetches and updates the status of the running job, repeating as needed. (5) Output goes to remote space. (6) The daemon copies the output from remote to local space and updates the job status. (7) The user fetches results from the webserver
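The daemon's polling loop can be sketched with Python's standard `sqlite3` standing in for the MySQL job table and two temporary directories standing in for local and remote result space. The status names (`new`, `running`, `done`), the `fake_compute` stand-in for a remote compute resource, and the job definition string are all hypothetical; a real daemon would call `daemon_poll_once` in a loop with a sleep between passes.

```python
import os
import shutil
import sqlite3
import tempfile

# sqlite3 stands in for the MySQL job table; status names are assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, definition TEXT, status TEXT)")

LOCAL_ROOT = tempfile.mkdtemp(prefix="local_")    # retail side, served by the webserver
REMOTE_ROOT = tempfile.mkdtemp(prefix="remote_")  # wholesale side, where jobs write output


def submit(definition: str) -> int:
    """Web front end: store a new job definition."""
    cur = db.execute("INSERT INTO jobs (definition, status) VALUES (?, 'new')", (definition,))
    db.commit()
    return cur.lastrowid


def fake_compute(job_id: int, definition: str) -> None:
    """Hypothetical stand-in for a remote run: writes output to remote space."""
    with open(os.path.join(REMOTE_ROOT, f"job{job_id}.out"), "w") as f:
        f.write(definition.upper())


def daemon_poll_once() -> None:
    rows = db.execute("SELECT id, definition, status FROM jobs").fetchall()
    for job_id, definition, status in rows:
        if status == "new":
            # Make local space for the job, start it, record that it is running.
            os.makedirs(os.path.join(LOCAL_ROOT, f"job{job_id}"), exist_ok=True)
            fake_compute(job_id, definition)
            new_status = "running"
        elif status == "running":
            # Check for remote output; if present, copy it back and mark the job done.
            remote = os.path.join(REMOTE_ROOT, f"job{job_id}.out")
            if not os.path.exists(remote):
                continue
            shutil.copy(remote, os.path.join(LOCAL_ROOT, f"job{job_id}", "result.out"))
            new_status = "done"
        else:
            continue
        db.execute("UPDATE jobs SET status = ? WHERE id = ?", (new_status, job_id))
    db.commit()


job = submit("psf sweep r0=0.1")
daemon_poll_once()  # new -> running
daemon_poll_once()  # running -> done
status = db.execute("SELECT status FROM jobs WHERE id = ?", (job,)).fetchone()[0]
print(status)  # done
```

The point of the design is that the webserver never talks to the compute resource: the database row is the only shared state, so the slow wholesale side can fail or lag without blocking the retail web pages.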
  46. Pegasus workflow (E. Deelman)
  47. E. Deelman, G. Berriman, RW, et al
  48. LIGO Grid: Condor/DAGMan; now 45,000 jobs per month; Pegasus for load balancing?
  49. Asynchronous services: the user needs feedback; AJAJ (AJAX but with JSON); detailed progress reports during the run; strong/weak security model with certificates
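Condor/DAGMan runs jobs in an order that respects a directed acyclic graph of dependencies, and the asynchronous-services bullets ask for progress feedback while the run proceeds. A minimal sketch of both ideas, using Python's standard `graphlib` (3.9+) in place of DAGMan and JSON strings as the progress reports an AJAJ page might poll. The job names are hypothetical (loosely modeled on a mosaicking workflow), and nothing is actually submitted to Condor.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on,
# the same parent/child information a DAGMan file records.
workflow = {
    "project_1": set(),
    "project_2": set(),
    "diff_12": {"project_1", "project_2"},
    "fit_background": {"diff_12"},
    "add_mosaic": {"fit_background", "project_1", "project_2"},
}


def run_workflow(dag: dict) -> list:
    """Run jobs in dependency order and return one JSON progress report per job."""
    order = list(TopologicalSorter(dag).static_order())
    reports = []
    for done, job in enumerate(order, start=1):
        # A real system would submit `job` to the batch system here and wait for it.
        reports.append(json.dumps({"job": job, "completed": done, "total": len(order)}))
    return reports


for line in run_workflow(workflow):
    print(line)
```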
  53. Wide-area Mosaicking: 158 feet, Griffith Observatory, Los Angeles
  54. Citizen Science
  55. Human Volunteers: science layer (describe what you see in an image; each person has a level of expertise; how to use the results most effectively; Galaxyzoo.org and citizensky.org are good models); game layer (makes people come back; top 10 ranking, etc.; anonymous partner a la gwap.com)
  56. Human Volunteer Evidence (Donalek et al., arXiv:0810.4945 [astro-ph]): 4 of 10 say artifact
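One simple way to use "each person has a level of expertise" is to weight each volunteer's vote. This is a sketch of a plain weighted vote, not the method of Donalek et al.: `weighted_artifact_score` is a hypothetical helper, and the expertise weights in the second example are made up purely to show the effect. With equal weights it reduces to the raw "4 of 10" fraction.

```python
from typing import List, Tuple


def weighted_artifact_score(votes: List[Tuple[bool, float]]) -> float:
    """Fraction of expertise-weighted votes saying "artifact".
    Each vote is (says_artifact, expertise_weight); weights are assumed >= 0."""
    total = sum(w for _, w in votes)
    if total == 0:
        raise ValueError("no weighted votes")
    return sum(w for says, w in votes if says) / total


# Ten volunteers, four of whom say "artifact"; equal weights give the raw fraction.
equal = [(True, 1.0)] * 4 + [(False, 1.0)] * 6
print(weighted_artifact_score(equal))  # 0.4

# Hypothetical weights: if the four "artifact" voters are experts (weight 3),
# the minority view carries the weighted majority.
expert_minority = [(True, 3.0)] * 4 + [(False, 1.0)] * 6
print(weighted_artifact_score(expert_minority))
```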
  57. RW and C. Donalek
  58. Macromolecule Citizen Science (A. Cunha)
  59. Information Fusion
  60. Classic Machine Learning, a metric in "feature space": Relevance Vector Machine (Tipping); feature vectors; learning from a training set; picking relevant lessons (RW and J. Beck)
  61. New Machine Learning, Information Fusion: data portfolios, selected from a known set of object types; an evidence object is a set of class/probability and prior assumptions, which may be correlated priors; an annotator builds evidence from a portfolio and may include other evidence; inference (= expert system) combines evidence with cost-benefit and builds importance. Techniques: Alchemy; logic handles complexity; probability handles uncertainty; Markov Logic Networks; matrix completion; influence diagrams
  62. Automated Decision through a Tripod of Data: archive (a nearby radio source escalates p(blazar); a nearby galaxy escalates p(supernova)); human (crowded field? artifact present? can make a follow-up observation); machine learning (a fuzzy center escalates p(host galaxy); a moving source escalates p(asteroid); robotic follow-up observation). The archive, human, and machine-learning legs together feed the decision
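The "escalates p(class)" idea can be illustrated with the simplest fusion rule, naive Bayes: multiply class priors by per-class likelihood factors from each piece of evidence, renormalize, then apply a cost-benefit test for follow-up. This is deliberately simpler than the Markov Logic Networks and influence diagrams named on the slide, and it assumes the evidence items are conditionally independent given the class. `fuse`, `should_follow_up`, and every number below are hypothetical, chosen only to show the mechanics.

```python
from typing import Dict, List


def fuse(priors: Dict[str, float], evidence: List[Dict[str, float]]) -> Dict[str, float]:
    """Naive-Bayes fusion: multiply priors by per-class likelihood factors, normalize.
    Assumes the evidence items are conditionally independent given the class."""
    scores = dict(priors)
    for likelihoods in evidence:
        for cls in scores:
            scores[cls] *= likelihoods.get(cls, 1.0)  # class not mentioned: neutral
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


def should_follow_up(posterior: Dict[str, float], value: Dict[str, float], cost: float) -> bool:
    """Cost-benefit rule: observe if expected scientific value exceeds the cost."""
    return sum(p * value.get(cls, 0.0) for cls, p in posterior.items()) > cost


# Hypothetical numbers, for illustration only.
priors = {"blazar": 0.2, "supernova": 0.4, "asteroid": 0.4}
nearby_radio_source = {"blazar": 5.0}  # archive leg: escalates p(blazar)
not_moving = {"asteroid": 0.1}         # machine leg: a static source lowers p(asteroid)

posterior = fuse(priors, [nearby_radio_source, not_moving])
print(max(posterior, key=posterior.get))  # blazar
print(should_follow_up(posterior, {"blazar": 1.0, "supernova": 2.0}, cost=0.5))  # True
```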
  67. Lessons Learned
  68. User Interface (wrong): register; wait for account; read about XML; read about web services; learn to use VO Registry; translate VOTable format; ask for help; finally get some help; and now do some science....
  69. User interface (right): in Darwinian evolution every small change must give benefit. Anonymous; web form; some science....; register; more science....; run bigger job; hey, this is interesting....; learn the VO structure; power user. Be careful with complex authentication!
  70. Steering the Ship: short-term pragmatism (useful tools now; simple protocols, e.g. cone search; "just use RA and Dec") vs long-term architecture (a modular suite of interoperable tools; sophisticated protocols, e.g. SkyNode; sophisticated space-time coordinates)
  71. Building Information Standards: documents; agreements; data models; tight schema; loose schema; UML; XSD; WSDL; semantics; meaning; usefulness; applicability; code; services; interfaces. A data model is a bridge from community to computers
  86. What is a Data Center? Machines; services; doesn't matter where or how; testing testing testing; do we have enough power and HVAC?
  87. Complex Science, Complex Machines: separate the science user from complexity; must have domain science context; make simple things simple, but with the power to scale up and drill down if wanted; machines are not the objective; science through data, compute, sharing
  88. eScience is for People, right? Getting started; help desk; forum; documentation; knowledge base; calendar; contact us; social media; blog/newsfeed; Campus Champions; summer schools; advanced support for developers; education