Keynote at Gateways 2017 Conference, Ann Arbor MI
Speaker: Ian Stokes-Rees
"Connecting Cyberinfrastructure Back To The Laptop"
2. About Me
• Today: Computational Scientist at Anaconda
• Platform for Python-centric data science
• Yesterday: Postdoc and Lecturer at Harvard
• Built a science gateway for protein structure analysis
• Leveraged OSG and XSEDE
• Last Week: PhD on CERN LHCb experiment
• 2007: “A REST Model for High Throughput Scheduling in Computational Grids”
• Last Millennium: Electrical Engineer, University of Waterloo
http://about.me/ijstokes @ijstokes
3. Abstract
Science Gateways today are generally built to provide a web-accessible interface for a
particular scientific community to access a combination of software, hardware, and data
deployed in an expertly managed computing center. But what happens when the
scientist wants to repatriate their data? Or perform some analysis that is not supported
by the gateway? Both for the purposes of encouraging innovative workflows and
serving an audience with a wide range of computational experience it is important to
consider how a gateway can fit into the broader computational ecosystem of a
particular researcher or research group. One simple starting point for this is to ask the
question "how can the gateway connect back to the laptop?". This talk will consider
how this is being done today in science gateways and present some ideas for how this
could be expanded in the future.
Slides at http://bit.ly/gateways17-beyond
4. Tetralogy
• First Book: The Story of Science Gateways (a play in three acts)
• Second Book: Going Beyond Gateways Today
• Third Book: Opportunities For Future Success
• Epilogue: Anaconda For Reproducible Science
5. The Story of Science Gateways
(a play in three acts)
6. The Cast
Beth, a biochemist
✓ Experiment design
✓ Microarray equipment
✓ Wet lab skills
✗ Database expert
✗ Computer programming
✗ Linux administration
Sakina, a software engineer
✓ Python
✓ Web development
✓ Data wrangling
✗ Biology
✗ Liquid chromatography
Dipesh, a devops engineer
✓ Clusters & containers
✓ Security
✓ Storage systems
✗ Application development
✗ Genetics & proteomics
8.
• Beth focused on her science
• Worked in the wet lab
• Collected data
• Submitted it to the Microarray Analysis Gateway
• Got back results
• Iterated
• Published paper
Success!
9. Data: Act II
Microarray Analysis Gateway
• Beth is now making heavy use of the Gateway
• She realizes there are opportunities for cross-experiment data analysis
• The Gateway doesn’t support this
• She has funding for a postdoc who wants to investigate further
Paul, a postdoc
Paul’s laptop
10. Data Movement
• Paul needs access to Beth’s input and output data
• He needs the raw output data, not just the web-based graphics
• This is an unusual request, outside the Gateway team’s normal service
• Paul and Dipesh coordinate to repatriate the data
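Repatriation like this could be a scripted pull from the laptop rather than a one-off hand-off between Paul and Dipesh. The sketch below is a minimal illustration under stated assumptions: it supposes, hypothetically, that the Gateway publishes a JSON manifest of a job's raw output files (a list of entries, each with a `url` and a `sha256` digest) at a URL like `https://gateway.example.org/api/jobs/<job_id>/outputs`. Neither that endpoint nor the manifest format comes from the talk. It uses only the Python standard library and checks each file's digest before keeping it.

```python
import hashlib
import json
import pathlib
import urllib.parse
import urllib.request

# Hypothetical gateway API root; the talk does not specify any real endpoint.
GATEWAY_API = "https://gateway.example.org/api"


def output_manifest_url(base, job_id):
    """Build the (hypothetical) URL that lists one job's raw output files."""
    return f"{base.rstrip('/')}/jobs/{urllib.parse.quote(job_id)}/outputs"


def fetch_manifest(url):
    """Fetch a JSON manifest: a list of {"url": ..., "sha256": ...} entries (assumed format)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def download(url, dest_dir, expected_sha256=None, chunk_size=64 * 1024):
    """Stream one file into dest_dir, verifying its SHA-256 digest if one is given."""
    dest_dir = pathlib.Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last path component of the URL.
    target = dest_dir / pathlib.PurePosixPath(urllib.parse.urlparse(url).path).name
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as resp, open(target, "wb") as fh:
        for chunk in iter(lambda: resp.read(chunk_size), b""):
            digest.update(chunk)
            fh.write(chunk)
    if expected_sha256 is not None and digest.hexdigest() != expected_sha256:
        target.unlink()  # do not leave a corrupt copy on the laptop
        raise ValueError(f"checksum mismatch for {url}")
    return target


def repatriate(manifest, dest_dir):
    """Download every file listed in the manifest and return the local paths."""
    return [download(entry["url"], dest_dir, entry.get("sha256")) for entry in manifest]
```

For example, `repatriate(fetch_manifest(output_manifest_url(GATEWAY_API, "job-123")), "beth-data")` would copy one job's raw outputs into a local `beth-data` directory; `job-123` is a made-up job ID. Because `urllib.request.urlopen` also accepts `file://` URLs, the same functions can be tried against local files before pointing them at a live service.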
11. Software: Act III
• Beth has an applied mathematician colleague, Claire, who would like to try out a new GPU-based numerical analysis algorithm she has developed
• Claire needs access to parts of the Gateway’s workflow software
• Claire and Sakina coordinate to share the software details and the workflow
12. Reproducible Software Stack
• Workflow is coupled to the Gateway framework
• Claire works from a Mac, whereas the Gateway runs on a Linux cluster
• Installation of software and dependencies is laborious
• Claire is experienced in writing high performance numerical algorithms, but not RESTful APIs, web servers, and workflow managers
• Beth and Claire want to move the collaborative research forward but are feeling daunted
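One way to make the Gateway's software stack portable to Claire's Mac is to describe it as a conda environment file that pins package versions but omits build strings, since build strings are platform-specific. This is a minimal sketch under assumptions: the package list is hypothetical, and it assumes conda package names match the installed Python distribution names, which holds for packages such as numpy and scipy but not universally.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def pins_from_installed(names):
    """Look up the installed version of each named package (None if absent)."""
    pins = {}
    for name in names:
        if name == "python":
            # The interpreter is not an installed distribution, so ask the platform module.
            pins[name] = platform.python_version()
            continue
        try:
            pins[name] = version(name)
        except PackageNotFoundError:
            pins[name] = None  # not installed here; leave it unpinned
    return pins


def environment_yml(env_name, pins, channels=("conda-forge",)):
    """Render a conda environment.yml with version pins but no build strings."""
    lines = [f"name: {env_name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    for package, ver in sorted(pins.items()):
        lines.append(f"  - {package}={ver}" if ver else f"  - {package}")
    return "\n".join(lines) + "\n"
```

Run on the Gateway's Linux nodes, a call such as `environment_yml("microarray-workflow", pins_from_installed(["python", "numpy", "scipy"]))` produces a file Claire could pass to `conda env create -f environment.yml` on her Mac. For an existing conda environment, `conda env export --no-builds` gives a similar result, though it lists every package in the environment rather than a curated set.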