Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Structural Biology in the Clouds: A Success Story of 10 years

210 views

Published on

Keynote lecture at the IT TRANSFORMATION AND CLOUD CONTENT MANAGEMENT FOR LIFE SCIENCES conference, Berlin March 8-9, 2018

Published in: Science
  • Be the first to comment

Structural Biology in the Clouds: A Success Story of 10 years

  1. 1. Structural biology in the clouds: A success story of 10 years Alexandre Bonvin Utrecht University The Netherlands a.m.j.j.bonvin@uu.nl
  2. 2. Solution NMR: 950, 900-cryo, 750, 600-cryo, 600US, 2x500 MHz Solid-state NMR: 800WB-DNP, 400WB-DNP, 700US, 500WB MHz e-infrastructure: >1900 CPU cores + EGI grid (>110’000 CPU cores) 2019?: 1.2 GHz National and European infrastructure
  3. 3. The molecular machines of life
  4. 4. NMR and structural biology A SpronkNMR production
  5. 5. The protein-protein interaction Cosmos
  6. 6. Understanding molecular recognition at atomic details
  7. 7. Solving molecular puzzles by computational docking haddock.science.uu.nl >10000 users worldwide Used by major pharma companies
  8. 8. Haddock web portal • > 10500 registered users • > 188000 served runs since June 2008 • > 35% on the GRID Visit bonvinlab.org/software De Vries et al. Nature Prot. 2010 Van Zundert et al. J.Mol.Biol. 2016
  9. 9. HADDOCK’s user base
  10. 10. 10 years of serving the research community Made possible via HTC resources provided by H2020 e-Infrastructure projects over the years
  11. 11. European Open Science Cloud CC Under EGI-Engage The eInfrastructure landscape over the years
  12. 12. European Open Science Cloud CC Under EGI-Engage The eInfrastructure landscape over the years
  13. 13. Virtual Research Community
  14. 14. # Number of dimensions 2 # INAME 1 1H # INAME 2 1H 12 2.137 2.387 1 T 0.000e+00 0.00e+00 - 0 2756 2760 0 14 2.387 4.140 1 T 0.000e+00 0.00e+00 - 0 2760 2752 0 32 1.849 4.432 1 T 0.000e+00 0.00e+00 - 0 2259 2257 0 36 1.849 3.143 1 T 0.000e+00 0.00e+00 - 0 2259 2587 0 39 1.760 4.432 1 T 0.000e+00 0.00e+00 - 0 2260 2257 0 40 1.760 1.849 1 T 0.000e+00 0.00e+00 - 0 2260 2259 0 43 1.760 3.143 1 T 0.000e+00 0.00e+00 - 0 2260 2587 0 46 1.649 4.432 1 T 1.035e+05 0.00e+00 r 0 2583 2257 0 47 1.649 1.849 1 T 0.000e+00 0.00e+00 - 0 2583 2259 0 assign ( resid 501 and name OO ) ( resid 501 and name Z ) ( resid 501 and name X ) ( resid 501 and name Y ) ( resid 2 and name CA ) -0.1400 0.15000 assign ( resid 501 and name OO ) ( resid 501 and name Z ) ( resid 501 and name X ) ( resid 501 and name Y ) ( resid 3 and name CA ) -0.0100 0.15000 Data interpretation Structure, dynamics & interactions è impact on research and health: - origin of disease - design of new experiments - drug design… Exploiting GRID resources in structural biology… Computations NMR data collection and processing SAXS data analysis
  15. 15. eScience hub for NMR and structural biology Infrastructure Science Com m unity Knowledge The WeNMR VRC
  16. 16. www.wenmr.eu
  17. 17. WeNMR VRC (February 2018) • enmr.eu: One of the largest (#users) VO in life sciences • >830 users have registered so far(36% outside EU) • Support from >40 sites for >200’000 CPU cores via EGI infrastructure • User-friendly access to Grid via web portals • Supported by an SLA (2016, updated in 2017) with EGI and NGIs www.wenmr.eu NMR SAXSA worldwide e-Infrastructure for NMR and structural biology
  18. 18. Sustained growth of the WeNMR VRC 0 250 500 750 1000 1250 1500 1750 2000 2250 2500 2750 3000 3250 3500 2011-01 2011-04 2011-07 2011-10 2012-01 2012-04 2012-07 2012-10 2013-01 2013-04 2013-07 2013-10 2014-01 2014-04 2014-07 2014-10 2015-01 2015-04 2015-07 2015-10 2016-01 2016-04 2016-07 2016-10 2017-01 2017-04 2017-07 2017-10 2018-01
  19. 19. 0 50000 100000 150000 200000 250000 300000 350000 400000 450000 500000 2009/1 2009/4 2009/7 2009/10 2010/1 2010/4 2010/7 2010/10 2011/1 2011/4 2011/7 2011/10 2012/1 2012/4 2012/7 2012/10 2013/1 2013/4 2013/7 2013/10 2014/1 2014/4 2014/7 2014/10 2015/1 2015/4 2015/7 2015/10 2016/1 2016/4 2016/7 2016/10 2017/1 2017/4 2017/7 2017/10 enmr.eu VO grid jobs Operational since 10 years End of WeNMR funding Start of EGI-Engage Start of West-Life End of eNMR ~2400 normalized CPU years over 2017
  20. 20. Challenges & e-Solutions § Attract users! § Offer them top of the line eScience solutions for their research ... which means top of the line software
  21. 21. The WeNMR VRC Knowledge Help Center Tutorials, Wiki Consultancy Services Portals VRC Third-party aggregation Grid Exposure Marketplace Blogs, news, events.. User SSO Facebook • 39 web portals (31 NMR, 7 SAXS) • of which 29 by partners • Uniform access through the new Single Sign On functionality • RPC access available for some portals
  22. 22. The WeNMR services portfolio
  23. 23. Challenges & e-Solutions § Attract users! § Offer them top of the line eScience solutions for their research ... which means top of the line software § Provide them training, tutorials and support
  24. 24. The WeNMR VRC Knowledge Help Center Tutorials, Wiki Consultancy Services Portals VRC Third-party aggregation Grid Exposure Marketplace Blogs, news, events.. User SSO Facebook • Help center • Consultancy remote or on location • Tutorials, wiki documents, movies • YouTube channel • Many workshops …
  25. 25. Challenges & e-Solutions § Attract users! § Offer them top of the line eScience solutions for their research ... which means top of the line softwares) § Provide them training, tutorials and support § Make their life easier
  26. 26. The WeNMR VRC Knowledge Help Center Tutorials, Wiki Consultancy Services Portals VRC Third-party aggregation Grid Exposure Marketplace Blogs, news, events.. User SSO Facebook
  27. 27. Challenges & e-Solutions § Attract users! § Access to e-infrastructure
  28. 28. Distribution of resources World-wide: Dec 2015 ~ 140’000 CPU cores from 42 sites (EGI & OSG)
  29. 29. • Support for WeNMR/MoBrain/West-Life activities • 75M CPU hours, 50 TB storage from 7 sites • Renewed in 2017 – CESNET-MetaCloud (Czech Republic) – INFN-PADOVA (Italy) – IFCA-LCG2 (Spain) – NCG-INGRID-PT (Portugal) – NIKHEF (The Netherlands) – RAL-LCG2 (UK) – SURFsara (The Netherlands) – TW-NCHC (Taiwan) SLA agreement
  30. 30. Challenges & e-Solutions § Attract users! § Access to e-infrastructure § Job management / submission
  31. 31. Job management § Need to handle millions of job submission § Initially based on gLite § Mostly migrated to DIRAC4EGI § From a user perspective DIRAC is in principle grid/cloud agnostic: § Can automatically launch VMs § Software distributed via CVMFS
  32. 32. But where is the CLOUD?
  33. 33. European Open Science Cloud CC Under EGI-Engage The eInfrastructure landscape over the years
  34. 34. § With activities toward: § Integrating the communities § Making best use of cloud resources § Bringing data to the cloud (cryo-EM) § Exploiting GPGPU resources § While maintaining the quality of our current services! The MoBrain CC under EGI Engage
  35. 35. The West-Life VRE west-life.eu
  36. 36. The West-Life-VRE portal • Virtual machines Already used for several workshops (INSTRUCT Utrecht Apr. 2016; NeCEN, Leiden, ISCG2017, Taipei)
  37. 37. SCIPION cloud server now in production Cryo-EM in the clouds
  38. 38. Virtualising portals using EC3
  39. 39. EGI Fedcloud usage 0 20 40 60 80 100 120 140 160 2016/12016/22016/32016/42016/52016/62016/72016/82016/92016/102016/112016/122017/12017/22017/32017/42017/52017/62017/72017/82017/92017/102017/112017/12 enmr.eu VO number of VMs Average of 21 VMs full time per month
  40. 40. Harvesting GPGPU resources
  41. 41. Exploring GPGPU resources: PowerFit • Python package to automatically fit high- resolution biomolecular structures into cryo-EM densities • Simple command-line program, able to run using single/multiple CPUs or GPU van Zundert and Bonvin. AIMS Biophysics 2, 73-87 (2015) www.github.com/haddocking/powerfit
  42. 42. Exploring GPGPU resources: DisVis • Python package to Python package to visualize and quantify the accessible interaction space of distance restrained binary biomolecular complexes. • Simple command-line program, able to run using single/multiple CPUs or GPU van Zundert and Bonvin. Bioinformatics. 31, 3222-3224 (2015) www.github.com/haddocking/disvis
  43. 43. Accessible interaction space as a function of the number of distance restraints DISVIS example
  44. 44. Baremetal vs grid vs cloud ID Type GPU #Cores CPU type Mem (GB) B-K20 Baremetal Tesla K20 24 HT (12 real) Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz 32 B-K40 Baremetal Tesla K40 48 HT (24 real) Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz 512 D-K20 Docker on K20 Tesla K20 24 Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz 32 K-K40 KVM on K40 Tesla K40 24 Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz 32 Case Machine TimeGPU (sec) TimeCPU 1 core CPU1/GPU B-K40 Baremetal 674 7928 11.8 K-K40 KVM 671 7996 11.9 B-K20 Baremetal 830 11839 14.3 D- K20 Docker 837 11926 14.3 No loss of performance CourtesyofMarioDavid INDIGO <= Grid <= Cloud
  45. 45. GPGPU, GRID-enabled web portals http://milou.science.uu.nl/enmr/services/DISVIS/ http://milou.science.uu.nl/enmr/services/POWERFIT/
  46. 46. Pre-processing + Input files packaging Architecture behind the portals User DB User not found Input error WEB CLIENT WEB SERVER MASTER NODE WORKING NODE GPU- calculation Validation Submission to local nodes Submission to grid node CPU- calculation Chimera image generation Post-processing + Results formatting Output files packaging + submission of image generation OR
  47. 47. Software Provisioning indigodatacloudapps/disvis indigodatacloudapps/powerfit Because of complex software dependencies we use docker containers • Python2.7 • NumPy 1.8+ • SciPy • FFTW3 • pyFFTW 0.10+ • OpenCL1.1+ • pyopencl • clFFT • gpyfft And to avoid security issues on the grid side, udocker from INDIGO
  48. 48. Some usage statsOperational since Aug. 2016 Published Dec. 2016 Top pulls in INDIGO applications docker hub https://hub.docker.com/r/indigodatacloudapps/
  49. 49. Where are we going?
  50. 50. Which solution?
  51. 51. A bit of everything…
  52. 52. Thematic services under EOSC-Hub https://www.egi.eu/use-cases/scientific-applications-tools/
  53. 53. Thematic services under EOSC-Hub § Harvest both § DIRAC4EGI can handle both without the additional burden of managing the cloud VMs § We still have much more grid than cloud resources § HADDOCK portal as use case in Helix Nebula Science Cloud
  54. 54. The exascale challenge Ø ~20’000 human proteins Ø Hundreds of thousands of interactions Ø Billions CPU hours and exabytes of data Ø Need to make our software ready for it! bioexcel.eu
  55. 55. Acknowledgments: the CSB group@UU VICI TOP-PUNT WeNMR West-Life EGI-Engage INDIGO- Datacloud BioExcel CoE EOSC-Hub SURFSara €€
  56. 56. Thank you for your attention! http://bonvinlab.org

×