Big Data Repository for Structural Biology: Challenges and Opportunities by Piotr Sliz

SBGrid (Morin et al., 2013, eLife; www.sbgrid.org) is a Harvard-based global computing consortium for structural biology with a primary focus on the curation of research software. Dr. Sliz will discuss a recent SBGrid project that aims to establish a repository for experimental datasets from SBGrid laboratories. The talk will address the handling of large data volumes, data validation, and repository sustainability.

  1. Big Data Repository for Structural Biology: Challenges and Opportunities
     Piotr Sliz, PhD (sliz@hkl.hms.harvard.edu)
     SBGrid: http://sbgrid.org
     SBGrid Data Bank: http://data.sbgrid.org
     Twitter: @SBGrid
     YouTube: SBGridTV
     SBGrid Consortium: Support Center at Harvard Medical School, 300 Research Groups, 13 Countries
     Long Term Sustainability: Membership Fee
  2. SBGrid supports compilation, installation and upgrades of ~300 scientific applications
     Several Software Categories (EM, NMR, X-rays, Comp Chem, etc.)
     Multiple versions of most applications
     OS X (10.6-10.10) and Linux support (CentOS 5-7)
     No additional end-user configuration required
     Software always works = more time for research
     Core Mission:
     Grid Computing (Open Science Grid VO + Grid Portal)
     General Research Infrastructure (Boston Area)
     Training (workshops, software cataloguing, webtales)
     Webinars at youtube.com/SBGridTV
     Developer Resources
     Advocating for Open Source Software
     Morin et al. Shining Light into Black Boxes. Science, 2012.
     Other Activities:
     Additional Publications
     Primary Citation:
     Other Citations:
  3. New Opportunity: Data
     anonymous SBGrid member 1: “we cannot find the original frames for many of our structures (move from X to Y), including recent high impact projects. What do you recommend that we do?”
     anonymous SBGrid member 2: “I was able to locate the data directory but I must have done a good job cleaning up the disk space before I left: usually there are only two .img files left in the data directory, the 1st and the last image of a full run.”
     Lack of Storage Support for Diffraction Images
     derive, reproduce, improve, correct
     • Stokes-Rees, I., Levesque, I., Murphy, F.V., Yang, W., Deacon, A., and Sliz, P. (2012). Adapting federated cyberinfrastructure for shared data collection facilities in structural biology. J. Synchrotron Radiat. 19, 462–467.
     • Terwilliger, T.C., and Bricogne, G. (2014). Continuous mutual improvement of macromolecular structure models in the PDB and of X-ray crystallographic software: the dual role of deposited experimental data. Acta Crystallogr. D Biol. Crystallogr. 70, 2533–2543.
     • Terwilliger, T.C. (2014). Archiving raw crystallographic data. Acta Crystallogr. D Biol. Crystallogr.
     • Guss, J.M., and McMahon (2014). How to make deposition of images a reality. Acta Crystallogr. D Biol. Crystallogr. 70, 2520–2532.
  4. Focus on Primary Data
     SBGrid Data Bank. Pilot: May 1st, Production: June 1st, 2015
     EZID
     Dataset Lock
     BIODBCORE-000683
     re3data.org
     Data Mining and Annotation
  5. Web Interface
     Related Datasets
     Depositors:
     URL: data.sbgrid.org
     Dataset Landing Page
     DataCite Schema
     CC0 License
     Download Dataset URL
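
The landing-page elements listed on this slide (DataCite schema, CC0 license, download URL) can be pictured as a small metadata record. The following is a minimal sketch, not the SBGrid Data Bank's actual implementation: the field names loosely follow DataCite's mandatory properties, and the function names, default publisher, and every value passed in are hypothetical placeholders.

```python
# Hypothetical sketch of a dataset landing-page record.
# Field names are loosely modeled on DataCite's mandatory properties;
# nothing here is taken from the SBGrid Data Bank code base.

REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def build_landing_record(identifier, creators, title, year, download_url,
                         publisher="Example Data Bank"):
    """Assemble a metadata record for one deposited dataset."""
    return {
        "identifier": identifier,          # e.g. a DOI (placeholder in this sketch)
        "creators": list(creators),        # depositors, as "Family, Given" strings
        "titles": [title],
        "publisher": publisher,            # assumed default, purely illustrative
        "publicationYear": year,
        "resourceType": "Dataset",
        "rights": "CC0",                   # license named on the slide
        "downloadUrl": download_url,       # assumed extra field for the landing page
    }


def missing_fields(record):
    """Return required fields that are absent or empty, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]
```

A record built this way could be checked with `missing_fields` before a landing page is rendered, so incomplete deposits are caught early.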
  6. Current Statistics
     Publication Workflow:
  7. Data Access Alliance:
     Make data easily accessible for reprocessing
     Minimize Project Cost
     Increase Redundancy
     Challenges
     Dataset Size (APIs, Data Access Alliance)
     Journal + Data Automation
     automated embargo release
     cross-referencing
     coordination/communication with journals
     Data vs Journal Citations
     Metrics:
     Dataset Deposition Rates
     Data Use: DAA Membership vs. direct downloads
     Dataset Quality (Level 0-2)
     Data Citations
     Master Format
     OME-TIFF vs DataCite vs DataVerse schema
     Transition to a Research Data Management Software
     ORCID integration and adoption
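
The "automated embargo release" item on this slide can be sketched as a simple rule. The following is a minimal illustration under stated assumptions, not a description of how SBGrid handles embargoes: the `Deposit` fields, the rule that a dataset is released when its linked article is published or its embargo date has passed, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Deposit:
    """Hypothetical record for one deposited dataset."""
    dataset_id: str
    embargo_until: Optional[date] = None   # None means no date-based release
    article_published: bool = False        # would come from journal coordination
    released: bool = False


def should_release(deposit: Deposit, today: date) -> bool:
    """Assumed rule: release on article publication or once the embargo date is reached."""
    if deposit.released:
        return False
    if deposit.article_published:
        return True
    return deposit.embargo_until is not None and today >= deposit.embargo_until


def release_due(deposits: List[Deposit], today: date) -> List[str]:
    """Mark every due deposit as released and return the affected dataset ids."""
    released_ids = []
    for deposit in deposits:
        if should_release(deposit, today):
            deposit.released = True
            released_ids.append(deposit.dataset_id)
    return released_ids
```

Run on a schedule, a check like `release_due` would remove the need for manual embargo tracking; the hard part named on the slide, learning from journals when an article is actually published, is represented here only by the `article_published` flag.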
  8. Opportunities
     Better support to ~300 structural biology laboratories:
     Compliance
     Reproducibility
     Integration with PDB and other repositories
     Other data types in addition to X-ray diffraction
     Thank you
     Piotr Sliz, PhD (sliz@hkl.hms.harvard.edu)
     SBGrid: http://sbgrid.org
     SBGrid Data Bank: http://data.sbgrid.org
     Twitter: @SBGrid
     YouTube: SBGridTV
     Stephanie Socias, Pete Meyer, Merce Crosas
