Big Data Repository for Structural Biology: Challenges and Opportunities
Piotr Sliz, PhD
sliz@hkl.hms.harvard.edu
SBGrid (Morin et al., 2013, eLife, and www.sbgrid.org) is a Harvard-based global computing consortium for structural biology whose primary focus is the curation of research software. Dr. Sliz will discuss a recent SBGrid project that aims to establish a repository for experimental datasets from SBGrid laboratories. The talk addresses the handling of large data volumes, data validation, and repository sustainability.
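
One simple form of the data validation mentioned above is checking that a diffraction run is complete before it is deposited. The talk quotes a member who found only the first and last .img files of a run left on disk. The sketch below flags that situation. It is only an illustration: the `<prefix>_<frame number>.img` naming convention and the function names are assumptions, not part of the SBGrid Data Bank.

```python
import re
from pathlib import Path

# Assumed (hypothetical) naming convention: <prefix>_<frame number>.img,
# for example lysozyme_0001.img. Real beamline naming schemes vary.
FRAME_PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<number>\d+)\.img$")


def missing_frames(filenames):
    """Map each run prefix to the sorted frame numbers that are absent
    between the lowest and highest frame number seen for that prefix."""
    seen = {}
    for name in filenames:
        match = FRAME_PATTERN.match(Path(name).name)
        if match:
            seen.setdefault(match["prefix"], set()).add(int(match["number"]))

    gaps = {}
    for prefix, numbers in seen.items():
        expected = set(range(min(numbers), max(numbers) + 1))
        gaps[prefix] = sorted(expected - numbers)
    return gaps


def scan_directory(directory):
    """Apply missing_frames to every .img file directly inside a directory."""
    return missing_frames(p.name for p in Path(directory).glob("*.img"))
```

A run that retains only its first and last image shows up as one long gap, for example `missing_frames(["run1_0001.img", "run1_0004.img"])` reports frames 2 and 3 as missing. The check cannot detect frames lost before the first or after the last surviving image.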
Big Data Repository for Structural Biology: Challenges and Opportunities by Piotr Sliz

  1. Big Data Repository for Structural Biology: Challenges and Opportunities
     Piotr Sliz, PhD (sliz@hkl.hms.harvard.edu)
       • SBGrid: http://sbgrid.org
       • SBGrid Data Bank: http://data.sbgrid.org
       • Twitter: @SBGrid
       • YouTube: SBGridTV
     SBGrid Consortium Support Center at Harvard Medical School
       • 300 Research Groups
       • 13 Countries
       • Long Term Sustainability: Membership Fee
  2. SBGrid supports compilation, installation and upgrades of ~300 scientific applications
     Core Mission:
       • Several Software Categories (EM, NMR, X-rays, Comp Chem, etc.)
       • Multiple versions of most applications
       • OS X (10.6-10.10) and Linux support (CentOS 5-7)
       • No additional end-user configuration required
       • Software always works = more time for research
     Other Activities:
       • Grid Computing (Open Science Grid VO + Grid Portal)
       • General Research Infrastructure (Boston Area)
       • Training (workshops, software cataloguing, webtales)
       • Webinars at youtube.com/SBGridTV
       • Developer Resources
       • Advocating for Open Source Software
     Additional Publications:
       • Morin et al. Shining Light into Black Boxes. Science, 2012.
       • Primary Citation and Other Citations: not captured in the transcript
  3. New Opportunity: Data
     Anonymous SBGrid member 1: “we cannot find the original frames for many of our structures (move from X to Y), including recent high impact projects. What do you recommend that we do?”
     Anonymous SBGrid member 2: “I was able to locate the data directory but I must have done a good job cleaning up the disk space before I left: usually there are only two .img files left in the data directory, the 1st and the last image of a full run.”
     Lack of Storage Support for Diffraction Images
     derive, reproduce, improve, correct
       • Stokes-Rees, I., Levesque, I., Murphy, F.V., Yang, W., Deacon, A., and Sliz, P. (2012). Adapting federated cyberinfrastructure for shared data collection facilities in structural biology. J. Synchrotron Radiat. 19, 462–467.
       • Terwilliger, T.C., and Bricogne, G. (2014). Continuous mutual improvement of macromolecular structure models in the PDB and of X-ray crystallographic software: the dual role of deposited experimental data. Acta Crystallogr. D Biol. Crystallogr. 70, 2533–2543.
       • Terwilliger, T.C. (2014). Archiving raw crystallographic data. Acta Crystallogr. D Biol. Crystallogr.
       • Guss, J.M., and McMahon (2014). How to make deposition of images a reality. Acta Crystallogr. D Biol. Crystallogr. 70, 2520–2532.
  4. Focus on Primary Data
     SBGrid Data Bank. Pilot: May 1st, Production: June 1st, 2015
       • EZID
       • Dataset Lock
       • BIODBCORE-000683
       • re3data.org
       • Data Mining and Annotation
  5. Web Interface
     URL: data.sbgrid.org
     Dataset Landing Page:
       • Related Datasets
       • Depositors
       • DataCite Schema
       • CC0 License
       • Download
       • Dataset URL
  6. Current Statistics
     Publication Workflow:
  7. Data Access Alliance:
       • Make data easily accessible for reprocessing
       • Minimize project cost
       • Increase redundancy
     Challenges
       • Dataset Size (APIs, Data Access Alliance)
       • Journal + Data Automation: automated embargo release, cross-referencing, coordination/communication with journals
       • Data vs Journal Citations
       • Metrics: Dataset Deposition Rates
       • Data Use: DAA Membership vs. direct downloads
       • Dataset Quality (Level 0-2)
       • Data Citations
       • Master Format: OME-TIFF vs DataCite vs DataVerse schema
       • Transition to a Research Data Management Software
       • ORCID integration and adoption
  8. Opportunities
     Better support to ~300 structural biology laboratories:
       • Compliance
       • Reproducibility
       • Integration with PDB and other repositories
       • Other data types in addition to X-ray diffraction
     Thank you
     Piotr Sliz, PhD (sliz@hkl.hms.harvard.edu)
       • SBGrid: http://sbgrid.org
       • SBGrid Data Bank: http://data.sbgrid.org
       • Twitter: @SBGrid
       • YouTube: SBGridTV
     Stephanie Socias, Pete Meyer, Merce Crosas
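
Slide 7 lists "automated embargo release" among the journal and data automation challenges. The sketch below shows one way such a rule could be expressed. Everything in it is an assumption made for illustration: the `Deposit` record, its field names, and the release policy (release once the associated paper has a DOI or the embargo date has passed) are hypothetical and do not describe how the SBGrid Data Bank actually works.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    """Hypothetical record for one deposited dataset."""
    dataset_id: str                        # illustrative identifier, not a real DOI
    embargo_until: Optional[date]          # None means no embargo date was set
    publication_doi: Optional[str] = None  # filled in once the paper is published
    released: bool = False


def due_for_release(deposits, today):
    """Return the IDs of unreleased deposits that the assumed policy would release:
    either the associated paper has a DOI or the embargo date has passed."""
    due = []
    for deposit in deposits:
        if deposit.released:
            continue
        published = deposit.publication_doi is not None
        expired = deposit.embargo_until is not None and deposit.embargo_until <= today
        if published or expired:
            due.append(deposit.dataset_id)
    return due
```

In practice the publication signal would have to come from the journals themselves, which is why the slide pairs embargo release with cross-referencing and coordination with journals.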
