
Zmasek bosc2010 topsan



Published in: Technology, Education

  1. 1. Connecting TOPSAN to Computational Analysis<br />Christian M Zmasek, Kyle Ellrott, Dana Weekes, Constantina Bakolitsa, John Wooley, Adam Godzik<br />Joint Center for Structural Genomics<br />Sanford-Burnham Medical Research Institute, La Jolla, California, USA<br />University of California, San Diego, La Jolla, California, USA<br />Joint Center for Molecular Modeling<br />
  2. 2. Overview<br />What is TOPSAN?<br />TOPSAN: The Open Protein Structure Annotation Network<br />Community-based annotation of protein structures<br />“Semantic” TOPSAN<br />How to enter machine-readable, structured data<br />Example: editor -> entry -> semantic web<br />Different ways to download information<br />SPARQL example<br />Availability and licenses<br />Acknowledgements<br />
  3. 3. What is TOPSAN?<br />TOPSAN: The Open Protein Structure Annotation Network<br />Tens of thousands of protein structures have been determined by structural genomics (SG) centers, and many more are expected<br />While these structures are available in PDB (Protein Data Bank)…<br />… annotations for most of them are limited to one-line PDB titles<br />TOPSAN is the first database that specifically focuses on providing extensive annotations for the thousands of structures solved by the SG centers<br />
  4. 4. What is TOPSAN?<br />TOPSAN’s main content is collaboratively written (“open”) articles/annotations, one for each solved protein structure<br />TOPSAN combines automated and human-edited elements<br />TOPSAN spans the range of analysis of<br />single proteins<br />characterization of protein families<br />reconstruction of entire genomes<br />Articles are created by structural genomics (SG) center staff and over 400 external users, so far covering 7,250 proteins<br />Collaborating with PFAM to use JCSG structures to refine existing PFAM families and create new ones<br />
  5. 5. TOPSAN example entry<br />
  6. 6. “Semantic” TOPSAN<br />Use the principles of the semantic web to turn TOPSAN into a database that can be:<br />edited<br />searched<br />linked<br />TOPSAN content is being made accessible to computational query and analysis via semantic web technologies<br />
  7. 7. Entering machine-readable, structured data with the TOPSAN Protein Syntax (TPS)<br />Takes the form subject, predicate, object<br />Subject: the protein in question<br />Predicate, examples:<br />homologous<br />encoded_by<br />citation<br />member_of<br />Object: “direct value” or link to other database<br />Example:<br />{{ ‘pfam_family_member’, ‘PFAM:PF07980’ ) }}<br />More information:<br />
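
The subject, predicate, object form described on this slide can be sketched in a few lines of Python. This is only an illustration of the data shape, not TOPSAN's implementation: the `Annotation` type, the `split_link` helper, and the `example_protein` subject are hypothetical names introduced here, and the rule used to recognize a `DATABASE:accession` link is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    """One TPS-style statement (hypothetical type, for illustration only)."""
    subject: str    # the protein being annotated
    predicate: str  # e.g. 'pfam_family_member', 'encoded_by', 'citation'
    obj: str        # a direct value or a 'DATABASE:accession' link


def split_link(obj: str) -> Tuple[Optional[str], str]:
    """Split 'PFAM:PF07980' into ('PFAM', 'PF07980').

    Assumption: a link is an upper-case database name, a colon, and an
    accession. Anything else is treated as a direct value.
    """
    database, sep, accession = obj.partition(":")
    if sep and database.isupper() and accession:
        return database, accession
    return None, obj


# 'example_protein' is a placeholder subject, not a real TOPSAN identifier.
annotation = Annotation("example_protein", "pfam_family_member", "PFAM:PF07980")
database, accession = split_link(annotation.obj)
print(database, accession)
```

Keeping the object as a plain string and splitting it only on demand means that direct values and database links can share one field, which mirrors the slide's description of the object as either a “direct value” or a link.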
  8. 8. Example: in the Editor<br />
  9. 9. Example: the resulting TOPSAN entry<br />
  10. 10. Example: on the Semantic Web<br /><> <> <><br /><> <> <><br /><> <> <><br />
  11. 11. Different ways to download information<br />Generic TOPSAN page<br />Semantic information embedded into every TOPSAN page<br />RDFa interface<br />XML<br />Bulk Download<br />All unique semantic triples stored in a single N3 formatted file<br />
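
To give a feel for what working with a bulk file of triples involves, here is a minimal sketch that reads the simplest statement form, `<subject> <predicate> object .`, one per line. It is not a full N3 parser (prefixes, blank nodes, and multi-line statements are skipped), and the `example.org` IRIs and the literal value are placeholders, not TOPSAN data. For real files, a dedicated RDF library would be the safer choice.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches only: <subject> <predicate> <object-IRI> .   or
#               <subject> <predicate> "literal" .
TRIPLE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def read_simple_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) for each line in the simple form."""
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip comments, prefixes, and anything more complex
        subject, predicate, iri, literal = match.groups()
        obj = iri if iri is not None else literal
        yield subject, predicate, obj


# Placeholder input; the IRIs and the value are invented for this sketch.
sample = [
    '<http://example.org/protein/p1> <http://example.org/tps/member_of> <http://example.org/family/f1> .',
    '# a comment line',
    '<http://example.org/protein/p1> <http://example.org/tps/molecular_weight> "12345" .',
]
triples = list(read_simple_triples(sample))
print(len(triples))
```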
  19. 19. Simple SPARQL<br />PREFIX tps:<><br />SELECT ?id ?weight WHERE {<br />?id tps:molecular_weight ?weight<br />}<br />
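
The query on this slide binds `?id` and `?weight` for every triple whose predicate is `tps:molecular_weight`. The pure-Python sketch below mimics that single-pattern match over an in-memory list. It is not a SPARQL engine, the `TPS` namespace is a placeholder (the PREFIX IRI is missing from the slide text), and the triples are made-up values for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Placeholder namespace: the real PREFIX IRI is not given in the slide text.
TPS = "http://example.org/tps/"

# Made-up triples for illustration only; not real TOPSAN data.
TRIPLES: List[Triple] = [
    ("protein_a", TPS + "molecular_weight", "10000"),
    ("protein_a", TPS + "member_of", "family_x"),
    ("protein_b", TPS + "molecular_weight", "20000"),
]


def select(triples: Iterable[Triple], predicate: str,
           subject_var: str, object_var: str) -> List[Dict[str, str]]:
    """Return one variable binding per triple with the given predicate.

    Mirrors a single pattern such as: ?id tps:molecular_weight ?weight
    """
    return [
        {subject_var: subject, object_var: obj}
        for subject, pred, obj in triples
        if pred == predicate
    ]


rows = select(TRIPLES, TPS + "molecular_weight", "id", "weight")
for row in rows:
    print(row["id"], row["weight"])
```

Each result row is a dictionary keyed by variable name, which is roughly how a SPARQL result set presents its bindings.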
  20. 20. Availability and Licenses<br />Project Site:<br />Software:<br />Open Source Licenses:<br />Data: Creative Commons Attribution 3.0 License<br />Software: GNU General Public License<br />
  21. 21. Summary<br />Structural genomics centers produce a large number of protein structures, most of which never get a publication<br />TOPSAN provides a means for community annotation of such protein structures<br />The TOPSAN Protein Syntax (TPS) allows annotators to easily enter machine-readable, structured data<br />TOPSAN content is being made accessible to computational query and analysis via semantic web technologies<br />Many aspects of TOPSAN are still under development and are planned to evolve with user needs<br />
  22. 22. Acknowledgements<br />Inspiration for TOPSAN/semantic web connection: DBCLS BioHackathon 2010<br />Developers: Krishna Subramanian, Kyle Ellrott, Dana Weekes<br />All contributors and users<br />