
Zmasek bosc2010 topsan

  1. Connecting TOPSAN to Computational Analysis
     Christian M Zmasek, Kyle Ellrott, Dana Weekes, Constantina Bakolitsa, John Wooley, Adam Godzik
     Joint Center for Structural Genomics
     Sanford-Burnham Medical Research Institute, La Jolla, California, USA
     University of California, San Diego, La Jolla, California, USA
     Joint Center for Molecular Modeling
  2. Overview
     • What is TOPSAN?
       - TOPSAN: The Open Protein Structure Annotation Network
       - Community-based annotation of protein structures
     • “Semantic” TOPSAN
       - How to enter machine-readable, structured data
       - Example: editor -> entry -> semantic web
       - Different ways to download information
       - SPARQL example
     • Availability and licenses
     • Acknowledgements
  3. What is TOPSAN?
     • TOPSAN: The Open Protein Structure Annotation Network
     • Tens of thousands of protein structures have been determined by structural genomics (SG) centers, and many more are expected
     • While these structures are available in the PDB (Protein Data Bank)…
     • … annotations for most of them are limited to one-line PDB titles
     • TOPSAN is the first database that specifically focuses on providing extensive annotations for the thousands of structures solved by the SG centers
  4. What is TOPSAN?
     • TOPSAN’s main content consists of collaboratively written (“open”) articles/annotations for each solved protein structure
     • TOPSAN combines automated and human-edited elements
     • TOPSAN spans a range of analyses:
       - single proteins
       - characterization of protein families
       - reconstruction of entire genomes
     • Articles are created by structural genomics (SG) center staff and over 400 external users, so far covering 7,250 proteins
     • Collaborating with PFAM to use JCSG structures to refine and create new PFAM families
  5. TOPSAN example entry
  6. “Semantic” TOPSAN
     • Use the principles of the semantic web to turn TOPSAN into a database that can be:
       - edited
       - searched
       - linked
     • TOPSAN content is being made accessible to computational query and analysis via semantic web technologies
  7. Entering machine-readable, structured data with the TOPSAN Protein Syntax (TPS)
     • Takes the form subject, predicate, object
       - Subject: the protein in question
       - Predicate, examples:
         - homologous
         - encoded_by
         - citation
         - member_of
       - Object: a “direct value” or a link to another database
     • Example:
       {{ note.link('pfam_family_member', 'PFAM:PF07980') }}
     • More information: http://topsan.wordpress.com/2010/06/01/96/
  8. Example: in the Editor
  9. Example: the resulting TOPSAN entry
  10. Example: on the Semantic Web
      <http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2afb> .
      <http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2var> .
      <http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#functional_assignment> <http://purl.org/obo/owl/EC#EC_2.7.1.45> .
  11. Different ways to download information
      • Generic TOPSAN page
        - Semantic information embedded into every TOPSAN page
      • RDFa interface
        - http://topsan.org/rdfa/2A2M
      • XML
      • Bulk download
        - http://files.topsan.org/topsan.n3.gz
        - All unique semantic triples stored in a single N3-formatted file
  12. Simple SPARQL
      PREFIX tps: <http://purl.org/topsan/tps#>
      SELECT ?id ?weight WHERE {
        ?id tps:molecular_weight ?weight
      }
  13. Availability and Licenses
      • Project site: http://www.topsan.org
      • Software: http://www.topsan.org/Tools
      • Open-source licenses:
        - Data: Creative Commons Attribution 3.0 License
        - Software: GNU General Public License
  14. Summary
      • Structural genomics centers produce a large number of protein structures, most of which never get a publication
      • TOPSAN provides a means for community annotation of such protein structures
      • The TOPSAN Protein Syntax (TPS) allows annotators to easily enter machine-readable, structured data
      • TOPSAN content is being made accessible to computational query and analysis via semantic web technologies
      • Many aspects of TOPSAN are still under development and are planned to evolve with user needs
  15. Acknowledgements
      • Inspiration for TOPSAN/semantic web connection: DBCLS BioHackathon 2010
      • Developers: Krishna Subramanian, Kyle Ellrott, Dana Weekes
      • All contributors and users
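The TPS slide says each statement takes the form subject, predicate, object, with the page's protein as the subject. The Python sketch below shows one way a `{{ note.link(...) }}` statement could be turned into such a triple. It is not TOPSAN's own parser: the regular expression and the function name `tps_to_triples` are hypothetical, the IRI bases are borrowed from the example triples on the "Example: on the Semantic Web" slide (assuming every TPS predicate is appended to the `tps#` namespace), and the object is kept as the raw `PFAM:PF07980`-style identifier because its IRI mapping is not given in the talk.

```python
import re

# Hypothetical pattern for TPS statements of the form {{ note.link('predicate', 'object') }}.
# Assumes straight single or double quotes and exactly two arguments.
TPS_LINK = re.compile(
    r"\{\{\s*note\.link\(\s*(['\"])(?P<pred>[^'\"]+)\1\s*,\s*(['\"])(?P<obj>[^'\"]+)\3\s*\)\s*\}\}"
)

# IRI bases copied from the example triples; assumed to apply to every TPS predicate.
PROTEIN_BASE = "http://purl.org/topsan/protein/"
TPS_BASE = "http://purl.org/topsan/tps#"


def tps_to_triples(protein_id: str, page_text: str) -> list[tuple[str, str, str]]:
    """Turn every note.link statement on a page into a (subject, predicate, object) triple.

    The subject is the protein the page describes; the object is returned
    unchanged (e.g. "PFAM:PF07980") since its IRI form is not specified.
    """
    subject = PROTEIN_BASE + protein_id
    return [
        (subject, TPS_BASE + m.group("pred"), m.group("obj"))
        for m in TPS_LINK.finditer(page_text)
    ]
```

Because the subject comes from the page rather than from the markup, annotators only ever write the predicate and object, which matches the slide's description of TPS.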
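The triples on the "Example: on the Semantic Web" slide use full IRIs for subject, predicate and object, so a consumer can group them with very little code. The sketch below (function name `index_by_predicate` is hypothetical) handles only that IRI-only, one-triple-per-line form; literals, blank nodes and prefixed names are deliberately out of scope and would need a real RDF parser.

```python
import re
from collections import defaultdict

# Matches one triple whose subject, predicate and object are all IRIs,
# with an optional terminating '.', as in the slide's example lines.
IRI_TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.?\s*$")


def index_by_predicate(lines):
    """Group object IRIs by subject and by the predicate's local name (text after '#')."""
    index = defaultdict(lambda: defaultdict(list))
    for line in lines:
        m = IRI_TRIPLE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything outside the IRI-only form
        subject, predicate, obj = m.groups()
        local_name = predicate.rsplit("#", 1)[-1]
        index[subject][local_name].append(obj)
    return index
```

Fed the three slide lines, `index["http://purl.org/topsan/protein/2qcv"]["simular_structure"]` would hold the two PDB links (2afb and 2var), and `"functional_assignment"` the EC 2.7.1.45 IRI.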
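For the bulk download listed on the "Different ways to download information" slide, the gzipped N3 file can be streamed without unpacking it to disk first. This is a minimal sketch under a loud assumption: it treats the dump as one triple per line (`<subject> <predicate> object .`), skipping `@` directives and `#` comments, and does not handle N3 abbreviations such as `;`, `,` or prefixed names. The function name `count_predicates` is hypothetical; it simply tallies statements per predicate.

```python
import gzip
from collections import Counter

BULK_URL = "http://files.topsan.org/topsan.n3.gz"  # bulk dump named on the slide


def count_predicates(path):
    """Count statements per predicate in a local copy of the gzipped dump.

    Assumes (unverified) one complete triple per line; a full N3 parser
    is needed if the file uses multi-line or abbreviated statements.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(("#", "@")):
                continue  # skip blank lines, comments and @prefix/@base directives
            parts = line.split(None, 2)  # subject, predicate, rest of the statement
            if len(parts) == 3:
                counts[parts[1]] += 1
    return counts
```

A local copy could first be fetched with `urllib.request.urlretrieve(BULK_URL, "topsan.n3.gz")`, assuming the 2010 URL is still served.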
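The "Simple SPARQL" query has a single triple pattern, `?id tps:molecular_weight ?weight`, so its result is every triple with that predicate, with `?id` bound to the subject and `?weight` to the object. The sketch below spells out that matching over in-memory `(subject, predicate, object)` tuples; it is an illustration of what the query computes, not a SPARQL engine, and the function name `select_id_weight` is hypothetical.

```python
# Namespace from the query's PREFIX line.
TPS = "http://purl.org/topsan/tps#"


def select_id_weight(triples):
    """Evaluate the single pattern '?id tps:molecular_weight ?weight'.

    Each triple whose predicate is tps:molecular_weight yields one solution,
    binding ?id to its subject and ?weight to its object.
    """
    predicate = TPS + "molecular_weight"
    return [{"id": s, "weight": o} for (s, p, o) in triples if p == predicate]
```

To run the actual query string instead, one option (assuming the third-party rdflib package is installed and the dump has been decompressed) is `Graph().parse("topsan.n3", format="n3")` followed by `graph.query(...)` with the query text unchanged.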
