Zmasek bosc2010 topsan
Usage Rights

CC Attribution-ShareAlike License

Presentation Transcript

  • Connecting TOPSAN to Computational Analysis
    Christian M Zmasek, Kyle Ellrott, Dana Weekes, Constantina Bakolitsa, John Wooley, Adam Godzik
    Joint Center for Structural Genomics
    Sanford-Burnham Medical Research Institute, La Jolla, California, USA
    University of California, San Diego, La Jolla, California, USA
    Joint Center for Molecular Modeling
  • Overview
    What is TOPSAN?
    TOPSAN: The Open Protein Structure Annotation Network
    Community-based annotation of protein structures
    “Semantic” TOPSAN
    How to enter machine-readable, structured data
    Example: editor -> entry -> semantic web
    Different ways to download information
    SPARQL example
    Availability and licenses
    Acknowledgements
  • What is TOPSAN?
    TOPSAN: The Open Protein Structure Annotation Network
    Tens of thousands of protein structures have been determined by structural genomics (SG) centers, and many more are expected
    While these structures are available in PDB (Protein Data Bank)…
    … annotations for most of them are limited to one-line PDB titles
    TOPSAN is the first database that specifically focuses on providing extensive annotations for the thousands of structures solved by the SG centers
  • What is TOPSAN?
    TOPSAN’s main content consists of collaboratively written (“open”) articles/annotations for each solved protein structure
    TOPSAN combines automated and human-edited elements
    TOPSAN spans several levels of analysis:
    single proteins
    characterization of protein families
    reconstruction of entire genomes
    Articles are created by structural genomics (SG) center staff and over 400 external users, so far covering 7,250 proteins
    Collaborating with PFAM to use JCSG structures to refine existing PFAM families and create new ones
  • TOPSAN example entry
  • “Semantic” TOPSAN
    Use the principles of the semantic web to turn TOPSAN into a database that can be:
    edited
    searched
    linked
    TOPSAN content is being made accessible to computational query and analysis via semantic web technologies
  • Entering machine-readable, structured data with the TOPSAN Protein Syntax (TPS)
    Takes the form subject, predicate, object
    Subject: the protein in question
    Predicate, examples:
    homologous
    encoded_by
    citation
    member_of
    Object: “direct value” or link to other database
    Example:
    {{ note.link( 'pfam_family_member', 'PFAM:PF07980' ) }}
    More information: http://topsan.wordpress.com/2010/06/01/96/
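The subject, predicate, object form above can be illustrated with a minimal Python sketch. It is not the TOPSAN parser. It assumes that only the two-argument `note.link(...)` form shown in the example needs to be recognized, and that the subject is supplied by the page the markup appears on. The names `Triple` and `parse_note_link` are hypothetical, and the subject `2qcv` is used purely as a placeholder.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One TPS statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Assumption: only the two-argument, single-quoted form from the slide is handled.
NOTE_LINK = re.compile(
    r"\{\{\s*note\.link\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}"
)


def parse_note_link(markup: str, subject: str) -> Optional[Triple]:
    """Extract predicate and object from the markup; the subject comes from the page."""
    match = NOTE_LINK.search(markup)
    if match is None:
        return None
    predicate, obj = match.groups()
    return Triple(subject, predicate, obj)


# Placeholder subject: the slide does not say which entry carries this annotation.
triple = parse_note_link(
    "{{ note.link( 'pfam_family_member', 'PFAM:PF07980' ) }}", subject="2qcv"
)
print(triple)
```

Keeping the subject outside the markup mirrors the slide's statement that the subject is always "the protein in question", so an annotator only writes the predicate and the object.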
  • Example: in the Editor
  • Example: the resulting TOPSAN entry
  • Example: on the Semantic Web
    <http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2afb>
    <http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2var>
    <http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#functional_assignment> <http://purl.org/obo/owl/EC#EC_2.7.1.45>
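To show how statements like the three above can be consumed by a program, here is a small Python sketch that splits each line into subject, predicate and object. It assumes every term is a URI in angle brackets, as in the lines shown. A real RDF parser would be the safer choice for anything beyond this simple shape. The helper names `parse_triple` and `local_name` are illustrative only.

```python
import re

# Assumption: all three terms are URIs written as <...>, as in the example lines.
TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>")


def parse_triple(line):
    """Return (subject, predicate, object) URIs, or None if the line does not match."""
    match = TRIPLE.search(line)
    return match.groups() if match else None


def local_name(uri):
    """Return the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri)[-1]


lines = [
    "<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2afb>",
    "<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2var>",
    "<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#functional_assignment> <http://purl.org/obo/owl/EC#EC_2.7.1.45>",
]

for line in lines:
    subject, predicate, obj = parse_triple(line)
    print(local_name(subject), local_name(predicate), obj)
```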
  • Different ways to download information
    • Generic TOPSAN page
    • Semantic information embedded into every TOPSAN page
    • RDFa interface
    • http://topsan.org/rdfa/2A2M
    • XML
    • Bulk Download
    • http://files.topsan.org/topsan.n3.gz
    • All unique semantic triples stored in a single N3 formatted file
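As a rough illustration of working with the bulk download, the following Python sketch streams a gzip-compressed file and tallies how often each predicate occurs. It assumes the file has already been downloaded locally and that statements appear one per line with subject and predicate written as full `<...>` URIs. N3 also allows prefixes and multi-line statements, which this sketch skips, so a proper RDF library would be needed for a complete count. The function names are hypothetical.

```python
import gzip
import re
from collections import Counter

# Assumption: one statement per line, subject and predicate as <...> URIs.
URI_TRIPLE = re.compile(r"^\s*<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.?\s*$")


def count_predicates(lines):
    """Tally predicate URIs over an iterable of text lines; other lines are skipped."""
    counts = Counter()
    for line in lines:
        match = URI_TRIPLE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def count_predicates_in_gzip(path):
    """Stream a gzip-compressed file without decompressing it to disk first."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_predicates(handle)


# Usage, assuming the file was saved locally as topsan.n3.gz:
# for predicate, n in count_predicates_in_gzip("topsan.n3.gz").most_common(10):
#     print(n, predicate)
```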
  • Simple SPARQL
    PREFIX tps: <http://purl.org/topsan/tps#>
    SELECT ?id ?weight WHERE {
      ?id tps:molecular_weight ?weight .
    }
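To make clear what this query selects, here is a plain-Python sketch of the same pattern match over an in-memory list of triples. It is not a SPARQL engine. It only mimics the single pattern `?id tps:molecular_weight ?weight`. The subject names and weight values below are invented placeholders, not TOPSAN data.

```python
TPS = "http://purl.org/topsan/tps#"
PROTEIN = "http://purl.org/topsan/protein/"


def select_id_weight(triples):
    """Mimic: SELECT ?id ?weight WHERE { ?id tps:molecular_weight ?weight }"""
    return [
        (subject, obj)
        for subject, predicate, obj in triples
        if predicate == TPS + "molecular_weight"
    ]


# Placeholder triples: subjects and weights are made up for illustration.
triples = [
    (PROTEIN + "example-a", TPS + "molecular_weight", "11111.1"),
    (PROTEIN + "example-a", TPS + "functional_assignment",
     "http://purl.org/obo/owl/EC#EC_2.7.1.45"),
    (PROTEIN + "example-b", TPS + "molecular_weight", "22222.2"),
]

for protein_id, weight in select_id_weight(triples):
    print(protein_id, weight)
```

Each result row binds `?id` to a subject and `?weight` to the matching object. Triples with any other predicate are ignored.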
  • Availability and Licenses
    Project Site: http://www.topsan.org
    Software: http://www.topsan.org/Tools
    Open Source Licenses:
    Data: Creative Commons Attribution 3.0 License
    Software: GNU General Public License
  • Summary
    Structural genomics centers produce a large number of protein structures, most of which are never described in a publication
    TOPSAN provides a means for community annotation of such protein structures
    The TOPSAN Protein Syntax (TPS) allows annotators to easily enter machine-readable, structured data
    TOPSAN content is being made accessible to computational query and analysis via semantic web technologies
    Many aspects of TOPSAN are still under development and are planned to evolve with user needs
  • Acknowledgements
    Inspiration for TOPSAN/semantic web connection: DBCLS BioHackathon 2010
    Developers: Krishna Subramanian, Kyle Ellrott, Dana Weekes
    All contributors and users