Bionlp 07

Presentation on OTMI at BioNLP 2007 on June 29, 2007. This was a one-day workshop attached to the ACL 2007 conference (45th Annual Meeting of the Association for Computational Linguistics), held on the quiet outskirts of Prague.

Bionlp 07 Presentation Transcript

  • 1. Open Text Mining Initiative (OTMI)
    • Tony Hammond, Nature Publishing Group
  • 2. Publishing Opportunity
    • Opening up sites for text mining can potentially lead to content misuse and lost business
    • But what if content could be provided openly, in a form fit for purpose, while control is maintained?
    • Hence OTMI: a proposed industry standard from Nature
  • 3. History
    • Brief summary of the idea presented at the Bio-IT World Conference (Boston, 3–5 Apr. ’06)
    • Nascent blog posts (Apr. & Jun. ’06, Feb. ’07)
    • Nature’s 27 Apr. ’06 editorial “Machine Readability”
    • Discussions also continued on blogs:
      • O’Reilly Radar, HubLog, Open Access News, ars technica, LiveSerials
  • 4. The Big Ideas
    • Present full text in nonlinear order (i.e. not in document order)
    • Keep size of ordered text strings (“snippets”) under publisher control
    • Streamline content for consumption
      • Use standard XML schema
      • Clean text of extraneous markup
    • Include word vectors
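Two of the ideas above, cleaning text of extraneous markup and keeping ordered strings short, can be sketched in a few lines. This is a minimal Python illustration, not the OTMI generator (which the talk describes as a Ruby script): `clean_text` flattens an XML fragment to plain text, and `limit_snippet` chops a sentence into pieces of at most `max_words` words. Both function names and the word-count limit are hypothetical; the slide only says that snippet size stays under publisher control.

```python
import re
import xml.etree.ElementTree as ET


def clean_text(fragment: str) -> str:
    """Return the plain text of an XML fragment with all tags removed."""
    root = ET.fromstring(fragment)
    # itertext() yields every text and tail string in document order
    text = "".join(root.itertext())
    # Collapse the whitespace runs left behind by the removed markup
    return re.sub(r"\s+", " ", text).strip()


def limit_snippet(sentence: str, max_words: int) -> list[str]:
    """Split a sentence into chunks of at most max_words words.

    max_words stands in for a publisher-chosen limit (hypothetical policy).
    """
    words = sentence.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


if __name__ == "__main__":
    text = clean_text("<p>BoNT/B binds <i>both</i> PSGs\n and <b>synaptotagmin</b></p>")
    print(text)                    # BoNT/B binds both PSGs and synaptotagmin
    print(limit_snippet(text, 3))  # ['BoNT/B binds both', 'PSGs and synaptotagmin']
```

A smaller `max_words` gives the publisher tighter control over how much ordered text leaves the site, at the cost of less context for sentence-level mining.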
  • 5. Design Goals
    • Enable text mining on full text
    • Facilitate document categorization
    • Allow domain entities (e.g. chemical compounds, genomes, etc.) to be mapped
    • Encourage published entity maps to reference original document
  • 6. Generator
    • Publisher HTML or XML full-text document -> OTMI Generator -> publisher OTMI document
    • Conversion process tailored to publisher-specific source
  • 7. Standards
    • Core
      • Content (XML) presented as Atom “Entry” Document (see RFC 4287)
      • Manifest (XML) in OPML (known format)
    • Optional
      • Metadata uses PRISM (IDEAlliance)
      • References use DOI (NISO, [ISO])
      • Stopwords from NLM table
  • 8. Anatomy
    • Basic components:
      • Document sections
      • Vectors (word counts)
      • “Snippets” (units of full text)
      • Figures
      • References (with DOI)
  • 9. Entry / Data
    • <atom:entry xmlns:otmi='...' xmlns:prism='...' xmlns:atom='...'>
    • <atom:title>Structural biology Dangerous liaisons on neurons</atom:title>
    • <atom:author>
    • <atom:name>Giampietro Schiavo</atom:name>
    • </atom:author>
    • <atom:id>info:doi/10.1038/nature05410</atom:id>
    • <atom:link href='http://dx.doi.org/10.1038/nature05410' />
    • <atom:link href='http://.../nature/journal/v444/n7122/otmi/nature05410.otmi' rel='self' />
    • <atom:link href='http://opentextmining.org/' rel='related' />
    • <atom:published>2006-12-21T00:00:00Z</atom:published>
    • <atom:updated>2006-12-21T00:00:00Z</atom:updated>
    • <atom:rights type='html'>(c) 2006 Nature Publishing Group</atom:rights>
    • <prism:publicationName>Nature</prism:publicationName>
    • <prism:volume>444</prism:volume>
    • <prism:number>7122</prism:number>
    • <prism:startingPage>1019</prism:startingPage>
    • <prism:endingPage>1020</prism:endingPage>
    • <prism:issn>0028-0836</prism:issn>
    • <prism:eIssn />
    • <otmi:data/>
    • </atom:entry>
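Because the entry is an ordinary Atom document, a consumer can read it with any XML library. The Python sketch below pulls the title, authors, DOI and PRISM citation fields out of a trimmed copy of the entry above. The Atom namespace URI is the one fixed by RFC 4287; the PRISM URI is an assumption, since the slide elides all three namespaces as `...`, so substitute whatever the real file declares. `read_entry` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Atom URI per RFC 4287; the PRISM URI is an assumption (elided on the slide).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

SAMPLE = f"""<atom:entry xmlns:atom="{NS['atom']}" xmlns:prism="{NS['prism']}">
  <atom:title>Structural biology Dangerous liaisons on neurons</atom:title>
  <atom:author><atom:name>Giampietro Schiavo</atom:name></atom:author>
  <atom:id>info:doi/10.1038/nature05410</atom:id>
  <prism:publicationName>Nature</prism:publicationName>
  <prism:volume>444</prism:volume>
  <prism:number>7122</prism:number>
  <prism:startingPage>1019</prism:startingPage>
  <prism:endingPage>1020</prism:endingPage>
</atom:entry>"""


def read_entry(xml_text: str) -> dict:
    """Extract bibliographic fields from an OTMI Atom entry (hypothetical helper)."""
    entry = ET.fromstring(xml_text)

    def text(path: str) -> str:
        return (entry.findtext(path, default="", namespaces=NS) or "").strip()

    atom_id = text("atom:id")
    prefix = "info:doi/"
    return {
        "title": text("atom:title"),
        "authors": [(n.text or "").strip() for n in entry.findall("atom:author/atom:name", NS)],
        # The atom:id carries the DOI as an info:doi URI
        "doi": atom_id[len(prefix):] if atom_id.startswith(prefix) else None,
        "journal": text("prism:publicationName"),
        "volume": text("prism:volume"),
        "issue": text("prism:number"),
        "pages": f"{text('prism:startingPage')}-{text('prism:endingPage')}",
    }


if __name__ == "__main__":
    print(read_entry(SAMPLE))
```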
  • 10. Data / Vectors
    • <otmi:data>
    • <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
    • <otmi:section name='body'>
    • <otmi:section name='other'>
    • <otmi:vectors>
    • ...
    • </otmi:vectors>
    • <otmi:snippets>
    • ...
    • </otmi:snippets>
    • </otmi:section>
    • </otmi:section>
    • <otmi:figure>
    • ...
    • </otmi:figure>
    • <otmi:references>
    • ...
    • </otmi:references>
    • </otmi:data>
  • 11. Vectors
    • <otmi:vectors>
    • <otmi:split-regex>(?-mix:\s+\W+|\W+\s+|\s+|\/)</otmi:split-regex>
    • ...
    • <otmi:vector count='9'>vesicles</otmi:vector>
    • <otmi:vector count='8'>al</otmi:vector>
    • <otmi:vector count='8'>et</otmi:vector>
    • <otmi:vector count='8'>protein</otmi:vector>
    • <otmi:vector count='8'>synaptic</otmi:vector>
    • <otmi:vector count='8'>vesicle</otmi:vector>
    • <otmi:vector count='7'>chain</otmi:vector>
    • <otmi:vector count='7'>neuron</otmi:vector>
    • ...
    • </otmi:vectors>
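A vector list of this kind can be approximated by splitting the text with the inner pattern of the split-regex above, `\s+\W+|\W+\s+|\s+|/` (Ruby's `(?-mix:...)` wrapper and the escaped `\/` dropped for Python), discarding stopwords, and counting. This is a sketch, not the generator's code: lowercasing and trimming punctuation from token edges are assumptions (the pattern alone leaves a trailing full stop attached to the last word of a text), and the ordering, highest count first with ties alphabetical, simply mirrors the sample on the slide.

```python
import re
from collections import Counter

# Inner pattern of the slide's split-regex, minus Ruby's (?-mix:...) wrapper
SPLIT = re.compile(r"\s+\W+|\W+\s+|\s+|/")
# Assumption: trim punctuation the pattern leaves at the very start or end of a text
EDGE = re.compile(r"^\W+|\W+$")


def word_vectors(text: str, stoplist: set[str]) -> list[tuple[str, int]]:
    """Count non-stopword tokens, most frequent first, ties in alphabetical order."""
    # Lowercasing is an assumption, not something the slide states
    tokens = (EDGE.sub("", raw).lower() for raw in SPLIT.split(text))
    counts = Counter(t for t in tokens if t and t not in stoplist)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    text = ("Synaptic vesicles fuse; the vesicles release neurotransmitters. "
            "Synaptic proteins/vesicles.")
    print(word_vectors(text, {"the"}))
    # [('vesicles', 3), ('synaptic', 2), ('fuse', 1), ('neurotransmitters', 1),
    #  ('proteins', 1), ('release', 1)]
```

Note that "al" and "et" appear in the slide's vectors, so the published stoplist evidently does not remove the pieces of "et al."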
  • 12. Data / Snippets
    • Same <otmi:data> outline as slide 10
  • 13. Snippets
    • <otmi:snippets>
    • <otmi:split-regex>(?-mix:\.\s+(?=[A-Z]))</otmi:split-regex>
    • ...
    • <otmi:snippet> The amino acids lining this cleft are very similar to those found in BoNT/G (ref. 10 ), but differ in the other toxin family members, which explains why different BoNTs recognize distinct protein receptors </otmi:snippet>
    • <otmi:snippet> The model predicts that the interaction of BoNTs with both PSGs and protein receptors is necessary to explain their awesome potency , with a different protein receptor being recognized by each BoNT </otmi:snippet>
    • <otmi:snippet> The rigid character of this interaction might be further enhanced by the association of the toxins heavy chain with nearby negatively charged lipid molecules, which play an accessory role in stabilizing the toxin on membranes </otmi:snippet>
    • <otmi:snippet> The simplest possibility is that BoNT/B binds to PSGs and synaptotagmin within the lumen of a synaptic vesicle that is fused to the neuron membrane </otmi:snippet>
    • <otmi:snippet> The toxins then escape from the vesicle lumen when the vesicles are acidified as they reload with neurotransmitters </otmi:snippet>
    • <otmi:snippet> The two binding sites would firmly anchor the tip of BoNT/B to the vesicles inner surface, constraining the toxins mobility </otmi:snippet>
    • ...
    • </otmi:snippets>
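Snippet extraction can follow the same approach with the snippet split-regex, whose inner pattern `\.\s+(?=[A-Z])` breaks text at a full stop followed by whitespace and a capital letter; that would explain why none of the snippets above ends in a full stop. The six sample snippets also appear to be in alphabetical order, so the Python sketch below sorts them to discard document order. Treat the sorting as an inference from the slide, not a documented rule; dropping the full stop left on a paragraph's final sentence is likewise an assumption.

```python
import re

# Inner pattern of the slide's snippet split-regex: full stop, whitespace, capital
SENTENCE_BREAK = re.compile(r"\.\s+(?=[A-Z])")


def make_snippets(paragraph: str) -> list[str]:
    """Split a paragraph into sentence snippets and return them out of document order."""
    pieces = [p.strip() for p in SENTENCE_BREAK.split(paragraph)]
    # Assumption: also drop the full stop that ends the final sentence
    pieces = [p[:-1] if p.endswith(".") else p for p in pieces]
    # Alphabetical order, as the slide's sample appears to use
    return sorted(p for p in pieces if p)


if __name__ == "__main__":
    para = ("The toxins then escape from the vesicle lumen. "
            "BoNT/B binds PSGs first. A second route is possible.")
    for snippet in make_snippets(para):
        print(snippet)
    # A second route is possible
    # BoNT/B binds PSGs first
    # The toxins then escape from the vesicle lumen
```

A pattern this simple also breaks after abbreviations followed by a capitalized word, as in "Jin et al. Chai", so a production generator would likely need an exception list.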
  • 14. Data / Figures
    • Same <otmi:data> outline as slide 10
  • 15. Figures
    • <otmi:figure>
    • <otmi:title>
    • <otmi:reduced-text> Possible binding sites botulinum neurotoxin B (BoNT/B) neurons. Crystal studies Jin et al . Chai et al . suggest BoNT/B invades neurons stowing away carriers known synaptic vesicles. forming complex lipid molecules (polysialogangliosides, PSGs) vesicle protein ( synaptotagmin or synaptotagmin II) neuronal membrane. complex stabilized interactions neighbouring acidic lipid molecules (orange). BoNT/B enter open vesicles neurons membrane, one three possible sequences. , BoNT/B enters vesicle directly forms required complex. b , BoNT/B binds first PSGs membrane, transferred synaptic vesicle containing synaptotagmin. c , BoNT/B forms full complex membrane, synaptotagmin left behind inaccurate vesicle recycling. transferred lumen vesicle. </otmi:reduced-text>
    • </otmi:title>
    • <otmi:caption>
    • <otmi:reduced-text> Possible binding sites botulinum neurotoxin B (BoNT/B) neurons. Crystal studies Jin et al . Chai et al . suggest BoNT/B invades neurons stowing away carriers known synaptic vesicles. forming complex lipid molecules (polysialogangliosides, PSGs) vesicle protein ( synaptotagmin or synaptotagmin II) neuronal membrane. complex stabilized interactions neighbouring acidic lipid molecules (orange). BoNT/B enter open vesicles neurons membrane, one three possible sequences. , BoNT/B enters vesicle directly forms required complex. b , BoNT/B binds first PSGs membrane, transferred synaptic vesicle containing synaptotagmin. c , BoNT/B forms full complex membrane, synaptotagmin left behind inaccurate vesicle recycling. transferred lumen vesicle. </otmi:reduced-text>
    • </otmi:caption>
    • </otmi:figure>
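The `reduced-text` above reads like the caption with stopwords deleted and punctuation kept. A minimal Python sketch of that transformation follows. The small stoplist is hypothetical (the real list is published at the `otmi:stoplist` URL shown on slide 10, and the slides say stopwords come from an NLM table), and the input sentence is an invented example chosen so that its output matches the first reduced sentence on the slide.

```python
# Hypothetical stoplist; a generator would load the publisher's published list instead
STOPLIST = {"a", "an", "and", "as", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def reduce_text(text: str, stoplist: set[str]) -> str:
    """Drop whitespace-delimited words found in the stoplist; keep everything else."""
    return " ".join(w for w in text.split() if w.lower() not in stoplist)


if __name__ == "__main__":
    print(reduce_text(
        "Possible binding sites for botulinum neurotoxin B (BoNT/B) on neurons.", STOPLIST))
    # Possible binding sites botulinum neurotoxin B (BoNT/B) neurons.
    print(reduce_text("a , BoNT/B enters the vesicle", STOPLIST))
    # , BoNT/B enters vesicle
```

With whitespace tokenization a lone panel label "a" is itself a stopword, which plausibly explains why the slide's caption reads ", BoNT/B enters vesicle" while the labels "b ," and "c ," survive.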
  • 16. Data / References
    • Same <otmi:data> outline as slide 10
  • 17. References
    • <otmi:references>
    • <otmi:ref-id> info:doi/10.1038/nature05387 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1038/nature05411 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1016/0968-0004(86)90282-3 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1016/0014-5793(95)01471-3 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1083/jcb.200305098 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1074/jbc.M403945200 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1126/science.1123654 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1016/j.febslet.2006.02.074 </otmi:ref-id>
    • <otmi:ref-id> info:doi/10.1083/jcb.200508170 </otmi:ref-id>
    • <otmi:refs-noid> 3 </otmi:refs-noid>
    • </otmi:references>
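A consumer can turn the `info:doi/` identifiers back into resolvable links with the same `http://dx.doi.org/` resolver used in the entry's `atom:link` on slide 9, and can recover a full reference count by adding `refs-noid`. The Python sketch below assumes `refs-noid` counts references that carry no DOI (the slides do not define it), and it uses a placeholder namespace URI because the real OTMI namespace is elided on the slides.

```python
import xml.etree.ElementTree as ET

OTMI = {"otmi": "urn:example:otmi"}  # placeholder URI; the real one is elided
RESOLVER = "http://dx.doi.org/"      # resolver used in the entry's atom:link
PREFIX = "info:doi/"


def reference_links(refs_xml: str) -> tuple[list[str], int]:
    """Return DOI resolver links and the total reference count (with and without DOIs)."""
    refs = ET.fromstring(refs_xml)
    links = []
    for el in refs.findall("otmi:ref-id", OTMI):
        uri = (el.text or "").strip()
        if uri.startswith(PREFIX):
            links.append(RESOLVER + uri[len(PREFIX):])
    # Assumption: refs-noid counts references that carry no DOI
    no_id = int(refs.findtext("otmi:refs-noid", default="0", namespaces=OTMI).strip() or 0)
    return links, len(links) + no_id


SAMPLE = """<otmi:references xmlns:otmi="urn:example:otmi">
  <otmi:ref-id> info:doi/10.1038/nature05387 </otmi:ref-id>
  <otmi:ref-id> info:doi/10.1126/science.1123654 </otmi:ref-id>
  <otmi:refs-noid> 3 </otmi:refs-noid>
</otmi:references>"""

if __name__ == "__main__":
    links, total = reference_links(SAMPLE)
    print(links)
    print(total)  # 5
```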
  • 18. Repository
    • http://nature.com/otmi
    • Discovery / Navigation *.opml -> *.opml
    • Content
      • “tarballs” - *.tar.gz (issues)
      • documents - *.otmi (articles)
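A harvester can treat the OPML manifests as a tree: follow links to further `*.opml` files and collect the `*.tar.gz` issue bundles and `*.otmi` articles at the leaves. The Python sketch below assumes each `outline` element carries its target in a `url` attribute (falling back to `xmlUrl`) and that file type is signalled by extension; the actual manifest layout at nature.com/otmi is not shown in the slides. `fetch` is injected so the example runs offline against a dictionary of made-up `example.org` pages.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional
from urllib.parse import urljoin


def walk_manifest(url: str, fetch: Callable[[str], str],
                  seen: Optional[set] = None) -> dict[str, list[str]]:
    """Recursively follow *.opml links, collecting tarball and article URLs."""
    found: dict[str, list[str]] = {"tarballs": [], "documents": []}
    seen = set() if seen is None else seen
    if url in seen:  # guard against manifests that link to each other
        return found
    seen.add(url)
    for outline in ET.fromstring(fetch(url)).iter("outline"):
        target = outline.get("url") or outline.get("xmlUrl")  # attribute names assumed
        if not target:
            continue
        target = urljoin(url, target)  # resolve relative links against the manifest
        if target.endswith(".opml"):
            child = walk_manifest(target, fetch, seen)
            found["tarballs"] += child["tarballs"]
            found["documents"] += child["documents"]
        elif target.endswith(".tar.gz"):
            found["tarballs"].append(target)
        elif target.endswith(".otmi"):
            found["documents"].append(target)
    return found


# Made-up pages standing in for HTTP fetches
PAGES = {
    "http://example.org/otmi/index.opml":
        '<opml version="2.0"><body><outline text="Nature" url="nature.opml"/></body></opml>',
    "http://example.org/otmi/nature.opml":
        '<opml version="2.0"><body>'
        '<outline text="Issue" url="v444/n7122.tar.gz"/>'
        '<outline text="Article" url="v444/n7122/nature05410.otmi"/>'
        '</body></opml>',
}

if __name__ == "__main__":
    print(walk_manifest("http://example.org/otmi/index.opml", PAGES.__getitem__))
```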
  • 19. Autodiscovery
    • All content on Nature.com to be linked for autodiscovery
      • Abstracts, Full Text (HTML)
      • Web Feeds (RSS/Atom)
    • Use link elements such as: <link rel="otmi" type="application/xml" href="../otmi/nature04614.otmi" />
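On the consuming side, autodiscovery amounts to scanning a page's `link` elements for `rel="otmi"` and resolving the possibly relative `href` against the page URL. A minimal sketch with Python's standard-library HTML parser follows; the class and function names are hypothetical and the page URL in the example is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OtmiLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes "otmi"."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> via handle_startendtag
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "otmi" in rels and href:
            self.hrefs.append(href)


def find_otmi_links(page_url: str, html: str) -> list[str]:
    """Return absolute URLs of all OTMI documents advertised by a page."""
    finder = OtmiLinkFinder()
    finder.feed(html)
    finder.close()
    return [urljoin(page_url, href) for href in finder.hrefs]


if __name__ == "__main__":
    page = ('<html><head><link rel="otmi" type="application/xml" '
            'href="../otmi/nature04614.otmi" /></head><body></body></html>')
    print(find_otmi_links("http://example.org/journal/full/nature04614.html", page))
    # ['http://example.org/journal/otmi/nature04614.otmi']
```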
  • 20. Tools
    • Ruby Generator Script
      • Open source
      • GPL’ed
      • Modular (Nature-specific code marked out)
      • Handles multiple DTDs
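The generator itself is a Ruby script and is not reproduced here. As an illustration of the modular design the slide describes, the Python sketch below keeps a generic core and registers one small extractor per source format, keyed by the document's root element, so supporting another DTD means adding one function. The registry, the function names, and the element names `article`/`p` and `doc`/`para` are all hypothetical stand-ins for real publisher DTDs.

```python
import xml.etree.ElementTree as ET
from typing import Callable

Extractor = Callable[[ET.Element], list[str]]
EXTRACTORS: dict[str, Extractor] = {}


def extractor(root_tag: str) -> Callable[[Extractor], Extractor]:
    """Register a publisher-specific extractor for documents with this root element."""
    def register(fn: Extractor) -> Extractor:
        EXTRACTORS[root_tag] = fn
        return fn
    return register


@extractor("article")  # hypothetical DTD A: paragraphs are <p>
def from_article(root: ET.Element) -> list[str]:
    return ["".join(p.itertext()).strip() for p in root.iter("p")]


@extractor("doc")  # hypothetical DTD B: paragraphs are <para>
def from_doc(root: ET.Element) -> list[str]:
    return ["".join(p.itertext()).strip() for p in root.iter("para")]


def body_paragraphs(source_xml: str) -> list[str]:
    """Generic core: pick the extractor for this DTD, then return plain paragraphs."""
    root = ET.fromstring(source_xml)
    extract = EXTRACTORS.get(root.tag)
    if extract is None:
        raise ValueError(f"no extractor registered for <{root.tag}>")
    return extract(root)


if __name__ == "__main__":
    print(body_paragraphs("<article><body><p>One</p><p>Two <i>parts</i></p></body></article>"))
    print(body_paragraphs("<doc><para>Three</para></doc>"))
```

Keeping publisher-specific knowledge inside the registered extractors mirrors the slide's point that Nature-specific code is marked out from the rest.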
  • 21. Present Status
    • Now being integrated into the production workflow (Jun./Jul. ’07)
    • A two-year archive (’05, ’06) is already available online:
      • Nature
      • Nature Genetics
      • Nature Reviews Drug Discovery
      • Nature Structural & Molecular Biology
      • The Pharmacogenomics Journal
  • 22. Improvements
    • Future possibilities:
      • Add in references to associated data files and/or database entries
      • For open-access titles, allow text in normal, human-readable form
      • etc. (as feedback indicates - your call)
  • 23. More Information
    • [email_address] (public discussion)
    • [email_address] (private feedback)
    • opentextmining.org/
      • Wiki pages
      • Resources (draft spec, scripts, etc.)
  • 24. Thanks
    • Tony Hammond
    • <t.hammond@nature.com>
    • or
    • <otmi@nature.com>