    Text mining: Presentation Transcript

    • Text mining
    • Explosion
    • exponential increase
    • some things are constant
    • “graph calculus”
    • =
    • ~45 seconds per paper
    • Information retrieval
    • find the relevant papers
    • user-specified query
    • “yeast AND cell cycle”
    • stemming
    • dynamic query expansion
    • ranking
    • Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
    • no tool will find that
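The retrieval steps listed above (a user-specified query, stemming, query expansion, ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SYNONYMS` table, the crude suffix-stripping `stem` function, and the ranking by raw match counts are stand-ins, not the method used in the presentation. The phrase "cell cycle" is simply treated as two separate words.

```python
import re
from collections import Counter

# Hypothetical synonym table used for query expansion (illustrative only).
SYNONYMS = {"yeast": {"saccharomyces", "cerevisiae"}}

def stem(word):
    """Very crude suffix stripping; a real system would use a proper stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase the text, split it into alphanumeric tokens, and stem each one."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def expand(term):
    """Return the stemmed term together with its stemmed synonyms."""
    return {stem(term)} | {stem(s) for s in SYNONYMS.get(term, set())}

def search(query_terms, documents):
    """AND query: every term (or one of its expansions) must occur in a document.
    Matching documents are ranked by total number of term matches."""
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        scores = [sum(counts[v] for v in expand(term)) for term in query_terms]
        if all(scores):
            results.append((sum(scores), doc_id))
    return [doc_id for _, doc_id in sorted(results, key=lambda r: (-r[0], r[1]))]
```

With this sketch, a document that mentions only "Saccharomyces cerevisiae" still matches the query term "yeast" through expansion, while a document about the human cell cycle is filtered out by the AND condition.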
    • Entity recognition
    • identify the substance(s)
    • Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
    • Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
    • comprehensive lexicon
    • orthographic variation
    • “black list”
    • manual correction
    • still too much to read
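The entity-recognition ingredients listed above (a lexicon, tolerance for orthographic variation, and a black list) can be sketched as a small dictionary tagger. This is only an illustration under stated assumptions: the `LEXICON` holds the names from the example sentence plus a made-up ambiguous entry, "STEP", that exists purely to show the black list at work; the normalization rule (ignore case, hyphens and spaces) is one simple choice among many and is not claimed to be what the cited tools do.

```python
import re

# Illustrative lexicon: names from the example sentence, plus a made-up
# ambiguous entry ("STEP") that collides with an ordinary English word.
LEXICON = ["Clb2", "Cdc28", "Cdk1", "Swe1", "Cdc5", "STEP"]

# Hypothetical black list of normalized names that should never be tagged.
BLACKLIST = {"step"}

def normalize(name):
    """Collapse case and hyphen/space differences, e.g. 'CDC-28' -> 'cdc28'."""
    return re.sub(r"[-\s]", "", name.lower())

# Map every normalized lexicon entry back to its canonical spelling.
INDEX = {normalize(n): n for n in LEXICON}

def tag_entities(text):
    """Return (start, end, canonical_name) for every lexicon match in text."""
    hits = []
    # Candidate tokens: letters optionally followed by '-' or ' ' and digits,
    # or plain words.
    for m in re.finditer(r"[A-Za-z]+[- ]?\d+|[A-Za-z]+", text):
        key = normalize(m.group())
        if key in INDEX and key not in BLACKLIST:
            hits.append((m.start(), m.end(), INDEX[key]))
    return hits
```

Run on the example sentence, the tagger reports Clb2, Cdc28, Cdk1, Swe1, Cdc5 and Swe1 in order, and skips the word "step" because it is black-listed. Spelling variants such as "CDC-28" or "cdc 28" resolve to the same canonical name, Cdc28.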
    • Information extraction
    • formalize the facts
    • co-occurrence
    • global statistical analysis
    • NLP (Natural Language Processing)
    • parsing individual sentences
      • Gene and protein names
      • Cue words for entity recognition
      • Verbs for relation extraction
      • [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
    • Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
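The co-occurrence route to information extraction mentioned above can be sketched as follows. The slides say only "global statistical analysis"; the score used here, pointwise mutual information (PMI), is an assumption chosen because it is simple to state, and the function name `cooccurrence_scores` is hypothetical. The input is assumed to be one set of already-recognized entity names per abstract.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_scores(docs):
    """docs: list of sets of entity names, one set per abstract.
    Returns {(a, b): PMI} for every pair mentioned together at least once,
    with each pair stored in sorted order."""
    n = len(docs)
    single = Counter()  # number of abstracts mentioning each entity
    pair = Counter()    # number of abstracts mentioning both entities
    for entities in docs:
        single.update(entities)
        pair.update(combinations(sorted(entities), 2))
    return {
        (a, b): math.log2((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in pair.items()
    }
```

A positive score means the two names appear together more often than their individual frequencies would predict. A score like this says that two entities are associated, not how: the type and direction of the relation are what sentence-level parsing is meant to supply.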
    • store in a database
    • then the fun begins :-)
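To make "store in a database" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout (`agent`, `relation`, `target`) and the helper names are assumptions for illustration, not the schema used by the authors; the example facts are paraphrased from the sentences quoted earlier in the transcript.

```python
import sqlite3

def build_fact_store(facts):
    """Create an in-memory database holding (agent, relation, target) triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (agent TEXT, relation TEXT, target TEXT)")
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", facts)
    conn.commit()
    return conn

def regulators_of(conn, target):
    """Return (agent, relation) pairs for every stored fact acting on target."""
    rows = conn.execute(
        "SELECT agent, relation FROM fact WHERE target = ? ORDER BY agent",
        (target,),
    )
    return rows.fetchall()

# Facts paraphrased from the example sentences; the relation labels are illustrative.
FACTS = [
    ("Cdc28", "phosphorylates", "Swe1"),
    ("Cdc5", "phosphorylates", "Swe1"),
    ("HAP1", "controls expression of", "CYC1"),
    ("HAP1", "controls expression of", "CYC7"),
]
```

Once facts are stored this way, a question such as "what acts on Swe1?" becomes a single query, `regulators_of(build_fact_store(FACTS), "Swe1")`, rather than a reading task.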
    • Acknowledgments
      • NLP pipeline
        • Jasmin Saric
        • Rossitza Ouzounova
        • Isabel Rojas
        • Peer Bork
      • Reflect
        • Heiko Horn
        • Sune Frankild
        • Evangelos Pafilis
        • Sven Haag
        • Michael Kuhn
        • Peer Bork
        • Reinhardt Schneider
        • Sean O’Donoghue