SemanticCampLondon, 16th February 2008

My presentation at SemanticCamp London, 16th February 2008

Automatically indexing science using natural-language processing, RDF and SPARQL
Andrew Walkingshaw, Nick Day, Peter Corbett, Jim Downing, Joe Townsend, Peter Murray-Rust
February 16, 2008
• Gathering data
• Extracting (meta)data
• Using the data
• Thanks

Data sources
• Supplemental and experimental data
• Journals
• Self-archived papers (e.g. arXiv)
• Mainstream journalism
• Blogs

Supplemental data: CrystalEye
• http://wwmm.ch.cam.ac.uk/crystaleye/
• Repository for crystallographic data

Journals and arXiv
• “Traditional” journal articles
• Titles and abstracts...

Journalism and blogs
• Unstructured text with little semantics;
• ...hence Google Scholar, Web of Science, etc.

Semi-structured data: Golem
• We’ve got a lot of chemical data as CML
• http://en.wikipedia.org/wiki/Chemical_Markup_Language
• ...but we still need to get data out of that and into a more useful form
• hence Golem: http://www.lexical.org.uk/science/golem/
• GRDDL-ish strategy for extracting data from CML files: identify dialect-specific concepts with XPath expressions and XSLT stylesheets
• Upshot: we can extract JSON objects from CML files.

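Golem's own configuration format isn't shown on the slide. As a rough illustration of the GRDDL-ish idea only, the following minimal Python sketch maps concept names to XPath expressions and pulls the matching values out of a CML document into a JSON object. The concept names, the `dictRef` values and the `extract` function are hypothetical, and ElementTree supports only a subset of XPath, so this is not Golem's implementation.

```python
import json
import xml.etree.ElementTree as ET

# The CML namespace; "cml" is just the prefix used in the XPath expressions below.
NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical dialect mapping: concept name -> XPath expression locating it.
# ElementTree understands only a subset of XPath (paths, [@attr='value'], etc.).
CONCEPTS = {
    "formula": ".//cml:formula",
    "cell_length_a": ".//cml:scalar[@dictRef='iucr:_cell_length_a']",
    "cell_length_b": ".//cml:scalar[@dictRef='iucr:_cell_length_b']",
}


def extract(cml_text: str) -> str:
    """Return a JSON object holding every concept found in a CML document."""
    root = ET.fromstring(cml_text)
    found = {}
    for name, xpath in CONCEPTS.items():
        node = root.find(xpath, NS)
        if node is None:
            continue  # concept absent from this file: leave it out
        # Formulas carry their value in @concise; scalars in their text content
        found[name] = node.get("concise", node.text)
    return json.dumps(found, sort_keys=True)
```

Supporting another CML dialect would then mean writing another concept-to-XPath mapping rather than new extraction code, which is the point of the strategy.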
Free text: OSCAR3
• http://oscar3-chem.sourceforge.net/
• Natural-language parser for documents about chemistry
• Dark magic: don’t ask me how it works!
• ...but it can be run as a Jetty web service, so as long as it works, I’m happy
• Author’s blog: http://wwmm.ch.cam.ac.uk/blogs/corbett/

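Talking to OSCAR3 as a web service is then ordinary HTTP. The sketch below only builds the request and does not send it; the endpoint URL, the `contents` form field and `build_oscar_request` are hypothetical placeholders rather than OSCAR3's documented interface, so check your own deployment before relying on them.

```python
import urllib.parse
import urllib.request

# Hypothetical address of a locally running OSCAR3 Jetty service; the real
# host, port, path and form-field name depend on how OSCAR3 is deployed.
OSCAR_ENDPOINT = "http://localhost:8181/Process"


def build_oscar_request(text: str, endpoint: str = OSCAR_ENDPOINT) -> urllib.request.Request:
    """Wrap free text in a form-encoded POST request for an OSCAR-style service."""
    body = urllib.parse.urlencode({"contents": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Sending it (not done here) would look like:
#   with urllib.request.urlopen(build_oscar_request(abstract)) as response:
#       annotated = response.read().decode("utf-8")
```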
Getting the data in
• Everything (more or less) talks RSS nowadays...
• RSS 0.91, RSS 1.0 (which one?), Atom, etc. etc. etc.
• Thankfully: feedparser (http://feedparser.org/)

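feedparser does this job properly: `feedparser.parse(url_or_string).entries` gives entries with `title`, `summary` and `link` whatever the feed format. To show what that normalisation amounts to, here is a deliberately minimal stdlib-only sketch (the `entries` function is illustrative, not feedparser's code) that copes with just RSS 0.9x/2.0 and Atom 1.0; it ignores RSS 1.0, dates, content types and malformed feeds, all of which feedparser handles.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace in ElementTree's {uri}tag form


def entries(feed_xml: str) -> list:
    """Normalise RSS 0.9x/2.0 or Atom 1.0 entries to dicts with title, summary and link."""
    root = ET.fromstring(feed_xml)
    out = []
    if root.tag == "rss":
        # RSS 0.9x/2.0: <rss><channel><item><title/><description/><link/></item>...
        for item in root.iter("item"):
            out.append({
                "title": item.findtext("title", ""),
                "summary": item.findtext("description", ""),
                "link": item.findtext("link", ""),
            })
    elif root.tag == ATOM + "feed":
        # Atom: namespaced elements, and the link lives in an href attribute
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            out.append({
                "title": entry.findtext(ATOM + "title", ""),
                "summary": entry.findtext(ATOM + "summary", ""),
                "link": link.get("href", "") if link is not None else "",
            })
    # Any other root element (e.g. RSS 1.0's rdf:RDF) is not handled and yields []
    return out
```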
Serializing metadata
• RDF, using:
    • Dublin Core terms
    • A homebrew ontology based on the IUCr’s CIF data format
    • and another homebrew ontology for OSCAR annotations
• (it’d be good to standardise these, but to be honest, not many people are doing this sort of thing)

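As a rough sketch of what such metadata can look like, the snippet below writes N-Triples using Dublin Core terms and the CrystalEye dictionary namespace that appears in the SPARQL example. The `cell_length_a` term, the `http://example.org/oscar#` namespace, its `mentions` property and the `describe` function are hypothetical stand-ins for the two homebrew ontologies, whose actual terms aren't given here.

```python
DCTERMS = "http://purl.org/dc/terms/"
CE = "http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#"  # namespace from the SPARQL slide
OSCAR = "http://example.org/oscar#"                      # hypothetical placeholder namespace


def literal(value: str) -> str:
    """Quote a plain literal for N-Triples, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: <subject> <predicate> object ."""
    return f"<{subject}> <{predicate}> {obj} ."


def describe(uri: str, title: str, authors: list, cell_a=None, entities=()) -> str:
    """Serialize one data file's metadata as N-Triples text."""
    lines = [triple(uri, DCTERMS + "title", literal(title))]
    lines += [triple(uri, DCTERMS + "contributor", literal(a)) for a in authors]
    if cell_a is not None:
        # Hypothetical CIF-derived term in the CrystalEye dictionary namespace
        lines.append(triple(uri, CE + "cell_length_a", literal(cell_a)))
    # Hypothetical OSCAR annotation: one triple per chemical entity found
    lines += [triple(uri, OSCAR + "mentions", literal(e)) for e in entities]
    return "\n".join(lines) + "\n"
```

N-Triples is used here only because it is trivial to produce by hand; RDF/XML or Turtle would carry the same triples.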
The process
• For each feed in a list of feeds:
    • If it’s supplying CML data, set Golem on each entry, get the observables out, and turn them into triples; run OSCAR3 over the title and/or abstract
    • If it’s not, extract the free text from each entry, send it to the OSCAR web service, and assign triples based on the chemical entities OSCAR finds
• Upload the RDF to your triple store
    • (I’m using the Talis platform, so that’s just curl)
• And...

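The loop above can be sketched in Python roughly as follows. Golem and the OSCAR web service are passed in as plain functions so the control flow can be shown (and tested) without either; `index_feeds`, the feed/entry dictionary layout and the `oscar:mentions` predicate are all hypothetical.

```python
from typing import Callable, Iterable


def index_feeds(
    feeds: Iterable[dict],
    golem_extract: Callable[[dict], Iterable[tuple]],
    oscar_entities: Callable[[str], Iterable[str]],
) -> list:
    """Turn feed entries into (subject, predicate, object) triples.

    Each feed is assumed to be {"cml": bool, "entries": [...]}, each entry a dict
    with "link", "title" and "summary". golem_extract stands in for Golem and
    returns (predicate, value) pairs; oscar_entities stands in for the OSCAR
    web service and returns the chemical entities found in a piece of text.
    """
    triples = []
    for feed in feeds:
        for entry in feed["entries"]:
            subject = entry["link"]
            # Free text = title and/or abstract, whichever the entry has
            text = " ".join(filter(None, [entry.get("title"), entry.get("summary")]))
            if feed["cml"]:
                # CML feeds: Golem pulls the observables out of each entry...
                for predicate, value in golem_extract(entry):
                    triples.append((subject, predicate, value))
            # ...and every entry, CML or not, gets OSCAR run over its free text
            for entity in oscar_entities(text):
                triples.append((subject, "oscar:mentions", entity))
    return triples
```

The upload step is then a single HTTP POST of the serialized RDF, along the lines of `curl -X POST -H "Content-Type: application/rdf+xml" --data-binary @triples.rdf STORE_URL`, where `STORE_URL` is a placeholder for your store's write endpoint; check the store's documentation for the exact URL and authentication.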
SPARQL is great.
Just post queries at a SPARQL endpoint:

    authortemplate = '''
    PREFIX dc: <http://purl.org/dc/terms/>
    PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
    DESCRIBE ?file WHERE { ?file dc:contributor "some author" . }
    '''

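Posting a query is plain HTTP under the SPARQL protocol: the query text goes in a `query` parameter, sent with a GET (or a form-encoded POST) to the endpoint. The sketch below fills the author template with a properly escaped string literal and builds the GET request without sending it; the endpoint URL and the function names are placeholders.

```python
import urllib.parse
import urllib.request

# Placeholder: substitute your store's SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# The author template, with %s where the quoted author name goes.
AUTHOR_TEMPLATE = """
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
DESCRIBE ?file WHERE { ?file dc:contributor %s . }
"""


def sparql_string(value: str) -> str:
    """Quote a value as a SPARQL string literal (escaping backslashes, quotes, newlines)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def author_query_request(author: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL-protocol GET request for one author's files."""
    query = AUTHOR_TEMPLATE % sparql_string(author)
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # DESCRIBE returns an RDF graph rather than a result table, so ask for RDF/XML
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})
```

Sending it is then `urllib.request.urlopen(author_query_request("Peter Murray-Rust")).read()`, and the returned RDF can be loaded into rdflib for further processing. Escaping the name matters: pasting raw user input into the template would let a stray quote break, or change, the query.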
SPARQL isn’t (entirely) great.
• Scientists shouldn’t have to know this stuff.
• So we need to build a front end which your average senior academic might be able to use...
• (i.e. it’s got to look like a website.)

What queries do we want?
• What experimental data is an author responsible for?
• What chemical entities are in some data?
• Where is a given chemical entity talked about?
• We can build a web app around these queries:
    • Django + rdflib + SPARQL + Talis Platform

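Each of these questions maps onto a small SPARQL template. A minimal sketch, assuming that authors are stored as `dc:contributor` literals (as in the author query on the earlier slide) and that OSCAR annotations use a hypothetical `oscar:mentions` property with the entity name as a literal; the namespace and all function names are illustrative only.

```python
DC = "http://purl.org/dc/terms/"
OSCAR = "http://example.org/oscar#"  # hypothetical namespace for OSCAR annotations

PREFIXES = f"PREFIX dc: <{DC}>\nPREFIX oscar: <{OSCAR}>\n"


def lit(value: str) -> str:
    """Quote a value as a SPARQL string literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def iri(value: str) -> str:
    """Wrap a URI in <...>, refusing characters that are not allowed inside an IRI ref."""
    if any(c in value for c in '<>" {}|\\^`') or any(ord(c) <= 0x20 for c in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def data_by_author(author: str) -> str:
    """What experimental data is an author responsible for?"""
    return PREFIXES + f"SELECT ?file WHERE {{ ?file dc:contributor {lit(author)} . }}"


def entities_in(data_uri: str) -> str:
    """What chemical entities are in some data?"""
    return PREFIXES + f"SELECT ?entity WHERE {{ {iri(data_uri)} oscar:mentions ?entity . }}"


def where_mentioned(entity: str) -> str:
    """Where is a given chemical entity talked about?"""
    return PREFIXES + f"SELECT ?doc WHERE {{ ?doc oscar:mentions {lit(entity)} . }}"
```

A Django view would then pick the template that matches the page being rendered, fill it in, and send the result to the Talis Platform's SPARQL endpoint.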
Demo!
And here it is.

Thanks to...
• Talis (http://n2.talis.com/) for access to their platform
• and to the RSC and IUCr for their support of CrystalEye.