Building a HighPerformance Environment   for RDF Publishing        Pascal Christoph
These slides and all the graphics made by the author and thosetaken from https://openclipart.org/ are dedicated to the pub...
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
ing is for Consuming                        PublishPublishing is for Consuming  Building a High Performance Environment fo...
ing is for Consuming                                    Publish                             MandatoryA resource:          ...
ing is for Consuming                                    Publish                             MandatoryA resource:gets a der...
ing is for Consuming                                                        Publish                                       ...
ing is for Consuming                           Publish                    Mandatory=> basic LOD publishing is very simple:...
ing is for Consuming                                              Publish                                      Nice to hav...
ing is for Consuming                                              Publish                                  SPARQL Endpoint...
ing is for Consuming                                                  Publish                                      SPARQL ...
ing is for Consuming                                   Publish                           Nice to haveIn principle, web dev...
ing is for Consuming                                   Publish                           Nice to haveIn principle, web dev...
ing is for Consuming                                                        Publish                                       ...
ing is for Consuming                                   Publish                           Nice to haveIn principle, web dev...
ing is for Consuming                                    Publish                  RESTful SPARQL example  getting all data ...
ing is for Consuming                      Publish                Nice to haveBuilding a High Performance Environment for R...
ing is for Consuming                                                            Publish                                  R...
ing is for Consuming                      Publish                Nice to have               As it is, web developers dont ...
ing is for Consuming                                   Publish                           Nice to haveWeb developers want A...
Happy web developer
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
What is l obid.org ?                 lobid.orgBuilding a High Performance Environment for RDF Publishing   24
What is l obid.org ?●    lobid := linking open bibliographic data●    LOD services of the hbz     ●       lobid-resources ...
What is lobid.org ?                                   Whats missing?•    Dumps•    Content Negotiation (different RDF seri...
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •      Wh...
storing the data                2010 - 2011, lobid-organisationFilesystem :     + easy to maintain     + reliable     + fa...
storing the data                              lobid todayTriple Store (4store) :     + power of SPARQL     +/- depending o...
storing the data                               lobid todaySearch engine (elasticsearch):     + fast search     + stemming,...
lobid today storing/getting the data
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
getting the datalobid : technology/dependency stack            Webapp                  Search Engine                      ...
getting the datalobid : technology/dependency stack            Webapp              we can do that                  Search ...
getting the datalobid : technology/dependency stack            Webapp              we can do that                  Search ...
getting the datalobid : technology/dependency stack            Webapp              we can do that                  Search ...
oring/getting the data                                                  st            Variant 1 : technology/dependency st...
oring/getting the data                                                  st            Variant 1 : technology/dependency st...
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
D with elasticsearch                                 Publishing LO             Variant 2: technology/dependency stack     ...
D with elasticsearch                                    Publishing LO                                         Benefits•   ...
D with elasticsearch              Publishing LO                   Benefitsfast, scalable search engine  Building a High Pe...
D with elasticsearch                                            Publishing LO                                            p...
D with elasticsearch                             Publishing LO                            performance test(there is a supp...
D with elasticsearch                                 Publishing LO                                performance testElastics...
D with elasticsearch                                    Publishing LO                                         Benefits•   ...
D with elasticsearch                    Publishing LO                         Benefitsbuild to be easily made highly avail...
D with elasticsearch                                    Publishing LO                                         Benefits•   ...
D with elasticsearch                       Publishing LO                            BenefitsVersioning with elasticsearch:...
D with elasticsearch                                    Publishing LO      Benefits, relying on elasticsearch as basic LOD...
D with elasticsearch                                    Publishing LO                                         Benefits•   ...
D with elasticsearch                             Publishing LO                             Why JSON-LD?    JSON is :•    s...
D with elasticsearch                                    Publishing LO                                         Benefits•   ...
D with elasticsearch                           Publishing LO                                BenefitsRESTful elasticsearch ...
D with elasticsearch                                  Publishing LO                                       Benefits•    … a...
D with elasticsearch                                    Publishing LO                                         Benefits•   ...
D with elasticsearch                                    Publishing LO                   ( … ok, something is left to be do...
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
D with elasticsearch                                    Publishing LO                                         Caveats•    ...
D with elasticsearch                           Publishing LO                                CaveatsHow to integrate semant...
D with elasticsearch                              Publishing LO                                   CaveatsOur data flow :fr...
D with elasticsearch                              Publishing LO                                   CaveatsOur data flow :fr...
D with elasticsearch                      Publishing LO                           Caveatsfrom records to RDF triples to re...
D with elasticsearch                                Publishing LO                                     CaveatsFrom records ...
D with elasticsearch               Publishing LO                    Caveats    tree-based vs graph-based:Pre-render the wh...
D with elasticsearch            Publishing LO                 CaveatsBuilding a High Performance Environment for RDF Publi...
D with elasticsearch                         Publishing LO                              CaveatsWhat is the document ? Only...
D with elasticsearch                          Publishing LO                               CaveatsWhat is the document ? On...
D with elasticsearch                              Publishing LO                                   Caveatssearching needs i...
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
D with elasticsearch                  Publishing LO                    auto suggestauthority IDs must be easily found     ...
D with elasticsearch                  Publishing LO                    auto suggestauthority IDs must be easily found   =>...
D with elasticsearch                   Publishing LO                     auto suggestauto suggests needs fast searching   ...
D with elasticsearch            Publishing LO                   Demo        auto suggestBuilding a High Performance Enviro...
D with elasticsearchPublishing LO                               75
D with elasticsearch                                Publishing LO                                  auto suggestRESTful API...
D with elasticsearch                                         Publishing LO                                           auto ...
D with elasticsearch            Publishing LO              auto suggest GND authority file                    in     lobid...
D with elasticsearchPublishing LO                               79
D with elasticsearch            Publishing LOBuilding a High Performance Environment for RDF Publishing   80
OverviewPublishing is for Consuming  •    Mandatory  •    Nice to haveStory so far - experiences with lobid.org  •    What...
D with elasticsearch                                 Publishing LO                                   Conclusion    a highl...
D with elasticsearch             Publishing LOthe software is Open Source:              http://elasticsearch.org/         ...
Any Questions ?     Pascal Christoph     christoph@hbz-nrw.de     semweb@hbz-nrw.de
Using a dark background, this presentation saves maybe 70% of energy
Upcoming SlideShare
Loading in...5
×

Building a High Performance Environment for RDF Publishing

2,428

Published on

SWIB12, cologne 2012-11-27. Video recording: http://www.scivee.tv/node/55329

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,428
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
37
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Building a High Performance Environment for RDF Publishing

  1. 1. Building a HighPerformance Environment for RDF Publishing Pascal Christoph
  2. 2. These slides and all the graphics made by the author and thosetaken from https://openclipart.org/ are dedicated to the publicdomain : https://creativecommons.org/about/cc0 .All marks mentioned may be trademarks or registered trademarksof their respective owners.Read about the license of „The scream“ of Edward Munch athttps://en.wikipedia.org/wiki/File:The_Scream.jpg Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  3. 3. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Some more details • CaveatsFuture prospects Building a High Performance Environment for RDF Publishing 3
  4. 4. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Some more details • CaveatsFuture prospects Building a High Performance Environment for RDF Publishing 4
  5. 5. ing is for Consuming PublishPublishing is for Consuming Building a High Performance Environment for RDF Publishing 5
  6. 6. ing is for Consuming Publish MandatoryA resource: Building a High Performance Environment for RDF Publishing 6
  7. 7. ing is for Consuming Publish MandatoryA resource:gets a dereferenceable URI: Building a High Performance Environment for RDF Publishing 7
  8. 8. ing is for Consuming Publish MandatoryA resource:gets a dereferenceable URI:which provides RDF:<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .<http://lobid.org/resource/HT002948556><http://purl.org/dc/elements/1.1/creator><http://d-nb.info/gnd/135539897> . Building a High Performance Environment for RDF Publishing 8
  9. 9. ing is for Consuming Publish Mandatory=> basic LOD publishing is very simple:you just need a Webserver Building a High Performance Environment for RDF Publishing 9
  10. 10. ing is for Consuming Publish Nice to have• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation (best: RDFa in HTML)• Data searchable• Timely updates• High Availability• Versioning• Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 10
  11. 11. ing is for Consuming Publish SPARQL Endpoint• (Dumps)• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation (best: RDFa in HTML)• Data searchable• Timely updates• High Availability• Versioning• Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 11
  12. 12. ing is for Consuming Publish SPARQL Endpoint• (Dumps): but may be painfully slow when having lots of data• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation (best: RDFa in HTML)• (Data searchable) : maybe painfully slow• Timely updates• High Availability• Versioning• Web developers want simple APIs providing JSON • most triple stores provides JSON/RDF • Simple powerful API : too powerful/complex ?• ... Building a High Performance Environment for RDF Publishing 12
  13. 13. ing is for Consuming Publish Nice to haveIn principle, web developers already got simple APIs : LOD is the API ! Building a High Performance Environment for RDF Publishing 13
  14. 14. ing is for Consuming Publish Nice to haveIn principle, web developers already got simple APIs : Remember: Building a High Performance Environment for RDF Publishing 14
  15. 15. ing is for Consuming Publish MandatoryA resource:gets a dereferenceable URI:which provides the data (in RDF):<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .<http://lobid.org/resource/HT002948556><http://purl.org/dc/elements/1.1/creator><http://d-nb.info/gnd/135539897> . Building a High Performance Environment for RDF Publishing 15
  16. 16. ing is for Consuming Publish Nice to haveIn principle, web developers already got powerful APIs : RESTful SPARQL Building a High Performance Environment for RDF Publishing 16
  17. 17. ing is for Consuming Publish RESTful SPARQL example getting all data of all resources having a particular ISBN:curl -H "Accept: application/json" --data-urlencode query=prefix bibo: <http://purl.org/ontology/bibo/>SELECT * WHERE { ?s bibo:isbn13 "9780851706238" ; ?p ?o . } LIMIT 100 http://lobid.org/sparql/ Building a High Performance Environment for RDF Publishing 17
  18. 18. ing is for Consuming Publish Nice to haveBuilding a High Performance Environment for RDF Publishing 18
  19. 19. ing is for Consuming Publish RESTful SPARQL example… and the JSON/RDF result:{ "head": { "vars": [ "s", "p","o"] }, "results": { "bindings": [ { "o": { "type": "uri", "value": "http://openlibrary.org/works/OL2109573W" }, "p": { "type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested" }, "s": { "type": "uri", "value": "http://lobid.org/resource/HT007824357" } }, { "o": { ... Building a High Performance Environment for RDF Publishing 19
  20. 20. ing is for Consuming Publish Nice to have As it is, web developers dont like SPARQL web developerBuilding a High Performance Environment for RDF Publishing 20
  21. 21. ing is for Consuming Publish Nice to haveWeb developers want APIs like:http://lobid.org/resources/api/isbn/$isbn Building a High Performance Environment for RDF Publishing 21
  22. 22. Happy web developer
  23. 23. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Some more details • CaveatsFuture prospects Building a High Performance Environment for RDF Publishing 23
  24. 24. What is l obid.org ? lobid.orgBuilding a High Performance Environment for RDF Publishing 24
  25. 25. What is l obid.org ?● lobid := linking open bibliographic data● LOD services of the hbz ● lobid-resources : ● exposes 85% of the hbz cooperative catalogue ● entries coming from > 200 scientific German libraries ● ~ 16 M records with 700 M triples ● with links to ~ 5 M other resources ● with links to ~ 32 M items (consisting of 300 M triples) ● lobid-organisations : ● exposes German Sigelverzeichnis and MARC-Isil directory ● ~ 40 k descriptions of institutions Building a High Performance Environment for RDF Publishing 25
  26. 26. What is lobid.org ? Whats missing?• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Timely updates• High Availability• Versioning• Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 26
  27. 27. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Some more details • CaveatsFuture prospects Building a High Performance Environment for RDF Publishing 27
  28. 28. storing the data 2010 - 2011, lobid-organisationFilesystem : + easy to maintain + reliable + fast - no search - no SPARQL - ... Building a High Performance Environment for RDF Publishing 28
  29. 29. storing the data lobid todayTriple Store (4store) : + power of SPARQL +/- depending on the query: fast to horribly slow +/- search (but string searches often slow and limited) - sometimes gets stuck ! Building a High Performance Environment for RDF Publishing 29
  30. 30. storing the data lobid todaySearch engine (elasticsearch): + fast search + stemming, linguistics … + wildcard searching + facets + geo search + JSON + schema-less + simple RESTful API + many plugins + ... + easy to achieve High Availability + scales nicely Building a High Performance Environment for RDF Publishing 30
  31. 31. lobid today storing/getting the data
  32. 32. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Some more details • CaveatsFuture prospects Building a High Performance Environment for RDF Publishing 32
  33. 33. getting the datalobid : technology/dependency stack Webapp Search Engine Triple Store Building a High Performance Environment for RDF Publishing 33
  34. 34. getting the datalobid : technology/dependency stack Webapp we can do that Search Engine highly available ! Triple Store sometimes gets stuck! Building a High Performance Environment for RDF Publishing 34
  35. 35. getting the datalobid : technology/dependency stack Webapp we can do that Search Engine < highly available ! Triple Store = sometimes gets stuck! Building a High Performance Environment for RDF Publishing 35
  36. 36. getting the datalobid : technology/dependency stack Webapp we can do that Search Engine highly available ! Triple Store sometimes gets stuck! Building a High Performance Environment for RDF Publishing 36
  37. 37. oring/getting the data st Variant 1 : technology/dependency stack Triple Store Triple StoreClosed, internal. Will be safe from malign queries. For external access. Sometimes gets stuck! Building a High Performance Environment for RDF Publishing 37
  38. 38. oring/getting the data st Variant 1 : technology/dependency stack redundant, complex … Triple Store Triple StoreClosed, internal. Will be safe from malign queries. For external access. Sometimes gets stuck! Building a High Performance Environment for RDF Publishing 38
  39. 39. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch Benefits • • Some more details • CaveatsFuture prospects Building a High Performance Environment for RDF Publishing 39
  40. 40. D with elasticsearch Publishing LO Variant 2: technology/dependency stack Webapp we can do that Search Engine highly available !LOD basis functionality (and some other Triple Store For external access and some APIs) are highly available fancy nice-to-have stuff. Sometimes gets stuck! Building a High Performance Environment for RDF Publishing 40
  41. 41. D with elasticsearch Publishing LO Benefits• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates• High Availability• (Versioning)• Web developers want simple APIs returning JSON• ... Building a High Performance Environment for RDF Publishing 41
  42. 42. D with elasticsearch Publishing LO Benefitsfast, scalable search engine Building a High Performance Environment for RDF Publishing 42
  43. 43. D with elasticsearch Publishing LO performance testData: 10 M records <=> 300 M tripleCase-insensitive query: „beach“SELECT ?sWHERE { ?s <http://purl.org/dc/terms/title> ?o FILTER regex(str(?o), "beach", "i") }#### => SPARQL execution time for Q8316: 108.7s, returned 2815 rows.http://$ip:9200/_search?q=beach&from=0&size=2800# => Elasticsearch needed 0.4s=> Elasticsearch is 250 times faster Building a High Performance Environment for RDF Publishing 43
  44. 44. D with elasticsearch Publishing LO performance test(there is a support for text indexing in 4store, have not tested that.) Building a High Performance Environment for RDF Publishing 44
  45. 45. D with elasticsearch Publishing LO performance testElasticsearch: 18 M records , 6 GB RAM: 5 hour4store: 1 B triples, having 72 GB RAM: 7 hoursCPU: Quad Core mit 2.4 GhZ und Hyperthreading => 8 CPUsHD: 6 x 2.5" 10k U/min a 146GB(Dont take benchmarks too seriously – they just give a clue !) Building a High Performance Environment for RDF Publishing 45
  46. 46. D with elasticsearch Publishing LO Benefits• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates• High Availability• (Versioning) • Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 46
  47. 47. D with elasticsearch Publishing LO Benefitsbuild to be easily made highly available ! Building a High Performance Environment for RDF Publishing 47
  48. 48. D with elasticsearch Publishing LO Benefits• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates• High Availability• (Versioning) • Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 48
  49. 49. D with elasticsearch Publishing LO BenefitsVersioning with elasticsearch:Not out-of-the-box, but comes at least e.g. with* concurrency control* documents have a version number=> implementing versioning is not hard Building a High Performance Environment for RDF Publishing 49
  50. 50. D with elasticsearch Publishing LO Benefits, relying on elasticsearch as basic LOD storage• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates• High Availability• Versionizing• Web developers want: • JSON (LD) • Simple APIs• ... Building a High Performance Environment for RDF Publishing 50
  51. 51. D with elasticsearch Publishing LO Benefits• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates• High Availability• (Versioning) • Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 51
  52. 52. D with elasticsearch Publishing LO Why JSON-LD? JSON is :• stored natively by many tools (e.g. elasticsearch)• loved by consumers (web developers) JSON-LD is :• supported by RDF libraries (e.g. transforming to NTriples) Building a High Performance Environment for RDF Publishing 52
  53. 53. D with elasticsearch Publishing LO Benefits• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates• High Availability• (Versioning) • Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 53
  54. 54. D with elasticsearch Publishing LO BenefitsRESTful elasticsearch API, e. g. :http://lobid.org/resources/_search?q=isbn:$isbn Building a High Performance Environment for RDF Publishing 54
  55. 55. D with elasticsearch Publishing LO Benefits• … and many other nice things come with elasticsearch • geo-search : „Query only libraries/items residing up to 10 km from me.“ • … Building a High Performance Environment for RDF Publishing 55
  56. 56. D with elasticsearch Publishing LO Benefits• Dumps Content Negotiation (different RDF serializations) d!•• SPARQL plishe• accom Human readable representation ( RDFa in HTML)•• Data searchable Mis sion Near Real Time updates• High Availability• (Versioning)• Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 56
  57. 57. D with elasticsearch Publishing LO ( … ok, something is left to be done ! )• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates• High Availability• Versionizing• Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 57
  58. 58. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Caveats • Auto suggest demoConclusion Building a High Performance Environment for RDF Publishing 58
  59. 59. D with elasticsearch Publishing LO Caveats• Dumps• Content Negotiation (different RDF serializations)• SPARQL• Human readable representation ( RDFa in HTML)• Data searchable• Near Real Time updates•• High Availability Versionizing !?• Web developers want simple APIs providing JSON• ... Building a High Performance Environment for RDF Publishing 59
  60. 60. D with elasticsearch Publishing LO CaveatsHow to integrate semantic search into a document storage ?dct:contributor --------> dct:creator -------> dc:creator ---------> dc:contributor --------> bibo:translator …There is no inferencing as comes with SPARQL ! Building a High Performance Environment for RDF Publishing 60
  61. 61. D with elasticsearch Publishing LO CaveatsOur data flow :from records to RDF triples to records Building a High Performance Environment for RDF Publishing 61
  62. 62. D with elasticsearch Publishing LO CaveatsOur data flow :from records to RDF triples to records Building a High Performance Environment for RDF Publishing 62
  63. 63. D with elasticsearch Publishing LO Caveatsfrom records to RDF triples to records !? Building a High Performance Environment for RDF Publishing 63
  64. 64. D with elasticsearch Publishing LO CaveatsFrom records to RDF triples |-----> graph-database ------> computing ---> record-database MARC/MAB/PICA... JSON-LD Building a High Performance Environment for RDF Publishing 64
  65. 65. D with elasticsearch Publishing LO Caveats tree-based vs graph-based:Pre-render the whole document? What is the document ? Building a High Performance Environment for RDF Publishing 65
  66. 66. D with elasticsearch Publishing LO CaveatsBuilding a High Performance Environment for RDF Publishing 66
  67. 67. D with elasticsearch Publishing LO CaveatsWhat is the document ? Only the top-level node ? Building a High Performance Environment for RDF Publishing 67
  68. 68. D with elasticsearch Publishing LO CaveatsWhat is the document ? Only the top-level node ?… but then you couldnt even search the authors name ! Building a High Performance Environment for RDF Publishing 68
  69. 69. D with elasticsearch Publishing LO Caveatssearching needs integration of some fields from subgraphs into the document Building a High Performance Environment for RDF Publishing 69
  70. 70. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Caveats • Auto suggest demoConclusion Building a High Performance Environment for RDF Publishing 70
  71. 71. D with elasticsearch Publishing LO auto suggestauthority IDs must be easily found 71 Building a High Performance Environment for RDF Publishing
  72. 72. D with elasticsearch Publishing LO auto suggestauthority IDs must be easily found => in need of auto suggest 72 Building a High Performance Environment for RDF Publishing
  73. 73. D with elasticsearch Publishing LO auto suggestauto suggests needs fast searching Building a High Performance Environment for RDF Publishing 73
  74. 74. D with elasticsearch Publishing LO Demo auto suggestBuilding a High Performance Environment for RDF Publishing 74
  75. 75. D with elasticsearchPublishing LO 75
  76. 76. D with elasticsearch Publishing LO auto suggestRESTful APIs:http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karlhttp://demo.lobid.org/search?format=page&index=gnd-index&author=Schmidt%2C+Karlhttp://demo.lobid.org/search?format=full&index=gnd-index&author=Schmidt%2C+Karl…API usage:GET /search?format=<page|full|short>&index=<lobid-index|gnd-index>&author=<query>easy to enhance with the play framework and the elasticsearch API Building a High Performance Environment for RDF Publishing 76
  77. 77. D with elasticsearch Publishing LO auto suggestRESTful APIs:http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl [ "Schmidt, Karl (1894-1945)", "Schmidt, Karl", "Schmidt, Karl (1910-)", "Schmidt, Karl (1846-1928)", "Schmidt, Karl (1913-)", "Schmidt, Karl (1899-)", "Schmidt, Karl (1924-)", "Schmidt, Karl (1836-1888)", "Schmidt, L. F. Karl", "Schmidt, Karl (1902-1945)", "Schmidt, Karl J.", "Schmidt, Karl (1848-1905)", "Schmidt, Karl (1817-1882)", "Schmidt, Karl R.", "Schmidt, Karl (1954-)", "Schmidt, Karl (1888-)", "Schmidt, Karl (1867-)", ... ] Building a High Performance Environment for RDF Publishing 77
  78. 78. D with elasticsearch Publishing LO auto suggest GND authority file in lobid-resourcesBuilding a High Performance Environment for RDF Publishing 78
  79. 79. D with elasticsearchPublishing LO 79
  80. 80. D with elasticsearch Publishing LOBuilding a High Performance Environment for RDF Publishing 80
  81. 81. OverviewPublishing is for Consuming • Mandatory • Nice to haveStory so far - experiences with lobid.org • What is lobid.org ? • Storing the data • Getting the dataPublishing RDF through elasticsearch • Benefits • Caveats • Auto suggest demoConclusion Building a High Performance Environment for RDF Publishing 81
  82. 82. D with elasticsearch Publishing LO Conclusion a highly customizable/reliable/feature-rich LOD service Webapp we can do that Search Engine highly available !LOD basis functionality (and some other Triple Store For external access and some APIs) are highly available fancy nice-to-have stuff. Sometimes gets stuck! Building a High Performance Environment for RDF Publishing 82
  83. 83. D with elasticsearch Publishing LOthe software is Open Source: http://elasticsearch.org/ https://hadoop.apache.org/ http://www.playframework.org/ http://4store.org/ https://github.com/lobid/Building a High Performance Environment for RDF Publishing 83
  84. 84. Any Questions ? Pascal Christoph christoph@hbz-nrw.de semweb@hbz-nrw.de
  85. 85. Using a dark background, this presentation saves maybe 70% of energy
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×