Introducing JSONpedia

Introduction to JSONpedia, a JSON version of Wikipedia.

  1. WWW.SPAZIODATI.EU. JSONpedia: Facilitating consumption of MediaWiki content. Michele Mostarda <mostarda@spaziodati.eu>, TW: @micmos. Wednesday, 10 October 2012.
  2. What is JSONpedia?
  3. “JSONpedia is a library and a web service meant to read WikiText markup as JSON.”
  4. ‣ Initially conceived as a tool to produce data for training Machine Learning models.
     ‣ The REST service, inspired by Sweble Crystalball, produces JSON, HTML and (coming soon) RDF data.
     ‣ Built on a context-dependent, event-based parser, to be more performant than a regex-based parser (like the wikiparser) or a DOM-based parser (like Sweble).
  5. Differences with Sweble
  6. ‣ Lightweight event-based parser.
     ‣ More tolerant of the frequent syntax errors present in WikiText pages.
     ‣ Serializes to JSON output, which is easier to consume!
  7. Differences with DBpedia
  8. ‣ JSONpedia doesn't add any semantics to the extracted data.
     ‣ JSONpedia could integrate the current DBpedia regex-based parser.
     ‣ JSONpedia is not a competitor of DBpedia but rather a complement.
  9. JSONpedia Internals
  10. Architecture [diagram]. Input: WikiText. Components: Parser, Structure, Validator, Extractor, Splitter, Linker (DBpedia API / Freebase). Output: JSON.
  11. WikiText Parser Events

          // Document bounding.
          void beginDocument(URL document);
          void endDocument();

          // Error handling.
          void parseWarning(String msg, ParserLocation location);
          void parseError(Exception e, ParserLocation location);

          // Tag handling.
          void beginTag(String node, Attribute[] attributes);
          void endTag(String node);
          void inlineTag(String node, Attribute[] attributes);
          void commentTag(String comment);

          // Sections
          void section(String title, int level);

          // References
          void beginReference(String label);
          void endReference(String label);

          // Links
          void beginLink(String url);
          void endLink(String url);

          // Lists
          void beginList();
          void listItem();
          void endList();

          // Templates
          void beginTemplate(String name);
          void endTemplate(String name);

          // Tables
          void beginTable();
          void headCell(int row, int col);
          void bodyCell(int row, int col);
          void endTable();

          // Generic parameter
          void parameter(String param);

          // Parameter / text value
          void text(String content);
  12. WikiText Processors: Processors receive the stream of events generated by the parser and perform data construction and transformation.
      ‣ Structure
      ‣ Extractors
      ‣ Linkers
      ‣ Splitters
      ‣ Validator
  13. Structure: The Structure Processor receives a stream of WikiText parsing events and builds a one-to-one JSON representation of the document DOM.
  14. Extractors: Extractors are specific Processors that collect a certain type of data from the event stream. For example, the SectionsExtractor collects the list of all sections detected in the document stream.
  15. Linkers: A Linker is a Processor that links the current document entity to other information acquired from external sources. An example is the FreebaseLinker, which connects an entity to its corresponding representation in Freebase, if any.
  16. Splitters: A Splitter is a Processor able to cut subtrees out of the JSON document built by the Structure processor. An example is the TableSplitter, which extracts the JSON structures representing the tables declared in the document.
  17. Validator: A Validator is a Processor that checks the data structures parsed from a document.
  18. Forthcoming Features
      ‣ JSONpedia DB (based on MongoDB + ElasticSearch), which can be queried online. JSONpedia dumps will also be available.
      ‣ Online data model Exporter Tool (CSV).
      ‣ RDF output.
  19. Release: JSONpedia will be fully released as open source by the end of the year.
  20. Live Demo: http://bit.ly/jsonpedia or http://json.it.dbpedia.org/frontend/form.html
  21. WWW.SPAZIODATI.EU. Thanks! Michele Mostarda <mostarda@spaziodati.eu>, TW: @micmos
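
The relationship between the parser events (slide 11), Processors (slide 12) and Extractors (slide 14) can be illustrated with a minimal Java sketch. It is not JSONpedia's actual code. `WikiTextListener` is a hypothetical interface holding only four of the event signatures listed in the slides, with empty default bodies added for brevity. `Fanout`, `TextCounter` and `ProcessorDemo` are invented names, and this `SectionsExtractor` is only a guess at what the class named in the slides might do. The demo replays a hand-written event sequence rather than running a real parser.

```java
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver: replays a hand-written event sequence instead of running a real parser.
class ProcessorDemo {
    public static void main(String[] args) throws Exception {
        SectionsExtractor extractor = new SectionsExtractor();
        TextCounter counter = new TextCounter();
        // Both processors observe the same event stream.
        WikiTextListener pipeline = new Fanout(List.of(extractor, counter));

        pipeline.beginDocument(URI.create("http://en.wikipedia.org/wiki/Example").toURL());
        pipeline.section("History", 1);
        pipeline.text("Some text.");
        pipeline.section("Early years", 2);
        pipeline.text("More text.");
        pipeline.endDocument();

        System.out.println(extractor.sections());
        System.out.println(counter.characters());
    }
}

// Assumed subset of the parser events from the slides; the empty defaults are an addition.
interface WikiTextListener {
    default void beginDocument(URL document) {}
    default void endDocument() {}
    default void section(String title, int level) {}
    default void text(String content) {}
}

// Sketch of an Extractor: collects every section seen in the event stream as "level:title".
class SectionsExtractor implements WikiTextListener {
    private final List<String> sections = new ArrayList<>();

    @Override public void beginDocument(URL document) { sections.clear(); }

    @Override public void section(String title, int level) {
        sections.add(level + ":" + title.trim());
    }

    List<String> sections() { return List.copyOf(sections); }
}

// A second, invented processor: counts the characters carried by text events.
class TextCounter implements WikiTextListener {
    private int characters;

    @Override public void beginDocument(URL document) { characters = 0; }

    @Override public void text(String content) { characters += content.length(); }

    int characters() { return characters; }
}

// Forwards each event to every registered processor, in registration order.
class Fanout implements WikiTextListener {
    private final List<WikiTextListener> targets;

    Fanout(List<WikiTextListener> targets) { this.targets = List.copyOf(targets); }

    @Override public void beginDocument(URL document) { targets.forEach(t -> t.beginDocument(document)); }
    @Override public void endDocument() { targets.forEach(WikiTextListener::endDocument); }
    @Override public void section(String title, int level) { targets.forEach(t -> t.section(title, level)); }
    @Override public void text(String content) { targets.forEach(t -> t.text(content)); }
}
```

Because every processor only reacts to the events it cares about, a single pass over the document can feed several of them at once, which is the main appeal of an event-based design over building a full DOM first.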
