Semantic day 2013 linked data at globo.com

  1. Linked Data at globo.com. Semantic Team, semantica@corp.globo.com. Tatiana Al-Chueyr and Rodrigo D. A. Senra, {tatiana.martins, rodrigo.senra}@corp.globo.com
  2. Semantic Team: Andréia Bustamante, Ícaro Medeiros, Tatiana Al-Chueyr, Rodrigo Senra
  3. Contributors: Franklin Amorim, João Caros Mendes, Alberto Beloni, André Nicodemus
  4. BROADCAST, MOVIES, PAY TV, INTERNET, EVENTS, MUSIC, PUBLISHING, NEW VENTURES, NEWSPAPER, RADIO NETWORK
  5. Motivation: Cross-link content from different web products (example: soccer player)
  6. Motivation: Cross-link content from different web products (example: politician)
  7. Motivation: Cross-link content from different web products (example: celebrity)
  8. Motivation: Semantic markup in web pages (RDF, FOAF, GEO, Dublin Core, SKOS). Example article, "Caso Isabella Nardoni" (The Isabella Nardoni Case), by Juliana Cardilli, G1 SP: Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction). Isabella de Oliveira Nardoni, aged 5, was killed on the night of March 29, 2008. The forensic investigation concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the north zone of São Paulo. Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.
  9. Motivation: Recommend annotations to the information producer
  10. Motivation: Suggest related content to the information consumer
  11. Motivation: Suggest related content to the information consumer
  12. Motivation: Suggest related content to the information consumer
  13. Outcomes ● Flexible ways to organize content ● Ease of finding related issues ● Explicit relations derived from annotated content ● Up-to-date topic pages with little editorial effort ● Linking content across different web products ● Seamless navigation leading to flow state
  14. Status Quo: Used by the main web products of Globo.com, linking, among others: ○ 18,485 organizations ○ 82,386 people ○ 9,129 places ○ 1,000,000+ annotated news items, from August 2010 to May 2013
  15. Legacy Architecture (diagram labels): CDA, CMA, triplestore, search engine, ontology
  16. Legacy Architecture (diagram labels): CDA, CMA, CDA, CMA, CDA, CMA, CDA, CMA, triplestore, search engine, ontology
  17. Problems: Poor data management ○ direct access to triple store (unmanaged) ○ difficulty sharing data (distributed DBs) ○ re-sync of triple store and search engine index ○ scalability of triple store ○ high entropy in distributed ontology engineering
  18. Problems
  19. Ontology Engineering (diagram labels): Domain-driven (current); Base; G1, GE, EGO, TV; Gnews, sports, gossip, tv; Upper; Person, Organization, Music, Politics, Programme, Education, Sports; Product-driven (past); Place
  20. Possible Solution: Upper Ontology
  21. Problems: Semantic as a library ○ many different versions in production ○ programming language dependent ○ steep learning curve for RDF/OWL/SPARQL
  22. Next Step: Create an open semantic data management platform ● Scalable ● Mobile and Web friendly ● Interconnect Globo's data with external data sources ● Automate content extraction (including NER)
  23. Brainiak: linked data RESTful API
  24. Legacy Architecture (diagram labels): CDA, CMA, CDA, CMA, CDA, CMA, CDA, CMA, triplestore, search engine, ontology
  25. Under Development (diagram labels): API, Brainiak, CMA, CDA, CDA, CDA, CDA, triplestore, search engine
  26. Requirements ● Indirect usage of SPARQL ● Programming language independent ● Data management with quality ● Finer-grained authorization and authentication ● Isolate applications from triplestore ● Improve triplestore performance
  27. SPARQL query (task: list all sports teams): DEFINE input:inference <http://data.globo.com/ruleset> SELECT ?uri ?label FROM <http://data.globo.com/sports/> WHERE { ?uri a <http://data.globo.com/sports/Team> ; rdfs:label ?label . } LIMIT 10 OFFSET 0
  28. Brainiak query: GET /sports/Team
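
Slides 27 and 28 contrast a hand-written SPARQL query with the equivalent REST call. A minimal sketch of how a path such as `/sports/Team` could be expanded into the query from slide 27 follows. The function name `build_list_query` and its parameters are hypothetical and chosen for illustration; this is not taken from Brainiak's code, and only the query text itself comes from the slide.

```python
def build_list_query(context, class_name, limit=10, offset=0,
                     base="http://data.globo.com/"):
    """Build a slide-27 style SPARQL query for a path like /<context>/<class_name>.

    Hypothetical helper: the name, signature and default base URI are
    assumptions made for this sketch.
    """
    graph_uri = f"{base}{context}/"           # e.g. http://data.globo.com/sports/
    class_uri = f"{graph_uri}{class_name}"    # e.g. http://data.globo.com/sports/Team
    return "\n".join([
        # DEFINE input:inference is a Virtuoso pragma, as used on the slide.
        f"DEFINE input:inference <{base}ruleset>",
        "SELECT ?uri ?label",
        f"FROM <{graph_uri}>",
        "WHERE {",
        f"  ?uri a <{class_uri}> ;",
        "       rdfs:label ?label .",
        "}",
        f"LIMIT {int(limit)}",
        f"OFFSET {int(offset)}",
    ])


query = build_list_query("sports", "Team")
print(query)
```

The point of the sketch is the one made by the Requirements slide: the client only needs to know the path, while the graph URI, class URI and paging clauses are derived on the server side.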
  29. SPARQL response
  30. Brainiak response
  31. Brainiak concepts ● Instance ● Collection (set of instances from a given Class) ● Schema (the Class definition) ● Context
  32. Instance
  33. Collection
  34. Schema
  35. Context
  36. Real example (diagram labels): place, State, Brazil, Country, Japan, City
  37. Real example: GET /place; GET /place/Country; GET /place/Country/_schema; GET /place/Country/Brazil
  38. URI Conventions: resource URL → /place/Country/Brazil; context (graph) → http://semantica.globo.com/place/; class → http://semantica.globo.com/place/Country; instance → http://semantica.globo.com/place/Country/Brazil
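
The URI conventions on slide 38 can be read as a simple mapping from a resource path to three URIs. A minimal sketch under that reading is shown here; `ResourceUris` and `resolve` are hypothetical names, and special path segments such as `_schema` are deliberately not handled.

```python
from typing import NamedTuple, Optional

BASE = "http://semantica.globo.com/"  # base URI shown on the URI Conventions slide


class ResourceUris(NamedTuple):
    context: str                  # named graph
    class_uri: Optional[str]      # None when the path names only a context
    instance_uri: Optional[str]   # None when the path names only a class


def resolve(path: str, base: str = BASE) -> ResourceUris:
    """Map /<context>[/<Class>[/<instance>]] to context, class and instance URIs.

    Hypothetical helper written for illustration only.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or len(parts) > 3:
        raise ValueError(f"expected 1 to 3 path segments, got {path!r}")
    context = f"{base}{parts[0]}/"
    class_uri = f"{context}{parts[1]}" if len(parts) >= 2 else None
    instance_uri = f"{class_uri}/{parts[2]}" if len(parts) == 3 else None
    return ResourceUris(context, class_uri, instance_uri)


uris = resolve("/place/Country/Brazil")
print(uris.context)
print(uris.class_uri)
print(uris.instance_uri)
```

For `/place/Country/Brazil` this yields the three URIs listed on the slide; shorter paths such as `/place` or `/place/Country` leave the missing parts as `None`.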
  39. Legacy URIs: /place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River. Overridden: context (graph) → http://dbpedia.org/resource/classes#; class → http://dbpedia.org/ontology/River. Convention: context (graph) → http://semantica.globo.com/place/; class → http://semantica.globo.com/place/River
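
Slide 39 shows query parameters overriding the URIs that would otherwise be derived by convention. A minimal sketch of that precedence follows, assuming a hypothetical prefix table (`PREFIXES`) and helper names (`expand`, `resolve_class`). One practical detail: a literal `#` inside a query string starts the URL fragment, so the sketch percent-encodes the parameters with `urllib.parse.urlencode` before parsing.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://semantica.globo.com/"
# Hypothetical prefix table; the dbpedia mapping matches the class URI on the slide.
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}


def expand(value: str) -> str:
    """Expand a prefixed name such as dbpedia:River; leave full URIs untouched."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return value


def resolve_class(url: str):
    """Return (graph URI, class URI); query parameters override the convention."""
    parts = urlsplit(url)
    context, class_name = parts.path.strip("/").split("/")[:2]
    params = parse_qs(parts.query)
    graph = params.get("graph_uri", [f"{BASE}{context}/"])[0]
    class_uri = expand(params.get("class_uri", [f"{BASE}{context}/{class_name}"])[0])
    return graph, class_uri


# urlencode turns '#' into %23 so it survives as part of the query string.
legacy_url = "/place/River?" + urlencode({
    "graph_uri": "http://dbpedia.org/resource/classes#",
    "class_uri": "dbpedia:River",
})

print(resolve_class("/place/River"))  # convention
print(resolve_class(legacy_url))      # overridden
```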
  40. Hypermedia ● Flexibility and programmatic adaptation ● Semantic affordances ● Client has to understand what is consumed ● "Hypermedia APIs are not fully baked yet"
  41. Brainiak hypermedia graph (diagram labels): nodes: /, context, collection, instance, schema; link labels: inCollection, item, instances, describedBy, self, replace, delete, create
  42. Services ● List Contexts ● List Collections ● Get a Schema ● List Prefixes ● Status of Services ● Create ● Retrieve ● Delete ● Edit ● List Instances
  43. Features ● JSON-Schema ● JSON-LD ● REST ● Python + Tornado; HTTP methods: OPTIONS, GET, PUT, POST, DELETE
  44. Brainiak query: GET /sports/Team
  45. Brainiak response
  46. Brainiak response
  47. Brainiak response
  48. Brainiak response
  49. SPARQL query (task: retrieve all superclasses of a class): SELECT DISTINCT ?class WHERE { <http://data.globo.com/place/City> rdfs:subClassOf ?class OPTION(TRANSITIVE, t_distinct, t_step(step_no) as ?n, t_min (0)) . ?class a owl:Class . }
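
The query on slide 49 relies on Virtuoso's `OPTION(TRANSITIVE, ...)` extension to walk `rdfs:subClassOf` links. As a rough illustration of what such a traversal computes, the sketch below performs the same walk over an in-memory edge table. The edges in `SUBCLASS_OF` are an assumed hierarchy invented for this example (the class names appear on slide 50, but the slides do not state how they are linked), and including the starting class in the result reflects the `t_min (0)` setting as understood here.

```python
from collections import deque

# Assumed, illustrative hierarchy; not taken from the actual ontology.
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Object"],
}


def superclasses(cls, edges):
    """Breadth-first walk over subClassOf edges, starting with the class itself."""
    seen = [cls]            # zero-step result: the starting class
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            if parent not in seen:   # plays the role of t_distinct
                seen.append(parent)
                queue.append(parent)
    return seen


result = superclasses("place:City", SUBCLASS_OF)
print(result)
```

Doing this inside the triple store, as the slide's query does, avoids one round trip per hierarchy level.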
  50. SPARQL query (task: retrieve all properties of a group of classes): SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range ?title ?range_graph ?range_label ?super_property WHERE { { GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } . } UNION { graph ?predicate_graph { ?predicate rdfs:domain ?blank } . ?blank a owl:Class . ?blank owl:unionOf ?enumeration . OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } . OPTIONAL { ?list_node rdf:first ?domain_class } . } FILTER (?domain_class IN (<http://data.globo.com/place/City>, <http://data.globo.com/place/GeopoliticalDivision>, <http://data.globo.com/place/Place>, <http://data.globo.com/upper/Object>, <http://data.globo.com/upper/Substance>, <http://data.globo.com/upper/ConcreteEntity>, <http://data.globo.com/upper/Entity>)) { ?predicate rdfs:range ?range . } UNION { ?predicate rdfs:range ?blank . ?blank a owl:Class . ?blank owl:unionOf ?enumeration . OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } . OPTIONAL { ?list_node rdf:first ?range } . } FILTER (!isBlank(?range)) ?predicate rdfs:label ?title . ?predicate rdf:type ?type . OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } . FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) . FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) . OPTIONAL { ?predicate rdfs:comment ?predicate_comment } FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) . OPTIONAL { GRAPH ?range_graph { ?range rdfs:label ?range_label . FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) . } } }
  51. SPARQL query (task: retrieve the cardinalities of all properties of a certain class): SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label WHERE { <http://data.globo.com/place/City> rdfs:subClassOf ?s OPTION (TRANSITIVE, t_distinct, t_step(step_no) as ?n, t_min (0)) . ?s owl:onProperty ?predicate . OPTIONAL { ?s owl:minQualifiedCardinality ?min } . OPTIONAL { ?s owl:maxQualifiedCardinality ?max } . OPTIONAL { { ?s owl:onClass ?range } UNION { ?s owl:onDataRange ?range } UNION { ?s owl:allValuesFrom ?range } OPTIONAL { ?range owl:oneOf ?enumeration } . OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } . OPTIONAL { ?list_node rdf:first ?enumerated_value } . OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } . } } }
  52. Brainiak query: GET /place/City/_schema
  53. Next steps ● SEO (automatic schema.org) ● Improved annotator (DBpedia Spotlight) ● Richer content relationships (inference) ● Link to open data (e.g. DBpedia, dados.gov.br)
  54. Stay tuned: @brainiak_api ... will soon be released as an open source project!
  55. Thank you for the attention! Semantic Team, semantica@corp.globo.com, globo.com
