Linked data at globo.com

Talk given together with Tatiana Al-Chueyr during SemanticDay at Globo.com.

Transcript

  • 1. Linked Data at globo.com. Semantic Team (semantica@corp.globo.com): Tatiana Al-Chueyr and Rodrigo D. A. Senra ({tatiana.martins, rodrigo.senra}@corp.globo.com)
  • 2. Semantic Team: Andréia Bustamante, Ícaro Medeiros, Tatiana Al-Chueyr, Rodrigo Senra
  • 3. Contributors: Franklin Amorim, João Carlos Mendes, Luís Alberto Beloni, André Nicodemus
  • 4. Broadcast, Movies, Pay TV, Internet, Events, Music, Publishing, New Ventures, Newspaper, Radio, Network
  • 5. Motivation: cross-link content from different web products (example: a soccer player)
  • 6. Motivation: cross-link content from different web products (example: a politician)
  • 7. Motivation: cross-link content from different web products (example: a celebrity)
  • 8. Motivation: semantic markup in web pages (RDF, FOAF, GEO, Dublin Core, SKOS). Example page from G1 SP, "Caso Isabella Nardoni" (The Isabella Nardoni case), by Juliana Cardilli: "Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction). Isabella de Oliveira Nardoni, 5 years old, was killed on the night of March 29, 2008. The forensic investigation concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two small children lived, in Vila Isolina Mazzei, in the north zone of São Paulo. Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison."
  • 9. Motivation: recommend annotations to the information producer
  • 10–12. Motivation: suggest related content to the information consumer
  • 13. Outcomes
      ● Flexible ways to organize content
      ● Easier to find related topics
      ● Explicit relations derived from annotated content
      ● Up-to-date topic pages with little editorial effort
      ● Linking content across different web products
      ● Seamless navigation leading to a flow state
  • 14. Status quo: used by the main web products of Globo.com, linking, among others:
      ○ 18,485 organizations
      ○ 82,386 people
      ○ 9,129 places
      ○ 1,000,000+ annotated news articles (August 2010 to May 2013)
  • 15. Legacy architecture: CDA, CMA, triple store, search engine, ontology
  • 16. Legacy architecture: several CDA/CMA pairs (four in the diagram), plus the triple store, search engine and ontology
  • 17. Problems: poor data management
      ○ direct (unmanaged) access to the triple store
      ○ difficulty sharing data (distributed DBs)
      ○ re-syncing the triple store and the search engine index
      ○ triple store scalability
      ○ high entropy in distributed ontology engineering
  • 18. Problems
  • 19. Ontology engineering
      ○ Product-driven (past): Base, G1 (news), GE (sports), EGO (gossip), TVG (TV)
      ○ Domain-driven (current): Upper, Person, Organization, Place, Music, Politics, Programme, Education, Sports
  • 20. Possible solution: an upper ontology
  • 21. Problems: semantics as a library
      ○ many different versions in production
      ○ programming-language dependent
      ○ steep learning curve for RDF/OWL/SPARQL
  • 22. Next step: create an open semantic data management platform
      ● Scalable
      ● Mobile and Web friendly
      ● Interconnect Globo's data with external data sources
      ● Automate content extraction (including named entity recognition, NER)
  • 23. Brainiak: a linked data RESTful API
  • 24. Legacy architecture: several CDA/CMA pairs (four in the diagram), plus the triple store, search engine and ontology
  • 25. Architecture under development: one CMA and several CDAs access the triple store and the search engine through the Brainiak API
  • 26. Requirements
      ● Indirect usage of SPARQL
      ● Programming-language independent
      ● Quality data management
      ● Finer-grained authorization and authentication
      ● Isolate applications from the triple store
      ● Improve triple store performance
  • 27. SPARQL query (task: list all sports teams):
```sparql
DEFINE input:inference <http://data.globo.com/ruleset>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
FROM <http://data.globo.com/sports/>
WHERE {
  ?uri a <http://data.globo.com/sports/Team> ;
       rdfs:label ?label .
}
LIMIT 10 OFFSET 0
```
  • 28. Brainiak query: GET /sports/Team
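Slides 27 and 28 contrast a raw SPARQL query with a single Brainiak GET. As a rough sketch of the translation such an API performs on the client's behalf, the function below rebuilds the slide-27 query from the path `/sports/Team`. It assumes the `http://data.globo.com` base and the ruleset URI shown on slide 27; the function name `collection_query` and the `page`/`per_page` parameters are hypothetical, and this is not Brainiak's actual code.

```python
# Hypothetical sketch: translate a Brainiak-style collection path such as
# "/sports/Team" into the SPARQL query of slide 27.
# BASE_URI, RULESET and the page/per_page parameters are assumptions.
BASE_URI = "http://data.globo.com"
RULESET = "http://data.globo.com/ruleset"


def collection_query(path: str, page: int = 1, per_page: int = 10) -> str:
    """Return a SPARQL query listing instances of the class named by `path`."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected /<context>/<Class>, got {path!r}")
    context, class_name = parts
    graph_uri = f"{BASE_URI}/{context}/"          # context -> named graph
    class_uri = f"{BASE_URI}/{context}/{class_name}"
    offset = (page - 1) * per_page                # pages are 1-based
    return (
        f"DEFINE input:inference <{RULESET}>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?uri ?label\n"
        f"FROM <{graph_uri}>\n"
        f"WHERE {{ ?uri a <{class_uri}> ; rdfs:label ?label . }}\n"
        f"LIMIT {per_page} OFFSET {offset}"
    )


print(collection_query("/sports/Team"))
```

The point of the design is that the client only knows the path; graph names, class URIs, inference rules and pagination arithmetic stay on the server.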
  • 29. SPARQL response
  • 30. Brainiak response
  • 31. Brainiak concepts
      ● Instance
      ● Collection (set of instances from a given class)
      ● Schema (the class definition)
      ● Context
  • 32. Instance
  • 33. Collection
  • 34. Schema
  • 35. Context
  • 36. Real example: the place context, with the classes Country, State and City, and the instances Brazil and Japan
  • 37. Real example
      ○ GET /place
      ○ GET /place/Country
      ○ GET /place/Country/_schema
      ○ GET /place/Country/Brazil
  • 38. URI conventions
      ○ resource URL → /place/Country/Brazil
      ○ context (graph) → http://semantica.globo.com/place/
      ○ class → http://semantica.globo.com/place/Country
      ○ instance → http://semantica.globo.com/place/Country/Brazil
  • 39. Legacy URIs
      ○ Request: /place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River
      ○ Convention: context (graph) → http://semantica.globo.com/place/, class → http://semantica.globo.com/place/River
      ○ Overridden: context (graph) → http://dbpedia.org/resource/classes#, class → http://dbpedia.org/ontology/River
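The URI convention of slide 38 can be expressed as a small path resolver. This is a sketch assuming the `http://semantica.globo.com` base from the slide; the function name `resolve` and the layout of the returned dictionary are hypothetical, not Brainiak's code. It also classifies the four example routes of slide 37 as context, collection, schema or instance requests.

```python
# Minimal sketch of the slide-38 convention: a resource URL maps to a
# context (named graph), a class and an instance URI.
# BASE comes from the slide; resolve() and its dict layout are hypothetical.
BASE = "http://semantica.globo.com"


def resolve(resource_path: str) -> dict:
    """Map /<context>[/<Class>[/<instance> | /_schema]] to the URIs it denotes."""
    parts = [p for p in resource_path.strip("/").split("/") if p]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported path: {resource_path!r}")
    result = {"kind": "context", "context": f"{BASE}/{parts[0]}/"}
    if len(parts) >= 2:
        result["kind"] = "collection"
        result["class"] = result["context"] + parts[1]
    if len(parts) == 3:
        if parts[2] == "_schema":
            result["kind"] = "schema"      # the class definition, not an instance
        else:
            result["kind"] = "instance"
            result["instance"] = f"{result['class']}/{parts[2]}"
    return result


for path in ("/place", "/place/Country", "/place/Country/_schema", "/place/Country/Brazil"):
    print(path, resolve(path))
```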
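For legacy URIs, the query parameters on slide 39 replace the conventional graph and class. A sketch of that override logic follows, assuming a prefix table in which only the `dbpedia` entry is taken from the slide; `resolve_class` and `expand` are hypothetical names. Note that a literal `#` inside `graph_uri` must be percent-encoded (`%23`) in a real request, otherwise it starts the URL fragment and truncates the query string; `urlencode` takes care of this.

```python
from urllib.parse import parse_qsl, urlencode

BASE = "http://semantica.globo.com"
# Assumed prefix table; the dbpedia expansion matches slide 39.
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}


def expand(curie_or_uri: str) -> str:
    """Expand 'prefix:local' with PREFIXES; leave full URIs untouched."""
    prefix, sep, local = curie_or_uri.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie_or_uri


def resolve_class(path: str, params: dict) -> dict:
    """Conventional graph/class URIs, unless graph_uri/class_uri override them."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected /<context>/<Class>, got {path!r}")
    context, class_name = parts
    graph_uri = params.get("graph_uri", f"{BASE}/{context}/")
    class_uri = expand(params.get("class_uri", f"{BASE}/{context}/{class_name}"))
    return {"context": graph_uri, "class": class_uri}


query = urlencode({"graph_uri": "http://dbpedia.org/resource/classes#",
                   "class_uri": "dbpedia:River"})
print(query)  # the '#' travels as %23
print(resolve_class("/place/River", dict(parse_qsl(query))))
print(resolve_class("/place/River", {}))
```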
  • 40. Hypermedia
      ● Flexibility and programmatic adaptation
      ● Semantic affordances
      ● The client has to understand what it consumes
      ● "Hypermedia APIs are not fully baked yet"
  • 41. Brainiak hypermedia graph (diagram): the root (/), context, collection, instance and schema resources, connected by the link relations self, instances, item, inCollection, describedBy, create, replace and delete
  • 42. Services
      ● List contexts
      ● List collections
      ● Get a schema
      ● List prefixes
      ● Status of services
      ● Create
      ● Retrieve
      ● Delete
      ● Edit
      ● List instances
  • 43. Features
      ● JSON-Schema
      ● JSON-LD
      ● REST
      ● Python + Tornado
      ● HTTP methods: OPTIONS, GET, PUT, POST, DELETE
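To make the hypermedia idea concrete, here is a sketch of the links an instance representation might carry, and of a client that follows links by relation name instead of hard-coding URLs. The relation names come from the slide; the href patterns, the HTTP methods attached to each relation, and the helper names `instance_links`/`find_link` are assumptions for illustration only.

```python
import json


def instance_links(context: str, class_name: str, instance_id: str) -> list:
    """Hypothetical link list for an instance; rel names are from the slide."""
    collection = f"/{context}/{class_name}"
    href = f"{collection}/{instance_id}"
    return [
        {"rel": "self", "href": href, "method": "GET"},
        {"rel": "describedBy", "href": f"{collection}/_schema", "method": "GET"},
        {"rel": "inCollection", "href": collection, "method": "GET"},
        {"rel": "replace", "href": href, "method": "PUT"},      # assumed method
        {"rel": "delete", "href": href, "method": "DELETE"},    # assumed method
    ]


def find_link(links: list, rel: str):
    """Client side: pick a link by relation name, or None if absent."""
    return next((link for link in links if link["rel"] == rel), None)


links = instance_links("place", "Country", "Brazil")
print(json.dumps(links, indent=2))
print(find_link(links, "describedBy")["href"])  # /place/Country/_schema
```

This also illustrates the caveat on slide 40: the client still has to know what `describedBy` or `replace` means in order to act on it.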
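JSON-LD lets a plain JSON document carry RDF identity and types. The slides do not show Brainiak's exact payload, so the following is only a sketch of a JSON-LD node object for the Brazil instance, assuming a hypothetical prefix table (`place` follows the slide-38 convention, `rdfs` is the standard RDF Schema namespace) and a hypothetical label value. `expand_curie` handles only the simple `prefix:local` case, not the full JSON-LD expansion algorithm.

```python
import json

# Assumed prefix table for the @context.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "place": "http://semantica.globo.com/place/",
}


def to_jsonld(instance_uri: str, class_curie: str, properties: dict,
              prefixes: dict = PREFIXES) -> dict:
    """Wrap plain property/value pairs in a JSON-LD node object."""
    node = {"@context": dict(prefixes), "@id": instance_uri, "@type": class_curie}
    node.update(properties)
    return node


def expand_curie(curie: str, context: dict) -> str:
    """Expand 'prefix:local' via the @context prefix map (simplified)."""
    prefix, sep, local = curie.partition(":")
    return context[prefix] + local if sep and prefix in context else curie


doc = to_jsonld(
    "http://semantica.globo.com/place/Country/Brazil",
    "place:Country",
    {"rdfs:label": "Brazil"},  # hypothetical label value
)
print(json.dumps(doc, indent=2))
```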
  • 44. Brainiak query: GET /sports/Team
  • 45–48. Brainiak response
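  • 49. SPARQL query (task: retrieve all superclasses of a class):
```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# OPTION (TRANSITIVE ...) is a Virtuoso extension: rdfs:subClassOf is followed
# transitively; t_min (0) admits a zero-step path, so City itself is included.
SELECT DISTINCT ?class
WHERE {
  <http://data.globo.com/place/City> rdfs:subClassOf ?class
    OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
  ?class a owl:Class .
}
```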
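What the transitive option computes can be reproduced in memory with a breadth-first traversal. This sketch assumes a hypothetical `SUBCLASS_OF` dictionary whose edges are invented so that the closure yields the seven classes listed in the next query's FILTER; the real hierarchy may differ. The `visited` set plays roughly the role of `t_distinct` and also protects against cycles.

```python
from collections import deque

# Hypothetical rdfs:subClassOf edges (child -> direct parents).
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Object"],
    "upper:Object": ["upper:Substance"],
    "upper:Substance": ["upper:ConcreteEntity"],
    "upper:ConcreteEntity": ["upper:Entity"],
}


def superclasses(cls: str, subclass_of: dict, include_self: bool = True) -> list:
    """Transitive closure of rdfs:subClassOf starting at `cls`.

    include_self=True mirrors t_min(0): the zero-step path returns cls itself.
    """
    visited = {cls}
    result = [cls] if include_self else []
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, []):
            if parent not in visited:   # each class reported once
                visited.add(parent)
                result.append(parent)
                queue.append(parent)
    return result


print(superclasses("place:City", SUBCLASS_OF))
```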
  • 50. SPARQL query (task: retrieve all properties of a group of classes):
```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range
                ?title ?range_graph ?range_label ?super_property
WHERE {
  {
    GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
  } UNION {
    GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
    ?blank a owl:Class .
    ?blank owl:unionOf ?enumeration .
    OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
    OPTIONAL { ?list_node rdf:first ?domain_class } .
  }
  FILTER (?domain_class IN (<http://data.globo.com/place/City>,
                            <http://data.globo.com/place/GeopoliticalDivision>,
                            <http://data.globo.com/place/Place>,
                            <http://data.globo.com/upper/Object>,
                            <http://data.globo.com/upper/Substance>,
                            <http://data.globo.com/upper/ConcreteEntity>,
                            <http://data.globo.com/upper/Entity>))
  { ?predicate rdfs:range ?range . }
  UNION {
    ?predicate rdfs:range ?blank .
    ?blank a owl:Class .
    ?blank owl:unionOf ?enumeration .
    OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
    OPTIONAL { ?list_node rdf:first ?range } .
  }
  FILTER (!isBlank(?range))
  ?predicate rdfs:label ?title .
  ?predicate rdf:type ?type .
  OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
  FILTER (?type IN (owl:ObjectProperty, owl:DatatypeProperty)) .
  FILTER (langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
  OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
  FILTER (langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
  OPTIONAL {
    GRAPH ?range_graph {
      ?range rdfs:label ?range_label .
      FILTER (langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
    }
  }
}
```
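The core logic of this query (match the domain directly or through an `owl:unionOf`, keep only object and datatype properties, prefer English labels or untagged ones, and emit one row per range member) can be sketched over in-memory data. All property names and the `PROPERTIES` records below are hypothetical; lists stand in for `owl:unionOf` collections.

```python
# Hypothetical property records; a list in "domain"/"range" stands for an
# owl:unionOf, and "" in "labels" means an untagged literal.
PROPERTIES = [
    {"predicate": "place:population", "type": "owl:DatatypeProperty",
     "domain": "place:GeopoliticalDivision", "range": "xsd:integer",
     "labels": {"en": "population", "pt": "população"}},
    {"predicate": "upper:name", "type": "owl:DatatypeProperty",
     "domain": ["upper:Entity", "person:Person"], "range": "xsd:string",
     "labels": {"": "name"}},
    {"predicate": "place:partOf", "type": "owl:ObjectProperty",
     "domain": "place:Place", "range": ["place:State", "place:Country"],
     "labels": {"en": "part of"}},
    {"predicate": "sports:coach", "type": "owl:ObjectProperty",
     "domain": "sports:Team", "range": "person:Person",
     "labels": {"en": "coach"}},
]
ALLOWED_TYPES = {"owl:ObjectProperty", "owl:DatatypeProperty"}


def as_list(value):
    """Treat a single class like a one-element owl:unionOf."""
    return value if isinstance(value, list) else [value]


def properties_for(classes, properties, lang="en"):
    """(predicate, title, range) rows for properties whose domain is in `classes`."""
    wanted = set(classes)
    rows = []
    for prop in properties:
        if prop["type"] not in ALLOWED_TYPES:
            continue
        if wanted.isdisjoint(as_list(prop["domain"])):
            continue
        labels = prop["labels"]
        title = labels.get(lang, labels.get(""))  # preferred language, else untagged
        if title is None:
            continue
        for rng in as_list(prop["range"]):        # one row per union member
            rows.append((prop["predicate"], title, rng))
    return rows


CITY_AND_SUPERCLASSES = ["place:City", "place:GeopoliticalDivision", "place:Place",
                         "upper:Object", "upper:Substance", "upper:ConcreteEntity",
                         "upper:Entity"]
for row in properties_for(CITY_AND_SUPERCLASSES, PROPERTIES):
    print(row)
```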
  • 51. SPARQL query (task: retrieve the cardinalities of all properties of a certain class):
```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
WHERE {
  <http://data.globo.com/place/City> rdfs:subClassOf ?s
    OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
  ?s owl:onProperty ?predicate .
  OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
  OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
  OPTIONAL {
    { ?s owl:onClass ?range }
    UNION { ?s owl:onDataRange ?range }
    UNION { ?s owl:allValuesFrom ?range }
    OPTIONAL { ?range owl:oneOf ?enumeration } .
    OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
    OPTIONAL { ?list_node rdf:first ?enumerated_value } .
    OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } .
  }
}
```
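The cardinality query walks the restrictions attached to a class and its superclasses and expands any `owl:oneOf` enumeration on the range into one row per value. A sketch of the same shape of result, assuming hypothetical `RESTRICTIONS` and `ENUMERATIONS` dictionaries (all property and class names here are invented), could look like this; missing bounds stay `None`, like unbound `OPTIONAL` variables.

```python
# Hypothetical restrictions (class -> owl:Restriction-like dicts) and
# enumerations (range class -> owl:oneOf values).
RESTRICTIONS = {
    "place:City": [
        {"onProperty": "place:partOfState", "min": 1, "max": 1,
         "onClass": "place:State"},
    ],
    "place:Place": [
        {"onProperty": "place:kind", "max": 1,
         "allValuesFrom": "place:PlaceKind"},
    ],
}
ENUMERATIONS = {"place:PlaceKind": ["place:Urban", "place:Rural"]}
RANGE_KEYS = ("onClass", "onDataRange", "allValuesFrom")


def cardinality_rows(classes, restrictions, enumerations):
    """One row per (restriction, enumerated value), like the query's result set.

    `classes` is the class plus its superclasses (the t_min(0) closure).
    The first range key found is used as the restriction's range.
    """
    rows = []
    for cls in classes:
        for r in restrictions.get(cls, []):
            rng = next((r[k] for k in RANGE_KEYS if k in r), None)
            for value in enumerations.get(rng) or [None]:
                rows.append({
                    "predicate": r["onProperty"],
                    "min": r.get("min"),
                    "max": r.get("max"),
                    "range": rng,
                    "enumerated_value": value,
                })
    return rows


for row in cardinality_rows(["place:City", "place:Place"], RESTRICTIONS, ENUMERATIONS):
    print(row)
```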
  • 52. Brainiak query: GET /place/City/_schema
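A single `GET /place/City/_schema` replaces the three queries above, and Brainiak lists JSON-Schema among its features. The exact schema Brainiak emits is not shown in the slides, so this is only a sketch of how property rows could become a JSON Schema (draft-04 keywords), under stated assumptions: the XSD-to-JSON type table is hypothetical, non-XSD ranges are treated as object properties holding URIs, a max cardinality other than 1 becomes an array, and a min cardinality of at least 1 makes the property required.

```python
import json

# Assumed mapping from XSD datatypes to JSON Schema types.
XSD_TO_JSON = {
    "xsd:string": "string",
    "xsd:integer": "integer",
    "xsd:float": "number",
    "xsd:boolean": "boolean",
}


def property_schema(row: dict) -> dict:
    rng = row["range"]
    if rng in XSD_TO_JSON:
        schema = {"type": XSD_TO_JSON[rng]}
    else:  # object property: the value is the URI of another instance
        schema = {"type": "string", "format": "uri"}
    if row.get("max") != 1:  # unbounded or several values -> array
        schema = {"type": "array", "items": schema}
        if row.get("max") is not None:
            schema["maxItems"] = row["max"]
        if row.get("min"):
            schema["minItems"] = row["min"]
    return schema


def build_schema(class_uri: str, title: str, rows: list) -> dict:
    schema = {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "id": class_uri,
        "title": title,
        "type": "object",
        "properties": {row["predicate"]: property_schema(row) for row in rows},
    }
    required = sorted(row["predicate"] for row in rows if row.get("min"))
    if required:
        schema["required"] = required
    return schema


# Hypothetical rows, shaped like the results of the previous queries.
ROWS = [
    {"predicate": "upper:name", "range": "xsd:string", "min": 1, "max": 1},
    {"predicate": "place:partOfState", "range": "place:State", "min": 1, "max": 1},
    {"predicate": "upper:nickname", "range": "xsd:string", "min": None, "max": None},
]
print(json.dumps(build_schema("http://semantica.globo.com/place/City", "City", ROWS), indent=2))
```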
  • 53. Next steps
      ● SEO (automatic schema.org markup)
      ● Improved annotator (DBpedia Spotlight)
      ● Richer content relationships (inference)
      ● Link to open data (e.g. DBpedia, dados.gov.br)
  • 54. Stay tuned (@brainiak_api): Brainiak will soon be released as an open source project!
  • 55. Thank you for your attention! Semantic Team, semantica@corp.globo.com, globo.com