Linked data at globo.com

Talk given together with Tatiana Al-Chueyr during SemanticDay at Globo.com

Published in: Technology, Education

Transcript

  • 1. Linked Data at globo.com. Tatiana Al-Chueyr and Rodrigo D. A. Senra, {tatiana.martins, rodrigo.senra}@corp.globo.com. Semantic Team, semantica@corp.globo.com
  • 2. Semantic Team: Andréia Bustamante, Ícaro Medeiros, Tatiana Al-Chueyr, Rodrigo Senra
  • 3. Contributors: Franklin Amorim, João Carlos Mendes, Luís Alberto Beloni, André Nicodemus
  • 4. BROADCAST MOVIES PAY TV INTERNET EVENTS MUSIC PUBLISHING NEW VENTURES NEWSPAPER RADIO NETWORK
  • 5. Motivation: Cross-link content from different web products (example: soccer player)
  • 6. Motivation: Cross-link content from different web products (example: politician)
  • 7. Motivation: Cross-link content from different web products (example: celebrity)
  • 8. Motivation: Semantic markup in web pages (RDF, FOAF, GEO, Dublin Core, SKOS). Example article, "Isabella Nardoni case", by Juliana Cardilli, G1 SP: "Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction). Isabella de Oliveira Nardoni, aged 5, was killed on the night of March 29, 2008. The forensic investigation concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two small children lived, in Vila Isolina Mazzei, in the north zone of São Paulo. Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison."
  • 9. Motivation: Recommend annotations to the information producer
  • 10. Motivation: Suggest related content to the information consumer
  • 11. Motivation: Suggest related content to the information consumer
  • 12. Motivation: Suggest related content to the information consumer
  • 13. Outcomes ● Flexible ways to organize content ● Ease of finding related issues ● Explicit relations derived from annotated content ● Up-to-date topic pages with little editorial effort ● Linking content across different web products ● Seamless navigation leading to flow state
  • 14. Status Quo: Used by the main web products of Globo.com, linking, among others: ○ 18,485 organizations ○ 82,386 people ○ 9,129 places ○ 1,000,000+ annotated news items from August 2010 to May 2013
  • 15. Legacy Architecture (diagram): CDA, CMA, triple store, search engine, ontology
  • 16. Legacy Architecture (diagram): four CDA/CMA pairs, triple store, search engine, ontology
  • 17. Problems: Poor data management ○ direct (unmanaged) access to the triple store ○ difficulty sharing data (distributed DBs) ○ need to re-sync the triple store and the search engine index ○ scalability of the triple store ○ high entropy in distributed ontology engineering
  • 18. Problems
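  • 19. Ontology Engineering ○ Product-driven (past): Base ontology plus one ontology per product: G1 (news), GE (sports), EGO (gossip), TVG (tv) ○ Domain-driven (current): Upper ontology plus one ontology per domain: Person, Organization, Place, Music, Politics, Programme, Education, Sports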
  • 20. Possible Solution: Upper Ontology
  • 21. Problems: Semantic as a library ○ many different versions in production ○ programming language dependent ○ steep learning curve for RDF/OWL/SPARQL
  • 22. Next Step: Create an open semantic data management platform ● Scalable ● Mobile and Web friendly ● Interconnect Globo's data with external data sources ● Automate content extraction (including NER)
  • 23. Brainiak: linked data RESTful API
  • 24. Legacy Architecture (diagram): four CDA/CMA pairs, triple store, search engine, ontology
  • 25. Under Development (diagram): CMA, four CDAs, Brainiak API, triple store, search engine
  • 26. Requirements ● Indirect usage of SPARQL ● Programming language independent ● Data management with quality ● Finer-grained authorization and authentication ● Isolate applications from triplestore ● Improve triplestore performance
  • 27. SPARQL query (task: list all sports teams):
        DEFINE input:inference <http://data.globo.com/ruleset>
        SELECT ?uri ?label
        FROM <http://data.globo.com/sports/>
        WHERE {
          ?uri a <http://data.globo.com/sports/Team> ;
               rdfs:label ?label .
        }
        LIMIT 10 OFFSET 0
  • 28. Brainiak query: GET /sports/Team
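
A minimal sketch of how a client might issue the request on slide 28. The base URL `http://brainiak.example.com/` and the helper names are hypothetical, and the response is assumed to be JSON; the slides state neither the deployment host nor the exact payload.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical host: the slides do not say where Brainiak is deployed.
BASE_URL = "http://brainiak.example.com/"


def collection_url(context, class_name, base_url=BASE_URL):
    """Build the URL of a collection, e.g. /sports/Team."""
    return urljoin(base_url, f"{context}/{class_name}")


def list_instances(context, class_name, base_url=BASE_URL):
    """GET the collection and decode the body (assumed here to be JSON)."""
    with urlopen(collection_url(context, class_name, base_url)) as response:
        return json.load(response)
```

With these assumptions, `list_instances("sports", "Team")` would replace the hand-written SPARQL query of slide 27, which is the point of the "indirect usage of SPARQL" requirement.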
  • 29. SPARQL response
  • 30. Brainiak response
  • 31. Brainiak concepts ● Instance ● Collection (set of instances from a given Class) ● Schema (the Class definition) ● Context
  • 32. Instance
  • 33. Collection
  • 34. Schema
  • 35. Context
  • 36. Real example (diagram): context "place"; classes State, Country, City; instances Brazil, Japan
  • 37. Real example: GET /place ○ GET /place/Country ○ GET /place/Country/_schema ○ GET /place/Country/Brazil
  • 38. URI Conventions: resource URL → /place/Country/Brazil ○ context (graph) → http://semantica.globo.com/place/ ○ class → http://semantica.globo.com/place/Country ○ instance → http://semantica.globo.com/place/Country/Brazil
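
The convention on slide 38 can be sketched as a small mapping function. This is an illustration only: the function and type names are hypothetical, and the rule is inferred from the single example given on the slide.

```python
from typing import NamedTuple

# Base of the convention shown on the slide.
URI_BASE = "http://semantica.globo.com/"


class ResolvedResource(NamedTuple):
    context_uri: str
    class_uri: str
    instance_uri: str


def resolve(resource_path: str) -> ResolvedResource:
    """Map /<context>/<Class>/<instance> to context, class and instance URIs."""
    parts = resource_path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /context/Class/instance, got {resource_path!r}")
    context, class_name, instance = parts
    context_uri = f"{URI_BASE}{context}/"
    class_uri = f"{context_uri}{class_name}"
    return ResolvedResource(context_uri, class_uri, f"{class_uri}/{instance}")
```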
  • 39. Legacy URIs: /place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River ○ Overridden: context (graph) → http://dbpedia.org/resource/classes#, class → http://dbpedia.org/ontology/River ○ Convention: context (graph) → http://semantica.globo.com/place/, class → http://semantica.globo.com/place/River
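
A hedged sketch of the override logic on slide 39: query parameters, when present, replace the URIs that the convention would otherwise produce. The function names and the `PREFIXES` table are hypothetical; the `dbpedia:` expansion is the one implied by the slide. Because a literal `#` in a URL starts the fragment, the sketch percent-encodes the parameter values; whether Brainiak expects them encoded this way is an assumption.

```python
from urllib.parse import urlencode

# Prefix mapping implied by the slide: dbpedia:River -> http://dbpedia.org/ontology/River
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}
CONVENTION_BASE = "http://semantica.globo.com/"


def expand(curie_or_uri):
    """Expand prefix:name using PREFIXES; leave full URIs untouched."""
    prefix, sep, name = curie_or_uri.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + name
    return curie_or_uri


def resolve_class(context, class_name, graph_uri=None, class_uri=None):
    """Return (graph URI, class URI); query parameters override the convention."""
    default_graph = f"{CONVENTION_BASE}{context}/"
    graph = graph_uri or default_graph
    cls = expand(class_uri) if class_uri else f"{default_graph}{class_name}"
    return graph, cls


def legacy_url(context, class_name, graph_uri, class_uri):
    """Build the request path; urlencode escapes '#' so it is not read as a fragment."""
    query = urlencode({"graph_uri": graph_uri, "class_uri": class_uri})
    return f"/{context}/{class_name}?{query}"
```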
  • 40. Hypermedia ● Flexibility and programmatic adaptation ● Semantic affordances ● Client has to understand what is consumed ● "Hypermedia APIs are not fully baked yet"
  • 41. Brainiak hypermedia graph (diagram): resources /, context, collection, instance, schema; link relations self, instances, item, inCollection, describedBy, create, replace, delete
  • 42. Services ● List Contexts ● List Collections ● Get a Schema ● List Prefixes ● Status of Services ● Create ● Retrieve ● Delete ● Edit ● List Instances
  • 43. Features ● JSON-Schema ● JSON-LD ● REST (OPTIONS, GET, PUT, POST, DELETE) ● Python + Tornado
  • 44. Brainiak query: GET /sports/Team
  • 45. Brainiak response
  • 46. Brainiak response
  • 47. Brainiak response
  • 48. Brainiak response
  • 49. SPARQL query (task: retrieve all superclasses of a class):
        SELECT DISTINCT ?class
        WHERE {
          <http://data.globo.com/place/City> rdfs:subClassOf ?class
            OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
          ?class a owl:Class .
        }
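
To make the transitive option on slide 49 concrete, the sketch below computes the same kind of result in plain Python: every class reachable from a starting class by following `rdfs:subClassOf` zero or more times. The edge table is illustrative (the actual Globo.com hierarchy is not given in the slides), and the sketch assumes that `t_min (0)` means the starting class itself is included in the result.

```python
# Illustrative rdfs:subClassOf edges (child -> direct parents); the hierarchy is an assumption.
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Object"],
    "upper:Object": ["upper:Entity"],
}


def superclasses(cls, edges=SUBCLASS_OF):
    """Return cls and every class reachable through subClassOf, without duplicates."""
    seen = [cls]   # zero-step result: the class itself
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen
```

The original query additionally keeps only results typed as `owl:Class`; that filter is omitted here.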
  • 50. SPARQL query (task: retrieve all properties of a group of classes):
        SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range
                        ?title ?range_graph ?range_label ?super_property
        WHERE {
          { GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } . }
          UNION
          { GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
            ?blank a owl:Class .
            ?blank owl:unionOf ?enumeration .
            OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
            OPTIONAL { ?list_node rdf:first ?domain_class } .
          }
          FILTER (?domain_class IN (<http://data.globo.com/place/City>,
                                    <http://data.globo.com/place/GeopoliticalDivision>,
                                    <http://data.globo.com/place/Place>,
                                    <http://data.globo.com/upper/Object>,
                                    <http://data.globo.com/upper/Substance>,
                                    <http://data.globo.com/upper/ConcreteEntity>,
                                    <http://data.globo.com/upper/Entity>))
          { ?predicate rdfs:range ?range . }
          UNION
          { ?predicate rdfs:range ?blank .
            ?blank a owl:Class .
            ?blank owl:unionOf ?enumeration .
            OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
            OPTIONAL { ?list_node rdf:first ?range } .
          }
          FILTER (!isBlank(?range))
          ?predicate rdfs:label ?title .
          ?predicate rdf:type ?type .
          OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
          FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) .
          FILTER (langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
          OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
          FILTER (langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
          OPTIONAL {
            GRAPH ?range_graph {
              ?range rdfs:label ?range_label .
              FILTER (langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
            }
          }
        }
  • 51. SPARQL query (task: retrieve the cardinalities of all properties of a certain class):
        SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
        WHERE {
          <http://data.globo.com/place/City> rdfs:subClassOf ?s
            OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
          ?s owl:onProperty ?predicate .
          OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
          OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
          OPTIONAL {
            { ?s owl:onClass ?range }
            UNION { ?s owl:onDataRange ?range }
            UNION { ?s owl:allValuesFrom ?range }
            OPTIONAL { ?range owl:oneOf ?enumeration } .
            OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
            OPTIONAL { ?list_node rdf:first ?enumerated_value } .
            OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } .
          }
        }
  • 52. Brainiak query: GET /place/City/_schema
  • 53. Next steps ● SEO (automatic schema.org) ● Improved annotator (DBpedia Spotlight) ● Richer content relationships (inference) ● Link to open data (e.g. DBpedia, dados.gov.br)
  • 54. Stay tuned! @brainiak_api ... will soon be released as an open source project!
  • 55. Thank you for your attention! Semantic Team, semantica@corp.globo.com, globo.com