Linked data at globo.com

Talk given with Tatiana Al-Chueyr during SemanticDay at Globo.com.

Published in: Technology, Education

1. Linked Data at globo.com
    Semantic Team, semantica@corp.globo.com
    Tatiana Al-Chueyr and Rodrigo D. A. Senra, {tatiana.martins, rodrigo.senra}@corp.globo.com
2. Semantic Team: Andréia Bustamante, Ícaro Medeiros, Tatiana Al-Chueyr, Rodrigo Senra
3. Contributors: Franklin Amorim, João Carlos Mendes Luís, Alberto Beloni, André Nicodemus
4. BROADCAST, MOVIES, PAY TV, INTERNET, EVENTS, MUSIC, PUBLISHING, NEW VENTURES, NEWSPAPER, RADIO NETWORK
5. Motivation: cross-link content from different web products (example: soccer player)
6. Motivation: cross-link content from different web products (example: politician)
7. Motivation: cross-link content from different web products (example: celebrity)
8. Motivation: semantic markup in web pages (RDF, FOAF, GEO, Dublin Core, SKOS)
    Example page from G1 SP, "Caso Isabella Nardoni" (the Isabella Nardoni case), Juliana Cardilli: "Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction). Isabella de Oliveira Nardoni, 5 years old, was killed on the night of March 29, 2008. The forensic investigation concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the north zone of São Paulo." Related headline: "Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison."
9. Motivation: recommend annotations to the information producer
10. Motivation: suggest related content to the information consumer
11. Motivation: suggest related content to the information consumer
12. Motivation: suggest related content to the information consumer
13. Outcomes
    ● Flexible ways to organize content
    ● Easier discovery of related issues
    ● Explicit relations derived from annotated content
    ● Up-to-date topic pages with little editorial effort
    ● Linking content across different web products
    ● Seamless navigation leading to a flow state
14. Status Quo: used by the main web products of Globo.com, linking, among others:
    ○ 18,485 organizations
    ○ 82,386 people
    ○ 9,129 places
    ○ 1,000,000+ annotated news articles from August 2010 to May 2013
15. Legacy Architecture [diagram: CDA, CMA, triple store, search engine, ontology]
16. Legacy Architecture [diagram: four CDA/CMA pairs; triple store, search engine, ontology]
17. Problems
    Poor data management
    ○ direct access to the triple store (unmanaged)
    ○ difficulty sharing data (distributed DBs)
    ○ re-syncing the triple store and the search engine index
    ○ scalability of the triple store
    ○ high entropy in distributed ontology engineering
18. Problems
19. Ontology Engineering [diagram contrasting product-driven (past) and domain-driven (current) ontologies; labels: Base; G1 (news), GE (sports), EGO (gossip), TVG (tv); Upper; Person, Organization, Place, Music, Politics, Programme, Education, Sports]
20. Possible Solution: Upper Ontology
21. Problems
    Semantic as a library
    ○ many different versions in production
    ○ programming language dependent
    ○ steep learning curve for RDF/OWL/SPARQL
22. Next Step: create an open semantic data management platform
    ● Scalable
    ● Mobile and Web friendly
    ● Interconnect Globo's data with external data sources
    ● Automate content extraction (including NER)
23. Brainiak: linked data RESTful API
24. Legacy Architecture [diagram: four CDA/CMA pairs; triple store, search engine, ontology]
25. Under Development [diagram: Brainiak API; CMA; four CDAs; triple store; search engine]
26. Requirements
    ● Indirect usage of SPARQL
    ● Programming language independent
    ● Data management with quality
    ● Finer-grained authorization and authentication
    ● Isolate applications from the triple store
    ● Improve triple store performance
27. SPARQL query (task: list all sports teams)
    DEFINE input:inference <http://data.globo.com/ruleset>
    SELECT ?uri ?label
    FROM <http://data.globo.com/sports/>
    WHERE {
      ?uri a <http://data.globo.com/sports/Team> ;
           rdfs:label ?label .
    }
    LIMIT 10 OFFSET 0
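To make the "indirect usage of SPARQL" requirement concrete, here is a minimal Python sketch of how a request such as GET /sports/Team (slide 28) could be turned into the query above. The base URI and ruleset come from the query itself; the function `collection_query`, its `page`/`per_page` parameters and the paging arithmetic are illustrative assumptions, not Brainiak's code.

```python
# Hypothetical sketch: translate a Brainiak-style collection path into the
# "list all sports teams" SPARQL query from slide 27. The function name and
# the page/per_page parameters are assumptions.
BASE = "http://data.globo.com/"
RULESET = "http://data.globo.com/ruleset"


def collection_query(path: str, page: int = 1, per_page: int = 10) -> str:
    """Build a SPARQL query listing instances of the class named by `path`.

    `path` must look like "/<context>/<Class>", e.g. "/sports/Team".
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected /<context>/<Class>, got {path!r}")
    context, class_name = parts
    graph_uri = f"{BASE}{context}/"         # e.g. http://data.globo.com/sports/
    class_uri = f"{graph_uri}{class_name}"  # e.g. http://data.globo.com/sports/Team
    offset = (page - 1) * per_page          # page 1 -> OFFSET 0
    # rdfs: is left undeclared, as on the slide (Virtuoso predefines it).
    return (
        f"DEFINE input:inference <{RULESET}>\n"
        "SELECT ?uri ?label\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        f"  ?uri a <{class_uri}> ;\n"
        "       rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {per_page} OFFSET {offset}"
    )


print(collection_query("/sports/Team"))
```

The client only ever sees the path and paging parameters; graph names, class URIs and inference rules stay on the server side.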
28. Brainiak query: GET /sports/Team
29. SPARQL response
30. Brainiak response
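The SPARQL and Brainiak responses on slides 29 and 30 are screenshots, so their exact shapes are not reproduced here. As a hedged illustration of the kind of simplification an API layer can offer, this sketch flattens a result in the standard SPARQL 1.1 Query Results JSON Format (`head.vars`, `results.bindings`) into a plain list of items. The sample binding is made up, and the output keys `items`, `@id` and `title` are assumptions rather than Brainiak's documented format.

```python
import json

# A made-up response in the W3C SPARQL 1.1 Query Results JSON Format.
sparql_json = """
{
  "head": {"vars": ["uri", "label"]},
  "results": {"bindings": [
    {"uri":   {"type": "uri", "value": "http://data.globo.com/sports/Team/Example_FC"},
     "label": {"type": "literal", "value": "Example FC", "xml:lang": "pt"}}
  ]}
}
"""


def simplify(raw: str) -> dict:
    """Flatten SPARQL JSON bindings into {"items": [{"@id": ..., "title": ...}]}."""
    data = json.loads(raw)
    items = []
    for row in data["results"]["bindings"]:
        items.append({
            "@id": row["uri"]["value"],
            # In this sketch a missing label becomes None.
            "title": row.get("label", {}).get("value"),
        })
    return {"items": items}


print(simplify(sparql_json))
```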
31. Brainiak concepts
    ● Instance
    ● Collection (set of instances from a given Class)
    ● Schema (the Class definition)
    ● Context
32. Instance
33. Collection
34. Schema
35. Context
36. Real example [diagram: context place; classes Country, State, City; instances Brazil, Japan]
37. Real example: GET /place, GET /place/Country, GET /place/Country/_schema, GET /place/Country/Brazil
38. URI Conventions
    resource URL → /place/Country/Brazil
    context (graph) → http://semantica.globo.com/place/
    class → http://semantica.globo.com/place/Country
    instance → http://semantica.globo.com/place/Country/Brazil
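The conventions on slide 38 are regular enough to compute. A minimal sketch, assuming the base http://semantica.globo.com/ shown on the slide; `resolve` and its dictionary keys are hypothetical names.

```python
# Sketch of the URI conventions on slide 38. BASE comes from the slide; the
# function name and the returned dictionary keys are hypothetical. Paths
# ending in _schema are not handled here.
BASE = "http://semantica.globo.com/"


def resolve(resource_path: str) -> dict:
    """Map /<context>[/<Class>[/<instance>]] to context, class and instance URIs."""
    parts = [p for p in resource_path.split("/") if p]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported path: {resource_path!r}")
    uris = {"context": f"{BASE}{parts[0]}/"}
    if len(parts) >= 2:
        uris["class"] = uris["context"] + parts[1]
    if len(parts) == 3:
        uris["instance"] = uris["class"] + "/" + parts[2]
    return uris


print(resolve("/place/Country/Brazil"))
```

Because each URI is derived from the one before it, a client that knows only the resource URL can predict every URI on the slide.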
39. Legacy URIs
    /place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River
    Overridden: context (graph) → http://dbpedia.org/resource/classes#, class → http://dbpedia.org/ontology/River
    Convention: context (graph) → http://semantica.globo.com/place/, class → http://semantica.globo.com/place/River
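Slide 39 shows the convention being overridden with the `graph_uri` and `class_uri` query parameters. The sketch below resolves both cases; the prefix table and function names are assumptions. It also percent-encodes the parameters: in a real HTTP request the `#` in `graph_uri` has to be sent as `%23`, otherwise the client treats everything after it as a URL fragment and never sends it to the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://semantica.globo.com/"
# Hypothetical prefix table; the slide only shows that dbpedia:River
# stands for http://dbpedia.org/ontology/River.
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}


def expand(value: str) -> str:
    """Expand a CURIE such as dbpedia:River; return anything else unchanged."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return value


def resolve_collection(url: str) -> dict:
    """Graph and class URIs for /<context>/<Class>, honouring query overrides."""
    split = urlsplit(url)
    context, class_name = [p for p in split.path.split("/") if p]
    query = parse_qs(split.query)
    graph = query.get("graph_uri", [f"{BASE}{context}/"])[0]
    cls = query.get("class_uri", [f"{BASE}{context}/{class_name}"])[0]
    return {"graph": graph, "class": expand(cls)}


# urlencode percent-encodes '#' as %23. Sent raw, '#' would start a URL
# fragment: graph_uri would lose its '#' and class_uri would never arrive.
params = urlencode({"graph_uri": "http://dbpedia.org/resource/classes#",
                    "class_uri": "dbpedia:River"})
print(resolve_collection(f"/place/River?{params}"))
print(resolve_collection("/place/River"))
```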
40. Hypermedia
    ● Flexibility and programmatic adaptation
    ● Semantic affordances
    ● Client has to understand what is consumed
    ● "Hypermedia APIs are not fully baked yet"
41. Brainiak hypermedia graph [diagram; nodes: /, context, collection, schema, instance; link relations: self, instances, item, inCollection, describedBy, create, replace, delete]
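In the spirit of slides 40 and 41, a client can navigate by link relation instead of building URLs itself. This is a minimal sketch under stated assumptions: the relation names come from the diagram, while the `links` structure (`rel`, `href`, `method`, loosely modelled on JSON Hyper-Schema link objects), the HTTP method paired with each relation and the helper names are hypothetical.

```python
def instance_links(context: str, class_name: str, instance: str) -> list:
    """Hypothetical: hypermedia links for /<context>/<Class>/<instance>."""
    collection = f"/{context}/{class_name}"
    self_href = f"{collection}/{instance}"
    return [
        {"rel": "self", "href": self_href, "method": "GET"},
        {"rel": "describedBy", "href": f"{collection}/_schema", "method": "GET"},
        {"rel": "inCollection", "href": collection, "method": "GET"},
        {"rel": "replace", "href": self_href, "method": "PUT"},
        {"rel": "delete", "href": self_href, "method": "DELETE"},
    ]


def follow(links: list, rel: str) -> dict:
    """Pick the next request by relation name rather than a hard-coded URL."""
    return next(link for link in links if link["rel"] == rel)


links = instance_links("place", "Country", "Brazil")
print(follow(links, "describedBy"))
```

If the server later moves schemas elsewhere, only the `href` in the response changes; a client that follows `describedBy` keeps working.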
42. Services
    ● List Contexts
    ● List Collections
    ● Get a Schema
    ● List Prefixes
    ● Status of Services
    ● Create
    ● Retrieve
    ● Delete
    ● Edit
    ● List Instances
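One way the services above could map onto URLs and HTTP methods is a small route table. The path shapes follow the real example on slide 37 (/place, /place/Country, /place/Country/_schema, /place/Country/Brazil); the method assigned to each service is an assumption, and List Prefixes and Status of Services are left out because their paths are not shown. Brainiak itself is built on Tornado (slide 43); this sketch sticks to the standard library's `re` so it runs without dependencies.

```python
import re

# Hypothetical route table: (HTTP method, path regex, service name).
# Order matters: the _schema route must be tried before Retrieve.
SEGMENT = r"[^/]+"
ROUTES = [
    ("GET", r"/", "List Contexts"),
    ("GET", rf"/{SEGMENT}", "List Collections"),
    ("GET", rf"/{SEGMENT}/{SEGMENT}/_schema", "Get a Schema"),
    ("GET", rf"/{SEGMENT}/{SEGMENT}", "List Instances"),
    ("POST", rf"/{SEGMENT}/{SEGMENT}", "Create"),
    ("GET", rf"/{SEGMENT}/{SEGMENT}/{SEGMENT}", "Retrieve"),
    ("PUT", rf"/{SEGMENT}/{SEGMENT}/{SEGMENT}", "Edit"),
    ("DELETE", rf"/{SEGMENT}/{SEGMENT}/{SEGMENT}", "Delete"),
]


def dispatch(method: str, path: str) -> str:
    """Return the service name for a request, or raise LookupError."""
    for route_method, pattern, service in ROUTES:
        if method == route_method and re.fullmatch(pattern, path):
            return service
    raise LookupError(f"no route for {method} {path}")


for method, path in [("GET", "/place"), ("GET", "/place/Country/_schema"),
                     ("PUT", "/place/Country/Brazil")]:
    print(method, path, "->", dispatch(method, path))
```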
43. Features
    ● JSON-Schema
    ● JSON-LD
    ● REST
    ● Python + Tornado
    HTTP methods: OPTIONS, GET, PUT, POST, DELETE
44. Brainiak query: GET /sports/Team
45. Brainiak response
46. Brainiak response
47. Brainiak response
48. Brainiak response
49. SPARQL query (task: retrieve all superclasses of a class)
    SELECT DISTINCT ?class
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?class
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?class a owl:Class .
    }
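OPTION (TRANSITIVE, ...) is a Virtuoso extension. In standard SPARQL 1.1 the same query is usually written with the property path `rdfs:subClassOf*`, where zero-or-more steps plays the role of `t_min (0)` by including the starting class itself. The Python sketch below computes that closure breadth-first over an in-memory hierarchy. The class names come from the FILTER list on slide 50, but the parent of each class is an assumption made for illustration.

```python
from collections import deque

# Assumed toy hierarchy (class -> direct superclasses), abbreviated with prefixes.
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Object"],
    "upper:Object": ["upper:Substance"],
    "upper:Substance": ["upper:ConcreteEntity"],
    "upper:ConcreteEntity": ["upper:Entity"],
}


def superclasses(cls: str, min_steps: int = 0) -> list:
    """Breadth-first rdfs:subClassOf walk; min_steps=0 includes cls itself."""
    seen = {cls: 0}                  # class -> steps from cls
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in SUBCLASS_OF.get(current, []):
            if parent not in seen:   # like t_distinct: visit each class once
                seen[parent] = seen[current] + 1
                queue.append(parent)
    # Nearest first; keep classes reached in at least min_steps steps.
    ordered = sorted(seen.items(), key=lambda kv: kv[1])
    return [c for c, n in ordered if n >= min_steps]


print(superclasses("place:City"))
```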
50. SPARQL query (task: retrieve all properties of a group of classes)
    SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range ?title ?range_graph ?range_label ?super_property
    WHERE {
      {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
      } UNION {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?domain_class } .
      }
      FILTER (?domain_class IN (<http://data.globo.com/place/City>,
                                <http://data.globo.com/place/GeopoliticalDivision>,
                                <http://data.globo.com/place/Place>,
                                <http://data.globo.com/upper/Object>,
                                <http://data.globo.com/upper/Substance>,
                                <http://data.globo.com/upper/ConcreteEntity>,
                                <http://data.globo.com/upper/Entity>))
      { ?predicate rdfs:range ?range . }
      UNION {
        ?predicate rdfs:range ?blank .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?range } .
      }
      FILTER (!isBlank(?range))
      ?predicate rdfs:label ?title .
      ?predicate rdf:type ?type .
      OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
      FILTER (?type IN (owl:ObjectProperty, owl:DatatypeProperty)) .
      FILTER (langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
      OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
      FILTER (langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
      OPTIONAL {
        GRAPH ?range_graph {
          ?range rdfs:label ?range_label .
          FILTER (langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
        }
      }
    }
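Slide 50 handles domains and ranges declared with `owl:unionOf` by walking the RDF collection that lists the member classes: `rdf:rest` is followed transitively, and `t_min (0)` lets the head node itself contribute its `rdf:first`. A minimal Python sketch of that walk over made-up triples (the blank node names and the union of City and State are hypothetical):

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# Made-up triples: a union class whose members are City and State.
TRIPLES = {
    ("_:u", "owl:unionOf", "_:l1"),
    ("_:l1", RDF_FIRST, "place:City"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "place:State"),
    ("_:l2", RDF_REST, RDF_NIL),
}


def objects(subject: str, predicate: str) -> list:
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def rdf_list(head: str) -> list:
    """Collect rdf:first values, following rdf:rest until rdf:nil."""
    members = []
    node = head
    visited = set()
    while node != RDF_NIL and node not in visited:  # guard against cycles
        visited.add(node)
        members.extend(objects(node, RDF_FIRST))    # head counts: t_min (0)
        rest = objects(node, RDF_REST)
        node = rest[0] if rest else RDF_NIL
    return members


union_head = objects("_:u", "owl:unionOf")[0]
print(rdf_list(union_head))
```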
51. SPARQL query (task: retrieve the cardinalities of all properties of a certain class)
    SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?s
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?s owl:onProperty ?predicate .
      OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
      OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
      OPTIONAL {
        { ?s owl:onClass ?range }
        UNION { ?s owl:onDataRange ?range }
        UNION { ?s owl:allValuesFrom ?range }
        OPTIONAL { ?range owl:oneOf ?enumeration } .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION (TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?enumerated_value } .
        OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label } .
      }
    }
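Slide 43 lists JSON-Schema as a feature, and GET /place/City/_schema (slide 52) returns a class definition. One plausible way to turn rows from the cardinality query into a JSON-Schema object is sketched below; the mapping rules, the sample rows and the property name `place:partOfState` are assumptions, not Brainiak's algorithm. Enumerated values returned by the query are ignored here.

```python
import json


def to_json_schema(title: str, rows: list) -> dict:
    """Build a JSON-Schema object from (predicate, min, max, range) rows.

    Hypothetical rules: min >= 1 makes the property required; any max other
    than 1 (including a missing max) makes the property an array.
    """
    schema = {"type": "object", "title": title, "properties": {}, "required": []}
    for row in rows:
        name = row["predicate"]
        minimum = int(row.get("min") or 0)
        maximum = row.get("max")
        item = {"description": f"range: {row['range']}"}
        if maximum is not None and int(maximum) == 1:
            prop = item
        else:
            prop = {"type": "array", "items": item}
            if minimum:
                prop["minItems"] = minimum
            if maximum is not None:
                prop["maxItems"] = int(maximum)
        schema["properties"][name] = prop
        if minimum >= 1:
            schema["required"].append(name)
    return schema


# Made-up rows standing in for the query's ?predicate ?min ?max ?range bindings.
rows = [
    {"predicate": "rdfs:label", "min": "1", "max": "1", "range": "xsd:string"},
    {"predicate": "place:partOfState", "min": None, "max": None, "range": "place:State"},
]
print(json.dumps(to_json_schema("City", rows), indent=2))
```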
52. Brainiak query: GET /place/City/_schema
53. Next steps
    ● SEO (automatic schema.org)
    ● Improved annotator (DBpedia Spotlight)
    ● Richer content relationships (inference)
    ● Link to open data (e.g. DBpedia, dados.gov.br)
54. Stay tuned (@brainiak_api): Brainiak will soon be released as an open source project!
55. Thank you for your attention!
    Semantic Team, semantica@corp.globo.com, globo.com