Semantic day 2013 linked data at globo.com

Linked Data at globo.com
Tatiana Al-Chueyr and Rodrigo D. A. Senra
{tatiana.martins, rodrigo.senra}@corp.globo.com
Semantic Team
semantica@corp.globo.com

Semantic Team
Andréia Bustamante
Ícaro Medeiros
Tatiana Al-Chueyr
Rodrigo Senra

Contributors
Franklin Amorim
João Caros Mendes
Alberto Beloni
André Nicodemus

BROADCAST MOVIES PAY TV INTERNET
EVENTS MUSIC
PUBLISHING
NEW VENTURES NEWSPAPER RADIO NETWORK

Motivation
Cross-link content from different web products, with examples shown for a:
● Soccer player
● Politician
● Celebrity

Motivation
Semantic markup in web pages
Vocabularies: RDF, FOAF, GEO, Dublin Core, SKOS

Example news article (G1 SP, by Juliana Cardilli):
Isabella's grave becomes a place of visitation in São Paulo; the Nardoni couple is in prison.
The Isabella Nardoni case
Isabella de Oliveira Nardoni, aged 5, was killed on the night of March 29, 2008. Forensic experts concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two small children lived, in Vila Isolina Mazzei, in the north zone of São Paulo.
Photo caption: Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction)

Motivation
Recommend annotations to the information producer

Motivation
Suggest related content to the information consumer

Outcomes
● Flexible ways to organize content
● Easier discovery of related issues
● Explicit relations derived from annotated content
● Up-to-date topic pages with little editorial effort
● Linking content across different web products
● Seamless navigation leading to flow state

Status Quo
Used by the main web products of Globo.com
linking, among others:
○ 18,485 organizations
○ 82,386 people
○ 9,129 places
○ 1,000,000+ annotated news articles
from August 2010 to May 2013

Legacy Architecture
[Diagram: CMA and CDA applications of several web products, a triple store, a search engine, and an ontology]

Problems
Poor data management
○ direct, unmanaged access to the triple store
○ difficulty sharing data (distributed DBs)
○ need to re-sync the triple store and the search engine index
○ scalability of the triple store
○ high entropy in distributed ontology engineering

Ontology Engineering
Product-driven (past): Base, extended by G1 (news), GE (sports), EGO (gossip), TVG (tv)
Domain-driven (current): Upper, extended by Person, Organization, Place, Music, Politics, Programme, Education, Sports

Possible Solution
Upper Ontology

Problems
Semantic as a library
○ many different versions in production
○ programming language dependent
○ steep learning curve for RDF/OWL/SPARQL

Next Step
Create an open semantic data management platform
● Scalable
● Mobile and Web friendly
● Interconnect Globo's data with external data sources
● Automate content extraction (including NER)

Brainiak
Linked Data RESTful API

Under Development
[Diagram: the CMA and CDA applications reach the triple store and the search engine through the Brainiak API]

Requirements
● Indirect usage of SPARQL
● Programming language independent
● Data management with quality
● Finer-grained authorization and authentication
● Isolate applications from triplestore
● Improve triplestore performance

SPARQL query
task: list all sports teams
DEFINE input:inference <http://data.globo.com/ruleset>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
FROM <http://data.globo.com/sports/>
WHERE {
  # every resource typed as Team, together with its label
  ?uri a <http://data.globo.com/sports/Team> ;
       rdfs:label ?label .
}
LIMIT 10
OFFSET 0

Brainiak query
GET /sports/Team
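
To make the contrast with the SPARQL version concrete, here is a minimal sketch of how a client might build the collection URL it would request. The host `brainiak.example.org` and the helper `collection_url` are assumptions made for illustration; the slide only shows the path `/sports/Team`.

```python
from urllib.parse import urljoin

# Hypothetical API host; the slides do not give the real one.
BASE_URL = "http://brainiak.example.org/"


def collection_url(context: str, class_name: str, base_url: str = BASE_URL) -> str:
    """Return the URL of the collection of instances of a class within a context."""
    return urljoin(base_url, f"{context}/{class_name}")


# A client would issue an HTTP GET on this URL instead of writing SPARQL.
print(collection_url("sports", "Team"))
```

The client never needs to know the graph URI, the inference rule set, or the query language; those details stay behind the API.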

Brainiak concepts
● Instance
● Collection (set of instances from a given Class)
● Schema (the Class definition)
● Context (a graph)

Real example
Context: place
Classes: Country, State, City
Instances of Country: Brazil, Japan

Real example
GET /place
GET /place/Country
GET /place/Country/_schema
GET /place/Country/Brazil

URI Conventions
resource URL → /place/Country/Brazil
context (graph) → http://semantica.globo.com/place/
class → http://semantica.globo.com/place/Country
instance → http://semantica.globo.com/place/Country/Brazil
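
The convention can be sketched as a small mapping from a resource path to the three URIs. This is only an illustration of the rule shown on the slide: the function name `resolve` is hypothetical, and the prefix `http://semantica.globo.com` is taken from the example rather than from any documented configuration.

```python
# Assumed URI prefix, copied from the example on the slide.
URI_PREFIX = "http://semantica.globo.com"


def resolve(resource_path: str, uri_prefix: str = URI_PREFIX) -> dict:
    """Map a path such as /place/Country/Brazil to its context, class and instance URIs."""
    parts = [p for p in resource_path.split("/") if p]
    if len(parts) != 3:
        raise ValueError("expected /<context>/<class>/<instance>")
    context, class_name, instance = parts
    return {
        "context": f"{uri_prefix}/{context}/",
        "class": f"{uri_prefix}/{context}/{class_name}",
        "instance": f"{uri_prefix}/{context}/{class_name}/{instance}",
    }


for name, uri in resolve("/place/Country/Brazil").items():
    print(f"{name:8} -> {uri}")
```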

Legacy URIs
/place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River
Convention
context (graph) → http://semantica.globo.com/place/
class → http://semantica.globo.com/place/River
Overridden
context (graph) → http://dbpedia.org/resource/classes#
class → http://dbpedia.org/ontology/River
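
A hedged sketch of building such an overriding request follows. The parameter names `graph_uri` and `class_uri` come from the slide; the host and the helper `legacy_collection_url` are assumptions. Percent-encoding the values matters here because the graph URI ends in `#`, which would otherwise be read as the start of a URL fragment.

```python
from urllib.parse import urlencode

# Hypothetical API host; only the path and the parameter names come from the slide.
BASE_URL = "http://brainiak.example.org"


def legacy_collection_url(path: str, graph_uri: str, class_uri: str) -> str:
    """Build a collection URL whose context and class are overridden by query parameters."""
    query = urlencode({"graph_uri": graph_uri, "class_uri": class_uri})
    return f"{BASE_URL}{path}?{query}"


url = legacy_collection_url(
    "/place/River",
    graph_uri="http://dbpedia.org/resource/classes#",
    class_uri="dbpedia:River",
)
print(url)
```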

Hypermedia
● Flexibility and programmatic adaptation
● Semantic affordances
● Client has to understand what is consumed
● "Hypermedia APIs are not fully baked yet"

Brainiak hypermedia graph
[Diagram: the resources /, context, collection, instance, and schema, connected by the link relations self, instances, item, inCollection, describedBy, create, replace, and delete]
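
One way a client could navigate such a graph is to look links up by relation name rather than hard-coding URLs. The sketch below assumes that a response carries a list of links with `rel`, `href`, and `method` fields; that shape, the helper `find_link`, the HTTP methods, and the choice of which relations hang off an instance are all assumptions. Only the relation names themselves appear in the diagram.

```python
from typing import Optional

# Hypothetical link descriptions for one instance (shape and values assumed).
instance_links = [
    {"rel": "self", "href": "/place/Country/Brazil", "method": "GET"},
    {"rel": "describedBy", "href": "/place/Country/_schema", "method": "GET"},
    {"rel": "inCollection", "href": "/place/Country", "method": "GET"},
    {"rel": "replace", "href": "/place/Country/Brazil", "method": "PUT"},
    {"rel": "delete", "href": "/place/Country/Brazil", "method": "DELETE"},
]


def find_link(links: list, rel: str) -> Optional[dict]:
    """Return the first link with the given relation, or None if it is absent."""
    return next((link for link in links if link["rel"] == rel), None)


# Follow the describedBy relation to reach the schema of the instance's class.
schema_link = find_link(instance_links, "describedBy")
print(schema_link["method"], schema_link["href"])
```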

Services
● List Contexts
● List Collections
● Get a Schema
● List Prefixes
● Status of Services
● Instances: Create, Retrieve, Edit, Delete, List

Features
● JSON-Schema
● JSON-LD
● REST (OPTIONS, GET, PUT, POST, DELETE)
● Python + Tornado

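SPARQL query
task: retrieve all superclasses of a class
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?class
WHERE {
  # Virtuoso transitive option; t_min (0) also returns the starting class
  <http://data.globo.com/place/City> rdfs:subClassOf ?class
    OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
  ?class a owl:Class .
}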
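
For readers unfamiliar with transitive queries, the same idea can be sketched in plain Python over an in-memory map. The hierarchy below is an assumption read off the class list used in the properties query in this deck, and the real ontology may differ. The sketch also skips the `owl:Class` type check that the SPARQL query performs.

```python
from collections import deque

NS = "http://data.globo.com/"

# Assumed subclass hierarchy (child -> direct parents); illustrative only.
SUBCLASS_OF = {
    NS + "place/City": [NS + "place/GeopoliticalDivision"],
    NS + "place/GeopoliticalDivision": [NS + "place/Place"],
    NS + "place/Place": [NS + "upper/Object"],
    NS + "upper/Object": [NS + "upper/Substance"],
    NS + "upper/Substance": [NS + "upper/ConcreteEntity"],
    NS + "upper/ConcreteEntity": [NS + "upper/Entity"],
}


def superclasses(cls: str, subclass_of: dict) -> list:
    """Return cls followed by all of its ancestors, breadth-first, without duplicates."""
    seen = [cls]  # like t_min (0), the starting class itself is included
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


for uri in superclasses(NS + "place/City", SUBCLASS_OF):
    print(uri)
```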

SPARQL query
task: retrieve all properties of a group of classes
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range ?title ?range_graph ?range_label ?super_property
WHERE {
  # domain stated directly, or as a member of an owl:unionOf list
  {
    GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
  } UNION {
    GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
    ?blank a owl:Class .
    ?blank owl:unionOf ?enumeration .
    OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
    OPTIONAL { ?list_node rdf:first ?domain_class } .
  }
  # the class of interest and all of its superclasses
  FILTER (?domain_class IN (
    <http://data.globo.com/place/City>,
    <http://data.globo.com/place/GeopoliticalDivision>,
    <http://data.globo.com/place/Place>,
    <http://data.globo.com/upper/Object>,
    <http://data.globo.com/upper/Substance>,
    <http://data.globo.com/upper/ConcreteEntity>,
    <http://data.globo.com/upper/Entity>))
  # range stated directly, or as a member of an owl:unionOf list
  { ?predicate rdfs:range ?range . }
  UNION {
    ?predicate rdfs:range ?blank .
    ?blank a owl:Class .
    ?blank owl:unionOf ?enumeration .
    OPTIONAL { ?list_node rdf:first ?range } .
  }
  FILTER (!isBlank(?range))
  ?predicate rdfs:label ?title .
  ?predicate rdf:type ?type .
  OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
  FILTER (?type IN (owl:ObjectProperty, owl:DatatypeProperty)) .
  FILTER (langMatches(lang(?title), "en") || langMatches(lang(?title), "")) .
  OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
  FILTER (langMatches(lang(?predicate_comment), "en") || langMatches(lang(?predicate_comment), "")) .
  OPTIONAL {
    GRAPH ?range_graph {
      ?range rdfs:label ?range_label .
      FILTER (langMatches(lang(?range_label), "en") || langMatches(lang(?range_label), "")) .
    }
  }
}

SPARQL query
task: retrieve the cardinalities of all properties of a certain class
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
WHERE {
  # walk up the class hierarchy to collect every restriction
  <http://data.globo.com/place/City> rdfs:subClassOf ?s
    OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
  ?s owl:onProperty ?predicate .
  OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
  OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
  OPTIONAL {
    { ?s owl:onClass ?range }
    UNION { ?s owl:onDataRange ?range }
    UNION { ?s owl:allValuesFrom ?range }
    OPTIONAL { ?range owl:oneOf ?enumeration } .
    OPTIONAL { ?list_node rdf:first ?enumerated_value } .
    OPTIONAL {
      ?enumerated_value rdfs:label ?enumerated_value_label .
    } .
  }
}

Brainiak query
GET /place/City/_schema

Next steps
● SEO (automatic schema.org markup)
● Improved annotator (DBpedia Spotlight)
● Richer content relationships (inference)
● Link to open data (e.g. DBpedia, dados.gov.br)

Stay tuned
@brainiak_api
... will soon be released
as an open source project!

Semantic Team
semantica@corp.globo.com
globo.com
Thank you
for your attention!
