Linking the world with Python and Semantics
@tati_alchueyr (Globo.com)
25th July 2012, FISL 13
how do you store your data?

[   ] data... what data?!
[   ] raw files (csv, json, xml)
[   ] database (eg. Relational Data Base)
[   ] graphs (eg. Resource Description Framework)
[   ] other...
how do you search for...?

Apartments near English-Portuguese bilingual
childcare in Rio de Janeiro state.

ERP service providers with offices in São Paulo
and New York.

Researchers working on artificial intelligence in
the Southeast of Brazil.

GNU GPL software for image processing
developed from 2009 to 2010 with Brazilian
developers among its authors
what do these ^ have in common?
linked open data in 2007
linked open data in 2008
linked open data in 2009
linked open data in 2011
traditional RDBMS
linked data graph
linked data modelling
modelling
modelling
querying RDB

select bookID, authorName
from books, authors
where books.aid = authors.aid
      and books.isbn = '006251587X';
querying RDF

select ?authName ?authEmail
where {
   <amazon:book#006251587X> <amazon:hasAuthor> <foaf:name#TimBerners-Lee> .
   <foaf:name#TimBerners-Lee> <foaf:name> ?authName .
   <foaf:name#TimBerners-Lee> <foaf:email> ?authEmail .
}
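To make the triple-pattern idea concrete, here is a minimal pure-Python sketch of how a set of patterns can be matched against triples. It is an illustration only, not how a real SPARQL engine works. The `match` and `select` helpers are hypothetical, the e-mail address is a made-up placeholder, and the hardcoded author node from the slide is swapped for an `?author` variable to show the join.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# The e-mail address is a made-up placeholder, not real data.
TRIPLES = [
    ("amazon:book#006251587X", "amazon:hasAuthor", "foaf:name#TimBerners-Lee"),
    ("foaf:name#TimBerners-Lee", "foaf:name", "Tim Berners-Lee"),
    ("foaf:name#TimBerners-Lee", "foaf:email", "timbl@example.org"),
]


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must be identical.
            return None
    return result


def select(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


rows = select(
    [
        ("amazon:book#006251587X", "amazon:hasAuthor", "?author"),
        ("?author", "foaf:name", "?authName"),
        ("?author", "foaf:email", "?authEmail"),
    ],
    TRIPLES,
)
for row in rows:
    print(row["?authName"], row["?authEmail"])
```

Each pattern narrows the candidate bindings, which is the same join the SQL query expresses with `books.aid = authors.aid`.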
globo.com developers before using web semantics
globo.com developers while learning web semantics

              (?w ?t ?f)

globo.com developers after using web semantics
Sample hard to test code
approach 1
# queries isolation
approach 2
# data as object
      DAO
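A possible shape for the two approaches, sketched under assumptions: the query text is kept in one constant (queries isolation) and a small DAO hands back plain Python values. `BandDAO`, `QUERY_BANDS_BY_GENRE` and `fake_endpoint` are hypothetical names, and the band URI is a placeholder.

```python
# Approach 1: all SPARQL text lives in one place.
QUERY_BANDS_BY_GENRE = """
SELECT DISTINCT ?who
WHERE {{ ?who <http://dbpedia.org/ontology/genre> <{genre}> . }}
"""


class BandDAO:
    """Approach 2: callers receive plain Python values, never SPARQL."""

    def __init__(self, run_query):
        # run_query is any callable that takes a SPARQL string and returns
        # a dict in the SPARQL JSON results layout; tests can pass a fake.
        self._run_query = run_query

    def by_genre(self, genre_uri):
        results = self._run_query(QUERY_BANDS_BY_GENRE.format(genre=genre_uri))
        return [row["who"]["value"] for row in results["results"]["bindings"]]


def fake_endpoint(query):
    """Stand-in for a real endpoint, used only for testing."""
    assert "http://dbpedia.org/resource/Metal" in query
    return {
        "results": {
            "bindings": [
                {"who": {"type": "uri",
                         "value": "http://dbpedia.org/resource/Example_Band"}},
            ]
        }
    }


dao = BandDAO(fake_endpoint)
bands = dao.by_genre("http://dbpedia.org/resource/Metal")
print(bands)
```

Because the endpoint is injected, the DAO can be exercised without any network access, which is what makes the code testable.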
Y U NO make
SPARQL queries?!
Y U NO make
data access easy?!
Y U NO make
things testable?!
product developers evaluating
        web semantics
fact 1: we don't have an
  out-of-box solution
fact 2: but we do have
    some options
some options
#1: create a solution
from scratch
#2: study existing
solutions and then
[ ] contribute to them
[ ] develop on top of
them
[ ] goto #1
the final decision is not only ours
but we chose to start from #2




#2: study existing solutions and then (...)
ok, lmgfy
a few results from google

ActiveRDF             PyRdfa
active-semantic       pysparql
Django4Store          RDFAlchemy
Django-RDF            RdfLib
Django-RDFAlchemy     Redland
Djubby                semantic-django
EasyRDF               SPARQLWrapper
Jena
FuXi                  Sparrow
Oort                  Sparta
Pymantic              SuRF
click to know more

{?project :by_author ?author .
?author :works_at :globocom . }
{?project :use_language :python . }

{?project :use_language :python ;
       :last_commit ?commit .
  FILTER (?commit >= "2011-12-01"^^xsd:date) }
relation between these tools
team filtering

SPARQLWrapper
problem: list all predicates of a class
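 from SPARQLWrapper import SPARQLWrapper, JSON

 # List all predicates of dbonto:Band
 # (OPTION (TRANSITIVE, ...) is a Virtuoso-specific extension)
 query = """
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 SELECT DISTINCT ?subject
 FROM <http://dbpedia.org>
 {
    ?subject rdfs:domain ?object .
    <http://dbpedia.org/ontology/Band> rdfs:subClassOf ?object
    OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0) ).
 }"""

 # endpoint noted on the slide: http://live.dbpedia.org/sparql
 sparql = SPARQLWrapper("http://dbpedia.org/sparql")
 sparql.setQuery(query)
 sparql.setReturnFormat(JSON)
 results = sparql.query().convert()
 # each binding maps a variable name to a {"type": ..., "value": ...} dict
 for result in results["results"]["bindings"]:
     print(result["subject"]["value"])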
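The dict that comes back follows the SPARQL JSON results layout, so it can be explored offline. Below is a small sketch with a canned response; the two URIs are placeholders rather than real query output, and `column` is a hypothetical helper.

```python
import json

# Canned response in the SPARQL JSON results layout.
# The URIs are placeholders, not real DBpedia output.
RAW = """
{
  "head": {"vars": ["subject"]},
  "results": {"bindings": [
    {"subject": {"type": "uri", "value": "http://example.org/ontology/propertyA"}},
    {"subject": {"type": "uri", "value": "http://example.org/ontology/propertyB"}}
  ]}
}
"""


def column(results, var):
    """Collect the plain values bound to one variable, skipping unbound rows."""
    return [
        row[var]["value"]
        for row in results["results"]["bindings"]
        if var in row
    ]


results = json.loads(RAW)
subjects = column(results, "subject")
for uri in subjects:
    print(uri)
```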
SPARQLWrapper
abstract endpoint   returns dict



SPARQLWrapper




Ok, not different from what we have...
SPARQLWrapper




just a wrapper around a SPARQL server
well tested ;)
SPARQLWrapper
problem: list all subjects given ?p ?o
 from SPARQLWrapper import SPARQLWrapper, JSON

 # List all instances (eg. bands) with genre Metal
 query = """
 PREFIX db: <http://dbpedia.org/resource/>
 PREFIX dbonto: <http://dbpedia.org/ontology/>

 SELECT DISTINCT ?who
 FROM <http://dbpedia.org>
 WHERE {
     ?who dbonto:genre db:Metal .
 }
 """

 sparql = SPARQLWrapper("http://dbpedia.org/sparql")
 sparql.setQuery(query)
 sparql.setReturnFormat(JSON)
 results = sparql.query().convert()

 for result in results["results"]["bindings"]:
     print(result["who"]["value"])
RdfLib
problem: list all subjects given ?p ?o
 import rdflib
 import rdfextras.store.SPARQL
 # SPARQL endpoint setup
 endpoint = "http://dbpedia.org/sparql"
 store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
 graph = rdflib.Graph(store)
 # Definitions
 genre = rdflib.URIRef("http://dbpedia.org/ontology/genre")
 metal = rdflib.URIRef("http://dbpedia.org/resource/Metal")
 # Query
 for label in graph.subjects(genre, metal):
     print(label)
RdfLib
abstract endpoint   returns dict    namespace



  import rdflib
  import rdfextras.store.SPARQL
  # SPARQL endpoint setup
  endpoint = "http://dbpedia.org/sparql"
  store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
  graph = rdflib.Graph(store)
  # Namespaces to clear up definitions
  DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
  DB = rdflib.Namespace("http://dbpedia.org/resource/")
  # Query
  for label in graph.subjects(DBONTO.genre, DB.Metal):
      print(label)
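What a namespace helper buys you can be sketched in a few lines of plain Python. This is a simplified stand-in written for illustration, not rdflib's actual implementation; it returns plain strings rather than URI objects.

```python
class Namespace:
    """Tiny stand-in for a namespace helper: a base URI that mints full URIs."""

    def __init__(self, base):
        self._base = base

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so _base is safe.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._base + name

    def __getitem__(self, name):
        # For local names that are not valid Python identifiers.
        return self._base + name


DBONTO = Namespace("http://dbpedia.org/ontology/")
DB = Namespace("http://dbpedia.org/resource/")

print(DBONTO.genre)
print(DB["Metal"])
```

Attribute access keeps query code short, while item access covers local names containing characters such as hyphens or parentheses.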
RdfLib
abstract endpoint   returns dict    namespace




                         subjects
                         predicates
                         objects
                         subject_predicates
                         subject_objects
                         predicate_objects
RdfLib
abstract endpoint   returns dict    namespace



  import rdflib
  import rdfextras.store.SPARQL
  # SPARQL endpoint setup
  endpoint = "http://dbpedia.org/sparql"
  store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
  graph = rdflib.Graph(store)
  # Namespaces to clear up definitions
  DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
  DB = rdflib.Namespace("http://dbpedia.org/resource/")
  # Using triples
  for musician, _, _ in graph.triples((None, DBONTO.genre, DB.Metal)):
        print(musician)
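The "None means wildcard" convention is easy to picture with an in-memory toy. The sketch below is an assumption-laden illustration, not rdflib code: `TinyGraph` is hypothetical and the band identifiers are placeholders.

```python
class TinyGraph:
    """In-memory toy graph where None in a pattern matches anything."""

    def __init__(self, triples=()):
        self._triples = set(triples)

    def triples(self, pattern):
        """Yield every stored triple that matches the pattern, in sorted order."""
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple

    def subjects(self, predicate=None, obj=None):
        """Convenience wrapper: only the subject of each matching triple."""
        for s, _, _ in self.triples((None, predicate, obj)):
            yield s

    def objects(self, subject=None, predicate=None):
        """Convenience wrapper: only the object of each matching triple."""
        for _, _, o in self.triples((subject, predicate, None)):
            yield o


graph = TinyGraph([
    ("db:BandA", "dbonto:genre", "db:Metal"),
    ("db:BandB", "dbonto:genre", "db:Jazz"),
    ("db:BandC", "dbonto:genre", "db:Metal"),
])

metal = list(graph.subjects("dbonto:genre", "db:Metal"))
genres_of_b = list(graph.objects("db:BandB", "dbonto:genre"))
print(metal, genres_of_b)
```

Helpers such as `subjects` and `objects` are thin projections over the same pattern match.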
RdfLib
abstract endpoint   returns dict    namespace   query by triples



RdfLib
abstract endpoint   returns dict    namespace   query by triples   add / remove



  import rdflib
  # N-Triples fixture file
  graph = rdflib.Graph()
  graph.parse("fixture_genre_metal.nt", format="nt")
  # Namespace
  DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
  DB = rdflib.Namespace("http://dbpedia.org/resource/")

  # Add nodes
  graph.add((DB.AndrewsMedina, DBONTO.genre, DB.Metal))
  graph.add((DB.Siminino, DBONTO.genre, DB.Metal))
  graph.add((DB.Herman, DBONTO.genre, DB.Metal))

  # Remove nodes
  graph.remove((DB.AndrewsMedina, DBONTO.genre, DB.Metal))
RdfLib



  concentrates on
  providing the core
  RDF types and
  interfaces, through
  plugin interface
RdfLib



  makes testing
  simple, allowing
  fixtures using n3
  files, add triples
  and remove triples
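A sketch of that testing style, using only the standard library so it runs anywhere. A plain Python set stands in for the fixture graph (an assumption; a real test would load a fixture file), and `metal_bands` is a hypothetical function under test.

```python
import unittest


def metal_bands(graph):
    """Code under test: subjects whose genre is Metal, in sorted order."""
    return sorted(
        s for s, p, o in graph
        if p == "dbonto:genre" and o == "db:Metal"
    )


class MetalBandsTest(unittest.TestCase):
    def setUp(self):
        # Fixture: a fresh in-memory graph per test instead of a live endpoint.
        self.graph = {
            ("db:BandA", "dbonto:genre", "db:Metal"),
            ("db:BandB", "dbonto:genre", "db:Jazz"),
        }

    def test_lists_only_metal(self):
        self.assertEqual(metal_bands(self.graph), ["db:BandA"])

    def test_add_then_remove(self):
        triple = ("db:BandC", "dbonto:genre", "db:Metal")
        self.graph.add(triple)
        self.assertEqual(metal_bands(self.graph), ["db:BandA", "db:BandC"])
        self.graph.remove(triple)
        self.assertEqual(metal_bands(self.graph), ["db:BandA"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MetalBandsTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```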
RdfLib

but...

each triple query
requires a new
connection to
SPARQL
RdfLib

therefore

too many requests to the
SPARQL endpoint
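The cost can be illustrated with a fake endpoint that counts requests. Everything here is hypothetical (`CountingEndpoint`, the data, the labels); the second pattern, one broader request joined locally, is shown only for contrast and is not something the talk proposes.

```python
class CountingEndpoint:
    """Fake endpoint that counts how many requests it receives."""

    def __init__(self, triples):
        self._triples = list(triples)
        self.requests = 0

    def query(self, subject=None, predicate=None):
        self.requests += 1
        return [
            t for t in self._triples
            if subject in (None, t[0]) and predicate in (None, t[1])
        ]


DATA = [
    ("db:BandA", "dbonto:genre", "db:Metal"),
    ("db:BandA", "rdfs:label", "Band A"),
    ("db:BandB", "dbonto:genre", "db:Metal"),
    ("db:BandB", "rdfs:label", "Band B"),
]

# Pattern 1: one request for the list, then one more per band.
chatty = CountingEndpoint(DATA)
bands = [s for s, _, _ in chatty.query(predicate="dbonto:genre")]
labels_chatty = {
    b: chatty.query(subject=b, predicate="rdfs:label")[0][2] for b in bands
}

# Pattern 2: a single broader request, joined locally.
batched = CountingEndpoint(DATA)
rows = batched.query()
with_genre = {s for s, p, _ in rows if p == "dbonto:genre"}
labels_batched = {
    s: o for s, p, o in rows if p == "rdfs:label" and s in with_genre
}

print(chatty.requests, batched.requests)
```

With N bands the first pattern issues N + 1 requests, which is the behaviour the slide complains about.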
RdfLib

and...

doesn't provide an
ORM (object
relational mapping)
SuRF
abstract endpoint   returns dict   namespace   query by triples   add / remove



  from surf import Store, Session, ns, query

  store = Store(reader='sparql_protocol',
                     endpoint='http://dbpedia.org/sparql')
  session = Session(store, {})
  session.enable_logging = False
  ns.register(db='http://dbpedia.org/resource/')
  ns.register(dbonto='http://dbpedia.org/ontology/')

  # MusicalArtist is a DBpedia ontology class, so it lives under dbonto
  MusicalArtist = session.get_class(ns.DBONTO['MusicalArtist'])
  artistas_metal = MusicalArtist.get_by(dbonto_genre=ns.DB["Metal"])
  print(artistas_metal)




     ORM
SuRF
problem: list all subjects given ?p ?o
  from surf import Store, Session, ns, query

  store = Store(reader='sparql_protocol',
                endpoint='http://dbpedia.org/sparql')
  session = Session(store, {})
  ns.register(db='http://dbpedia.org/resource/')
  ns.register(dbonto='http://dbpedia.org/ontology/')

  query_surf = query.select("?who").distinct()
  query_surf.where(("?who", ns.DBONTO.genre, ns.DB.Metal))
  metal_bands = session.default_store.execute(query_surf)

  for band in metal_bands:
      print(band)

     ORM   composed queries
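The idea of composing a query from method calls can be sketched without any library. `Select` below is a hypothetical builder written for illustration; it is not SuRF's query API and only covers the SELECT / DISTINCT / WHERE subset used on the slide.

```python
class Select:
    """Minimal chainable builder that renders a SPARQL SELECT string."""

    def __init__(self, *variables):
        self._variables = variables
        self._distinct = False
        self._patterns = []

    def distinct(self):
        self._distinct = True
        return self

    def where(self, *patterns):
        # Each pattern is a (subject, predicate, object) tuple of strings.
        self._patterns.extend(patterns)
        return self

    def __str__(self):
        head = "SELECT DISTINCT" if self._distinct else "SELECT"
        body = " ".join("{} {} {} .".format(*p) for p in self._patterns)
        return "{} {} WHERE {{ {} }}".format(
            head, " ".join(self._variables), body
        )


query = (
    Select("?who")
    .distinct()
    .where((
        "?who",
        "<http://dbpedia.org/ontology/genre>",
        "<http://dbpedia.org/resource/Metal>",
    ))
)
text = str(query)
print(text)
```

Returning `self` from each method is what allows the calls to be chained and the query to be extended step by step before it is rendered.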
SuRF




 various approaches
        ORM
    programmatically
SuRF




     simple ORM
 no need to redeclare
    TTL definitions
SuRF




  “complex” queries
         using
    lazy evaluation
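Lazy evaluation here means that building or refining a result set costs nothing until someone iterates over it. A minimal sketch, with hypothetical names (`LazyResults`, `fetch`) and placeholder data, not SuRF's implementation:

```python
class LazyResults:
    """Result set that only hits the backend when first iterated."""

    def __init__(self, fetch, limit=None):
        self._fetch = fetch
        self._limit = limit
        self._cache = None

    def limit(self, n):
        # Refining returns a new lazy object; still no backend call.
        return LazyResults(self._fetch, limit=n)

    def __iter__(self):
        if self._cache is None:
            self._cache = list(self._fetch(self._limit))
        return iter(self._cache)


calls = []


def fetch(limit):
    """Fake backend that records every call it receives."""
    calls.append(limit)
    data = ["db:BandA", "db:BandB", "db:BandC"]
    return data if limit is None else data[:limit]


results = LazyResults(fetch).limit(2)
calls_before = len(calls)      # nothing fetched yet
first_pass = list(results)     # triggers the single backend call
second_pass = list(results)    # served from the cache
print(calls_before, first_pass, calls)
```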
SuRF




   documentation
        &
     community
SuRF



but...

no django-style
models
SuRF




verbose syntax
RDFAlchemy
problem: list all subjects given ?p ?o
  from rdfalchemy.sparql import SPARQLGraph
  from rdflib import Namespace
  endpoint = "http://dbpedia.org/sparql"
  graph = SPARQLGraph(endpoint)

  DB = Namespace("http://dbpedia.org/resource/")
  DBONTO = Namespace("http://dbpedia.org/ontology/")
  metal_bands = graph.subjects(predicate=DBONTO.genre,
                                 object=DB.Metal)
  for band in metal_bands:
      print(band)
RDFAlchemy
abstract endpoint   returns dict   namespace   query by triples   add / remove


    from rdfalchemy.sparql import SPARQLGraph
    from rdfalchemy import rdfSubject, rdfSingle
    from rdflib import Namespace
    DB = Namespace('http://dbpedia.org/resource/')
    DBONTO = Namespace("http://dbpedia.org/ontology/")
    RDFS = Namespace('http://www.w3.org/2000/01/rdf-schema#')
    endpoint = "http://live.dbpedia.org/sparql"
    graph = SPARQLGraph(endpoint)
    rdfSubject.db = graph
    class MusicalArtist(rdfSubject):
        rdfs_label = rdfSingle(RDFS.label, 'label')
        genre = rdfSingle(DBONTO.genre, 'genre')

    metal_artists = MusicalArtist.filter_by(genre=DB.Metal)

    for band in metal_artists:
        print(band)


     ORM            django-like
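How a declarative mapping like this can work is easier to see with Python descriptors. The sketch below is a simplified illustration over an in-memory list, not RDFAlchemy's implementation; `Single`, `Subject` and the data are hypothetical.

```python
# Placeholder data standing in for a remote graph.
GRAPH = [
    ("db:BandA", "rdfs:label", "Band A"),
    ("db:BandA", "dbonto:genre", "db:Metal"),
    ("db:BandB", "rdfs:label", "Band B"),
    ("db:BandB", "dbonto:genre", "db:Jazz"),
]


class Single:
    """Descriptor mapping one attribute to one single-valued predicate."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, instance, owner):
        if instance is None:
            return self
        for s, p, o in GRAPH:
            if s == instance.uri and p == self.predicate:
                return o
        return None


class Subject:
    """Base class: an object identified by its URI."""

    def __init__(self, uri):
        self.uri = uri

    @classmethod
    def filter_by(cls, **criteria):
        """Yield instances whose mapped attributes equal the given values."""
        for uri in sorted({s for s, _, _ in GRAPH}):
            candidate = cls(uri)
            if all(getattr(candidate, k) == v for k, v in criteria.items()):
                yield candidate


class MusicalArtist(Subject):
    label = Single("rdfs:label")
    genre = Single("dbonto:genre")


metal = [artist.label for artist in MusicalArtist.filter_by(genre="db:Metal")]
print(metal)
```

The class body repeats what the ontology already says about labels and genres, which is the redeclaration cost mentioned for this approach.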
RDFAlchemy




       django-like
         models
RDFAlchemy




      simple syntax
RDFAlchemy



but...

non-lazy
RDFAlchemy



 we have to declare
  all data already
described in TTL files
 as python classes
semantic-django
abstract endpoint   returns dict   namespace   query by triples   add / remove



  # Classes similar to Django models are generated from TTL
  # files (using manage.py)
  class BaseLugar(BaseEntidade):
      latitude = models.UriField()
      longitude = models.UriField()
      geonameid = models.UriField()
      tem_mapa = models.UriField()
      apelido = models.UriField()
      ImagemMapa = models.UriField()
      genero_gramatical = models.UriField()
      class Meta:
          semantic_graph = 'http://semantica.globo.com/base/Lugar'




     ORM            django-like
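Generating model classes from a schema can be sketched with Python's `type()`. This is only an illustration of the idea: `build_model` and this `UriField` are hypothetical, the property list is typed in by hand rather than parsed from a TTL file, and none of it is semantic-django's actual code.

```python
class UriField:
    """Placeholder field type that carries no behaviour of its own."""


def build_model(name, graph_uri, properties):
    """Create a model class with one UriField per property name."""
    attrs = {prop: UriField() for prop in properties}
    # Mirror the nested Meta class that holds the graph URI.
    attrs["Meta"] = type("Meta", (), {"semantic_graph": graph_uri})
    return type(name, (object,), attrs)


# Hand-written schema, as if it had been read from a TTL file.
BaseLugar = build_model(
    "BaseLugar",
    "http://semantica.globo.com/base/Lugar",
    ["latitude", "longitude", "geonameid"],
)

print(BaseLugar.__name__, BaseLugar.Meta.semantic_graph)
```

Building the classes from the schema avoids redeclaring by hand what the TTL files already describe.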
semantic-django




https://github.com/rfloriano/semantic-django
semantic-django




            dream of
              many
       product developers
semantic-django



but...

just started to be
developed
study existing solutions, and now?




[   ] contribute to them
[   ] develop on top of them
[   ] create a solution from scratch
[   ] other, _________________
grab your post-it, it's review time!

                   =)                    =(              comments

SuRF               shows query           no models       my favorite

RDFAlchemy         models, nice API      not lazy        my choice

RDFlib             namespace             low layer

semantic-django    django-like           just started


(...)
any questions...?




 @tati_alchueyr
casting by
(click to know more about each meme)

More Related Content

What's hot

What's hot (20)

Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Sqoop
SqoopSqoop
Sqoop
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQL
 
Graph databases
Graph databasesGraph databases
Graph databases
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 

Similar to Linking the world with Python and Semantics

2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
Josef Petrák
 
SPARQL 1.1 Update (2013-03-05)
SPARQL 1.1 Update (2013-03-05)SPARQL 1.1 Update (2013-03-05)
SPARQL 1.1 Update (2013-03-05)
andyseaborne
 
SparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on HadoopSparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on Hadoop
DataWorks Summit
 
Querying the Semantic Web with SPARQL
Querying the Semantic Web with SPARQLQuerying the Semantic Web with SPARQL
Querying the Semantic Web with SPARQL
Emanuele Della Valle
 

Similar to Linking the world with Python and Semantics (20)

2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
 
SPARQL 1.1 Update (2013-03-05)
SPARQL 1.1 Update (2013-03-05)SPARQL 1.1 Update (2013-03-05)
SPARQL 1.1 Update (2013-03-05)
 
Sparql
SparqlSparql
Sparql
 
Ruby semweb 2011-12-06
Ruby semweb 2011-12-06Ruby semweb 2011-12-06
Ruby semweb 2011-12-06
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
 
Comparative study on the processing of RDF in PHP
Comparative study on the processing of RDF in PHPComparative study on the processing of RDF in PHP
Comparative study on the processing of RDF in PHP
 
A Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic WebA Hands On Overview Of The Semantic Web
A Hands On Overview Of The Semantic Web
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
RejectKaigi2010 - RDF.rb
RejectKaigi2010 - RDF.rbRejectKaigi2010 - RDF.rb
RejectKaigi2010 - RDF.rb
 
SparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on HadoopSparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on Hadoop
 
Querying the Semantic Web with SPARQL
Querying the Semantic Web with SPARQLQuerying the Semantic Web with SPARQL
Querying the Semantic Web with SPARQL
 
A Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF ProcessingA Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF Processing
 
Sesam4 project presentation sparql - april 2011
Sesam4   project presentation sparql - april 2011Sesam4   project presentation sparql - april 2011
Sesam4 project presentation sparql - april 2011
 
Sesam4 project presentation sparql - april 2011
Sesam4   project presentation sparql - april 2011Sesam4   project presentation sparql - april 2011
Sesam4 project presentation sparql - april 2011
 
The Semantic Web #10 - SPARQL
The Semantic Web #10 - SPARQLThe Semantic Web #10 - SPARQL
The Semantic Web #10 - SPARQL
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...
 
XSLT+SPARQL: Scripting the Semantic Web with SPARQL embedded into XSLT styles...
XSLT+SPARQL: Scripting the Semantic Web with SPARQL embedded into XSLT styles...XSLT+SPARQL: Scripting the Semantic Web with SPARQL embedded into XSLT styles...
XSLT+SPARQL: Scripting the Semantic Web with SPARQL embedded into XSLT styles...
 
SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
Learning Commonalities in RDF and SPARQL
Learning Commonalities in RDF and SPARQLLearning Commonalities in RDF and SPARQL
Learning Commonalities in RDF and SPARQL
 

More from Tatiana Al-Chueyr

More from Tatiana Al-Chueyr (20)

Integrating ChatGPT with Apache Airflow
Integrating ChatGPT with Apache AirflowIntegrating ChatGPT with Apache Airflow
Integrating ChatGPT with Apache Airflow
 
Contributing to Apache Airflow
Contributing to Apache AirflowContributing to Apache Airflow
Contributing to Apache Airflow
 
From an idea to production: building a recommender for BBC Sounds
From an idea to production: building a recommender for BBC SoundsFrom an idea to production: building a recommender for BBC Sounds
From an idea to production: building a recommender for BBC Sounds
 
Precomputing recommendations with Apache Beam
Precomputing recommendations with Apache BeamPrecomputing recommendations with Apache Beam
Precomputing recommendations with Apache Beam
 
Scaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache BeamScaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache Beam
 
Clearing Airflow Obstructions
Clearing Airflow ObstructionsClearing Airflow Obstructions
Clearing Airflow Obstructions
 
Scaling machine learning workflows with Apache Beam
Scaling machine learning workflows with Apache BeamScaling machine learning workflows with Apache Beam
Scaling machine learning workflows with Apache Beam
 
Responsible machine learning at the BBC
Responsible machine learning at the BBCResponsible machine learning at the BBC
Responsible machine learning at the BBC
 
Powering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonPowering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and Python
 
Responsible Machine Learning at the BBC
Responsible Machine Learning at the BBCResponsible Machine Learning at the BBC
Responsible Machine Learning at the BBC
 
PyConUK 2018 - Journey from HTTP to gRPC
PyConUK 2018 - Journey from HTTP to gRPCPyConUK 2018 - Journey from HTTP to gRPC
PyConUK 2018 - Journey from HTTP to gRPC
 
Sprint cPython at Globo.com
Sprint cPython at Globo.comSprint cPython at Globo.com
Sprint cPython at Globo.com
 
PythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummiesPythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummies
 
QCon SP - recommended for you
QCon SP - recommended for youQCon SP - recommended for you
QCon SP - recommended for you
 
Crafting APIs
Crafting APIsCrafting APIs
Crafting APIs
 
PyConUK 2016 - Writing English Right
PyConUK 2016  - Writing English RightPyConUK 2016  - Writing English Right
PyConUK 2016 - Writing English Right
 
InVesalius: 3D medical imaging software
InVesalius: 3D medical imaging softwareInVesalius: 3D medical imaging software
InVesalius: 3D medical imaging software
 
Automatic English text correction
Automatic English text correctionAutomatic English text correction
Automatic English text correction
 
Python packaging and dependency resolution
Python packaging and dependency resolutionPython packaging and dependency resolution
Python packaging and dependency resolution
 
Rio info 2013 - Linked Data at Globo.com
Rio info 2013 - Linked Data at Globo.comRio info 2013 - Linked Data at Globo.com
Rio info 2013 - Linked Data at Globo.com
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...

Linking the world with Python and Semantics

  • 1. Linking the world with Python and Semantics @tati_alchueyr (Globo.com) 25th July 2012, FISL 13
  • 2. how do you store your data?
  • 3. how do you store your data? [ ] data... what data?! [ ] raw files (csv, json, xml) [ ] database (eg. Relational Data Base) [ ] graphs (eg. Resource Description Framework) [ ] other...
  • 4. how do you search for...? Apartments near English-Portuguese bilingual childcare in Rio de Janeiro state. ERP service providers with offices in São Paulo and New York. Researchers working on artificial intelligence in Southeast of Brazil. GNU GPL software for image processing developed from 2009 to 2010 authored also by Brazilian developers
  • 8. what ^ have in common?
  • 10. linked open data in 2008
  • 11. linked open data in 2009
  • 12. linked open data in 2011
  • 18. querying RDB: select bookID, authorName from books, authors where books.aid = authors.aid and books.isbn = '006251587X'
  • 19. querying RDF: select ?authName ?authEmail where { <amazon:book#006251587X> <amazon:hasAuthor> <foaf:name#TimBerners-Lee> . <foaf:name#TimBerners-Lee> <foaf:name> ?authName . <foaf:name#TimBerners-Lee> <foaf:email> ?authEmail }
  • 20. globo.com developers before using web semantics
  • 21. globo.com developers while learning web semantics (?w ?t ?f)
  • 22. globo.com developers after using web semantics
  • 23. Sample hard to test code
  • 24. approach 1 # queries isolation
  • 25.
  • 26. approach 2 # data as object DAO
  • 27.
  • 28. Y U NO make SPARQL queries?!
  • 29. Y U NO make data access easy?!
  • 30. Y U NO make things testable?!
  • 32.
  • 33. fact 1: we don't have an out-of-box solution
  • 34. fact 2: but we do have some options
  • 35. some options #1: create a solution from scratch #2: study existing solutions and then [ ] contribute to them [ ] develop on top of them [ ] goto #1
  • 36. the final decision is not only ours
  • 37. but we chose to start from #2: study existing solutions and then (...)
  • 39. a few results from google ActiveRDF PyRdfa active-semantic pysparql Django4Store RDFAlchemy Django-RDF RdfLib Django-RDFAlchemy Redland Djubby semantic-django EasyRDF SPARQLWrapper Jena FuXi Sparrow Oort Sparta Pymantic SuRF
  • 40. click to know more ActiveRDF PyRdfa active-semantic pysparql Django4Store RDFAlchemy Django-RDF RdfLib Django-RDFAlchemy Redland Djubby semantic-django EasyRDF SPARQLWrapper Jena FuXi Sparrow Oort Sparta Pymantic SuRF
  • 41. {?project :by_author ?author . ?author :works_at :globocom . } ActiveRDF PyRdfa active-semantic pysparql Django4Store RDFAlchemy Django-RDF RdfLib Django-RDFAlchemy Redland Djubby semantic-django EasyRDF SPARQLWrapper Jena FuXi Sparrow Oort Sparta Pymantic SuRF
  • 42. {?project :use_language :python . } ActiveRDF PyRdfa active-semantic pysparql Django4Store RDFAlchemy Django-RDF RdfLib Django-RDFAlchemy Redland Djubby semantic-django EasyRDF SPARQLWrapper Jena FuXi Sparrow Oort Sparta Pymantic SuRF
  • 43. {?project :use_language :python ; :last_commit ?commit . FILTER (?commit >= "2011-12-01"^^xsd:date) } ActiveRDF PyRdfa active-semantic pysparql Django4Store RDFAlchemy Django-RDF RdfLib Django-RDFAlchemy Redland Djubby semantic-django EasyRDF SPARQLWrapper Jena FuXi Sparrow Oort Sparta Pymantic SuRF
  • 45. team filtering ActiveRDF PyRdfa active-semantic pysparql Django4Store RDFAlchemy Django-RDF RdfLib Django-RDFAlchemy Redland Djubby semantic-django EasyRDF SPARQLWrapper Jena FuXi Sparrow Oort Sparta Pymantic SuRF
  • 46. SPARQLWrapper problem: list all predicates of a class
      from SPARQLWrapper import SPARQLWrapper, JSON

      # List all predicates of dbonto:Band
      query = """
      SELECT distinct ?subject
      FROM <http://dbpedia.org>
      {
          ?subject rdfs:domain ?object .
          <http://dbpedia.org/ontology/Band> rdfs:subClassOf ?object
          OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0) ).
      }"""

      # http://live.dbpedia.org/sparql (also shown on the slide)
      sparql = SPARQLWrapper("http://dbpedia.org/sparql")
      sparql.setQuery(query)
      sparql.setReturnFormat(JSON)
      results = sparql.query().convert()

      for result in results["results"]["bindings"]:
          print(result["subject"]["value"])
  • 47. SPARQLWrapper: abstract endpoint, returns dict (same code as slide 46)
  • 48. SPARQLWrapper Ok, not different from what we have...
  • 49. SPARQLWrapper just a wrapper around a SPARQL server well tested ;)
  • 50. SPARQLWrapper problem: list all subjects given ?p ?o
      from SPARQLWrapper import SPARQLWrapper, JSON

      # List all instances (eg. bands) with genre Metal
      query = """
      PREFIX db: <http://dbpedia.org/resource/>
      PREFIX dbonto: <http://dbpedia.org/ontology/>
      SELECT DISTINCT ?who
      FROM <http://dbpedia.org>
      WHERE {
          ?who dbonto:genre db:Metal .
      }
      """

      sparql = SPARQLWrapper("http://dbpedia.org/sparql")
      sparql.setQuery(query)
      sparql.setReturnFormat(JSON)
      results = sparql.query().convert()

      for result in results["results"]["bindings"]:
          print(result["who"]["value"])
  • 51. RdfLib problem: list all subjects given ?p ?o
      import rdflib
      import rdfextras.store.SPARQL

      # SPARQL endpoint setup
      endpoint = "http://dbpedia.org/sparql"
      store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
      graph = rdflib.Graph(store)

      # Definitions
      genre = rdflib.URIRef("http://dbpedia.org/ontology/genre")
      metal = rdflib.URIRef("http://dbpedia.org/resource/Metal")

      # Query
      for label in graph.subjects(genre, metal):
          print(label)
  • 52. RdfLib: abstract endpoint, returns dict, namespace
      import rdflib
      import rdfextras.store.SPARQL

      # SPARQL endpoint setup
      endpoint = "http://dbpedia.org/sparql"
      store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
      graph = rdflib.Graph(store)

      # Namespaces to clear up definitions
      DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
      DB = rdflib.Namespace("http://dbpedia.org/resource/")

      # Query
      for label in graph.subjects(DBONTO.genre, DB.Metal):
          print(label)
  • 53. RdfLib: abstract endpoint, returns dict, namespace (same code as slide 52). Graph query methods: subjects, predicates, objects, subject_predicates, subject_objects, predicate_objects
  • 54. RdfLib: abstract endpoint, returns dict, namespace
      import rdflib
      import rdfextras.store.SPARQL

      # SPARQL endpoint setup
      endpoint = "http://dbpedia.org/sparql"
      store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
      graph = rdflib.Graph(store)

      # Namespaces to clear up definitions
      DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
      DB = rdflib.Namespace("http://dbpedia.org/resource/")

      # Using triples
      for musician, _, _ in graph.triples((None, DBONTO.genre, DB.Metal)):
          print(musician)
  • 55. RdfLib: abstract endpoint, returns dict, namespace, query by triples (same code as slide 52)
  • 56. RdfLib: abstract endpoint, returns dict, namespace, query by triples, add / remove
      import rdflib
      import rdfextras.store.SPARQL

      # n3 fixture file
      graph = rdflib.Graph()
      graph.parse("fixture_genre_metal.nt", format="nt")

      # Namespace
      DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
      DB = rdflib.Namespace("http://dbpedia.org/resource/")

      # Add nodes
      graph.add((DB.AndrewsMedina, DBONTO.genre, DB.Metal))
      graph.add((DB.Siminino, DBONTO.genre, DB.Metal))
      graph.add((DB.Herman, DBONTO.genre, DB.Metal))

      # Remove nodes
      graph.remove((DB.AndrewsMedina, DBONTO.genre, DB.Metal))
  • 57. RdfLib concentrates on providing the core RDF types and interfaces, through plugin interface
  • 58. RdfLib makes testing simple, allowing fixtures using n3 files, add triples and remove triples
  • 59. RdfLib but... each triple query requires a new connection to SPARQL
  • 60. RdfLib therefore too many accesses to the SPARQL endpoint
  • 61. RdfLib and... doesn't provide an ORM (object relational mapping)
  • 62. SuRF: abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM
      from surf import Store, Session, ns, query

      store = Store(reader='sparql_protocol',
                    endpoint='http://dbpedia.org/sparql')
      session = Session(store, {})
      session.enable_logging = False

      ns.register(db='http://dbpedia.org/resource/')
      ns.register(dbonto='http://dbpedia.org/ontology/')

      MusicalArtist = session.get_class(ns.DB['MusicalArtist'])
      artistas_metal = MusicalArtist.get_by(dbonto_genre=ns.DB["Metal"])
      print(artistas_metal)
  • 63. SuRF problem: list all subjects given ?p ?o (composed ORM queries)
      from surf import Store, Session, ns, query

      store = Store(reader='sparql_protocol',
                    endpoint='http://dbpedia.org/sparql')
      session = Session(store, {})

      ns.register(db='http://dbpedia.org/resource/')
      ns.register(dbonto='http://dbpedia.org/ontology/')

      query_surf = query.select("?who").distinct()
      query_surf.where(("?who", ns.DBONTO.genre, ns.DB.Metal))
      metal_bands = session.default_store.execute(query_surf)

      for band in metal_bands:
          print(band)
  • 64. SuRF various approaches: ORM, programmatically
  • 65. SuRF simple ORM no need to redeclare TTL definitions
  • 66. SuRF “complex” queries using lazy evaluation
  • 67. SuRF documentation & community
  • 70. RDFAlchemy problem: list all subjects given ?p ?o
      from rdfalchemy.sparql import SPARQLGraph
      from rdflib import Namespace

      endpoint = "http://dbpedia.org/sparql"
      graph = SPARQLGraph(endpoint)

      DB = Namespace("http://dbpedia.org/resource/")
      DBONTO = Namespace("http://dbpedia.org/ontology/")

      metal_bands = graph.subjects(predicate=DBONTO.genre, object=DB.Metal)
      for band in metal_bands:
          print(band)
  • 71. RDFAlchemy: abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM django-like
      from rdfalchemy.sparql import SPARQLGraph
      from rdfalchemy import rdfSubject, rdfSingle
      from rdflib import Namespace

      DB = Namespace('http://dbpedia.org/resource/')
      DBONTO = Namespace("http://dbpedia.org/ontology/")
      RDFS = Namespace('http://www.w3.org/2000/01/rdf-schema#')

      endpoint = "http://live.dbpedia.org/sparql"
      graph = SPARQLGraph(endpoint)
      rdfSubject.db = graph

      class MusicalArtist(rdfSubject):
          rdfs_label = rdfSingle(RDFS.label, 'label')
          genre = rdfSingle(DBONTO.genre, 'genre')

      metal_artists = MusicalArtist.filter_by(genre=DB.Metal)
      for band in metal_artists:
          print(band)
  • 72. RDFAlchemy django-like models
  • 73. RDFAlchemy simple syntax
  • 75. RDFAlchemy we have to declare all data already described in TTL files as python classes
  • 76. semantic-django: abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM django-like
      # Classes similar to django model's are created from TTL
      # files (using manage.py)
      class BaseLugar(BaseEntidade):
          latitude = models.UriField()
          longitude = models.UriField()
          geonameid = models.UriField()
          tem_mapa = models.UriField()
          apelido = models.UriField()
          ImagemMapa = models.UriField()
          genero_gramatical = models.UriField()

          class Meta:
              semantic_graph = 'http://semantica.globo.com/base/Lugar'
  • 78. semantic-django dream of many product developers
  • 80. study existing solutions, and now? [ ] contribute to them [ ] develop on top of them [ ] create a solution from scratch [ ] other, _________________
  • 81. grab your post-it, it's review time! =) =( comments shows no my SuRF query models favorite not my nice models lazy choice RDFAlchemy API name low RDFlib space layer django just semantic-django like started (...)
  • 82.
  • 84. casting by (click to know more about each meme)
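
A minimal sketch of the triple-pattern matching behind the slide 19 query and the RdfLib `graph.triples((None, ...))`, `add` and `remove` calls on slides 54 to 56. It assumes a toy in-memory store rather than a real SPARQL endpoint: `TinyGraph`, `author_contacts`, the `ex:author1` identifier and the example.org address are hypothetical names made up for illustration, not part of rdflib or of the original slides.

```python
class TinyGraph:
    """Toy triple store (illustrative only; this is NOT the rdflib API)."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(tuple(triple))

    def remove(self, triple):
        self._triples.discard(tuple(triple))

    def triples(self, pattern):
        """Yield stored triples matching pattern; None acts as a wildcard."""
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple


# Identifiers modelled on the slide 19 query; AUTHOR and the e-mail
# address are hypothetical placeholders.
BOOK = "amazon:book#006251587X"
HAS_AUTHOR = "amazon:hasAuthor"
NAME = "foaf:name"
EMAIL = "foaf:email"
AUTHOR = "ex:author1"

graph = TinyGraph()
graph.add((BOOK, HAS_AUTHOR, AUTHOR))
graph.add((AUTHOR, NAME, "Tim Berners-Lee"))
graph.add((AUTHOR, EMAIL, "author1@example.org"))


def author_contacts(graph, book):
    """Join three triple patterns, as the WHERE clause of the query does."""
    results = []
    for _, _, author in graph.triples((book, HAS_AUTHOR, None)):
        for _, _, name in graph.triples((author, NAME, None)):
            for _, _, email in graph.triples((author, EMAIL, None)):
                results.append((name, email))
    return results


contacts = author_contacts(graph, BOOK)
print(contacts)
```

Because the store is just an in-memory set, a test can add or remove triples directly and rerun the same query function, which is the fixture-style testing that slide 58 credits RdfLib with.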