SuRF – Tapping into the Web of Data
Cosmin Basca
Digital Enterprise Research Institute, Galway
cosmin.basca@gmail.com

Special thanks to: Benjamin Heitman and Uldis Bojars
Digital Enterprise Research Institute, Galway
firstname.lastname@deri.org

Outline
• About DERI
• Why Semantic Web?
  - Linked Open Data (LOD)
  - RDF (Resource Description Framework)
  - SPARQL
  - O-RDF Mapping (ActiveRDF / SuRF)
• How?
  - Architecture
  - Installation
• Examples
  - Simple: access DBpedia (Semantic Wikipedia)
  - More complex: create a blog on top of RDF

DERI – http://www.deri.ie/
Digital Enterprise Research Institute (DERI): http://www.deri.ie/
• Main goal: enabling networked knowledge
• Research about the future of the Web
• Biggest Semantic Web research institute in the world
• 120 people
• Part of the National University of Ireland, Galway

Why?
Develop Web applications that allow for:
• Data integration
• Flexibility
  - Schema definition and modeling
  - Schema evolution
• Robustness
• Support for new data sources and types

There is a wealth of (RDF) data out there

Popular Semantic Web Vocabularies
• FOAF: for describing people and the social network connections between them
  http://xmlns.com/foaf/spec/
• SIOC: for describing Social Web content created by people
  http://sioc-project.org/
• DOAP: for describing software projects (used by PyPI)
  http://trac.usefulinc.com/doap

Linked Open Data – Growth
[three figure slides; images not preserved]

The data model
• Traditional approach: use the relational model
• Usually leads to big, ugly schemas

The RDF (Graph) Data model
• Flexible
  - Supports both schema and data evolution at runtime
• Simple model
  - Relations are represented explicitly
  - The schema is itself a graph
• Can integrate data: integration is the union of two graphs

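The claim that integration is "the union of two graphs" can be made concrete with a minimal sketch, assuming a graph is modeled as a Python set of (subject, predicate, object) tuples. The prefixed names such as `ex:eric` are hypothetical stand-ins for full URIs, and the sketch ignores blank nodes, which a real RDF merge has to rename apart.

```python
# Hypothetical illustration: an RDF graph as a set of (subject, predicate, object)
# tuples. Prefixed strings like "ex:eric" stand in for full URIs.

graph_a = {
    ("ex:eric", "rdf:type", "contact:Person"),
    ("ex:eric", "contact:fullName", "Eric Miller"),
}

graph_b = {
    ("ex:eric", "contact:fullName", "Eric Miller"),      # also present in graph_a
    ("ex:eric", "contact:mailbox", "mailto:em@w3.org"),  # new information
}

# Integrating the two sources is a set union: shared triples collapse into one,
# new triples are simply added, and no schema migration is required.
merged = graph_a | graph_b

for triple in sorted(merged):
    print(triple)
```

The same operation works no matter which predicates each source uses, which is the flexibility the slide contrasts with relational schemas.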
The RDF (Graph) Data model
A triple links a subject to an object through a predicate: (Subject, Predicate, Object)
Example: (Eric, is a, Person)

Example RDF graph describing Eric Miller (RDF Primer) – human-readable format
• Eric is a Person
• Eric has full name "Eric Miller"
• Eric has e-mail em@w3.org
• Eric has personal title "Dr."

Example RDF graph describing Eric Miller (RDF Primer) – machine-readable format
Subject: http://www.w3.org/People/EM/contact#me
• http://www.w3.org/1999/02/22-rdf-syntax-ns#type → http://www.w3.org/2000/10/swap/pim/contact#Person
• http://www.w3.org/2000/10/swap/pim/contact#fullName → "Eric Miller"
• http://www.w3.org/2000/10/swap/pim/contact#mailbox → mailto:em@w3.org
• http://www.w3.org/2000/10/swap/pim/contact#personalTitle → "Dr."

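To connect the two views of the same graph, here is a minimal sketch that writes the Eric Miller triples out in the N-Triples syntax by hand. The `URI` class and the `term`/`to_ntriples` helpers are hypothetical, introduced only for this illustration; a real application would use an RDF library, and this sketch only escapes backslashes and double quotes in literals.

```python
# Sketch: the Eric Miller graph (RDF Primer) as Python tuples, serialized to
# N-Triples by hand. URI is a hypothetical marker class, not a library type.

class URI(str):
    """Marks a string as a URI reference rather than a plain literal."""


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
ME = URI("http://www.w3.org/People/EM/contact#me")

triples = [
    (ME, URI(RDF + "type"), URI(CONTACT + "Person")),
    (ME, URI(CONTACT + "fullName"), "Eric Miller"),
    (ME, URI(CONTACT + "mailbox"), URI("mailto:em@w3.org")),
    (ME, URI(CONTACT + "personalTitle"), "Dr."),
]


def term(value):
    """Render one term: <...> for URIs, "..." for plain literals."""
    if isinstance(value, URI):
        return "<%s>" % value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def to_ntriples(triples):
    """One line per triple: subject predicate object, terminated by ' .'."""
    return "\n".join("%s %s %s ." % tuple(term(t) for t in triple)
                     for triple in triples)


print(to_ntriples(triples))
```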
The RDF (Graph) Data model – Identification
• URIs provide strong references
  - A URIref is an unambiguous pointer to something of meaning
• Nodes ("subjects") connect via links ("predicates") to objects
  - Objects can be nodes or literals (plain or typed strings)

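SPARQL – Querying the Semantic Web
• SPARQL is to RDF what SQL is to relational tables
• Expressive, designed with the graph data model in mind
Example graph:
• Carrie Fisher starred_in Star Wars
• Harrison Ford starred_in Star Wars
• Harrison Ford starred_in Blade Runner
• Daryl Hannah starred_in Blade Runner
Query (all actor/movie pairs):

    # ex: is a hypothetical namespace; SPARQL predicates must be URIs or prefixed names
    PREFIX ex: <http://example.org/>
    SELECT ?actor ?movie
    WHERE { ?actor ex:starred_in ?movie }
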
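What the query computes can be sketched as matching a single triple pattern against the movie graph from the slide. This is a toy evaluator written for illustration (the `match` and `select` helpers are hypothetical, and the namespace prefix is dropped for brevity); real SPARQL engines also join several patterns, filter, and order results.

```python
# Sketch of evaluating "SELECT ?actor ?movie WHERE { ?actor starred_in ?movie }"
# over the slide's graph. Terms starting with "?" are variables; all other
# terms must match exactly.

graph = [
    ("Carrie Fisher", "starred_in", "Star Wars"),
    ("Harrison Ford", "starred_in", "Star Wars"),
    ("Harrison Ford", "starred_in", "Blade Runner"),
    ("Daryl Hannah", "starred_in", "Blade Runner"),
]


def match(pattern, graph):
    """Yield one {variable: value} binding per triple that fits the pattern."""
    for triple in graph:
        binding = {}
        for p, value in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, value) != value:
                    break  # the same variable would be bound to two values
            elif p != value:
                break      # a constant term does not match
        else:
            yield binding


def select(variables, pattern, graph):
    """Project each binding onto the requested variables, like SELECT."""
    return [tuple(b[v] for v in variables) for b in match(pattern, graph)]


rows = select(["?actor", "?movie"], ("?actor", "starred_in", "?movie"), graph)
for actor, movie in rows:
    print(actor, "->", movie)
```

Fixing one position narrows the result, for example `select(["?actor"], ("?actor", "starred_in", "Star Wars"), graph)` returns only the two Star Wars actors.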
Levels of Data abstraction
• Direct SPARQL access
• O-RDF mapper (SuRF)

O-RDF Mapper, Why?
• Clean OO design
• Increased productivity
  - The model is free from persistence constraints
  - Separation of concerns and specialization
• ORMs often reduce the amount of code that needs to be written, making the software more robust
  - 20% to 30% less code needs to be written
  - Less code means less testing and fewer errors

O-RDF Mapper, How?
• How do we see RDF data?
  - As a set of triples?
  - As a set of resources?
  - The resource view is more suitable for the OO model
• How do we define an RDF resource?
  - All triples <S,P,O> with the same subject (ActiveRDF, SuRF)
  - And all triples <O,P,S>, i.e. those in which the resource is the object (SuRF)
  - Apply Open World principles

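The resource definition above can be sketched as a function that collects, for one URI, the triples where it is the subject (outgoing, <S,P,O>) and the triples where it is the object (incoming, <O,P,S>). The data and the `resource_view` helper are hypothetical; this illustrates the idea, not SuRF's internals.

```python
# Sketch of the "resource view": direct triples (resource as subject) plus
# inverse triples (resource as object). All names and data are hypothetical.
from collections import defaultdict

TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:name", "Bob"),
]


def resource_view(uri, triples):
    """Return (direct, inverse) maps of predicate -> list of related values."""
    direct = defaultdict(list)   # <S,P,O>: predicate -> objects
    inverse = defaultdict(list)  # <O,P,S>: predicate -> subjects
    for s, p, o in triples:
        if s == uri:
            direct[p].append(o)
        if o == uri:
            inverse[p].append(s)
    return dict(direct), dict(inverse)


direct, inverse = resource_view("ex:alice", TRIPLES)
print(direct)   # what alice points to
print(inverse)  # who points at alice

# Open world: a missing predicate means "unknown", not "false",
# so it reads as an empty list rather than an error.
print(direct.get("foaf:mbox", []))
```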
SuRF – Semantic Resource Framework
• Inspired by ActiveRDF, developed at DERI for Ruby
• Exposes RDF as sets of resources
• Semantic attributes are exposed as a "virtual API", generated through introspection
  - Naming convention: instance.namespace_attribute
  - Example: cosmin.foaf_knows
• Finder methods
  - Retrieve resources by type or by attributes
• The session keeps track of resources; when session.commit() is called, only dirty (modified) resources are persisted

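Two of the ideas on this slide, the namespace_attribute naming convention and commit-only-dirty sessions, can be illustrated with a small sketch. `ToyResource`, `ToySession`, `predicate_for` and the namespace table are hypothetical names invented here; this is not SuRF's actual implementation, and `written` merely stands in for triples sent to a store.

```python
# Hypothetical sketch (not SuRF's code): virtual attributes named
# namespace_attribute map to predicate URIs, and a session persists
# only resources that were modified since the last commit.

NAMESPACES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def predicate_for(name):
    """Map a virtual attribute such as 'foaf_knows' to a predicate URI."""
    prefix, _, local = name.partition("_")
    if prefix not in NAMESPACES or not local:
        raise AttributeError(name)
    return NAMESPACES[prefix] + local


class ToyResource:
    def __init__(self, uri):
        # object.__setattr__ bypasses our own __setattr__ for bookkeeping fields.
        object.__setattr__(self, "uri", uri)
        object.__setattr__(self, "values", {})  # predicate URI -> list of objects
        object.__setattr__(self, "dirty", False)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for virtual attributes.
        # Open world: unknown values come back as an empty list.
        return self.values.get(predicate_for(name), [])

    def __setattr__(self, name, value):
        self.values[predicate_for(name)] = list(value)
        object.__setattr__(self, "dirty", True)


class ToySession:
    def __init__(self):
        self.resources = []
        self.written = []  # stands in for triples sent to an RDF store

    def resource(self, uri):
        res = ToyResource(uri)
        self.resources.append(res)
        return res

    def commit(self):
        for res in self.resources:
            if res.dirty:
                for predicate, objects in res.values.items():
                    self.written.extend((res.uri, predicate, o) for o in objects)
                object.__setattr__(res, "dirty", False)


session = ToySession()
alice = session.resource("http://example.org/alice")
bob = session.resource("http://example.org/bob")
alice.foaf_knows = ["http://example.org/bob"]
session.commit()       # only alice is dirty, so only her triple is written
print(session.written)
print(bob.foaf_name)   # nothing known yet
```

Using `__getattr__` keeps the class free of declared fields, which is one plausible way a "virtual API" can follow whatever predicates the data contains.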
SuRF – Architecture
[architecture diagram not preserved]

SuRF – Architecture – Currently supported plugins
• To add your own plugins, extend:
  - surf.store.plugins.RDFReader
  - surf.store.plugins.RDFWriter
• Redefine the __type__ attribute; this is the plugin identifier
• To install a plugin, import it:

    import my_plugin

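One way an "import it to install it" plugin scheme can work is a registry filled when a subclass is defined, keyed by its `__type__` identifier. The sketch below uses Python's `__init_subclass__` hook; `ReaderPlugin`, `READERS`, `MyReader` and `make_reader` are hypothetical names, and SuRF's real registration mechanism may differ.

```python
# Hypothetical sketch of a plugin registry keyed by __type__. Defining a
# subclass registers it, so importing the module that defines it is enough.
# This is NOT SuRF's actual mechanism.

READERS = {}


class ReaderPlugin:
    __type__ = None  # plugin identifier; subclasses override it

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.__type__ is not None:
            READERS[cls.__type__] = cls  # runs when the class body is executed

    def __init__(self, **options):
        self.options = options


class MyReader(ReaderPlugin):  # would normally live in my_plugin.py
    __type__ = "my-reader"


def make_reader(reader, **options):
    """Look up a reader by its identifier and instantiate it."""
    cls = READERS.get(reader)
    if cls is None:
        raise ValueError("no reader plugin registered as %r" % reader)
    return cls(**options)


reader = make_reader("my-reader", endpoint="http://example.org/sparql")
print(type(reader).__name__, reader.options)
```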
SuRF – Installation
• Available on PyPI:

    easy_install -U surf    (-U gets the latest version)

• Open source, hosted on Google Code under the BSD licence:
  http://code.google.com/p/surfrdf/

SuRF – Simple example
DBpedia public SPARQL endpoint (read-only)
Create the store proxy:

    from surf import *

    store = Store(reader='sparql-protocol',
                  endpoint='http://dbpedia.org/sparql',
                  default_graph='http://dbpedia.org')

Create the SuRF session:

    print('Create the session')
    session = Session(store, {})

Map a DBpedia concept to an internal class:

    PhilCollinsAlbums = session.get_class(ns.YAGO['PhilCollinsAlbums'])

SuRF – Simple example
DBpedia public SPARQL endpoint (read-only)
Get all Phil Collins albums:

    all_albums = PhilCollinsAlbums.all()

Do something with the albums (display the links to their covers):

    print('All covers')
    for a in all_albums:
        # skip albums that have no name
        if a.dbpedia_name:
            print('\tCover %s for "%s"' % (a.dbpedia_cover, a.dbpedia_name))

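Under the hood, a reader configured as 'sparql-protocol' has to send SPARQL over HTTP. The sketch below builds such a request URL with the standard library only and makes no network call. The query text is our own example of "all resources of a type", not necessarily what SuRF generates, and the class URI assumes DBpedia's YAGO classes live under http://dbpedia.org/class/yago/. The `query` and `default-graph-uri` parameter names come from the SPARQL protocol.

```python
# Sketch of a SPARQL-protocol GET request URL (no network access here).
# ALBUM_CLASS is an assumed URI for the YAGO PhilCollinsAlbums class.
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"
GRAPH = "http://dbpedia.org"
ALBUM_CLASS = "http://dbpedia.org/class/yago/PhilCollinsAlbums"  # assumption


def all_instances_query(class_uri):
    """A SPARQL query for every resource of the given type."""
    return "SELECT DISTINCT ?s WHERE { ?s a <%s> }" % class_uri


def sparql_get_url(endpoint, query, default_graph=None):
    # The SPARQL protocol passes the query, and optionally the default graph,
    # as the URL parameters 'query' and 'default-graph-uri'.
    params = [("query", query)]
    if default_graph:
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(ENDPOINT, all_instances_query(ALBUM_CLASS), GRAPH)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) requires network access, and the endpoint decides the result format unless one is requested.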
SuRF – Integrate into Pylons
Create a blog on top of an RDF database, replacing SQLAlchemy with SuRF:
• Download and install either AllegroGraph Free Edition (preferred) or Sesame2
  http://www.franz.com/downloads/clp/ag_survey
  AllegroGraph Free Edition is free for up to 50,000,000 triples (records)
• Install Pylons:

    easy_install pylons

• Install SuRF:

    easy_install surf

• Create a Pylons application:

    paster create -t pylons MyBlog
    cd MyBlog

SuRF – Pylons Blog
In ~/MyBlog/development.ini, add to the [app:main] section:

    [app:main]
    rdf_store = localhost
    rdf_store_port = 6789
    rdf_repository = tagbuilder
    rdf_catalog = repositories

In ~/MyBlog/myblog/config/environment.py:

    from surf import *

    from myblog.model import init_model

    # config holds the settings loaded from development.ini
    rdf_store = Store(reader='sparql-sesame2-api',
                      writer='sesame2-api',
                      server=config['rdf_store'],
                      port=config['rdf_store_port'],
                      catalog=config['rdf_catalog'],
                      repository=config['rdf_repository'])
    rdf_session = Session(rdf_store, {})
    # hand the session to the model so that model.Blog gets defined
    init_model(rdf_session)

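A detail worth keeping in mind with these settings is that values read from an .ini file arrive as strings. The sketch below reads the same [app:main] keys with the standard library's `configparser` to make that visible. Pylons itself loads development.ini through Paste Deploy, so this only imitates that step; the `store_settings` dictionary and the `int()` conversion are our own additions, and the slide does not say whether the store plugin expects an integer port.

```python
# Sketch: reading the slide's [app:main] settings with configparser to show
# that every value is a string. store_settings is a hypothetical name.
from configparser import ConfigParser

INI = """
[app:main]
rdf_store = localhost
rdf_store_port = 6789
rdf_repository = tagbuilder
rdf_catalog = repositories
"""

parser = ConfigParser()
parser.read_string(INI)
config = parser["app:main"]

store_settings = {
    "server": config["rdf_store"],
    "port": int(config["rdf_store_port"]),  # ini values are strings
    "catalog": config["rdf_catalog"],
    "repository": config["rdf_repository"],
}
print(store_settings)
```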
SuRF – Pylons Blog
~/MyBlog/myblog/model/__init__.py:

    from surf import *

    def init_model(session):
        # Blog must be global so that controllers can use model.Blog
        global rdf_session, Blog
        rdf_session = session
        # register a namespace for the concepts in my blog
        ns.register(myblog='http://example.url/myblog/namespace#')
        Blog = rdf_session.get_class(ns.MYBLOG['Blog'])

Create the blog controller:

    paster controller blog

~/MyBlog/myblog/controllers/blog.py:

    import logging

    from myblog.lib.base import *

    log = logging.getLogger(__name__)

    class BlogController(BaseController):

        def index(self):
            # load blog posts and pass them to the template via c
            c.posts = model.Blog.all(0, 5)
            return render("/blog/index.html")

SuRF – Pylons Blog
Create the template directory:

    mkdir ~/MyBlog/myblog/templates/blog

~/MyBlog/myblog/templates/blog/index.html:

    <%inherit file="site.html" />
    <%def name="title()">MyBlog Home</%def>

    <p>${len(c.posts)} new blog posts!</p>
    % for post in c.posts:
    <p class="content" style="border-style:solid;border-width:1px">
        <span class="h3"> ${post.myblog_title} </span>
        <span class="h4">Posted on: ${post.myblog_date} by ${post.myblog_author}</span>
        <br>
        ${post.myblog_content}
    </p>
    % endfor

~/MyBlog/myblog/templates/blog/site.html (the base template that index.html inherits from)

Start the built-in development server:

    paster serve --reload development.ini

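To see what the Mako template produces, here is a plain-Python sketch of the same markup using only the standard library. `Post` is a hypothetical stand-in for the SuRF Blog resources that carries just the `myblog_*` attributes the template reads. The sketch escapes every value with `html.escape` (our choice) and leaves out the site.html layout.

```python
# Sketch: the index.html output rebuilt in plain Python. Post is a
# hypothetical stand-in for SuRF Blog resources.
from collections import namedtuple
from html import escape

Post = namedtuple("Post", "myblog_title myblog_date myblog_author myblog_content")


def render_index(posts):
    parts = ["<p>%d new blog posts!</p>" % len(posts)]
    for post in posts:
        # fields are escaped in order: title, date, author, content
        parts.append(
            '<p class="content" style="border-style:solid;border-width:1px">'
            '<span class="h3">%s</span>'
            '<span class="h4">Posted on: %s by %s</span><br>%s</p>'
            % tuple(escape(str(v)) for v in post))
    return "\n".join(parts)


page = render_index([Post("Hello RDF", "2009-07-01", "cosmin", "<b>First post</b>")])
print(page)
```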
SuRF – Tapping into the Web of Data
• Can tap into the Web of Data:
  - SPARQL endpoints
  - Local or remote RDF stores
• Plugin framework allows more access protocols to be defined
• Code is generated dynamically (a pragmatic, bottom-up approach):
  - Introspection and meta-programming expose a virtual API, defined by the data and the schema, to the developer
• Can easily be integrated into popular Python frameworks such as Pylons

exit()
cosmin.basca@deri.org
http://code.google.com/p/surfrdf/
easy_install -U surf
