The Social Data Web

This presentation is the culmination of my detail to the E-Government Office in the US Office of Management and Budget and the work I did to evolve and mature initiatives like recovery.gov and data.gov.

The Social Data Web: Presentation Transcript

  • transparency, collaboration and information sharing: solution architecture tools and techniques using the social data web (george thomas, 1105 ea2009)
  • agenda
    • An overview of Web Oriented Architecture (WOA) design principles that have made the Web the most successful distributed computing platform ever created will be given.
    • Technologies for exposing raw data and publishing semantically enriched structured data for persistence and syndication on the Web as public records will be described.
    • Technologies that enable interoperability across these published assets and currently disparate data sources to achieve low cost, large scale data federation will be described.
    • Widgets and services that consume and transform this data for interactive and integration purposes will be discussed in the context of different stakeholder views.
    • A Web-scale approach to Business Intelligence leveraging Cloud Computing approaches to data archive analysis will be described.
    • Finally, the applicability of the proposed solution architecture to the Federal Segment Architecture Methodology and tools like Visualization to Understand Expenditures in IT will be discussed.
  • Web Oriented Architecture (WOA)
    • REpresentational State Transfer (REST)
      • The architectural style of the World Wide Web
      • aka Resource Oriented Architecture (ROA)
    • hyperlinks dereference (information) resource representations
      • HTTP URIs and content negotiation
        • user agent prefers .htm, .xml, .rdf, .etc
    • statefulness
      • servers maintain resource state, clients maintain application state
    • RESTful Web services
      • HTTP uniform interface
        • CRUD analog to HTTP PUT/GET/POST/DELETE
      • contrast to Remote Procedure Call (RPC) style Web services
        • SOAP/WSDL, you design the methods to invoke
    • global visibility (the Web) and persistence (permalinks)
      • caching, crawling, indexing
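
The content negotiation idea above (one URI, several representations, chosen by what the user agent prefers) can be sketched in a few lines. This is a minimal illustration, not a full implementation of HTTP negotiation: the `AVAILABLE` table and both function names are assumptions made for the example, and unknown media types simply fall back to a default, where a real server might answer 406 instead.

```python
# Hypothetical table of representations a server holds for one resource.
AVAILABLE = {
    "text/html": ".htm",
    "application/xml": ".xml",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client gives no weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal weights keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, available=AVAILABLE, default="text/html"):
    """Return the media type to serve for this Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return default  # simplification: a real server might answer 406 here


# A Linked Data client asks for RDF first and accepts HTML as a fallback.
print(negotiate("application/rdf+xml, text/html;q=0.5"))
```

The same URI can then answer a browser with HTML and a crawler with RDF, which is what lets a hyperlink serve both people and programs.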
  • XForms - human data capture
    • Orbeon server side XForms engine, Ajax browser GUIs
      • catalog and builder apps
      • create new XSD bound forms
      • populate, persist, search
      • Tomcat and eXist
      • off-line capability
      • transformation pipeline
  • Atom Publishing Protocol (APP)
    • automated invocation of the RESTful Web service
      • HTTP PUT/POST the spreadsheet or XML instance doc
        • to atomserver.codehaus.org
    • where else is APP used?
      • Google Data APIs, Microsoft Live Framework
  • Atom Syndication Format
      • transform XForm or APP captured info into XHTML+RDFa
      • (permalinked) public recordset in feed entry <content>
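
As a rough sketch of the step above, the following builds one Atom entry whose <content> carries an XHTML fragment annotated with RDFa attributes. The record URI, the title, and the choice of the Dublin Core title property are placeholders for illustration, not values from the presentation, and the entry is a fragment (a complete feed would also need an author and a feed wrapper).

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

# Serialize Atom as the default namespace and XHTML with an explicit prefix.
ET.register_namespace("", ATOM)
ET.register_namespace("xhtml", XHTML)


def build_entry(entry_id, title, updated, subject_uri, prop, value):
    """Build an Atom entry whose content is an XHTML div carrying RDFa."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # Atom requires type="xhtml" content to hold a single XHTML div.
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "xhtml"})
    # RDFa: 'about' names the subject, 'property' names the predicate,
    # and the element text is the object literal.
    div = ET.SubElement(content, f"{{{XHTML}}}div", {"about": subject_uri})
    span = ET.SubElement(div, f"{{{XHTML}}}span", {"property": prop})
    span.text = value
    return entry


# All values below are hypothetical placeholders.
entry = build_entry(
    entry_id="http://example.org/records/1",
    title="Example public record",
    updated="2009-01-01T00:00:00Z",
    subject_uri="http://example.org/records/1",
    prop="http://purl.org/dc/terms/title",
    value="Example public record",
)
print(ET.tostring(entry, encoding="unicode"))
```

The serialized entry is what a client would PUT or POST to an APP collection; the permalink in `id` doubles as the RDFa subject, so the feed entry and the public record share one identifier.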
  • the london-gazette.co.uk
  • london-gazette.co.uk/listing: small, discrete, component ontology/data-domain-metamodels
  • web page = web service
  • RDFa enabled 'deep link' discovery
      • Rich Snippets from Google
      • SearchMonkey from Yahoo
  • goal: federated dataset correlation
    • graph based dynamic schema evolution across silos
      • centralization/normalization not required (or realistic/practical!)
  • Web as DB - Web API
      • Linking Open (Government) Data (LOD)
      • SPARQL endpoints
    linkeddata.org
  • browse: from web of docs to web of data
  • http://data.linkedmdb.org/page/actor/10
    • content negotiation: the user agent prefers
      • human (html) or machine (rdf/xml) readable
    RDF/N3
  • http://data.linkedmdb.org/page/actor/10
    • now at the bottom of the same page/actor/10
      • triple is Subject (S) Predicate (P) Object (O)
        • 10 (S) vocabulary: property (P) <object> (O)
      • properties link to other dataset instances
        • that use different datatype definitions
      • note D2R app, expose RDB as RDF, SPARQL to SQL
  • http://data.linkedmdb.org/data/actor/10
    • pattern: <subject> predicate {space} object1 , objectN ; (',' separates objects, ';' starts the next predicate) repeat until '.'
    • <http://data.linkedmdb.org/resource/actor/10>
    • foaf:page <http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e> , <http://www.imdb.com/name/nm0000564/> ;
    • owl:sameAs <http://mpii.de/yago/resource/Peter_O%27Toole> , <http://dbpedia.org/resource/Peter_O%27Toole> ;
    • rdf:type movie:actor ,
    • foaf:Person .
    • this is an 'N3' RDF serialization, instead of RDF/XML (or others)
    • some properties have RESTful SPARQL queries as <objects>
    • foaf:person rdfs:seeAlso <http://data.linkedmdb.org/sparql?query=DESCRIBE+<http://xmlns.com/foaf/0.1/Person>
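
To make the ',' and ';' shorthand above concrete, this sketch expands the grouped description of actor/10 into plain subject/predicate/object triples. It uses ordinary Python data rather than an RDF library, and the function names are illustrative; the URIs are the ones shown on the slide.

```python
SUBJECT = "http://data.linkedmdb.org/resource/actor/10"

# One subject, several predicates (';' in N3), several objects each (',' in N3).
DESCRIPTION = {
    "foaf:page": [
        "http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e",
        "http://www.imdb.com/name/nm0000564/",
    ],
    "owl:sameAs": [
        "http://mpii.de/yago/resource/Peter_O%27Toole",
        "http://dbpedia.org/resource/Peter_O%27Toole",
    ],
    "rdf:type": ["movie:actor", "foaf:Person"],
}


def expand(subject, description):
    """Flatten the grouped form into one (S, P, O) triple per object."""
    return [
        (subject, predicate, obj)
        for predicate, objects in description.items()
        for obj in objects
    ]


def same_as(triples, subject):
    """Follow owl:sameAs links from a subject into other datasets."""
    return [o for s, p, o in triples if s == subject and p == "owl:sameAs"]


triples = expand(SUBJECT, DESCRIPTION)
for triple in triples:
    print(triple)
print(same_as(triples, SUBJECT))
```

The owl:sameAs objects are the cross-dataset links: each one is a URI in another dataset that describes the same person.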
  • Web-based SPARQL query builder: http://dbpedia.org/ is powered by 'Virtuoso' from http://www.openlinksw.com, which provides a 'SPARQL endpoint' (a DRM 'query point')
  • creates dbpedia.org query
    • use response data in next query
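
A SPARQL endpoint is itself a RESTful resource: the query travels as a URL parameter, as in the rdfs:seeAlso link shown earlier. The sketch below only builds such URLs and sends nothing over the network. The helper names are assumptions, and the second step stands in for "use response data in next query" by pretending a URI came back from a first response.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_url(endpoint, query, **extra):
    """Encode a SPARQL query as a GET URL for the given endpoint."""
    params = {"query": query}
    params.update(extra)  # optional endpoint-specific parameters, if any
    return endpoint + "?" + urlencode(params)


def describe(uri):
    """Build a DESCRIBE query for one resource URI."""
    return f"DESCRIBE <{uri}>"


# First query: the form of the rdfs:seeAlso link on the linkedmdb page.
first = sparql_url(
    "http://data.linkedmdb.org/sparql",
    describe("http://xmlns.com/foaf/0.1/Person"),
)
print(first)

# Next query: reuse a URI found in an earlier response (here, the owl:sameAs
# link from the actor/10 description) against the DBpedia endpoint.
found_uri = "http://dbpedia.org/resource/Peter_O%27Toole"
second = sparql_url("http://dbpedia.org/sparql", describe(found_uri))
print(second)
```

Because each query is just a URL, query results can be bookmarked, cached, and linked to like any other Web resource.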
  • authoritative metadata - provided tags!!
      • using standardized datatype and property specifications
      • ontologies emerge from social folksonomy
    http://commontag.org
  • indexing/searching the Data Web
  • aggregation and live data reporting http://sig.ma
  • many to many set visualization: the http://mqlx.com/~david/parallax interface, used to aggregate data across multiple (data) 'bases' on http://freebase.com
  • ad-hoc analyst/end-user 'meshups'
  • schema/bizmo/federal_enterprise
    • bizmo.freebase.com = OMG BMM + CPIC (+SOA...)
      • Obama is an instance of the Federal Enterprise type
    • Federal Enterprise (S) Fed Ent Goal (P) Goal (O)
  • /rdf/bizmo.federal_enterprise (excerpt)
    • (W3C/FBase) <subject/topic> <predicate/property> <object/topic>
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.instance> <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_strategy>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_tactic>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_directive>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_information_technology_budget>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://www.w3.org/1999/xhtml/vocab#license> <http://creativecommons.org/licenses/by/3.0/>.
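
Each statement in the excerpt above is one line of the form subject, predicate, object, period, so it can be read with very little machinery. The parser below is a deliberately small sketch: it handles URI objects and plain, language-tagged, or datatyped literals as they appear in the excerpt, and it does not cover blank nodes or decode escape sequences. The function and pattern names are illustrative.

```python
import re

# <subject> <predicate> then either <uri> or "literal" with optional
# @lang or ^^<datatype>, followed by the terminating '.'
TRIPLE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:@([A-Za-z-]+)|\^\^<([^>]*)>)?)'
    r'\s*\.\s*$'
)


def parse_line(line):
    """Return (subject, predicate, object) for one simple triple line."""
    m = TRIPLE.match(line)
    if m is None:
        raise ValueError(f"not a simple triple: {line!r}")
    s, p, o_uri, literal, lang, datatype = m.groups()
    if o_uri is not None:
        obj = ("uri", o_uri)
    else:
        obj = ("literal", literal, lang, datatype)
    return s, p, obj


FE = "http://rdf.freebase.com/ns/base.bizmo.federal_enterprise"
lines = [
    f'<{FE}> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.',
    f'<{FE}> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> '
    '"1"^^<http://www.w3.org/2001/XMLSchema#long>.',
    f'<{FE}> <http://rdf.freebase.com/ns/type.type.instance> '
    '<http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.',
]
for line in lines:
    print(parse_line(line))
```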
  • connecting the data dots:
    • create the following subject/predicate/object or topic/property/topic schema :
    • Goal / amplifies / Vision
    • Objective / quantifies / Goal
    • Federal Enterprise / (has) Fed Ent Goal / (of type) Goal
    • Federal Agency / maintains / Exhibit 53
    • Exhibit 53 / contains (multiple) / Exhibit 53 Recordset(s)
    • Exhibit 53 Recordset / Supports Federal Goal / (of type) Goal
    • then create instances with data from http://it.usaspending.gov:
    • Obama / is of type / Federal Enterprise
    • Obama / has a Fed Ent Goal / Health Care Reform
    • HHS / is of type / Federal Agency
    • HHS / maintains / HHS Exhibit 53
    • HHS Exhibit 53 / contains / Nat Health Info Network Connect
    • Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
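
The instance statements above are already a small graph. As a sketch of how such a graph can be queried, the code below stores them as plain tuples and matches patterns with wildcards, then chains three matches to walk from an agency to the goals its investments support. It stands in for what Freebase or a SPARQL store would do; the function names are assumptions made for the example.

```python
# The instance triples from the slide, as (subject, predicate, object).
TRIPLES = [
    ("Obama", "is of type", "Federal Enterprise"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def goals_supported_by(agency, triples=TRIPLES):
    """Walk agency -> Exhibit 53 -> recordset -> goal."""
    goals = []
    for _, _, exhibit in match(triples, s=agency, p="maintains"):
        for _, _, recordset in match(triples, s=exhibit, p="contains"):
            for _, _, goal in match(triples, s=recordset, p="supports Obama Goal"):
                goals.append(goal)
    return goals


print(goals_supported_by("HHS"))
# Everything that points at the goal, whatever the predicate:
print(match(TRIPLES, o="Health Care Reform"))
```

No table schema was declared up front: adding a new kind of relationship only means adding triples with a new predicate.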
  • search all 'bases' for 'Exhibit 53': http://mqlx.com/~david/parallax interface to http://bizmo.freebase.com
  • base/bizmo/e53 returns
    • a collection (2 instances) of an Exhibit 53 topic
      • one each from HHS and GSA (data from it.usaspending.gov)
    • triple in Exhibit 53 topic schema
      • Exhibit 53 (S) contains (P) Exhibit 53 Recordset (O)
  • discovering unknown data structures
    • the power of 'faceted' search and browsing
    • interactive query – which of these?
      • Ex53 Recordset (S) Supports Federal Goal (P) ? (O)
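
Faceted browsing amounts to two operations: count the distinct values of a property across the current result set, then narrow the set by picking one. The sketch below shows both on a toy collection. Only the first record name comes from the slides; the two "placeholder" records and the field names are invented purely so there is something to filter.

```python
from collections import Counter

# Toy recordsets; the placeholder entries are hypothetical, not real data.
RECORDSETS = [
    {"name": "Nat Health Info Network Connect", "agency": "HHS",
     "supports_goal": "Health Care Reform"},
    {"name": "placeholder recordset A", "agency": "HHS", "supports_goal": None},
    {"name": "placeholder recordset B", "agency": "GSA", "supports_goal": None},
]


def facet_counts(records, field):
    """Count how many records carry each value of one property (a facet)."""
    return Counter(r.get(field) for r in records)


def refine(records, **selected):
    """Keep only records whose properties equal every selected facet value."""
    return [r for r in records
            if all(r.get(k) == v for k, v in selected.items())]


print(facet_counts(RECORDSETS, "agency"))
hits = refine(RECORDSETS, supports_goal="Health Care Reform")
print([r["name"] for r in hits])
```

The user never writes a query: the facet counts show which values exist, which is how an unfamiliar data structure can be explored.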
  • traversing the data graph
    • from info about an IT investment
    • to info about Administration priorities
    • 2 Ex53s to 3 Recordsets to 1 that has Obama Goal
      • <uri> (S) <uri> (P) <uri> (O)
  • http://freemix.it - more faceted filtering
  • scatter chart driven by tag clouds
  • more multi-dataset faceted meshups
  • drag & drop metadata/data 'curation'
  • publish new freemix merged dataset: choose a stylesheet, view lenses and facets to include for your end users to interact with
  • crowdsourced analytics, shown using 'TopBraid Composer Maestro' from http://topquadrant.com with a 'SPARQLMotion' script; also see Yahoo Pipes (http://pipes.yahoo.com) and DERI Pipes (http://pipes.deri.org)
  • cloud scale analytics (petabyte batch)
    • proprietary Google
      • GFS, BigTable and MapReduce
      • page rank impl
    • open source Apache Hadoop
      • HDFS, HBase and MapReduce
      • entity, RDFa extraction
    • Amazon EMR, Cloudera
      • commercial open source software (COSS) professional service providers
    facebook.com
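
The MapReduce model named above has three phases: map each record to key/value pairs, group the pairs by key, and reduce each group to a result. The single-process toy below shows those phases counting predicate usage across triples. It illustrates the programming model only; Hadoop and similar systems distribute the same phases across many machines, and all names and sample triples here are illustrative.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def predicate_mapper(triple):
    """Emit (predicate, 1) for each triple."""
    _, predicate, _ = triple
    yield (predicate, 1)


def count_reducer(key, values):
    return sum(values)


# Sample input: hypothetical extracted triples.
extracted = [
    ("actor/10", "rdf:type", "movie:actor"),
    ("actor/10", "rdf:type", "foaf:Person"),
    ("actor/10", "owl:sameAs", "dbpedia:Peter_O'Toole"),
]
counts = reduce_phase(shuffle(map_phase(extracted, predicate_mapper)), count_reducer)
print(counts)
```

Because the mapper sees one record at a time and the reducer sees one key at a time, the work can be split across as many machines as the archive requires.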
  • talis.com/platform - cloud graph store
    • Software as a Service, enabling rapid development with zero deployment costs
    • a simple, consistent web API for storing, managing and retrieving both structured and unstructured data
    • flexible, schema-free metadata that allows applications to be easily evolved
    • a range of data access and query options enabling easy integration into both new and existing applications
    • access control options to support hosting of both public and private data
    • a data hosting solution that is founded on open internet standards and web architectural best practices
    • ...
    • every resource in your (data)store has a unique URL from which its metadata can be retrieved with a single web request
    • SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF
    • content negotiation can be used to retrieve data as RDF, XML, or JSON, allowing you to choose the right format for your application
  • application to EA discipline
    • getting there from here
      • stop:
        • publishing / analyzing / visualizing unstructured data
        • using structured data only in file or message exchanges
      • start:
        • align Gov and Web architecture (including EA KBs!)
        • publish component ontologies on the Web
        • and begin linking their metadata and data
        • using the Social Data Web
      • continue:
        • embrace emergent structure and continuous improvement
        • using open source and enabling long-tail crowd-sourcing
  • q&a - discussion
    • thanks for your time and attention!
    • contact me
      • http://xri.net/=george.thomas
      • GSA OCIO Chief Enterprise Architect
      • FCIOC-AIC Services Subcommittee Chair
      • W3C eGov IG invited expert
      • OMG GovDTF Steering Committee
      • Graduate School Faculty SOA Instructor