The Social Data Web

    The Social Data Web - Presentation Transcript

    1. transparency, collaboration and information sharing: solution architecture tools and techniques using the social data web (george thomas, 1105 ea2009)
    2. agenda
      • An overview of Web Oriented Architecture (WOA) design principles that have made the Web the most successful distributed computing platform ever created will be given.
      • Technologies for exposing raw data and publishing semantically enriched structured data for persistence and syndication on the Web as public records will be described.
      • Technologies that enable interoperability across these published assets and currently disparate data sources to achieve low-cost, large-scale data federation will be described.
      • Widgets and services that consume and transform this data for interactive and integration purposes will be discussed in the context of different stakeholder views.
      • A Web-scale approach to Business Intelligence leveraging Cloud Computing approaches to data archive analysis will be described.
      • Finally, the applicability of the proposed solution architecture to the Federal Segment Architecture Methodology and tools like Visualization to Understand Expenditures in IT will be discussed.
    3. agenda
    4. Web Oriented Architecture (WOA)
      • REpresentational State Transfer (REST)
        • The architectural style of the World Wide Web
        • aka Resource Oriented Architecture (ROA)
      • hyperlinks dereference (information) resource representations
        • HTTP URIs and content negotiation
          • user agent prefers .htm, .xml, .rdf, .etc
      • statefulness
        • servers maintain resource state, clients maintain application state
      • RESTful Web services
        • HTTP uniform interface
          • CRUD analog: HTTP POST/GET/PUT/DELETE
        • contrast to Remote Procedure Call (RPC) style Web services
          • SOAP/WSDL, you design the methods to invoke
      • global visibility (the Web) and persistence (permalinks)
        • caching, crawling, indexing
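The uniform interface on this slide can be sketched as a toy in-memory server. This is a minimal illustration, not a real HTTP stack: the `ResourceStore` class, its `handle` method and the `/records` path are hypothetical names, and the status codes follow common HTTP practice rather than anything stated in the deck.

```python
from typing import Dict, Optional, Tuple


class ResourceStore:
    """Toy server: it owns resource state, keyed by URI path (hypothetical)."""

    def __init__(self) -> None:
        self.resources: Dict[str, str] = {}
        self._next_id = 1

    def handle(self, method: str, path: str,
               body: Optional[str] = None) -> Tuple[int, str]:
        """Dispatch on the fixed HTTP method set; return (status, body)."""
        if method == "GET":        # Read a representation
            if path in self.resources:
                return 200, self.resources[path]
            return 404, ""
        if method == "PUT":        # Create or replace at a client-chosen URI
            status = 200 if path in self.resources else 201
            self.resources[path] = body or ""
            return status, ""
        if method == "POST":       # Create a member; the server picks the URI
            new_path = f"{path.rstrip('/')}/{self._next_id}"
            self._next_id += 1
            self.resources[new_path] = body or ""
            return 201, new_path
        if method == "DELETE":     # Remove the resource
            if self.resources.pop(path, None) is None:
                return 404, ""
            return 204, ""
        return 405, ""             # Anything else is outside the interface


store = ResourceStore()
status, location = store.handle("POST", "/records", "first record")
```

Unlike an RPC-style service, no method names are designed per application: every resource answers the same four verbs, and only the URIs and representations vary.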
    5. agenda
    6. XForms - human data capture
      • Orbeon server-side XForms engine, Ajax browser GUIs
        • catalog and builder apps
        • create new XSD bound forms
        • populate, persist, search
        • Tomcat and eXist
        • off-line capability
        • transformation pipeline
    7. Atom Publishing Protocol (APP)
      • automated invocation of the RESTful Web service
        • HTTP PUT/POST the spreadsheet or XML instance doc
          • to atomserver.codehaus.org
      • where else is APP used?
        • Google Data APIs, Microsoft Live Framework
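A rough sketch of the automated invocation: wrap an XML instance document in an Atom entry and prepare the HTTP POST that would add it to a collection. The collection path, the `<record>` payload and the helper names are assumptions made for illustration; only the host name comes from the slide, and the request is built but never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def build_entry(title: str, record_xml: str) -> bytes:
    """Wrap an XML instance document in an Atom entry."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content",
                            {"type": "application/xml"})
    content.append(ET.fromstring(record_xml))
    return ET.tostring(entry, encoding="utf-8")


def build_post(collection_url: str, entry: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST that creates a collection member."""
    return urllib.request.Request(
        collection_url,
        data=entry,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


# Hypothetical collection path on the host named in the slide.
request = build_post(
    "http://atomserver.codehaus.org/collections/records",
    build_entry("Example record", "<record><amount>100</amount></record>"),
)
```

Passing `request` to `urllib.request.urlopen` would perform the call; a PUT to the entry's own URI would follow the same pattern for updates.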
    8. Atom Syndication Format
        • transform XForm or APP captured info into XHTML+RDFa
        • (permalinked) public recordset in feed entry <content>
    9. the london-gazette.co.uk
    10. london-gazette.co.uk/listing: small, discrete, component ontology/data-domain-metamodels
    11. web page = web service
    12. RDFa enabled 'deep link' discovery
        • Rich Snippets from Google
        • SearchMonkey from Yahoo
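To make 'web page = web service' concrete, here is a deliberately simplified sketch of pulling RDFa `property` values out of a page. It ignores `about`, `rel`, nesting and prefix resolution, so it is not a conforming RDFa processor; the sample markup and the `dc:` properties are invented for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class RDFaPropertyParser(HTMLParser):
    """Collect (property, value) pairs from elements with a `property` attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._pending: Optional[str] = None  # property waiting for its text

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("property")
        if prop is None:
            return
        if "content" in attr:      # explicit machine-readable value wins
            self.pairs.append((prop, attr["content"] or ""))
        else:                      # otherwise take the element's text
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


# Invented sample markup.
SAMPLE = (
    '<div about="#notice-1">'
    '<span property="dc:title">Example notice</span>'
    '<span property="dc:date" content="2009-06-01">1 June 2009</span>'
    '</div>'
)

parser = RDFaPropertyParser()
parser.feed(SAMPLE)
pairs = parser.pairs
```

The same page serves people (the visible text) and programs (the attribute values), which is what lets a crawler surface structured 'deep link' results.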
    13. agenda
    14. goal: federated dataset correlation
      • graph based dynamic schema evolution across silos
        • centralization/normalization not required (or realistic/practical!)
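A small sketch of why graph data federates without a central schema: merging two graphs is set union, and a property that is new to one silo needs no migration in the other. The identifiers and property names below are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical silos that were never designed together.
budget_silo: Set[Triple] = {
    ("inv:42", "budget:amount", "1000000"),
    ("inv:42", "budget:agency", "agency:HHS"),
}
goals_silo: Set[Triple] = {
    ("inv:42", "goal:supports", "goal:HealthCareReform"),
    ("agency:HHS", "rdfs:label", "Health and Human Services"),
}


def federate(*graphs: Set[Triple]) -> Set[Triple]:
    """Merging graphs is plain set union; shared identifiers join the data."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: Set[Triple], subject: str) -> Dict[str, Set[str]]:
    """Collect every predicate and object known about one subject."""
    out: Dict[str, Set[str]] = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, set()).add(o)
    return out


merged = federate(budget_silo, goals_silo)
investment = describe(merged, "inv:42")
```

The correlation happens only because both silos use the same identifier for the investment, which is the role shared URIs play on the Web.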
    15. Web as DB - Web API
        • Linking Open (Government) Data (LOD)
        • SPARQL endpoints
      linkeddata.org
    16. browse: from web of docs to web of data
    17. http://data.linkedmdb.org/page/actor/10
      • content negotiation: user agent prefers
        • human (html) or machine (rdf/xml) readable
      RDF/N3
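Content negotiation can be sketched from both sides: the client states a preference in the `Accept` header, and the server picks the best representation it has. The parsing below is simplified (exact media types and `*/*` only), the function names are hypothetical, and the request is prepared but not sent.

```python
import urllib.request
from typing import List, Optional, Tuple

RESOURCE = "http://data.linkedmdb.org/resource/actor/10"


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media:
            prefs.append((media, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose(header: str, available: List[str]) -> Optional[str]:
    """Server side: pick the first acceptable type that is available."""
    for media, q in parse_accept(header):
        if q <= 0:
            continue
        if media == "*/*":
            return available[0]
        if media in available:
            return media
    return None


# Client side: a machine asking for RDF/XML instead of HTML.
machine = urllib.request.Request(
    RESOURCE, headers={"Accept": "application/rdf+xml, text/html;q=0.5"}
)
chosen = choose(machine.get_header("Accept"),
                ["text/html", "application/rdf+xml"])
```

A browser sending `text/html` to the same URI would get the human-readable page, which is how one identifier serves both audiences.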
    18. http://data.linkedmdb.org/page/actor/10
      • now at the bottom of the same page/actor/10
        • triple is Subject (S) Predicate (P) Object (O)
          • 10 (S) vocabulary: property (P) <object> (O)
        • properties link to other dataset instances
          • that use different datatype definitions
        • note D2R app, expose RDB as RDF, SPARQL to SQL
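The idea of exposing a relational database as RDF can be illustrated by turning each row into triples: the primary key becomes the subject URI and each column becomes a property. This sketch materializes the triples up front, whereas the slide describes D2R as translating SPARQL to SQL; the base URIs, table and function name are assumptions.

```python
import sqlite3
from typing import List, Tuple

BASE = "http://example.org/resource/"   # hypothetical URI space
VOCAB = "http://example.org/vocab/"     # hypothetical vocabulary


def table_to_triples(conn: sqlite3.Connection, table: str,
                     key: str) -> List[Tuple[str, str, str]]:
    """Emit one triple per non-key, non-null cell of the table."""
    # The table name is interpolated directly, so pass trusted names only.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column == key or value is None:
                continue
            triples.append((subject, f"{VOCAB}{table}_{column}", str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO actor VALUES (10, 'Peter O''Toole')")
triples = table_to_triples(conn, "actor", "id")
```

Each row yields a subject like `.../actor/10`, mirroring the S/P/O reading of the page shown on this slide.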
    19. http://data.linkedmdb.org/data/actor/10
      • pattern: <subject> predicate object1 , objectN ; (next predicate and its objects) repeat until the closing .
      • <http://data.linkedmdb.org/resource/actor/10>
      • foaf:page <http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e> , <http://www.imdb.com/name/nm0000564/> ;
      • owl:sameAs <http://mpii.de/yago/resource/Peter_O%27Toole> , <http://dbpedia.org/resource/Peter_O%27Toole> ;
      • rdf:type movie:actor , foaf:Person .
      • this is an 'N3' RDF serialization, instead of RDF/XML (or others)
      • some properties have RESTful SPARQL queries as <objects>
      • foaf:Person rdfs:seeAlso <http://data.linkedmdb.org/sparql?query=DESCRIBE+<http://xmlns.com/foaf/0.1/Person>>
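The punctuation of this serialization (`,` between objects, `;` between predicates, `.` to end the statement) can be sketched with a small writer. It assumes terms are already valid N3 tokens and that at least one pair is given; the `serialize` function is a hypothetical helper, not part of any RDF library.

```python
from typing import Dict, List, Tuple


def serialize(subject: str, pairs: List[Tuple[str, str]]) -> str:
    """Render (predicate, object) pairs for one subject in grouped form."""
    grouped: Dict[str, List[str]] = {}
    for predicate, obj in pairs:
        grouped.setdefault(predicate, []).append(obj)

    lines = [subject]
    items = list(grouped.items())
    for index, (predicate, objects) in enumerate(items):
        # ';' continues with another predicate, '.' closes the statement.
        end = " ." if index == len(items) - 1 else " ;"
        joined = " , ".join(objects)
        lines.append(f"    {predicate} {joined}{end}")
    return "\n".join(lines)


SUBJECT = "<http://data.linkedmdb.org/resource/actor/10>"
text = serialize(SUBJECT, [
    ("foaf:page", "<http://www.imdb.com/name/nm0000564/>"),
    ("owl:sameAs", "<http://dbpedia.org/resource/Peter_O%27Toole>"),
    ("rdf:type", "movie:actor"),
    ("rdf:type", "foaf:Person"),
])
```

The two `rdf:type` pairs collapse into one line ending in `movie:actor , foaf:Person .`, matching the shape of the excerpt on the slide.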
    20. Web-based SPARQL query builder: http://dbpedia.org/ is powered by 'Virtuoso' from http://www.openlinksw.com, which provides a 'SPARQL endpoint' (DRM 'query point')
    21. creates dbpedia.org query
      • use response data in next query
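Because a SPARQL endpoint is a RESTful service, a query is just a URL, and a URI returned by one query can be dropped into the next. The sketch below only builds the URLs (nothing is sent); the endpoint path `http://dbpedia.org/sparql` and both example queries are assumptions for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint location


def query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL using the `query` parameter."""
    params = urlencode({"query": sparql})
    return f"{endpoint}?{params}"


def follow_up(resource_uri: str) -> str:
    """Use a URI taken from one response as the subject of the next query."""
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }} LIMIT 10"


FIRST_QUERY = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
    'SELECT ?s WHERE { ?s rdfs:label "Peter O\'Toole"@en } LIMIT 1'
)
first = query_url(FIRST_QUERY)

# Pretend the first response bound ?s to this URI (taken from the
# owl:sameAs link shown earlier in the deck).
second = query_url(follow_up("http://dbpedia.org/resource/Peter_O%27Toole"))
```

Either URL can be bookmarked, cached or linked like any other Web resource, which is the point of treating the Web as a database API.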
    22. authoritative metadata - provided tags!!
        • using standardized datatype and property specifications
        • ontologies emerge from social folksonomies
      http://commontag.org
    23. agenda
    24. indexing/searching the Data Web
    25. aggregation and live data reporting http://sig.ma
    26. many-to-many set visualization: http://mqlx.com/~david/parallax interface used to aggregate data across multiple (data) 'bases' on http://freebase.com
    27. ad-hoc analyst/end-user 'meshups'
    28. schema/bizmo/federal_enterprise
      • bizmo.freebase.com = OMG BMM + CPIC (+SOA...)
        • Obama is an instance of the Federal Enterprise type
      • Federal Enterprise (S) Fed Ent Goal (P) Goal (O)
    29. /rdf/bizmo.federal_enterprise (excerpt)
      • (W3C/FBase) <subject/topic> <predicate/property> <object/topic>
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.instance> <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_strategy>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_tactic>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_directive>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_information_technology_budget>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://www.w3.org/1999/xhtml/vocab#license> <http://creativecommons.org/licenses/by/3.0/>.
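Statements in this one-triple-per-line form are easy to consume. The sketch below parses such lines into (subject, predicate, object) tuples with a regular expression; it covers URIs and literals with a language tag or datatype, but not blank nodes or every escape rule, so it is an illustration and not a complete N-Triples parser.

```python
import re
from typing import Optional, Tuple

# A URI in angle brackets, or a quoted literal with optional @lang / ^^<type>.
TERM = r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)'
LINE = re.compile(rf'^\s*(<[^>]*>)\s+(<[^>]*>)\s+{TERM}\s*\.\s*$')


def parse_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object), or None if the line does not match."""
    match = LINE.match(line)
    return match.groups() if match else None


FE = "<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise>"
lines = [
    FE + ' <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.',
    FE + ' <http://rdf.freebase.com/ns/freebase.type_profile.instance_count>'
         ' "1"^^<http://www.w3.org/2001/XMLSchema#long>.',
]
parsed = [parse_line(line) for line in lines]
```

Each tuple keeps its terms exactly as written, so the language tag `@en` and the `XMLSchema#long` datatype survive for later interpretation.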
    30. connecting the data dots:
      • create the following subject/predicate/object or topic/property/topic schema :
      • Goal / amplifies / Vision
      • Objective / quantifies / Goal
      • Federal Enterprise / (has) Fed Ent Goal / (of type) Goal
      • Federal Agency / maintains / Exhibit 53
      • Exhibit 53 / contains (multiple) / Exhibit 53 Recordset(s)
      • Exhibit 53 Recordset / Supports Federal Goal / (of type) Goal
      • then create instances with data from http://it.usaspending.gov:
      • Obama / is of type / Federal Enterprise
      • Obama / has a Fed Ent Goal / Health Care Reform
      • HHS / is of type / Federal Agency
      • HHS / maintains / HHS Exhibit 53
      • HHS Exhibit 53 / contains / Nat Health Info Network Connect
      • Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
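The instance statements above can be held as plain triples and queried by pattern, with `None` as a wildcard. The `match` helper is a hypothetical name; the data is copied from the slide.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: Set[Triple] = {
    ("Obama", "is of type", "Federal Enterprise"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal",
     "Health Care Reform"),
}


def match(triples: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None matches anything."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything that points at the goal, whatever the subject or property.
about_goal = match(TRIPLES, o="Health Care Reform")
```

Asking for everything whose object is 'Health Care Reform' returns both the Administration's goal and the IT investment that supports it, which is the 'dot' being connected.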
    31. search all 'bases' for 'Exhibit 53': http://mqlx.com/~david/parallax interface to http://bizmo.freebase.com
    32. base/bizmo/e53 returns
      • a collection (2 instances) of an Exhibit 53 topic
        • one each from HHS and GSA (data from it.usaspending.gov)
      • triple in Exhibit 53 topic schema
        • Exhibit 53 (S) contains (P) Exhibit 53 Recordset (O)
    33. discovering unknown data structures
      • the power of 'faceted' search and browsing
      • interactive query – which of these?
        • Ex53 Recordset (S) Supports Federal Goal (P) ? (O)
    34. traversing the data graph
      • from info about an IT investment
      • to info about Administration priorities
      • 2 Ex53's to 3 Recordsets to 1 that has Obama Goal
        • <uri> (S) <uri> (P) <uri> (O)
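Traversal from an IT investment to an Administration priority can be sketched as a breadth-first search that follows triples from subject to object. Only the HHS names given earlier in the deck are used; the `find_path` helper is hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

EDGES: List[Triple] = [
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal",
     "Health Care Reform"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
]


def find_path(edges: List[Triple], start: str,
              goal: str) -> Optional[List[Triple]]:
    """Breadth-first search; returns the triples walked, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in edges:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None


path = find_path(EDGES, "HHS Exhibit 53", "Health Care Reform")
```

The returned path lists each hop as a full triple, so the predicate names explain how the investment relates to the goal.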
    35. http://freemix.it - more faceted filtering
    36. scatter chart driven by tag clouds
    37. more multi-dataset faceted meshups
    38. drag & drop metadata/data 'curation'
    39. publish new freemix merged dataset: choose a stylesheet, view lenses and facets to include for your end users to interact with
    40. agenda
    41. crowdsourced analytics: shown using 'TopBraid Composer Maestro' from http://topquadrant.com ('SPARQLMotion' script); also see Yahoo Pipes (http://pipes.yahoo.com) and DERI Pipes (http://pipes.deri.org)
    42. cloud scale analytics (petabyte batch)
      • proprietary Google
        • GFS, BigTable and MapReduce
        • page rank impl
      • open source Apache Hadoop
        • HDFS, HBase and MapReduce
        • entity, RDFa extraction
      • Amazon EMR, Cloudera
        • COSS professional service providers
      facebook.com
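The MapReduce pattern behind these platforms can be shown in miniature, in a single process: map each record to key/value pairs, group by key, then reduce each group. This is only the programming model, with none of the distribution or fault tolerance that Hadoop provides; the sample records and function names are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def map_phase(record: Triple) -> Iterator[Tuple[str, int]]:
    """Emit (predicate, 1) for every triple seen."""
    _subject, predicate, _obj = record
    yield predicate, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts for one key."""
    return key, sum(values)


def map_reduce(records: Iterable[Triple]) -> Dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Invented sample: count how often each property is used in an archive.
counts = map_reduce([
    ("a", "rdf:type", "foaf:Person"),
    ("b", "rdf:type", "foaf:Person"),
    ("a", "foaf:name", "A"),
])
```

Because each map call sees one record and each reduce call sees one key, both phases can be spread across many machines, which is what makes petabyte batch jobs practical.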
    43. talis.com/platform - cloud graph store
      • Software as a Service, enabling rapid development with zero deployment costs
      • a simple, consistent web API for storing, managing and retrieving both structured and unstructured data
      • flexible, schema-free metadata that allows applications to be easily evolved
      • a range of data access and query options enabling easy integration into both new and existing applications
      • access control options to support hosting of both public and private data
      • a data hosting solution that is founded on open internet standards and web architectural best practices
      • ...
      • every resource in your (data)store has a unique URL from which its metadata can be retrieved with a single web request
      • SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF
      • content negotiation can be used to retrieve data as RDF, XML, or JSON, allowing you to choose the right format for your application
    44. agenda
    45. application to EA discipline
      • getting there from here
        • stop:
          • publishing / analyzing / visualizing unstructured data
          • using structured data only in file or message exchanges
        • start:
          • align Gov and Web architecture (including EA KBs!)
          • publish component ontologies on the Web
          • and begin linking their metadata and data
          • using the Social Data Web
        • continue:
          • embrace emergent structure and continuous improvement
          • using open source and enabling long-tail crowd-sourcing
    46. q&a - discussion
      • thanks for your time and attention!
      • contact me
        • http://xri.net/=george.thomas
        • GSA OCIO Chief Enterprise Architect
        • FCIOC-AIC Services Subcommittee Chair
        • W3C eGov IG invited expert
        • OMG GovDTF Steering Committee
        • Graduate School Faculty SOA Instructor

    George Thomas, 2 months ago