Gt ea2009

Gt ea2009 Presentation Transcript

  • transparency, collaboration and information sharing: solution architecture tools and techniques using the social data web (george thomas, 1105 ea2009)
  • agenda
    • An overview of Web Oriented Architecture (WOA) design principles that have made the Web the most successful distributed computing platform ever created will be given.
    • Technologies for exposing raw data and publishing semantically enriched structured data for persistence and syndication on the Web as public records will be described.
    • Technologies that enable interoperability across these published assets and currently disparate data sources to achieve low cost, large scale data federation will be described.
    • Widgets and services that consume and transform this data for interactive and integration purposes will be discussed in the context of different stakeholder views.
    • A Web-scale approach to Business Intelligence leveraging Cloud Computing approaches to data archive analysis will be described.
    • Finally, the applicability of the proposed solution architecture to the Federal Segment Architecture Methodology and tools like Visualization to Understand Expenditures in IT will be discussed.
  • Web Oriented Architecture (WOA)
    • REpresentational State Transfer (REST)
      • The architectural style of the World Wide Web
      • aka Resource Oriented Architecture (ROA)
    • hyperlinks dereference (information) resource representations
      • HTTP URIs and content negotiation
        • user agent prefers .htm, .xml, .rdf, etc.
    • statefulness
      • servers maintain resource state, clients maintain application state
    • RESTful Web services
      • HTTP uniform interface
        • CRUD analog to HTTP PUT/GET/POST/DELETE
      • contrast to Remote Procedure Call (RPC) style Web services
        • SOAP/WSDL, you design the methods to invoke
    • global visibility (the Web) and persistence (permalinks)
      • caching, crawling, indexing
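The uniform interface and content negotiation points above can be sketched in a few lines of Python. This is an illustration only: the `example.org` URI, the media type table and the helper name `build_request` are assumptions, and the CRUD mapping follows the order given on the slide (the opposite assignment of PUT and POST is also common). Requests are built but never sent.

```python
from urllib.request import Request

# CRUD operations mapped onto HTTP methods in the slide's order (assumption:
# create=PUT, update=POST; many APIs swap these two).
CRUD_TO_HTTP = {"create": "PUT", "read": "GET", "update": "POST", "delete": "DELETE"}

# Assumed media types for the representations a user agent might prefer.
MEDIA_TYPES = {
    "html": "text/html",
    "xml": "application/xml",
    "rdf": "application/rdf+xml",
}


def build_request(uri, operation="read", prefer="html", body=None):
    """Build (but do not send) a request for one representation of a resource."""
    return Request(
        uri,
        data=body,
        method=CRUD_TO_HTTP[operation],
        headers={"Accept": MEDIA_TYPES[prefer]},
    )


# Same URI, two representations: only the Accept header differs.
page = build_request("http://example.org/resource/42", prefer="html")
data = build_request("http://example.org/resource/42", prefer="rdf")
print(page.get_method(), page.full_url, page.get_header("Accept"))
print(data.get_method(), data.full_url, data.get_header("Accept"))
```

The resource keeps one identifier; the client states its preferred representation and the server decides what to return.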
  • XForms - human data capture
    • Orbeon server side XForms engine, Ajax browser GUIs
      • catalog and builder apps
      • create new XSD bound forms
      • populate, persist, search
      • Tomcat and eXist
      • off-line capability
      • transformation pipeline
  • Atom Publishing Protocol (APP)
    • automated invocation of the RESTful Web service
      • HTTP PUT/POST the spreadsheet or XML instance doc
        • to atomserver.codehaus.org
    • where else is APP used?
      • Google Data APIs, Microsoft Live Framework
  • Atom Syndication Format
      • transform XForm or APP captured info into XHTML+RDFa
      • (permalinked) public recordset in feed entry <content>
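As a rough sketch of what such a feed entry could look like, the following builds an Atom entry whose `<content>` carries an XHTML `div` annotated with RDFa attributes. The permalink, timestamp, property IRI and the helper name `make_entry` are placeholders, and full IRIs are used in `property` (RDFa 1.1 style) so that no prefix declarations are needed.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", ATOM)
ET.register_namespace("xhtml", XHTML)


def make_entry(permalink, title, updated, record):
    """Wrap one captured record as an Atom entry with XHTML+RDFa content."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = permalink
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "xhtml"})
    # RDFa: 'about' names the subject, each 'property' span is one statement.
    div = ET.SubElement(content, f"{{{XHTML}}}div", {"about": permalink})
    for prop, value in record.items():
        span = ET.SubElement(div, f"{{{XHTML}}}span", {"property": prop})
        span.text = value
    return entry


# Placeholder values, not data from the talk.
entry = make_entry(
    "http://example.org/records/1",
    "Example record",
    "2009-01-01T00:00:00Z",
    {"http://purl.org/dc/terms/title": "Example record"},
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```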
  • the london-gazette.co.uk
  • london-gazette.co.uk/listing: small, discrete, component ontology/data-domain-metamodels
  • web page = web service
  • RDFa enabled 'deep link' discovery
      • Rich Snippets from Google
      • SearchMonkey from Yahoo
  • goal: federated dataset correlation
    • graph based dynamic schema evolution across silos
      • centralization/normalization not required (or realistic/practical!)
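One way to picture correlation without a central schema: each silo contributes its own triples, and a separate set of `owl:sameAs` links says which identifiers denote the same thing. The sketch below uses made-up identifiers and a hypothetical `federate` helper, and only follows single-hop links.

```python
SAME_AS = "owl:sameAs"


def federate(*silos):
    """Union several triple sets and rewrite aliases to one canonical name."""
    merged = set().union(*silos)
    # alias -> canonical identifier (single-hop sameAs links only)
    canonical = {o: s for s, p, o in merged if p == SAME_AS}

    def resolve(term):
        return canonical.get(term, term)

    return {(resolve(s), p, resolve(o)) for s, p, o in merged if p != SAME_AS}


# Hypothetical silos with different vocabularies and no shared schema.
silo_a = {("a:agency/1", "a:name", "HHS")}
silo_b = {("b:org/9", "b:maintains", "b:exhibit53/1")}
links = {("a:agency/1", SAME_AS, "b:org/9")}

graph = federate(silo_a, silo_b, links)
for triple in sorted(graph):
    print(triple)
```

Neither silo had to change its structure; adding a new property later is just adding more triples.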
  • Web as DB - Web API
      • Linking Open (Government) Data (LOD)
      • SPARQL endpoints
    linkeddata.org
  • browse: from web of docs to web of data
  • http://data.linkedmdb.org/page/actor/10
    • content negotiation, user agent prefers:
      • human (html) or machine (rdf/xml) readable
    RDF/N3
  • http://data.linkedmdb.org/page/actor/10
    • now at the bottom of the same page/actor/10
      • triple is Subject (S) Predicate (P) Object (O)
        • 10 (S) vocabulary: property (P) <object> (O)
      • properties link to other dataset instances
        • that use different datatype definitions
      • note D2R app, expose RDB as RDF, SPARQL to SQL
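The idea of exposing a relational table as RDF can be illustrated with a small mapping from rows to triples. This is not D2R's mapping language or API, just a sketch under simple assumptions: one table, the primary key becomes part of the subject URI, and each other column becomes a predicate. The URI prefixes, the table layout and the helper name `table_to_triples` are hypothetical.

```python
import sqlite3

BASE = "http://example.org/resource/"   # hypothetical instance URI prefix
VOCAB = "http://example.org/vocab/"     # hypothetical vocabulary URI prefix


def table_to_triples(conn, table, key):
    """Yield one (subject, predicate, object) triple per non-key column value."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{VOCAB}{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO actor VALUES (10, 'Example Actor')")
triples = list(table_to_triples(conn, "actor", "id"))
print(triples)
```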
  • http://data.linkedmdb.org/data/actor/10
    • <subject> has predicate {space} object1 , objectN ; repeat until .
    • <http://data.linkedmdb.org/resource/actor/10>
    • foaf:page <http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e> , <http://www.imdb.com/name/nm0000564/> ;
    • owl:sameAs <http://mpii.de/yago/resource/Peter_O%27Toole> , <http://dbpedia.org/resource/Peter_O%27Toole> ;
    • rdf:type movie:actor ,
    • foaf:Person .
    • this is an 'N3' RDF serialization, instead of RDF/XML (or others)
    • some properties have RESTful SPARQL queries as <objects>
    • foaf:Person rdfs:seeAlso <http://data.linkedmdb.org/sparql?query=DESCRIBE+<http://xmlns.com/foaf/0.1/Person>>
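To make the `;` and `,` shorthand concrete, the statement for actor/10 can be written as a nested Python structure and expanded into one full triple per object. The `expand` helper is illustrative and is not an N3 parser; the terms are copied from the slide.

```python
def expand(subject, predicate_objects):
    """Expand the ';' and ',' shorthand into one full triple per object."""
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects  # ';' separates these
        for obj in objects                           # ',' separates these
    ]


statement = (
    "<http://data.linkedmdb.org/resource/actor/10>",
    [
        ("foaf:page", [
            "<http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e>",
            "<http://www.imdb.com/name/nm0000564/>",
        ]),
        ("owl:sameAs", [
            "<http://mpii.de/yago/resource/Peter_O%27Toole>",
            "<http://dbpedia.org/resource/Peter_O%27Toole>",
        ]),
        ("rdf:type", ["movie:actor", "foaf:Person"]),
    ],
)

triples = expand(*statement)
for s, p, o in triples:
    print(s, p, o, ".")
```

Three predicates with two objects each yield six triples, all sharing the same subject.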
  • Web based SPARQL query builder: http://dbpedia.org/ is powered by 'Virtuoso' from http://www.openlinksw.com, which provides a 'SPARQL endpoint' (DRM 'query point')
  • creates dbpedia.org query
    • use response data in next query
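A hedged sketch of chaining queries: a SPARQL query is sent to an endpoint as the `query` parameter of an HTTP GET, and values from the JSON result bindings can be substituted into a follow-up query. The endpoint path, the queries and the sample response below are assumptions for illustration; nothing is sent over the network, and the response is hand-written in the standard SPARQL JSON results layout.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint path


def query_url(sparql):
    """Encode a SPARQL query as a GET URL for the endpoint."""
    return ENDPOINT + "?" + urlencode({"query": sparql})


def bindings(results_json, var):
    """Pull the values bound to one variable out of a SPARQL JSON result."""
    doc = json.loads(results_json)
    return [row[var]["value"] for row in doc["results"]["bindings"] if var in row]


first = query_url(
    "SELECT ?s WHERE { ?s a <http://xmlns.com/foaf/0.1/Person> } LIMIT 1"
)

# Made-up response standing in for what the endpoint would return.
sample = json.dumps({
    "head": {"vars": ["s"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Peter_O%27Toole"}}
    ]},
})

# Use each URI from the first response as the subject of the next query.
follow_ups = [query_url(f"DESCRIBE <{uri}>") for uri in bindings(sample, "s")]
print(first)
print(follow_ups[0])
```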
  • authoritative metadata - provided tags!!
      • using standardized datatype and property specifications
      • ontologies emerge from social folksonomy
    http://commontag.org
  • indexing/searching the Data Web
  • aggregation and live data reporting http://sig.ma
  • many to many set visualization: http://mqlx.com/~david/parallax interface used to aggregate data across multiple (data) 'bases' on http://freebase.com
  • ad-hoc analyst/end-user 'meshups'
  • schema/bizmo/federal_enterprise
    • bizmo.freebase.com = OMG BMM + CPIC (+SOA...)
      • Obama is an instance of the Federal Enterprise type
    • Federal Enterprise (S) Fed Ent Goal (P) Goal (O)
  • /rdf/bizmo.federal_enterprise (excerpt)
    • (W3C/FBase) <subject/topic> <predicate/property> <object/topic>
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.instance> <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_strategy>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_tactic>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_directive>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_information_technology_budget>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.
    • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://www.w3.org/1999/xhtml/vocab#license> <http://creativecommons.org/licenses/by/3.0/>.
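Line-oriented triples like these are easy to process with very little code. The sketch below reads three lines taken from the excerpt and lists the short names of the properties declared for the type. The regular expression is a simplification that handles only this shape of line, not the full N-Triples grammar, and the helper names are illustrative.

```python
import re

# Simplified pattern: <subject> <predicate> object .  (object kept as raw text)
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$")
PROPERTIES = "http://rdf.freebase.com/ns/type.type.properties"

DOC = """\
<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.
<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.
<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.
"""


def parse(text):
    """Return (subject, predicate, raw object) for every line that matches."""
    return [m.groups() for m in map(TRIPLE.match, text.splitlines()) if m]


def property_names(triples):
    """Short names of the objects of type.type.properties statements."""
    return [
        obj.strip("<>").rsplit(".", 1)[-1]
        for _subject, predicate, obj in triples
        if predicate == PROPERTIES
    ]


triples = parse(DOC)
names = property_names(triples)
print(names)
```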
  • connecting the data dots:
    • create the following subject/predicate/object or topic/property/topic schema:
    • Goal / amplifies / Vision
    • Objective / quantifies / Goal
    • Federal Enterprise / (has) Fed Ent Goal / (of type) Goal
    • Federal Agency / maintains / Exhibit 53
    • Exhibit 53 / contains (multiple) / Exhibit 53 Recordset(s)
    • Exhibit 53 Recordset / Supports Federal Goal / (of type) Goal
    • then create instances with data from http://it.usaspending.gov:
    • Obama / is of type / Federal Enterprise
    • Obama / has a Fed Ent Goal / Health Care Reform
    • HHS / is of type / Federal Agency
    • HHS / maintains / HHS Exhibit 53
    • HHS Exhibit 53 / contains / Nat Health Info Network Connect
    • Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
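The instance statements above can be held as plain (subject, predicate, object) tuples and queried by leaving any position open. This is a minimal in-memory sketch, not how Freebase stores or queries data; the `match` helper is hypothetical.

```python
# Instance triples as listed on the slide.
TRIPLES = {
    ("Obama", "is of type", "Federal Enterprise"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything that points at the goal, whatever the predicate.
linked_to_goal = match(o="Health Care Reform")
for triple in linked_to_goal:
    print(triple)
```

The query returns both the Administration's goal statement and the IT investment that supports it, which is the connection the slide is drawing.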
  • search all 'bases' for 'Exhibit 53': http://mqlx.com/~david/parallax interface to http://bizmo.freebase.com
  • base/bizmo/e53 returns
    • a collection (2 instances) of an Exhibit 53 topic
      • one each from HHS and GSA (data from it.usaspending.gov)
    • triple in Exhibit 53 topic schema
      • Exhibit 53 (S) contains (P) Exhibit 53 Recordset (O)
  • discovering unknown data structures
    • the power of 'faceted' search and browsing
    • interactive query – which of these?
      • Ex53 Recordset (S) Supports Federal Goal (P) ? (O)
  • traversing the data graph
    • from info about an IT investment
    • to info about Administration priorities
    • 2 Ex53's to 3 Recordsets to 1 that has Obama Goal
      • <uri> (S) <uri> (P) <uri> (O)
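Traversal from an IT investment to an Administration priority can be sketched as a breadth-first search over the triples, following links in either direction. The edge list reuses statements from the earlier slide; the function name and the choice to ignore edge direction are assumptions made for this illustration.

```python
from collections import deque

EDGES = [
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over the graph, treating links as undirected."""
    neighbours = {}
    for s, _p, o in edges:
        neighbours.setdefault(s, []).append(o)
        neighbours.setdefault(o, []).append(s)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between the two topics


path = shortest_path(EDGES, "Nat Health Info Network Connect", "Obama")
print(" -> ".join(path))
```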
  • http://freemix.it - more faceted filtering
  • scatter chart driven by tag clouds
  • more multi-dataset faceted meshups
  • drag & drop metadata/data 'curation'
  • publish new freemix merged dataset: choose a stylesheet, view lenses and facets to include for your end users to interact with
  • crowdsourced analytics: shown using a 'SPARQLMotion' script in 'TopBraid Composer Maestro' from http://topquadrant.com; also see Yahoo Pipes (http://pipes.yahoo.com) and DERI Pipes (http://pipes.deri.org)
  • cloud scale analytics (petabyte batch)
    • proprietary Google
      • GFS, BigTable and MapReduce
      • PageRank implementation
    • open source Apache Hadoop
      • HDFS, HBase and MapReduce
      • entity, RDFa extraction
    • Amazon EMR, Cloudera
      • commercial open source software (COSS) professional service providers
    facebook.com
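The MapReduce pattern named above can be shown at toy scale in plain Python: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. This single-process sketch only illustrates the programming model; it uses no Hadoop API, and the sample triples and function names are made up.

```python
from collections import defaultdict
from itertools import chain


def map_triple(triple):
    """Map step: emit (predicate, 1) for one triple."""
    _subject, predicate, _obj = triple
    yield predicate, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one key."""
    return key, sum(values)


def predicate_counts(triples):
    mapped = chain.from_iterable(map_triple(t) for t in triples)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Made-up extracted triples standing in for a large archive.
SAMPLE = [
    ("a", "rdf:type", "x"),
    ("b", "rdf:type", "x"),
    ("a", "foaf:page", "p"),
]
counts = predicate_counts(SAMPLE)
print(counts)
```

Because each map call looks at one record and each reduce call at one key, both phases can be spread across many machines, which is what makes the batch approach scale.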
  • talis.com/platform - cloud graph store
    • Software as a Service, enabling rapid development with zero deployment costs
    • a simple, consistent web API for storing, managing and retrieving both structured and unstructured data
    • flexible, schema-free metadata that allows applications to be easily evolved
    • a range of data access and query options enabling easy integration into both new and existing applications
    • access control options to support hosting of both public and private data
    • a data hosting solution that is founded on open internet standards and web architectural best practices
    • ...
    • every resource in your (data)store has a unique URL from which its metadata can be retrieved with a single web request
    • SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF
    • content negotiation can be used to retrieve data as RDF, XML, or JSON allowing you to choose the right format for your application
  • application to EA discipline
    • getting there from here
      • stop:
        • publishing / analyzing / visualizing unstructured data
        • using structured data only in file or message exchanges
      • start:
        • align Gov and Web architecture (including EA KBs!)
        • publish component ontologies on the Web
        • and begin linking their metadata and data
        • using the Social Data Web
      • continue:
        • embrace emergent structure and continuous improvement
        • using open source and enabling long-tail crowd-sourcing
  • q&a - discussion
    • thanks for your time and attention!
    • contact me
      • http://xri.net/=george.thomas
      • GSA OCIO Chief Enterprise Architect
      • FCIOC-AIC Services Subcommittee Chair
      • W3C eGov IG invited expert
      • OMG GovDTF Steering Committee
      • Graduate School Faculty SOA Instructor