Gt ea2009
 

    Gt ea2009 Presentation Transcript

    • transparency, collaboration and information sharing: solution architecture tools and techniques using the social data web (george thomas, 1105 ea2009)
    • agenda
      • An overview of Web Oriented Architecture (WOA) design principles that have made the Web the most successful distributed computing platform ever created will be given.
      • Technologies for exposing raw data and publishing semantically enriched structured data for persistence and syndication on the Web as public records will be described.
      • Technologies that enable interoperability across these published assets and currently disparate data sources to achieve low cost, large scale data federation will be described.
      • Widgets and services that consume and transform this data for interactive and integration purposes will be discussed in the context of different stakeholder views.
      • A Web-scale approach to Business Intelligence leveraging Cloud Computing approaches to data archive analysis will be described.
      • Finally, the applicability of the proposed solution architecture to the Federal Segment Architecture Methodology and tools like Visualization to Understand Expenditures in IT will be discussed.
    • Web Oriented Architecture (WOA)
      • REpresentational State Transfer (REST)
        • The architectural style of the World Wide Web
        • aka Resource Oriented Architecture (ROA)
      • hyperlinks dereference (information) resource representations
        • HTTP URIs and content negotiation
          • user agent prefers .htm, .xml, .rdf, .etc
      • statefulness
        • servers maintain resource state, clients maintain application state
      • RESTful Web services
        • HTTP uniform interface
          • CRUD (create/read/update/delete) analog: HTTP POST/GET/PUT/DELETE
        • contrast to Remote Procedure Call (RPC) style Web services
          • SOAP/WSDL, you design the methods to invoke
      • global visibility (the Web) and persistence (permalinks)
        • caching, crawling, indexing
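The uniform interface and the split between resource state and application state can be sketched with a toy in-memory server. This is only an illustration under stated assumptions: the `ResourceServer` class, the `/records` URIs and the status codes chosen are hypothetical, and no real HTTP traffic is involved.

```python
from typing import Dict, Optional, Tuple


class ResourceServer:
    """Toy server: it keeps resource state; callers keep application state."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self._next_id = 0

    def handle(self, method: str, uri: str,
               body: Optional[str] = None) -> Tuple[int, Optional[str]]:
        """Return (status code, response body) for one request."""
        if method == "GET":                       # Read
            if uri in self._store:
                return 200, self._store[uri]
            return 404, None
        if method == "POST":                      # Create under a collection
            self._next_id += 1
            new_uri = f"{uri.rstrip('/')}/{self._next_id}"
            self._store[new_uri] = body or ""
            return 201, new_uri                   # stands in for a Location header
        if method == "PUT":                       # Create or replace at a known URI
            created = uri not in self._store
            self._store[uri] = body or ""
            return (201 if created else 200), None
        if method == "DELETE":                    # Delete
            if self._store.pop(uri, None) is None:
                return 404, None
            return 204, None
        return 405, None                          # method not supported here


server = ResourceServer()
status, location = server.handle("POST", "/records", "<record/>")
print(status, location)
print(server.handle("GET", location))
print(server.handle("DELETE", location))
print(server.handle("GET", location))
```

Every resource is addressed the same way, whatever it represents; an RPC-style service would instead expose a custom method name per operation.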
    • XForms - human data capture
      • Orbeon server-side XForms engine, Ajax browser GUIs
        • catalog and builder apps
        • create new XSD bound forms
        • populate, persist, search
        • Tomcat and eXist
        • off-line capability
        • transformation pipeline
    • Atom Publishing Protocol (APP)
      • automated invocation of the RESTful Web service
        • HTTP PUT/POST the spreadsheet or XML instance doc
          • to atomserver.codehaus.org
      • where else is APP used?
        • Google Data APIs, Microsoft Live Framework
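A minimal sketch of the automated invocation described above: wrap an XML instance document in an Atom entry and prepare (but do not send) the HTTP POST. The collection URL `http://example.org/collection`, the helper names and the sample payload are assumptions for illustration, and the entry omits elements such as `id`, `updated` and `author` that a valid Atom entry also needs.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def build_entry(title: str, payload_xml: str) -> bytes:
    """Wrap an XML payload in a (trimmed-down) Atom entry."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content",
                            {"type": "application/xml"})
    content.append(ET.fromstring(payload_xml))
    return ET.tostring(entry, encoding="utf-8")


def prepare_post(collection_url: str, entry: bytes) -> urllib.request.Request:
    """Build the POST request for a collection; the caller decides when to send it."""
    return urllib.request.Request(
        collection_url,
        data=entry,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


entry_bytes = build_entry("Sample record",
                          "<record><name>example</name></record>")
request = prepare_post("http://example.org/collection", entry_bytes)
print(request.get_method(), request.full_url)
print(entry_bytes.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual POST against a real collection.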
    • Atom Syndication Format
        • transform XForm or APP captured info into XHTML+RDFa
        • (permalinked) public recordset in feed entry <content>
    • the london-gazette.co.uk
    • london-gazette.co.uk/listing: small, discrete, component ontology/data-domain-metamodels
    • web page = web service
    • RDFa enabled 'deep link' discovery
        • Rich Snippets from Google
        • SearchMonkey from Yahoo
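Why a web page can double as a web service: the structured data travels inside the markup as RDFa attributes. The sketch below pulls subject/property/value statements out of a page using only `about`, `property` and `content`. It is a deliberately simplified reading, not a conformant RDFa processor (it ignores nesting, `rel`, `typeof` and prefix resolution), and the sample page and its URI are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) statements from RDFa-style attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.triples: List[Tuple[str, str, str]] = []
        self._subject: Optional[str] = None
        self._pending: Optional[str] = None   # property waiting for its text value

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self._subject = attributes["about"]
        prop = attributes.get("property")
        if prop and self._subject:
            if "content" in attributes:
                # An explicit machine-readable value wins over the visible text.
                self.triples.append((self._subject, prop, attributes["content"]))
            else:
                self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.triples.append((self._subject, self._pending, data.strip()))
            self._pending = None


PAGE = """
<div about="http://example.org/notice/1">
  <span property="dc:title">Sample notice</span>
  <span property="dc:date" content="2009-01-01">1 January 2009</span>
</div>
"""

extractor = SimpleRDFaExtractor()
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```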
    • goal: federated dataset correlation
      • graph based dynamic schema evolution across silos
        • centralization/normalization not required (or realistic/practical!)
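One way to picture correlation without centralization: each silo keeps its own vocabulary, and the graphs are simply unioned, with `owl:sameAs` links telling us which names denote the same thing. The sketch below is an assumption-laden illustration: the silo contents and prefixes are made up, and aliases are resolved one level deep rather than by full transitive closure.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def federate(*silos: List[Triple]) -> Set[Triple]:
    """Union the silos, rewriting owl:sameAs aliases to one canonical name."""
    all_triples = [t for silo in silos for t in silo]
    canonical: Dict[str, str] = {}
    for s, p, o in all_triples:
        if p == SAME_AS:
            canonical[o] = canonical.get(s, s)

    def canon(name: str) -> str:
        return canonical.get(name, name)

    return {(canon(s), p, canon(o)) for s, p, o in all_triples if p != SAME_AS}


# Two hypothetical silos with different property vocabularies.
silo_a = [
    ("a:actor/10", "a:name", "Peter O'Toole"),
    ("a:actor/10", SAME_AS, "b:Peter_OToole"),
]
silo_b = [
    ("b:Peter_OToole", "b:occupation", "actor"),
]

for triple in sorted(federate(silo_a, silo_b)):
    print(triple)
```

Neither silo had to adopt the other's schema; the merged graph just gains new properties on the shared node.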
    • Web as DB - Web API
        • Linking Open (Government) Data (LOD)
        • SPARQL endpoints
      linkeddata.org
    • browse: from web of docs to web of data
    • http://data.linkedmdb.org/ page /actor/10
      • content negotiation: the user agent states which it prefers
        • human (html) or machine (rdf/xml) readable
      RDF/N3
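Content negotiation can be sketched as a function from the request's `Accept` header to one of the available representations. This is a simplified illustration, not how the linkedmdb server is implemented: wildcards such as `*/*` are ignored, and the mapping of media types to the `/page/` and `/data/` paths is an assumption based on the two URLs shown on these slides.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q-value), best first."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, quality = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                quality = float(field[2:])
        if media_type:
            preferences.append((media_type, quality))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def negotiate(header: str, available: Dict[str, str]) -> Optional[str]:
    """Pick the representation path for the most preferred available type."""
    for media_type, quality in parse_accept(header):
        if quality > 0 and media_type in available:
            return available[media_type]
    return None


AVAILABLE = {
    "text/html": "/page/actor/10",            # human readable
    "application/rdf+xml": "/data/actor/10",  # machine readable
}

print(negotiate("text/html,application/xhtml+xml;q=0.9", AVAILABLE))
print(negotiate("application/rdf+xml", AVAILABLE))
```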
    • http://data.linkedmdb.org/ page /actor/10
      • now at the bottom of the same page/actor/10
        • triple is Subject (S) Predicate (P) Object (O)
          • 10 (S) vocabulary: property (P) <object> (O)
        • properties link to other dataset instances
          • that use different datatype definitions
        • note D2R app, expose RDB as RDF, SPARQL to SQL
    • http://data.linkedmdb.org/ data /actor/10
      • <subject> has predicate {space} object1 , objectN ; repeat until .
      • <http://data.linkedmdb.org/resource/actor/10>
      • foaf:page <http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e> , <http://www.imdb.com/name/nm0000564/> ;
      • owl:sameAs <http://mpii.de/yago/resource/Peter_O%27Toole> , <http://dbpedia.org/resource/Peter_O%27Toole> ;
      • rdf:type movie:actor ,
      • foaf:Person .
      • this is an 'N3' RDF serialization, instead of RDF/XML (or others)
      • some properties have RESTful SPARQL queries as <objects>
      • foaf:Person rdfs:seeAlso <http://data.linkedmdb.org/sparql?query=DESCRIBE+<http://xmlns.com/foaf/0.1/Person>>
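The punctuation rule above (comma between objects, semicolon between predicates, period to close a subject) can be sketched as a small serializer. The function name `to_n3` is hypothetical, the triples are a subset of those on the slide, and the `@prefix` declarations a real N3 document needs are left out.

```python
from typing import Dict, Iterable, List, Tuple


def to_n3(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Group triples by subject and predicate, then apply ',' ';' '.' punctuation."""
    grouped: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    blocks = []
    for subject, predicates in grouped.items():
        lines = [f"    {p} {' , '.join(objects)}"
                 for p, objects in predicates.items()]
        blocks.append(subject + "\n" + " ;\n".join(lines) + " .")
    return "\n".join(blocks)


ACTOR = "<http://data.linkedmdb.org/resource/actor/10>"
TRIPLES = [
    (ACTOR, "foaf:page", "<http://www.imdb.com/name/nm0000564/>"),
    (ACTOR, "owl:sameAs", "<http://dbpedia.org/resource/Peter_O%27Toole>"),
    (ACTOR, "rdf:type", "movie:actor"),
    (ACTOR, "rdf:type", "foaf:Person"),
]

print(to_n3(TRIPLES))
```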
    • Web based SPARQL query builder http://dbpedia.org/ is powered by http://www.openlinksw.com 'Virtuoso' that provides a 'SPARQL endpoint' (DRM 'query point')
    • creates dbpedia.org query
      • use response data in next query
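A query against a SPARQL endpoint is just an HTTP GET with the query text in a `query` parameter, which is what makes chaining straightforward. The sketch below only builds the URLs and sends nothing; the queries are illustrative, and `sample_result_uri` is a placeholder standing in for a value that would come back in the first response.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"


def query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL for the endpoint."""
    return endpoint + "?" + urlencode({"query": sparql})


FIRST_QUERY = (
    "SELECT ?p ?o WHERE { "
    "<http://dbpedia.org/resource/Peter_O%27Toole> ?p ?o } LIMIT 10"
)
first_url = query_url(FIRST_QUERY)

# Placeholder: in practice this URI would be read from the first response.
sample_result_uri = "http://example.org/resource/from-first-response"
next_url = query_url(f"DESCRIBE <{sample_result_uri}>")

print(first_url)
print(next_url)
```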
    • authoritative metadata - provided tags!!
        • using standardized datatype and property specifications
        • ontologies emerge from social folksonomy
      http://commontag.org
    • indexing/searching the Data Web
    • aggregation and live data reporting http://sig.ma
    • many to many set visualization http://mqlx.com/~david/parallax interface used to aggregate data across multiple (data) 'bases' on http://freebase.com
    • ad-hoc analyst/end-user 'meshups'
    • schema/bizmo/federal_enterprise
      • bizmo.freebase.com = OMG BMM + CPIC (+SOA...)
        • Obama is an instance of the Federal Enterprise type
      • Federal Enterprise (S) Fed Ent Goal (P) Goal (O)
    • /rdf/bizmo.federal_enterprise (excerpt)
      • (W3C/FBase) <subject/topic> <predicate/property> <object/topic>
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.instance> <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_strategy>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_tactic>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_directive>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_information_technology_budget>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.
      • <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://www.w3.org/1999/xhtml/vocab#license> <http://creativecommons.org/licenses/by/3.0/>.
    • connecting the data dots:
      • create the following subject/predicate/object or topic/property/topic schema :
      • Goal / amplifies / Vision
      • Objective / quantifies / Goal
      • Federal Enterprise / (has) Fed Ent Goal / (of type) Goal
      • Federal Agency / maintains / Exhibit 53
      • Exhibit 53 / contains (multiple) / Exhibit 53 Recordset(s)
      • Exhibit 53 Recordset / Supports Federal Goal / (of type) Goal
      • then create instances with data from http://it.usaspending.gov:
      • Obama / is of type / Federal Enterprise
      • Obama / has a Fed Ent Goal / Health Care Reform
      • HHS / is of type / Federal Agency
      • HHS / maintains / HHS Exhibit 53
      • HHS Exhibit 53 / contains / Nat Health Info Network Connect
      • Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
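The instance statements above can be held as plain subject/predicate/object tuples and queried by pattern, with `None` acting as a wildcard. This is a minimal sketch: the `match` helper is hypothetical and the labels are copied from the slide rather than taken from Freebase identifiers.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("Obama", "is of type", "Federal Enterprise"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal",
     "Health Care Reform"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every non-None position."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Everything that points at the goal, whatever the subject or predicate.
for triple in match(TRIPLES, o="Health Care Reform"):
    print(triple)
```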
    • search all 'bases' for 'Exhibit 53' http://mqlx.com/~david/parallax interface to http://bizmo.freebase.com
    • base/bizmo/e53 returns
      • a collection (2 instances) of an Exhibit 53 topic
        • one from HHS and GSA (data from it.usaspending.gov)
      • triple in Exhibit 53 topic schema
        • Exhibit 53 (S) contains (P) Exhibit 53 Recordset (O)
    • discovering unknown data structures
      • the power of 'faceted' search and browsing
      • interactive query – which of these?
        • Ex53 Recordset (S) Supports Federal Goal (P) ? (O)
    • traversing the data graph
      • from info about an IT investment
      • to info about Administration priorities
      • 2 Ex53s to 3 Recordsets to 1 that has Obama Goal
        • <uri> (S) <uri> (P) <uri> (O)
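Traversing the graph from an IT investment to an Administration priority amounts to following object-to-subject links. A breadth-first sketch over the HHS chain from the earlier slide follows; the `find_path` helper is hypothetical, and the labels stand in for the URIs a real graph would use.

```python
from collections import deque
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

EDGES: List[Triple] = [
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal",
     "Health Care Reform"),
]


def find_path(triples: List[Triple], start: str,
              goal: str) -> Optional[List[Triple]]:
    """Breadth-first search along triples, returning the hops taken."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None


for hop in find_path(EDGES, "HHS", "Health Care Reform"):
    print(" / ".join(hop))
```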
    • http://freemix.it - more faceted filtering
    • scatter chart driven by tag clouds
    • more multi-dataset faceted meshups
    • drag & drop metadata/data 'curation'
    • publish new freemix merged dataset choose a stylesheet, view lenses and facets to include for your end users to interact with
    • crowdsourced analytics shown using 'TopBraid Composer Maestro' from http://topquadrant.com 'SPARQLMotion' script – also see Yahoo | DERI http://pipes.yahoo.com | http://pipes.deri.org
    • cloud scale analytics (petabyte batch)
      • proprietary Google
        • GFS, BigTable and MapReduce
        • page rank impl
      • open source Apache Hadoop
        • HDFS, HBase and MapReduce
        • entity, RDFa extraction
      • Amazon EMR, Cloudera
        • COSS prof service providers
      facebook.com
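The MapReduce pattern named above can be sketched in a single process: a map step emits key/value pairs per shard, a shuffle groups them by key, and a reduce step folds each group. The example counts predicate usage across shards of triples. It only illustrates the programming model; the shard contents are hypothetical, and nothing here uses the Hadoop or Google APIs.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def map_phase(shard: Iterable[Triple]) -> Iterator[Tuple[str, int]]:
    """Emit (predicate, 1) for every triple in one shard."""
    for _subject, predicate, _object in shard:
        yield predicate, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Fold one key's values into a single count."""
    return key, sum(values)


def map_reduce(shards: Iterable[Iterable[Triple]]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(shard) for shard in shards)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


SHARDS = [
    [("ex:a", "rdf:type", "foaf:Person"), ("ex:a", "foaf:page", "ex:page1")],
    [("ex:b", "rdf:type", "foaf:Person")],
]

print(map_reduce(SHARDS))
```

On a cluster, each shard's map step would run on a different machine and the shuffle would move data over the network; the function shapes stay the same.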
    • talis.com/platform - cloud graph store
      • Software as a Service, enabling rapid development with zero deployment costs
      • a simple, consistent web API for storing, managing and retrieving both structured and unstructured data
      • flexible, schema-free metadata that allows applications to be easily evolved
      • a range of data access and query options enabling easy integration into both new and existing applications
      • access control options to support hosting of both public and private data
      • a data hosting solution that is founded on open internet standards and web architectural best practices
      • ...
      • every resource in your (data)store has a unique URL from which its metadata can be retrieved with a single web request
      • SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF
      • content negotiation can be used to retrieve data as RDF, XML, or JSON allowing you to choose the right format for your application
    • application to EA discipline
      • getting there from here
        • stop:
          • publishing / analyzing / visualizing unstructured data
          • using structured data only in file or message exchanges
        • start:
          • align Gov and Web architecture (including EA KBs!)
          • publish component ontologies on the Web
          • and begin linking their metadata and data
          • using the Social Data Web
        • continue:
          • embrace emergent structure and continuous improvement
          • using open source and enabling long-tail crowd-sourcing
    • q&a - discussion
      • thanks for your time and attention!
      • contact me
        • http://xri.net/=george.thomas
        • GSA OCIO Chief Enterprise Architect
        • FCIOC-AIC Services Subcommittee Chair
        • W3C eGov IG invited expert
        • OMG GovDTF Steering Committee
        • Graduate School Faculty SOA Instructor