Structured Dynamics' Semantic Technologies Product Stack

Structured Dynamics provides 'ontology-driven applications'. Our product stack is geared to enable the semantic enterprise. The products are premised on preserving and leveraging existing information assets in an incremental, low-risk way. SD's products span converters, authoring environments and Web services middleware and, eventually, ontologies, user interfaces and applications.

  • At present, Zitgist's existing conversion services recognize nearly 100 formats, and the number is constantly increasing. GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a W3C technique for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT. GRDDL accommodates a wide variety of dialects (see one listing) and can be combined with arbitrary transformation mechanisms (though these are currently mostly based on XSLTs).
  • More here also, use the candidate properties content to get the extract to the SC context (??? more about the “aboutness”) contextual UMBEL metadata on the fly

Structured Dynamics' Semantic Technologies Product Stack Presentation Transcript

  • 1. Product Stack (updated May 2010)
  • 2. Enterprise Approach
  • 3. Enterprise Approach
    • Semantic enterprise based on the semantic Web and linked data
    • Leverage existing assets
      • Data, records and instances
      • Taxonomies, structure and schema
    • Layer semantics onto existing systems
    • Develop incrementally
    • Add sophistication, scope over time
    • Keep risks low
    • Integrate with public and Web data (“open world”)
  • 4. Linked Data
    • “Linked Data is a set of best practices for publishing and deploying instance and class data using the RDF data model, naming the data objects using uniform resource identifiers (URIs), thereby exposing the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.”
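To make the definition concrete, here is a minimal sketch of one instance record published as linked data. The record is named with an HTTP URI, typed with a class URI, and interconnected with other resources through URI-valued properties. It is serialized as N-Triples. The `BASE` namespace and the helper names are hypothetical illustrations. The FOAF, RDF and RDFS URIs are the real vocabularies.

```python
from typing import List, Tuple

# Hypothetical namespace for our own data objects (an assumption, not a real service)
BASE = "http://example.org/id/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(u: str) -> str:
    """Format a URI as an N-Triples IRI term."""
    return f"<{u}>"


def literal(text: str) -> str:
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(record_id: str, label: str, rdf_type: str,
                       links: List[Tuple[str, str]]) -> str:
    """Name the record with an HTTP URI, give it a type and a label,
    and emit one triple per (predicate URI, target URI) interconnection."""
    subject = iri(BASE + record_id)
    lines = [
        f"{subject} {iri(RDF_TYPE)} {iri(rdf_type)} .",
        f"{subject} {iri(RDFS_LABEL)} {literal(label)} .",
    ]
    for predicate, target in links:
        lines.append(f"{subject} {iri(predicate)} {iri(target)} .")
    return "\n".join(lines)
```

For example, `record_to_ntriples("mkbergman", "Michael K. Bergman", FOAF + "Person", [(FOAF + "homepage", "http://www.mkbergman.com/")])` yields three triples. Any RDF-aware agent can load them and follow the links.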
  • 5. Layers and Current Products
  • 6. Current Products
    • structWSF: the pivotal product; Web services middleware that provides distributed data access and federation
    • conStruct: Drupal-based structured data linkage to structWSF
    • irON: spreadsheet, JSON and XML authoring and conversion framework
    • UMBEL: reference set of linking subjects and basis for domain vocabularies
    • scones: an ontology- and entity-driven information extraction and tagging system
  • 7. Fit of Current Products within Layers
  • 8. Existing Assets Layer
  • 9. Existing Assets
    • These are the materials that need to be federated, made interoperable, and given a common semantics:
      • structured data / databases
      • semi-structured data (XML, Web pages)
      • unstructured data (text)
  • 10. Preserving Existing Assets
    • Relational databases (RDBMSs)
    • Distributed structured assets
      • spreadsheets
      • lightweight datastores
    • Web pages and Web sites
    • Existing documents and text
    • Web databases and APIs
    • Other databases (RDF, OO, etc.)
  • 11. Access/Conversion Layer
  • 12. Conversion
    • Provides in-place access to existing information
    • Translates existing formats and structures to RDF
    • Extracts structured information from unstructured text
    • Aids creation of interoperable datasets
    • Geared almost entirely to records, instances or entities (that is, basic data)
  • 13. Conversion Methods
    • Relational DBs: RDB2RDF
    • RDFizers
    • Information Extraction
    • New Dataset Authoring
    • Direct Use (already in RDF)
  • 14. Relational DB Conversion
    • Simple mappings of instance records to RDF
    • Methodologies well proven if kept to the instance level
    • RDB schema inform the interoperable layer (“ontologies”)
    • Relational datastores left in place
    • Record data obtained via access layer (structWSF)
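The slide above describes "simple mappings of instance records to RDF" with the relational store left in place. The sketch below shows that idea as a simplified direct mapping, loosely after the W3C Direct Mapping pattern. Each row becomes one subject URI built from its key, and each non-NULL column becomes one triple. The base URI, table and column names are hypothetical. A production RDB2RDF tool would add schema-driven mappings that this sketch omits.

```python
import sqlite3
from typing import Any, List, Tuple

BASE = "http://example.org/"  # hypothetical base URI for converted records
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(conn: sqlite3.Connection, table: str,
                    key_column: str) -> List[Tuple[str, str, Any]]:
    """Simplified direct mapping: one subject URI per row (built from the key),
    one triple per non-NULL column. The table is only read, never altered."""
    # The table name is interpolated, so it must come from trusted configuration
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        values = dict(zip(columns, row))
        subject = f"{BASE}{table}/{values[key_column]}"
        triples.append((subject, RDF_TYPE, f"{BASE}{table}"))
        for column, value in values.items():
            if value is not None:  # NULL cells produce no triple
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples
```

Because the function only issues a `SELECT`, the relational data stays where it is. The triples can be regenerated on demand by an access layer.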
  • 15. RDFizers
    • General serialization or data format conversions to RDF
    • Mostly applied to:
      • Standard data formats and data structures
      • Web content
      • APIs
      • Some legacy content
    • Sometimes some minor ontology or schema mapping
    • Embodies all conversion steps to linked data
    • We have access to more than 100 existing formats
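An RDFizer framework covering many formats is usually organized as a registry that dispatches each input to the converter for its format. The sketch below shows that pattern under stated assumptions. Converters are keyed by MIME type, and two toy converters (JSON and CSV) emit the same triples for the same record. The namespaces and property naming scheme are hypothetical. This is not the converter set listed on the following slides.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
BASE_ID = "http://example.org/record/"   # hypothetical record namespace
BASE_PROP = "http://example.org/prop/"   # hypothetical property namespace

# Registry: MIME type -> converter producing triples
RDFIZERS: Dict[str, Callable[[str], List[Triple]]] = {}


def rdfizer(mime_type: str):
    """Decorator that registers a converter for one input format."""
    def register(func: Callable[[str], List[Triple]]):
        RDFIZERS[mime_type] = func
        return func
    return register


@rdfizer("application/json")
def json_to_triples(text: str) -> List[Triple]:
    triples = []
    for rec in json.loads(text):  # expects a list of objects, each with an "id"
        subject = BASE_ID + str(rec["id"])
        triples += [(subject, BASE_PROP + k, str(v)) for k, v in rec.items() if k != "id"]
    return triples


@rdfizer("text/csv")
def csv_to_triples(text: str) -> List[Triple]:
    triples = []
    for rec in csv.DictReader(io.StringIO(text)):  # expects an "id" column
        subject = BASE_ID + rec["id"]
        triples += [(subject, BASE_PROP + k, v) for k, v in rec.items() if k != "id"]
    return triples


def rdfize(text: str, mime_type: str) -> List[Triple]:
    """Dispatch the input to the converter registered for its MIME type."""
    converter = RDFIZERS.get(mime_type)
    if converter is None:
        raise ValueError(f"no RDFizer registered for {mime_type}")
    return converter(text)
```

Adding a format then means registering one more function; the callers of `rdfize` do not change.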
  • 16. RDFizers – Listing 1
    • URN handlers (in addition to IRI and URI):
      • DOI
      • LSID
      • OAI
      • RDF
    • Serialization formats:
      • irON
      • N3
      • RDF/XML
      • Turtle
    • Languages and ontologies:
      • AB Meta
      • Annotea
      • APML
      • AtomOWL
      • Bibliographic Ontology
      • Creative Commons
      • EXIF
      • FOAF
      • GeoNames
      • GoodRelations
      • Java
      • Javadoc
      • MARC/MODS
      • Meta Standards
      • Music Ontology
      • Natural Language
      • Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
      • Open Geospatial
      • OWL
      • SIOC
      • SIOCT
      • SKOS
      • UMBEL
      • vCard
      • XML
      • Others
    • (X)HTML pages
    • Embedded Microformats and GRDDL* (see note below):
      • DC
      • eRDF
      • geoURL
      • Google Base
      • hAudio
      • hCalendar
      • hCard
      • hListing
      • hResume
      • hReview
      • HR-XML
      • Ning
      • RDFa
      • relLicense
      • SVG
      • XBRL
      • XFN
      • xFolk
      • XR-XML
      • XSLT
    • Syndication Formats:
      • Atom
      • OPML
      • OCS
      • RSS 1.1
      • RSS 2.0
      • XBEL (for bookmarks)
    • REST-style Web service APIs:
      • Alchemy
      • Amazon
      • Apple
      • Best Buy
      • Calais
      • CNet
      • CrunchBase
      • Del.icio.us
      • Digg
      • Discogs
      • Disqus
      • eBay
      • Facebook
      • Flickr
      • Freebase (MQL)
      • FriendFeed
      • Garmin
      • Get Satisfaction
      • Google
      • Google Apps
      • Hoover's
      • HTTP (raw)
      • ISBN DB
      • Last.fm
      • Library Thing
      • Magnolia
      • Meetup
      • MusicBrainz
      • New York Times
      • New York Times Campaign Finance (NYTCF)
      • New York Times tags
  • 17. RDFizers – Listing 2
      • Open Library
      • Open Social
      • Open Street
      • OpenLink (facets)
      • O'Reilly
      • Picasa
      • Radio Pop (BBC)
      • Rhapsody
      • Salesforce
      • Slideshare
      • Slidy
      • Technorati
      • Tesco
      • They Work For You
      • Twine
      • Twitter
      • Weather
      • Wikipedia
      • World Bank
      • Yahoo! BOSS
      • Yahoo! Finance
      • Yahoo! Maps
      • Yahoo! Weather
      • Yelp
      • YouTube
      • Zemanta
      • Zillow
    • Files (multitude of file formats and MIME types, including):
      • audio (general)
      • BibJSON
      • BibTeX and others
      • BitTorrent
      • commON
      • CSV
      • Fink
      • Flat files
      • irJSON
      • irXML
      • JPEG
      • JSON
      • images
      • MS Office
      • OpenOffice
      • Open Document Format
      • Palm
      • RDF123
      • video
      • XLS
      • etc.
    • Metadata extractors:
      • CRW
      • DEB
      • EXIF
      • OCW
      • RPM
      • XMP
    • Email formats:
      • EMail
      • Outlook
      • RFC822
    • Version control and related systems:
      • Bugzilla
      • Jira
      • POM
      • Subversion
    • Other Web service frameworks:
      • BPEL
      • WSDL
      • XBRL
      • XBEL
    • Data exchange formats:
      • iCalendar
      • LDIF
      • vCalendar
      • vCard
    • Relational databases and related:
      • D2RQ
      • D2RMAP
      • RDF Views
    • Virtuoso VADs
    • OpenLink license files
    • Third party metadata extraction frameworks:
      • Aperture
      • Spotlight
    • Miscellaneous and other related converters:
      • MPEG-7/CS -> OWL
      • Random
      • XSD -> OWL
    • * GRDDL (Gleaning Resource Descriptions from Dialects of Languages) accommodates a wide variety of dialects (see one listing) and can be combined with arbitrary transformation mechanisms (though these are currently mostly based on XSLTs).
  • 18. scones
  • 19. Information Extraction
    • scones (Subject Concept Or Named EntitieS) is our information extraction (IE) tagger
    • Information extraction is applied to input Web pages and unstructured text
    • May be applied after structure extraction (often, at minimum, defluffing)
    • Settable “window” for snippet (from # of bracketing terms to full document)
    • Extraction is performed for both:
      • Entities (per Wikipedia and enterprise dictionaries)
      • Subject concepts (per UMBEL and domain ontologies)
    • Presently in prototype
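The "settable window" and dictionary-driven extraction described above can be illustrated with a small tagger. This is only a sketch of the general technique (greedy longest-match lookup against an entity/concept dictionary); it is not the scones implementation. The dictionary contents and identifiers such as `umbel:City` are hypothetical. `window` counts bracketing tokens on each side, and `None` returns the full document.

```python
import re
from typing import Dict, List, Optional, Tuple


def tag_entities(text: str, dictionary: Dict[str, str],
                 window: Optional[int] = 3) -> List[Tuple[str, str, str]]:
    """Return (surface form, concept or entity id, snippet) for each dictionary hit.

    `dictionary` maps lower-cased, single-spaced (possibly multi-word) surface
    forms to identifiers. `window` is the number of bracketing tokens kept on
    each side of a hit; None returns the whole document as the snippet.
    """
    tokens = re.findall(r"\w+(?:['-]\w+)*", text)
    lowered = [t.lower() for t in tokens]
    longest = max((len(k.split()) for k in dictionary), default=0)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so "New York Times" wins over "New York"
        for n in range(min(longest, len(tokens) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            if phrase in dictionary:
                lo = 0 if window is None else max(0, i - window)
                hi = len(tokens) if window is None else min(len(tokens), i + n + window)
                hits.append((" ".join(tokens[i:i + n]), dictionary[phrase],
                             " ".join(tokens[lo:hi])))
                i += n
                break
        else:
            i += 1  # no dictionary phrase starts at this token
    return hits
```

For example, with `window=1` the sentence "Yesterday the New York Times reported that Iowa City hosts a new startup." yields the snippets "the New York Times reported" and "that Iowa City hosts". Mixing a public dictionary with private entities is then just a matter of merging the two mappings before tagging.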
  • 20. (Named) Entities
    • The places, events, people, objects, and specific things of the real world
    • Literally millions of notable instances
    • Each belongs to one or more subject concept(s)
    • Currently, the predominant basis for linked data
    • Public sources include Wikipedia, Freebase and others
    • Can be readily mixed-and-matched with private entities
  • 21. Creating New Entity Dictionaries
  • 22. Triangulating Information Extraction
  • 23. irON – instance record and Object Notation
  • 24. irON Dataset Authoring Framework
    • Simple authoring and dataset creation
    • irON includes an abstract notation and vocabulary for instance records
    • Serializations available for:
      • XML (irXML)
      • JSON (irJSON)
      • CSV/spreadsheets (commON)
    • Notations for:
      • Instance records
      • Schema
      • Datasets and metadata
      • Linkages to other schema
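The notations listed above (instance records, schema, dataset metadata and linkages) can be pictured as a small data model. The sketch below is loosely modeled on the irJSON idea of a dataset header plus a record list. The exact field names (`dataset`, `recordList`, `prefLabel`, `linkage`) and the flat linkage mapping are assumptions for illustration, not the published irON specification.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InstanceRecord:
    id: str
    type: str
    prefLabel: str
    attributes: Dict[str, str] = field(default_factory=dict)  # source-specific attributes

    def to_dict(self) -> dict:
        # Common descriptors first, then whatever attributes the source has
        return {"id": self.id, "type": self.type, "prefLabel": self.prefLabel, **self.attributes}


@dataclass
class Dataset:
    id: str
    prefLabel: str
    # Assumed linkage form: local type or attribute name -> URI in an external schema
    linkage: Dict[str, str] = field(default_factory=dict)
    records: List[InstanceRecord] = field(default_factory=list)

    def unlinked_names(self) -> List[str]:
        """Local types and attribute names not yet linked to any external schema."""
        names = {r.type for r in self.records}
        for r in self.records:
            names.update(r.attributes)
        return sorted(n for n in names if n not in self.linkage)

    def to_json(self) -> str:
        return json.dumps({
            "dataset": {"id": self.id, "prefLabel": self.prefLabel, "linkage": self.linkage},
            "recordList": [r.to_dict() for r in self.records],
        }, indent=2)
```

`unlinked_names` is one way to support incremental development. It lists which local terms still need to be linked to an ontology, so the linkage can grow over time.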
  • 25. Three irON Serializations
    • irXML
    • irJSON
    • commON
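Why can the serializations be "more-or-less interchangeable"? A record list that uses only string values can move between a JSON form and a spreadsheet-style CSV form without loss. The sketch below assumes that. It puts one row per record and one column per attribute, with empty cells meaning "attribute absent". The actual commON header conventions are not reproduced here, and all names are illustrative.

```python
import csv
import io
import json
from typing import Dict, List

Record = Dict[str, str]


def records_to_csv(records: List[Record]) -> str:
    """One row per record, one column per attribute, identifier column first."""
    columns = sorted({key for rec in records for key in rec})
    columns.sort(key=lambda c: c != "id")  # stable sort: "id" first, others stay alphabetical
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing attributes are written as empty cells
    return buf.getvalue()


def csv_to_records(text: str) -> List[Record]:
    """Read the sheet back, dropping empty cells so absent attributes stay absent."""
    return [{k: v for k, v in row.items() if v != ""}
            for row in csv.DictReader(io.StringIO(text))]


def json_to_csv(json_text: str) -> str:
    """Convert a JSON document with a "recordList" array into the CSV form."""
    return records_to_csv(json.loads(json_text)["recordList"])
```

The round trip is lossless only under the stated assumption. Typed values, nested structures or intentionally empty strings would need extra conventions in the CSV form.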
  • 26. More-or-less Interchangeable Formats
  • 27. structWSF
  • 28. structWSF
    • Generally RESTful Web services middleware
    • Uniform, distributed access point
    • Provides the interoperability architecture
    • Based on canonical RDF data model
    • Dataset access orientation
    • Standard tools and services:
      • User permissions and access
      • CRUD (create, read, update, delete)
      • Browse
      • Full-text, faceted search
      • Import / export
      • Many others
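The combination of dataset orientation, user permissions and CRUD operations listed above can be sketched as an in-memory store. Every operation is checked against per-user, per-dataset rights. This is a hypothetical stand-in written only to illustrate the access model. It is not structWSF's actual Web service API, and the class and method names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class DatasetStore:
    """Hypothetical in-memory stand-in for a dataset-oriented service."""
    # (user, dataset) -> allowed operations among "create", "read", "update", "delete"
    permissions: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    # dataset -> record id -> record
    data: Dict[str, Dict[str, dict]] = field(default_factory=dict)

    def grant(self, user: str, dataset: str, *ops: str) -> None:
        self.permissions.setdefault((user, dataset), set()).update(ops)

    def _check(self, user: str, dataset: str, op: str) -> None:
        if op not in self.permissions.get((user, dataset), set()):
            raise PermissionError(f"{user} may not {op} in {dataset}")

    def create(self, user: str, dataset: str, record_id: str, record: dict) -> None:
        self._check(user, dataset, "create")
        records = self.data.setdefault(dataset, {})
        if record_id in records:
            raise ValueError(f"{record_id} already exists in {dataset}")
        records[record_id] = dict(record)

    def read(self, user: str, dataset: str, record_id: str) -> dict:
        self._check(user, dataset, "read")
        return dict(self.data[dataset][record_id])  # return a copy, not the stored record

    def update(self, user: str, dataset: str, record_id: str, changes: dict) -> None:
        self._check(user, dataset, "update")
        self.data[dataset][record_id].update(changes)

    def delete(self, user: str, dataset: str, record_id: str) -> None:
        self._check(user, dataset, "delete")
        del self.data[dataset][record_id]
```

Keying permissions on (user, dataset) rather than on individual records keeps the checks cheap. It also matches the dataset access orientation described on the slide.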
  • 29. RDF and Data Federation Model
  • 30. Advantages of a Canonical Model
    • All tools can be driven from a single data format basis
    • Single converters can link in other hubs of data forms
    • ‘Round-tripping’ through the canonical form can bring consistency and cleanliness to input data
    • RDF is well-suited as the canonical form:
      • Structured data
      • Semi-structured data
      • Unstructured data (after IE)
      • Simple-to-complex data structures
      • Logic and inferencing
      • Suitable to all input data formats
      • Many serializations possible
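Two of the advantages above can be made concrete. First, a canonical hub needs one converter into and one out of the canonical form per format, rather than one per pair of formats. Second, round-tripping through a set of triples normalizes the data. In the sketch below, the cleaning rules (collapse whitespace, drop empty values, remove duplicates) are illustrative assumptions. They are not a description of any particular product's pipeline.

```python
from typing import Any, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Record = Dict[str, Any]  # {"id": str, attribute: [value, ...]}


def converters_needed(formats: int, canonical: bool) -> int:
    """One-way converters required: every ordered pair of formats needs n * (n - 1),
    while a canonical hub needs one converter in and one out per format."""
    return 2 * formats if canonical else formats * (formats - 1)


def to_canonical(records: List[Record]) -> Set[Triple]:
    """Flatten records into a set of triples, normalizing whitespace and
    dropping empty values; using a set also removes exact duplicates."""
    triples: Set[Triple] = set()
    for rec in records:
        subject = rec["id"].strip()
        for attr, values in rec.items():
            if attr == "id":
                continue
            for value in values:
                cleaned = " ".join(value.split())
                if cleaned:
                    triples.add((subject, attr.strip(), cleaned))
    return triples


def from_canonical(triples: Set[Triple]) -> List[Record]:
    """Regroup triples into records in a deterministic (sorted) order."""
    grouped: Dict[str, Record] = {}
    for subject, attr, value in sorted(triples):
        grouped.setdefault(subject, {"id": subject}).setdefault(attr, []).append(value)
    return [grouped[s] for s in sorted(grouped)]
```

With ten formats, for instance, pairwise conversion needs 90 converters, while the hub needs 20.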
  • 31. A Collaborative, Distributed Network
  • 32. Flexible User Access Permissions
  • 33. Access, APIs and Endpoints
    • The resulting linked data may be exposed as:
      • APIs
      • Web services
      • SPARQL endpoints
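As an illustration of the SPARQL endpoint option, the SPARQL 1.1 Protocol lets a client send a query with an HTTP GET request. The query is URL-encoded in a `query` parameter. The sketch builds such a request with the Python standard library but does not send it. The endpoint URL is a hypothetical placeholder, and the `Accept` header asks for the standard SPARQL JSON results format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")
```

To execute the request against a real endpoint, pass it to `urllib.request.urlopen` and decode the JSON body. The `endpoint` argument would then be that service's actual URL.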
  • 34. Ontologies Layer
  • 35. Ontologies
    • Ontologies provide the basis for:
      • Interoperating
      • Reconciling semantics
    • Multiples may be used at any time
    • Both enterprise (internal) and external ontologies
    • Best built incrementally, with participation
    • Easily modified: OK to test and experiment
  • 36. Ontologies
    • The structural relationships of concepts within a domain
    • Generally class- (or set-) oriented
    • Analogous to relational database schema, only with controlled vocabularies and exact semantics
    • Sets the structure of how to organize the actual data (“instances”) in the domain
    • Semantics and mapping techniques allow disparate ontologies to be inter-related
    • Can inference or reason over the structure
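"Inference over the structure" can be shown with its simplest case: subclass reasoning. The sketch computes the transitive closure of `rdfs:subClassOf`-style (child, parent) pairs. It then adds the inferred types to an instance, since an instance of a class is also an instance of every superclass. The class names are hypothetical, and a real reasoner covers far more of RDFS/OWL than this one rule.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def superclasses(subclass_of: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Every direct or indirect superclass of each class (transitive closure).
    The `seen` set makes the traversal safe even if the hierarchy has cycles."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in subclass_of:
        parents[child].add(parent)
    closure: Dict[str, Set[str]] = {}
    for cls in list(parents):
        seen: Set[str] = set()
        queue = deque(parents[cls])
        while queue:
            p = queue.popleft()
            if p not in seen:
                seen.add(p)
                queue.extend(parents.get(p, ()))
        closure[cls] = seen
    return closure


def infer_types(instance_types: Set[str], closure: Dict[str, Set[str]]) -> Set[str]:
    """An instance of a class is also an instance of all of its superclasses."""
    inferred = set(instance_types)
    for t in instance_types:
        inferred |= closure.get(t, set())
    return inferred
```

For example, given Newspaper ⊑ Periodical ⊑ Publication, a record typed only as Newspaper is also retrieved by a query for Publication.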
  • 37. Migrating Structure to the Ontology Layer
  • 38. Ontologies Layer
  • 39. irON
  • 40. irON Record Vocabulary
    • irON also provides the standard instance record vocabulary for all federated records
    • Each record source has its own attributes
    • But, irON provides common descriptors:
      • Useful for interoperating
      • Unique, Web-accessible identifiers
      • Standard descriptions and labels
      • Conventions for “driving” user interfaces and tools
  • 41. UMBEL
    • UMBEL (Upper Mapping and Binding Exchange Layer)
    • 20,000 defined reference points in information space
    • Means to assert what a given chunk of content is about
    • Enable similar content to be aggregated
    • Place content in context with other content
    • Aggregation points for tying in instances and entities
    • Derived from, and a subset of, the Cyc knowledge base
    • Vocabulary basis for domain-specific subject ontologies
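The roles listed above (asserting aboutness, aggregating similar content, placing content in context) can be illustrated once content items have been tagged with reference concepts. The sketch inverts the tags into concept-centered aggregation points. It then finds related items by Jaccard overlap of their concept sets. The concept labels and the 0.5 threshold are hypothetical, and this is not how UMBEL itself computes relatedness.

```python
from collections import defaultdict
from typing import Dict, List, Set


def aggregate_by_concept(tags: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert item -> concepts tagging into concept -> items (aggregation points)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in sorted(tags):
        for concept in tags[item]:
            groups[concept].append(item)
    return dict(groups)


def related(item: str, tags: Dict[str, Set[str]], threshold: float = 0.5) -> List[str]:
    """Other items whose concept sets overlap this item's by Jaccard similarity >= threshold."""
    mine = tags[item]
    out = []
    for other in sorted(tags):
        if other == item:
            continue
        union = mine | tags[other]
        if union and len(mine & tags[other]) / len(union) >= threshold:
            out.append(other)
    return out
```

Because items are compared only through shared reference concepts, content from different sources becomes comparable once each source is tagged against the same concept set.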
  • 42. Notable Ontologies and Vocabularies
  • 43. Management Layer
  • 44. Management/Federation Layer
    • Management/Federation Layer handles:
      • Ontology mapping, management
      • Queries and retrievals
      • All Web services
      • Imports and exports
      • Inferencing and logic
      • Ontology creation and expansion
    • Works with many RDF datastores
    • Has efficient, full-text indexing with faceting
    • Interface to the system is structWSF
    • Can plug into many options at the Applications Layer
      • (only Drupal with conStruct SCS yet deployed)
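Full-text indexing with faceting, mentioned above, rests on two structures. An inverted index maps terms to record ids, and facet counts over the current hits drive drill-down. The toy index below shows both under stated assumptions: AND semantics for multi-term queries, exact-match facet filters and naive tokenization. It is a hypothetical illustration, not the indexing engine used by the management layer.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple


class FacetedIndex:
    """Toy inverted index with facet counts (illustrative only)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # term -> record ids
        self.facets: Dict[str, Dict[str, str]] = {}             # record id -> facet values

    @staticmethod
    def _terms(text: str) -> List[str]:
        return re.findall(r"\w+", text.lower())

    def add(self, record_id: str, text: str, facets: Dict[str, str]) -> None:
        self.facets[record_id] = dict(facets)
        for term in set(self._terms(text)):
            self.postings[term].add(record_id)

    def search(self, query: str, facet_filter: Optional[Dict[str, str]] = None
               ) -> Tuple[List[str], Dict[str, Counter]]:
        terms = self._terms(query)
        if not terms:
            return [], {}
        # AND semantics: a hit must contain every query term
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        if facet_filter:
            hits = {r for r in hits
                    if all(self.facets[r].get(k) == v for k, v in facet_filter.items())}
        # Count facet values over the remaining hits for drill-down
        counts: Dict[str, Counter] = defaultdict(Counter)
        for r in hits:
            for name, value in self.facets[r].items():
                counts[name][value] += 1
        return sorted(hits), dict(counts)
```

A user interface would show the returned counts next to each facet value. Clicking a value reissues the same query with that value added to `facet_filter`.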
  • 45. Web-oriented Architecture
  • 46. Applications Layer
  • 47. conStruct SCS
  • 48. conStruct Browse Screen
  • 49. conStruct Capabilities
    • Based on Drupal
    • Single-click (cloud) deployment
    • Theming
    • User and group access and management
    • Data display templates
    • General content management system (CMS)
    • Publishing RDF
    • Open source
  • 50. Re-cap
  • 51. Summary
    • Incremental, low-risk approach to the semantic enterprise
    • Maximum leverage and re-use of existing information assets
    • Conversion and federation of all available data forms
    • Excellent uses for:
      • Business intelligence
      • Knowledge management
      • Master data management modernization
      • Taxonomy modernization
      • Enterprise content integration
    • All baseline products are open source
  • 52. Contacts & Information
    • Michael K. Bergman
      • CEO
      • 319.621.5225
      • [email_address]
      • blog: www.mkbergman.com
    • Steve Ardire
      • Senior Advisor
      • [email_address]
    • Frédérick Giasson
      • CTO
      • [email_address]
      • blog: fgiasson.com/blog
    • Web Sites
      • structureddynamics.com
      • umbel.org
      • umbel.structureddynamics.com (UMBEL Web services)
      • citizen-dan.org (community indicator systems)
      • openstructs.org (open source distros + documentation)
      • constructscs.com (Drupal structured data system)