SemTech 2010: Pelorus Platform


Pelorus Platform is a suite of tools for building Linked Data and Semantic Web applications.

Published in: Technology, Education

  1. Pelorus: A Semantic Web Application Platform. 2010 Semantic Technology Conference. Michael Grove, Director of Software Development, Clark & Parsia, LLC.
  2. Who are we? Clark & Parsia is a semantic software startup founded in 2005, with offices in DC and Cambridge, MA. Software products for end-user and OEM use. Provides software development and integration services. Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers.
  3. Where do we start? No, literally, where do we start? Enterprises increasingly want to use Semantic Web technology to manage information, but they lack in-house SemWeb expertise. So what's the first step in these cases? It's hard to get a project off the ground without expertise. In many cases, you just want to get a prototype running ASAP to evaluate the approach. An integrated platform to rapidly prototype and assess semweb tech, which also scales to production, is crucial.
  4. The Pelorus Platform. Pelorus Platform aims to ease this situation. It's a standards-based application development stack geared toward enterprise information integration via RDF, SPARQL, and OWL. It provides a collection of software designed to take you from ontology (or data) to application. It is based on years of customer engagements, learning what parts are the same for everyone and what parts are customized by everyone, and facilitating both. Few or no human-in-the-loop steps are required to get a barebones application running. From there, it's just UI customization.
  5. Ingredients. PelletServer: RESTful server-side component powered by Pellet. Provides: reasoning, semantic search, integrity constraints, query services, machine learning ... and planning too! Semantic ETL: toolkit for transforming existing data into RDF. Support for most common formats: XML, CSV, Excel, relational, etc. Conversion driven from domain ontology.
  6. More Ingredients. Annex, a linked data server: publishes your RDF as linked data. Works in place against any RDF database. No files to parse and directory structure to fill out. JavaScript module and pluggable template API for rendering resources. CRUD workflow support for maintaining your data.
  7. More Ingredients. Machine Learning Suite: bootstrap ontologies from existing data. Provides capabilities for learning ETL transformations from existing data, decreasing the by-hand mapping burden. Automatically create Pelorus models for browsing. Analysis support: clustering, classification, and more. Pelorus: faceted browsing via SPARQL for RDF data.
  8. So What Now? The intent of the Platform is to take either your existing data or an existing ontology as input and provide a working skeleton application as output. This is the Staples Easy button for the Semantic Web. Some minimal configuration and UI customization may be required. The goal is to Just Add Data and get back a working, full-service, modern app that's optimized for data integration and analysis.
  9. Getting Started. Legacy data in a series of databases, XML files, etc. This is a maintenance nightmare. How do you search this data, analyze it, or verify its correctness? If we could get the data out of these legacy formats and integrate it, then we could do something useful...
  10. Step 1: Integrate Legacy Data. Ontology bootstrapping via ML: we can learn the basic ontology from our existing data. Feed data to an ML process that will produce our ontology. Semantic ETL: using our ontology, and some additional ML, we can generate mappings from the source data to the ontology. Automatically convert our legacy data into RDF.
  11. Step 2: Publish Integrated Data. Now that we have RDF, we'd like to publish it as Linked Data. The Annex Linked Data server takes any RDF database and exposes its contents as Linked Data. Customizable template framework. JavaScript API to access the original RDF database. We'd also like to maintain our data. Using Empire, we can generate Java beans to represent our domain ontology. Annex provides generic CRUD templates driven from standard Java beans, using JPA as a persistence mechanism. By virtue of simply having RDF in a database, we've got publication as Linked Data, and maintenance via simple CRUD pages, for free.
  12. Step 3: Browse & Search & Query. We've published our RDF, but clicking around pages looking for a particular resource is not ideal. Having a simple interface to browse the data would be great. Pelorus is served via Annex. The facet model is generated dynamically via more ML. Uses the same JavaScript template framework for custom display of RDF content.
  13. Step 4: Analyze & Plan & Act. We can use OWL reasoning via Pellet to learn new things about the data; for example: Which products should we sell to which customers? Which products should we sell to which prospects? Why do we make these recommendations? We can use Machine Learning to learn new things, too: Which customers are like others? (similarity) Which groups do our customers fall into? (clustering) Which employees are liaisons between parts of the company? (social network analysis) Which employees are most likely to retire in the next year? (classification) We can use Automated Planning to build actionable plans/workflows based on these analyses.
  14. Interlude: Pelorus Demos. American baseball; NASA Space Program; data catalog.
  15. What's the point? Getting to step 4 (and beyond) is the point; that's where the real ROI lives... You want to get there sooner and cheaper, but many times steps 1-3 are a hurdle. If you've got limited time and/or budget to prove value in step 4, you don't want to waste it on the drudgery of getting off the ground. This is the key to semantic technology's value proposition.
  16. Questions?
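
Editor's sketch for slides 10 to 12: the slides describe converting legacy data into RDF and then browsing it by facet, but show no code. The Python below is a rough, standard-library-only illustration of that idea, not the Pelorus, Semantic ETL, or Annex implementation. The `http://example.org/` namespace, the hand-written `MAPPING`, the function names, and the sample baseball rows are all assumptions invented for this example. In the platform as described, the mapping would be learned or driven from the domain ontology, and facets would be computed via SPARQL against an RDF database.

```python
import csv
import io
from collections import Counter

# Hypothetical namespace and column-to-predicate mapping (assumptions for
# this sketch; the slides say such mappings are generated from the ontology).
BASE = "http://example.org/"
MAPPING = {
    "name": BASE + "name",
    "team": BASE + "team",
    "position": BASE + "position",
}


def rows_to_triples(csv_text, id_column, mapping):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + "player/" + row[id_column]
        for column, predicate in mapping.items():
            value = row.get(column)
            if value:  # skip missing or empty cells
                triples.append((subject, predicate, value))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines with plain literal objects.

    Only backslashes and double quotes are escaped, which is enough for
    the sample data but not for arbitrary input.
    """
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def facet_counts(triples, predicate):
    """Count distinct subjects per value of one predicate (one facet)."""
    pairs = {(s, o) for s, p, o in triples if p == predicate}
    return Counter(o for _, o in pairs)


# Made-up sample standing in for a legacy CSV export.
SAMPLE = """id,name,team,position
1,Ann,Red Sox,Pitcher
2,Bob,Red Sox,Catcher
3,Cy,Yankees,Pitcher
"""

triples = rows_to_triples(SAMPLE, "id", MAPPING)
print(to_ntriples(triples))
print(facet_counts(triples, BASE + "team"))
```

Calling `facet_counts` with a different predicate, such as `BASE + "position"`, gives the counts for another facet. A faceted browser would show one such list per property and narrow the result set as values are selected.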