An Alfresco Apache Stanbol Integration (port of OpenCalais integration) - Alfresco DevCon 2012 San Jose

  1. An Alfresco Apache Stanbol Integration (port of OpenCalais Integration) Steve Reiner, CTO, Integrated Semantics
  2. OpenCalais Integration Features • Share, FlexSpaces, and Explorer UI • Auto tagging action (manual and rules) in all • List semantic tags in details in all • Share, FlexSpaces: Semantic Tag Clouds, Geo-Tagged Map • FlexSpaces: Suggest Tags, Add / Remove Tags on Doc • Open Source
  3. OpenCalais Share Integration Features • Auto tag menu in Doc Lib, Repo, Details • Semantic Tag Cloud Dashlets with category drop-down • Geo-tagged map dashlet • Dashlets work on both site and overall dashboards • Search results list shown when clicking tags in these dashlets
  4. OpenCalais Advantages / Disadvantages • Advantages: • Good Recognition Results on Names, Cities, Companies • Good for news, public website text • Disadvantages: • Doc Size limit on All versions (100k bytes) • Daily submission limits on Free OpenCalais (50k items) and Calais Professional (100k items) • Keeps the metadata it extracts • Focused on English, some support for French, Spanish • Not Customizable in Taxonomy or in recognition code • Not Open Source
  5. Apache Stanbol • Disadvantages: • OpenNLP Recognition of Names, Cities, Companies not as good as OpenCalais (can chain other engines/services including OpenCalais) • Advantages: • No doc size or submission item limits • Multi language focused • Customizable in Taxonomy and in recognition code • Open Source
  6. Apache Stanbol • More of a full semantic platform, not just text enhancement • Focused on semantic content management • Could be used for a more general semantic platform • Componentized, OSGi based • Enhancer, Enhancement Engines, Entity Hub, ContentHub, Ontology Mgr, Rules, Reasoners, CMS Adapter, FactStore
  7. Port of OpenCalais Integration to Apache Stanbol • Prototype download available now • Open Source • All previous features in Share and Explorer are available • Alfresco extension (4.x) and Share extension (4.0 and 4.2) • Share auto-tag menus, semantic tag clouds dashlet, geo-tagged dashlet, semantic tags listed in details • Action can be used in content rules to auto-tag all submissions to a folder, etc. • Auto tag action also available in Explorer, semantic tags listed on details page • Suggest tag webscript not complete • FlexSpaces doesn’t have support yet (need to add additional calls to different webscript URLs and add preference options to use OpenCalais or • Leveraged a Java client API library contributed by Zaizi to Stanbol that makes REST calls to Stanbol
  8. Apache Stanbol Integration Features Roadmap • Finish Suggest Tag WebScript and add support to FlexSpaces for Stanbol • Display of DBpedia info / webpage on entity next to search results list on page displayed after semantic tag click • Add using Stanbol ContentHub instead of stateless Entity Hub to retain semantic enhancement of docs • If Zaizi Stanbol integration is not made available as open source, will add some things such as Solr Facets search UI of semantic categories / entities • Other things under consideration • SKOS taxonomy editor • Semantic Categories Graph (single doc, multiple docs) • Tie in Alfresco as the content mgr of versions of Protégé GWT Web UI ontology editor / tie in with Stanbol • Stanbol support for enhancing any CMIS repository • Stanbol as a platform for semantic data integration of structured data in addition to unstructured
  9. Links to Find out more • blog • Twitter: @stevereiner
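
Slide 7 notes that the integration relies on a client library that makes REST calls to Stanbol, and slide 5 mentions that engines can be chained. As a rough illustration of what such a call looks like, here is a minimal Python sketch (not the Zaizi Java client, and not the integration's actual code). It assumes a Stanbol instance at `http://localhost:8080`, the stateless `/enhancer` endpoint accepting plain text via POST, and a `/enhancer/chain/<name>` path for a named enhancement chain; the function names, the default URL, and the `text/turtle` response format are illustrative choices, so check them against the Stanbol version you run.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: a Stanbol instance running locally on its usual port.
DEFAULT_STANBOL_URL = "http://localhost:8080"


def build_enhancer_request(
    text: str,
    base_url: str = DEFAULT_STANBOL_URL,
    chain: Optional[str] = None,
    accept: str = "text/turtle",
) -> Request:
    """Build (but do not send) a POST request to the Stanbol enhancer.

    `chain` is an optional enhancement chain name; when given, the request
    targets /enhancer/chain/<name> instead of the default chain.
    """
    endpoint = base_url.rstrip("/") + "/enhancer"
    if chain:
        # Percent-encode the chain name so it is safe as a path segment.
        endpoint += "/chain/" + quote(chain, safe="")
    return Request(
        endpoint,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # RDF serialization requested for the result
        },
        method="POST",
    )


def enhance(text: str, **kwargs) -> str:
    """Send the text to Stanbol and return the raw RDF response body."""
    with urlopen(build_enhancer_request(text, **kwargs)) as response:
        return response.read().decode("utf-8")
```

With a server running, `enhance("Paris is the capital of France.")` would return the enhancement results as RDF, which a caller would then have to parse to pull out entity annotations for tagging. Keeping request construction separate from sending makes the endpoint and header choices easy to inspect without a live Stanbol instance.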