Building a semantic enterprise content management system from scratch v1


How we built a practical ontology-driven corporate intranet portal
in the cloud in three months using off-the-shelf technology. Presented at SemTechBiz San Francisco, June 6th 2012.

Published in: Technology, Education
  • About three years ago Jesse Dudley was working at Thomson Reuters on a product called KOLexperts, which identifies experts in the pharma and biotech industries by analyzing content in places like PubMed. She attended the Special Libraries Association (SLA) Conference in DC in June 2009 and, because of her work on KOLexperts, went to a presentation titled “Translational medicine meets the semantic web” by Olivier Bodenreider of the National Library of Medicine. That was her introduction to semtech. She started passing semtech material to me and I passed it along to Fynydd. It had obvious value for many of the enterprise knowledge management tools we work on, so as we worked with customers interested in improving their knowledge sharing tools and intranets, we started experimenting with it and recommending it. We began working with Clark and Parsia and building prototype content management systems that ran on Stardog, their new RDF database. This eventually resulted in a semantic content management prototype and framework we called Cambridge, which has been well received in various incarnations by a couple of clients. And now, almost exactly three years after SLA 2009, we are speaking at SemTechBiz 2012.
  • Traditional ECM is most often the intranet portal. It’s primitive, slow to change, hard to deploy. It’s broken. It’s time to change.
  • SECMS tries to solve some of these problems by understanding the meaning of content and the goals of users. SECMS is the intersection of meaning and goals. We store information in a more logical and standard format (RDF) and use a more modern and standard tool (SPARQL) to query it.
  • Some design principles. First: build it yourself. This is often debated and there is no perfect answer. Why did we? The semtech marketplace for this kind of thing is in its infancy, especially for UI and UX. We wanted an innovative, cutting-edge solution. And tools shape thinking, so building your own lets you differentiate yourself.
  • Next: don’t build all of it yourself. It’s the age of the mashup. Get advice and assistance from the best in the field. Build using the best software components and tools, whether open source or commercial.
  • The cliched cloud slide. Why does the cloud matter? Provisioning real servers is slow and costly, bureaucratic. Even if final deployment is onsite, cloud is great for prototyping. Scale quickly. Cheaper and more efficient servers. While prototyping you can never be sure what resources you’ll need.
  • Another cliched slide: agile development. But why does it matter? Talk to clients (end users, not management) and understand their problems. Build iteratively. Build a system that doesn't require lots of documentation. Respond to change in business, marketplace, technology, and capabilities.
  • Last design principle: sometimes you need to upgrade your content. Our policy & procedure story: we started by thinking about how to build a tool to deal with the existing content. But that content was written and organized for an old medium, paper, and then pushed to PDF. It was redundant, disorganized, and mixed together. Once we switched gears and rewrote and improved the content, the solution was easier to build and better for users.
  • Now for implementation. AWS: incredibly flexible and innovative. .NET and C#: great framework and language, well accepted in the enterprise. MSSQL: good for non-RDF needs, well accepted in the enterprise, and SQL Server Express is free. Stardog: great RDF database, fast and easy to use. dotNetRDF: open source, talks to Stardog with ease.
  • .NET is our platform, but what about a foundation? Build or buy? There was lots of debate and procrastination. All choices required similar development times. Building your own is faster to prototype, most flexible, and gives a better ability to innovate. It also avoids the politics of deciding between systems already in place (Lotus Quickr, TeamSite, SharePoint). A generic .NET solution moves easily into whatever framework the customer has or wants.
  • One of our biggest problems was overcomplicating the ontology, for example by trying to make it answer questions. Define the goal: findability. Don’t make it complicated; build it as you need it. Treat the ontology like content, not code: build nice tools for it and prepare for it to change often. Biggest thing of all: don’t talk to users too much about ontology (or tech in general). It’s only a means to an end. But selling its value to stakeholders can work.
  • We initially planned for dynamic menus based on role, but that was too complicated and unnecessary. Curated top menus based on user research, card sorting, etc. worked best. Dynamic sub menus and related content links work. Friendly URLs are often forgotten; they are good for experts and for sharing. Beautiful pages (UX, layouts, whitespace and margins) improve browsability and user satisfaction.
  • Don’t delay autocomplete; it improves search dramatically. Take your inputs and “snap them to a grid” to find an answer. Context is important, and so is personalization. Federation: include all types of results. Adaptive: build in your own analytics early on and use them for self-diagnosis and improvement. Beautiful results are easier to read.
  • Tagging: a simple approach of picking “subject” (hasSubject) and “audience” (hasAudience) entities from a hierarchical view of select pieces of the ontology. Expand this to let authors choose other relationships (e.g. hasDestination mars). Simple auto-tagging recommendations work by matching text; add more complex ones with tools like OpenCalais? Inline analytics were a very valuable tool for authors and management. Of course, the editor has to be great, as should the entire admin, which is too often ignored.
  • Must constantly improve - plan and budget for it early on. Start with a basic tool that looks great and has some semantics, prove it, grow it. People are used to constant improvement - internet, cars, etc. Focus on search, navigation, UX and performance.
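The tagging note above describes simple auto-tagging recommendations made by matching text, with authors confirming “subject” and “audience” entities. A minimal sketch of that idea follows, in Python for brevity rather than the C# the project used. The entity names, the `ex:` prefix, the label table, and both function names are hypothetical, chosen only for illustration; they are not taken from Cambridge.

```python
import re

# Hypothetical slice of an ontology: entity -> labels that may appear in content.
ONTOLOGY_LABELS = {
    "ex:TravelPolicy": ["travel policy", "business travel"],
    "ex:ExpenseReport": ["expense report", "expenses"],
    "ex:NewHires": ["new hire", "onboarding"],
}


def suggest_tags(text, labels=ONTOLOGY_LABELS):
    """Return entities whose labels occur in the text, most frequent first."""
    lowered = text.lower()
    scores = {}
    for entity, names in labels.items():
        # Count whole-word (or whole-phrase) occurrences of every label.
        hits = sum(
            len(re.findall(r"\b" + re.escape(name) + r"\b", lowered))
            for name in names
        )
        if hits:
            scores[entity] = hits
    # Sort by descending hit count, then by name for a stable order.
    return sorted(scores, key=lambda e: (-scores[e], e))


def to_triples(content_id, subjects, audiences):
    """Turn the tags an author confirmed into (subject, predicate, object) triples."""
    triples = [(content_id, "ex:hasSubject", s) for s in subjects]
    triples += [(content_id, "ex:hasAudience", a) for a in audiences]
    return triples


if __name__ == "__main__":
    page = "Submit your expense report after business travel. Expenses over $50 need receipts."
    suggestions = suggest_tags(page)
    print(suggestions)
    print(to_triples("ex:doc42", suggestions, ["ex:Managers"]))
```

The suggestions are only recommendations: the author still picks the final tags, and the resulting triples would be written to the RDF store by whatever client library the system uses.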

    1. Building a Semantic Enterprise Content Management System from Scratch: How we built a practical ontology-driven corporate intranet portal in the cloud in three months using off-the-shelf technology. SemTechBiz San Francisco, June 6th 2012. Ron Michael Zettlemoyer and Cliff Jurkiewicz, @ronmichael and @cessna_pilot
    2. fynydd (:in-id), noun: 1. a word of Welsh origin meaning mountain. 2. a company of big thinkers, innovative problem solvers, and doers. Mobile & Desktop Apps • Web Apps & Services • Semantic Knowledge Management • User Interface Design • Systems Architecture • Reporting & Analytics
    3. How we got here. 2009: @thomson reuters, #kolexperts, @jwindz, #sla2009, “Translational medicine meets the semantic web”, #semtech, @candp, #stardog, @ronmichael, @fynydd, Cambridge. 2012: #semtechbiz. Steve Jobs: “Creativity is just connecting things.”
    4. Traditional enterprise content management. Andy Warhol: “They say that time changes things, but you actually have to change them yourself.”
    5. Semantic enterprise content management represents, recognizes, responds to: the meaning of content, the goals of users
    6. Build it yourself. Julius Caesar: “Creating is the essence of life.”
    7. Stand on the shoulders of giants. Henry Ford: “I invented nothing new. I simply assembled the discoveries of other people. Had I worked fifty or ten or even five years before, I would have failed. So it is with every new thing.”
    8. Keep your head in the cloud. Henry David Thoreau: “If you have built castles in the air, your work need not be lost; that is where they should be.”
    9. Be agile. Charles Darwin: “It is not the strongest of the species that survives nor the most intelligent. It is the one that is the most adaptable to change.”
    10. Tame your content. Dr. Seuss: “So the writer who breeds more words than he needs, is making a chore for the reader who reads.”
    11. Architecture: dotNetRDF
    12. Foundation: Microsoft SharePoint? Cambridge
    13. Ontology • Define your goal: increase content findability • Build simply and as you need it • Provide simple management tools • Sell stakeholders on its value • Hide it from users
    14. Browse • Research and curate top level menus • Generate dynamic sub menus • Generate related content links • Adopt friendly URLs • Design beautiful pages
    15. Search • Start with autocomplete • Use a “snap-to-grid” approach • Make it contextual and personalized • Provide federated and adaptive results • Design beautiful search results
    16. Search (diagram): user input and context • SPARQL, SQL, LINQ • content metadata, ontology, operations data, content, public datasets, analytical data • secret sauce • results & suggestions
    17. Administration • Give authors manual & automatic tagging • Show content-level analytics • Build a great editor • Design beautiful administrative tools
    18. Keep moving. Lexus: “Anything not moving forward is moving backward.”
    19. Start building. William Wordsworth: “To begin, begin.”
    20. Libraries and Code • dotNetRDF: http:// • Squickl SQL data access library: https:// • AWS Snapshot Scheduler: https://-snapshot-scheduler • Stardog Bites MSSQL CLR extensions: https://-bites-mssql • CFrame Content Management Framework: https:// • dotNetRDF Stardog Helper: https://-stardog-helper
    21. References • Integrating Semantic Systems, John F. Sowa: http:// • An Ontology-Based Knowledge Management Platform, Aldea et al: http:// • Semantic Enterprise Content Management, Mark Fisher, Amit Sheth: http:// • The Semantic Web and Entertainment Weekly, Donna Slawsky: http:// • Improving Content Management with Semantic Technologies, Fernando Carolo and Leonardo Burlamaqui: http:// • Content Management Bible, Bob Boiko: http://
    22. fynydd.com. Don’t forget your towel.
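
The Search slides (15 and 16) recommend starting with autocomplete and taking a “snap-to-grid” approach: user input is snapped to a known ontology entity before any query runs. The sketch below is one possible reading of that idea, in Python for brevity rather than the project's C#. The label table, the `ex:` prefix, the `ex:hasSubject` predicate, the similarity cutoff, and all function names are assumptions made for illustration; the deck does not describe the actual implementation.

```python
import difflib

# Hypothetical "grid": known labels mapped to ontology entities.
LABELS = {
    "expense report": "ex:ExpenseReport",
    "travel policy": "ex:TravelPolicy",
    "vacation request": "ex:VacationRequest",
    "vendor onboarding": "ex:VendorOnboarding",
}


def autocomplete(prefix, labels=LABELS, limit=5):
    """Suggest known labels that start with what the user has typed so far."""
    p = prefix.strip().lower()
    if not p:
        return []
    return sorted(label for label in labels if label.startswith(p))[:limit]


def snap(query, labels=LABELS, cutoff=0.6):
    """Snap free-text input to the closest known entity, or None if nothing is close."""
    q = query.strip().lower()
    match = difflib.get_close_matches(q, list(labels), n=1, cutoff=cutoff)
    return labels[match[0]] if match else None


def build_query(entity):
    """Build a SPARQL query for content tagged with the snapped entity.

    The ex: namespace and the ex:hasSubject predicate are placeholders.
    """
    return (
        "PREFIX ex: <http://example.org/>\n"
        "SELECT ?content WHERE { ?content ex:hasSubject " + entity + " . }"
    )


if __name__ == "__main__":
    print(autocomplete("v"))
    entity = snap("expence reports")  # misspelled input still lands on a known entity
    if entity is not None:
        print(build_query(entity))
```

Snapping to a fixed set of entities means a misspelled or loosely worded query can still reach a precise SPARQL lookup, while input that matches nothing can fall back to other result types in the federated search.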