How we built a practical ontology-driven corporate intranet portal in the cloud in three months using off-the-shelf technology. Presented at SemTechBiz San Francisco, June 6th 2012.
Building a semantic enterprise content management system from scratch v1
1. Building a Semantic Enterprise Content Management System from Scratch
How we built a practical ontology-driven corporate intranet portal in the cloud in three months using off-the-shelf technology
SemTechBiz San Francisco, June 6th 2012
Ron Michael Zettlemoyer and Cliff Jurkiewicz
@ronmichael and @cessna_pilot
2. fynydd
fynydd :in-id, noun
1. a word of Welsh origin meaning mountain.
2. a company of big thinkers, innovative problem solvers, and doers.
Mobile & Desktop Apps
Web Apps & Services
Semantic Knowledge Management
User Interface Design
Systems Architecture
Reporting & Analytics
fynydd.com
3. How we got here
Timeline:
@thomsonreuters #kolexperts @jwindz
2009 #sla2009: “Translational medicine meets the semantic web” #semtech
@candp #stardog
@ronmichael @fynydd Cambridge
#semtechbiz 2012
Steve Jobs: “Creativity is just connecting things.”
4. Traditional enterprise content management
Andy Warhol: “They say that time changes things, but you actually have to change them yourself.”
5. Semantic enterprise content management
Represents, recognizes, and responds to the meaning of content and the goals of users.
7. Stand on the shoulders of giants
Henry Ford: “I invented nothing new. I simply assembled the discoveries of other people. Had I worked fifty or ten or even five years before, I would have failed. So it is with every new thing.”
8. Keep your head in the cloud
Henry David Thoreau: “If you have built castles in the air, your work need not be lost; that is where they should be.”
9. Be agile
Charles Darwin: “It is not the strongest of the species that survives, nor the most intelligent. It is the one that is the most adaptable to change.”
10. Tame your content
Dr. Seuss: “So the writer who breeds more words than he needs, is making a chore for the reader who reads.”
13. Ontology
• Define your goal: increase content findability
• Build simply and as you need it
• Provide simple management tools
• Sell stakeholders on its value
• Hide it from users
14. Browse
• Research and curate top-level menus
• Generate dynamic submenus
• Generate related content links
• Adopt friendly URLs
• Design beautiful pages
15. Search
• Start with autocomplete
• Use a “snap-to-grid” approach
• Make it contextual and personalized
• Provide federated and adaptive results
• Design beautiful search results
16. Search
Diagram: user input and context drive operations (SPARQL, LINQ, SQL) over content metadata, the ontology, content data, public datasets, “secret sauce”, and analytical data to produce results & suggestions.
17. Administration
• Give authors manual & automatic tagging
• Show content-level analytics
• Build a great editor
• Design beautiful administrative tools
18. Keep moving
Lexus: “Anything not moving forward is moving backward.”
About three years ago, Jesse Dudley was working at Thomson Reuters on KOLexperts, a product that identifies experts in the pharma and biotech industries by analyzing content from sources such as PubMed. In June 2009 she attended the Special Libraries Association (SLA) Conference in DC. Because of her work on KOLexperts, she went to a presentation titled “Translational medicine meets the semantic web” by Olivier Bodenreider of the National Library of Medicine. That was her introduction to semantic technology (semtech). She started sharing semtech ideas with me, and I passed them along to Fynydd. Semtech had obvious value for many of the enterprise knowledge management tools we work on. So as we worked with customers who wanted to improve their knowledge sharing tools and intranets, we started experimenting with it and recommending it. We began working with Clark and Parsia and building prototype content management systems that ran on Stardog, their new RDF database. This eventually produced a semantic content management prototype and framework we called Cambridge, which a couple of clients have received well in various incarnations. And now, almost exactly three years after SLA 2009, we are speaking at SemTechBiz 2012.
Traditional enterprise content management (ECM) most often takes the form of the intranet portal. It’s primitive, slow to change, and hard to deploy. It’s broken, and it’s time to change it.
Semantic ECM (SECMS) tries to solve some of these problems by understanding the meaning of content and the goals of users. SECMS sits at the intersection of meaning and goals. We store information in more logical, standard formats (RDF) and use more modern, standard tools (SPARQL) to query it.
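Here is a minimal Python sketch of the underlying idea behind “store it as RDF, query it with SPARQL”. Content metadata is held as subject-predicate-object triples and queried by joining triple patterns. This is roughly what a SPARQL basic graph pattern such as SELECT ?doc ?topic WHERE { ?doc :hasAudience :employee . ?doc :hasSubject ?topic } does. It is an illustration only, not Stardog and not a SPARQL engine. The document identifiers and the match and query helpers are hypothetical.

```python
from typing import Iterator

Triple = tuple[str, str, str]

# Hypothetical content metadata as subject-predicate-object triples (the RDF shape).
TRIPLES: list[Triple] = [
    ("doc:travel-policy", "hasSubject", "topic:travel"),
    ("doc:travel-policy", "hasAudience", "role:employee"),
    ("doc:expense-guide", "hasSubject", "topic:expenses"),
    ("doc:expense-guide", "hasAudience", "role:manager"),
]


def match(pattern: Triple, triples: list[Triple]) -> Iterator[dict[str, str]]:
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        bindings: dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if bindings.setdefault(term, value) != value:
                    break  # same variable would be bound to two different values
            elif term != value:
                break  # constant term does not match this triple
        else:
            yield bindings


def query(patterns: list[Triple], triples: list[Triple]) -> list[dict[str, str]]:
    """Join several triple patterns, like a SPARQL basic graph pattern."""
    results: list[dict[str, str]] = [{}]
    for pattern in patterns:
        joined = []
        for bound in results:
            # Substitute variables already bound by earlier patterns before matching.
            concrete = (bound.get(pattern[0], pattern[0]),
                        bound.get(pattern[1], pattern[1]),
                        bound.get(pattern[2], pattern[2]))
            for new in match(concrete, triples):
                joined.append({**bound, **new})
        results = joined
    return results


rows = query([("?doc", "hasAudience", "role:employee"),
              ("?doc", "hasSubject", "?topic")], TRIPLES)
print(rows)  # [{'?doc': 'doc:travel-policy', '?topic': 'topic:travel'}]
```

Each pattern narrows the bindings produced by the previous ones. Adding a pattern therefore works much like adding a line to a SPARQL WHERE clause.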
Now some design principles. The first is to build it yourself. This is often debated, and there is no perfect answer. Why did we? The semtech marketplace for this kind of thing is still in its infancy, especially in UI and UX. We wanted an innovative, cutting-edge solution. And tools shape thinking, so building your own helps you differentiate yourself.
Next: don’t build all of it yourself. It’s the age of the mashup. Get advice and assistance from the best in the field, and build with the best software components and tools, whether open source or commercial.
The clichéd cloud slide. Why does the cloud matter? Provisioning real servers is slow, costly, and bureaucratic. Even if final deployment is on-site, the cloud is great for prototyping. You can scale quickly, and the servers are cheaper and more efficient. While prototyping, you can never be sure what resources you’ll need.
Another clichéd slide: agile development. But why does it matter? Talk to clients (end users, not management) to understand their problems. Build iteratively. Build a system that doesn’t require lots of documentation. Respond to changes in the business, the marketplace, technology, and capabilities.
Last design principle: sometimes you need to upgrade your content. Take our policy and procedure story. We started thinking about how to build a tool to deal with the existing content. But that content had been written and organized for an old medium (paper) and then pushed to PDF. It was redundant, disorganized, and mixed together. Once we switched gears and rewrote and improved the content, the solution was easier to build and better for users.
Now for implementation.
AWS: incredibly flexible and innovative.
.NET and C#: a great framework and language, well accepted in the enterprise.
MSSQL: good for non-RDF needs and well accepted in the enterprise; SQL Express is free.
Stardog: a great RDF database, fast and easy to use.
dotNetRDF: open source, and talks to Stardog with ease.
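The authors used dotNetRDF from C# to talk to Stardog. Underneath, a client like that sends standard SPARQL 1.1 Protocol requests over HTTP. This minimal Python sketch builds such a request without sending it. It assumes a Stardog server on localhost at its default HTTP port 5820 and a hypothetical database named intranet, so check the server's own documentation for the exact endpoint path. The helper name build_sparql_request and the example predicate URI are also hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: local Stardog server, default port 5820, hypothetical database "intranet".
ENDPOINT = "http://localhost:5820/intranet/query"


def build_sparql_request(endpoint: str, sparql: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = f"{endpoint}?{urlencode({'query': sparql})}"
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


SPARQL = "SELECT ?doc WHERE { ?doc <http://example.org/hasSubject> ?topic } LIMIT 10"
request = build_sparql_request(ENDPOINT, SPARQL)
print(request.full_url)
# Sending it would be urllib.request.urlopen(request), which needs a running server
# and, depending on its configuration, credentials.
```

GET keeps queries easy to log and cache. Very long queries can instead be sent by POST, which the SPARQL 1.1 Protocol also allows.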
.NET is our platform, but what about a foundation? Build or buy? There was lots of debate and procrastination, and all the choices required similar development times. Building our own was faster to prototype and the most flexible, and it gave us a better ability to innovate. It also avoided the politics of deciding between systems already in place (Lotus Quickr, TeamSite, SharePoint). And a generic .NET solution moves easily into whatever framework the customer has or wants.
One of our biggest problems was overcomplicating the ontology (e.g., to answer questions). Define the goal: findability. Don’t make it complicated; build it as you need it. Treat the ontology like content, not code: build nice tools for it and prepare for it to change often. Biggest thing of all: don’t talk to users too much about the ontology (or tech in general). It’s only a means to an end. Selling its value to stakeholders, however, can work.
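Even a simple ontology helps findability when it has a hierarchy, because a search for a broad topic can then also find content tagged with narrower topics. This minimal Python sketch assumes the ontology is just a map from each topic to its broader topic. Keeping the map as plain data means it can be edited like content rather than code. In an RDF store the same lookup would typically be a SPARQL property path such as skos:broader*. All topic names, document names, and helper functions here are hypothetical.

```python
from collections import defaultdict

# Hypothetical slice of a topic ontology: topic -> its broader topic.
BROADER: dict[str, str] = {
    "International travel": "Travel",
    "Domestic travel": "Travel",
    "Visas": "International travel",
    "Travel": "Policies",
}

# Hypothetical documents, each tagged with one subject from the ontology.
DOC_SUBJECTS: dict[str, str] = {
    "visa-checklist": "Visas",
    "mileage-rates": "Domestic travel",
    "dress-code": "Policies",
}


def narrower_closure(topic: str, broader: dict[str, str]) -> set[str]:
    """Return the topic plus every topic below it, following broader links in reverse."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in broader.items():
        children[parent].append(child)
    found: set[str] = set()
    stack = [topic]
    while stack:  # iterative depth-first walk; the set also guards against cycles
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def find_documents(topic: str) -> list[str]:
    """Find documents tagged with the topic or any narrower topic."""
    topics = narrower_closure(topic, BROADER)
    return sorted(doc for doc, subject in DOC_SUBJECTS.items() if subject in topics)


print(find_documents("Travel"))    # ['mileage-rates', 'visa-checklist']
print(find_documents("Policies"))  # ['dress-code', 'mileage-rates', 'visa-checklist']
```

Users never see the hierarchy; they only notice that a search for “Travel” also returns the visa checklist.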
We initially planned dynamic menus based on role, but that was too complicated and unnecessary. Curated top-level menus based on user research (card sorting, etc.) worked best. Dynamic submenus and related content links work well. Friendly URLs are often forgotten, but they are good for experts and for sharing. Beautiful pages (UX, layouts, whitespace and margins) improve browsability and user satisfaction.
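Two of these browsing features are easy to sketch. The Python below shows one possible approach, not necessarily the authors’ approach. It derives a friendly URL segment from a page title, and it ranks related content links by the ontology tags two pages share. The ranking uses Jaccard similarity (shared tags divided by all tags of the pair), which is a swapped-in technique chosen for simplicity. The page names and tags are hypothetical.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Strip accents (e.g. "é" -> "e") by decomposing and dropping non-ASCII marks.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def related(page: str, tags: dict[str, set[str]], limit: int = 3) -> list[str]:
    """Rank other pages by Jaccard overlap between their tags and this page's tags."""
    mine = tags[page]
    scored = []
    for other, theirs in tags.items():
        if other == page:
            continue
        union = mine | theirs
        score = len(mine & theirs) / len(union) if union else 0.0
        if score > 0:  # only link pages that share at least one tag
            scored.append((score, other))
    # Highest score first; ties broken alphabetically so menus stay stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:limit]]


# Hypothetical pages and their ontology tags.
TAGS = {
    "travel-policy": {"Travel", "Employees"},
    "visa-checklist": {"Travel", "Visas"},
    "expense-guide": {"Expenses", "Employees", "Travel"},
    "dress-code": {"Policies"},
}

print(slugify("Travel & Expense Policy (2012)"))  # travel-expense-policy-2012
print(related("travel-policy", TAGS))  # ['expense-guide', 'visa-checklist']
```

Because slugs come from titles, renaming a page changes its URL. Storing the slug once, or redirecting old slugs, keeps shared links working.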
Don’t delay autocomplete; it improves search dramatically. Take the user’s inputs and “snap them to a grid” to find an answer. Context is important, and so is personalization. Federation: include all types of results. Adaptive: build in your own analytics early on and use them for self-diagnosis and improvement. Beautiful results are easier to read.
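Autocomplete and “snapping to a grid” can both be expressed as matching user input against the ontology’s entity labels. In this minimal Python sketch, autocomplete suggests labels containing a word that starts with what the user has typed. Snapping uses the standard library’s difflib.get_close_matches to map free text, including typos, onto the nearest known label. It returns None when nothing is close enough. The labels, the 0.6 cutoff, and the function names are assumptions for illustration.

```python
import difflib
from typing import Optional

# Hypothetical ontology entity labels used for suggestions and snapping.
LABELS = ["Benefits", "Business travel", "Expense reports",
          "Holidays", "Human resources", "Travel insurance"]


def autocomplete(prefix: str, labels: list[str], limit: int = 5) -> list[str]:
    """Suggest labels containing a word that starts with the typed prefix (case-insensitive)."""
    p = prefix.strip().lower()
    if not p:
        return []
    hits = [label for label in labels
            if any(word.startswith(p) for word in label.lower().split())]
    return sorted(hits)[:limit]


def snap_to_grid(text: str, labels: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Map free text onto the closest known label, or None if nothing is close enough."""
    by_lower = {label.lower(): label for label in labels}
    matches = difflib.get_close_matches(text.strip().lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


print(autocomplete("tra", LABELS))              # ['Business travel', 'Travel insurance']
print(snap_to_grid("expence reprots", LABELS))  # Expense reports
print(snap_to_grid("zzzz", LABELS))             # None
```

Once free text has been snapped to a known entity, the search can expand it through the ontology, filter by it, or log it for the adaptive analytics.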
Tagging: we took a simple approach in which authors pick “subject” (hasSubject) and “audience” (hasAudience) entities from a hierarchical view of selected pieces of the ontology. This can be expanded to let them choose other relationships (e.g., hasDestination mars). Simple automatic tagging recommendations come from matching text. More complex recommendations could perhaps be added with tools like Open Calais. Inline analytics were a very valuable tool for authors and management. Of course, the editor has to be great, as should the entire admin interface, which is too often ignored.
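Automatic tagging recommendations based on simple text matching might look like the minimal Python sketch below. Each ontology entity carries a few labels or synonyms. Any entity whose labels appear as whole words in the draft (optionally with a plural s) is suggested as a hasSubject tag, ranked by number of mentions, and the author accepts or rejects it. The entity identifiers and synonym lists are hypothetical. An entity-extraction service such as Open Calais could take the place of this matching.

```python
import re

# Hypothetical ontology entities and the labels/synonyms authors tend to write.
ENTITIES: dict[str, list[str]] = {
    "topic:travel": ["travel", "trip", "itinerary"],
    "topic:expenses": ["expense", "reimbursement", "receipt"],
    "topic:mars": ["mars", "red planet"],
}


def suggest_tags(text: str, entities: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Suggest hasSubject candidates whose labels occur as whole words, most mentions first."""
    lowered = text.lower()
    suggestions = []
    for entity, labels in entities.items():
        # \b<label>s?\b matches the label as a whole word, optionally with a plural "s".
        hits = sum(len(re.findall(r"\b" + re.escape(label) + r"s?\b", lowered))
                   for label in labels)
        if hits:
            suggestions.append((entity, hits))
    suggestions.sort(key=lambda pair: (-pair[1], pair[0]))
    return suggestions


draft = "Submit every receipt after your trip. Expenses for the trip itinerary are reimbursed."
print(suggest_tags(draft, ENTITIES))  # [('topic:travel', 3), ('topic:expenses', 2)]
```

Suggestions stay suggestions. The author’s manual choice from the ontology view remains the final tag.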
You must constantly improve, so plan and budget for it early on. Start with a basic tool that looks great and has some semantics, prove it, and then grow it. People are used to constant improvement (the internet, cars, etc.). Focus on search, navigation, UX, and performance.