My Experiences Attempting to Scale the Semantic Web


Published in: Technology, Education
  1. My Experiences Attempting to Scale the Semantic Web
     John Clarke Mills, Engineer at Radar Networks ’07 - ’09
  2. • Social bookmarking website based around topics called “Twines”, a mashup of Delicious and Facebook
     • Users could bookmark, email, and upload documents, text, and links into Twine
     • Information would then be mined, URLs followed, and the subsequent text would be turned into tags
     • Privacy model based on UNIX-like permissions
     • Connections between people would generate a newsfeed like Facebook
  3. Technology
     • Based on RDF and OWL
     • Largely distributed system
     • Triple store enforced in code on top of Postgres
     • Built on open source Java technologies
       • Mina, ActiveMQ, Jetty, SOLR
     • Text analysis software from Expert Systems
     • Largely cached at many levels
       • Memcached, homegrown distributed coherency cache
     • Homegrown MVC: routing, rendering, templating
     • Very introspective
       • Even controllers were first-class objects defined in triples
     What do you call two Java engineers in a room?
  4. Architecture
  5. Problems Encountered
     • Many self joins
     • Deleting
     • Object graph caching
  6. Many Self Joins
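
The slide itself carries only a title. As a minimal sketch of why a triple store layered on a relational table leads to many self joins, assume a single hypothetical table `triples(subject, predicate, object)`. The table name, the `ex:` predicates, and the sample rows are illustrative assumptions, not Twine's actual schema, and SQLite stands in for Postgres so the example is self-contained.

```python
import sqlite3

def build_store():
    """Create an in-memory triple table with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("bookmark:1", "rdf:type", "Bookmark"),
            ("bookmark:1", "ex:title", "Scaling RDF"),
            ("bookmark:1", "ex:author", "user:alice"),
            ("user:alice", "ex:name", "Alice"),
            ("bookmark:2", "rdf:type", "Bookmark"),
            ("bookmark:2", "ex:title", "Untitled draft"),
        ],
    )
    return conn

# Every extra property or hop adds another join of the table against itself.
QUERY = """
SELECT t_title.object, t_name.object
FROM triples AS t_type
JOIN triples AS t_title
  ON t_title.subject = t_type.subject AND t_title.predicate = 'ex:title'
JOIN triples AS t_author
  ON t_author.subject = t_type.subject AND t_author.predicate = 'ex:author'
JOIN triples AS t_name
  ON t_name.subject = t_author.object AND t_name.predicate = 'ex:name'
WHERE t_type.predicate = 'rdf:type' AND t_type.object = 'Bookmark'
"""

def bookmarks_with_author_names(conn):
    """Return (title, author name) pairs for bookmarks that have an author."""
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    print(bookmarks_with_author_names(build_store()))
```

Fetching one title and one author name already takes three self joins. In a row-per-object schema the same question would be a single join between two tables.
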
  7. Deleting
     • Many Semantic Web folks don’t believe in deleting; unfortunately, consumers do
     • How do you go about deleting objects that are constructs of code, not enforced in the database?
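
One way to picture the deletion problem: with no foreign keys, the application has to decide which triples make up an "object" and which references to it must go. The sketch below assumes a hypothetical set of ownership predicates (`OWNING_PREDICATES`) maintained in code. It illustrates the general difficulty and is not how Twine handled it.

```python
# Hypothetical: predicates whose target is owned by the subject and
# should be deleted along with it. Nothing in the database knows this.
OWNING_PREDICATES = {"ex:hasComment"}

def collect_owned(triples, root):
    """Walk ownership edges from root and return every node to delete."""
    doomed = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p in OWNING_PREDICATES and o not in doomed:
                doomed.add(o)
                frontier.append(o)
    return doomed

def delete_object(triples, root):
    """Return the triples left after removing root, the nodes it owns,
    and any triple that still points at a removed node."""
    doomed = collect_owned(triples, root)
    return [t for t in triples if t[0] not in doomed and t[2] not in doomed]

if __name__ == "__main__":
    store = [
        ("bookmark:1", "ex:title", "Scaling RDF"),
        ("bookmark:1", "ex:hasComment", "comment:7"),
        ("comment:7", "ex:text", "Nice"),
        ("comment:7", "ex:author", "user:alice"),
        ("user:alice", "ex:bookmarked", "bookmark:1"),
        ("user:alice", "ex:name", "Alice"),
    ]
    print(delete_object(store, "bookmark:1"))
```

Note that the inbound reference from `user:alice` has to be found by scanning for the deleted node in the object position, which is exactly the kind of work a relational schema would push onto constraints.
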
  8. Object Graph Caching
     • The “Facebook problem”
     • Every user has a different view on the world
     • Massive cache tiers become necessary
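
A rough sketch of why per-user views inflate the cache: if what a viewer may see depends on permissions, a cached object graph has to be keyed by viewer as well as by object, so entries cannot be shared between users. The `PerViewerCache` class, the `load_view` loader, and the sample data are all hypothetical.

```python
# Hypothetical sample data: one object with a field only its owner may see.
OBJECTS = {"twine:1": {"title": "Semantic Web", "private_note": "draft ideas"}}
OWNERS = {"twine:1": "alice"}

def load_view(viewer, object_id):
    """Build the view of an object that this particular viewer may see."""
    obj = OBJECTS[object_id]
    if OWNERS[object_id] == viewer:
        return dict(obj)
    return {k: v for k, v in obj.items() if not k.startswith("private_")}

class PerViewerCache:
    """Cache keyed by (viewer, object), since views differ per viewer."""

    def __init__(self, loader):
        self._loader = loader
        self._entries = {}
        self.misses = 0

    def get(self, viewer, object_id):
        key = (viewer, object_id)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._loader(viewer, object_id)
        return self._entries[key]

    def size(self):
        return len(self._entries)

if __name__ == "__main__":
    cache = PerViewerCache(load_view)
    cache.get("alice", "twine:1")
    cache.get("bob", "twine:1")
    print(cache.size(), cache.misses)
```

With this keying the number of entries grows with users times objects rather than with objects alone, which is one plausible reading of why the slide calls massive cache tiers necessary.
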
  9. Does the Semantic Web Scale?
     • Yes, but only for a few ontologies in a limited domain, not at web scale!
     • Just as no high-availability site runs on a single database implementation, neither should a Semantic Web product
     Does a Triple Store Scale?
     • Yes, of course it does.
     • Facebook, the health care industry, IBM, many of the world’s governments, and some of you have successfully employed Semantic Web technologies
  10. Shard, Duplicate, Replicate, Hack
      • As with any large-scale webapp, third normal form goes out the door very quickly once scale happens
      • Shard in sometimes awkward ways based on user behavior
      • Replicate data across different nodes for performance
      • Offload work for later processing
      Why should the Semantic Web be any different from any other large-scale architecture?
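
As one illustration of sharding by user, the sketch below routes each triple to a shard chosen by hashing the user who owns it, so that a single user's data stays together. The function names and the choice of SHA-1 as a stable hash are assumptions for the example. The slides do not say how Twine actually partitioned its data.

```python
import hashlib

def shard_for(user_id, shard_count):
    """Map a user id to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def place_triples(owned_triples, shard_count):
    """Distribute (owner, triple) pairs so each owner's triples share a shard."""
    shards = [[] for _ in range(shard_count)]
    for owner, triple in owned_triples:
        shards[shard_for(owner, shard_count)].append(triple)
    return shards

if __name__ == "__main__":
    owned = [
        ("alice", ("bookmark:1", "ex:title", "A")),
        ("alice", ("bookmark:1", "ex:tag", "rdf")),
        ("bob", ("bookmark:2", "ex:title", "B")),
    ]
    for index, shard in enumerate(place_triples(owned, 4)):
        print(index, shard)
```

The awkward part the slide hints at shows up as soon as a query spans users, for example a shared Twine, because the triples it needs may then live on several shards.
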
  11. Academia Must Be Intersected with Web Scale Engineering
      • Make concessions on both sides
      • Only use triples where triples are needed
      • Store inferred triples for increased performance
      • Rely on other data storage mechanisms for metadata
      What is the actual problem you are trying to solve?
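
Storing inferred triples can be sketched as materializing entailments at write time instead of reasoning at query time. The example below applies two standard RDFS rules, subclass transitivity and type inheritance through `rdfs:subClassOf`, until nothing new appears. The `materialize` function and the `ex:` class names are hypothetical, and a production reasoner would cover far more rules.

```python
def materialize(triples):
    """Return the input triples plus everything entailed by
    rdfs:subClassOf transitivity and rdf:type inheritance."""
    closed = set(triples)
    while True:
        subclass_edges = [
            (s, o) for s, p, o in closed if p == "rdfs:subClassOf"
        ]
        inferred = set()
        for s, p, o in closed:
            if p not in ("rdf:type", "rdfs:subClassOf"):
                continue
            # If o is a subclass of sup, then s relates to sup the same way.
            for sub, sup in subclass_edges:
                if sub == o:
                    inferred.add((s, p, sup))
        if inferred <= closed:
            return closed  # fixed point reached, nothing new to add
        closed |= inferred

if __name__ == "__main__":
    base = {
        ("ex:Bookmark", "rdfs:subClassOf", "ex:Document"),
        ("ex:Document", "rdfs:subClassOf", "ex:Resource"),
        ("bookmark:1", "rdf:type", "ex:Bookmark"),
    }
    for triple in sorted(materialize(base)):
        print(triple)
```

After materialization, a query for everything of type `ex:Resource` is a plain lookup. The trade-off is extra storage and the need to recompute or retract inferred triples when the base data changes.
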