My Experiences Attempting to Scale the Semantic Web
Published in: Technology, Education
Transcript

  • 1. My Experiences Attempting to Scale the Semantic Web
      • John Clarke Mills, Engineer at Radar Networks ’07 - ’09
      • www.designmills.com
  • 2. Twine.com
      • Social bookmarking website based around topics called “Twines”, a mashup of Delicious and Facebook
      • Users could bookmark, email, and upload documents, text, and links into Twine
      • Information would then be mined, URLs followed, and the subsequent text would be turned into tags
      • Privacy model based on UNIX-like permissions
      • Connections between people would generate a newsfeed like Facebook
  • 3. Technology
      • Based on RDF and OWL
      • Largely distributed system
      • Triple store enforced in code on top of Postgres
      • Built on open source Java technologies
          • Mina, ActiveMQ, Jetty, Solr
      • Text analysis software from Expert Systems
      • Largely cached at many levels
          • Memcached, homegrown distributed coherency cache
      • Homegrown MVC: routing, rendering, templating
      • Very introspective
          • Even controllers were first-class objects defined in triples
      • What do you call two Java engineers in a room?
  • 4. Architecture
  • 5. Problems Encountered
      • Many self joins
      • Deleting
      • Object graph caching
  • 6. Many Self Joins
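Why a triple store on a relational database leads to many self joins can be sketched as follows. This is a minimal illustration, assuming a single hypothetical `triples(subject, predicate, object)` table in SQLite; the slides do not show Twine's actual Postgres schema, and the predicate names and data here are invented for the example.

```python
import sqlite3

# Hypothetical single-table triple layout (an assumption, not Twine's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("item:1", "rdf:type", "Bookmark"),
        ("item:1", "createdBy", "user:alice"),
        ("item:1", "hasTag", "semantic-web"),
        ("item:2", "rdf:type", "Bookmark"),
        ("item:2", "createdBy", "user:bob"),
        ("item:2", "hasTag", "semantic-web"),
    ],
)

# Every property of an object lives in its own row, so each extra
# constraint on the object joins the table against itself once more.
query = """
SELECT t1.subject
FROM triples AS t1
JOIN triples AS t2 ON t2.subject = t1.subject
JOIN triples AS t3 ON t3.subject = t1.subject
WHERE t1.predicate = 'rdf:type'  AND t1.object = 'Bookmark'
  AND t2.predicate = 'createdBy' AND t2.object = 'user:alice'
  AND t3.predicate = 'hasTag'    AND t3.object = 'semantic-web'
"""
alice_bookmarks = [row[0] for row in conn.execute(query)]
```

A query with three conditions already needs three aliases of the same table; in a conventional schema the same lookup would be a single-row filter on three columns.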
  • 7. Deleting
      • Many Semantic Web folks don’t believe in deleting; unfortunately, consumers do
      • How do you go about deleting objects that are a construct of code, not enforced in the database?
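The deletion problem can be made concrete with a small sketch. It assumes triples are plain `(subject, predicate, object)` tuples and that "deleting an object" means removing every statement that mentions it; `delete_object` and the sample data are hypothetical and do not come from the talk.

```python
def delete_object(triples, node):
    """Return the triples left after dropping every statement that mentions node.

    Nothing in the storage layer knows that these rows form one "object",
    so the application has to find both the statements about the node
    (node as subject) and the statements pointing at it (node as object).
    """
    return {(s, p, o) for (s, p, o) in triples if s != node and o != node}


triples = {
    ("item:1", "createdBy", "user:alice"),
    ("item:1", "hasTag", "tag:rdf"),
    ("twine:5", "contains", "item:1"),
    ("item:2", "hasTag", "tag:rdf"),
}
remaining = delete_object(triples, "item:1")
```

Even this simple version leaves open questions that a foreign key with cascading deletes would answer in a relational schema, such as whether a tag that is no longer referenced should also be removed.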
  • 8. Object Graph Caching
      • The “Facebook problem”
      • Every user has a different view on the world
      • Massive cache tiers become necessary
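One way to see why per-user views inflate the cache is the sketch below. It assumes a rendered view depends on both the viewer and the object, so the cache key must contain both; `ViewCache`, `OWNERS`, and `render` are hypothetical names for illustration, not part of Twine's cache design.

```python
# Hypothetical ownership table standing in for the permission model.
OWNERS = {"item:1": "alice"}


def render(viewer_id, object_id):
    """Pretend rendering step: owners see the full object, others a stub."""
    return "full" if OWNERS.get(object_id) == viewer_id else "hidden"


class ViewCache:
    """Cache keyed by (viewer, object), because no two viewers share a view."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def get(self, viewer_id, object_id):
        key = (viewer_id, object_id)
        if key not in self._store:
            self.misses += 1
            self._store[key] = render(viewer_id, object_id)
        return self._store[key]


cache = ViewCache()
alice_view = cache.get("alice", "item:1")
bob_view = cache.get("bob", "item:1")
alice_again = cache.get("alice", "item:1")  # served from the cache
```

The number of possible entries grows with users times objects rather than with objects alone, which is one reading of why massive cache tiers become necessary.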
  • 9. Does the Semantic Web Scale?
      • Yes, but only for a few ontologies in a limited domain, not at web scale!
      • Just as no high-availability site runs on a single database implementation, neither should a Semantic Web product
    Does a Triple Store Scale?
      • Yes, of course it does.
      • Facebook, the health care industry, IBM, many of the world’s governments, and some of you have successfully employed Semantic Web technologies
  • 10. Shard, Duplicate, Replicate, Hack
      • As with any large-scale webapp, third normal form goes out the door very quickly once scale happens
      • Shard in sometimes awkward ways based on user behavior
      • Replicate data in different nodes for performance
      • Offload work for later processing
      • Why should the Semantic Web be any different from any other large-scale architecture?
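As a rough illustration of sharding, the sketch below routes each user's data to a shard by hashing the user id. The talk does not say which sharding key or scheme was used; `shard_for` and the shard count are assumptions chosen for the example.

```python
import hashlib


def shard_for(user_id, shard_count):
    """Map a user id to a shard index with a stable hash.

    hashlib is used instead of the built-in hash() because the built-in
    string hash is randomized per process and would route the same user
    to different shards after a restart.
    """
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


SHARD_COUNT = 4  # hypothetical number of database nodes
placements = {user: shard_for(user, SHARD_COUNT) for user in ("alice", "bob", "carol")}
```

Keeping all of one user's triples on one node makes that user's queries local, at the cost of awkward cross-shard lookups whenever users share or connect to each other's items.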
  • 11. Academia Must Be Intersected with Web Scale Engineering
      • Make concessions on both sides
      • Only use triples where triples are needed
      • Store inferred triples for increased performance
      • Rely on other data storage mechanisms for metadata
      • What is the actual problem you are trying to solve?
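Storing inferred triples means computing implied statements once at write time instead of on every query. The sketch below materializes only the transitivity of `rdfs:subClassOf`, as a small stand-in for a full reasoner; the function name and class names are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"


def materialize_subclass_closure(triples):
    """Return the input triples plus every implied rdfs:subClassOf triple.

    Repeats until a pass adds nothing new: if A is a subclass of B and
    B is a subclass of C, then A is a subclass of C.
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for (s, p, o) in closed if p == SUBCLASS]
        for a, b in pairs:
            for c, d in pairs:
                if b == c and (a, SUBCLASS, d) not in closed:
                    closed.add((a, SUBCLASS, d))
                    changed = True
    return closed


asserted = {
    ("Bookmark", SUBCLASS, "Document"),
    ("Document", SUBCLASS, "Item"),
}
stored = materialize_subclass_closure(asserted)
```

After materialization, asking whether `Bookmark` is a kind of `Item` is a single lookup in `stored`; the trade-off is extra storage and the need to recompute when asserted triples change or are deleted.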
