Building a super database from linked data

Stephen Wang http://stephenwang.com
Alivenotdead.com CTO
mongoDB Beijing Presentation (March 3, 2011):
From Rotten Tomatoes to alivenotdead.com to alive.cn: how an entertainment database was built at each stage of its evolution. The current version is a multilingual, global entertainment database built on linked open data and mongoDB.

Presentation Transcript

  • 1. Building a super database from linked data. Stephen Wang 王傳仁, me@stephenwang.com, March 3, 2011
  • 2. Who is this NOT for? Who IS this for? Building a large database with a tiny team; organizing the world's information; information innovation.
  • 3. About Co-founder, CTO Popular movie reviews web site Aggregated reviews, comprehensive film database
  • 4. The Stone Age: static HTML templates; editors read articles and pull quotations; only the newest movies are covered; ~1,000 films.
  • 5. Modern Times: shift to LAMP; license a long-tail database; automated spiders and early UGC via critics; a homegrown CMS for additional content. (Image caption: "How I felt maintaining Rotten Tomatoes' overloaded database servers.")
  • 6. The Result: 8 million unique visitors per month; lean startup, 25x traffic with 7 staff; a great site for film lovers (including Steve Jobs).
  • 7. About Co-founder, CTO SNS for artists started with Daniel Wu 吴彦祖 Started with six artists, now 1,600 artists, 600K registered users Also powers official web sites:李连杰: JetLi.com成龙: JackieChan.com莫文蔚: KarenMok.com
  • 8. Our LAMP stack was not the best setup for newsfeeds, viral loop analysis, or multivariate testing. The problem: scalability issues with real-time data, yet no traffic from public, long-tail content.
  • 9. About A better entertainment database Providing the long- tail content Still a part of alivenotdead.com Still in alpha
  • 10. Features: comprehensive info for celebrities, films, music, and TV; searchable, structured data; multilingual (English, Chinese, Japanese); aggregated social media from inside and outside China.
  • 11. Why use mongoDB? A flexible schema for different data sources; dozens of other sources...
  • 12. Why use mongoDB? Scalable big data: 2 million+ topics; 500,000 translations covered. Next challenge: aggregating and storing the social media firehose.
  • 13. Why use mongoDB? Crossing the border: alivenotdead.com in Hong Kong, alive.tom.com in Tianjin. Use replica sets and eventual consistency to overcome frequent cross-border network issues.
  • 14. Using Linked Open Data: Freebase. Wikipedia as structured data; Creative Commons license; multiple CC sources; organized taxonomy; acquired by Google; no Chinese/Japanese yet!
  • 15. Using Linked Open Data: DBpedia. Wikipedia as structured data; Creative Commons license; only Wikipedia; messy taxonomy; Chinese/Japanese topic translations, but they require a link to an English topic.
  • 16. Using Linked Open Data: use Freebase for its organized taxonomy and broad data; expand DBpedia to Chinese-only topics; apply the same methodology across Chinese wiki sources.
  • 17. The Future: developer API; topic extraction; real-time trends across languages; other verticals. Already 10x more data than Rotten Tomatoes... The complete sum of information from across the web... Information not constrained by language...
  • 18. We're hiring PHP engineers! Send your CV to me@stephenwang.com. My blog: http://stephenwang.com
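Slide 10 describes searchable, structured data in English, Chinese, and Japanese. One common way to make a topic findable by its name in any of those languages is an inverted index from normalized names to topic ids. This is a minimal in-memory sketch under stated assumptions. The TOPICS records, the language codes, and the function names are hypothetical and are not the site's actual data model.

```python
import unicodedata
from collections import defaultdict

# Hypothetical topic records: one per topic, names keyed by language code.
TOPICS = {
    "t1": {"type": "person", "names": {"en": "Daniel Wu", "zh": "吴彦祖"}},
    "t2": {"type": "person", "names": {"en": "Jackie Chan", "zh": "成龙"}},
}


def normalize(label: str) -> str:
    """Unicode-normalize (NFKC), case-fold, and trim a label for matching."""
    return unicodedata.normalize("NFKC", label).casefold().strip()


def build_name_index(topics: dict) -> dict:
    """Map every normalized name, in any language, to the topic ids that carry it."""
    index = defaultdict(set)
    for topic_id, doc in topics.items():
        for name in doc["names"].values():
            index[normalize(name)].add(topic_id)
    return dict(index)


def search(index: dict, query: str) -> list:
    """Return the sorted topic ids whose name matches the query in any language."""
    return sorted(index.get(normalize(query), set()))


index = build_name_index(TOPICS)
print(search(index, "吴彦祖"))       # ['t1']
print(search(index, "jackie chan"))  # ['t2']
```

A dict keeps the example self-contained. In a mongoDB deployment, the equivalent would more likely be an index on the embedded name fields.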
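Slide 11 gives a flexible schema as the first reason for choosing mongoDB. The sketch below shows what that flexibility buys, assuming each topic document keeps one sub-document per data source. Records with different fields then fold into the same document without any agreed-upon column list. The source names, field names, and `merge_source` helper are hypothetical.

```python
from typing import Any, Dict


def merge_source(doc: Dict[str, Any], source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `doc` with `record` stored under doc["sources"][source].

    Each source keeps its own field layout, so adding a new source needs no
    schema migration. Only the top-level dict and the "sources" map are
    copied; the record itself is stored as given.
    """
    merged = dict(doc)
    sources = dict(merged.get("sources", {}))
    sources[source] = record
    merged["sources"] = sources
    return merged


# Hypothetical sources that describe the same film with different fields.
doc = {"_id": "film:1"}
doc = merge_source(doc, "source_a", {"title": "Example Film", "year": 2004})
doc = merge_source(doc, "source_b", {"title": "Example Film", "runtime_min": 110})
print(sorted(doc["sources"]))  # ['source_a', 'source_b']
```

Against a live server, the same per-source write could be expressed as an update using `$set` on a dotted field path such as `sources.source_a`.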
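Slide 13 relies on replica sets and eventual consistency to ride out cross-border network problems. The toy model below shows the underlying idea in simplified form. One primary records every write in an append-only operation log, and a secondary replays whatever it missed once the link comes back. It is a sketch, not mongoDB's implementation: elections, rollbacks, and write concerns are left out. The slides also do not say which site acted as primary, so the class names are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# One oplog entry: (sequence number, key, new value).
Op = Tuple[int, str, Any]


@dataclass
class Primary:
    """Accepts writes and appends each one to an append-only operation log."""
    data: Dict[str, Any] = field(default_factory=dict)
    oplog: List[Op] = field(default_factory=list)

    def write(self, key: str, value: Any) -> int:
        seq = len(self.oplog) + 1
        self.data[key] = value
        self.oplog.append((seq, key, value))
        return seq

    def ops_after(self, seq: int) -> List[Op]:
        # Sequence numbers start at 1 with no gaps, so entry n sits at index n - 1.
        return self.oplog[seq:]


@dataclass
class Secondary:
    """Copies the primary by replaying oplog entries it has not applied yet."""
    data: Dict[str, Any] = field(default_factory=dict)
    last_applied: int = 0

    def sync(self, primary: Primary, link_up: bool) -> int:
        """Apply missed operations if the link is up; return how many were applied."""
        if not link_up:
            return 0  # partitioned: keep serving possibly stale data
        ops = primary.ops_after(self.last_applied)
        for seq, key, value in ops:
            self.data[key] = value
            self.last_applied = seq
        return len(ops)


primary, secondary = Primary(), Secondary()
primary.write("topic:1", "v1")
secondary.sync(primary, link_up=False)  # link down: secondary is still empty
primary.write("topic:1", "v2")
primary.write("topic:2", "v1")
secondary.sync(primary, link_up=True)   # catches up on all three entries
print(secondary.data == primary.data)   # True
```

During an outage, reads from the secondary are stale but still available. That trade-off is what "eventual consistency" refers to on the slide.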
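Slide 16 outlines the linked-data method. Freebase's organized taxonomy serves as the backbone. It is then extended with DBpedia-style Chinese topics: where a Chinese article links to an English topic, the Chinese name is attached to that topic; where no English topic exists, a new topic is created. The sketch assumes both sources are already loaded into simple Python structures keyed by English Wikipedia title. The record layout, the `zh:` key prefix, and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, TypedDict


class ZhRecord(TypedDict):
    zh_title: str
    en_link: Optional[str]  # interlanguage link to an English article, if any


def merge_chinese_topics(freebase: Dict[str, dict], zh_records: List[ZhRecord]) -> Dict[str, dict]:
    """Attach Chinese names to Freebase-style topics, adding Chinese-only topics.

    `freebase` maps an English Wikipedia title to {"types": [...], "names": {...}}.
    The inputs are not modified.
    """
    topics = {
        title: {"types": list(t["types"]), "names": dict(t["names"])}
        for title, t in freebase.items()
    }
    for rec in zh_records:
        en = rec["en_link"]
        if en is not None and en in topics:
            # Linked topic: reuse the organized taxonomy, just add the Chinese label.
            topics[en]["names"]["zh"] = rec["zh_title"]
        else:
            # Chinese-only topic (no matching English topic): create it, untyped for now.
            key = "zh:" + rec["zh_title"]
            topics.setdefault(key, {"types": [], "names": {}})["names"]["zh"] = rec["zh_title"]
    return topics


freebase = {"Jackie Chan": {"types": ["person"], "names": {"en": "Jackie Chan"}}}
zh: List[ZhRecord] = [
    {"zh_title": "成龙", "en_link": "Jackie Chan"},
    {"zh_title": "示例话题", "en_link": None},  # hypothetical Chinese-only title
]
merged = merge_chinese_topics(freebase, zh)
print(merged["Jackie Chan"]["names"])  # {'en': 'Jackie Chan', 'zh': '成龙'}
print(sorted(merged))                  # ['Jackie Chan', 'zh:示例话题']
```

Slide 16 also says the same methodology applies across other Chinese wiki sources. Under this sketch, that would mean running another list of records through the same function.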
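Slide 17 lists real-time trends across languages as future work. One simple approach counts mentions per topic over a sliding time window, so that a topic trends on its combined English, Chinese, and Japanese activity. This sketch assumes each social-media mention has already been resolved to a language-independent topic id and that mentions arrive in timestamp order. The `TrendCounter` class, the ids, and the window length are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class TrendCounter:
    """Sliding-window mention counts per topic, pooled across languages.

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str, str]] = deque()  # (ts, topic_id, lang)
        self.counts: Counter = Counter()

    def add(self, ts: float, topic_id: str, lang: str) -> None:
        self.events.append((ts, topic_id, lang))
        self.counts[topic_id] += 1
        self._expire(ts)

    def _expire(self, now: float) -> None:
        # Drop events that are at least `window` seconds older than `now`.
        while self.events and self.events[0][0] <= now - self.window:
            _, topic_id, _ = self.events.popleft()
            self.counts[topic_id] -= 1
            if self.counts[topic_id] == 0:
                del self.counts[topic_id]

    def top(self, now: float, n: int = 3) -> List[Tuple[str, int]]:
        """Most-mentioned topics in the window ending at `now`."""
        self._expire(now)
        return self.counts.most_common(n)

    def languages(self, now: float, topic_id: str) -> List[str]:
        """Languages in which `topic_id` was mentioned within the window."""
        self._expire(now)
        return sorted({lang for _, t, lang in self.events if t == topic_id})


trends = TrendCounter(window_seconds=60)
trends.add(0, "topic:1", "en")
trends.add(10, "topic:1", "zh")
trends.add(20, "topic:2", "ja")
print(trends.top(now=30))               # [('topic:1', 2), ('topic:2', 1)]
print(trends.languages(30, "topic:1"))  # ['en', 'zh']
print(trends.top(now=75))               # [('topic:2', 1)]
```

Using a deque keeps expiry cheap: each event is appended once and removed once.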