Building a super database from linked data



Stephen Wang, CTO
mongoDB Beijing Presentation (March 3, 2011):
From Rotten Tomatoes to … to …: how building an entertainment database evolved at each stage. The current version is a multilingual global entertainment database built on linked open data and mongoDB.


  1. Building a super database from linked data. Stephen Wang 王傳仁, March 3, 2011
  2. Who is this NOT for? Who IS this for? Building a large database with a tiny team; organizing the world's information; information innovation
  3. 3. About Co-founder, CTO Popular movie reviews web site Aggregated reviews, comprehensive film database
  4. The Stone Age: static HTML templates; editors read articles and pull quotations; only the newest movies covered (~1,000 films)
  5. Modern Times: shift to LAMP; license a long-tail database; automated spiders and early UGC via critics; homegrown CMS for additional content. (Image caption: "How I felt maintaining Rotten Tomatoes' overloaded database servers")
  6. The Result: 8 million unique visitors/month; lean startup: 25x traffic with 7 staff; a great site for film lovers (including Steve Jobs)
  7. 7. About Co-founder, CTO SNS for artists started with Daniel Wu 吴彦祖 Started with six artists, now 1,600 artists, 600K registered users Also powers official web sites:李连杰: JetLi.com成龙: JackieChan.com莫文蔚:
  8. Our LAMP stack: not the best setup for newsfeeds, viral loop analysis, or multivariate testing. The problem?!? Scalability issues with real-time data, but without the traffic that public, long-tail content brings
  9. 9. About A better entertainment database Providing the long- tail content Still a part of Still in alpha
  10. Features: comprehensive info for celebrities, films, music, and TV; searchable, structured data; multilingual (English, Chinese, Japanese); aggregated social media from inside and outside China
  11. Why use mongoDB? Flexible schema for different data sources (dozens of other sources...)
  12. Why use …? Scalable big data: 2 million+ topics; 500,000 translations covered. Next challenge: aggregating and storing the social media firehose
  13. Why use …? Crossing the border: … in Hong Kong, … in Tianjin. Use replica sets/eventual consistency to overcome frequent cross-border network issues
  14. Using Linked Open Data: Wikipedia as structured data; Creative Commons license; multiple CC sources; organized taxonomy; acquired by Google; no Chinese/Japanese yet!
  15. Using Linked Open Data: Wikipedia as structured data; Creative Commons license; only Wikipedia; messy taxonomy; Chinese/Japanese topic translations, but each requires an English topic link
  16. Using Linked Open Data: use Freebase's organized taxonomy and broad data; expand DBpedia to Chinese-only topics; apply the same methodology across Chinese wiki sources
  17. The Future: developer API; topic extraction; real-time trends across languages; other verticals. Already 10x more data than Rotten Tomatoes... The complete sum of information from across the web... Information not constrained by language...
  18. We're hiring PHP engineers! Send your CV to …; my blog: …
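Slide 13 mentions using replica sets and eventual consistency to ride out cross-border network outages. The toy model below only illustrates what "eventually consistent" means in that setup; it is not mongoDB's replication protocol, which also handles elections, write concerns, and rollback. Writes land on a primary and are appended to an operation log; a secondary replays the entries it has missed whenever the link is up, so reads from it can be stale until it catches up. `ToyReplicaSet`, `Node`, and the sample data are hypothetical, and the slides do not say which site held the primary.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """One toy replica-set member: a key-value store and how many oplog entries it has applied."""
    name: str
    data: dict[str, Any] = field(default_factory=dict)
    applied: int = 0


@dataclass
class ToyReplicaSet:
    """Asynchronous primary -> secondary replication through an append-only operation log."""
    primary: Node
    secondary: Node
    oplog: list[tuple[str, Any]] = field(default_factory=list)
    link_up: bool = True  # models the flaky cross-border connection

    def write(self, key: str, value: Any) -> None:
        # Writes are accepted by the primary even while the link is down.
        self.primary.data[key] = value
        self.oplog.append((key, value))
        self.primary.applied = len(self.oplog)

    def sync(self) -> int:
        """Replay the entries the secondary has missed; return how many were applied."""
        if not self.link_up:
            return 0
        pending = self.oplog[self.secondary.applied:]
        for key, value in pending:
            self.secondary.data[key] = value
        self.secondary.applied += len(pending)
        return len(pending)


rs = ToyReplicaSet(Node("primary"), Node("secondary"))
rs.write("film:0001", {"title": "卧虎藏龙"})
rs.link_up = False                      # cross-border link drops
rs.write("film:0002", {"title": "英雄"})
print(rs.sync(), rs.secondary.data.get("film:0002"))    # 0 None  (stale read)
rs.link_up = True                       # link recovers
print(rs.sync(), rs.secondary.data == rs.primary.data)  # 2 True  (converged)
```

In a real mongoDB deployment, whether a client may see such stale data is governed by its read preference: reading only from the primary avoids staleness at the cost of cross-border latency.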
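Slides 15 and 16 describe Chinese/Japanese topic translations that each require an English topic link, and a plan to use Freebase's organized taxonomy while expanding DBpedia to Chinese-only topics. The sketch below shows one plausible way to do that join, under stated assumptions: topics are keyed by English title, localized names are attached through an interlanguage mapping, and Chinese titles that no English topic already claims become new Chinese-only topics instead of being dropped. The function name, input shapes, and sample titles are hypothetical; the actual pipeline is not shown in the slides.

```python
from typing import Any


def build_topics(
    taxonomy: dict[str, str],
    interlanguage: dict[str, dict[str, str]],
    zh_only_titles: list[str],
) -> list[dict[str, Any]]:
    """Join localized titles onto English-keyed topics, then add Chinese-only topics.

    taxonomy:       English title -> topic type (stand-in for an organized taxonomy)
    interlanguage:  English title -> {language code: localized title}
    zh_only_titles: titles from a Chinese wiki source (hypothetical input)
    """
    topics: list[dict[str, Any]] = []
    for en_title, topic_type in taxonomy.items():
        # Translations can only be attached through the English topic they link to.
        names = {"en": en_title, **interlanguage.get(en_title, {})}
        topics.append({"type": topic_type, "names": names})

    # Chinese titles already reachable through an English link.
    known_zh = {t["names"]["zh"] for t in topics if "zh" in t["names"]}
    for zh_title in zh_only_titles:
        if zh_title not in known_zh:
            # No English counterpart: keep it as a Chinese-only topic, untyped for now.
            topics.append({"type": None, "names": {"zh": zh_title}})
            known_zh.add(zh_title)
    return topics


# Hypothetical sample data.
taxonomy = {"Jet Li": "person", "Hero (2002 film)": "film"}
interlanguage = {
    "Jet Li": {"zh": "李连杰", "ja": "ジェット・リー"},
    "Hero (2002 film)": {"zh": "英雄"},
}
zh_wiki = ["李连杰", "仅中文条目"]  # second title is a placeholder meaning "Chinese-only entry"

topics = build_topics(taxonomy, interlanguage, zh_wiki)
for t in topics:
    print(t["type"], t["names"])
```

Keeping unmatched Chinese titles as their own topics, rather than discarding them, is what lets the database cover material that has no English Wikipedia counterpart.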