Building a super database from linked data

Stephen Wang http://stephenwang.com
Alivenotdead.com CTO
mongoDB Beijing Presentation (March 3, 2011):
From Rotten Tomatoes to alivenotdead.com to alive.cn: how an entertainment database was built and how it evolved at each stage. The current version is a multilingual global entertainment database built on linked open data and mongoDB.

Published in: Technology, Education

Transcript

  • 1. Building a super database from linked data. Stephen Wang (王傳仁), me@stephenwang.com, March 3, 2011
  • 2. Who is this NOT for? Who IS this for? Building a large database with a tiny team. Organizing the world's information. Information innovation.
  • 3. About Co-founder, CTO Popular movie reviews web site Aggregated reviews, comprehensive film database
  • 4. The Stone Age: Static HTML templates. Editors read articles and pull quotations. Only cover the newest movies. ~1,000 films.
  • 5. Modern Times: Shift to LAMP. License long-tail database. Automated spiders, early UGC via critics. Use homegrown CMS for additional content. (How I felt maintaining Rotten Tomatoes' overloaded database servers.)
  • 6. The Result: 8 million unique visitors / month. Lean startup: 25x traffic with 7 staff. Great site for film lovers (including Steve Jobs).
  • 7. About Co-founder, CTO SNS for artists started with Daniel Wu 吴彦祖 Started with six artists, now 1,600 artists, 600K registered users Also powers official web sites:李连杰: JetLi.com成龙: JackieChan.com莫文蔚: KarenMok.com
  • 8. Our LAMP stack: Not the best setup for... Newsfeeds... Viral loop analysis... Multivariate testing... The Problem?!? Scalability issues with real-time data, but without traffic from public, long-tail content.
  • 9. About A better entertainment database Providing the long- tail content Still a part of alivenotdead.com Still in alpha
  • 10. Features: Comprehensive info for celebrities, films, music, and TV. Searchable, structured data. Multilingual: English, Chinese, Japanese. Aggregated social media from inside/outside China.
  • 11. Why use mongoDB? Flexible schema for different data sources. Dozens of other sources...
  • 12. Why use mongoDB? Scalable big data: 2 million+ topics, 500,000 translations covered. Next challenge: Aggregating and storing the social media firehose.
  • 13. Why use mongoDB? Crossing the border... Alivenotdead.com in Hong Kong, alive.tom.com in Tianjin. Use replica sets/eventual consistency to overcome frequent cross-border network issues.
  • 14. Using Linked Open Data: Freebase. Wikipedia as structured data. Creative Commons license. Multiple CC sources. Organized taxonomy. Acquired by Google. No Chinese/Japanese yet!
  • 15. Using Linked Open Data: DBpedia. Wikipedia as structured data. Creative Commons license. Only Wikipedia. Messy taxonomy. Chinese/Japanese topic translations, but requires English topic link.
  • 16. Using Linked Open Data: Use Freebase for its organized taxonomy and broad data. Expand DBpedia to Chinese-only topics. Same methodology across Chinese wiki sources.
  • 17. The Future: Developer API. Topic extraction. Real-time trends across languages. Other verticals. Already 10x more data than Rotten Tomatoes... The complete sum of information from across the web... Information not constrained by language...
  • 18. We're hiring PHP engineers! Send your CV to me@stephenwang.com. My blog: http://stephenwang.com
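
Slides 11 and 14 to 16 describe combining an organized taxonomy source with a source of Chinese/Japanese topic translations that are joined through the English topic link, then storing the result under a flexible schema. The following is only a rough Python sketch of that idea, not the presenter's implementation. The record layout, the field names (`topic`, `types`, `lang`, `label`), the sample data, and the function `build_topic_documents` are all assumptions made for illustration; they are not the real Freebase or DBpedia formats.

```python
from typing import Dict, List, Tuple

# Hand-written sample records. The field names are assumptions for
# illustration, not the real Freebase or DBpedia schemas.
TAXONOMY_RECORDS: List[dict] = [
    {"topic": "Jet_Li", "name": "Jet Li", "types": ["person"]},
    {"topic": "Jackie_Chan", "name": "Jackie Chan", "types": ["person"]},
    {"topic": "Karen_Mok", "name": "Karen Mok", "types": ["person"]},
]

TRANSLATION_RECORDS: List[dict] = [
    {"topic": "Jet_Li", "lang": "zh", "label": "李连杰"},
    {"topic": "Jackie_Chan", "lang": "zh", "label": "成龙"},
    # A Chinese-only topic has no English topic link to join on.
    {"topic": None, "lang": "zh", "label": "placeholder-zh-only-topic"},
]


def build_topic_documents(
    taxonomy: List[dict], translations: List[dict]
) -> Tuple[Dict[str, dict], List[dict]]:
    """Merge both sources into one document per topic.

    Returns the merged documents keyed by English topic, plus the
    translation records that could not be attached to any topic.
    """
    documents: Dict[str, dict] = {}
    for record in taxonomy:
        documents[record["topic"]] = {
            "_id": record["topic"],
            "types": list(record["types"]),
            "names": {"en": record["name"]},
            "sources": ["taxonomy"],
        }

    unmatched: List[dict] = []
    for record in translations:
        doc = documents.get(record["topic"])
        if doc is None:
            # No English topic to link to, so keep it aside for later handling.
            unmatched.append(record)
            continue
        doc["names"][record["lang"]] = record["label"]
        if "translations" not in doc["sources"]:
            doc["sources"].append("translations")
    return documents, unmatched


if __name__ == "__main__":
    docs, leftover = build_topic_documents(TAXONOMY_RECORDS, TRANSLATION_RECORDS)
    for doc in docs.values():
        print(doc)
    print("unmatched translations:", len(leftover))
```

The merged documents do not all have the same shape (some carry a `zh` name, some do not), which is the kind of variation a flexible-schema document store tolerates. Documents like these could then be written with a MongoDB driver, for example pymongo's `insert_many`. The unmatched list corresponds to the Chinese-only topics that slide 16 says still need to be covered.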
