Building a super database from linked data

Stephen Wang http://stephenwang.com
Alivenotdead.com CTO
mongoDB Beijing Presentation (March 3, 2011):
Traces the evolution of an entertainment database through three stages: Rotten Tomatoes, alivenotdead.com, and alive.cn. The current version is a multilingual global entertainment database built on linked open data and mongoDB.

Published in: Technology, Education

Transcript of "Building a super database from linked data"

  1. Building a super database from linked data. Stephen Wang 王傳仁, me@stephenwang.com, March 3, 2011
  2. Who is this NOT for? Who IS this for? Building a large database from a tiny team; organizing the world's information; information innovation
  3. About [Rotten Tomatoes]: Co-founder, CTO; popular movie reviews web site; aggregated reviews, comprehensive film database
  4. The Stone Age: static HTML templates; editors read articles and pull quotations; only cover the newest movies; ~1000 films
  5. Modern Times: shift to LAMP; license long-tail database; automated spiders, early UGC via critics; use homegrown CMS for additional content. (How I felt maintaining Rotten Tomatoes' overloaded database servers)
  6. The Result: 8 million unique visitors / month; lean startup: 25x traffic with 7 staff; great site for film lovers (including Steve Jobs)
  7. About [alivenotdead.com]: Co-founder, CTO; SNS for artists started with Daniel Wu 吴彦祖; started with six artists, now 1,600 artists, 600K registered users; also powers official web sites: 李连杰: JetLi.com, 成龙: JackieChan.com, 莫文蔚: KarenMok.com
  8. Our LAMP stack: not the best setup for newsfeeds, viral loop analysis, multivariate testing. The problem: scalability issues with real-time data, but without traffic from public, long-tail content
  9. About: a better entertainment database; providing the long-tail content; still a part of alivenotdead.com; still in alpha
  10. Features: comprehensive info for celebrities, films, music, and TV; searchable, structured data; multilingual: English, Chinese, Japanese; aggregated social media from inside/outside China
  11. Why use mongoDB? Flexible schema for different data sources; dozens of other sources...
  12. Why use mongoDB? Scalable big data: 2 million+ topics and 500,000 translations covered. Next challenge: aggregating and storing the social media firehose
  13. Why use mongoDB? Crossing the border: Alivenotdead.com in Hong Kong; alive.tom.com in Tianjin. Use replica sets/eventual consistency to overcome frequent cross-border network issues
  14. Using Linked Open Data: Wikipedia as structured data; Creative Commons license; multiple CC sources; organized taxonomy; acquired by Google; no Chinese/Japanese yet!
  15. Using Linked Open Data: Wikipedia as structured data; Creative Commons license; only Wikipedia; messy taxonomy; Chinese/Japanese topic translations, but requires English topic link
  16. Using Linked Open Data: use Freebase organized taxonomy, broad data; expand DBpedia to Chinese-only topics; same methodology across Chinese wiki sources
  17. The Future: Developer API; topic extraction; real-time trends across languages; other verticals. Already 10x more data than Rotten Tomatoes... The complete sum of information from across the web... Information not constrained by language...
  18. We're hiring PHP engineers! Send your CV to me@stephenwang.com. My blog: http://stephenwang.com
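
The slides describe combining a well-organized taxonomy source with a Wikipedia-derived source that carries Chinese/Japanese translations, then storing the result under a flexible schema. The deck does not show how this was done, so the following is only a minimal sketch in plain Python of what such a merge could look like. The function name `merge_topic`, the field names (`topic`, `types`, `labels`, `sources`, `official_site`), and the sample records are all hypothetical illustrations, not the author's actual schema or code.

```python
from copy import deepcopy


def merge_topic(base, extra):
    """Merge two source records for the same topic into one document.

    Hypothetical helper: `base` wins on conflicting scalar fields,
    language labels are combined, and source names are unioned.
    """
    merged = deepcopy(base)
    for key, value in extra.items():
        if key == "labels":
            # Combine language -> name maps; labels from `base` take priority.
            labels = dict(value)
            labels.update(merged.get("labels", {}))
            merged["labels"] = labels
        elif key == "sources":
            # Record every source that contributed to this document.
            merged["sources"] = sorted(set(merged.get("sources", [])) | set(value))
        elif key not in merged:
            # Flexible schema: a field only one source knows about is simply kept.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical sample records; each source supplies a different set of fields.
taxonomy_record = {
    "topic": "Jet_Li",
    "types": ["person", "actor"],
    "labels": {"en": "Jet Li"},
    "sources": ["freebase"],
}
wiki_record = {
    "topic": "Jet_Li",
    "labels": {"en": "Jet Li", "zh": "李连杰"},
    "official_site": "JetLi.com",
    "sources": ["dbpedia"],
}

doc = merge_topic(taxonomy_record, wiki_record)
print(doc)
```

Because each record is just a dictionary with whatever fields its source provides, the merged result could be stored as a single document in a schema-flexible store without first defining columns for every possible field.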