Stephen Wang (http://stephenwang.com), CTO of Alivenotdead.com, mongoDB Beijing presentation (March 3, 2011): From Rotten Tomatoes to alivenotdead.com to alive.cn, an account of how building an entertainment database evolved at each stage. The current version is a multilingual global entertainment database built on linked open data and mongoDB.
Building a super database from linked data Stephen Wang 王傳仁 email@example.com March 3, 2011
Who is this NOT for? Who IS this for? Building a large database with a tiny team; organizing the world's information; information innovation.
About Rotten Tomatoes: co-founder, CTO. A popular movie review web site with aggregated reviews and a comprehensive film database.
The Stone Age: static HTML templates; editors read articles and pulled quotations; only the newest movies were covered (~1,000 films).
Modern Times: shift to LAMP; license a long-tail database; automated spiders and early UGC via critics; a homegrown CMS for additional content. (Photo caption: How I felt maintaining Rotten Tomatoes' overloaded database servers.)
The Result: 8 million unique visitors/month; lean startup: 25x traffic with 7 staff; a great site for film lovers (including Steve Jobs).
About alivenotdead.com: co-founder, CTO. An SNS for artists, started with Daniel Wu 吴彦祖. Started with six artists; now 1,600 artists and 600K registered users. Also powers official web sites: Jet Li 李连杰: JetLi.com; Jackie Chan 成龙: JackieChan.com; Karen Mok 莫文蔚: KarenMok.com.
Our LAMP stack: not the best setup for newsfeeds, viral loop analysis, or multivariate testing. The problem?!? Scalability issues with real-time data, but no public, long-tail content to bring in traffic.
About alive.cn: a better entertainment database providing the long-tail content; still part of alivenotdead.com; still in alpha.
Features: comprehensive info for celebrities, films, music, and TV; searchable, structured data; multilingual: English, Chinese, Japanese; aggregated social media from inside and outside China.
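To make "searchable" and "multilingual" concrete, here is a minimal Python sketch of a cross-language name lookup. It assumes each topic keeps its names in a per-language sub-document; the `name.en`/`name.zh`/`name.ja` layout, the `_id` values and the `build_name_index`/`search` helpers are hypothetical, not alive.cn's actual schema. A query in any of the three languages resolves to the same topic.

```python
from collections import defaultdict

# Hypothetical topic documents with names stored per language.
topics = [
    {"_id": 1, "type": "person",
     "name": {"en": "Jackie Chan", "zh": "成龙", "ja": "ジャッキー・チェン"}},
    {"_id": 2, "type": "person",
     "name": {"en": "Jet Li", "zh": "李连杰"}},  # no Japanese name in this sample
]

def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so lookups are forgiving."""
    return " ".join(text.casefold().split())

def build_name_index(docs):
    """Map every normalized name, in any language, to (topic id, language)."""
    index = defaultdict(set)
    for doc in docs:
        for lang, name in doc.get("name", {}).items():
            index[normalize(name)].add((doc["_id"], lang))
    return index

def search(index, query):
    """Return sorted (topic id, language) pairs whose name matches the query."""
    return sorted(index.get(normalize(query), set()))

index = build_name_index(topics)
```

For example, `search(index, "成龙")` and `search(index, "jackie chan")` both point at topic 1, each tagged with the language that matched.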
Why use mongoDB? Flexible schema for different data sources (dozens of other sources...).
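A rough illustration of the flexible-schema point: records from different sources can sit side by side even when their fields differ. The sketch below uses a plain Python list of dicts as a stand-in for a mongoDB collection and a hypothetical `find` helper that loosely imitates equality queries with dot notation. It is not the mongoDB driver API, and the `source_*` labels and sample records are made up for illustration.

```python
from typing import Any

# A plain list of dicts stands in for a mongoDB collection here; the
# records below are illustrative and come from made-up "sources".
topics: list[dict[str, Any]] = [
    {"type": "film", "source": "source_a",
     "name": {"en": "Infernal Affairs", "zh": "無間道"}, "year": 2002},
    {"type": "person", "source": "source_b",
     "name": {"en": "Daniel Wu", "zh": "吴彦祖"}, "occupation": ["actor"]},
    {"type": "album", "source": "source_c",
     "name": {"en": "Example Album"}, "tracks": ["Track 1", "Track 2"]},
]

_MISSING = object()  # sentinel: field absent (distinct from a stored None)

def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "name.en" through nested dicts."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return _MISSING
        value = value[key]
    return value

def find(collection: list[dict[str, Any]],
         criteria: dict[str, Any]) -> list[dict[str, Any]]:
    """Return documents whose fields equal every value in `criteria`.
    Documents missing a queried field never match."""
    return [doc for doc in collection
            if all(get_path(doc, path) == expected
                   for path, expected in criteria.items())]
```

A document that lacks a queried field simply does not match, so in this model adding a new source with new fields needs no schema migration.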
Why use mongoDB? Scalable big data: 2 million+ topics, 500,000 translations covered. Next challenge: aggregating and storing the social media firehose.
Why use mongoDB? Crossing the border... Alivenotdead.com in Hong Kong, alive.tom.com in Tianjin. Use replica sets/eventual consistency to overcome frequent cross-border network issues.
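The replica-set idea on this slide can be pictured with a toy model of asynchronous, log-based replication: the primary appends every write to an ordered log, and a secondary replays whatever it has not applied yet whenever the link is up. This is a simplified Python sketch, not mongoDB's actual replication protocol; the `Node`/`ReplicaSet` names are hypothetical, and which site acts as primary is an assumption made only for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One replica: a name, its copy of the data, and an oplog position."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many oplog entries this node has applied

@dataclass
class ReplicaSet:
    primary: Node
    oplog: list = field(default_factory=list)

    def write(self, key, value):
        # Writes are accepted only by the primary and logged in order.
        self.oplog.append((key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self.oplog)

    def sync(self, node, link_up=True):
        # A secondary replays the oplog entries it has not seen yet.
        # While the cross-border link is down it keeps serving its
        # older (stale) copy instead of failing. Returns entries applied.
        if not link_up:
            return 0
        pending = self.oplog[node.applied:]
        for key, value in pending:
            node.data[key] = value
        node.applied = len(self.oplog)
        return len(pending)

# Hypothetical placement: Hong Kong as primary is an assumption.
hong_kong = Node("hong-kong")
tianjin = Node("tianjin")
rs = ReplicaSet(primary=hong_kong)

rs.write("topic:1", {"name": {"en": "Jackie Chan"}})
lost = rs.sync(tianjin, link_up=False)   # link down: nothing replicated
stale_read = tianjin.data.get("topic:1")  # secondary has not caught up
caught_up = rs.sync(tianjin)              # link restored: replay backlog
```

While the link is down the secondary keeps answering reads with older data rather than failing, and it converges once the link returns; that trade-off is what eventual consistency buys across an unreliable border.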
Using Linked Open Data: Freebase. Wikipedia as structured data; Creative Commons license; multiple CC sources; organized taxonomy; acquired by Google; no Chinese/Japanese yet!
Using Linked Open Data: DBpedia. Wikipedia as structured data; Creative Commons license; only Wikipedia; messy taxonomy; Chinese/Japanese topic translations, but each requires an English topic link.
Using Linked Open Data: use Freebase's organized taxonomy and broad data; expand DBpedia to Chinese-only topics; apply the same methodology across Chinese wiki sources.
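The merge strategy on this slide might look roughly like the following Python sketch, under these assumptions: Freebase-style topics are keyed by their English Wikipedia title and supply the taxonomy (`type`), DBpedia-style Chinese labels attach only through that English link, and titles found only in a Chinese wiki become new Chinese-only topics. The `merge_topics` function, the `en:`/`zh:` key prefixes and the sample data (including the made-up title 示例话题) are hypothetical.

```python
def merge_topics(freebase, zh_labels, zh_only_titles):
    """Combine English-keyed topics with Chinese data.

    freebase:        {en_title: {"type": str, "name": {"en": str}}}
    zh_labels:       {en_title: zh_title}  (translations via English link)
    zh_only_titles:  Chinese titles found in a Chinese wiki source
    """
    merged = {}
    # 1. Start from the organized, English-keyed taxonomy.
    for en_title, topic in freebase.items():
        doc = {"type": topic["type"], "name": dict(topic["name"])}
        # 2. Attach a Chinese name when an English link exists.
        if en_title in zh_labels:
            doc["name"]["zh"] = zh_labels[en_title]
        merged["en:" + en_title] = doc
    # 3. Add Chinese-only topics that were not already linked.
    linked_zh = {doc["name"].get("zh") for doc in merged.values()}
    for zh_title in zh_only_titles:
        if zh_title not in linked_zh:
            # No English link, so no taxonomy type is known yet.
            merged["zh:" + zh_title] = {"type": None, "name": {"zh": zh_title}}
            linked_zh.add(zh_title)
    return merged

freebase = {
    "Jackie_Chan": {"type": "person", "name": {"en": "Jackie Chan"}},
    "Infernal_Affairs": {"type": "film", "name": {"en": "Infernal Affairs"}},
}
zh_labels = {"Jackie_Chan": "成龙"}
zh_only = ["成龙", "示例话题"]  # "示例话题" is a made-up Chinese-only title
topics = merge_topics(freebase, zh_labels, zh_only)
```

Keying on the English title mirrors the DBpedia limitation noted earlier; the Chinese-only branch is where the same step could be repeated for other Chinese wiki sources.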
The Future: developer API; topic extraction; real-time trends across languages; other verticals. Already 10x more data than Rotten Tomatoes... The complete sum of information from across the web... Information not constrained by language...
We're hiring PHP engineers! Send your CV to firstname.lastname@example.org. My blog: http://stephenwang.com