SlideShare a Scribd company logo
1 of 18
Download to read offline
Building a super database from linked data




                           Stephen Wang 王傳仁
                           me@stephenwang.com
                                  March 3, 2011
Who is this NOT for?




              Who IS this for?

    Building a large database from a tiny team

    Organizing the world's information

    Information innovation
About

    Co-founder, CTO

    Popular movie reviews web site

    Aggregated reviews,
    comprehensive film database
The Stone Age

  
      Static HTML
      templates
  
      Editors read articles
      and pull quotations
  
      Only cover the
      newest movies
  
      ~1000 films
Modern Times

                                         
                                             Shift to LAMP
                                         
                                             License long-tail
                                             database
                                         
                                             Automated spiders,
                                             early UGC via critics

(How I felt maintaining Rotten
                                         
                                             Use homegrown
Tomatoes' overloaded database servers)       CMS for additional
                                             content
v




The Result

    8 million unique visitors / month

    Lean startup: 25x traffic with 7 staff

    Great site for film lovers (including Steve Jobs)
About

    Co-founder, CTO

    SNS for artists started
    with Daniel Wu 吴彦祖

    Started with six artists,
    now 1,600 artists,
    600K registered users

    Also powers official
    web sites:
李连杰: JetLi.com
成龙: JackieChan.com
莫文蔚: KarenMok.com
Our LAMP stack: Not the best setup for...
                         Newsfeeds...
                     Viral loop analysis...
                    Multivariate testing...


                   The Problem?!?
Scalability issues with real-time data, but without traffic from
                    public, long-tail content
About


    A better
    entertainment
    database

    Providing the long-
    tail content

    Still a part of
    alivenotdead.com

    Still in alpha
Features

    Comprehensive info
    for celebrities, films,
    music, and TV

    Searchable, structured
    data

    Multilingual: English,
    Chinese, Japanese

    Aggregated social
    media from
    inside/outside China
Why use mongoDB?

Flexible schema for different data sources




              Dozens of other sources...
Why use

           Scalable big data

    2 million+ topics   
                            500,000 translations
    covered

                            Next challenge:
                            Aggregating and
                            storing the social
                            media firehose
Why use

Crossing the border...

    Alivenotdead.com   
                           alive.tom.com in
    in Hong Kong           Tianjin




Use replica sets/eventual consistency to overcome
      frequent cross-border network issues
Using Linked Open Data

    Wikipedia as structured data

    Creative Commons license


                      
                          Multiple CC sources
                      
                          Organized taxonomy
                      
                          Acquired by Google
                      
                          No Chinese/Japanese yet!
Using Linked Open Data

    Wikipedia as structured data

    Creative Commons license


                         
                             Only Wikipedia
                         
                             Messy taxonomy
                         
                             Chinese/Japanese topic
                             translations, but requires
                             English topic link
Using Linked Open Data





    Use Freebase organized taxonomy, broad data

    Expand DBpedia to Chinese-only topics

    Same methodology across Chinese wiki sources
The Future
                                
                                    Developer API
                                
                                    Topic extraction
                                
                                    Real-time trends
                                    across languages
                                
                                    Other verticals

Already 10x more data than Rotten Tomatoes...
The complete sum of information from across the web...
Information not constrained by language...
We're hiring PHP engineers! Send your CV to
          me@stephenwang.com
    My blog: http://stephenwang.com

More Related Content

Similar to Building a super database from linked data

HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon
 
Collaborative Ontology Building Project
Collaborative Ontology Building Project  Collaborative Ontology Building Project
Collaborative Ontology Building Project Jie Bao
 
Library 2.0: A New Version for the Future
Library 2.0: A New Version for the FutureLibrary 2.0: A New Version for the Future
Library 2.0: A New Version for the Futurepddsnn
 
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017BigchainDB
 
Towards social webtops using semantic wiki
Towards social webtops using semantic wikiTowards social webtops using semantic wiki
Towards social webtops using semantic wikiJie Bao
 
Using Semantic Wiki as a Semantic Web Workbench
Using Semantic Wiki as a Semantic Web WorkbenchUsing Semantic Wiki as a Semantic Web Workbench
Using Semantic Wiki as a Semantic Web WorkbenchJie Bao
 
Web 3.0: The Upcoming Revolution
Web 3.0: The Upcoming RevolutionWeb 3.0: The Upcoming Revolution
Web 3.0: The Upcoming RevolutionNitin Godawat
 
Semantic Annotation and Search for Resources in the Next Generation Web
Semantic Annotation and Search for Resources in the Next Generation WebSemantic Annotation and Search for Resources in the Next Generation Web
Semantic Annotation and Search for Resources in the Next Generation Webajithranabahu
 
WordLift 2.0 presented on the Semantic Web Meetup in Rome
WordLift 2.0 presented on the Semantic Web Meetup in RomeWordLift 2.0 presented on the Semantic Web Meetup in Rome
WordLift 2.0 presented on the Semantic Web Meetup in RomeAndrea Volpini
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Yahoo Developer Network
 
Poster Semantic Web - Abhijit Chandrasen Manepatil
Poster Semantic Web - Abhijit Chandrasen ManepatilPoster Semantic Web - Abhijit Chandrasen Manepatil
Poster Semantic Web - Abhijit Chandrasen Manepatilap
 
Open Content Library LGM 2007
Open Content Library LGM 2007Open Content Library LGM 2007
Open Content Library LGM 2007Jon Phillips
 
IIIF: International Image Interoperability Framework @ DLF2012
IIIF: International Image Interoperability Framework @ DLF2012IIIF: International Image Interoperability Framework @ DLF2012
IIIF: International Image Interoperability Framework @ DLF2012Tom-Cramer
 
Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...John Breslin
 
Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsJie Bao
 
medstream2.ppt
medstream2.pptmedstream2.ppt
medstream2.pptVideoguy
 
Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...
Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...
Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...Fasten Project
 

Similar to Building a super database from linked data (20)

HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
 
Collaborative Ontology Building Project
Collaborative Ontology Building Project  Collaborative Ontology Building Project
Collaborative Ontology Building Project
 
Library 2.0: A New Version for the Future
Library 2.0: A New Version for the FutureLibrary 2.0: A New Version for the Future
Library 2.0: A New Version for the Future
 
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017 Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
Blockchain Beyond Finance - Cronos Groep - Jan 17, 2017
 
Towards social webtops using semantic wiki
Towards social webtops using semantic wikiTowards social webtops using semantic wiki
Towards social webtops using semantic wiki
 
Using Semantic Wiki as a Semantic Web Workbench
Using Semantic Wiki as a Semantic Web WorkbenchUsing Semantic Wiki as a Semantic Web Workbench
Using Semantic Wiki as a Semantic Web Workbench
 
Pipe dreams
Pipe dreamsPipe dreams
Pipe dreams
 
Web 3.0: The Upcoming Revolution
Web 3.0: The Upcoming RevolutionWeb 3.0: The Upcoming Revolution
Web 3.0: The Upcoming Revolution
 
Semantic Annotation and Search for Resources in the Next Generation Web
Semantic Annotation and Search for Resources in the Next Generation WebSemantic Annotation and Search for Resources in the Next Generation Web
Semantic Annotation and Search for Resources in the Next Generation Web
 
slides
slidesslides
slides
 
Web 3.0 Emerging
Web 3.0 EmergingWeb 3.0 Emerging
Web 3.0 Emerging
 
WordLift 2.0 presented on the Semantic Web Meetup in Rome
WordLift 2.0 presented on the Semantic Web Meetup in RomeWordLift 2.0 presented on the Semantic Web Meetup in Rome
WordLift 2.0 presented on the Semantic Web Meetup in Rome
 
Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010Hadoop and Pig at Twitter__HadoopSummit2010
Hadoop and Pig at Twitter__HadoopSummit2010
 
Poster Semantic Web - Abhijit Chandrasen Manepatil
Poster Semantic Web - Abhijit Chandrasen ManepatilPoster Semantic Web - Abhijit Chandrasen Manepatil
Poster Semantic Web - Abhijit Chandrasen Manepatil
 
Open Content Library LGM 2007
Open Content Library LGM 2007Open Content Library LGM 2007
Open Content Library LGM 2007
 
IIIF: International Image Interoperability Framework @ DLF2012
IIIF: International Image Interoperability Framework @ DLF2012IIIF: International Image Interoperability Framework @ DLF2012
IIIF: International Image Interoperability Framework @ DLF2012
 
Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...
 
Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
 
medstream2.ppt
medstream2.pptmedstream2.ppt
medstream2.ppt
 
Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...
Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...
Software Ecosystems as Networks - Advances on the FASTEN project, Paolo Boldi...
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Building a super database from linked data

  • 1. Building a super database from linked data Stephen Wang 王傳仁 me@stephenwang.com March 3, 2011
  • 2. Who is this NOT for? Who IS this for?  Building a large database from a tiny team  Organizing the world's information  Information innovation
  • 3. About  Co-founder, CTO  Popular movie reviews web site  Aggregated reviews, comprehensive film database
  • 4. The Stone Age  Static HTML templates  Editors read articles and pull quotations  Only cover the newest movies  ~1000 films
  • 5. Modern Times  Shift to LAMP  License long-tail database  Automated spiders, early UGC via critics (How I felt maintaining Rotten  Use homegrown Tomatoes' overloaded database servers) CMS for additional content
  • 6. v The Result  8 million unique visitors / month  Lean startup: 25x traffic with 7 staff  Great site for film lovers (including Steve Jobs)
  • 7. About  Co-founder, CTO  SNS for artists started with Daniel Wu 吴彦祖  Started with six artists, now 1,600 artists, 600K registered users  Also powers official web sites: 李连杰: JetLi.com 成龙: JackieChan.com 莫文蔚: KarenMok.com
  • 8. Our LAMP stack: Not the best setup for... Newsfeeds... Viral loop analysis... Multivariate testing... The Problem?!? Scalability issues with real-time data, but without traffic from public, long-tail content
  • 9. About  A better entertainment database  Providing the long- tail content  Still a part of alivenotdead.com  Still in alpha
  • 10. Features  Comprehensive info for celebrities, films, music, and TV  Searchable, structured data  Multilingual: English, Chinese, Japanese  Aggregated social media from inside/outside China
  • 11. Why use mongoDB? Flexible schema for different data sources Dozens of other sources...
  • 12. Why use Scalable big data  2 million+ topics  500,000 translations covered Next challenge: Aggregating and storing the social media firehose
  • 13. Why use Crossing the border...  Alivenotdead.com  alive.tom.com in in Hong Kong Tianjin Use replica sets/eventual consistency to overcome frequent cross-border network issues
  • 14. Using Linked Open Data  Wikipedia as structured data  Creative Commons license  Multiple CC sources  Organized taxonomy  Acquired by Google  No Chinese/Japanese yet!
  • 15. Using Linked Open Data  Wikipedia as structured data  Creative Commons license  Only Wikipedia  Messy taxonomy  Chinese/Japanese topic translations, but requires English topic link
  • 16. Using Linked Open Data  Use Freebase organized taxonomy, broad data  Expand DBpedia to Chinese-only topics  Same methodology across Chinese wiki sources
  • 17. The Future  Developer API  Topic extraction  Real-time trends across languages  Other verticals Already 10x more data than Rotten Tomatoes... The complete sum of information from across the web... Information not constrained by language...
  • 18. We're hiring PHP engineers! Send your CV to me@stephenwang.com My blog: http://stephenwang.com