Audible Tech Talk
23. April 2012
Andraz Tori
andraz@zemanta.com
@andraz
Today's plan
• Short story of Zemanta
• The Zemanta technology
Where am I right now?
Wonders of modern communication
Ljubljana
Strip mine
• A system for Slovenian National television in 2006
• Closed captioning → web page for each episode of each show
• Natural Language Processing, Information Retrieval...
Start-up? Why not?

Tour de Slovénie
Sales
Seedcamp

• First European program inspired by YC (2007)
• London based
• 3 months, 50.000 EUR / 10%
Roller coaster
12. August          Deadline
20. August          Shortlist
23. August          Phone interview
24. August          Results

3. September        London week start
7. September        London week end
16. September ==>   London
3 months in London
Back to Ljubljana
And then ...

• Figuring out that the US is our target market
• Figuring out where in the US to be and who to have here
• Partnerships
• And naturally the business model
Technology
What do we do?
• Zemanta – Personal Writing Assistant
     - on your current platform
• While bloggers write we suggest:
     - images
     - related articles
     - in-text links
     - tags
Some stats

• 80k bloggers monthly
• 1.3 million posts enhanced in 2011
How does it work
• Natural Language Processing
• Big database of “meanings” (entities, concepts, topics)
• Word Sense Disambiguation
 • Linking out to Wikipedia, Freebase, …
 • Categorization, Named Entity Recognition


• Information Retrieval
 • Solr based, using features from NLP
 • With some twists
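As a rough illustration of word sense disambiguation against a database of "meanings", the sketch below picks, for an ambiguous phrase, the candidate entity whose context words overlap most with the article text. The `MEANINGS` table, the entity names and the overlap score are all hypothetical and only stand in for the idea; they are not Zemanta's data or scoring.

```python
import re

# Hypothetical mini "database of meanings": surface form -> candidate entities,
# each described by a few context words. Real background knowledge is far larger.
MEANINGS = {
    "jaguar": {
        "Jaguar (animal)": {"cat", "jungle", "prey", "species"},
        "Jaguar Cars": {"car", "engine", "sedan", "driver"},
    },
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def disambiguate(surface_form, text):
    """Return the candidate whose context words overlap most with the text."""
    words = tokenize(text)
    candidates = MEANINGS.get(surface_form.lower(), {})
    if not candidates:
        return None
    # Score = number of shared context words; sorting makes ties deterministic.
    return max(sorted(candidates), key=lambda name: len(candidates[name] & words))
```

For example, `disambiguate("Jaguar", "The new Jaguar sedan has a quiet engine")` selects the car maker, because "sedan" and "engine" appear in that candidate's context words.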
[Diagram boxes: Plain text (article), Analysis, Semantic search, Content suggestions, Indexed content, Background knowledge]
“Text Understanding”
- Input is a meaningful chunk of text (not a keyword or a phrase)
- Input is (semi) English language
- Has to work across all domains in the open world
  - music, celebrities, finance, entertainment, politics, gardening, parenting, …
Background knowledge
- Data from Wikipedia, MusicBrainz, Freebase… and the world wild web
- Includes linguistic and semantic properties and unstructured data
- Present in two forms:
  - in the “original” custom-built triple store on top of MySQL (150 GB)
  - processed into a 7 GB optimized “memory mapped dump”
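One way to picture a "memory mapped dump" is a file of fixed-width, sorted records that is mapped into memory and searched in place, so nothing has to be parsed at start-up. The sketch below assumes a made-up record layout (an 8-byte entity id plus a 4-byte score); the actual format of the 7 GB dump is not described in the slides.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: 8-byte unsigned entity id, 4-byte float score.
RECORD = struct.Struct("<Qf")

def write_dump(path, records):
    """Write (entity_id, score) pairs, sorted by id, as fixed-width records."""
    with open(path, "wb") as f:
        for entity_id, score in sorted(records):
            f.write(RECORD.pack(entity_id, score))

def lookup(mm, entity_id):
    """Binary search the memory-mapped records; return the score or None."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, score = RECORD.unpack_from(mm, mid * RECORD.size)
        if key == entity_id:
            return score
        if key < entity_id:
            lo = mid + 1
        else:
            hi = mid
    return None

def demo():
    """Build a tiny dump in a temp directory and look up two ids."""
    path = os.path.join(tempfile.mkdtemp(), "dump.bin")
    write_dump(path, [(42, 0.5), (7, 0.25), (1000, 1.0)])
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return lookup(mm, 42), lookup(mm, 8)
```

The design point is that the operating system pages the file in on demand, so several processes can share one read-only copy of the data.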
Analysis pipeline
- Named Entity Extraction
- Known phrases extraction (Aho-Corasick)
- Triple store (drawn beside the stages)
- Surface form features evaluation
- Statistical comparison to background knowledge
- Semantic coherence and hand-tuned heuristics
- etc.
- Disambiguated entities
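The known phrases extraction step names Aho-Corasick, an algorithm that finds all occurrences of many phrases in a single pass over the text. Below is a compact, character-level sketch of that algorithm; the function names and the example phrases are illustrative only and are not taken from Zemanta's code.

```python
from collections import deque

def build_automaton(phrases):
    """Build an Aho-Corasick automaton: trie transitions, failure links, outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # phrases that end at each state
    for phrase in phrases:
        state = 0
        for ch in phrase:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(phrase)
    # Breadth-first pass: set failure links and inherit outputs from them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_phrases(text, automaton):
    """Return (start_index, phrase) for every known phrase found in text."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for phrase in out[state]:
            matches.append((i - len(phrase) + 1, phrase))
    return matches

automaton = build_automaton(["new york", "york", "new york times"])
print(find_phrases("the new york times", automaton))
# prints [(4, 'new york'), (8, 'york'), (4, 'new york times')]
```

Overlapping matches are all reported, which is useful here: choosing between "new york" and "new york times" is left to the later disambiguation stages.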
Connecting content
• Indexing blogosphere and mediasphere
• Solr based index
 • Twist: complicated queries – 50 terms
• Filtering out spam is “fun”
• Probably the best “related content” in terms of accuracy
• Coming soon: social signal
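To give a feel for what a 50-term query might look like, the sketch below turns weighted terms (for example, disambiguated entities with scores) into a boosted OR query in the standard Solr/Lucene query syntax. The field name `text`, the core name `blogs`, the local URL and the weights are assumptions for illustration; the slides do not describe the real schema or the "twists".

```python
from urllib.parse import urlencode

def build_query(weighted_terms, field="text", max_terms=50):
    """Build a boosted OR query from (term, weight) pairs, strongest first."""
    top = sorted(weighted_terms, key=lambda tw: tw[1], reverse=True)[:max_terms]
    clauses = []
    for term, weight in top:
        # Escape backslashes and quotes so each term stays one quoted phrase.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"^{weight:.2f}')
    return " OR ".join(clauses)

def select_url(weighted_terms, base="http://localhost:8983/solr/blogs/select"):
    """Return a select URL for a hypothetical local core named 'blogs'."""
    params = {"q": build_query(weighted_terms), "rows": 10, "wt": "json"}
    return base + "?" + urlencode(params)

print(build_query([("Ljubljana", 1.5), ("Seedcamp", 3.0)]))
# prints text:"Seedcamp"^3.00 OR text:"Ljubljana"^1.50
```

Capping the list at `max_terms` keeps only the highest-weighted terms, which mirrors the 50-term figure on the slide.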
But why just for bloggers?

 Let's open up the API!
Some API users
Back to reality.
Age of “smart”
Blog me up, Scotty!
      23. April 2012
Some takeaways
• Accelerators are good
• World is getting flatter
          But it will never be flat
• Start monetizing soon – to learn, not to earn
• Be where your market is
• Many markets left to innovate in
Thank you!

Zemanta Tech Talk at Audible
