Semantic Web Meetup Munich
10 / 12 - Google Munich
PhD Topic: Web Information Extraction
to bootstrap Semantic E-commerce
Uwe Stoll
Uwe Stoll, Universität der Bundeswehr München
Motivation & Problem
✴ GoodRelations web vocabulary for e-commerce is great!
✴ Rich snippets on SERPs
✴ Browser plugins
✴ Applications on Dataspaces
✴ already, ~10k shops use it
✴ but there are ~500k shops
Chart: share of shops with GoodRelations markup: GR :) 2%, no GR :( 98%
Why is it still low?
✴ Deployment happens shop by shop; there is no central switch to turn it on
✴ Some expertise is needed
✴ Incentive is limited mostly to SEO benefits
What can we do?
✴ Automate the problem
✴ Web information extraction (WIE) FTW!
✴ Web information extraction is a way to get structured data automatically out of web sites
image: coursera
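
To make the idea of web information extraction concrete, here is a minimal sketch using only the Python standard library. The CSS class names (`product-title`, `product-price`), the sample page, and the function names are assumptions made up for illustration; they are not taken from the talk, and real shop pages would need far more robust rules or learned extractors.

```python
from html.parser import HTMLParser


class ProductExtractor(HTMLParser):
    """Collects the text of elements whose class marks a product field."""

    # Hypothetical mapping from CSS class names to output field names.
    FIELDS = {"product-title": "name", "product-price": "price"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field waiting for its text content

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self._current = self.FIELDS[cls]

    def handle_data(self, data):
        # Store the first non-empty text chunk after a matching start tag.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None


def extract_product(html):
    """Return a dict of product fields found in the given HTML string."""
    parser = ProductExtractor()
    parser.feed(html)
    return parser.record


# Hypothetical product page used only for this example.
SAMPLE_PAGE = """
<div class="product">
  <h1 class="product-title">Espresso Machine X1</h1>
  <span class="product-price">199.00 EUR</span>
</div>
"""

print(extract_product(SAMPLE_PAGE))
```

The output is a plain dictionary, which is the kind of structured record that could later be mapped to GoodRelations terms.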
Pros & Cons of shop deployment and WIE for Semantic E-commerce

                     Shop deployment   Web Information Extraction
main need            incentive         computing power
market coverage      rel. low          pot. high
granularity          high              pot. low
publishing SW data   decentralized     basically centralized
How to improve granularity
✴ build specific extractors that recognize shop software
systems
✴ use existing markup as a learning set for WIE
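
Both ideas on this slide can be sketched in a few lines of Python. The sketch below first guesses the shop software from fingerprints in the HTML, then harvests text already annotated with an RDFa `property` attribute as labelled examples for training an extractor. The fingerprints, the system names, and the sample page are hypothetical, and the sample markup is simplified (in real GoodRelations RDFa the price value sits inside a nested price specification).

```python
import re
from html.parser import HTMLParser

# Hypothetical fingerprints; real shop systems would need verified signatures.
SHOP_SIGNATURES = {
    "shop-system-a": re.compile(r'name="generator"\s+content="ShopSystemA', re.I),
    "shop-system-b": re.compile(r'class="ssb-product"', re.I),
}


def detect_shop_system(html):
    """Return the name of the first matching shop system, or None."""
    for name, pattern in SHOP_SIGNATURES.items():
        if pattern.search(html):
            return name
    return None


class MarkupHarvester(HTMLParser):
    """Collects (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.examples = []
        self._prop = None  # property name waiting for its text content

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._prop = prop

    def handle_data(self, data):
        if self._prop and data.strip():
            self.examples.append((self._prop, data.strip()))
            self._prop = None


def harvest_examples(html):
    """Return labelled examples found in already annotated HTML."""
    harvester = MarkupHarvester()
    harvester.feed(html)
    return harvester.examples


# Hypothetical, simplified page that already carries GoodRelations-style markup.
ANNOTATED_PAGE = """
<meta name="generator" content="ShopSystemA 1.0">
<div typeof="gr:Offering">
  <h1 property="gr:name">Espresso Machine X1</h1>
  <span property="gr:hasCurrencyValue">199.00</span>
</div>
"""

print(detect_shop_system(ANNOTATED_PAGE))
print(harvest_examples(ANNOTATED_PAGE))
```

Pages from shops that already publish markup yield such pairs for free, and the detected shop system could be used to pick a system-specific extractor for pages without markup.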
How to improve publishing
✴ build an API for shops to request structured markup for their pages
Diagram labels: Shop, Cache, Semantic E-Commerce Web Information Extraction API, structured markup
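
One way such an API could look is sketched below: a lookup function that serves cached markup per page URL and otherwise runs an extraction step and serializes the result as GoodRelations terms in JSON-LD. The function names, the in-memory cache, the record layout, and the choice of JSON-LD are assumptions for illustration only; the talk does not specify the API or the output syntax. The `gr:` terms and the namespace URI are from the GoodRelations vocabulary.

```python
import json

GR_NS = "http://purl.org/goodrelations/v1#"

# Hypothetical in-memory cache: page URL -> serialized markup.
_cache = {}


def build_markup(record):
    """Serialize an extracted product record as GoodRelations JSON-LD."""
    doc = {
        "@context": {"gr": GR_NS},
        "@type": "gr:Offering",
        "gr:name": record["name"],
        "gr:hasPriceSpecification": {
            "@type": "gr:UnitPriceSpecification",
            "gr:hasCurrency": record["currency"],
            "gr:hasCurrencyValue": record["price"],
        },
    }
    return json.dumps(doc, indent=2)


def get_markup(url, extract):
    """Return markup for a page, running the extractor only on a cache miss.

    `extract` is a callable(url) -> record dict; it is a hypothetical
    stand-in for the web information extraction step.
    """
    if url not in _cache:
        _cache[url] = build_markup(extract(url))
    return _cache[url]


# Usage with a stub extractor that counts how often it is called.
calls = []


def stub_extract(url):
    calls.append(url)
    return {"name": "Espresso Machine X1", "currency": "EUR", "price": 199.0}


first = get_markup("http://shop.example/p/1", stub_extract)
second = get_markup("http://shop.example/p/1", stub_extract)
print(first)
```

The second request is answered from the cache, so the extractor runs only once per page in this sketch.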
Take-away
✴ Exploit Web Information Extraction to grow the GoodRelations piece of cake!
✴ Let machines work for men again and not the other way round!
Chart: pie chart with segments labelled shop deployment, WIE, and no GR :( (values shown: 94%, 4%, 2%)
Thank you
Uwe Stoll
http://www.semantium.de
twitter: ustoll
stoll@semantium.de

Semantic Web Munich Meetup Fall 2012 - Web Information Extraction to bootstrap Semantic E-Commerce
