SemWeb install-fest presentation
Presentation for developers about what Zemanta API can do for you.

  • Cross-domain: we didn't start with the financial or health domain and then extend our algorithms; we built in cross-domain capabilities from day one.
  • Tags carry no background meaning: they are not tied to any database and are not normalized in any way. They are what you would expect a human who doesn't care about standardization or normalization to choose. For example, a text mentioning Apple, Android and Google might get "iPhone" and "mobile web" as tags, even if neither was mentioned anywhere.
  • Disambiguation is done using background knowledge. For example, we distinguish between London the city in the UK, London in Ohio or Texas, and Jack London the writer.
  • We are big fans of Freebase and the Linking Open Data project.
  • Zigtag, Faviki, AdaptiveBlue, Zemanta, Yahoo, Freebase

SemWeb install-fest presentation: Presentation Transcript

  • Building upon the Zemanta API Andraz Tori, CTO [email_address] Twitter: andraz
  • Overview
    • General purpose
    • Functionality
    • Examples, demos & use-cases
  • What does it do?
  • A Stargate between human-understandable text and computer-processable data
  • Initial design
    • Input: a chunk of text
    • Domain agnostic!
    • Avoid proprietary entity identifiers or taxonomies
    • Standard response formats: JSON, XML, RDF/XML
  • What gives?
    • Tags
    • Categories
    • Concepts and entities
    • Related articles
    • Related images
    Most used Most interesting Most obvious
  • Tags
    • Words, phrases
    • "Interesting" tags
      • Explicitly mentioned
      • What the text is about as a whole
      • What concepts were not mentioned, but could be relevant (for SEO)
  • Categories
    • Deep hierarchy (100k categories)
    • Customized smaller taxonomies
    • Good for content organization, ad-targeting, etc
  • Categories example
    • Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.
    • An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.
    • First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons.
  • Categories
    • Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)
    • Top/Arts/Comics/Reviews (0.10)
    • Top/Society/History/By_Time_Period (0.08)
    • Top/Arts/Comics (0.08)
    • Top/Society/History/By_Time_Period/Twentieth_Century (0.08)
    • Top/Society/History (0.08)
    • Top/Shopping/Publications/Books (0.08)
    • Top/Shopping/Publications/Books/Fiction (0.08)
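
The category list above mixes broad and narrow paths from the same branch. A minimal sketch of how a client might parse such lines and keep only the most specific category per branch follows. It assumes the number in parentheses is a confidence score, and the helper names `parse_categories` and `deepest_per_branch` are hypothetical, not part of the Zemanta API.

```python
import re

# Category lines copied from the slide; reading "(0.11)" as a confidence
# score is an assumption about what the number means.
RAW_CATEGORIES = """\
Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)
Top/Arts/Comics/Reviews (0.10)
Top/Society/History/By_Time_Period (0.08)
Top/Arts/Comics (0.08)
Top/Society/History/By_Time_Period/Twentieth_Century (0.08)
Top/Society/History (0.08)
Top/Shopping/Publications/Books (0.08)
Top/Shopping/Publications/Books/Fiction (0.08)
"""

LINE_RE = re.compile(r"^(?P<path>\S+)\s+\((?P<score>[0-9.]+)\)$")


def parse_categories(raw):
    """Return a list of (path_segments, score) tuples."""
    parsed = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            segments = match.group("path").split("/")
            parsed.append((segments, float(match.group("score"))))
    return parsed


def deepest_per_branch(categories):
    """Drop every category that is a strict prefix of another category."""
    paths = [path for path, _ in categories]
    return [
        (path, score)
        for path, score in categories
        if not any(
            len(other) > len(path) and other[: len(path)] == path
            for other in paths
        )
    ]


categories = parse_categories(RAW_CATEGORIES)
leaves = deepest_per_branch(categories)
for path, score in leaves:
    print("/".join(path), score)
```

Keeping only the deepest paths is one possible design choice; for ad targeting or coarse content organization, the shorter prefixes may be the more useful ones.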
  • Concepts and entities
    • Identify relevant concepts and entities
    • All disambiguated!
    • At least one URL for each concept, possibly more
  • How we disambiguate
    • Use knowledge from Wikipedia, Freebase, Dmoz, third party databases...
    • Mine the web
    • Use knowledge from choices of our users
    • Use both semantic data and statistics based methods
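
To make the idea of combining background knowledge with statistics concrete, here is a toy sketch using the ambiguous mention "London". It is not Zemanta's algorithm: the candidate table, keyword sets, and prior weights are all invented for illustration, and the function name `disambiguate` is hypothetical.

```python
# Toy candidate table. The keyword sets and prior weights are invented for
# illustration; they are not taken from Zemanta, Wikipedia, or Freebase.
CANDIDATES = {
    "London": [
        {"id": "London (city, UK)", "prior": 0.8,
         "keywords": {"uk", "england", "thames", "city"}},
        {"id": "London, Ohio", "prior": 0.1,
         "keywords": {"ohio", "madison", "county"}},
        {"id": "Jack London (writer)", "prior": 0.1,
         "keywords": {"writer", "novel", "author", "klondike"}},
    ]
}


def disambiguate(mention, context, candidates=CANDIDATES):
    """Pick the candidate with the best keyword overlap plus prior."""
    words = {w.strip(".,!?\"'").lower() for w in context.split()}

    def score(candidate):
        overlap = len(candidate["keywords"] & words)  # background knowledge
        return overlap + candidate["prior"]           # plus a statistical prior

    return max(candidates[mention], key=score)["id"]


print(disambiguate("London", "The novel by London is set in the Klondike."))
print(disambiguate("London", "London is a city on the Thames."))
print(disambiguate("London", "London"))
```

With no useful context, the prior decides; with context, keyword overlap dominates. A production system would use far richer signals than this.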
  • Linking to...
    • Traditional
    • Semantic
  • How to build upon this
    • Step 1 : We give you exact identifiers
    • Step 2 : Then you look up the information about them (connections, images, …) in your or third party databases
    • Step 3 : ?
    • Step 4 : Profit!
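
Steps 1 and 2 can be sketched as a simple lookup: take the identifiers the API returns and join them against a store you control. The store contents below are placeholder records invented for illustration, and `enrich` is a hypothetical helper name; only the two DBpedia URIs come from the presentation.

```python
# Hypothetical local store keyed by the identifiers the API returns.
# The records are placeholders, not real Freebase or DBpedia data.
LOCAL_STORE = {
    "http://dbpedia.org/resource/LaGuardia_Airport": {
        "type": "location",
        "note": "placeholder record",
    },
    "http://dbpedia.org/resource/US_Airways": {
        "type": "organization",
        "note": "placeholder record",
    },
}


def enrich(identifiers, store):
    """Split identifiers into those found in the store and those missing."""
    found, missing = {}, []
    for uri in identifiers:
        record = store.get(uri)
        if record is None:
            missing.append(uri)  # candidates for a third-party lookup
        else:
            found[uri] = record
    return found, missing


found, missing = enrich(
    [
        "http://dbpedia.org/resource/LaGuardia_Airport",
        "http://dbpedia.org/resource/Bird_strike",
    ],
    LOCAL_STORE,
)
print(sorted(found))
print(missing)
```

Because the identifiers are exact and shared, the same key works against your own database and against third-party ones, which is the point of step 1.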
  • Discovery example
    • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
    • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
    • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
    • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
  • You get
    • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
    • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
    • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
    • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
    entities concepts
  • Or more precisely...
    • LaGuardia Airport: http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654 and http://dbpedia.org/resource/LaGuardia_Airport
    • Federal Aviation Administration: http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000017df0 and http://dbpedia.org/resource/Federal_Aviation_Administration
    • Hudson River: http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5 and http://dbpedia.org/resource/Hudson_River
    • Airbus A320 family: http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000012f918 and http://dbpedia.org/resource/Airbus_A320_family
    • Bird strike: http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df and http://dbpedia.org/resource/Bird_strike
    • US Airways: http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000001b4dc5 and http://dbpedia.org/resource/US_Airways
    • New York: http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000054dd5d and http://dbpedia.org/resource/New_York
    • Charlotte, North Carolina: http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148 and http://dbpedia.org/resource/Charlotte%2C_North_Carolina
    • Ferry: http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000063292 and http://dbpedia.org/resource/Ferry
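
Since each entity comes back with more than one link, a client will usually want to group the links per entity and per source. The sketch below does that for two of the entities listed on this slide. The `(label, url)` pair layout and the helper names are assumptions made for illustration; the actual response structure is not shown in the presentation.

```python
from urllib.parse import unquote, urlparse

# (label, url) pairs copied from the slide; this flat layout is an assumption.
LINKS = [
    ("LaGuardia Airport",
     "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654"),
    ("LaGuardia Airport",
     "http://dbpedia.org/resource/LaGuardia_Airport"),
    ("Charlotte, North Carolina",
     "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148"),
    ("Charlotte, North Carolina",
     "http://dbpedia.org/resource/Charlotte%2C_North_Carolina"),
]


def group_by_entity(links):
    """Map each label to a {hostname: url} dict."""
    grouped = {}
    for label, url in links:
        host = urlparse(url).hostname
        grouped.setdefault(label, {})[host] = url
    return grouped


def dbpedia_label(url):
    """Recover a readable name from a DBpedia resource URL."""
    name = urlparse(url).path.rsplit("/", 1)[-1]
    return unquote(name).replace("_", " ")


grouped = group_by_entity(LINKS)
for label, sources in grouped.items():
    print(label, sorted(sources))
print(dbpedia_label(grouped["Charlotte, North Carolina"]["dbpedia.org"]))
```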
  • You can query relationships http://test.infoblow.zemanta.com/infoblow/galaxy/
  • Or more complex ones...
  • Concepts and entities use cases
    • Quick 'overviews' of topics
    • Discovery-supporting user interfaces
    • Automatic deep information delivery (hovers, widgets)
  • Balloons example
    • Deliver deep information on exact concepts and entities
  • Fantastic public graph
    • Information about concepts/entities
    • Types: human, building, location...
    • Relationships with other entities
    • Hard data: dates, places, amounts
  • Connected Dream? September 2008
  • Connected Dream? July 2009
  • Opportunities in leveraging linked data
    • There are internal and external benefits of linking into larger pool of exact data
    • Pulling together custom data becomes orders of magnitude easier
    • However, we still lack strong success stories
  • Related articles
    • 20k blogs and media sites
    • You can provide your own list of feeds to recommend from
    • Or use our 'global whitelisted pool'
  • Related articles use cases
    • Better experience for the readers
    • Information discovery (for authors)
    • Creating interlinked mini-communities (example: bloggers using our tool to discover others in the niche)
  • Related images
    • From Wikipedia, Flickr, Daylife, Amazon, Last.fm, Snooth, social networks
    • We filter totally unacceptable licenses out, keep the rest
    • Each image has its license spelled out; the developer/author chooses
  • Zemanta API
    • http://developer.zemanta.com
    • Examples in Java, Javascript, Python, Ruby, PHP, Perl, C#...
    • JavaScript SDK for quick custom CMS integration
    • Up to 10,000 requests/day free!
  • Ease of API use
    # Python 2, as shown on the slide
    import urllib, simplejson, pprint

    # Request parameters: ask for suggestions with DMOZ categories and RDF links
    args = {'format': 'json',
            'method': 'zemanta.suggest',
            'api_key': 'np9cbnby9x8tsc47recwuhqm',
            'return_categories': 'dmoz',
            'return_rdf_links': 1,
            'text': ''' Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credit An ageing vigilante, The Comedian, is attacked ...
            '''}

    # POST the form-encoded arguments and decode the JSON response
    args_enc = urllib.urlencode(args)
    response_raw = urllib.urlopen("http://api.zemanta.com/services/rest/0.0/", args_enc).read()
    response = simplejson.loads(response_raw)
    pprint.pprint(response)
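
The slide's example targets Python 2. A rough Python 3 equivalent, using only the standard library, might look like the sketch below. The endpoint and parameter names are taken from the slide and may no longer be served; the function names `build_request` and `suggest`, the timeout, and the `YOUR_API_KEY` placeholder are assumptions. The example only builds the request and does not send it.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request, urlopen

# Endpoint as shown on the slide; it may no longer be available.
API_URL = "http://api.zemanta.com/services/rest/0.0/"


def build_request(text, api_key):
    """Build (but do not send) a form-encoded POST for zemanta.suggest."""
    args = {
        "format": "json",
        "method": "zemanta.suggest",
        "api_key": api_key,
        "return_categories": "dmoz",
        "return_rdf_links": 1,
        "text": text,
    }
    body = urlencode(args).encode("utf-8")
    return Request(API_URL, data=body)  # passing data makes this a POST


def suggest(text, api_key, timeout=10):
    """Send the request and decode the JSON response (needs network access)."""
    with urlopen(build_request(text, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_request(
    "Watchmen has finally made it to the big screen.", "YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
print(parse_qs(request.data.decode("utf-8"))["method"])
```

Separating `build_request` from `suggest` keeps the request construction testable without touching the network.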
  • Works for
    • All kinds of texts (not just financial or journalistic articles)
    • Tweets!
    • Wherever you need to go from text documents to something structured to put into your algorithm/data store
  • Some API users
  • How is the API used?
    • Place extraction and disambiguation – used by Outside.in
    • Analysis of tweets – used by Klout.net
    • Custom categorization – used by Slideshare
    • Semantic tagging – used by Faviki
  • CommonTag Initiative by AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo!, Zemanta, and Zigtag
    • Exact tagging
    • RDFa as a transport layer
    • Freebase & LOD as vocabularies
    • Full-circle ecosystem from day one (publishers, services, better search, better browsing)
  • The next web: "... the next web will be like a great party host, introducing us to each other and bringing us together into meaningful conversation." Marta Strickland, Organic
  • The future? Zemify me up, Scotty! Andraz Tori [email_address] Twitter: andraz
  • Image attributions
    • http://www.flickr.com/photos/constanzavolare/2475833775/in/photostream/ CC by Constanza Volare