SemWeb install-fest presentation
Presentation for developers about what Zemanta API can do for you.

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Cross domain: we didn't start with the financial or health domain and later expand our algorithms; we had cross-domain capabilities from day one
  • Tags have no background meaning: they are not tied to any database and are not normalized in any way. They are what you would expect a human who does not care about standardization or normalization to choose. For example, text mentioning Apple, Android and Google might get iPhone and mobile web as tags, even when neither was mentioned anywhere.
  • Disambiguation is done using background knowledge; for example, we distinguish between London, the city in the UK; London, Ohio or London, Texas; and Jack London, the writer
  • We are big fans of Freebase and Linking Open Data project
  • Zigtag, Faviki, AdaptiveBlue, Zemanta, Yahoo, Freebase

Presentation Transcript

  • Building upon the Zemanta API. Andraz Tori, CTO, [email_address], Twitter: andraz
  • Overview
    • General purpose
    • Functionality
    • Examples, demos & use-cases
  • What does it do?
  • A Stargate between human-understandable text and computer-processable data
  • Initial design
    • Input: a chunk of text
    • Domain agnostic!
    • Avoid proprietary entity identifiers or taxonomies
    • Standard response formats: JSON, XML, RDF/XML
  • What gives?
    • Tags
    • Categories
    • Concepts and entities
    • Related articles
    • Related images
    Annotations on the slide: most used, most interesting, most obvious
  • Tags
    • Words, phrases
    • "Interesting" tags
      • Explicitly mentioned
      • What the text is about as a whole
      • What concepts were not mentioned, but could be relevant (for SEO)
  • Categories
    • Deep hierarchy (100k categories)
    • Customized smaller taxonomies
    • Good for content organization, ad targeting, etc.
  • Categories example
    • Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.
    • An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.
    • First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons.
  • Categories returned (score in parentheses)
    • Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)
    • Top/Arts/Comics/Reviews (0.10)
    • Top/Society/History/By_Time_Period (0.08)
    • Top/Arts/Comics (0.08)
    • Top/Society/History/By_Time_Period/Twentieth_Century (0.08)
    • Top/Society/History (0.08)
    • Top/Shopping/Publications/Books (0.08)
    • Top/Shopping/Publications/Books/Fiction (0.08)
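The category list above is a flat set of DMOZ paths with scores. As a sketch, assuming each line has the form `path (score)` exactly as on the slide, a client could parse the lines and roll the scores up to a coarser level, for instance as a first step toward one of the "customized smaller taxonomies". The function names and the summing rule are illustrative choices, not part of the Zemanta API.

```python
import re
from collections import defaultdict

# Category lines as shown on the slide: "<DMOZ path> (<score>)"
LINES = [
    "Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)",
    "Top/Arts/Comics/Reviews (0.10)",
    "Top/Society/History/By_Time_Period (0.08)",
    "Top/Arts/Comics (0.08)",
    "Top/Society/History/By_Time_Period/Twentieth_Century (0.08)",
    "Top/Society/History (0.08)",
    "Top/Shopping/Publications/Books (0.08)",
    "Top/Shopping/Publications/Books/Fiction (0.08)",
]

LINE_RE = re.compile(r"^(?P<path>\S+) \((?P<score>[0-9.]+)\)$")


def parse_category(line):
    """Split 'Top/A/B (0.11)' into (['Top', 'A', 'B'], 0.11)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised category line: {line!r}")
    return m.group("path").split("/"), float(m.group("score"))


def roll_up(lines, depth):
    """Sum scores per path prefix of the given depth ('Top' is level 1)."""
    totals = defaultdict(float)
    for line in lines:
        parts, score = parse_category(line)
        totals["/".join(parts[:depth])] += score
    return dict(totals)


if __name__ == "__main__":
    for branch, score in sorted(roll_up(LINES, 2).items(), key=lambda kv: -kv[1]):
        print(f"{branch}: {score:.2f}")
```

Summing is a crude choice: parents and children both appear in the list, so overlapping evidence is counted more than once. Taking the maximum score per branch is a reasonable alternative.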
  • Concepts and entities
    • Identify relevant concepts and entities
    • All disambiguated!
    • At least one URL for each concept, possibly more
  • How we disambiguate
    • Use knowledge from Wikipedia, Freebase, DMOZ, third-party databases...
    • Mine the web
    • Use knowledge from our users' choices
    • Use both semantic-data and statistics-based methods
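The slide does not describe Zemanta's actual disambiguation algorithm. As a much simpler stand-in, here is a simplified Lesk-style context-overlap heuristic applied to the London example from the notes: each candidate sense carries a hand-picked bag of context words (hypothetical, for illustration only), and the sense sharing the most words with the input text wins.

```python
import re

# Hypothetical sense inventory for "London". A real system would derive
# context words from Wikipedia, Freebase, etc.; these are hand-picked.
SENSES = {
    "London (UK city)": {"thames", "uk", "england", "britain", "westminster"},
    "London, Ohio": {"ohio", "madison", "county", "columbus"},
    "London, Texas": {"texas", "kimble", "county"},
    "Jack London (writer)": {"novel", "writer", "author", "fang", "wild", "call"},
}


def tokenize(text):
    """Lower-case alphabetic word tokens, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(text, senses=SENSES):
    """Return the sense whose context words overlap most with the text.

    Ties go to the sense listed first; None means no context word matched.
    """
    words = tokenize(text)
    best, best_score = None, 0
    for sense, context in senses.items():
        score = len(words & context)
        if score > best_score:
            best, best_score = sense, score
    return best


if __name__ == "__main__":
    print(disambiguate("The Call of the Wild is a novel by London"))
```

Counting raw overlaps is the "statistics" half in its most naive form; weighting rarer context words more heavily would be a natural next step.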
  • Linking to...
    • Traditional
    • Semantic
  • How to build upon this
    • Step 1: We give you exact identifiers
    • Step 2: You then look up information about them (connections, images, …) in your own or third-party databases
    • Step 3: ?
    • Step 4: Profit!
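Steps 1 and 2 can be sketched as a plain dictionary join. Because the identifiers are exact, matching them against your own data is a lookup rather than fuzzy string matching. The identifier list reuses DBpedia URIs from the "Or more precisely..." slide; the database records and the `enrich` helper are hypothetical.

```python
# Step 1 output (assumed shape): exact identifiers returned for a text.
identifiers = [
    "http://dbpedia.org/resource/LaGuardia_Airport",
    "http://dbpedia.org/resource/Hudson_River",
    "http://dbpedia.org/resource/Bird_strike",
]

# Step 2: your own database, keyed by the same identifiers.
# Hypothetical records, for illustration only.
MY_DB = {
    "http://dbpedia.org/resource/LaGuardia_Airport": {"section": "travel", "widget": "flight-status"},
    "http://dbpedia.org/resource/Hudson_River": {"section": "local", "widget": "map"},
}


def enrich(ids, db):
    """Split identifiers into those we hold data for and those we don't."""
    known = {uri: db[uri] for uri in ids if uri in db}
    unknown = [uri for uri in ids if uri not in db]
    return known, unknown


if __name__ == "__main__":
    known, unknown = enrich(identifiers, MY_DB)
    print("from our DB:", known)
    print("look up elsewhere:", unknown)
```

The `unknown` list is where third-party sources would come in.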
  • Discovery example
    • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
    • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
    • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
    • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
  • You get
    • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
    • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
    • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
    • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
    Legend: entities, concepts
  • Or more precisely...
    • LaGuardia Airport
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654
      • http://dbpedia.org/resource/LaGuardia_Airport
    • Federal Aviation Administration
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000017df0
      • http://dbpedia.org/resource/Federal_Aviation_Administration
    • Hudson River
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5
      • http://dbpedia.org/resource/Hudson_River
    • Airbus A320 family
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000012f918
      • http://dbpedia.org/resource/Airbus_A320_family
    • Bird strike
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df
      • http://dbpedia.org/resource/Bird_strike
    • US Airways
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000001b4dc5
      • http://dbpedia.org/resource/US_Airways
    • New York
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000054dd5d
      • http://dbpedia.org/resource/New_York
    • Charlotte, North Carolina
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148
      • http://dbpedia.org/resource/Charlotte%2C_North_Carolina
    • Ferry
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000063292
      • http://dbpedia.org/resource/Ferry
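Each entity above comes with both a Freebase and a DBpedia URI, so a client can record that the two name the same thing. A minimal sketch, assuming you want plain N-Triples using `owl:sameAs` and `rdfs:label`. That vocabulary is our choice; the slide does not say how the API itself expresses the link. Only three rows are copied here; the rest have the same shape.

```python
# (label, Freebase URI, DBpedia URI) rows copied from the slide.
ROWS = [
    ("LaGuardia Airport",
     "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654",
     "http://dbpedia.org/resource/LaGuardia_Airport"),
    ("Hudson River",
     "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5",
     "http://dbpedia.org/resource/Hudson_River"),
    ("Charlotte, North Carolina",
     "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148",
     "http://dbpedia.org/resource/Charlotte%2C_North_Carolina"),
]

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def to_ntriples(rows):
    """Emit one sameAs triple and one English label triple per row."""
    lines = []
    for label, freebase, dbpedia in rows:
        lines.append(f"<{dbpedia}> <{OWL_SAME_AS}> <{freebase}> .")
        # Escape backslashes and quotes for an N-Triples string literal
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{dbpedia}> <{RDFS_LABEL}> "{escaped}"@en .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_ntriples(ROWS), end="")
```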
  • You can query relationships: http://test.infoblow.zemanta.com/infoblow/galaxy/
  • Or more complex ones...
  • Concepts and entities use cases
    • Quick 'overviews' of topics
    • Discovery-supporting user interfaces
    • Automatic deep information delivery (hovers, widgets)
  • Balloons example
    • Deliver deep information on exact concepts and entities
  • Fantastic public graph
    • Information about concepts/entities
    • Types: human, building, location...
    • Relationships with other entities
    • Hard data: dates, places, amounts
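To illustrate what "relationships with other entities" buys you, here is a tiny in-memory triple store holding only facts stated in the discovery-example article. The subject `Flight_1549` and all predicate names are hypothetical; a real graph such as Freebase or DBpedia has its own vocabulary. `None` in a pattern acts as a wildcard.

```python
# Facts from the discovery-example article as (subject, predicate, object).
# Predicate names and the Flight_1549 identifier are made up.
TRIPLES = [
    ("Flight_1549", "operatedBy", "US_Airways"),
    ("Flight_1549", "aircraftType", "Airbus_A320_family"),
    ("Flight_1549", "departedFrom", "LaGuardia_Airport"),
    ("Flight_1549", "boundFor", "Charlotte,_North_Carolina"),
    ("Flight_1549", "crashedInto", "Hudson_River"),
    ("Flight_1549", "possibleCause", "Bird_strike"),
    ("Hudson_River", "locatedIn", "New_York"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None matches anything."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def objects(triples, s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return [o for _, _, o in match(triples, s=s, p=p)]


if __name__ == "__main__":
    # Two hops: where did the flight come down, and where is that place?
    for place in objects(TRIPLES, "Flight_1549", "crashedInto"):
        print(place, "->", objects(TRIPLES, place, "locatedIn"))
```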
  • Connected Dream? September 2008
  • Connected Dream? July 2009
  • Opportunities in leveraging linked data
    • There are internal and external benefits to linking into a larger pool of exact data
    • Pulling together custom data becomes orders of magnitude easier
    • However, we still lack strong success stories
  • Related articles
    • 20k blogs and media sites
    • You can provide your own list of feeds to recommend from
    • Or use our 'global whitelisted pool'
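The slide does not say how related articles are ranked. As an illustrative stand-in, the sketch below restricts candidates to a caller-supplied feed list (the "your own list of feeds" option) and ranks them by Jaccard overlap between their tags and the tags of the text being written. The `Article` type, the feed URLs and the scoring rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    feed: str         # URL of the feed the article came from
    tags: frozenset   # tags previously extracted for the article


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(text_tags, articles, allowed_feeds, limit=3):
    """Rank articles from allowed feeds by tag overlap with the text."""
    tags = frozenset(text_tags)
    scored = [(jaccard(tags, art.tags), art)
              for art in articles if art.feed in allowed_feeds]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [art for _, art in scored[:limit]]
```

Excluding zero-overlap articles keeps the widget from padding results with unrelated posts.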
  • Related articles use cases
    • Better experience for the readers
    • Information discovery (for authors)
    • Creating interlinked mini-communities (example: bloggers using our tool to discover others in their niche)
  • Related images
    • From Wikipedia, Flickr, Daylife, Amazon, Last.fm, Snooth, social networks
    • We filter out totally unacceptable licenses and keep the rest
    • Each image has its license spelled out; the developer/author chooses
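The two-stage licensing policy above can be sketched as a provider-side blocklist followed by a developer-side allowlist. The license labels, the sample image URLs and the contents of the blocklist are hypothetical; the slide does not list which licenses count as "totally unacceptable".

```python
# Hypothetical candidate images with their license labels.
IMAGES = [
    {"url": "http://example.org/1.jpg", "license": "CC BY"},
    {"url": "http://example.org/2.jpg", "license": "CC BY-NC"},
    {"url": "http://example.org/3.jpg", "license": "All rights reserved"},
    {"url": "http://example.org/4.jpg", "license": "Public domain"},
]

# Stage 1 (provider side): licenses assumed to be never acceptable.
UNACCEPTABLE = {"All rights reserved"}


def choose_images(images, allowed):
    """Drop blocklisted licenses, then keep only what this site allows."""
    candidates = [img for img in images if img["license"] not in UNACCEPTABLE]
    # Stage 2 (developer/author side): site-specific allowlist
    return [img for img in candidates if img["license"] in allowed]


if __name__ == "__main__":
    for img in choose_images(IMAGES, {"CC BY", "Public domain"}):
        print(img["url"], img["license"])
```

Because each image keeps its license label, a commercial site can simply leave "CC BY-NC" out of its allowlist.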
  • Zemanta API
    • http://developer.zemanta.com
    • Examples in Java, Javascript, Python, Ruby, PHP, Perl, C#...
    • JavaScript SDK for quick custom CMS integration
    • Up to 10,000 requests/day free!
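A client can avoid overrunning the free tier with a small local counter. This is a sketch under two assumptions not stated on the slide: the limit is counted per calendar day, and the day rolls over at midnight UTC. The clock is injectable so the guard can be tested.

```python
import datetime


class DailyQuota:
    """Client-side guard for a per-day request limit (assumed UTC day)."""

    def __init__(self, limit=10_000, today=None):
        self.limit = limit
        # Callable returning the current date; defaults to today's UTC date
        self._today = today or (lambda: datetime.datetime.now(datetime.timezone.utc).date())
        self._day = self._today()
        self.used = 0

    def try_acquire(self):
        """Count one request and return True if quota remains today, else False."""
        day = self._today()
        if day != self._day:  # a new day has started: reset the counter
            self._day, self.used = day, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Call `try_acquire()` before each API request and queue or skip the request when it returns `False`.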
  • Ease of API use
      # Python 3 version of the slide's example (the original used Python 2's
      # urllib.urlencode/urllib.urlopen and the simplejson package)
      import json
      import pprint
      from urllib.parse import urlencode
      from urllib.request import urlopen

      args = {'format': 'json',
              'method': 'zemanta.suggest',
              'api_key': 'np9cbnby9x8tsc47recwuhqm',
              'return_categories': 'dmoz',   # DMOZ category paths
              'return_rdf_links': 1,         # request RDF links for entities
              'text': ''' Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credit An ageing vigilante, The Comedian, is attacked ... '''}
      # POST the form-encoded arguments (bytes) to the REST endpoint
      args_enc = urlencode(args).encode('utf-8')
      response_raw = urlopen('http://api.zemanta.com/services/rest/0.0/', args_enc).read()
      response = json.loads(response_raw)
      pprint.pprint(response)
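For reuse, the call above can be wrapped so that building the request is separate from sending it, which also makes it testable without network access. The form fields and endpoint are the ones on the slide; the function name `build_suggest_request` is hypothetical, and extra options are passed through unchanged.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://api.zemanta.com/services/rest/0.0/"  # endpoint from the slide


def build_suggest_request(text, api_key, **options):
    """Build (but do not send) a POST request for zemanta.suggest.

    Extra keyword arguments, e.g. return_categories='dmoz', become
    additional form fields.
    """
    fields = {"method": "zemanta.suggest", "format": "json",
              "api_key": api_key, "text": text}
    fields.update(options)
    body = urlencode(fields).encode("utf-8")
    return Request(API_URL, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Sending is then `urllib.request.urlopen(req).read()` followed by `json.loads`, as in the slide.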
  • Works for
    • All kinds of texts (not just financial or journalistic articles)
    • Tweets!
    • Wherever you need to go from text documents to something structured to put into your algorithm/data store
  • Some API users
  • How is the API used?
    • Place extraction and disambiguation – used by Outside.in
    • Analysis of tweets – used by Klout.net
    • Custom categorization – used by Slideshare
    • Semantic tagging – used by Faviki
  • CommonTag Initiative by AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo!, Zemanta, and Zigtag
    • Exact tagging
    • RDFa as a transport layer
    • Freebase & LOD as vocabularies
    • Full-circle ecosystem from day one (publishers, services, better search, better browsing)
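"RDFa as a transport layer" means exact tags are embedded in the page markup itself. The sketch below renders (label, URI) pairs as RDFa using the CommonTag terms `ctag:tagged`, `ctag:Tag`, `ctag:means` and `ctag:label`. The markup is modelled on our reading of the CommonTag vocabulary and is an assumption; check the CommonTag documentation before relying on the exact attributes.

```python
from html import escape

CTAG_NS = "http://commontag.org/ns#"


def commontag_rdfa(tags):
    """Render (label, uri) pairs as CommonTag-style RDFa markup (assumed form)."""
    spans = "\n".join(
        f'  <span typeof="ctag:Tag" rel="ctag:means" resource="{escape(uri)}" '
        f'property="ctag:label" content="{escape(label)}"></span>'
        for label, uri in tags
    )
    return f'<div xmlns:ctag="{CTAG_NS}" rel="ctag:tagged">\n{spans}\n</div>'


if __name__ == "__main__":
    print(commontag_rdfa([("Hudson River", "http://dbpedia.org/resource/Hudson_River")]))
```

`html.escape` quotes `&`, `<`, `>` and quotation marks, so labels cannot break out of the attribute values.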
  • The next web: "... the next web will be like a great party host, introducing us to each other and bringing us together into meaningful conversation." (Marta Strickland, Organic)
  • The future? Zemify me up, Scotty! Andraz Tori, [email_address], Twitter: andraz
  • Image attributions
    • http://www.flickr.com/photos/constanzavolare/2475833775/in/photostream/ CC by Constanza Volare