SemWeb install-fest presentation

    Notes on slide 1

    Cross-domain: we did not start with the financial or health domain and then extend our algorithms; we started with cross-domain capabilities from day one.

    Tags have no background meaning: they are not tied to any database and they are not normalized in any way. They are what you would expect a human who does not care about standardization or normalization to choose. For example, text mentioning Apple, Android and Google might get "iPhone" as a tag, and "mobile web" as a tag, even when neither was mentioned anywhere.

    Disambiguation is done using background knowledge. For example, we distinguish between London the city in the UK, London in Ohio or Texas, and Jack London the writer.

    We are big fans of Freebase and the Linking Open Data project.

    Zigtag, Faviki, AdaptiveBlue, Zemanta, Yahoo, Freebase

    SemWeb install-fest presentation - Presentation Transcript

    1. Building upon the Zemanta API Andraz Tori, CTO [email_address] Twitter: andraz
    2. Overview
      • General purpose
      • Functionality
      • Examples, demos & use-cases
    3. What does it do?
    4. A Stargate between Human Understandable Text and Computer Processable Data
    5. Initial design
      • Input: a chunk of text
      • Domain agnostic!
      • Avoid proprietary entity identifiers or taxonomies
      • Standard response formats: JSON, XML, RDF/XML
    6. What gives?
      • Tags
      • Categories
      • Concepts and entities
      • Related articles
      • Related images
      Most used Most interesting Most obvious
    7. Tags
      • Words, phrases
      • "Interesting" tags
        • Explicitly mentioned
        • What the text is about as a whole
        • What concepts were not mentioned, but could be relevant (for SEO)
    8. Categories
      • Deep hierarchy (100k categories)
      • Customized smaller taxonomies
      • Good for content organization, ad targeting, etc.
    9. Categories example
      • Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.
      • An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.
      • First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons.
    10. Categories
      • Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)
      • Top/Arts/Comics/Reviews (0.10)
      • Top/Society/History/By_Time_Period (0.08)
      • Top/Arts/Comics (0.08)
      • Top/Society/History/By_Time_Period/Twentieth_Century (0.08)
      • Top/Society/History (0.08)
      • Top/Shopping/Publications/Books (0.08)
      • Top/Shopping/Publications/Books/Fiction (0.08)
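The category list above pairs a DMOZ-style path with a confidence score. A minimal sketch of how a client might parse such lines and keep only the stronger matches follows. The line format is copied from the slide; `parse_category`, `strong_categories` and the 0.10 threshold are illustrative assumptions, not part of the Zemanta API.

```python
import re

# Category lines exactly as shown on the slide: "<path> (<confidence>)".
LINES = [
    "Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)",
    "Top/Arts/Comics/Reviews (0.10)",
    "Top/Society/History/By_Time_Period (0.08)",
    "Top/Arts/Comics (0.08)",
]

_PATTERN = re.compile(r"^(?P<path>\S+) \((?P<score>\d+\.\d+)\)$")


def parse_category(line):
    """Split one line into (list of path segments, confidence as float)."""
    match = _PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unrecognized category line: %r" % line)
    return match.group("path").split("/"), float(match.group("score"))


def strong_categories(lines, threshold=0.10):
    """Return parsed categories whose confidence is at least `threshold`."""
    parsed = [parse_category(line) for line in lines]
    return [(path, score) for path, score in parsed if score >= threshold]


if __name__ == "__main__":
    for path, score in strong_categories(LINES):
        # Drop the leading "Top" segment for display.
        print("%.2f  %s" % (score, " > ".join(path[1:])))
```

Keeping the path as a list of segments makes it easy to roll scores up to a parent category, for instance when mapping the deep hierarchy onto a smaller custom taxonomy.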
    11. Categories example
      • Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.
      • An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.
      • First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons.
    12. Concepts and entities
      • Identify relevant concepts and entities
      • All disambiguated!
      • At least one URL for each concept, possibly more
    13. How we disambiguate
      • Use knowledge from Wikipedia, Freebase, Dmoz, third party databases...
      • Mine the web
      • Use knowledge from choices of our users
      • Use both semantic data and statistics based methods
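As a toy illustration of combining background knowledge with simple statistics, the sketch below picks a sense of "London" by counting context words shared with each candidate. The candidate senses come from the speaker notes; the keyword sets, the DBpedia-style identifiers and the overlap scoring are assumptions made for illustration and are not Zemanta's actual method.

```python
import re

# Hypothetical background knowledge: words that tend to co-occur with each sense.
CANDIDATES = {
    "http://dbpedia.org/resource/London": {"uk", "england", "thames", "british"},
    "http://dbpedia.org/resource/London,_Ohio": {"ohio", "madison", "county"},
    "http://dbpedia.org/resource/Jack_London": {"writer", "novel", "author", "wrote"},
}


def disambiguate(text, candidates=CANDIDATES):
    """Return the candidate identifier whose keywords overlap the text most.

    Returns None when no candidate shares any word with the text.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {uri: len(words & keywords) for uri, keywords in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("Jack London wrote the novel White Fang."))
    print(disambiguate("London is a city in Madison County, Ohio."))
```

A real system would weight the evidence rather than count it, but the shape is the same: candidate identifiers, context features, and a score that selects one of them.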
    14. Linking to...
      • Traditional
      • Semantic
      ... ... ...
    15. How to build upon this
      • Step 1 : We give you exact identifiers
      • Step 2 : Then you look up the information about them (connections, images, …) in your or third party databases
      • Step 3 : ?
      • Step 4 : Profit!
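Steps 1 and 2 can be sketched as a join between the identifiers returned for a text and a store you control. The DBpedia URIs below appear in the deck's Hudson River example; `LOCAL_DB`, its fields and both helper functions are hypothetical stand-ins for your own or a third-party database.

```python
# Step 1: exact identifiers for entities found in a text (URIs taken from the deck).
identifiers = [
    "http://dbpedia.org/resource/LaGuardia_Airport",
    "http://dbpedia.org/resource/Hudson_River",
    "http://dbpedia.org/resource/Bird_strike",
]

# Step 2: a hypothetical local store keyed by the same identifiers.
LOCAL_DB = {
    "http://dbpedia.org/resource/LaGuardia_Airport": {"type": "airport", "internal_id": 17},
    "http://dbpedia.org/resource/Hudson_River": {"type": "river", "internal_id": 42},
}


def enrich(uris, database):
    """Pair each identifier with its local record, or None if we have none."""
    return [(uri, database.get(uri)) for uri in uris]


def missing(uris, database):
    """List identifiers we would still have to fetch from a third-party source."""
    return [uri for uri in uris if uri not in database]


if __name__ == "__main__":
    for uri, record in enrich(identifiers, LOCAL_DB):
        print(uri, "->", record)
    print("to look up elsewhere:", missing(identifiers, LOCAL_DB))
```

Because the keys are shared public identifiers rather than proprietary ones, the same lookup works against any dataset that uses them.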
    16. Discovery example
      • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
      • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
      • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
      • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
    17. You get
      • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
      • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
      • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
      • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
      entities concepts
    18. Or more precisely...
      • LaGuardia Airport
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654
        • http://dbpedia.org/resource/LaGuardia_Airport
      • Federal Aviation Administration
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000017df0
        • http://dbpedia.org/resource/Federal_Aviation_Administration
      • Hudson River
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5
        • http://dbpedia.org/resource/Hudson_River
      • Airbus A320 family
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000012f918
        • http://dbpedia.org/resource/Airbus_A320_family
      • Bird strike
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df
        • http://dbpedia.org/resource/Bird_strike
      • US Airways
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000001b4dc5
        • http://dbpedia.org/resource/US_Airways
      • New York
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000054dd5d
        • http://dbpedia.org/resource/New_York
      • Charlotte, North Carolina
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148
        • http://dbpedia.org/resource/Charlotte%2C_North_Carolina
      • Ferry
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000063292
        • http://dbpedia.org/resource/Ferry
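Since every entity comes back with more than one identifier, a client will usually group the links by entity and by source. A minimal sketch using four of the pairs from this slide follows; the `SOURCES` host mapping and the two helper functions are assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# (entity title, identifier URL) pairs copied from the slide.
PAIRS = [
    ("LaGuardia Airport", "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654"),
    ("LaGuardia Airport", "http://dbpedia.org/resource/LaGuardia_Airport"),
    ("Hudson River", "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5"),
    ("Hudson River", "http://dbpedia.org/resource/Hudson_River"),
]

# Illustrative mapping from URL host to a short source name.
SOURCES = {"rdf.freebase.com": "freebase", "dbpedia.org": "dbpedia"}


def source_of(url):
    """Name the knowledge base a URL belongs to, or 'other' if unknown."""
    return SOURCES.get(urlparse(url).netloc, "other")


def group_by_entity(pairs):
    """Build {entity title: {source name: URL}} from flat pairs."""
    grouped = defaultdict(dict)
    for title, url in pairs:
        grouped[title][source_of(url)] = url
    return dict(grouped)


if __name__ == "__main__":
    for title, links in group_by_entity(PAIRS).items():
        print(title, links)
```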
    19. You can query relationships http://test.infoblow.zemanta.com/infoblow/galaxy/
    20. Or more complex ones...
    21. Concepts and entities use cases
      • Quick 'overviews' of topics
      • Discovery-supporting user interfaces
      • Automatic deep information delivery (hovers, widgets)
    22. Balloons example
      • Deliver deep information on exact concepts and entities
    23. Fantastic public graph
      • Information about concepts/entities
      • Types: human, building, location...
      • Relationships with other entities
      • Hard data: dates, places, amounts
    24. Connected Dream? September 2008
    25. Connected Dream? July 2009
    26. Opportunities in leveraging linked data
      • There are internal and external benefits of linking into a larger pool of exact data
      • Pulling together custom data becomes orders of magnitude easier
      • However, we still lack strong success stories
    27. Related articles
      • 20k blogs and media sites
      • You can provide your own list of feeds to recommend from
      • Or use our 'global whitelisted pool'
    28. Related articles use cases
      • Better experience for the readers
      • Information discovery (for authors)
      • Creating interlinked mini-communities (example: bloggers using our tool to discover others in the niche)
    29. Related images
      • From Wikipedia, Flickr, Daylife, Amazon, Last.fm, Snooth, social networks
      • We filter out totally unacceptable licenses and keep the rest
      • Each image has its license spelled out; the developer/author chooses
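Because each image carries its license, the final choice can be made in client code. A minimal sketch follows, assuming each suggested image is a dictionary with `url` and `license` keys; those field names, the sample records and the allow-list are hypothetical and not taken from the Zemanta response format.

```python
# Hypothetical image suggestions; field names and values are illustrative only.
images = [
    {"url": "http://example.org/a.jpg", "license": "Creative Commons Attribution"},
    {"url": "http://example.org/b.jpg", "license": "All rights reserved"},
    {"url": "http://example.org/c.jpg", "license": "Public domain"},
]

# Licenses this particular (hypothetical) publisher is willing to use.
ALLOWED = {"creative commons attribution", "public domain"}


def acceptable(image_list, allowed=ALLOWED):
    """Keep only images whose license string is on the allow-list."""
    return [img for img in image_list if img["license"].strip().lower() in allowed]


if __name__ == "__main__":
    for img in acceptable(images):
        print(img["url"], "-", img["license"])
```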
    30. Zemanta API
      • http://developer.zemanta.com
      • Examples in Java, JavaScript, Python, Ruby, PHP, Perl, C#...
      • JavaScript SDK for quick custom CMS integration
      • Up to 10,000 requests/day free!
    31. Ease of API use
      import json, pprint
      import urllib.parse, urllib.request
      # Request parameters for the zemanta.suggest method
      args = {'format': 'json',
              'method': 'zemanta.suggest',
              'api_key': 'np9cbnby9x8tsc47recwuhqm',
              'return_categories': 'dmoz',
              'return_rdf_links': 1,
              'text': ''' Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credit An ageing vigilante, The Comedian, is attacked ...
      '''}
      # Form-encode the arguments; passing a body makes urlopen send a POST
      args_enc = urllib.parse.urlencode(args).encode('utf-8')
      response_raw = urllib.request.urlopen('http://api.zemanta.com/services/rest/0.0/', args_enc).read()
      # Decode the JSON reply and pretty-print it
      response = json.loads(response_raw)
      pprint.pprint(response)
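For reuse, the same call can be wrapped in a small helper that separates building the request from sending it. This is a sketch under assumptions: the endpoint and parameter names are the ones on the slide, `build_request` and `suggest` are hypothetical helper names, and the service may no longer respond at this address.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://api.zemanta.com/services/rest/0.0/"  # endpoint from the slide


def build_request(text, api_key, categories="dmoz"):
    """Create a POST request for zemanta.suggest without sending it."""
    args = {
        "format": "json",
        "method": "zemanta.suggest",
        "api_key": api_key,
        "return_categories": categories,
        "return_rdf_links": 1,
        "text": text,
    }
    body = urllib.parse.urlencode(args).encode("utf-8")
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return urllib.request.Request(API_URL, data=body)


def suggest(text, api_key, timeout=10):
    """Send the request and decode the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(build_request(text, api_key), timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Keeping `build_request` free of network access means the encoding can be checked offline, while `suggest` stays a thin wrapper around it.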
    32. Works for
      • All kinds of texts (not just financial or journalistic articles)
      • Tweets!
      • Wherever you need to go from text documents to something structured to put into your algorithm/data store
    33. Some API users
    34. How is the API used?
      • Place extraction and disambiguation – used by Outside.in
      • Analysis of tweets – used by Klout.net
      • Custom categorization – used by Slideshare
      • Semantic tagging – used by Faviki
    35. CommonTag Initiative by AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo!, Zemanta, and Zigtag
      • Exact tagging
      • RDFa as a transport layer
      • Freebase & LOD as vocabularies
      • Full-circle ecosystem from day one (publishers, services, better search, better browsing)
    36. The next web: "... the next web will be like a great party host, introducing us to each other and bringing us together into meaningful conversation." Marta Strickland, Organic
    37. The future? Zemify me up, Scotty! Andraz Tori [email_address] Twitter: andraz
    38. Image attributions
      • http://www.flickr.com/photos/constanzavolare/2475833775/in/photostream/ CC by Constanza Volare

    Andraz Tori, 1 month ago
