SemWeb install-fest presentation

Presentation for developers about what the Zemanta API can do for you.

  • Cross-domain: we didn't start with the financial or health domain and then expand our algorithms; we built cross-domain capabilities from day one.
  • Tags have no background meaning: they are not tied to any database and are not normalized in any way. They are what you would expect from a human who does not care about standardization or normalization. For example, a text mentioning Apple, Android and Google might get "iPhone" and "mobile web" as tags, even if neither was mentioned anywhere.
  • Disambiguation is done using background knowledge. For example, we distinguish between London the city in the UK, London in Ohio or Texas, and Jack London the writer.
  • We are big fans of Freebase and the Linking Open Data project.
  • Zigtag, Faviki, AdaptiveBlue, Zemanta, Yahoo, Freebase

Transcript

  • 1. Building upon the Zemanta API Andraz Tori, CTO [email_address] Twitter: andraz
  • 2. Overview
    • General purpose
    • 3. Functionality
    • 4. Examples, demos & use-cases
  • 5. What does it do?
  • 6. A Stargate Computer Processable Data Human Understandable Text
  • 7. Initial design
    • Input: a chunk of text
    • 8. Domain agnostic!
    • 9. Avoid proprietary entity identifiers or taxonomies
    • 10. Standard response formats: JSON, XML, RDF/XML
  • 11. What gives? Most used Most interesting Most obvious
  • 16. Tags
    • Words, phrases
    • 17. "Interesting" tags
      • Explicitly mentioned
      • 18. What the text is about as a whole
      • 19. What concepts were not mentioned, but could be relevant (for SEO)
  • 20. Categories
    • Deep hierarchy (100k categories)
    • 21. Customized smaller taxonomies
    • 22. Good for content organization, ad-targeting, etc
  • 23. Categories example
    • Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.
    • 24. An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.
    • 25. First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons.
  • 26. Categories
    • Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)
    • 27. Top/Arts/Comics/Reviews (0.10)
    • 28. Top/Society/History/By_Time_Period (0.08)
    • 29. Top/Arts/Comics (0.08)
    • 30. Top/Society/History/By_Time_Period/Twentieth_Century (0.08)
    • 31. Top/Society/History (0.08)
    • 32. Top/Shopping/Publications/Books (0.08)
    • 33. Top/Shopping/Publications/Books/Fiction (0.08)
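The scored category paths above are easy to post-process. The following is a minimal sketch, assuming the results have already been parsed into (path, score) pairs. The helper name `top_level_scores` and the idea of summing scores per first-level branch are illustrative choices, not part of the Zemanta API.

```python
from collections import defaultdict

# (path, confidence) pairs copied from the slide above.
CATEGORIES = [
    ("Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War", 0.11),
    ("Top/Arts/Comics/Reviews", 0.10),
    ("Top/Society/History/By_Time_Period", 0.08),
    ("Top/Arts/Comics", 0.08),
    ("Top/Society/History/By_Time_Period/Twentieth_Century", 0.08),
    ("Top/Society/History", 0.08),
    ("Top/Shopping/Publications/Books", 0.08),
    ("Top/Shopping/Publications/Books/Fiction", 0.08),
]


def top_level_scores(categories):
    """Sum the scores per first-level branch under 'Top' (hypothetical helper)."""
    totals = defaultdict(float)
    for path, score in categories:
        parts = path.split("/")
        # parts[0] is the root "Top"; parts[1] is the branch we group by.
        branch = parts[1] if len(parts) > 1 else parts[0]
        totals[branch] += score
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = top_level_scores(CATEGORIES)
for branch, total in ranked:
    print(branch, round(total, 2))
```

Grouping like this gives a coarse topic signal (here the Society branch dominates) that could feed content organization or ad targeting.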
  • 37. Concepts and entities
    • Identify relevant concepts and entities
    • 38. All disambiguated!
    • 39. At least one URL for each concept, possibly more
  • 40. How we disambiguate
    • Use knowledge from Wikipedia, Freebase, Dmoz, third party databases...
    • 41. Mine the web
    • 42. Use knowledge from the choices our users make
    • 43. Use both semantic data and statistics based methods
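To make the idea of disambiguating with background knowledge concrete, here is a toy sketch. It is not Zemanta's algorithm: the candidate list, the context words and the `disambiguate` function are all invented for illustration, and the scoring is a plain word-overlap count.

```python
import re

# Hypothetical background knowledge: each candidate meaning of "London"
# is paired with words that tend to appear near it. Illustrative only.
CANDIDATES = {
    "London, UK": {"thames", "uk", "england", "britain", "westminster"},
    "London, Ohio": {"ohio", "madison", "county", "columbus"},
    "Jack London": {"writer", "novel", "author", "klondike", "fang"},
}


def disambiguate(text, candidates=CANDIDATES):
    """Return the candidate whose context words overlap the text the most."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & tokens) for name, words in candidates.items()}
    # On a tie, max() keeps the first candidate in dictionary order.
    best = max(scores, key=scores.get)
    return best, scores


best, scores = disambiguate(
    "London wrote the novel White Fang after the Klondike gold rush."
)
print(best, scores)
```

A production system would combine signals like this with statistics mined from the web and from user choices, as the slide lists.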
  • 44. Linking to...
    • Traditional
    • Semantic
    ... ... ...
  • 45. How to build upon this
    • Step 1: We give you exact identifiers
    • 46. Step 2: Then you look up the information about them (connections, images, …) in your own or third-party databases
    • 47. Step 3: ?
    • 48. Step 4: Profit!
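Step 2 might look like the sketch below for identifiers that point at DBpedia. It assumes DBpedia serves a JSON document for each resource under `/data/<name>.json` and that the document maps subject URIs to property dictionaries. Both are assumptions about a third-party service, and the function names are hypothetical.

```python
import json
from urllib.parse import urlsplit
from urllib.request import urlopen


def dbpedia_data_url(resource_uri):
    """Map a DBpedia resource URI to the URL of its JSON data document."""
    parts = urlsplit(resource_uri)
    prefix = "/resource/"
    if parts.netloc != "dbpedia.org" or not parts.path.startswith(prefix):
        raise ValueError("not a DBpedia resource URI: %r" % resource_uri)
    name = parts.path[len(prefix):]
    return "http://dbpedia.org/data/%s.json" % name


def fetch_properties(resource_uri):
    """Download the data document (needs network access; not called here)."""
    with urlopen(dbpedia_data_url(resource_uri)) as resp:
        data = json.load(resp)
    # Assumed layout: {subject URI: {property URI: [values]}}.
    return data.get(resource_uri, {})


url = dbpedia_data_url("http://dbpedia.org/resource/Hudson_River")
print(url)
```

Keeping the URL mapping separate from the download makes the lookup easy to swap for your own database.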
  • 49. Discovery example
    • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
    • 50. Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
    • 51. The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
    • 52. It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
  • 53. You get
    • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
    • 54. Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
    • 55. The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
    • 56. It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
    entities concepts
  • 57. Or more precisely...
    • LaGuardia Airport
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654
      • http://dbpedia.org/resource/LaGuardia_Airport
    • Federal Aviation Administration
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000017df0
      • http://dbpedia.org/resource/Federal_Aviation_Administration
    • Hudson River
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5
      • http://dbpedia.org/resource/Hudson_River
    • Airbus A320 family
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000012f918
      • http://dbpedia.org/resource/Airbus_A320_family
    • Bird strike
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df
      • http://dbpedia.org/resource/Bird_strike
    • US Airways
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000001b4dc5
      • http://dbpedia.org/resource/US_Airways
    • New York
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000054dd5d
      • http://dbpedia.org/resource/New_York
    • Charlotte, North Carolina
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148
      • http://dbpedia.org/resource/Charlotte%2C_North_Carolina
    • Ferry
      • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000063292
      • http://dbpedia.org/resource/Ferry
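Since every entity comes back with several identifiers, a client will usually want one record per entity. Here is a small sketch, assuming the response has already been reduced to (label, URI) pairs. The `group_by_entity` helper and the choice to key each link by host name are illustrative, not part of the API.

```python
from urllib.parse import urlsplit

# A few (label, URI) pairs taken from the slide above.
PAIRS = [
    ("LaGuardia Airport",
     "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654"),
    ("LaGuardia Airport", "http://dbpedia.org/resource/LaGuardia_Airport"),
    ("Hudson River",
     "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5"),
    ("Hudson River", "http://dbpedia.org/resource/Hudson_River"),
]


def group_by_entity(pairs):
    """Build {label: {host: uri}} so each entity has a single record."""
    entities = {}
    for label, uri in pairs:
        host = urlsplit(uri).netloc
        entities.setdefault(label, {})[host] = uri
    return entities


entities = group_by_entity(PAIRS)
for label, links in entities.items():
    print(label, sorted(links))
```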
  • 58. You can query relationships http://test.infoblow.zemanta.com/infoblow/galaxy/
  • 59. Or more complex ones...
  • 60. Concepts and entities use cases
    • Quick 'overviews' of topics
    • 61. Discovery-supporting user interfaces
    • 62. Automatic deep information delivery (hovers, widgets)
  • 63. Balloons example
    • Deliver deep information on exact concepts and entities
  • 64. Fantastic public graph
    • Information about concepts/entities
    • 65. Types: human, building, location...
    • 66. Relationships with other entities
    • 67. Hard data: dates, places, amounts
  • 68. Connected Dream? September 2008
  • 69. Connected Dream? July 2009
  • 70. Opportunities in leveraging linked data
    • There are internal and external benefits to linking into a larger pool of exact data
    • 71. Pulling together custom data becomes orders of magnitude easier
    • 72. However, we still lack strong success stories
  • 73. Related articles
    • 20k blogs and media sites
    • 74. You can provide your own list of feeds to recommend from
    • 75. Or use our 'global whitelisted pool'
  • 76. Related articles use cases
    • Better experience for the readers
    • 77. Information discovery (for authors)
    • 78. Creating interlinked mini-communities (example: bloggers using our tool to discover others in their niche)
  • 79. Related images
    • From Wikipedia, Flickr, Daylife, Amazon, Last.fm, Snooth, social networks
    • 80. We filter out totally unacceptable licenses and keep the rest
    • 81. Each image has its license spelled out; the developer/author chooses
  • 82. Zemanta API
    • http://developer.zemanta.com
    • 83. Examples in Java, JavaScript, Python, Ruby, PHP, Perl, C#...
    • 84. JavaScript SDK for quick custom CMS integration
    • 85. Up to 10,000 requests/day free!
  • 86. Ease of API use
    import urllib, simplejson, pprint

    # Request parameters for the zemanta.suggest method
    args = {'format': 'json',
            'method': 'zemanta.suggest',
            'api_key': 'np9cbnby9x8tsc47recwuhqm',
            'return_categories': 'dmoz',
            'return_rdf_links': 1,
            'text': ''' Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credit An ageing vigilante, The Comedian, is attacked ...
    '''}
    # POST the URL-encoded parameters, then decode the JSON response
    args_enc = urllib.urlencode(args)
    response_raw = urllib.urlopen("http://api.zemanta.com/services/rest/0.0/", args_enc).read()
    response = simplejson.loads(response_raw)
    pprint.pprint(response)
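The snippet above uses the Python 2 `urllib` module. A rough Python 3 equivalent is sketched below using only the standard library. The parameter names and the endpoint are taken from the slide. The split into `build_request` and `suggest`, and the `YOUR_API_KEY` placeholder, are assumptions made for illustration, and the slides do not establish whether the endpoint still responds.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://api.zemanta.com/services/rest/0.0/"  # endpoint from the slide


def build_request(text, api_key):
    """Encode the slide's parameters as a POST body (bytes)."""
    args = {
        "format": "json",
        "method": "zemanta.suggest",
        "api_key": api_key,
        "return_categories": "dmoz",
        "return_rdf_links": 1,
        "text": text,
    }
    return urlencode(args).encode("utf-8")


def suggest(text, api_key):
    """Send the request; needs network access and a valid key (not run here)."""
    with urlopen(API_URL, build_request(text, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder key: substitute your own before calling suggest().
body = build_request("Watchmen has finally made it to the big screen.",
                     "YOUR_API_KEY")
print(body)
```

Separating the encoding step from the network call keeps the request easy to inspect and test offline.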
  • 98. Works for
    • All kinds of texts (not just financial or journalistic articles)
    • 99. Tweets!
    • 100. Wherever you need to go from text documents to something structured to put into your algorithm/data store
  • 101. Some API users
  • 102. How is the API used?
    • Place extraction and disambiguation – used by Outside.in
    • 103. Analysis of tweets – used by Klout.net
    • 104. Custom categorization – used by Slideshare
    • 105. Semantic tagging – used by Faviki
  • 106. CommonTag Initiative by AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo!, Zemanta, and Zigtag
    • Exact tagging
    • 107. RDFa as a transport layer
    • 108. Freebase & LOD as vocabularies
    • 109. Full-circle ecosystem from day one (publishers, services, better search, better browsing)
  • 110. The next web: "... the next web will be like a great party host, introducing us to each other and bringing us together into meaningful conversation." Marta Strickland, Organic
  • 111. The future? Zemify me up, Scotty! Andraz Tori [email_address] Twitter: andraz
  • 112. Image attributions
    • http://www.flickr.com/photos/constanzavolare/2475833775/in/photostream/ CC by Constanza Volare