SemWeb install-fest presentation

Presentation for developers about what Zemanta API can do for you.

Published in: Technology, Sports

  1. 1. Building upon the Zemanta API Andraz Tori, CTO [email_address] Twitter: andraz
  2. 2. Overview <ul><li>General purpose
  3. 3. Functionality
  4. 4. Examples, demos & use-cases </li></ul>
  5. 5. What does it do?
  6. 6. A Stargate between computer-processable data and human-understandable text
  7. 7. Initial design <ul><li>Input: a chunk of text
  8. 8. Domain agnostic!
  9. 9. Avoid proprietary entity identifiers or taxonomies
  10. 10. Standard response formats: JSON, XML, RDF/XML </li></ul>
  11. 11. What gives? <ul><li>Tags
  12. 12. Categories
  13. 13. Concepts and entities
  14. 14. Related articles
  15. 15. Related images </li></ul>Most used Most interesting Most obvious
  16. 16. Tags <ul><li>Words, phrases
  17. 17. "Interesting" tags </li><ul><li>Explicitly mentioned
  18. 18. What the text is about as a whole
  19. 19. What concepts were not mentioned, but could be relevant (for SEO) </li></ul></ul>
  20. 20. Categories <ul><li>Deep hierarchy (100k categories)
  21. 21. Customized smaller taxonomies
  22. 22. Good for content organization, ad-targeting, etc </li></ul>
  23. 23. Categories example <ul><li>Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.
  24. 24. An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.
  25. 25. First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons. </li></ul>
  26. 26. Categories <ul><li>Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)
  27. 27. Top/Arts/Comics/Reviews (0.10)
  28. 28. Top/Society/History/By_Time_Period (0.08)
  29. 29. Top/Arts/Comics (0.08)
  30. 30. Top/Society/History/By_Time_Period/Twentieth_Century (0.08)
  31. 31. Top/Society/History (0.08)
  32. 32. Top/Shopping/Publications/Books (0.08)
  33. 33. Top/Shopping/Publications/Books/Fiction (0.08) </li></ul>
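The scored paths above are easy to post-process. The following is a minimal sketch, assuming the categories arrive as "path (score)" strings exactly as printed on the slide (the real response layout may differ). It keeps the best score per branch at a chosen depth, which is one possible way to map the deep hierarchy onto a smaller custom taxonomy. The names `parse_category` and `best_per_branch` are illustrative and not part of the Zemanta API.

```python
import re

# Category lines copied from the slide. The "path (score)" layout is the
# slide's display format, not necessarily what the API returns.
SLIDE_LINES = [
    "Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)",
    "Top/Arts/Comics/Reviews (0.10)",
    "Top/Society/History/By_Time_Period (0.08)",
    "Top/Arts/Comics (0.08)",
    "Top/Society/History/By_Time_Period/Twentieth_Century (0.08)",
    "Top/Society/History (0.08)",
    "Top/Shopping/Publications/Books (0.08)",
    "Top/Shopping/Publications/Books/Fiction (0.08)",
]

_LINE_RE = re.compile(r"^(\S+) \(([0-9.]+)\)$")


def parse_category(line):
    """Split 'Top/A/B (0.10)' into (['Top', 'A', 'B'], 0.10)."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised category line: %r" % line)
    return match.group(1).split("/"), float(match.group(2))


def best_per_branch(lines, depth):
    """Keep the highest score seen under each path prefix of `depth` segments."""
    best = {}
    for line in lines:
        parts, score = parse_category(line)
        branch = "/".join(parts[:depth])
        best[branch] = max(score, best.get(branch, 0.0))
    return best


if __name__ == "__main__":
    ranked = sorted(best_per_branch(SLIDE_LINES, 2).items(), key=lambda kv: -kv[1])
    for branch, score in ranked:
        print(branch, score)
```

Taking the maximum rather than the sum avoids double-counting, since the slide lists ancestors and descendants of the same branch side by side.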
  37. 37. Concepts and entities <ul><li>Identify relevant concepts and entities
  38. 38. All disambiguated!
  39. 39. At least one URL for each concept, possibly more </li></ul>
  40. 40. How we disambiguate <ul><li>Use knowledge from Wikipedia, Freebase, Dmoz, third party databases...
  41. 41. Mine the web
  42. 42. Use knowledge from choices of our users
  43. 43. Use both semantic data and statistics based methods </li></ul>
  44. 44. Linking to... <ul><li>Traditional </li></ul><ul><li>Semantic </li></ul>
  45. 45. How to build upon this <ul><li>Step 1 : We give you exact identifiers
  46. 46. Step 2 : Then you look up the information about them (connections, images, …) in your or third party databases
  47. 47. Step 3 : ?
  48. 48. Step 4 : Profit! </li></ul>
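Steps 1 and 2 can be sketched as a plain join on identifiers. This is a minimal illustration, assuming the extraction step has already produced a list of entity URIs. The `LOCAL_DB` records and their fields are hypothetical placeholders standing in for your own or a third-party database, and `enrich` is an illustrative name.

```python
# Hypothetical local store keyed by the exact identifier (DBpedia URIs here).
# The record fields are placeholders, not real data from any service.
LOCAL_DB = {
    "http://dbpedia.org/resource/Hudson_River": {"kind": "location", "internal_id": 1},
    "http://dbpedia.org/resource/US_Airways": {"kind": "organization", "internal_id": 2},
}


def enrich(identifiers, database):
    """Return (found, missing): records for known identifiers, and the rest."""
    found, missing = {}, []
    for uri in identifiers:
        record = database.get(uri)
        if record is None:
            missing.append(uri)
        else:
            found[uri] = record
    return found, missing


if __name__ == "__main__":
    uris = [
        "http://dbpedia.org/resource/Hudson_River",
        "http://dbpedia.org/resource/Bird_strike",
    ]
    found, missing = enrich(uris, LOCAL_DB)
    print("found:", found)
    print("missing:", missing)
```

Because the identifiers are exact, the lookup is a dictionary access rather than a fuzzy name match; the `missing` list shows which entities would need a second source.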
  49. 49. Discovery example <ul><li>A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
  50. 50. Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
  51. 51. The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
  52. 52. It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike. </li></ul>
  53. 53. You get <ul><li>A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
  54. 54. Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
  55. 55. The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
  56. 56. It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike. </li></ul>entities concepts
  57. 57. Or more precisely...
      LaGuardia Airport  http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654
      LaGuardia Airport  http://dbpedia.org/resource/LaGuardia_Airport
      Federal Aviation Administration  http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000017df0
      Federal Aviation Administration  http://dbpedia.org/resource/Federal_Aviation_Administration
      Hudson River  http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5
      Hudson River  http://dbpedia.org/resource/Hudson_River
      Airbus A320 family  http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000012f918
      Airbus A320 family  http://dbpedia.org/resource/Airbus_A320_family
      Bird strike  http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df
      Bird strike  http://dbpedia.org/resource/Bird_strike
      US Airways  http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000001b4dc5
      US Airways  http://dbpedia.org/resource/US_Airways
      New York  http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000054dd5d
      New York  http://dbpedia.org/resource/New_York
      Charlotte, North Carolina  http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148
      Charlotte, North Carolina  http://dbpedia.org/resource/Charlotte%2C_North_Carolina
      Ferry  http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000063292
      Ferry  http://dbpedia.org/resource/Ferry
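Since every entity comes with one link per knowledge base, a consumer will usually want them grouped. A minimal sketch, assuming the links have already been pulled out as (label, URI) pairs; the pair layout and the `group_by_entity` helper are illustrative assumptions, not the API's response schema. Only three of the entities from the slide are included to keep it short.

```python
from urllib.parse import urlparse

# (label, URI) pairs copied from the slide for three of the entities.
PAIRS = [
    ("LaGuardia Airport", "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654"),
    ("LaGuardia Airport", "http://dbpedia.org/resource/LaGuardia_Airport"),
    ("Hudson River", "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5"),
    ("Hudson River", "http://dbpedia.org/resource/Hudson_River"),
    ("Bird strike", "http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df"),
    ("Bird strike", "http://dbpedia.org/resource/Bird_strike"),
]


def group_by_entity(pairs):
    """Map each label to {hostname: uri}, so one source can be picked per entity."""
    grouped = {}
    for label, uri in pairs:
        host = urlparse(uri).hostname
        grouped.setdefault(label, {})[host] = uri
    return grouped


if __name__ == "__main__":
    for label, links in group_by_entity(PAIRS).items():
        print(label, "->", links.get("dbpedia.org"))
```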
  58. 58. You can query relationships http://test.infoblow.zemanta.com/infoblow/galaxy/
  59. 59. Or more complex ones...
  60. 60. Concepts and entities use cases <ul><li>Quick 'overviews' of topics
  61. 61. Discovery-supporting user interfaces
  62. 62. Automatic deep information delivery (hovers, widgets) </li></ul>
  63. 63. Balloons example <ul><li>Deliver deep information on exact concepts and entities </li></ul>
  64. 64. Fantastic public graph <ul><li>Information about concepts/entities
  65. 65. Types: human, building, location...
  66. 66. Relationships with other entities
  67. 67. Hard data: dates, places, amounts </li></ul>
  68. 68. Connected Dream? September 2008
  69. 69. Connected Dream? July 2009
  70. 70. Opportunities in leveraging linked data <ul><li>There are internal and external benefits of linking into a larger pool of exact data
  71. 71. Pulling together custom data becomes orders of magnitude easier
  72. 72. However, we are still missing strong success stories </li></ul>
  73. 73. Related articles <ul><li>20k blogs and media sites
  74. 74. You can provide your own list of feeds to recommend from
  75. 75. Or use our 'global whitelisted pool' </li></ul>
  76. 76. Related articles use cases <ul><li>Better experience for the readers
  77. 77. Information discovery (for authors)
  78. 78. Creating interlinked mini-communities (example: bloggers using our tool to discover others in the niche) </li></ul>
  79. 79. Related images <ul><li>From Wikipedia, Flickr, Daylife, Amazon, Last.fm, Snooth, social networks
  80. 80. We filter totally unacceptable licenses out, keep the rest
  81. 81. Each image has a license spelled out, developer/author chooses </li></ul>
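Because each image carries its license, the choice can be automated on the consumer side. A minimal sketch of such a filter; the `url` and `license` field names, the license labels, and the `allowed_images` helper are all hypothetical placeholders, since the slides do not show the actual image response schema.

```python
# Hypothetical image suggestions. Field names and license labels are
# placeholders, not the API's actual response schema.
IMAGES = [
    {"url": "http://example.com/a.jpg", "license": "CC BY"},
    {"url": "http://example.com/b.jpg", "license": "CC BY-NC"},
    {"url": "http://example.com/c.jpg", "license": "Public domain"},
]


def allowed_images(images, accepted_licenses):
    """Keep only images whose license is in the caller's accepted set."""
    accepted = {name.lower() for name in accepted_licenses}
    return [img for img in images if img.get("license", "").lower() in accepted]


if __name__ == "__main__":
    for img in allowed_images(IMAGES, ["CC BY", "Public domain"]):
        print(img["url"], img["license"])
```

Images with a missing license field are dropped rather than accepted, which is the conservative default for a publisher.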
  82. 82. Zemanta API <ul><li>http://developer.zemanta.com
  83. 83. Examples in Java, Javascript, Python, Ruby, PHP, Perl, C#...
  84. 84. JavaScript SDK for quick custom CMS integration
  85. 85. Up to 10,000 requests/day free! </li></ul>
  86. 86. Ease of API use <ul><li>import urllib, simplejson, pprint
  87. 87. args = {'format': 'json',
  88. 88. 'method': 'zemanta.suggest',
  89. 89. 'api_key': 'np9cbnby9x8tsc47recwuhqm',
  90. 90. 'return_categories': 'dmoz',
  91. 91. 'return_rdf_links': 1,
  92. 92. 'text': ''' Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credit An ageing vigilante, The Comedian, is attacked ...
  93. 93. '''}
  94. 94. args_enc = urllib.urlencode(args)
  95. 95. response_raw = urllib.urlopen("http://api.zemanta.com/services/rest/0.0/", args_enc).read()
  96. 96. response = simplejson.loads(response_raw)
  97. 97. pprint.pprint(response) </li></ul>
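The snippet on the slide targets Python 2 (`urllib.urlencode`, `urllib.urlopen`, `simplejson`). A minimal Python 3 sketch of the request-building step might look like the following. The parameter names and the endpoint are taken from the slide; the `build_request_body` helper, its signature, and the `YOUR_API_KEY` placeholder are assumptions made for illustration. The sketch deliberately stops short of sending the request.

```python
from urllib.parse import parse_qs, urlencode

# Endpoint as shown on the slide.
API_URL = "http://api.zemanta.com/services/rest/0.0/"


def build_request_body(text, api_key, categories="dmoz", rdf_links=True):
    """Encode the zemanta.suggest parameters from the slide as a POST body."""
    params = {
        "method": "zemanta.suggest",
        "format": "json",
        "api_key": api_key,
        "return_categories": categories,
        "return_rdf_links": 1 if rdf_links else 0,
        "text": text,
    }
    # urllib.request.urlopen expects POST data as bytes in Python 3.
    return urlencode(params).encode("utf-8")


if __name__ == "__main__":
    body = build_request_body('Branded "unfilmable", Watchmen ...', "YOUR_API_KEY")
    # Decode again to inspect what would be sent.
    print(parse_qs(body.decode("utf-8")))
    # Sending would then be: urllib.request.urlopen(API_URL, body).read()
```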
  98. 98. Works for <ul><li>All kinds of texts (not just financial or journalistic articles)
  99. 99. Tweets!
  100. 100. Wherever you need to go from text documents to something structured to put into your algorithm/data store </li></ul>
  101. 101. Some API users
  102. 102. How is the API used? <ul><li>Place extraction and disambiguation – used by Outside.in
  103. 103. Analysis of tweets – used by Klout.net
  104. 104. Custom categorization – used by Slideshare
  105. 105. Semantic tagging – used by Faviki </li></ul>
  106. 106. CommonTag Initiative by AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo!, Zemanta, and Zigtag <ul><li>Exact tagging
  107. 107. RDFa as a transport layer
  108. 108. Freebase & LOD as vocabularies
  109. 109. Full-circle ecosystem from day one (publishers, services, better search, better browsing) </li></ul>
  110. 110. The next web: "... the next web will be like a great party host, introducing us to each other and bringing us together into meaningful conversation." Marta Strickland, Organic
  111. 111. The future? Zemify me up, Scotty! Andraz Tori [email_address] Twitter: andraz
  112. 112. Image attributions <ul><li>http://www.flickr.com/photos/constanzavolare/2475833775/in/photostream/ CC by Constanza Volare </li></ul>
