SemWeb install-fest presentation

Presentation for developers about what Zemanta API can do for you.

Published in: Technology, Sports
  • Cross-domain: we didn't start with the financial or health domain and then expand our algorithms; we started with cross-domain capabilities from day one.
  • Tags have no background meaning; they are not tied to any database and are not normalized in any way. They are what you would expect a human who doesn't care about standardization or normalization to choose. For example, text mentioning Apple, Android and Google might get "iPhone" as a tag, and "mobile web" as a tag, even when neither was mentioned anywhere.
  • Disambiguation is done using background knowledge; for example, we distinguish between London the city in the UK, London in Ohio or Texas, and Jack London the writer.
  • We are big fans of Freebase and the Linking Open Data project.
  • Zigtag, Faviki, AdaptiveBlue, Zemanta, Yahoo, Freebase
  • Transcript

    • 1. Building upon the Zemanta API Andraz Tori, CTO [email_address] Twitter: andraz
    • 2. Overview
      • General purpose
      • Functionality
      • Examples, demos & use-cases
    • 5. What does it do?
    • 6. A Stargate Computer Processable Data Human Understandable Text
    • 7. Initial design
      • Input: a chunk of text
      • Domain agnostic!
      • Avoid proprietary entity identifiers or taxonomies
      • Standard response formats: JSON, XML, RDF/XML
    • 11. What gives? Most used Most interesting Most obvious
    • 16. Tags
      • Words, phrases
      • "Interesting" tags
        • Explicitly mentioned
        • What the text is about as a whole
        • What concepts were not mentioned, but could be relevant (for SEO)
    • 20. Categories
      • Deep hierarchy (100k categories)
      • Customized smaller taxonomies
      • Good for content organization, ad targeting, etc.
    • 23. Categories example
      • Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credits roll, it is clear Watchmen is not your typical superhero movie.
      • An ageing vigilante, The Comedian, is attacked in his high-rise apartment before being hurled 10 storeys to his death... in graphic slow motion. What follows is a two-and-three-quarter hour epic that centres on an outlawed group of deeply flawed former heroes as a Cold War Doomsday clock inches ever closer to midnight and nuclear apocalypse.
      • First published in 12 parts by DC Comics in 1986, Watchmen was written by the British team of Alan Moore and illustrator Dave Gibbons.
    • 26. Categories
      • Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War (0.11)
      • Top/Arts/Comics/Reviews (0.10)
      • Top/Society/History/By_Time_Period (0.08)
      • Top/Arts/Comics (0.08)
      • Top/Society/History/By_Time_Period/Twentieth_Century (0.08)
      • Top/Society/History (0.08)
      • Top/Shopping/Publications/Books (0.08)
      • Top/Shopping/Publications/Books/Fiction (0.08)
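A minimal sketch of how a client might post-process a category list like the one on this slide. It assumes the numbers in parentheses are relevance scores; the function names `score_by_branch` and `deepest_only` are hypothetical helpers, not part of the Zemanta API.

```python
from collections import defaultdict

# Category paths and scores copied from the slide above.
CATEGORIES = [
    ("Top/Society/History/By_Time_Period/Twentieth_Century/Cold_War", 0.11),
    ("Top/Arts/Comics/Reviews", 0.10),
    ("Top/Society/History/By_Time_Period", 0.08),
    ("Top/Arts/Comics", 0.08),
    ("Top/Society/History/By_Time_Period/Twentieth_Century", 0.08),
    ("Top/Society/History", 0.08),
    ("Top/Shopping/Publications/Books", 0.08),
    ("Top/Shopping/Publications/Books/Fiction", 0.08),
]


def score_by_branch(categories, depth=2):
    """Sum scores per branch: the first `depth` path segments after 'Top'."""
    totals = defaultdict(float)
    for path, score in categories:
        segments = path.split("/")[1:1 + depth]
        totals["/".join(segments)] += score
    return dict(totals)


def deepest_only(categories):
    """Drop every category that is an ancestor of another returned category."""
    paths = [path for path, _ in categories]
    return [
        (path, score)
        for path, score in categories
        if not any(other.startswith(path + "/") for other in paths)
    ]
```

Summing by branch gives a coarse view of what the text is about, while `deepest_only` keeps just the most specific label on each path, which may suit a small custom taxonomy better.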
    • 37. Concepts and entities
      • Identify relevant concepts and entities
      • All disambiguated!
      • At least one URL for each concept, possibly more
    • 40. How we disambiguate
      • Use knowledge from Wikipedia, Freebase, Dmoz, third party databases...
      • Mine the web
      • Use knowledge from the choices of our users
      • Use both semantic data and statistics-based methods
    • 44. Linking to...
      • Traditional
      • Semantic
      ... ... ...
    • 45. How to build upon this
      • Step 1 : We give you exact identifiers
      • Step 2 : Then you look up the information about them (connections, images, …) in your own or third-party databases
      • Step 3 : ?
      • Step 4 : Profit!
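Steps 1 and 2 can be sketched as a plain dictionary lookup keyed by the identifier the API returns. This is only an illustration: `LOCAL_DB`, its record fields and the `enrich` helper are hypothetical, and the record content is placeholder data rather than anything the API supplies.

```python
# Hypothetical local store keyed by entity URI; the record is placeholder data.
LOCAL_DB = {
    "http://dbpedia.org/resource/Hudson_River": {"type": "location"},
}


def enrich(entity_uris, store):
    """Split URIs into those with a local record and those still to be fetched."""
    found, missing = {}, []
    for uri in entity_uris:
        record = store.get(uri)
        if record is None:
            missing.append(uri)  # candidate for a third-party lookup
        else:
            found[uri] = record
    return found, missing
```

Because the identifiers are exact URIs rather than free-text tags, the lookup needs no fuzzy matching; anything in `missing` can be passed on to a third-party database.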
    • 49. Discovery example
      • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
      • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
      • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
      • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
    • 53. You get
      • A US Airways Airbus A320 passenger plane carrying 135 people has crashed into the Hudson River in New York, the Federal Aviation Administration says.
      • Rescue boats and ferries are alongside the plane attempting to pick up people standing on both of the plane's wings.
      • The plane, which the FAA said was flight 1549 from LaGuardia Airport to Charlotte, is partially submerged.
      • It is not known how the plane came to land in the river, but the FAA said it might have been due to a bird strike.
      entities concepts
    • 57. Or more precisely...
      • LaGuardia Airport
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654
        • http://dbpedia.org/resource/LaGuardia_Airport
      • Federal Aviation Administration
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000017df0
        • http://dbpedia.org/resource/Federal_Aviation_Administration
      • Hudson River
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000005ebb5
        • http://dbpedia.org/resource/Hudson_River
      • Airbus A320 family
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000012f918
        • http://dbpedia.org/resource/Airbus_A320_family
      • Bird strike
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000004744df
        • http://dbpedia.org/resource/Bird_strike
      • US Airways
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f80000000001b4dc5
        • http://dbpedia.org/resource/US_Airways
      • New York
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000054dd5d
        • http://dbpedia.org/resource/New_York
      • Charlotte, North Carolina
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148
        • http://dbpedia.org/resource/Charlotte%2C_North_Carolina
      • Ferry
        • http://rdf.freebase.com/ns/guid/9202a8c04000641f8000000000063292
        • http://dbpedia.org/resource/Ferry
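A small sketch of how the label and URI pairs on this slide could be organized on the client side. The pairs are copied from the slide; the helpers `group_by_entity` and `dbpedia_name` are hypothetical, and the assumption that each entity arrives as flat (label, URI) pairs is mine rather than a documented response shape.

```python
from urllib.parse import unquote, urlparse

# (label, URI) pairs copied from the slide above; only two entities are shown.
LINKS = [
    ("LaGuardia Airport", "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000018f654"),
    ("LaGuardia Airport", "http://dbpedia.org/resource/LaGuardia_Airport"),
    ("Charlotte, North Carolina", "http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000006e148"),
    ("Charlotte, North Carolina", "http://dbpedia.org/resource/Charlotte%2C_North_Carolina"),
]


def group_by_entity(links):
    """Map each label to {host: URI} so one entity carries all its identifiers."""
    entities = {}
    for label, uri in links:
        host = urlparse(uri).netloc
        entities.setdefault(label, {})[host] = uri
    return entities


def dbpedia_name(uri):
    """Recover a readable name from a DBpedia resource URI."""
    last_segment = urlparse(uri).path.rsplit("/", 1)[-1]
    return unquote(last_segment).replace("_", " ")
```

Grouping by host makes it easy to pick whichever vocabulary a downstream store already uses, for example `group_by_entity(LINKS)["LaGuardia Airport"]["dbpedia.org"]`.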
    • 58. You can query relationships http://test.infoblow.zemanta.com/infoblow/galaxy/
    • 59. Or more complex ones...
    • 60. Concepts and entities use cases
      • Quick 'overviews' of topics
      • Discovery-supporting user interfaces
      • Automatic deep information delivery (hovers, widgets)
    • 63. Balloons example
      • Deliver deep information on exact concepts and entities
    • 64. Fantastic public graph
      • Information about concepts/entities
      • Types: human, building, location...
      • Relationships with other entities
      • Hard data: dates, places, amounts
    • 68. Connected Dream? September 2008
    • 69. Connected Dream? July 2009
    • 70. Opportunities in leveraging linked data
      • There are internal and external benefits of linking into a larger pool of exact data
      • Pulling together custom data becomes orders of magnitude easier
      • However, we still lack strong success stories
    • 73. Related articles
      • 20k blogs and media sites
      • You can provide your own list of feeds to recommend from
      • Or use our 'global whitelisted pool'
    • 76. Related articles use cases
      • Better experience for the readers
      • Information discovery (for authors)
      • Creating interlinked mini-communities (example: bloggers using our tool to discover others in the niche)
    • 79. Related images
      • From Wikipedia, Flickr, Daylife, Amazon, Last.fm, Snooth, social networks
      • We filter out totally unacceptable licenses and keep the rest
      • Each image has its license spelled out; the developer/author chooses
    • 82. Zemanta API
      • http://developer.zemanta.com
      • Examples in Java, JavaScript, Python, Ruby, PHP, Perl, C#...
      • JavaScript SDK for quick custom CMS integration
      • Up to 10,000 requests/day free!
    • 86. Ease of API use

          # Python 2 example (urllib.urlencode and urllib.urlopen are Python 2 names)
          import urllib, simplejson, pprint

          # Request parameters: ask for DMOZ categories and RDF links for the text
          args = {'format': 'json',
                  'method': 'zemanta.suggest',
                  'api_key': 'np9cbnby9x8tsc47recwuhqm',
                  'return_categories': 'dmoz',
                  'return_rdf_links': 1,
                  'text': ''' Branded "unfilmable", Watchmen - the cult graphic novel about a group of retired, flawed superheroes - has finally made it to the big screen. From the second the opening credit An ageing vigilante, The Comedian, is attacked ...
                  '''}
          # POST the form-encoded parameters and decode the JSON response
          args_enc = urllib.urlencode(args)
          response_raw = urllib.urlopen("http://api.zemanta.com/services/rest/0.0/", args_enc).read()
          response = simplejson.loads(response_raw)
          pprint.pprint(response)

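For readers on Python 3, the same call might look like the sketch below. It uses only the standard library (`urllib.parse`, `urllib.request`, `json`). The endpoint and parameter names are taken from the slide; the function names `build_request` and `suggest`, the timeout, and the UTF-8 decoding are assumptions, and the endpoint may no longer respond.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on the slide; it may no longer be reachable.
API_URL = "http://api.zemanta.com/services/rest/0.0/"


def build_request(text, api_key):
    """Build the POST request with the parameters shown on the slide."""
    args = {
        "format": "json",
        "method": "zemanta.suggest",
        "api_key": api_key,
        "return_categories": "dmoz",
        "return_rdf_links": 1,
        "text": text,
    }
    data = urllib.parse.urlencode(args).encode("utf-8")
    # Supplying a body makes urllib send this as a POST.
    return urllib.request.Request(API_URL, data=data)


def suggest(text, api_key, timeout=10):
    """Send the request and decode the JSON response (assumed to be UTF-8)."""
    request = build_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call lets the parameters be inspected or tested without contacting the service; a call would be `suggest("some text", "YOUR_API_KEY")`.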
    • 98. Works for
      • All kinds of texts (not just financial or journalistic articles)
      • Tweets!
      • Wherever you need to go from text documents to something structured to put into your algorithm/data store
    • 101. Some API users
    • 102. How is the API used?
      • Place extraction and disambiguation – used by Outside.in
      • Analysis of tweets – used by Klout.net
      • Custom categorization – used by Slideshare
      • Semantic tagging – used by Faviki
    • 106. CommonTag Initiative by AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo!, Zemanta, and Zigtag
      • Exact tagging
      • RDFa as a transport layer
      • Freebase & LOD as vocabularies
      • Full-circle ecosystem from day one (publishers, services, better search, better browsing)
    • 110. The next web ... the next web will be like a great party host, introducing us to each other and bringing us together into meaningful conversation. Marta Strickland, Organic
    • 111. The future? Zemify me up, Scotty! Andraz Tori [email_address] Twitter: andraz
    • 112. Image attributions
      • http://www.flickr.com/photos/constanzavolare/2475833775/in/photostream/ CC by Constanza Volare
