SE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation

Presented at SE@M 2010 - Fourth International Workshop on Search and Exchange of e-le@rning Materials (SE@M’10). 27-28 September 2010, Barcelona, Spain

Transcript

  1. Automatic Keywords Extraction – A Basis for Content Recommendation
     Ivana Bosnić (University of Zagreb, Croatia), Katrien Verbert, Erik Duval (Katholieke Universiteit Leuven, Belgium)
  2. Today...
     • General idea and use case
     • Evaluation of services
     • Evaluation in the environment
  3. Part 1: General idea and use case
  4. Content authoring
     • Lacking inspiration
       - Relevant content you should take a look at
     • Educational context
       - What is this content for?
       - Who is making it?
     • Hard to reuse
       - Referencing
       - Copy-pasting
  5. Use case (diagram; labels: Author, Current content, Course context, Tools, Integration, Referencing, Content available, Broader context, Content reuse, Metadata, LORs, Wikipedia, Blogs, LMS)
  6. Current work...
     • Web application
       - WikiMedia / WikiPres integration
     • Recommending content
       - Keywords: Zemanta API
       - Wikipedia, blogs
       - GLOBE repository (REST interface)
     • Context
       - Presentation slides
     • Reuse
       - Referencing the content
  7. Keywords
     • Basis for generating search terms
     • 2 groups of generator services (see the sketch after this slide)
       - Term extraction
         - Yahoo Term Extraction Web services, Fivefilters
       - Semantic entity generation
         - Zemanta, OpenCalais, Evri, Alchemy API
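     For illustration, a minimal Python sketch of calling the first group's representative, the Yahoo Term Extraction Web Service. The V1 endpoint and parameter names below (appid, context) are recalled from the service as it existed around 2010 and should be treated as assumptions; the service has since been retired, so this shows the call shape rather than something that will still run against a live endpoint.

        import requests
        import xml.etree.ElementTree as ET

        def yahoo_terms(text, appid="YOUR_APP_ID"):
            """Send text to the (retired) Yahoo Term Extraction V1 service and
            return the extracted terms (up to ~20, unranked)."""
            resp = requests.post(
                "http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction",
                data={"appid": appid, "context": text},  # assumed V1 parameter names
                timeout=10,
            )
            resp.raise_for_status()
            root = ET.fromstring(resp.content)
            # Collect every <Result> element, ignoring the XML namespace prefix.
            return [el.text for el in root.iter() if el.tag.endswith("Result")]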
  8. Keyword extractors – a closer look
     • Yahoo Term Extraction Service
       - Up to 20 keywords, found in the text
       - No ranking
       - Generates a part of the metadata in GLOBE (SAmgI)
     • Zemanta (see the sketch after this slide)
       - Up to 8 keywords, not necessarily in the text
       - Relevance ranking
       - Recommends images, links, blogs/news
       - Emphasising words and influencing the extraction
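     A comparable, hedged sketch of the Zemanta call. The endpoint, the zemanta.suggest method name, and the response fields (keywords, name, confidence) are recalled from the Zemanta REST API of that era and may not be exact; a personal API key is required.

        import requests

        ZEMANTA_ENDPOINT = "http://api.zemanta.com/services/rest/0.0/"  # assumed endpoint

        def zemanta_keywords(text, api_key="YOUR_API_KEY"):
            """Ask Zemanta for relevance-ranked keywords for a chunk of text."""
            resp = requests.post(
                ZEMANTA_ENDPOINT,
                data={
                    "method": "zemanta.suggest",  # assumed method name
                    "api_key": api_key,
                    "text": text,
                    "format": "json",
                },
                timeout=10,
            )
            resp.raise_for_status()
            data = resp.json()
            # Each keyword carries a confidence score; keep the 8 most relevant.
            keywords = sorted(data.get("keywords", []),
                              key=lambda k: k.get("confidence", 0), reverse=True)
            return [(k["name"], k.get("confidence")) for k in keywords[:8]]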
  9. Part 2: Keyword extractor evaluation
  10. Goals
     • Testing the extraction services with already existing educational content
     • Comparing Zemanta and Y! Term Extraction
     • Learning about user-generated queries
  11. Methodology
     • 6 users
     • 9 presentations found through Google
       - Topics: open source, databases, gravity
       - Text extracted from 3 adjacent slides
       - Different content properties (general, specific, full sentences, bullets...)
     • Users create the queries
     • Users grade the generated keywords
  12. Automatically extracted keywords: user grading I (chart: average keyword relevancy grade, per presentation)
  13. Automatically extracted keywords: user grading II (chart: average keyword relevancy grade, per topic)
  14. Automatically extracted keywords: user grading III (chart: average user grade per Zemanta rank)
  15. User-generated keywords
     • Comparing the differences between user-generated and automatically generated keywords
     • Chosen keywords: those common to 2+ users
     • 2 comparisons (see the sketch after this slide):
       - Exact match
       - Similar match
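     The slides do not spell out how the "similar match" is computed, so the following Python sketch is only one plausible reading: exact match is case-insensitive string equality, and similar match is taken to mean that one phrase contains the other (e.g. "database" vs. "database systems"). The helper names and the similarity rule are hypothetical.

        def normalize(kw):
            return kw.strip().lower()

        def exact_matches(user_kws, auto_kws):
            """Keywords that appear verbatim in both lists (case-insensitive)."""
            return {normalize(k) for k in user_kws} & {normalize(k) for k in auto_kws}

        def similar_matches(user_kws, auto_kws):
            """Pairs where one keyword phrase contains the other (assumed notion of 'similar')."""
            pairs = set()
            for uk in map(normalize, user_kws):
                for ak in map(normalize, auto_kws):
                    if uk != ak and (uk in ak or ak in uk):
                        pairs.add((uk, ak))
            return pairs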
  16. User-generated keywords (chart: matches between user-generated and automatically generated keywords)
  17. Lessons learned
     • Zemanta won
       - not an extensive evaluation, though
     • Presentations prepared beforehand
       - Additional problems occur on-the-fly
  18. Part 3: Evaluation in the environment
  19. Goals
     • Analyzing the use of keywords as the basis for recommendations
       - while the content is being authored
     • Evaluating the relevance of the keywords
     • Part of a usability evaluation
  20. Methodology
     • 4 users
     • Make a presentation on a programming topic
       - Topics: HTML (x2), databases, XML
     • Rank the 5 best keywords of the 8 proposed (see the sketch after this slide)
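     The later slides report the relation between the users' ranking and Zemanta's internal ranking, but the exact aggregation is not given. Here is a hypothetical Python sketch of one straightforward way to compute it: average the rank users assigned to a keyword, grouped by the rank Zemanta gave it. The data shape (keyword -> (zemanta_rank, user_rank)) is an assumption.

        from collections import defaultdict

        def average_user_rank_per_zemanta_rank(sessions):
            """sessions: one dict per evaluated presentation, mapping
            keyword -> (zemanta_rank, user_rank).
            Returns {zemanta_rank: mean user rank}."""
            buckets = defaultdict(list)
            for session in sessions:
                for zemanta_rank, user_rank in session.values():
                    buckets[zemanta_rank].append(user_rank)
            return {rank: sum(ranks) / len(ranks) for rank, ranks in sorted(buckets.items())}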
  21. Environment
     • Wiki + slide editing -> WikiPres
  22. (no text on slide)
  23. Evaluation I (charts: relation between user and internal ranking; average of user rankings)
  24. Evaluation I - problems
     • Content cold start
     • Semantic relation of words
     • Unnecessary text markup
     • Ambiguity
  25. Evaluation I - changes
     • Including the content from previous slides
     • Slide title emphasis
     • Text cleaning
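     A minimal Python sketch of how these three changes could be applied to the text before it is sent to the keyword extractor. The helper name, the slide data shape (dicts with "title" and "body"), and the title_repeats knob are hypothetical, and the markup-stripping patterns are illustrative rather than the exact cleaning rules used in WikiPres.

        import re

        def build_extraction_text(slides, current_index, title_repeats=2):
            """Build extractor input: previous slides included, titles emphasised
            by repetition, wiki/HTML markup stripped out."""
            def clean(text):
                text = re.sub(r"<[^>]+>", " ", text)          # drop HTML tags
                text = re.sub(r"[=\[\]{}|*_#]+", " ", text)   # drop common wiki markup characters
                return re.sub(r"\s+", " ", text).strip()

            parts = []
            for slide in slides[:current_index + 1]:
                parts.append((clean(slide["title"]) + " ") * title_repeats)
                parts.append(clean(slide["body"]))
            return " ".join(parts)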
  26. Evaluation II
     • The same evaluation methodology
     • Additional goal
       - analyzing the influence of text scenarios
         - including an example
         - changing the sub-topic in the presentation
         - more general topic (open source)
  27. Evaluation II (charts: relation between user and internal ranking; average of user rankings)
  28. Lessons learned
     • Majority of the best-ranked keywords are in Zemanta's top 5
     • Scenario problems:
       - example: banking for database systems
       - open source: lower ranking
       - dynamic HTML: relation to the general topic
     • Few words per slide
       - slides created for evaluation purposes
  29. All in all...
     • Zemanta fits the intended purpose
     • The 5 highest-ranked keywords can be used
     • TODO:
       - keyword classification schemes?
       - folksonomies?
       - context information extraction and mapping?
  30. Some hard questions...
     • Will the keywords be found in metadata?
     • Do more relevant keywords produce more relevant recommendations?
     • How not to omit the relevant content?
  31. Thanks
     • Ivana.bosnic at fer.hr
