SE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation

Presented at SE@M 2010 - Fourth International Workshop on Search and Exchange of e-le@rning Materials (SE@M’10), 27-28 September 2010, Barcelona, Spain.

Presentation Transcript

  • Automatic Keywords Extraction – A Basis for Content Recommendation Ivana Bosnić Katrien Verbert Erik Duval University of Zagreb, Croatia Katholieke Universiteit Leuven, Belgium
  • Today...
    • General idea and use case
    • Evaluation of services
    • Evaluation in the environment
  • Part 1: General idea and use case
  • Content authoring
    • Lacking inspiration
      • Relevant content you should take a look at
    • Educational context
      • What is this content for?
      • Who is making it?
    • Hard to reuse
      • Referencing
      • Copy-pasting
  • Use case (diagram): the author's current content, its course context, and authoring tools connect to available content in LORs, Wikipedia, blogs, and the LMS; metadata ties the content to this broader context and supports reuse via integration and referencing.
  • Current work…
    • Web application
      • WikiMedia / WikiPres integration
    • Recommending content (flow sketched after this slide)
      • Keywords: Zemanta API
      • Wikipedia, Blogs
      • GLOBE repository (REST interface)
    • Context
      • Presentation slides
    • Reuse
      • Referencing the content
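A minimal sketch of the flow this slide describes: extract keywords from the content being authored, then use them as search terms against a GLOBE-style repository's REST search. The URLs, parameter names, and response fields below are illustrative placeholders, not the project's actual interfaces:

```python
import json
import urllib.parse
import urllib.request

EXTRACTOR_URL = "https://example.org/keyword-extractor"  # placeholder, not a real service
REPOSITORY_URL = "https://example.org/globe/search"      # placeholder, not GLOBE's real endpoint


def extract_keywords(text: str) -> list[str]:
    """POST slide text to a keyword-extraction service, return keywords."""
    data = urllib.parse.urlencode({"text": text}).encode()
    with urllib.request.urlopen(EXTRACTOR_URL, data=data) as resp:
        return json.load(resp)["keywords"]


def recommend(text: str, limit: int = 5) -> list[dict]:
    """Use the top extracted keywords as the search query for the repository."""
    query = " ".join(extract_keywords(text)[:limit])
    url = REPOSITORY_URL + "?" + urllib.parse.urlencode({"q": query})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["results"]
```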
  • Keywords
    • Basis for generating search terms
    • 2 groups of generator services
      • Term extraction
        • Yahoo Term Extraction Web Service, FiveFilters
      • Semantic entity generation
        • Zemanta, OpenCalais, Evri, Alchemy API
  • Keyword extractors – a closer look
    • Yahoo Term Extraction Service
      • Up to 20 keywords found in text
      • No ranking
      • Generates a part of metadata in GLOBE (SAmgI)
    • Zemanta
      • Up to 8 keywords, not necessarily in text
      • Relevance ranking
      • Recommends images, links, blogs/news
      • Emphasising words to influence the extraction (API call sketched below)
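A hedged sketch of a Zemanta "suggest" call as the REST API was documented around 2010; the endpoint, parameters, and response fields are recollections of that era's public API (since discontinued), so treat them as assumptions rather than a working interface:

```python
import json
import urllib.parse
import urllib.request

# Endpoint and parameters as documented for Zemanta's public API circa 2010;
# the service was later discontinued, so these are historical assumptions.
ZEMANTA_URL = "http://api.zemanta.com/services/rest/0.0/"


def zemanta_keywords(text: str, api_key: str) -> list[tuple[str, float]]:
    """Return (keyword, confidence) pairs, highest confidence first."""
    data = urllib.parse.urlencode({
        "method": "zemanta.suggest",
        "api_key": api_key,
        "text": text,
        "format": "json",
    }).encode()
    with urllib.request.urlopen(ZEMANTA_URL, data=data) as resp:
        payload = json.load(resp)
    pairs = [(kw["name"], float(kw["confidence"]))
             for kw in payload.get("keywords", [])]
    return sorted(pairs, key=lambda kc: kc[1], reverse=True)
```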
  • Part 2: Keyword extractors evaluation
  • Goals
    • Testing extraction services with already existing educational content
    • Comparing Zemanta & Y! Term Extraction
    • Learning about user-generated queries
  • Methodology
    • 6 users
    • 9 presentations found through Google
      • Topics: open source, databases, gravity
      • Text extracted from 3 adjacent slides
      • Different content properties (general, specific, full sentences, bullets...)
    • Users creating the queries
    • Users grading the generated keywords
  • Automatically extracted keywords: user grading I (chart: average keyword relevancy grade, per presentation)
  • Automatically extracted keywords: user grading II (chart: average keyword relevancy grade, per topic)
  • Automatically extracted keywords: user grading III (chart: average user grade per Zemanta rank; aggregation sketched below)
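The averaging behind these three charts can be sketched as below, assuming each user grade is stored as a (service, presentation, topic, grade) record; the record layout is an assumption, not the authors' actual data format:

```python
from collections import defaultdict


def average_grade_by(records: list[tuple], key_index: int) -> dict:
    """Average the grade (last field) grouped by one key field, e.g.
    key_index=1 for per-presentation, key_index=2 for per-topic averages."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [grade total, count]
    for rec in records:
        acc = sums[rec[key_index]]
        acc[0] += rec[-1]
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```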
  • User-generated keywords
    • Comparing the differences between user and automatic generation
    • Chosen keywords: common to 2+ users
    • 2 comparisons (both sketched after this slide):
      • Exact match
      • Similar match
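A sketch of the two comparisons, assuming "exact match" means case-insensitive equality; the slides do not define the "similar match" criterion, so the difflib similarity ratio and the 0.8 threshold below are illustrative assumptions:

```python
from difflib import SequenceMatcher


def exact_matches(user_kw: set[str], auto_kw: set[str]) -> set[str]:
    """Exact match: case-insensitive string equality."""
    auto = {a.lower() for a in auto_kw}
    return {u for u in user_kw if u.lower() in auto}


def similar_matches(user_kw: set[str], auto_kw: set[str],
                    threshold: float = 0.8) -> set[str]:
    """Similar match: fuzzy string similarity above a threshold
    (the criterion and cutoff are assumptions, not the authors')."""
    return {u for u in user_kw
            if any(SequenceMatcher(None, u.lower(), a.lower()).ratio()
                   >= threshold for a in auto_kw)}
```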
  • User-generated keywords (chart: matches between user-generated and automatically generated keywords)
  • Lessons learned
    • Zemanta won 
      • not an extensive evaluation, though 
    • Presentations prepared beforehand
      • Additional problems occur on-the-fly
  • Part 3: Evaluation in the environment
  • Goals
    • Analyzing the use of keywords as the basis for recommendations
      • while the content is being authored
    • Evaluating the relevancy of keywords
    • Part of a usability evaluation
  • Methodology
    • 4 users
    • Make a presentation on a programming topic
      • Topics: HTML (x2), databases, XML
    • Rank the 5 best keywords of the 8 proposed
  • Environment
    • Wiki + slides editing -> WikiPres
  • Evaluation I (charts: the relation between user and internal ranking; the average of user rankings; computation sketched below)
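One way to compute the plotted relation, assuming each user's ordered top-5 list and Zemanta's full ranked list are available per presentation; the data layout is an assumption:

```python
def user_vs_internal_rank(user_top5: list[list[str]],
                          zemanta_ranked: list[list[str]]) -> list[float]:
    """For each user-rank position 1..5, average the (1-based) Zemanta rank
    of the keyword users placed there. user_top5[i] and zemanta_ranked[i]
    refer to the same presentation; users choose only among the proposed
    keywords, so every chosen keyword appears in the ranked list."""
    position_sums = [0.0] * 5
    for chosen, ranked in zip(user_top5, zemanta_ranked):
        for pos, keyword in enumerate(chosen[:5]):
            position_sums[pos] += ranked.index(keyword) + 1
    n = len(user_top5)
    return [s / n for s in position_sums]
```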
  • Evaluation I - problems
    • Content cold start
    • Semantic relation of words
    • Unnecessary text markup
    • Ambiguity
  • Evaluation I - changes (sketched after this slide)
    • Including the content from previous slides
    • Slide title emphasis
    • Text cleaning
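A minimal sketch of these three changes, assuming the extractor takes a single text string: prepend the previous slides' text, repeat the title so the extractor weighs it more heavily, and strip markup. The regexes and the title-repetition trick are illustrative assumptions, not the authors' actual implementation:

```python
import re


def build_extraction_input(title: str, body: str,
                           previous_slides: list[str],
                           title_weight: int = 3) -> str:
    context = " ".join(previous_slides)          # content from previous slides
    emphasis = (title + " ") * title_weight      # slide-title emphasis (assumed trick)
    text = f"{context} {emphasis}{body}"
    text = re.sub(r"<[^>]+>", " ", text)         # drop HTML tags
    text = re.sub(r"[='\[\]{}|*#]+", " ", text)  # drop common wiki markup
    return re.sub(r"\s+", " ", text).strip()     # text cleaning / normalize whitespace
```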
  • Evaluation II
    • The same evaluation methodology
    • Additional goal
      • analyzing the influence of text scenarios
        • including an example
        • changing the sub-topic in the presentation
        • more general topic (open source)
  • Evaluation II (charts: the relation between user and internal ranking; the average of user rankings)
  • Lessons learned
    • Majority of best-ranked keywords in Zemanta TOP 5
    • Scenarios’ problems:
      • example: banking for database systems
      • open source: lower ranking
      • dynamic HTML: relation to general topic
    • Few words per slide
      • slides created for evaluation purposes
  • All in all...
    • Zemanta fits the intended purpose
    • 5 highest ranked keywords can be used
    • TODO:
      • keyword classification schemes?
      • folksonomies?
      • context information extraction and mapping?
  • Some hard questions...
    • Will the keywords be found in metadata?
    • Do more relevant keywords produce more relevant recommendations?
    • How can we avoid omitting relevant content?
  • Thanks 
    • Ivana . bosnic at fer . hr