SE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation
 

Presented at SE@M 2010 - Fourth International Workshop on Search and Exchange of e-le@rning Materials (SE@M’10). 27-28 September 2010, Barcelona, Spain


Presentation Transcript

    • Automatic Keywords Extraction – A Basis for Content Recommendation. Ivana Bosnić (University of Zagreb, Croatia); Katrien Verbert, Erik Duval (Katholieke Universiteit Leuven, Belgium)
    • Today...
      • General idea and use case
      • Evaluation of services
      • Evaluation in the environment
    • 1. General idea and use case
    • Content authoring
      • Lacking inspiration
        • Relevant content you should take a look at
      • Educational context
        • What is this content for?
        • Who is making it?
      • Hard to reuse
        • Referencing
        • Copy-pasting
    • Use case [diagram: the available content sits in a broader context of LORs, Wikipedia, blogs, and the LMS; metadata enables content reuse; the course context, author, and current content are linked by tools for integration and referencing]
    • Current work…
      • Web application
        • WikiMedia / WikiPres integration
      • Recommending content
        • Keywords: Zemanta API
        • Wikipedia, Blogs
        • GLOBE repository (REST interface)
      • Context
        • Presentation slides
      • Reuse
        • Referencing the content (see the pipeline sketch below)
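
Taken together, this slide describes a pipeline: extract keywords from the slide being authored, turn them into a search query, and fetch candidate content from the GLOBE repository over its REST interface. The sketch below is a minimal illustration of that flow; the GLOBE endpoint URL and its query parameters are assumptions, and extract_keywords is the Zemanta wrapper sketched after the next slide.

```python
import requests

# Hypothetical endpoint: the real GLOBE REST interface is not specified in the slides.
GLOBE_SEARCH_URL = "http://globe.example.org/search"

def recommend(slide_text, max_keywords=5):
    """Recommend learning objects for the slide currently being authored:
    extracted keywords become the repository search query."""
    keywords = extract_keywords(slide_text)[:max_keywords]
    query = " ".join(name for name, _confidence in keywords)
    resp = requests.get(GLOBE_SEARCH_URL, params={"q": query})
    resp.raise_for_status()
    return resp.json()  # candidate content to reference from the presentation
```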
    • Keywords
      • Basis for generating search terms
      • 2 groups of generator services
        • Term extraction
          • Yahoo Term Extraction Web services, Fivefilters
        • Semantic entity generation
          • Zemanta, OpenCalais, Evri, AlchemyAPI (a call sketch follows below)
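
As an illustration of the second group, here is a minimal sketch of a Zemanta call. The endpoint, the parameter names (method=zemanta.suggest, api_key, text, format), and the keywords/confidence response fields reflect the 2010-era API as remembered here and should all be treated as assumptions.

```python
import requests

ZEMANTA_URL = "http://api.zemanta.com/services/rest/0.0/"  # 2010-era endpoint (assumed)

def extract_keywords(text, api_key="YOUR_API_KEY"):
    """Return (keyword, confidence) pairs, best first.
    Parameter and response field names are assumptions."""
    resp = requests.post(ZEMANTA_URL, data={
        "method": "zemanta.suggest",  # assumed method name
        "api_key": api_key,
        "text": text,
        "format": "json",
    })
    resp.raise_for_status()
    data = resp.json()
    pairs = [(kw["name"], kw["confidence"]) for kw in data.get("keywords", [])]
    # Zemanta provides relevance ranking; sort by confidence, highest first.
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```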
    • Keyword extractors – a closer look
      • Yahoo Term Extraction Service
        • Up to 20 keywords found in text
        • No ranking
        • Generates a part of metadata in GLOBE (SAmgI)
      • Zemanta
        • Up to 8 keywords, not necessarily in text
        • Relevance ranking
        • Recommends images, links, blogs/news
        • Emphasising words and influencing the extraction (see the normalisation sketch below)
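
Because the two services differ in output shape (Yahoo: up to 20 in-text terms, unranked; Zemanta: up to 8 confidence-scored terms that need not appear in the text), a small hypothetical helper can bring both to a common form for the grading study that follows:

```python
def normalize(service, raw):
    """Map either service's output to (keyword, rank) pairs.
    Yahoo terms carry no ranking, so rank is None; Zemanta keywords
    get their position in the confidence-sorted list."""
    if service == "yahoo":
        return [(term.lower(), None) for term in raw[:20]]
    return [(name.lower(), i + 1) for i, (name, _conf) in enumerate(raw[:8])]
```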
    • 2. Keyword extractors evaluation
    • Goals
      • Testing extraction services on existing educational content
      • Comparing Zemanta & Y! Term Extraction
      • Learning about user-generated queries
    • Methodology
      • 6 users
      • 9 presentations found through Google
        • Topics: open source, databases, gravity
        • Text extracted from 3 adjacent slides
        • Different content properties (general, specific, full sentences, bullets...)
      • Users creating the queries
      • Users grading the generated keywords (aggregation sketched below)
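
A minimal sketch of how the collected grades could be aggregated into the per-presentation and per-topic averages shown in the next charts; the tuple layout is an assumption about how the gradings were recorded.

```python
from collections import defaultdict
from statistics import mean

def average_grades(gradings):
    """gradings: (presentation, topic, keyword, grade) tuples from the
    six evaluators. Returns per-presentation and per-topic averages."""
    by_presentation = defaultdict(list)
    by_topic = defaultdict(list)
    for presentation, topic, _keyword, grade in gradings:
        by_presentation[presentation].append(grade)
        by_topic[topic].append(grade)
    return ({p: mean(g) for p, g in by_presentation.items()},
            {t: mean(g) for t, g in by_topic.items()})
```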
    • Automatically extracted keywords: user grading I [chart: average keyword relevancy grade, per presentation]
    • Automatically extracted keywords: user grading II [chart: average keyword relevancy grade, per topic]
    • Automatically extracted keywords: user grading III [chart: average user grade per Zemanta rank]
    • User-generated keywords
      • Comparing the differences between user and automatic generation
      • Chosen keywords: common to 2+ users
      • 2 comparisons:
        • Exact match
        • Similar match (see the matching sketch below)
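
The slides do not define how a "similar match" was computed; one plausible reading, using string similarity with an arbitrary 0.8 threshold, is sketched below. Both the measure and the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def match_keywords(user_kws, auto_kws, threshold=0.8):
    """Split user-chosen keywords into exact and similar matches
    against the automatically generated set."""
    user = {kw.lower() for kw in user_kws}
    auto = {kw.lower() for kw in auto_kws}
    exact = user & auto
    similar = {u for u in user - exact
               if any(SequenceMatcher(None, u, a).ratio() >= threshold
                      for a in auto)}
    return exact, similar
```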
    • User-generated keywords [chart: matches between user- and automatically generated keywords]
    • Lessons learned
      • Zemanta won 
        • not an extensive evaluation, though 
      • Presentations prepared beforehand
        • Additional problems occur on-the-fly
    • 3. Evaluation in the environment
    • Goals
      • Analyzing the use of keywords as the basis for recommendations
        • while the content is being authored
      • Evaluating the relevancy of keywords
      • Part of a usability evaluation
    • Methodology
      • 4 users
      • Make a presentation on a programming topic
        • Topics: HTML (x2), databases, XML
      • Rank the 5 best keywords of the 8 proposed (rank-comparison sketch below)
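
The evaluation charts relate each user's ranking to Zemanta's internal ranking. A sketch of that pairing follows; the function name and data layout are assumptions.

```python
def rank_pairs(user_top5, zemanta_order):
    """user_top5: the five keywords a user picked, best first.
    zemanta_order: all eight proposed keywords in Zemanta's internal
    order. Returns (user_rank, internal_rank) pairs for plotting.
    Users pick from the proposed eight, so every keyword is present."""
    internal = {kw: i + 1 for i, kw in enumerate(zemanta_order)}
    return [(i + 1, internal[kw]) for i, kw in enumerate(user_top5)]
```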
    • Environment
      • Wiki + slides editing -> WikiPres
    • Evaluation I [charts: relation between user and internal ranking; average of user rankings]
    • Evaluation I - problems
      • Content cold start
      • Semantic relation of words
      • Unnecessary text markup
      • Ambiguity
    • Evaluation I - changes
      • Including the content from previous slides
      • Slide title emphasis
      • Text cleaning (all three changes sketched below)
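
A sketch combining the three changes: earlier slides are prepended to fight the content cold start, the slide title is repeated to emphasise it for the extractor, and wiki/HTML markup is stripped. The slide data structure and the repetition factor are assumptions.

```python
import re

def prepare_text(slides, current, title_weight=3):
    """slides: list of {'title': ..., 'body': ...} dicts; current: index
    of the slide being authored. Returns cleaned text for extraction."""
    context = " ".join(s["body"] for s in slides[:current])   # previous slides
    title = (slides[current]["title"] + " ") * title_weight   # title emphasis
    raw = " ".join([context, title, slides[current]["body"]])
    return re.sub(r"<[^>]+>|[*_=#{}\[\]|]", " ", raw)         # markup cleaning
```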
    • Evaluation II
      • The same evaluation methodology
      • Additional goal
        • analyzing the influence of text scenarios
          • including an example
          • changing the sub-topic in the presentation
          • more general topic (open source)
    • Evaluation II [charts: relation between user and internal ranking; average of user rankings]
    • Lessons learned
      • Majority of best-ranked keywords in Zemanta TOP 5
      • Scenarios’ problems:
        • example: banking for database systems
        • open source: lower ranking
        • dynamic HTML: relation to general topic
      • Few words per slide
        • slides created for evaluation purposes
    • All in all...
      • Zemanta fits the intended purpose
      • 5 highest ranked keywords can be used
      • TODO:
        • keyword classification schemes?
        • folksonomies?
        • context information extraction and mapping?
    • Some hard questions...
      • Will the keywords be found in metadata?
      • Do more relevant keywords produce more relevant recommendations?
      • How not to omit the relevant content?
    • Thanks 
      • Ivana . bosnic at fer . hr