Acknowledge 07: Automated Retrieval and Categorization of Texts in an E-Learning Environment: The RIKS Demonstrator

Transcript

  • 1. Automated Information Retrieval and Text Categorization: The RIKS Demonstrator
    Acknowledge final event, November 25, 2008
    Marie-Francine Moens, Erik Boiy, Javier Arias (HMDB-LIIR); Saskia Debergh (i.Know); Philippe De Lombaerde, Birger Fühne (UNU-CRIS)
    Overview
      • UNU-CRIS: The RIKS Demonstrator
      • K.U.Leuven:
          – Content extraction from multilingual Web pages
          – Text categorization: machine learning approach
          – Search engine and indexing infrastructure
          – Interfacing the Acknowledge platform
      • i.Know:
          – Information forensics
    Acknowledge 25-11-2008
  • 2. The RIKS Demonstrator
      • United Nations University – Comparative Regional Integration Studies (UNU-CRIS)
      • Issues addressed in research and capacity building:
          – (i) emergence of a regional (= supra-national) governance level
          – (ii) linkages with other governance levels (national, global/UN)
          – (iii) building of regional institutions
          – (iv) growing regional interdependence, etc.
      • RIKS = Regional Integration Knowledge System (UNU-CRIS and GARNET NoE)
  • 3. The RIKS Demonstrator
    Issues addressed in the demonstrator: how to automate the retrieval and processing (cleaning, search, categorization, presentation) of particular types of relevant information in an e-learning environment?
      – ‘News’: short texts, various formats, dynamic collection, short life cycle, role of news in an e-learning application
      – ‘Documentation’: heterogeneous texts (scientific articles, theses, essays, ...), rather static collection
      – Treaty texts: long and complex texts, static collection, issue of accessibility
    RIKS example output
  • 4. Demo
    K.U.Leuven: Content extraction from multilingual Web pages
      • Content extraction = extracting the main content from a Web page and removing extraneous data (navigation menus, advertisements, etc.)
      • Requirements of the tool:
          – Accurate
          – Generic
          – Multilingual
          – Fast
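The content extraction task described on this slide can be illustrated with a small sketch. This is not the K.U.Leuven tool (the slides do not describe its algorithm); it is a generic link-density heuristic, chosen here because it uses no language-specific resources. The tag sets, the thresholds `min_chars` and `max_link_density`, and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript"}           # never visible content
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3",
              "article", "section", "body"}           # assumed block boundaries


def visible_len(text):
    """Number of non-whitespace characters in text."""
    return len("".join(text.split()))


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and records how much of each is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []          # list of (text, linked_chars)
        self._text = []
        self._linked = 0
        self._skip_depth = 0
        self._link_depth = 0

    def flush_block(self):
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._linked))
        self._text, self._linked = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._text.append(data)
        if self._link_depth:
            self._linked += visible_len(data)


def extract_main_content(html, min_chars=40, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush_block()
    kept = []
    for text, linked in parser.blocks:
        density = linked / visible_len(text)
        if len(text) >= min_chars and density <= max_link_density:
            kept.append(text)
    return "\n".join(kept)


# Invented sample page: a navigation bar, one article paragraph, a script, an ad.
SAMPLE_HTML = """<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a> <a href="/about">About us</a></div>
<p>The regional trade agreement was signed by twelve member states after two years of negotiations.</p>
<script>var x = 1;</script>
<div class="ad"><a href="http://ads.example">Buy now</a></div>
</body></html>"""

if __name__ == "__main__":
    print(extract_main_content(SAMPLE_HTML))
```

The heuristic relies only on markup and text length, which is one plausible way to meet the "generic" and "multilingual" requirements; a production tool would need far more robust block detection.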
  • 5. [Arias et al. submitted]
  • 6. [Arias et al. submitted]; [5] = [Gottron 2008]
    K.U.Leuven: Text categorization
      • Heterogeneous documentation and Google News classified into 27 categories (e.g., trade, poverty, ...)
      • Supervised classifiers: Multinomial Naïve Bayes, Support Vector Machine, ...
      • Features:
          – unigrams, bigrams, feature item sets, ...
      • Additional feature selection:
          – Chi Square, Information Gain, Linear Classifier Weights, Orthogonal Centroid Feature Selection
      • Different test setups
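As an illustration of one of the classifiers named on this slide, the following is a minimal sketch of a Multinomial Naïve Bayes text classifier over unigram features with Laplace smoothing. It is a textbook formulation, not the RIKS implementation; the training sentences are invented, and only the category names "trade" and "poverty" come from the slide.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Unigram features: lower-cased whitespace tokens."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.n_docs = len(docs)
        self.class_docs = Counter(labels)            # documents per class
        self.term_counts = defaultdict(Counter)      # term counts per class
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(doc))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def log_scores(self, doc):
        """Unnormalised log posterior for each class."""
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_docs.items():
            total = sum(self.term_counts[label].values())
            score = math.log(n / self.n_docs)        # class prior
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue                          # ignore unseen terms
                count = self.term_counts[label][tok]
                score += math.log((count + self.alpha)
                                  / (total + self.alpha * vocab_size))
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy training set; the real system used 27 categories.
TRAIN_DOCS = [
    "tariff reduction boosts regional trade",
    "new trade agreement lowers export tariffs",
    "rural poverty rates remain high",
    "aid programme targets poverty and hunger",
]
TRAIN_LABELS = ["trade", "trade", "poverty", "poverty"]

if __name__ == "__main__":
    model = MultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS)
    print(model.predict("export tariffs and trade"))
    print(model.predict("hunger and poverty rates"))
```

Bigrams or a feature selection step such as Chi Square could be added by changing `tokenize` or by filtering `vocab` before scoring; both are left out to keep the sketch short.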
  • 7. K.U.Leuven: Text categorization
    RIKS K.U.Leuven: search engine
  • 8. Demo
  • 9. Knowing that you do not know what you should know (Dutch original: "Weten dat je niet weet wat je zou moeten weten")
    1. Information Forensics - Smart Indexing
      – more than just an index
      – distinguishes between concepts and relations
      – starts from unstructured text (bottom-up instead of top-down)
      – recognises word groups as meaningful units
    Diagram, layers from top to bottom:
      – Top-down: knowledge, keywords, text
      – Bottom-up: knowledge, concepts and relations, text
    Traditional indexing (keywords)
    Example sentence: De Fortis Bank werd overgenomen door BNP Paribas. (English: Fortis Bank was taken over by BNP Paribas.)
    Processing steps shown: stopwords, stemming, calculation, correlation
    Keyword Index:
      Fortis         0.23
      Bank           0.38
      werd           0.08
      overgenomen    0.21
      door           0.12
      BNP            0.34
      Paribas        0.27
    © i.Know NV - All rights reserved.
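The traditional keyword pipeline on this slide (stopword handling, then a weight calculation) can be sketched roughly as follows. The slide does not say how its weights were computed, so this sketch swaps in a smoothed TF-IDF weighting as an assumption and will not reproduce the slide's numbers. The stopword list and the second sentence are invented for illustration, and stemming is omitted.

```python
import math
from collections import Counter

# Hypothetical Dutch stopword list (the slide does not give one).
STOPWORDS = {"de", "werd", "door"}
PUNCTUATION = ".,;:!?"


def tokenize(text):
    """Lower-cased tokens with surrounding punctuation removed."""
    tokens = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in tokens if t]


def keyword_index(docs):
    """Return one {keyword: weight} dict per document (smoothed TF-IDF)."""
    tokenized = [[t for t in tokenize(d) if t not in STOPWORDS] for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n = len(docs)
    index = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights = {}
        for term, count in term_freq.items():
            idf = math.log((1 + n) / (1 + doc_freq[term])) + 1
            weights[term] = (count / len(tokens)) * idf
        index.append(weights)
    return index


DOCS = [
    # Example sentence from the slide: "Fortis Bank was taken over by BNP Paribas."
    "De Fortis Bank werd overgenomen door BNP Paribas.",
    # Invented second document: "The bank was closed."
    "De bank werd gesloten.",
]

if __name__ == "__main__":
    for term, weight in sorted(keyword_index(DOCS)[0].items()):
        print(f"{term:12s} {weight:.2f}")
```

Note that every keyword is treated as an isolated token: "Fortis" and "Bank" are indexed separately, which is the limitation the Smart Indexing slides contrast against.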
  • 10. 1. Information Forensics - Smart Indexing
    Smart Indexing (concepts and relations)
    Example sentence: De Fortis Bank werd overgenomen door BNP Paribas. (English: Fortis Bank was taken over by BNP Paribas.)
    Processing steps shown: relation detection, concept detection
    Smart Index:
      Concept     Fortis Bank
      Relation    werd overgenomen door
      Concept     BNP Paribas
    2. Categorisation based on Smart Indexing
    Preconditions:
      – pre-defined taxonomy/ontology
      – top-down processing
    Advantages of Smart Indexing:
      – Smart Indexing results can be used to fill and enrich the taxonomy, thus ensuring the entries are relevant, precise and complete
  • 11. 2. Categorisation
    Input: The Agreement will be applied with the European Union and with the EFTA states.
    Smart Indexing (concepts and relations): The Agreement will be applied with the European Union and with the EFTA states.
    Categorisation: EU, EFTA
    RIKS i.Know: news categorization
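The two Smart Indexing examples (splitting a sentence into concepts and relations, then mapping concepts to taxonomy categories) can be mimicked with a toy sketch. The slides do not explain how i.Know detects relations or concepts, so everything here is an assumption: the hand-written `RELATIONS` list, the `ARTICLES` set, the `TAXONOMY` mapping and the function names are hypothetical stand-ins, not the i.Know method.

```python
import re

# Hypothetical, hand-written resources for illustration only.
RELATIONS = ["werd overgenomen door", "will be applied with", "and with"]
ARTICLES = {"de", "het", "een", "the", "a", "an"}
TAXONOMY = {"european union": "EU", "efta": "EFTA"}


def strip_article(phrase):
    """Drop a leading article so 'De Fortis Bank' becomes 'Fortis Bank'."""
    words = phrase.split()
    if len(words) > 1 and words[0].lower() in ARTICLES:
        words = words[1:]
    return " ".join(words)


def smart_index(sentence, relations=RELATIONS):
    """Split a sentence into alternating concept and relation word groups."""
    # Longest relation phrases first so they win over shorter overlaps.
    alternatives = "|".join(
        re.escape(r) for r in sorted(relations, key=len, reverse=True)
    )
    pattern = re.compile(f"\\b({alternatives})\\b", re.IGNORECASE)
    parts = pattern.split(sentence.rstrip(". "))
    index = []
    for i, part in enumerate(parts):
        part = part.strip()
        if not part:
            continue
        if i % 2:                       # captured groups sit at odd positions
            index.append(("relation", part))
        else:
            index.append(("concept", strip_article(part)))
    return index


def categorise(index, taxonomy=TAXONOMY):
    """Map detected concepts onto categories of a pre-defined taxonomy."""
    categories = []
    for kind, text in index:
        if kind != "concept":
            continue
        for term, category in taxonomy.items():
            if term in text.lower() and category not in categories:
                categories.append(category)
    return categories


if __name__ == "__main__":
    for item in smart_index("De Fortis Bank werd overgenomen door BNP Paribas."):
        print(item)
    treaty = ("The Agreement will be applied with the European Union "
              "and with the EFTA states.")
    print(categorise(smart_index(treaty)))
```

Unlike the keyword index, word groups such as "Fortis Bank" and "European Union" stay together as single units, which is the property the slides emphasise.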
  • 12. RIKS i.Know: news categorization
  • 13. Demo
  • 14. Thank you
