Vector spaces for information extraction - Random Projection Example

A very short talk given at the UCD latent semantic workshop

Published in: Education, Technology

  1. Vector Spaces for Information Extraction
     Behrang Q. Zadeh (behrangatoffice@gmail.com)
     Knowledge Discovery Unit @ Insight Centre @ National University of Ireland, Galway
     Insight Workshop on Latent Space Methods, Dublin, UCD, 2014
  2. Vector Spaces in Information Extraction
     [Figure labels: entities to be extracted or compared; contexts that are used for comparison]
     Vector spaces in IE are:
       • a representation framework for the Distributional Hypothesis (not exclusively);
       • sparse;
       • large (on the order of millions by millions);
       • dynamically changing.
  3. Vector Spaces in Information Extraction
     [Figure labels: entities to be extracted or compared; contexts that are used for comparison]
       • In classic methods, the dimension of the VSM grows as the data grows.
       • Dimension reduction techniques based on matrix factorization may not be applicable:
         even iterative methods still have a complexity of O(n^2).
  4. Vector Spaces in Information Extraction
       • Random Projection is one solution:
           • estimate a VSM using a random projection matrix that is made of a set of randomly created vectors;
           • i.e. based on the Johnson-Lindenstrauss lemma;
           • verified by the results reported in (Hecht-Nielsen, 1994).
     * The figure on this slide is copyrighted by Alex Clemmer (http://nullspace.io/)
  5. Vector Spaces in Information Extraction
       • Random Projection: application example
           • Extraction of technology terms (term classification)
           • Data size: only 10,000 publications
           • Contexts: words and their position in the neighbourhood of terms
           • Original dimension: approximately 5 million
           • Reducing the dimension to 2000 using Random Projection
     Behrang's research revolves around classification and finding the optimal contexts in random vector spaces for the extraction of technology terms and their relations. If you are interested, please email him at behrangatoffice@gmail.com
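
The random projection step described in slides 4 and 5 can be illustrated with a minimal sketch. It is not the author's implementation. The slides do not say which kind of random matrix was used, so the Gaussian matrix here is an assumption, as are the toy data and all names (`random_projection_matrix`, `pairwise_distances`). The sizes are scaled down (10,000 original dimensions reduced to 500, rather than 5 million reduced to 2000) so the example runs quickly in memory. It projects sparse count vectors and compares pairwise Euclidean distances before and after projection, which is the property the Johnson-Lindenstrauss lemma is about.

```python
import numpy as np


def random_projection_matrix(n_original, n_reduced, rng):
    """Gaussian random matrix (an assumed choice), scaled by 1/sqrt(n_reduced)
    so that squared vector lengths are preserved in expectation."""
    return rng.standard_normal((n_original, n_reduced)) / np.sqrt(n_reduced)


def pairwise_distances(X):
    """Euclidean distances between all pairs of rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


rng = np.random.default_rng(0)
n_entities, n_original, n_reduced = 20, 10_000, 500  # toy sizes (assumption)

# Toy sparse VSM: each entity co-occurs with 200 of the 10,000 contexts.
X = np.zeros((n_entities, n_original))
for i in range(n_entities):
    idx = rng.choice(n_original, size=200, replace=False)
    X[i, idx] = rng.integers(1, 10, size=200)

# Project every entity vector into the reduced space.
R = random_projection_matrix(n_original, n_reduced, rng)
X_reduced = X @ R

# Ratio of reduced-space to original-space distance for every distinct pair.
off_diagonal = ~np.eye(n_entities, dtype=bool)
ratios = (pairwise_distances(X_reduced)[off_diagonal]
          / pairwise_distances(X)[off_diagonal])
print("distance ratio range:", ratios.min(), ratios.max())
```

Ratios close to 1 indicate that distances between entities are roughly preserved in the reduced space. Because `R` does not depend on the data, newly observed contexts can be handled by adding rows to `R` rather than refactorizing the whole matrix, which suits the dynamically changing spaces mentioned in slide 2.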
