Machine Learning for Knowledge Extraction from Wikipedia & Other Semantically Weak Sources - OSCON 2008

Uploaded on Jul 21, 2009

Wikipedia contains a wealth of collective knowledge, but its semi-structured design and idiosyncratic markup make mining this resource a formidable challenge. This session will examine techniques for mining semantically weak data sources for explicit facts.

The session will use WEX, a preprocessed normalization of Wikipedia designed to make this corpus easily accessible to developers interested in machine learning, natural language processing, or knowledge extraction. The process through which WEX is prepared will be discussed as a guide to creating mineable structures from semi-structured data, followed by approaches to machine extraction on structures of mixed data quality.

The session is targeted at intermediate developers with an interest in machine learning or knowledge extraction; no prior experience with either is assumed.

The demonstrations use Postgres 8.3’s XPath support to simplify the programming model and present examples in Python, but the data and principles carry over to any modern data infrastructure.
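
As a rough illustration of the XPath-style extraction described above, here is a minimal Python sketch. It is not taken from the session: the XML fragment, its element names (`article`, `template`, `param`, `link`), and the helper functions are all assumptions made up for this example, and it uses the standard library's `xml.etree.ElementTree` rather than a database so that it runs on its own. Inside Postgres 8.3 or later, a comparable query would go through the built-in `xpath()` function against an `xml` column.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a normalized article. The element and attribute
# names are invented for illustration and are not the actual WEX schema.
ARTICLE_XML = """
<article>
  <title>Ada Lovelace</title>
  <template name="Infobox person">
    <param name="birth_date">1815-12-10</param>
    <param name="occupation">Mathematician</param>
  </template>
  <paragraph>Ada Lovelace was an English
    <link target="Mathematician">mathematician</link>.</paragraph>
</article>
"""

# In Postgres the same idea might look like the following, where the table
# and column names are assumptions:
#   SELECT xpath('//template/param/text()', xml_column) FROM some_table;


def infobox_params(xml_text):
    """Return the name/value pairs of every template parameter as a dict."""
    root = ET.fromstring(xml_text)
    params = {}
    # ElementTree supports a limited XPath subset; this path selects the
    # <param> children of top-level <template> elements.
    for param in root.findall("./template/param"):
        params[param.get("name")] = (param.text or "").strip()
    return params


def link_targets(xml_text):
    """Return the target of every <link> element anywhere in the article."""
    root = ET.fromstring(xml_text)
    return [link.get("target") for link in root.findall(".//link")]


if __name__ == "__main__":
    print(infobox_params(ARTICLE_XML))
    print(link_targets(ARTICLE_XML))
```

Template parameters yield explicit attribute/value facts, while link targets only hint at relations, which is one way to picture the "mixed data quality" the abstract mentions.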

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved


Presentation Transcript