Mining Wikipedia For Awesome Data
by Neil Crosby
- 27,799 views
Accessibility
Categories
Upload Details
Uploaded via SlideShare as Adobe PDF
Usage Rights
© All Rights Reserved
Statistics
- Likes
- 12
- Downloads
- 412
- Comments
- 3
- Embed Views
- Views on SlideShare
- 27,325
- Total Views
- 27,799
What I did to get html content was to load directly the Wiki page and then to scrape for body content with xPath. That worked really fine and I could get the html content in just 1 call.
Issues: if the body id will change, the scraping business logic will have to be modified.
Thanks for sharing,
Enrico Foschi 4 years ago