Arcomem training heritrix_beginner

This presentation on using the Heritrix crawler is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.

Published in: Technology

Transcript

  • 1. Adaptive Heritrix
      ATHENA – Research and Innovation Center in Information, Communication and Knowledge Technologies
  • 2. ARCOMEM Requirements for crawling
      • ARCOMEM aims to guide crawling based on
          – Advanced semantic link extraction
          – Use of social media
          – Analysis of crawled content in a large-scale distributed environment
      • These aims require a crawler to
          – Adaptively update priorities
          – Operate as a service
  • 3. Adaptive Prioritization
      • New Heritrix frontier class
          – Plug & Play with open source Heritrix
          – Minimal configuration
      • Adding forward index for URLs
          – Locates a link already scheduled for crawling
      • Moves scheduled link to the place corresponding to the updated priority
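
The forward-index idea on slide 3 can be illustrated with a small, self-contained sketch. This is not the Heritrix frontier code (Heritrix is written in Java, and the slides do not show its implementation). The class name `AdaptiveFrontier`, its methods, and the convention that a lower number means a higher priority are all assumptions made for illustration. The sketch keeps a sorted queue of scheduled URLs plus a dictionary (the forward index) that locates an already scheduled URL, so that the URL can be moved to the position matching its updated priority.

```python
import bisect
import itertools


class AdaptiveFrontier:
    """Toy frontier (hypothetical, not the Heritrix API).

    Holds a sorted queue of scheduled URLs and a forward index
    that maps each URL to its current queue entry.
    """

    def __init__(self):
        self._queue = []               # sorted list of (priority, seq, url)
        self._index = {}               # forward index: url -> queue entry
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority

    def schedule(self, url, priority):
        """Schedule a URL, or move it if it is already scheduled.

        Assumption: a lower number means the URL is crawled earlier.
        """
        old = self._index.get(url)
        if old is not None:
            if old[0] == priority:
                return  # priority unchanged, nothing to move
            # The forward index gives the exact entry, so its position
            # in the sorted queue can be found by binary search.
            pos = bisect.bisect_left(self._queue, old)
            del self._queue[pos]
        entry = (priority, next(self._seq), url)
        bisect.insort(self._queue, entry)  # insert at the new position
        self._index[url] = entry

    def next_url(self):
        """Return the URL to crawl next, or None if nothing is scheduled."""
        if not self._queue:
            return None
        entry = self._queue.pop(0)
        del self._index[entry[2]]
        return entry[2]

    def __len__(self):
        return len(self._queue)


if __name__ == "__main__":
    frontier = AdaptiveFrontier()
    frontier.schedule("http://a.example/", 5)
    frontier.schedule("http://b.example/", 3)
    # Later analysis decides that a.example is more relevant.
    frontier.schedule("http://a.example/", 1)
    print(frontier.next_url())  # http://a.example/
```

A sorted Python list keeps the sketch short, but deleting and inserting entries costs linear time; a production frontier holding millions of URLs would more plausibly use a disk-backed or tree-based structure.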
