
Arcomem training heritrix_beginner

This presentation on using the Heritrix crawler is part of the ARCOMEM training curriculum. Feel free to roam around, or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving social media.

  1. Adaptive Heritrix
     ATHENA – Research and Innovation Center in Information, Communication and Knowledge Technologies
  2. ARCOMEM Requirements for crawling
     • ARCOMEM aims to guide crawling based on:
       – Advanced semantic link extraction
       – Use of social media
       – Analysis of crawled content in a large-scale distributed environment
     • These aims require a crawler to:
       – Adaptively update priorities
       – Operate as a service
  3. Adaptive Prioritization
     • New Heritrix frontier class
       – Plug & play with open-source Heritrix
       – Minimal configuration
     • Adds a forward index for URLs
       – Locates a link already scheduled for crawling
     • Moves the scheduled link to the position corresponding to its updated priority
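The forward-index idea on the Adaptive Prioritization slide can be illustrated with a small in-memory sketch. This is not the ARCOMEM frontier class or any Heritrix API: Heritrix's own frontier is considerably more involved (for example, it spreads URIs across per-host work queues), whereas this sketch assumes a single queue held in memory. `AdaptiveFrontier`, `schedule`, `reprioritize` and `next_url` are hypothetical names, and the numeric score (higher means crawl sooner) is an assumed stand-in for whatever priority the ARCOMEM analysis would produce. The forward index is a dictionary from URL to its slot in a binary heap, so a link that is already scheduled is found in constant time and then sifted up or down to the slot matching its new priority.

```python
from typing import Dict, List, Optional, Tuple


class AdaptiveFrontier:
    """Hypothetical toy frontier (not the ARCOMEM or Heritrix API):
    a binary max-heap of (score, url) plus a forward index url -> heap slot."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []  # higher score = crawl sooner
        self._pos: Dict[str, int] = {}  # forward index: url -> index in _heap

    def __len__(self) -> int:
        return len(self._heap)

    def __contains__(self, url: str) -> bool:
        return url in self._pos

    def schedule(self, url: str, score: float) -> None:
        """Add a URL, or update its priority if it is already scheduled."""
        if url in self._pos:
            self.reprioritize(url, score)
            return
        self._heap.append((score, url))
        self._pos[url] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def reprioritize(self, url: str, new_score: float) -> None:
        """Locate a scheduled URL via the forward index in O(1), then move it
        to the slot matching its new score in O(log n)."""
        i = self._pos[url]  # KeyError if the URL is not scheduled
        old_score, _ = self._heap[i]
        self._heap[i] = (new_score, url)
        if new_score > old_score:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def next_url(self) -> Optional[str]:
        """Remove and return the highest-scoring URL, or None if empty."""
        if not self._heap:
            return None
        self._swap(0, len(self._heap) - 1)  # move the top entry to the end
        _, url = self._heap.pop()
        del self._pos[url]
        if self._heap:
            self._sift_down(0)
        return url

    def _swap(self, i: int, j: int) -> None:
        # Swap two heap slots and keep the forward index in sync.
        self._heap[i], self._heap[j] = self._heap[j], self._heap[i]
        self._pos[self._heap[i][1]] = i
        self._pos[self._heap[j][1]] = j

    def _sift_up(self, i: int) -> None:
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i][0] <= self._heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i: int) -> None:
        n = len(self._heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._heap[child][0] > self._heap[largest][0]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest


if __name__ == "__main__":
    frontier = AdaptiveFrontier()
    frontier.schedule("http://example.org/a", 0.2)
    frontier.schedule("http://example.org/b", 0.5)
    frontier.schedule("http://example.org/c", 0.1)
    # New evidence (e.g. a social-media mention) raises the relevance of /c.
    frontier.reprioritize("http://example.org/c", 0.9)
    print([frontier.next_url() for _ in range(3)])
    # prints ['http://example.org/c', 'http://example.org/b', 'http://example.org/a']
```

A simpler alternative is Python's `heapq` with lazy deletion (mark the old entry stale and push a new one), but that leaves stale entries in the queue instead of moving the link, so the indexed heap is closer to what the slide describes.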