This document summarizes Sawood Alam's doctoral research on developing efficient techniques for profiling web archives to improve memento aggregation. Some key points:
- It proposes using concise archive profiles to predict what content archives have in order to minimize traffic when routing memento requests.
- It evaluates profiling policies and strategies, such as using CDX files, full-text search, and URI sampling, to balance accuracy against resource costs.
- The results show that profiling policies can reach up to 80% routing accuracy at less than 1% of the cost of a complete archive index, while maintaining over 90% recall of available mementos.
- Future work includes expanding the profiles to include language, date/time
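
To illustrate the profile-based routing idea from the key points above, here is a minimal Python sketch. It is not the implementation from the research: the key format (reversed host labels plus an optional number of path segments), the function names `uri_key`, `build_profile`, and `route`, and the archive names are all hypothetical and chosen only for illustration. The sketch assumes a profile is just the set of coarse keys derived from the URIs an archive holds, and that an aggregator queries only the archives whose profile contains the key of the requested URI.

```python
from urllib.parse import urlsplit


def uri_key(uri, host_segments=2, path_segments=0):
    """Reduce a URI to a coarse lookup key (hypothetical, simplified format).

    The key is the first `host_segments` host labels in reversed order,
    followed by ")/" and the first `path_segments` path segments.
    """
    parts = urlsplit(uri if "://" in uri else "http://" + uri)
    labels = (parts.hostname or "").lower().split(".")[::-1]
    paths = [seg for seg in parts.path.split("/") if seg][:path_segments]
    return ",".join(labels[:host_segments]) + ")/" + "/".join(paths)


def build_profile(held_uris, host_segments=2, path_segments=0):
    """Summarize an archive's holdings as a set of coarse keys."""
    return {uri_key(u, host_segments, path_segments) for u in held_uris}


def route(uri, profiles, host_segments=2, path_segments=0):
    """Return the archives whose profile predicts a holding for `uri`."""
    key = uri_key(uri, host_segments, path_segments)
    return sorted(name for name, keys in profiles.items() if key in keys)


# Hypothetical archives and holdings, for illustration only.
profiles = {
    "archive_a": build_profile(
        ["http://example.com/a/b", "http://news.example.org/x"]
    ),
    "archive_b": build_profile(["https://example.org/"]),
}

print(route("http://example.com/other", profiles))
print(route("http://blog.example.org/", profiles))
print(route("http://unknown.net/", profiles))
```

In this toy setup, `archive_a` is predicted for `blog.example.org` even though it only holds a page from `news.example.org`. Coarser keys keep the profile small but produce such false positives, while finer keys (more host or path segments) cost more to build and store, which is the accuracy-versus-cost trade-off the evaluation is concerned with.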