Programming Historian (2008)
William J. Turkel & Alan MacEachern
1. Working with Files and Web Pages
2. From HTML to a List of Words
3. Computing Frequencies
4. Wrapping Output in HTML
5. Keywords in Context (KWIC)
6. Tag Clouds
7. Harvesting Links and Downloading Pages
8. Indexing a Document Collection
Are they using it?
Time spent on page
Topic Modeling & MALLET
From HTML to a List of Words
Understanding Regular Expressions
Applied Archival Downloading
Automated Downloading with Wget
Who is it useful for?
Libraries / Archives?
How do they find us?
Content in front of eyes. Not eyes to content.