This slide deck describes analyzing more than a million Victorian-era books to find patterns and insights. It covers searching the text for topics related to religion, marriage, and other subjects. Examples compare how often words such as "God", and various topics, appear in the work of authors from different years. Bigrams ending in "marriage" (for example, "happy marriage" or "loveless marriage") are also tracked over time. The aim is to understand Victorian society and culture by studying the language and themes of a very large collection of texts from the period.
Finding Meaning in 1M Victorian Books
1. FINDING MEANING IN A MILLION VICTORIAN BOOKS
DAN COHEN
INSTITUTE OF HISTORICAL RESEARCH, LONDON
6 MARCH 2012
WEB: DANCOHEN.ORG
TWITTER: @DANCOHEN
3. inbound links
outbound links
words in the URL
title of the web page
top-level domain (e.g., .edu)
age of second-level domain (e.g., dancohen.org)
date page was created
date page was last updated
how frequently a page is updated
related pages on the same domain
keywords in meta tags
font size of keywords (h1, h2, etc.)
...
9. title
author
contributor
publication year
text
language
material form
bibliographic level
page count
publisher
library holdings
sales
birth year of author
death year of author
fiction type
full title
height
LC subject heading
other subject headings
place of publication
product form
publication country
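The slide lists metadata fields available for each book. As a rough illustration of how a few of those fields might be used to slice the corpus, the Python sketch below stores them in a record and counts books per decade. The `BookRecord` class, its choice of fields, the `books_per_decade` function, and the MARC-style language code `"eng"` are hypothetical, not the project's actual data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookRecord:
    """Hypothetical record holding a subset of the metadata fields on the slide."""
    title: str
    author: str
    publication_year: int
    language: str  # assumed MARC-style code, e.g. "eng"
    page_count: Optional[int] = None
    place_of_publication: Optional[str] = None
    lc_subject_heading: Optional[str] = None


def books_per_decade(records, start, end, language):
    """Count records per decade for one language within [start, end]."""
    counts = Counter()
    for r in records:
        if r.language == language and start <= r.publication_year <= end:
            counts[r.publication_year // 10 * 10] += 1
    # Return decades in chronological order.
    return dict(sorted(counts.items()))
```

Filtering on metadata before touching the full text keeps later word counts comparable, since each decade's subset is defined by the same criteria.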
43. Interesting bigrams with “marriage” as the second word whose frequencies change significantly over the 19th century
clandestine
forbidden
foreign
fruitless
good
happy
hasty
irregular
loveless
mixed
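One way such bigrams could be counted, assuming the corpus is available as hypothetical `(year, text)` pairs, is to tokenize each text, record every word that immediately precedes "marriage", and normalize by the total number of tokens in each decade. The function name and input format are illustrative. The sketch drops punctuation, so a bigram can straddle a sentence boundary, and it does not test whether a change is statistically significant.

```python
import re
from collections import Counter, defaultdict

# Lowercase alphabetic tokens only; punctuation and digits are discarded.
TOKEN = re.compile(r"[a-z]+")


def marriage_bigrams_by_decade(docs, target="marriage"):
    """Relative frequency of each 'X <target>' bigram per decade.

    docs: iterable of (year, text) pairs (hypothetical input format).
    Returns {decade: {first_word: count / total_tokens_in_decade}}.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for year, text in docs:
        decade = year // 10 * 10
        tokens = TOKEN.findall(text.lower())
        totals[decade] += len(tokens)
        # Walk adjacent token pairs and keep those ending in the target word.
        for first, second in zip(tokens, tokens[1:]):
            if second == target:
                counts[decade][first] += 1
    return {
        decade: {w: c / totals[decade] for w, c in counts[decade].items()}
        for decade in sorted(counts)
    }
```

Normalizing by decade size matters because far more books survive from the late 19th century than from the 1790s; raw counts alone would make nearly every bigram appear to rise.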
46. 1790s
sinful to harbour evil thoughts
sinful to the obedience of God
sinful to attend the playhouse
sinful to do it from any other motive but conviction
47. 1890s
sinful to be superstitiously credulous
sinful to drink intoxicating liquors
sinful to allow your grief to hurt you
sinful to treat beasts with cruelty
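Phrases like those on slides 46 and 47 could be gathered with a simple pattern search. The sketch below, again assuming hypothetical `(year, text)` input, captures the words that follow "sinful to" up to the next punctuation mark and groups them by decade. The regular expression, the `sinful_to_phrases` name, and the six-word cap are illustrative assumptions; the slides do not say how the phrases were extracted.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "sinful to" followed by everything up to the next
# punctuation mark. Case-insensitive so sentence-initial "Sinful" also matches.
PATTERN = re.compile(r"\bsinful\s+to\b([^.,;:!?]*)", re.IGNORECASE)


def sinful_to_phrases(docs, max_words=6):
    """Group 'sinful to ...' phrases by decade.

    docs: iterable of (year, text) pairs (hypothetical input format).
    Only the first max_words words after "sinful to" are kept.
    """
    phrases = defaultdict(list)
    for year, text in docs:
        for match in PATTERN.finditer(text):
            words = match.group(1).split()[:max_words]
            if words:
                phrases[year // 10 * 10].append("sinful to " + " ".join(words))
    return dict(phrases)
```

Comparing the resulting lists decade by decade gives the kind of 1790s versus 1890s contrast shown above, though a real run would also need deduplication and some manual reading of the matches.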