Culturomics: Using Google Books to analyze culture

Excerpts from Michel et al., Science 331, 176 (2011):

To keep up with the lexicon, dictionaries are updated regularly (13). We examined how well these changes corresponded with changes in actual usage by studying the 2077 1-gram headwords added to AHD4 in 2000. The overall frequency of these words, such as "buckyball" and "netiquette", has soared since 1950: Two-thirds exhibited recent …

High-frequency irregulars, which are more readily remembered, hold their ground better. For instance, we found "found" (frequency: 5 × 10⁻⁴) 200,000 times more often than we finded "finded." In contrast, "dwelt" (frequency: 1 × 10⁻⁵) dwelt in our data only 60 times as often as "dwelled".

Fig. 1. Culturomic analyses study millions of books at once. (A) Top row: Authors have been writing for millennia; ~129 million book editions have been published since the advent of the printing press (upper left). Second row: Libraries and publishing houses provide books to Google for scanning (middle left). Over 15 million books have been digitized. Third row: Each book is associated with metadata. Five million books are chosen for computational analysis (bottom left). Bottom row: A culturomic time line shows the frequency of "apple" in English books over time (1800–2000). (B) Usage frequency of …

[Figure panels: 129 million books published → 15 million books scanned → 5 million books analyzed; plot of the frequency of the word "apple" against year.]
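A culturomic time line plots how often a word appears in each year's books. The sketch below assumes a simple definition of "frequency": the word's count in a year divided by the total number of 1-grams printed that year. The function name `yearly_frequency` and the toy counts are hypothetical illustrations, not the paper's actual pipeline or data.

```python
from collections import Counter

def yearly_frequency(counts_by_year, word):
    """Relative frequency of `word` in each year.

    counts_by_year maps a year to a Counter of 1-gram counts for the
    books published that year (a hypothetical stand-in for a real corpus).
    """
    freq = {}
    for year in sorted(counts_by_year):
        counts = counts_by_year[year]
        total = sum(counts.values())  # all 1-grams printed that year
        # Normalising by the yearly total keeps years with more books
        # from looking artificially "louder".
        freq[year] = counts[word] / total if total else 0.0
    return freq

# Toy usage with made-up counts
corpus = {
    1800: Counter({"apple": 2, "the": 98}),
    1900: Counter({"apple": 5, "the": 195}),
}
print(yearly_frequency(corpus, "apple"))  # {1800: 0.02, 1900: 0.025}
```

Dividing by the yearly total matters because the number of books published grows enormously over the period. Raw counts would mostly reflect publishing volume rather than usage.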
[Figure panels: "Fame depends on profession"; median frequency … 1871 (gray lines; median, thick dark gray line). Five examples are highlighted.]
… (Fig. 3E). The age of peak celebrity has been consistent over time: about 75 years after birth. But the other parameters have been changing (fig. S8). …

[Figure panel: "Tracking censorship"]

www.sciencemag.org on April 21, 2011
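The "age of peak celebrity" can be read off a cohort's fame curve. The following is a minimal sketch that assumes one simple procedure: take the median frequency across everyone born in the same year, find the year where that median curve peaks, and subtract the birth year. The function name `age_of_peak_celebrity` and the toy cohort are hypothetical, and this is not necessarily how the paper computed its figure.

```python
from statistics import median

def age_of_peak_celebrity(cohort, birth_year):
    """Years from birth to the peak of a cohort's median fame curve.

    cohort: one dict per person (all born in `birth_year`), mapping
    year -> mention frequency; every dict must cover the same years.
    """
    years = sorted(cohort[0])
    # Median across people, year by year (the "thick dark gray line").
    median_curve = {y: median(person[y] for person in cohort) for y in years}
    peak_year = max(years, key=median_curve.__getitem__)
    return peak_year - birth_year

# Toy cohort born in 1871 with invented frequencies
cohort = [
    {1900: 1, 1940: 5, 1946: 9, 1960: 4},
    {1900: 2, 1940: 6, 1946: 8, 1960: 3},
    {1900: 0, 1940: 7, 1946: 10, 1960: 2},
]
print(age_of_peak_celebrity(cohort, 1871))  # 75
```

Using the median rather than the mean keeps one extremely famous person from dominating the cohort's curve.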
[Figure panel: "History of science"]
Conclusion
• There are already many successful examples of data-rich applications.
• More and more data will become available in many different fields.
• Collecting data is easy. The difficulties lie in analyzing it and understanding what it means.
References
• Jean-Baptiste Michel et al., "Quantitative Analysis of Culture Using Millions of Digitized Books", Science 331, 176 (2011).
• "OkTrends: Dating research from OkCupid", http://blog.okcupid.com
• Gary Wolf, "The Data-Driven Life", The New York Times, http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html
• "The Quantified Self | self knowledge through numbers", http://quantifiedself.com