Text mining and analytics v6 - p1

HICSS Tutorial - Jan 2011 - Part 1

Published in: Education
  • 5.2 million digitized books: about 4% of all books ever printed/published during the past 200 years. All told, about 129 million books have been published since the invention of the printing press. In 2004, Google software engineers began making electronic copies of them, and have about 15 million so far, comprising more than two trillion words in 400 languages. They currently include Chinese, English, French, German, Russian and Spanish books dating back to the year 1500, about 4% of all books published. The database doesn't include periodicals, which might reflect popular culture from a different vantage. The resulting corpus contains over 500 billion words, in English (361 billion), French (45B), Spanish (45B), German (37B), Chinese (13B), Russian (35B), and Hebrew (2B). The oldest works were published in the 1500s. The early decades are represented by only a few books per year, comprising several hundred thousand words. By 1800, the corpus grows to 60 million words per year; by 1900, 1.4 billion; and by 2000, 8 billion.