Text mining and analytics v6 - p1
HICSS Tutorial - Jan 2011 - Part 1


Published in: Education

  • 5.2 million digitized books: about 4% of all books ever printed, published during the past 200 years. All told, about 129 million books have been published since the invention of the printing press. In 2004, Google software engineers began making electronic copies of them, and have about 15 million so far, comprising more than two trillion words in 400 languages. They currently include Chinese, English, French, German, Russian and Spanish books dating back to the year 1500, about 4% of all books published. The database doesn't include periodicals, which might reflect popular culture from a different vantage. The resulting corpus contains over 500 billion words, in English (361 billion), French (45B), Spanish (45B), German (37B), Chinese (13B), Russian (35B), and Hebrew (2B). The oldest works were published in the 1500s. The early decades are represented by only a few books per year, comprising several hundred thousand words. By 1800, the corpus grows to 60 million words per year; by 1900, 1.4 billion; and by 2000, 8 billion.
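
As a quick check on the per-language figures quoted in the note above, the sketch below sums them and computes each language's share of the corpus. The numbers are copied from the note; the names `WORDS_BILLIONS`, `corpus_total`, and `language_shares` are illustrative and do not come from the tutorial.

```python
# Word counts per language, in billions, as quoted in the slide note.
# The dictionary name and the helpers below are illustrative assumptions.
WORDS_BILLIONS = {
    "English": 361,
    "French": 45,
    "Spanish": 45,
    "German": 37,
    "Russian": 35,
    "Chinese": 13,
    "Hebrew": 2,
}


def corpus_total(counts):
    """Return the total word count (in billions) across all languages."""
    return sum(counts.values())


def language_shares(counts):
    """Return each language's fraction of the total word count."""
    total = corpus_total(counts)
    return {lang: n / total for lang, n in counts.items()}


if __name__ == "__main__":
    print(f"Total: {corpus_total(WORDS_BILLIONS)} billion words")
    shares = language_shares(WORDS_BILLIONS)
    # List languages from largest to smallest share.
    for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{lang:8s} {share:6.1%}")
```

The total comes to 538 billion words, consistent with the note's "over 500 billion", and English accounts for roughly two thirds of it.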