The Data-Driven World

The presentation that I gave at the JCI ECM 2011 conference.

Published in: Technology

The Data-Driven World

  1. The Data-Driven World, Kaur Alasoo
  2. Kaur Alasoo • Computer Science, University of Tartu (2007-2010) • Intern at European Molecular Biology Laboratory (April - August 2010) • Systems Biology at Aalto University (2010-2012)
  3. Outline • Where are we now? • Where are we heading? • How to get there?
  4. Where are we now?
  5. Where are we now? (1) Science: Culturomics (2) Society: OkCupid (3) Personal life: Tracking sites
  6. (1) Science: Culturomics
  7. Culturomics: Using Google Books to analyze culture [Figure 1 from Michel et al. (2011): ~129 million books published; over 15 million books scanned; 5 million books analyzed; frequency of the word "apple" in English books over time, 1800-2000]
  8. Fame depends on profession [figure; axis: median frequency]
  9. Tracking censorship [figure, panels A and B; axis: frequency; source: wikipedia.org]
  10. [figure, panels B and D; axis: frequency; source: www.sciencemag.org]
  11. History of science [figure]
  12. Feminism [figure, panels C and D]
  13. Gender equality [figure, panel D]
  14. (2) Society: OkCupid
  15. OkCupid: data mining to analyze dating
  16. Men want to date the most attractive women
  17. Most men are below average
  18. (3) Personal life: Tracking
  19-21. Goodreads tracks all books you have ever read
  22. Travel planning • Visited places • Read books • Flight log • Exercise tracking • Conversation log • Activity tracking
  23. Fitbit: tracking physical activity
  24. Where are we heading?
  25. Where are we heading? (1) Science: Other fields (2) Society: Data integration (3) Personal life: Data integration
  26. (1) Science: Other fields
  27. Studying art history with image processing?
  28. Analyzing literature using text mining? http://www.textually.org/textually/archives/2009/02/022619.htm
  29-30. Co-mentioning of people: "When Arno and his father reached school, the classes had already begun." (Kevade by Oskar Luts)
  31. Co-mentioning of terms: "When Arno and his father reached school, the classes had already begun." (Kevade by Oskar Luts)
  32. (2) Society: Data integration
  33. Social network analysis http://www.psychologytoday.com/blog/mr-personality/201001/the-psychology-social-networking
  34. Data integration http://strategicbusinessintelligence.biz/images/data-integration.jpg
  35. (3) Personal life: Data integration
  36. Almost complete log of personal life
  37. Travel planning • Visited places • Read books • Flight log • Exercise tracking • Conversation log • Activity tracking
  38. Data integration: Public transport log • Purchase history
  39. How to get there?
  40. Lack of data won't be the problem http://www.sand.com/wp-content/uploads/2011/04/binary_data.jpg
  41. Data integration and analysis skills are important http://www.keralaevents.com/eventphotos/799/Dataanalysis_1.jpg
  42. Privacy has to be preserved http://www.geekologie.com/2008/04/warmth_and_privacy_while_using.php
  43. Privacy-preserving data mining http://sharemind.cyber.ee/
  44. The hard part is coming up with good questions http://www.hhmi.org/bulletin/dec2005/chronicle/crosstalk.html
  45. Conclusion
  46. Conclusion • There are already many successful examples of data-rich applications. • More and more data will become available in many different fields. • Collecting data is easy. Difficulties lie in analyzing it and understanding what it means.
  47. References • Jean-Baptiste Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books, Science 331, 176 (2011) • OkTrends: Dating research from OkCupid, http://blog.okcupid.com • Gary Wolf, The Data-Driven Life, The New York Times, http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html • The Quantified Self | self knowledge through numbers, http://quantifiedself.com
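
Two of the analyses mentioned in the slides, tracking the frequency of a word such as "apple" over time and counting co-mentions of people in a sentence, can be illustrated with a minimal Python sketch. This is not the method used in the culturomics paper or in the talk; the function names `yearly_frequency` and `co_mentions`, the simple tokenizer, the sentence splitter, and the toy corpus are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def yearly_frequency(word, corpus):
    """Relative frequency of `word` per year.

    `corpus` is a hypothetical structure mapping a year to a list of
    texts published in that year.
    """
    word = word.lower()
    result = {}
    for year, texts in corpus.items():
        # Naive tokenizer (an assumption): lowercase runs of letters/apostrophes.
        tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
        result[year] = tokens.count(word) / len(tokens) if tokens else 0.0
    return result


def co_mentions(text, names):
    """Count how often each pair of names occurs in the same sentence."""
    counts = Counter()
    # Naive sentence splitter (an assumption): break after ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = sorted(
            n for n in names if re.search(rf"\b{re.escape(n)}\b", sentence)
        )
        counts.update(combinations(present, 2))
    return counts


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    toy_corpus = {1900: ["An apple a day", "no fruit here"], 2000: ["apple apple pie"]}
    print(yearly_frequency("apple", toy_corpus))

    sentence = ("When Arno and his father reached school, "
                "the classes had already begun.")
    print(co_mentions(sentence, ["Arno", "father"]))
```

On real data the hard parts are the ones the conclusion slide points to: cleaning the text, resolving different names for the same person, and deciding which questions the counts can actually answer.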
