The Data-Driven World
 

The presentation that I gave at the JCI ECM 2011 conference.


© All Rights Reserved

    The Data-Driven World: Presentation Transcript

    • The Data-Driven World, Kaur Alasoo
    • Kaur Alasoo
      • Computer Science, University of Tartu (2007-2010)
      • Intern at European Molecular Biology Laboratory (April - August 2010)
      • Systems Biology at Aalto University (2010 - 2012)
    • Outline
      • Where are we now?
      • Where are we heading to?
      • How to get there?
    • Where are we now?
      1 Science: Culturomics
      2 Society: OkCupid
      3 Personal life: Tracking sites
    • 1 Science: Culturomics
    • Culturomics: Using Google Books to analyze culture
      [Figure: excerpt from Michel et al., Science (2011), Fig. 1. Culturomic analyses study millions of books at once. Authors have been writing for millennia; ~129 million book editions have been published since the advent of the printing press. Libraries and publishing houses provide books to Google for scanning; over 15 million books have been digitized. Each book is associated with metadata; five million books are chosen for computational analysis. A culturomic time line shows the frequency of “apple” in English books over time (1800–2000).]
    • Fame depends on profession [figure: median frequency over time; five examples are highlighted]
    • Tracking censorship [figure: frequency over time; the surrounding paper text notes that the age of peak celebrity has been consistent over time, about 75 years after birth]
    • History of science [figure]
    • Feminism [figure]
    • Gender equality [figure]
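The culturomic time line mentioned on these slides (how often a word such as "apple" appears in books from each year) can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the paper: the function name `yearly_frequency`, the whitespace tokenization, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter

def yearly_frequency(corpus, word):
    """Return {year: share of that year's tokens that equal `word`}."""
    totals = Counter()  # number of tokens seen per year
    hits = Counter()    # occurrences of `word` per year
    target = word.lower()
    for year, text in corpus:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# Toy corpus, invented for illustration: (publication year, text) pairs.
toy_corpus = [
    (1800, "an apple a day"),
    (1800, "the orchard was quiet"),
    (1900, "apple pie and apple cider"),
]
timeline = yearly_frequency(toy_corpus, "apple")
```

Dividing by the total token count per year matters because the number of digitized books differs from year to year, so raw counts alone would mostly reflect corpus size.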
    • 2 Society: OkCupid
    • OkCupid: data mining to analyze dating
    • Men want to date the most attractive women
    • Most men are below average
    • 3 Personal life: Tracking
    • Goodreads tracks all books you have ever read
    • Travel planning, visited places, read books, flight log, exercise tracking, conversation log, activity tracking
    • Fitbit: tracking physical activity
    • Where are we heading to?
      1 Science: Other fields
      2 Society: Data integration
      3 Personal life: Data integration
    • 1 Science: Other fields
    • Studying art history with image processing?
    • Analyzing literature using text mining? http://www.textually.org/textually/archives/2009/02/022619.htm
    • Co-mentioning of people: "When Arno and his father reached school, the classes had already begun." (Kevade by Oskar Luts)
    • Co-mentioning of terms: "When Arno and his father reached school, the classes had already begun." (Kevade by Oskar Luts)
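Co-mentioning can be made concrete by counting how often two names or terms occur in the same sentence. The sketch below is one simple way to do that, assuming a naive sentence splitter and exact word matching; the function name `co_mentions` and the second sentence of the sample text are invented for illustration (only the first sentence comes from the slide).

```python
import re
from collections import Counter
from itertools import combinations

def co_mentions(text, names):
    """Count how often each pair of names appears in the same sentence."""
    pair_counts = Counter()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"\w+", sentence.lower()))
        present = sorted(n for n in names if n.lower() in words)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts

sample = (
    "When Arno and his father reached school, the classes had already begun. "
    "Arno looked at his father."  # hypothetical sentence, not from the book
)
counts = co_mentions(sample, ["Arno", "father", "school"])
```

The resulting pair counts can be read as weighted edges of a graph, which is the usual starting point for a social network of characters or a map of related terms.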
    • 2 Society: Data integration
    • Social network analysis http://www.psychologytoday.com/blog/mr-personality/201001/the-psychology-social-networking
    • Data Integration http://strategicbusinessintelligence.biz/images/data-integration.jpg
    • 3 Personal life: Data integration
    • Almost complete log of personal life
    • Travel planning, visited places, read books, flight log, exercise tracking, conversation log, activity tracking
    • Data integration: public transport log, purchase history
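Integrating personal logs mostly means lining up records from different sources on a shared key such as the date. Here is a minimal sketch under that assumption; the function name `integrate`, the log names, and every record are hypothetical and only serve to show the shape of the result.

```python
from collections import defaultdict

def integrate(**logs):
    """Merge several {date: entry} logs into one {date: {source: entry}} timeline."""
    timeline = defaultdict(dict)
    for source, log in logs.items():
        for date, entry in log.items():
            timeline[date][source] = entry
    # ISO dates sort correctly as plain strings.
    return dict(sorted(timeline.items()))

# Hypothetical records, for illustration only.
transport = {"2011-05-02": "bus 5 to city centre", "2011-05-03": "tram 2"}
purchases = {"2011-05-02": "bookshop, 12.50 EUR"}
activity = {"2011-05-03": "8200 steps"}

timeline = integrate(transport=transport, purchases=purchases, activity=activity)
```

Once the logs share one timeline, questions that span sources (for example, what was bought on days with a bus trip) become simple lookups.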
    • How to get there?
    • Lack of data won’t be the problem http://www.sand.com/wp-content/uploads/2011/04/binary_data.jpg
    • Data integration and analysis skills are important http://www.keralaevents.com/eventphotos/799/Dataanalysis_1.jpg
    • Privacy has to be preserved http://www.geekologie.com/2008/04/warmth_and_privacy_while_using.php
    • Privacy-preserving data mining http://sharemind.cyber.ee/
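One well-known building block for privacy-preserving computation is additive secret sharing: a value is split into random shares held by different parties, and sums can be computed on the shares without any party seeing the inputs. The sketch below illustrates only that idea. It is not the Sharemind API or implementation, and the modulus, the three-party setup, and the input values are assumptions chosen for the example.

```python
import secrets

MODULUS = 2**32  # all arithmetic is modulo this value (illustrative choice)

def share(value, parties=3):
    """Split `value` into `parties` random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recombine all shares into the original value."""
    return sum(shares) % MODULUS

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no party sees either input."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]

# Hypothetical private inputs from two data owners.
alice_shares = share(3200)
bob_shares = share(4100)
total = reconstruct(add_shared(alice_shares, bob_shares))
```

Any single share is a uniformly random number, so it reveals nothing on its own; only the combination of all shares yields the result.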
    • The hard part is to come up with good questions http://www.hhmi.org/bulletin/dec2005/chronicle/crosstalk.html
    • Conclusion
      • There are already many successful examples of data-rich applications.
      • More and more data will become available in many different fields.
      • Collecting data is easy. Difficulties lie in analyzing it and understanding what it means.
    • References
      • Quantitative Analysis of Culture Using Millions of Digitized Books, Jean-Baptiste Michel, et al. Science 331, 176 (2011)
      • OkTrends: Dating research from OkCupid http://blog.okcupid.com
      • The Data-Driven Life, Gary Wolf, The New York Times, http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html
      • The Quantified Self | self knowledge through numbers http://quantifiedself.com