Data by Design




Elizabeth F. Churchill
Design/Science of participation
    (1) Science through (platforms for mediated communication)
         TMSP




    (2) Science on (social science contributions about fundamentals of
    psychology/communication/collaboration/cooperation)
         “Hubble telescope” of social science


WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY
On (1) – TMSP via SMPs

  Awareness
    Conversation and content exchange good;
     content storage, indexing and search poor
  Content sharing
    Malleable as well as stable content
  Coordination
    Long and short term
  Collaborative production
    Lightweight to complex
  Longevity
    Currently questionable….
Cooperative activities, centralised
Collective action, centralised
Collective action, decentralised
On (2) – Sciences of the social

           Data quality
             descriptive/predictive; observed/understood;
               local/universal; reactive/proactive;
               stand-alone/replicated
           Science quality
             Data stability/longevity, TOS, content and
               social responsibility

  WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY

Designers : Statisticians : Computer scientists : Data Scientists : Social scientists
Focus on (2)




 Mike Loukides
 http://radar.oreilly.com/2010/06/what-is-data-science.html
On Data Science

 “What differentiates data science from statistics is that
  data science is a holistic approach. We’re increasingly
  finding data in the wild, and data scientists are involved
  with gathering data, massaging it into a tractable form,
  making it tell its story, and presenting that story to others.”

 The first step of any data analysis project is “data
  conditioning,” or getting data into a state where it’s
  usable.
On Data Science

 The most meaningful definition I’ve heard: “big data” is
  when the size of the data itself becomes part of the
  problem.

 The need to define a schema in advance conflicts with
  reality of multiple, unstructured data sources, in which
  you may not know what’s important until after you’ve
  analyzed the data.
On Data Science

 Data scientists … come up with new ways to view the
  problem, or to work with very broadly defined problems:
  “here’s a lot of data, what can you make from it?”

 The future belongs to the companies who figure out how
  to collect and use data successfully.

     …and the scientists?
Business logic is not science logic
http://www.forbes.com/sites/onmarketing/2012/06/28/social-media-and-the-big-data-explosion/
Data – the ‘this is the dataset’ problem
Verbeeldingskr8 on Flickr
Interface elements
…lead to data, inviting action and inviting information
Facebook
Like!
Like?
Agree!
Disagree!
(bookmarked)
Hello Sherry
Dating
profile creation




explicit versus passive
“personalisation”
Anxiety, self reflection, identity….




                                       Eva Illouz
Flickr
Recording and Sharing
Documenting
Personal and Collective
Memory

Competition
Status

Affiliation
Group Membership

Learning
Emulating

Awareness
Near and Far

Curiosity/Voyeurism
Flickr – Photo sharing by user location
The Library of Congress, the Powerhouse Museum, the Smithsonian,
New York Public Library, and Cornell University Library
http://www.flickr.com/photos/powerhouse_museum/2980051095/
http://www.museumsandtheweb.com/mw2011/papers/rethinking_evaluation_metrics_in_light_of_flic
Data longevity

 “Like all Commons members, the other qualitative
  measure we value highly is the sheer inventiveness of
  Flickr members who engage with the photographs.

 Currently, Cornell saves links to examples of reuse on
  delicious (http://www.delicious.com) and displays them
  as a feed on its website.”
Business logic is not science logic
Design/Science of participation
(1) Science through (platforms for mediated
communication)
   TMSP




(2) Science on (social science contributions about
fundamentals of collaboration/cooperation)
   “Hubble telescope” of social science
Reflections on requirements
   Stability – the existence of content in an accessible (and hopefully the same)
    format over time

   Science requires
       Consistency: re-coding the same data in the same way over a period of
        time
       Reproducibility: the tendency for a group of coders to classify category membership
        in the same way
       Accuracy: the extent to which the classification of a text corresponds statistically to a
        standard or norm
       Validity
            Correspondence of the categories to the conclusions, avoiding ambiguity and
             addressing multiple possible classifications
            Proof: trust in the inferential procedures and clarity about what level of implication is
             allowed, i.e. do the conclusions follow from the data, or can they be explained by
             some other phenomenon?

       Generalizability of results to a theory
       Cross-setting comparative interventions
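
One way to make the reproducibility requirement above concrete is to score agreement between two coders. The sketch below uses Cohen's kappa, a standard chance-corrected agreement statistic. The talk does not name a specific measure, so the choice of kappa, the function name `cohen_kappa`, and the toy labels are all illustrative assumptions.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: share of items given the same category by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Expected agreement if each coder assigned categories independently,
    # at the rates seen in their own labels.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: two coders classifying four photo comments.
coder_a = ["status", "memory", "status", "memory"]
coder_b = ["status", "memory", "memory", "memory"]
print(cohen_kappa(coder_a, coder_b))  # 0.5 for this toy data
```

The same function can serve as a rough check of consistency over time by passing one coder's labels from two separate coding sessions over the same data.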
On (2) – Sciences of the social

           Data quality
             descriptive/predictive; observed/understood;
               local/universal; reactive/proactive;
               stand-alone/replicated
           Science quality
             Data stability/longevity, TOS, content and
               social responsibility

  WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY

Designers : Statisticians : Computer scientists : Data Scientists : Social scientists
Questions?

churchill@acm.org

xeeliz on Twitter
Acknowledgements

 On dating: Elizabeth Goodman; on Flickr: Shyong (Tony)
  Lam; on instrumentation and analysis: David Ayman
  Shamma & M. Cameron Jones; on Flickr Commons:
  George Oates

 Flickr photographers: Marina Noordegraaf
  (Verbeeldingskr8), Tim Jagenberg, Nicolas Nova

Editor's Notes

  • #24 Define serious and casual daters
    Serious: looking for a long-term partner
    Casual: looking for short-term or one-time encounters
    Industry assumptions: a division between “serious” and “casual” daters in terms of what they’d pay for and the effort they put in.
    Interviews were semi-structured, asking people to talk about the experience of planning and going on dates after only communicating online. The focus was on the work of dating: the management of schedules, choosing locations, dealing with unexpected delays in traffic, handling anxiety.