1. Citizen Science from Galaxy Zoo to the Zooniverse
Chris Lintott
Adler Planetarium; Oxford University
Lucy Fortson
University of Minnesota
National Science Foundation May 10, 2011
19. Known knowns: data reduction by science team
Known unknowns: results funnelled to specific researchers
Unknown unknowns: discussion tool allows interesting objects to become prominent
40. Data from external sources (eg telescope) → Primary Interface → Output
Output: Data; Science (papers, data release); Training sets for Machine Learning; Informed Design & Best Practice
Advanced Tools
1. Discussion tools & fora
2. Advanced tools for data analysis
3. A ‘journal’ for citizen scientists
Tools for more structured learning
- Class groups
- teach.zooniverse.org for resource sharing
- Alternative scaffolding
  - For different ages
  - For use in class
  - For museums
Education Research
- What motivates volunteers to contribute at different levels?
- Who is participating at which level?
- What do they learn by doing so?
Diagram labels: CDI, CI-TEAM, ISE
41. Cyber-enabled Discovery & Innovation
Conquering the data flood with a
transformative partnership between
citizen scientists and machines
55. User-based Research Enabling Toolkits
• Human element in data processing pipeline -> serendipity -> user-based discoveries (Green Peas, variable stars)
• Motivation study -> users want to participate in a real research project
• Collective intelligence -> online peer mentorship -> “Talk” tool
56. User-based Research Enabling Toolkits
• Provide opportunity for users to explore data beyond the primary task
• Tools that enable users to engage in the process of research from data -> papers
64. Percentage of PlanetHunters users that visit Talk: 62.6%
Percentage of Talk users that comment: 38.5%
Percentage of PlanetHunters users that comment: 24.1%
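The three Talk figures are mutually consistent if commenting only happens inside Talk (an assumption, not stated on the slide): the overall commenting rate is then the product of the two funnel stages. A quick check:

```python
# Figures from the slide, expressed as fractions.
visit_talk = 0.626          # share of Planet Hunters users who visit Talk
comment_given_talk = 0.385  # share of Talk visitors who go on to comment

# Assumption: users can only comment from within Talk, so the two stages multiply.
comment_overall = visit_talk * comment_given_talk

print(f"{comment_overall:.1%}")  # 24.1%, matching the slide's third figure
```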
73. Types of volunteer
• Initial volunteer
– First-time user
• Sustained volunteer
– Continuing user
• Meta volunteer
– Works to enable contributions by other users
74. • Crowston & Fagnot (2008)
– Awareness: Do volunteers know their help is
needed?
– Capacity: Do they think they can do what is asked?
– Obligation: Are they committed to continue?
– Evaluation: Do they think benefits outweigh costs?
• O’Brien & Toms (2010)
– Volunteer-Website Interaction: How do they react to
the website?
• Model: five factors for three user types
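“Five factors for three user types” describes a 5 × 3 grid. A minimal sketch of that structure, filling in only the cells the speaker notes give (awareness and obligation) and leaving the rest empty; the variable names are illustrative:

```python
# Five factors: four from Crowston & Fagnot (2008), one from O'Brien & Toms (2010).
FACTORS = [
    "Awareness",
    "Capacity",
    "Obligation",
    "Evaluation",
    "Volunteer-website interaction",
]
# Three user types from the "Types of volunteer" slide.
USER_TYPES = ["Initial", "Sustained", "Meta"]

# Empty 5 x 3 grid; None marks a cell the talk does not fill in.
model = {factor: {user: None for user in USER_TYPES} for factor in FACTORS}

# Example cells taken from the speaker notes.
model["Awareness"].update(
    Initial="Did they hear of the project?",
    Sustained="Contribution is needed",
    Meta="Identifies with the project",
)
model["Obligation"].update(
    Initial="Absent",
    Sustained="Believes in project goals",
    Meta="Responsibility to project",
)

filled = sum(cell is not None for row in model.values() for cell in row.values())
print(f"{filled} of {len(FACTORS) * len(USER_TYPES)} cells filled")  # 6 of 15
```

The empty cells are the questions the planned survey would have to answer.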
76. Project | Domain | Functionality required
Bodleian music | Music/History | Extended transcription
whale.fm | Zoology | Audio classification
SETIQuest | Astrophysics | Live operation
Hurricane classification Zoo | Climate science | New drawing tool
Radio Galaxy | Astrophysics | None
Roman Britain | Archeology | Multiwavelength imaging
Old Weather: Arctic | Climate Science/History | Generalized transcription
Protected Planet | Environmental science | Live operation/multiwavelength imaging
Crow Zoo | Ecology | None
NEEMO/BioTagger | Marine science | GIS Integration
Gamma-ray Zoo | Astrophysics | None
79. Science team
propose project
CSA Questionnaire
What science will be produced?
Why can’t the project be done
automatically?
What training will be necessary?
What resources are available to exploit
the results?
89. Thanks to...
The National Science Foundation
The Leverhulme Trust, NASA, European Union FP7, SDSS, TED, Microsoft, Google, Scientific American
Karen Carney, Adam Tarnoff, Geza Gyuk, John Wallin, Kirk Borne, Jordan Raddick, Pamela Gay, Jason Reed, Andrea Lardner, Michael Parrish, Arfon Smith, Rob Simpson, Stuart Lynn, William MacFarlane, James Brusuelas, the Zooniverse science teams and...
Ask the simplest useful question.
Kevin, and the limitations of single-expert classification.
Working at web scale... or not.
Jump forward: solving this through judicious use of commercial technologies.
User weighting after the fact.
Confidence as well as answer.
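“User weighting after the fact” and “confidence as well as answer” can be illustrated with a generic two-pass scheme: take an unweighted majority, weight each volunteer by how often they agree with it, then re-aggregate and report the winning vote fraction as a confidence. This is a minimal sketch of one common approach, not the weighting Galaxy Zoo actually used; the function names and toy votes are hypothetical.

```python
from collections import defaultdict


def aggregate(votes, weights=None):
    """Return {object: (winning answer, confidence)} from (user, object, answer) votes.

    Each vote counts with its user's weight (default 1.0). Confidence is the
    weighted fraction of votes that went to the winning answer.
    """
    weights = weights or {}
    tallies = defaultdict(lambda: defaultdict(float))
    for user, obj, answer in votes:
        tallies[obj][answer] += weights.get(user, 1.0)

    result = {}
    for obj, counts in tallies.items():
        answer = max(counts, key=counts.get)
        total = sum(counts.values())
        result[obj] = (answer, counts[answer] / total if total else 0.0)
    return result


def reweight(votes, consensus):
    """Weight each user by the fraction of their votes that agree with the consensus."""
    agree, seen = defaultdict(int), defaultdict(int)
    for user, obj, answer in votes:
        seen[user] += 1
        if answer == consensus[obj][0]:
            agree[user] += 1
    return {user: agree[user] / seen[user] for user in seen}


# Hypothetical toy votes: three volunteers, three galaxies.
votes = [
    ("ann", "g1", "spiral"), ("bob", "g1", "spiral"), ("cat", "g1", "elliptical"),
    ("ann", "g2", "elliptical"), ("bob", "g2", "elliptical"), ("cat", "g2", "elliptical"),
    ("ann", "g3", "spiral"), ("bob", "g3", "elliptical"), ("cat", "g3", "elliptical"),
]

first_pass = aggregate(votes)            # unweighted majority
weights = reweight(votes, first_pass)    # ann 2/3, bob 1.0, cat 2/3
second_pass = aggregate(votes, weights)  # weighted answer plus confidence
print(second_pass["g1"])                 # spiral, with confidence of about 0.71
```

Because the weights are computed from the finished votes, they can be applied after the classifications are collected, without changing what volunteers see.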
Serendipitous discovery
What else can we do with a forum?
Collaboration.
Face on S over E...SE...SS...‘Q’...‘X’...‘Phi’
Object-orientated commenting.
Different levels of promise - commitment to use 1 (&2?).
Example of generalised problem.
Noon air temperatures.
HMS Africa: ship's crew ~600.
Weather observations throughout.
Very generalizable domain model
- Code reuse (allows small projects, media independence)
- Keeping volunteers in one place: good for us, progression for them
- A quality-assured set of projects
- Open calls, more projects to come
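Media independence in a generalizable domain model means the project, subject and classification objects stay the same whether the subject is an image or an audio clip. A minimal sketch of that idea; every class and field name here is hypothetical and not taken from the Zooniverse codebase:

```python
from dataclasses import dataclass, field
from enum import Enum


class Media(Enum):
    """Kinds of media a subject can carry; the classification code never branches on this."""
    IMAGE = "image"
    AUDIO = "audio"


@dataclass
class Subject:
    subject_id: str
    media: Media
    location: str  # where the media file lives (URL or path)


@dataclass
class Classification:
    user: str
    subject_id: str
    annotations: dict


@dataclass
class Project:
    name: str
    subjects: list[Subject] = field(default_factory=list)
    classifications: list[Classification] = field(default_factory=list)

    def add_subject(self, subject: Subject) -> None:
        self.subjects.append(subject)

    def classify(self, user: str, subject_id: str, **annotations) -> Classification:
        """Store one volunteer's answer; identical for every media type."""
        record = Classification(user, subject_id, dict(annotations))
        self.classifications.append(record)
        return record


# The same classes serve an image project and an audio project.
galaxies = Project("Galaxy Zoo")
galaxies.add_subject(Subject("g1", Media.IMAGE, "g1.jpg"))
galaxies.classify("ann", "g1", shape="spiral")

whales = Project("whale.fm")
whales.add_subject(Subject("w1", Media.AUDIO, "w1.wav"))
whales.classify("bob", "w1", similar_to="w7")
```

Only the front end that renders a subject needs to know its media type; storage and aggregation code can be reused across projects.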
April–July 2010, during which nearly 14,000 supernova candidates from PTF were classified by more than 2,500 individuals within a few hours of data collection. We compare the transients selected by the citizen scientists to those identified by experienced PTF scanners, and find the agreement to be remarkable: Galaxy Zoo Supernovae performs comparably to the PTF scanners, and identified as transients 93% of the ~130 spectroscopically confirmed
Hubble time is rare: minimize false positives as best you can to get the best hit rate.
Give up on completeness; miss unusual examples.
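This note describes a purity/completeness trade-off: with scarce telescope time, raise the selection threshold until false positives are rare, accepting that some real (and unusual) objects are missed. A minimal sketch, assuming each candidate has a volunteer score (for example, a vote fraction) and a known true/false label from a hypothetical labelled set:

```python
def purity_completeness(scores, labels, threshold):
    """Purity (precision) and completeness (recall) of selecting score >= threshold.

    scores: volunteer scores per candidate (e.g. vote fractions).
    labels: 1 if the candidate is a real object, else 0.
    """
    selected = [label for score, label in zip(scores, labels) if score >= threshold]
    true_pos = sum(selected)
    purity = true_pos / len(selected) if selected else None
    completeness = true_pos / sum(labels)
    return purity, completeness


def lowest_threshold_for_purity(scores, labels, target):
    """Lowest threshold reaching the target purity.

    Completeness can only fall as the threshold rises, so the first threshold
    that meets the purity target keeps as many real objects as possible.
    """
    for threshold in sorted(set(scores)):
        purity, completeness = purity_completeness(scores, labels, threshold)
        if purity is not None and purity >= target:
            return threshold, purity, completeness
    return None


# Hypothetical labelled candidates, sorted by volunteer score.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 1, 1, 0, 1, 0, 0, 1]

print(purity_completeness(scores, labels, 0.5))          # lenient cut: more false positives
print(lowest_threshold_for_purity(scores, labels, 0.9))  # (0.8, 1.0, 0.6)
```

At the strict cut every selected target is real, but the low-scoring real object at 0.3 is lost: the price of a better hit rate.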
Lunar scientists are already experiencing the ‘data flood’, receiving unprecedented quantities of data from the Lunar Reconnaissance Orbiter (LRO). Sub-meter imaging of a significant fraction of the lunar surface is producing tens of terabytes of image data per year, and attempts to take a similar approach to Galaxy Zoo with the Moon Zoo project (http://moonzoo.org) have demonstrated that this approach does not scale. Over the past 8 months, a community of more than 40,000 volunteers has classified less than 1% of the first LRO data release. With the LRO mission expected to last for at least another 7 years, producing tens of terabytes of data with every release, it will take over a century at current classification rates for the Moon Zoo community to analyze the full LRO dataset.
Have a pipeline that is slicing and dicing but also registering the new ‘Assets’ with the Moon Zoo backend API.
Catalogue size will depend upon mission lifetime, except that there is no ‘catalogue’.
SDSS for Galaxy Zoo
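The century-scale estimate follows from a simple linear extrapolation. A sketch, assuming the classification rate stays constant (a simplification):

```python
months_elapsed = 8
fraction_classified = 0.01  # "less than 1%", so this is an upper bound on progress

# Constant-rate extrapolation: total time = elapsed time / fraction completed.
months_needed = months_elapsed / fraction_classified
years_needed = months_needed / 12

print(f"first release alone: at least {years_needed:.0f} years")  # at least 67 years
```

This lower bound covers only the first data release; with further releases expected over at least another 7 years, the full dataset takes longer still, consistent with the “over a century” figure.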
Slicing and dicing images: moving target for the catalogue size.
Probably about 60,000,000 images; could be up to 600,000,000.
80 GB/day raw images (30 TB over first year).
Changes the game for us a little.
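The data-volume figure can be checked, and the slicing step sketched. The tile size, frame dimensions, and `register_asset` function below are hypothetical; the real Moon Zoo backend API is not described here, so an in-memory list stands in for it:

```python
# Data volume check: 80 GB/day of raw images over one year (decimal units).
GB_PER_DAY = 80
tb_per_year = GB_PER_DAY * 365 / 1000
print(f"{tb_per_year:.1f} TB per year")  # 29.2 TB per year, i.e. roughly 30 TB


def slice_into_tiles(width, height, tile):
    """Yield (x, y, w, h) boxes covering a width x height image.

    Tiles along the right and bottom edges may be smaller than `tile`.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


# Hypothetical stand-in for the Moon Zoo backend API: an in-memory list.
registry = []


def register_asset(source_id, box):
    """Record one slice as an 'Asset' (hypothetical; the real API is not described)."""
    registry.append({"source": source_id, "box": box})


# Hypothetical 4500 x 3000 pixel frame cut into 1000-pixel tiles.
for box in slice_into_tiles(4500, 3000, 1000):
    register_asset("frame-001", box)

print(len(registry))  # 15 slices: 5 columns x 3 rows
```

Tile size alone changes the count substantially: halving it roughly quadruples the number of slices.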
Apprenticeship vs collective intelligence (apprenticeship is not scalable).
Data: 1. primary task data; 2. UBRET data: (a) external data, (b) user-generated data.
Understanding human classification.
The role of an intermediate interface.
These are all candidate planets that we found that had not yet been seen by Kepler. We just had a run on the Keck telescope last night to follow up on some of these.
Audio has a planet in it.
Different scaffolding targeting different audiences.
CI-TEAM is producing a tool for formal educators to share and adapt Zoo resources for the classroom; it has group capability now.
State that these are wireframes, not designs.
What motivates each class of volunteer?
Credit JHU + Adler.
e.g. Awareness: Initial - did they hear of the project? Sustained - contribution is needed. Meta - identifies with the project.
Obligation: Initial - absent. Sustained - believes in project goals. Meta - responsibility to project.
Next up: Giant Pilot Survey, targeted survey.
Development strategy
Citizen science as a facility.
Not open, but nearly open.
Inspired by Google Labs, but with consistent site design and options.
App (now available on iPhone & Android) shows a significant increase in the number of classifications per user. iPhone: mean ~1,900 clicks per user! (Total > 3 million.)
Alerts, social messaging.
Zooniverse home as a personalized, updating portal.