Annotating Research Datasets

A huge amount of incredibly diverse research data remains beyond the reach of internet search engines, peer review processes, and systematic cataloging. The ability of data consumers to annotate data is an important mitigation, harnessing "the crowd" to make it easier for everyone to discover and re-use data.

  • 10 minutes, Day 2, 9am, April 11.
  • One way of looking at Big Data is this graph, which plots dataset size on the vertical axis against the number of datasets on the horizontal axis. While there are some very large, celebrated datasets produced by satellites, ocean sensors, etc., there's a very long tail off to the right of smaller, more obscure datasets that cumulatively account for a large portion of Big Data.
  • There are many more researchers out in the field collecting heterogeneous data, such as species counts obtained by visual sightings.
  • And there are many more grants supporting this kind of research...
  • And those grants are usually much smaller in terms of dollar amounts.
  • As a result, the large, celebrated datasets tend to come with staff positions for data management, along with well-supported, standardized software tools that support rich description and discovery and enforce curation standards. So for a huge number of grants and datasets, especially in the Earth, environmental, and ecological sciences, ...
  • ... there's an ugly truth. Many of these researchers… This amounts to a great deal of inertia that keeps a large part of the scientific record invisible, at risk, and unavailable for re-use.
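
The long-tail claim in the notes above (that many small datasets together hold a large share of all data) can be made concrete with a toy calculation. This is a minimal sketch under a loudly hypothetical assumption: dataset sizes follow a power law in which the k-th largest dataset has size proportional to 1/k^alpha. The dataset count, the choice of the 10 largest as the "celebrated" head, and the function name `tail_share` are illustrative only, not data from the talk.

```python
def tail_share(n_datasets: int, head: int, alpha: float = 1.0) -> float:
    """Fraction of total data volume held outside the `head` largest datasets.

    Hypothetical model: the k-th largest dataset has size proportional to 1 / k**alpha.
    """
    if not 0 <= head <= n_datasets:
        raise ValueError("head must be between 0 and n_datasets")
    # Relative sizes, largest first (k = 1 is the biggest dataset).
    sizes = [1.0 / k ** alpha for k in range(1, n_datasets + 1)]
    total = sum(sizes)
    # Everything after the first `head` entries is the long tail.
    return sum(sizes[head:]) / total


if __name__ == "__main__":
    # Hypothetical: 100,000 datasets, of which the 10 largest are the "celebrated" ones.
    share = tail_share(n_datasets=100_000, head=10)
    print(f"Share of volume in the long tail: {share:.0%}")
```

With alpha = 1 and 100,000 datasets this prints 76%, i.e. the ten largest datasets hold only about a quarter of the total volume. The result is sensitive to the exponent: with a steeper fall-off (alpha = 2, say) the head dominates instead, so the real shape of the curve is an empirical question the model cannot settle.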

    1. Annotating Research Datasets. 11 April 2013, University of California Curation Center, California Digital Library
    2. Term skew. Annotation: the act of adding a note by way of comment or explanation. Genome annotation: the process of attaching biological information to sequences (e.g., the Protein Data Bank annotation manual runs to 247 pages). Research data annotation (?!): adding to opaque data to make it visible, sensible, and valuable.
    3. The Long Tail [chart: size of dataset vs. # datasets]
    4. The Long Tail [chart, adding # researchers]
    5. The Long Tail [chart, adding # grants]
    6. The Long Tail [chart, adding grant size ($) alongside size of dataset]
    7. The Long Tail [chart: the head is labeled "With data managers and fancy tools", the tail "Do-it-yourself tools"]
    8. UGLY TRUTH. Many researchers…
       – have limited funding for data services
       – are not taught data management
       – don't know what metadata or data centers are
       – don't share data publicly or store it in an archive
       – aren't convinced they should share data
       (Image from Flickr by puck90)
    9. The research data problem
       Journal article                          Research data
       Uniquely and persistently identified     Nope
       Concept of "publish"                     Not really
       Multiple copies                          Typically one
       Easily findable                          Difficult
       Impact metrics, etc.                     Nope
       Curation funding                         Barely
       Research data is ripe for crowd-sourced annotation.
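
Slide 2 defines research data annotation as adding to opaque data to make it visible, sensible, and valuable, and slide 9 argues that the crowd is well placed to do it. A minimal sketch of what a crowd-contributed annotation record and a tag-based discovery index might look like follows. The schema is hypothetical: the names `DatasetAnnotation` and `AnnotationIndex`, their fields, and the example identifier (which uses the ARK test prefix 99999) are illustrative and do not come from the talk or from any particular system.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetAnnotation:
    """One crowd-contributed note about a dataset (hypothetical schema)."""
    dataset_id: str  # persistent identifier of the annotated dataset, e.g. an ARK or DOI
    author: str      # who contributed the note
    body: str        # free-text comment or explanation
    tags: frozenset[str] = frozenset()  # keywords that aid discovery
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AnnotationIndex:
    """In-memory store that makes annotated datasets findable by tag (hypothetical)."""

    def __init__(self) -> None:
        self._by_dataset: dict[str, list[DatasetAnnotation]] = defaultdict(list)
        self._by_tag: dict[str, set[str]] = defaultdict(set)

    def add(self, note: DatasetAnnotation) -> None:
        # Attach the note to its dataset and index the dataset under each tag.
        self._by_dataset[note.dataset_id].append(note)
        for tag in note.tags:
            self._by_tag[tag.lower()].add(note.dataset_id)

    def notes_for(self, dataset_id: str) -> list[DatasetAnnotation]:
        # .get avoids creating empty entries for unknown identifiers.
        return list(self._by_dataset.get(dataset_id, []))

    def find(self, tag: str) -> set[str]:
        # Case-insensitive lookup of dataset identifiers carrying a tag.
        return set(self._by_tag.get(tag.lower(), set()))


if __name__ == "__main__":
    index = AnnotationIndex()
    index.add(DatasetAnnotation(
        dataset_id="ark:/99999/fk4example",
        author="field volunteer",
        body="Species counts from visual sightings; units are individuals per transect.",
        tags=frozenset({"birds", "species-counts"}),
    ))
    print(index.find("Birds"))
```

Keying every note on a persistent identifier is the design choice that matters here: it is what lets annotations from many contributors accumulate on the same dataset, addressing the "uniquely and persistently identified" gap on slide 9.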
