CLIR Fellows - Science Data - 14_0730

Presentation on July 30, 2014 to CLIR Post-Doctoral Fellows at Bryn Mawr College.

Transcript of "CLIR Fellows - Science Data - 14_0730"

  1. https://twitter.com/cory_foy/status/493759969045409792/photo/1
  2. On Science Research Data. Jeffrey Lancaster, Ph.D., Emerging Technologies Coordinator, Columbia University Libraries. jeffrey.lancaster@columbia.edu, @j_lancaster
  3. How do you feel about being ever-increasingly bombarded by data?
  4. On Science Research Data
  5. On Science Research Data: Computer Science, Chemical, Biological, Geological, Engineering, Medical, Psychological, Mathematical, Physical, Astronomical, Etc.
  6. Sciences, Humanities, Social Sciences (<1990’s): Text; Census Data; Experimental Data. Highly structured; Wild West, y’all.
  7. Sciences, Humanities, Social Sciences (>2010’s): Text; Census; Exp. Data; Digital Humanities; Regression Analysis; Data Mining; Code.
  8. Practices in science research drive institutional approaches to research support. Why?
  9. Data Lifecycle
  10. Data Lifecycle: Data Management Planning; Formats, Metadata; Storage; Funding; Etc.
  11. Data Lifecycle: Original Data; Collected Data
  12. Data Lifecycle: Formats, Metadata; Software; Methodology
  13. Data Lifecycle: Publishing; Copyright; Intellectual Prop.; Open Access; Repositories; (Alt)metrics
  14. Professor Nick Turro (1938-2012)
  15. The Research Workflow (adapted from Laura Croft @ Nature). Stages: JOB & FUNDING; PUBLISH; PEOPLE; FUNDING; ANALYSIS & RESULTS; EXPERIMENTS; RESEARCH PLAN; IDEA. Labels: Applications? Plausible? Useful? Novel?; New Questions; New Knowledge; Patents; Background; Justification; Conferences; Community; Conversation; Data Analysis; Confirmation; Reagents; Protocols; Learning; Up to Speed; Researchers; Students; Justification; Grants; Big or small; Discussion; Conferences; Talks; Articles; Thesis; Talks.
  16. Baseline: What’s the minimum you need to know about a field/subject to be helpful?
  17. Case Study 1: I’m on a Boat! R/V Marcus G. Langseth
  18. Case Study 1: I’m on a Boat! MCS Acquisition: Syntrak 960-24; SSI Seisnet active tape emulation. Hydrophone arrays: Sentry solid cable; 12.5 meter groups; 150m sections; up to four towed; separation 50 - 150 meters. Source Arrays: 4 x 10 gun strings; 9 active, one spare / string; 15 meter string length; 1650 cu. in. per string. Source Controller: DigiShot. MCS geometry sensors: Digicourse 5011 Compassbirds; Digicourse Digirange; Tailbuoy GPS; Source GPS (1 per string). MCS Navigation: Concept Systems, Ltd Spectra, Sprint, Reflex. MCS QC: Syntrak SeisNet; ProMaxx; Focus. Communications: HighSeasNet; Inmarsat Sailor 500 FleetBroadband; Iridium Sailor Satellite Phone. Multibeam / Echosounder: Kongsberg EM122 1° x 1°; Knudsen 3260 Echosounder. Marine Mammals Observation/Mitigation: Seiche Passive Acoustic Monitoring Streamer; 2 x Fujinon Big Eye Binoculars. General: Bell BGM-3 Gravimeter; Geometrics 882 Magnetometer; RDI 75KHz ADCP; Stbd Side A frame; Telescoping Stern Boom; Sippican Mk21 Expendable Probe Launcher; Teflon-lined Uncontaminated Seawater System; Seabird SBE21 Thermosalinograph; LDEO PCO2; RM Young Weather Station.
  19. Activity: Spreadsheets. What do you observe about the data? Can you describe the experiment that was being done? What did the researcher do well? What can be improved in how the data is kept/shared?
  20. Case Study 2: Breaking Bad
  21. Activity: Lab Notebooks. What do you observe about the data? Can you describe the experiment that was being done? What did the researcher do well? What can be improved in how the data is kept/shared?
  22. Case Study 3: Needle in a Haystack. http://core-genomics.blogspot.com/2012/05/resources-for-public-understanding-of.html
  23. Big Data + Data Science. CERN: approx. 1 PB/sec = 1000 TB/sec = 1000000 GB/sec, filtered to 1 GB/sec. http://arstechnica.com/science/2010/08/lhc-computing-grid-pushes-petabytes-of-data-beats-expectations/
  24. Big Data + Data Science. Institute for Data Sciences & Engineering: • Cybersecurity Center • Financial and Business Analytics Center • Foundations of Data Science Center • Health Analytics Center • New Media Center • Smart Cities Center
  25. Conversation: Code. What is special about code? What do you need to know to help a patron code? What are best practices for code use? Could you find out the most used bits of code in 2014?
  26. Some disciplines have repositories. Some don’t. Some institutions have repositories. Some don’t.
  27. figshare.com: Share research components to make them discoverable & citable; get metrics.
  28. DOIs for Code: GitHub + Mozilla + Figshare → DOIs. Mozilla: Software Carpentry
  29. Future: Electronic Lab Notebooks
  30. Science metadata depends on the discipline. Sort of.
  31. Some Problems with Science (Data)
  32. Scientists are lazy. But: They’ll do what funders tell them to.
  33. Crowd-Funding Science: Funding science may no longer rely upon government. Interested people, engaged by social media presence, are key to raising money from the crowd.
  34. Reproducibility Initiative: Address the reproducibility of your research in a blind, fee-for-service validation. Validated studies receive a Certificate of Reproducibility acknowledging that their results have been independently reproduced.
  35. (Some) Scientists are private. About some things. And some scientists are not.
  36. ORCID, ResearcherID, etc.: Unique identifiers for researchers to cross-reference publications, activities, etc. John Smith vs. J. Smith vs. John D. Smith vs. J. D. Smith vs. JD Smith vs. … Wang Kim vs. W. Kim vs. Kim Wang vs. K. Wang … ORCID: 0000-0003-0458-2127. ResearcherID: J-6870-2012.
  37. Sharing doesn’t count. Until now.
  38. Run My Code: Share code used to analyze data; others can implement the same methodology. • Biology • Mathematics • Neuroscience • Statistics • Social sciences • Economics • Econometrics • Finance • Management • R • MATLAB© • C++ • Fortran • Rats • More software will be added soon. Oh, and it’s free!
  39. Altmetric(s): Capture overall impact of a publication in blogs, tweets, mentions, news, etc.
  40. So. Many. Tools. (Research workflow diagram from slide 15, repeated with the same labels.)
  41. Digital Science
  42. Digital Science Librarian
  43. Questions? Jeffrey Lancaster, Ph.D., Emerging Technologies Coordinator, Columbia University Libraries. jeffrey.lancaster@columbia.edu, @j_lancaster, http://www.slideshare.net/jeffreylancaster/
  44. Science @ Columbia: columbiascience.tumblr.com
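
Slide 23 quotes a raw CERN data rate of about 1 PB/sec that is filtered down to 1 GB/sec. The short sketch below only works through that arithmetic with the decimal prefixes the slide uses. The variable names and the one-day figure are illustrative assumptions, not part of the talk.

```python
# Decimal (SI) prefixes, matching the slide: 1 PB = 1000 TB = 1,000,000 GB.
TB_PER_PB = 1000
GB_PER_TB = 1000
SECONDS_PER_DAY = 24 * 60 * 60


def pb_to_gb(petabytes):
    """Convert petabytes to gigabytes using decimal prefixes."""
    return petabytes * TB_PER_PB * GB_PER_TB


raw_gb_per_sec = pb_to_gb(1)   # approximate raw rate quoted on the slide
filtered_gb_per_sec = 1        # rate after filtering, also from the slide

# How much smaller the filtered stream is than the raw stream.
reduction_factor = raw_gb_per_sec / filtered_gb_per_sec

# Assumption for illustration only: the filtered stream runs nonstop for a day.
filtered_tb_per_day = filtered_gb_per_sec * SECONDS_PER_DAY / GB_PER_TB

print(f"raw rate: {raw_gb_per_sec} GB/sec")
print(f"reduction factor: {reduction_factor:.0f}x")
print(f"filtered volume: {filtered_tb_per_day} TB/day")
```

Under these assumptions, filtering keeps roughly one byte in a million, and even the filtered stream would add up to tens of terabytes per day.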
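
Slide 36 contrasts ambiguous name variants with a single ORCID iD. One reason such identifiers are practical is that the last character of an ORCID iD is a check character computed with ISO 7064 MOD 11-2, so many typos can be caught without a lookup. The sketch below is a minimal illustration; the function names are hypothetical, and it checks only the format and checksum, not whether the iD is actually registered.

```python
def orcid_check_character(base_digits):
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Check the 16-character shape and the trailing check character."""
    compact = orcid.replace("-", "").upper()
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check


# The iD shown on slide 36 passes; changing its last digit does not.
print(is_well_formed_orcid("0000-0003-0458-2127"))
print(is_well_formed_orcid("0000-0003-0458-2128"))
```

A name string such as "J. Smith" offers no comparable self-check, which is part of the case the slide makes for persistent researcher identifiers.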