CLIR Fellows - Science Data - 14_0730

Presentation on July 30, 2014 to CLIR Post-Doctoral Fellows at Bryn Mawr College.

Transcript

  • 1. https://twitter.com/cory_foy/status/493759969045409792/photo/1
  • 2. On Science Research Data Jeffrey Lancaster, Ph.D. Emerging Technologies Coordinator Columbia University Libraries jeffrey.lancaster@columbia.edu @j_lancaster
  • 3. How do you feel about being ever-increasingly bombarded by data?
  • 4. On Science Research Data
  • 5. On Science Research Data Computer Science Chemical Biological Geological Engineering Medical Psychological Mathematical Physical Astronomical Etc.  
  • 6. Sciences, Humanities, Social Sciences (<1990s): Text, Census Data, Experimental Data. Highly structured. Wild West, y’all.
  • 7. Sciences, Humanities, Social Sciences (>2010s): Text, Census, Exp. Data, Digital Humanities, Regression Analysis, Data Mining, Code
  • 8. Practices in science research drive institutional approaches to research support. Why?
  • 9. Data Lifecycle
  • 10. Data Lifecycle Data Management Planning Formats, Metadata Storage Funding Etc.
  • 11. Data Lifecycle Original Data Collected Data  
  • 12. Data Lifecycle Formats, Metadata Software Methodology  
  • 13. Data Lifecycle Publishing Copyright Intellectual Prop. Open Access Repositories (Alt)metrics
  • 14. Professor Nick Turro (1938-2012)
  • 15. Applications? Plausible? Useful? Novel? New Questions New Knowledge Patents Background Justification Conferences Community Conversation Data Analysis Confirmation Reagents Protocols Learning Up to Speed Researchers Students Justification Grants JOB & FUNDING PUBLISH PEOPLE FUNDING ANALYSIS & RESULTS EXPERIMENTS RESEARCH PLAN IDEA Big or small Discussion Conferences Talks Articles Thesis Talks The Research Workflow Adapted from Laura Cro/ @ Nature
  • 16. Baseline: What’s the minimum you need to know about a field/subject to be helpful?
  • 17. Case Study 1: I’m on a Boat! R/V Marcus G. Langseth
  • 18. Case Study 1: I’m on a Boat! MCS Acquisition: Syntrak 960-24, SSI Seisnet active tape emulation. Hydrophone arrays: Sentry solid cable, 12.5 meter groups, 150 m sections, up to four towed, separation 50 - 150 meters. Source Arrays: 4 x 10 gun strings, 9 active, one spare / string, 15 meter string length, 1650 cu. in. per string. Source Controller: DigiShot. MCS geometry sensors: Digicourse 5011 Compassbirds, Digicourse Digirange, Tailbuoy GPS, Source GPS (1 per string). MCS Navigation: Concept Systems, Ltd Spectra, Sprint, Reflex. MCS QC: Syntrak SeisNet, ProMaxx, Focus. Communications: HighSeasNet, Inmarsat Sailor 500 FleetBroadband, Iridium Sailor Satellite Phone. Multibeam / Echosounder: Kongsberg EM122 1° x 1°, Knudsen 3260 Echosounder. Marine Mammals Observation / Mitigation: Seiche Passive Acoustic Monitoring Streamer, 2 x Fujinon Big Eye Binoculars. General: Bell BGM-3 Gravimeter, Geometrics 882 Magnetometer, RDI 75 kHz ADCP, Stbd Side A frame, Telescoping Stern Boom, Sippican Mk21 Expendable Probe Launcher, Teflon-lined Uncontaminated Seawater System, Seabird SBE21 Thermosalinograph, LDEO PCO2, RM Young Weather Station
  • 19. Activity: Spreadsheets What do you observe about the data? Can you describe the experiment that was being done? What did the researcher do well? What can be improved in how the data is kept/shared?
  • 20. Case Study 2: Breaking Bad
  • 21. Activity: Lab Notebooks What do you observe about the data? Can you describe the experiment that was being done? What did the researcher do well? What can be improved in how the data is kept/shared?
  • 22. Case Study 3: Needle in a Haystack http://core-genomics.blogspot.com/2012/05/resources-for-public-understanding-of.html
  • 23. Big Data + Data Science CERN: approx. 1 PB/sec = 1000 TB/sec = 1000000 GB/sec filtered to 1 GB/sec http://arstechnica.com/science/2010/08/lhc-computing-grid-pushes-petabytes-of-data-beats-expectations/
  • 24. Big Data + Data Science Institute for Data Sciences & Engineering: •  Cybersecurity Center •  Financial and Business Analytics Center •  Foundations of Data Science Center •  Health Analytics Center •  New Media Center •  Smart Cities Center
  • 25. Conversation: Code What is special about code? What do you need to know to help a patron code? What are best practices for code use? Could you find out the most used bits of code in 2014?
  • 26. Some disciplines have repositories. Some don’t. Some institutions have repositories. Some don’t.
  • 27. figshare.com Share research components to make them discoverable & citable; get metrics
  • 28. DOIs for Code GitHub + Mozilla + Figshare → DOIs Mozilla: Software Carpentry
  • 29. Future: Electronic Lab Notebooks
  • 30. Science metadata depends on the discipline. Sort of.
  • 31. Some Problems with Science (Data)
  • 32. Scientists are lazy. But: They’ll do what funders tell them to.
  • 33. Crowd-Funding Science Funding science may no longer rely upon government. Interested people, engaged by social media presence, are key to raising money from the crowd.
  • 34. Reproducibility Initiative Address the reproducibility of your research in a blind, fee-for-service validation. Validated studies receive a Certificate of Reproducibility acknowledging that their results have been independently reproduced.
  • 35. (Some) Scientists are private. About some things. And some scientists are not.
  • 36. ORCID, ResearcherID, etc. Unique identifiers for researchers to cross-reference publications, activities, etc. John Smith vs. J. Smith vs. John D. Smith vs. J. D. Smith vs. JD Smith vs. … Wang Kim vs. W. Kim vs. Kim Wang vs. K. Wang … ORCID: 0000-0003-0458-2127 ResearcherID: J-6870-2012
  • 37. Sharing doesn’t count. Until now.
  • 38. Run My Code Share code used to analyze data; others can implement the same methodology • Biology • Mathematics • Neuroscience • Statistics • Social sciences • Economics • Econometrics • Finance • Management • R • MATLAB© • C++ • Fortran • Rats • More software will be added soon. Oh, and it’s free!
  • 39. Altmetric(s) Capture overall impact of a publication in blogs, tweets, mentions, news, etc.
  • 40. Applications? Plausible? Useful? Novel? New Questions New Knowledge Patents Background Justification Conferences Community Conversation Data Analysis Confirmation Reagents Protocols Learning Up to Speed Researchers Students Justification Grants JOB & FUNDING PUBLISH PEOPLE FUNDING ANALYSIS & RESULTS EXPERIMENTS RESEARCH PLAN IDEA Big or small Discussion Conferences Talks Articles Thesis Talks So. Many. Tools.
  • 41. Digital Science
  • 42. Digital Science Librarian
  • 43. Questions? Jeffrey Lancaster, Ph.D. Emerging Technologies Coordinator Columbia University Libraries jeffrey.lancaster@columbia.edu @j_lancaster http://www.slideshare.net/jeffreylancaster/
  • 44. Science @ Columbia columbiascience.tumblr.com
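
Slide 36 presents ORCID as a unique identifier that disambiguates researchers whose names collide. As a rough illustration of why such an identifier is more robust than a name string, the sketch below checks the final character of an ORCID iD against the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function names (`orcid_check_digit`, `is_valid_orcid`) are hypothetical and are not from the talk; the check only confirms that an iD is well formed, not that it is registered to anyone.

```python
import re

# Four hyphen-separated groups of four characters; only the last
# character may be "X", which stands for a check value of 10.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is shaped like an ORCID iD and its checksum matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    # The iD shown on slide 36.
    print(is_valid_orcid("0000-0003-0458-2127"))
```

A checksum like this catches most single-character typos when an iD is copied by hand, which is one reason a librarian might validate iDs before loading them into a repository record.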