School of Information Studies
Syracuse University

Developing Data Services to Support eScience/eResearch
2012 Priscilla M. Mayden Lecture
eScience and the Evolution of Library Services

Jian Qin
School of Information Studies
Syracuse University
http://eslib.ischool.syr.edu/

February 22, 2012
  




The morning ahead

An environmental scan
•  E-Science, cyberinfrastructure, and data
•  What do all these have to do with me?

Case study: The gravitational wave research data management

Group work: Role play in developing data management initiatives



Priscilla M. Mayden Lecture 2012, Utah
  



An environmental scan
•  E-Science, cyberinfrastructure, and data
•  What do all these have to do with me?

Overview of E-Science and Data
Characteristics of e-science
Data sets, data collections, and data repositories
Why does it matter to libraries?
  
  




E-Science

“In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet.”

National e-Science Centre. (2008). Defining e-Science. http://www.nesc.ac.uk/nesc/define.html


  




Characteristics of e-science
•  Digital data driven
•  Distributed
•  Collaborative
•  Trans-disciplinary
•  Fuses pillars of science
   –  Experiment
   –  Theory
   –  Model/simulation
   –  Observation/correlation

Greer, Chris. (2008). E-Science: Trends, Transformations & Responses. In: Reinventing Science Librarianship: Models for the Future, October 2008. http://www.arl.org/bm~doc/ff08greer.pps


  



                        	
Shift in Science Paradigms
•  Thousand years ago: Science was empirical, describing natural phenomena
•  A few hundred years ago: Theoretical branch using models, generalizations
•  A few decades ago: A computational approach simulating complex phenomena
•  Today: Data exploration (eScience) unify theory, experiment, and simulation
   –  Data captured by instruments or generated by simulator
   –  Processed by software
   –  Information/Knowledge stored in computer
   –  Scientist analyzes database/files using data management and statistics

Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
  	
  
  
X-Informatics
•  The evolution of X-Informatics and Computational-X for each discipline X
•  How to codify and represent our knowledge

[Diagram: Experiments & Instruments, Other Archives, Literature, and Simulations supply facts; questions go in and answers come out]

The Generic Problems
•  Data ingest
•  Managing a petabyte
•  Common schema
•  How to organize it
•  How to reorganize it
•  How to share with others
•  Query and Visualization tools
•  Building and executing models
•  Integrating data and Literature
•  Documenting experiments
•  Curation and long-term preservation

Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
  




Useful resources
http://research.microsoft.com/en-us/collaboration/fourthparadigm/

Part 2: Health and Wellbeing
•  The healthcare singularity and the age of semantic medicine
•  Healthcare delivery in developing countries: challenges and potential solutions
•  Discovering the wiring diagram of the brain
•  Toward a computational microscope for neurobiology
•  A unified modeling approach to data-intensive healthcare
•  Visualization in process algebra models of biological systems

  




What are data?
What are some of the major data formats?
Why data formats?

FUNDAMENTALS OF DATA
  


  



What are data? (1)

An artist’s conception (above) depicts fundamental NEON observatory instrumentation and systems as well as potential spatial organization of the environmental measurements made by these instruments and systems. http://www.nsf.gov/pubs/2007/nsf0728/nsf0728_4.pdf

  



What are data? (2)
  




  




Medical and health data
  


Standardization

Compliance

Security




http://www.weforum.org/issues/charter-health-data
  




The multi-dimensions of data
•  Research orientation
•  Data types
•  Data formats
•  Levels of processing




  




Scientific data formats
•  Common data format
•  Image formats
•  Matrix formats
•  Microarray file formats
•  Communication protocols
  




  



Scientific & medical data formats
•  Medical and Physiological Data Formats
   –  BDF: BioSemi data format (.bdf)
   –  EDF: European data format (.edf)
•  Molecular Biology Data Formats
   –  PDB: Protein Data Bank format (.pdb)
   –  MMCIF: MMCIF 3D molecular model format (.cif)
•  Medical Imaging
   –  DICOM: DICOM annotated medical images (.dcm, .dic)
•  Chemical Formats
   –  XYZ: XYZ molecule geometry file (.xyz)
   –  MOL: MDL MOL format (.mol)
   –  MOL2: Tripos MOL2 format (.mol2)
   –  SDF: MDL SDF format (.sdf)
   –  SMILES: SMILES chemical format (.smi)
•  Bioinformatics Formats
   –  GenBank: NCBI GenBank sequence format (.gb, .gbk)
   –  FASTA: bioinformatics sequence format (.fasta, .fa, .fsa, .mpfa)
   –  NEXUS: NEXUS phylogenetic data format (.nex, .ndk)
  




Why data formats?
•  Archiving
   –  Preservation for posterity
•  Storage
   –  Availability for “arbitrary” access
•  Transmission
   –  delivery across
      •  hardware
      •  software
      •  administrative
   –  system boundaries
•  Analysis
   –  availability for processing
  
  




Summary
•  Scientific data formats are closely tied to scientific computing
   –  Data structure, model, and attributes
   –  Self-descriptive with header/metadata
   –  API for manipulating the data
   –  Interoperability: conversion between different formats
•  No one-format-fits-all standard
•  Each standard has one or more tools for creating, editing, and annotating datasets
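
The ideas of a self-descriptive header and of conversion between formats can be sketched with the Python standard library alone. The layout below ("#"-prefixed `key: value` metadata lines followed by CSV rows) is an assumption made for this illustration, not a published standard. The function names and sample values are hypothetical.

```python
import csv
import io
import json


def read_self_described(text):
    """Split '#'-prefixed metadata lines from the CSV rows that follow them."""
    metadata = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Metadata line of the assumed form "# key: value".
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return metadata, rows


def to_json(metadata, rows):
    """Convert the parsed content to JSON, keeping the metadata with the data."""
    return json.dumps({"metadata": metadata, "records": rows}, indent=2, sort_keys=True)


# Hypothetical sample file content.
sample = """# title: Hypothetical streamflow sample
# units: liters per second
timestamp,flow
2012-02-22T00:00,1.5
2012-02-22T00:15,1.7
"""
metadata, rows = read_self_described(sample)
converted = to_json(metadata, rows)
```

Keeping the header with the records during conversion is the point of the sketch: the units and title travel with the numbers into the new format.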
  	
  
	
  
  




What is a dataset?
What are some of the metadata standards for describing datasets?
What is data management?

DATASETS, METADATA, AND DATA MANAGEMENT
  

  




Dataset classification
Volume
•  Large-volume
•  Small-volume
  




  


Ecological data example: Instantaneous streamflow by watershed
http://www.hubbardbrook.org/data/dataset.php?id=1




  




Diabetes data and trends, country level estimates:
http://apps.nccd.cdc.gov/DDT_STRS2/NationalDiabetesPrevalenceEstimates.aspx?mode=PHY

Diabetes Data & Trends home page:
http://apps.nccd.cdc.gov/ddtstrs/default.aspx
Clinical trials data management:
  

http://www.clinicaltrials.gov/ct2/show/NCT00006286?term=TADS+NIMH&rank=1




  




Common in the examples
•  Attributes of a dataset tell users/managers:
   –  What the dataset is about
   –  How the data were collected
   –  To which project the data are related
   –  Who was responsible for data collection
   –  Whom you may contact to obtain the data
   –  What publications the data have generated
   –  ??
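
The attributes listed above can be sketched as a simple record type. This is an illustrative structure only, not one of the metadata standards discussed in this lecture. The class name `DatasetDescription`, its field names, and the sample values are all hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetDescription:
    """One field per question a dataset's attributes should answer."""
    subject: str                 # what the dataset is about
    collection_method: str       # how the data were collected
    project: str                 # which project the data relate to
    collectors: List[str]        # who was responsible for data collection
    contact: str                 # whom to contact to obtain the data
    publications: List[str] = field(default_factory=list)  # resulting publications

    def missing_fields(self) -> List[str]:
        """Return the names of attributes that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical record with two attributes left blank.
record = DatasetDescription(
    subject="Instantaneous streamflow by watershed",
    collection_method="",
    project="Hypothetical project name",
    collectors=["A. Researcher"],
    contact="data-manager@example.org",
)
gaps = record.missing_fields()
```

A check like `missing_fields` shows how a description can be audited for completeness before a dataset is deposited.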
  


  


Metadata standards in medical & health sciences

Structure
•  Healthcare: HL7
•  Medical images: DICOM
•  Bioinformatics: GenBank

Semantics
•  NCBI Taxonomy
•  NCBO Bioportal
•  UMLS
•  MeSH (Medical Subject Headings)
•  SNOMED CT (Systematized Nomenclature of Medicine--Clinical Terms)



  




Research data collections

Size                        Metadata standards         Management
Larger, discipline-based    Multiple, comprehensive    Organized, institutionalized
Smaller, team-based         None or random             Heroic individual inside the team

  




Research collections
•  Limited processing or long-term management
•  Do not conform to any data standards
•  Varying sizes and formats of data files
•  Low level of processing, lack of plan for data products
•  Low awareness of metadata standards and data management issues
  
  



Resource collections
•  Authored by a community of investigators, within a domain of science or engineering
•  Developed with community level standards
•  Lifetime is between mid- and long-term
  




  




Reference collection
•  Example: Global Biodiversity Information Facility
   –  Created by large segments of the science community
   –  Conforms to robust, well-established and comprehensive standards, e.g.
      •  ABCD (Access to Biological Collection Data)
      •  Darwin Core
      •  DiGIR (Distributed Generic Information Retrieval)
      •  Dublin Core Metadata standard
      •  GGF (Global Grid Forum)
      •  Invasive Alien Species Profile
      •  LSID (Life Sciences Identifier)
      •  OGC (Open Geospatial Consortium)
  



  


Datasets, data collections, and data repositories
•  Data collections are built for larger segments of science and engineering
•  Datasets
   –  typically centered around an event or a study
   –  contain a single file or multiple files in various formats
   –  coupled with documentation about the background of data collection and processing
•  Data repository: System for storing, managing, preserving, and providing access to datasets
   –  A repository may contain one or more data collections
   –  A data collection may contain one or more datasets
   –  A dataset may contain one or more data files
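
The containment relationships described above (repository, data collection, dataset, data file) can be sketched as nested record types. This is a minimal illustration under the assumption that each level simply holds a list of the next. All class, field, and sample names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    files: List[str] = field(default_factory=list)  # one or more data files


@dataclass
class DataCollection:
    name: str
    datasets: List[Dataset] = field(default_factory=list)  # one or more datasets


@dataclass
class DataRepository:
    name: str
    collections: List[DataCollection] = field(default_factory=list)

    def file_count(self) -> int:
        """Count the data files held across all collections and datasets."""
        return sum(len(ds.files) for col in self.collections for ds in col.datasets)


# Hypothetical repository with one collection holding two datasets.
repo = DataRepository(
    name="Hypothetical institutional repository",
    collections=[
        DataCollection(
            name="Hypothetical watershed collection",
            datasets=[
                Dataset("streamflow", ["flow_2011.csv", "readme.txt"]),
                Dataset("precipitation", ["precip_2011.csv"]),
            ],
        )
    ],
)
total_files = repo.file_count()
```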
  




Data management for science research
•  Definition from Wikipedia: http://en.wikipedia.org/wiki/Data_management
•  Key concepts in data management:
   –  Data ownership
   –  Data collection
   –  Data storage
   –  Data protection
   –  Data retention
   –  Data analysis
   –  Data sharing
   –  Data reporting

How do they relate to responsible conduct of research?
http://ori.hhs.gov/images/ddblock/data.pdf
  



  




An attempt to define DM
•  In the context of libraries:
   –  Data management is a process in which librarians plan, design, and implement data services to support eScience/eResearch.
   –  Data services that libraries may provide:
      •  Institutional or community data repositories
      •  Data management plan for pre- and post-award of grants
      •  Metadata creation, linking, and discovery
      •  Data archiving, preservation, and curation
      •  Consultation for research group’s data management projects
      •  Data management and data literacy training for graduate students and faculty
  

  




Initiatives in research libraries
•  Data support and services in institutions: 45%
•  Libraries involved in supporting eScience: 73%
•  Pressure points:
   –  Lack of resources
   –  Difficulty acquiring the appropriate staff and expertise to provide eScience and data management or curation services
   –  Lack of a unifying direction on campus

Source: Soehner, C., Steeves, C. & Ward, J. (2010). E-Science and data support services: A study of ARL member institutions. http://www.arl.org/bm~doc/escience_report2010.pdf




Data preservation challenges
•  Data formats
    –  Vary in data types, e.g. vector and raster data types
    –  Format conversions, e.g. from an old version to a newer one
•  Data relations
    –  e.g. there are data models, annotations, classification schemes, and symbolization files for a digital map
•  Semantic issues
    –  Naming datasets and attributes







Data access challenges
•  Reliability
•  Authenticity
•  Leverage technology to make data access easier and more effective
    –  Cross-database search
    –  Integration applications
    –  “Science-ready” datasets







Supporting digital research data
•  Lifecycle of research data
    –  Create: data creation/capture/gathering from laboratory experiments, field work, surveys, devices, media, simulation output…
    –  Edit: organize, annotate, clean, filter…
    –  Use/reuse: analyze, mine, model, derive additional data, visualize, input to instruments/computers
    –  Publish: disseminate, create portals/databases, associate with literature
    –  Preserve/destroy: store/preserve, store/replicate/preserve, store/ignore, destroy…





Supporting data management
•  The data deluge
    –  Numerical, image, video
    –  Models, simulations, bit streams
    –  XML, CSV, DB, HTML
•  Researchers need:
    –  Specialized search engines to discover the data they need
    –  Powerful data mining tools to use and analyze the data






Research data management
[Diagram labels: Institution; Community; eScience librarian; Financial and policy support; Science domain; Data content idiosyncrasies; User requirements]
Evolving and interconnecting: institutional repository, community repository, national repository, international repository





Implications for the scholarly communication process
•  Publishing
    –  Data publishing
    –  New scholarly publishing models: open access, institutional and community repositories, self-publishing, library publishing, ...
•  Curation
    –  Maintaining, preserving, and adding value to digital research data throughout its lifecycle.
•  Archiving
    –  The long-term storage, retrieval, and use of scientific data and methods.






Summary
•  E-Science development has raised expectations for research libraries
    –  Working knowledge and skills in e-Science
    –  Focus on process (data and team science) rather than product (reference services)
    –  Proactive, collaborative, integrative, and interdisciplinary








Case Study:
Learning Data Management Needs from Scientists




Gravitational Wave (GW) Research
  








What is the problem?
•  Tracking data output and workflows is difficult due to a lack of provenance data
•  Search of datasets is limited due to a lack of specific options
•  Within the LIGO community, data sharing and reuse are difficult without provenance metadata
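To make "provenance metadata" concrete, here is a minimal sketch of what recording it could look like. It is not the schema used by the LIGO community: the class name, the field names, and the sample dataset ids are all hypothetical and chosen only for illustration. The idea is that each dataset records the step that produced it and the datasets it was derived from, so the chain back to the raw data can be followed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance entry for one dataset (illustrative only)."""
    dataset_id: str
    produced_by: str                                   # workflow step that wrote the dataset
    inputs: List[str] = field(default_factory=list)    # ids of the datasets it was derived from
    parameters: Dict[str, str] = field(default_factory=dict)  # settings used by that step


def trace_lineage(dataset_id: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Return every upstream dataset id, nearest ancestors first."""
    lineage: List[str] = []
    queue = list(records[dataset_id].inputs) if dataset_id in records else []
    while queue:
        current = queue.pop(0)
        if current in lineage:
            continue  # shared ancestor, already listed
        lineage.append(current)
        if current in records:
            queue.extend(records[current].inputs)
    return lineage


# Made-up example records; ids, step names, and parameters are placeholders.
records = {
    "raw-001": ProvenanceRecord("raw-001", produced_by="data capture"),
    "clean-001": ProvenanceRecord("clean-001", produced_by="filtering",
                                  inputs=["raw-001"],
                                  parameters={"setting": "example"}),
    "result-001": ProvenanceRecord("result-001", produced_by="analysis",
                                   inputs=["clean-001"]),
}

print(trace_lineage("result-001", records))
```

With records like these, a search interface could offer options such as "show everything derived from raw-001", which is the kind of specific search option the slide says is missing.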
  








Understand the research workflow
•  Interview the scientist
    –  Listening (good listening skills)
    –  Asking questions (don’t be afraid of asking questions)
    –  Use your librarian brain to ingest the conversation:
         •  How does the research flow from one point to the next?
         •  What do the research input and output consist of at each stage of research, in terms of data?


Mapping out the knowledge v0.1
Mapping out the knowledge v0.2
Mapping out the knowledge v1.0



Lessons learned
•  Science is learnable even if you don’t have a subject background
    –  Learn enough to understand the research process and workflow
•  Scientists are eager to get help
•  Librarians need to be technically minded
    –  Data, metadata, database
    –  Structures, models, workflows
•  Librarians need to be good listeners while staying good conversation leaders
    –  Know when and how to lead the conversation to get what you need for data management planning and implementation
    –  Do your homework on the subject so that you can be an intelligent listener






Case Discussions


Case Study #1: To build or not to build a data repository?
A university library has developed an institutional repository for preserving and
providing access to the scholarly output of the researchers in this institution.
A new challenge now arises from e-science research: funding agencies demand data
management plans, and authors and users want publications linked to the underlying
data. You already know that some faculty use their disciplinary data repository for
submitting their datasets (e.g., GenBank for microbiology research data). The problem
you face now is whether an institutional data repository should be built for those who
do “small science” and have neither the funding nor the expertise to manage their data.

Questions to be addressed:
•  What are the strategies you will use to approach the problem?
•  What are the possible solutions for the problem?
•  What are some of the tradeoffs for the solutions you will adopt?







Case study #2: Developing a data taxonomy
The concept of research data management is unfamiliar to many faculty as
well as to your library staff. What is data? What is a data set? These seemingly
simple terms can be very confusing and are interpreted differently in
different contexts and disciplines. As part of the data management strategy,
you decide to develop an authoritative data taxonomy for the campus research
community. This data taxonomy will benefit the creation and use of institutional
data policies, data repository or repositories, and the data management plans
required by funding agencies.

Questions to be addressed:
•  What should the data taxonomy include?
•  What form should it take, a database-driven website or a static HTML page?
•  Who should be the constituencies in this process?
•  Who will be the maintainer once the taxonomy is released?







Case study #3: Developing a data policy
Data policies play an important role in governing how data will be managed,
shared, and accessed. They are also an instrument for fending off potential legal
problems. There are several types of data policy: data access and use, data
publishing, and data management. Your university’s Office of Sponsored
Research has some existing policy on data, but it is neither systematic nor
complete. Many of the terms were defined years ago and do not cover newer
areas such as the embargo period for data. As the university has decided to
build a data repository for managing and preserving datasets, a data policy has
become one of the top priorities for both the institution and the data repository.

Questions to be addressed:
•  What should the data policy include?
•  Who should be the constituencies in this process?
•  Who will be the interpretation authority for the data policy?







Case study #4: Cataloging datasets
Describing datasets is the process of creating metadata for datasets. In
scientific disciplines, several metadata standards have been developed, e.g.,
the Content Standard for Digital Geospatial Metadata (CSDGM), Darwin Core,
and Ecological Metadata Language (EML). Each of these metadata standards
contains hundreds of elements, and using them requires training in both metadata
and the subject area. Moreover, creating a single record with any of these
standards requires a tremendous time investment. Your library has neither
such specialized personnel nor the funds to hire new staff for the job. The
existing staff has some general metadata skills, such as Dublin Core. In deciding
on the metadata schema for your data repository, you need to address these questions:

•  Should I adopt a scientific metadata standard or develop one tailored to our
needs?
•  How can I learn what metadata elements are critical to dataset submitters and
searchers?
•  What are some of the benefits and disadvantages of adopting a standard versus
developing a local schema?
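One way to explore the trade-off in these questions is to try describing a dataset with the general schema the staff already knows. The sketch below is an illustration under stated assumptions, not a recommended schema: it uses the fifteen element names of the Dublin Core Metadata Element Set, while the list of locally required elements, the function name check_record, and the sample record are all hypothetical.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Assumption: a local policy choice, not part of Dublin Core itself.
LOCALLY_REQUIRED = {"title", "creator", "date", "identifier", "rights"}


def check_record(record: dict) -> dict:
    """Report elements the schema cannot hold and required elements left empty."""
    unknown = sorted(set(record) - DUBLIN_CORE_ELEMENTS)
    missing = sorted(e for e in LOCALLY_REQUIRED if not record.get(e))
    return {"unknown": unknown, "missing": missing}


# Made-up dataset description; "sampling_method" stands in for the kind of
# domain-specific detail that a scientific standard would cover.
record = {
    "title": "Example survey dataset",
    "creator": "Example Lab",
    "date": "2012-02-22",
    "type": "Dataset",
    "sampling_method": "transect",
}

print(check_record(record))
```

Elements reported as unknown are candidates for either a local extension or a move to a discipline standard; collecting them from real submitters is one way to learn which elements matter to them.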





Case study #5: Evaluating data repository tools
  Research data, a driving force for e-science, is inherently tool-intensive.
  Tools related to data management can be divided into two broad
  categories: those for creating metadata records and those for data repository
  management. An academic institution has decided to build its own data repository
  as part of the supporting service that helps researchers meet the data management
  plan requirements of funding agencies. This data repository development task
  was handed down to the library. You, the library director, have to decide whether
  to develop an in-house system or use an off-the-shelf software system. As
  usual, you put together a taskforce to find a solution to this challenge. The
  questions to be addressed by the taskforce include:

  •  What are the options available to us?
  •  What evaluation criteria are the most important to our goal?
  •  What are the limitations for us in adopting one option or the other?
  •  How will this option interoperate with the existing institutional repository
  system? Or, can the existing repository system be used for data repository
  purposes?

                                      Priscilla M. Mayden Lecture 2012, Utah                                                                           54

More Related Content

What's hot

June 2020: Most Downloaded Article in Soft Computing
June 2020: Most Downloaded Article in Soft Computing  June 2020: Most Downloaded Article in Soft Computing
June 2020: Most Downloaded Article in Soft Computing
ijsc
 
NKU STEM Programs
NKU STEM ProgramsNKU STEM Programs
A Tableau-based Federated Reasoning Algorithm for Modular Ontologies
A Tableau-based Federated Reasoning Algorithm for Modular OntologiesA Tableau-based Federated Reasoning Algorithm for Modular Ontologies
A Tableau-based Federated Reasoning Algorithm for Modular Ontologies
Jie Bao
 
Top 10 neural networks
Top 10 neural networksTop 10 neural networks
Top 10 neural networks
ijsc
 
International Perspectives: Visualization in Science and Education
International Perspectives: Visualization in Science and EducationInternational Perspectives: Visualization in Science and Education
International Perspectives: Visualization in Science and Education
Liz Dorland
 
Curriculum vitae 170521 updated new
Curriculum vitae 170521 updated newCurriculum vitae 170521 updated new
Curriculum vitae 170521 updated new
AbimbolaAfolayan2
 
Curriculum Vitae
Curriculum VitaeCurriculum Vitae
Curriculum Vitae
butest
 

What's hot (7)

June 2020: Most Downloaded Article in Soft Computing
June 2020: Most Downloaded Article in Soft Computing  June 2020: Most Downloaded Article in Soft Computing
June 2020: Most Downloaded Article in Soft Computing
 
NKU STEM Programs
NKU STEM ProgramsNKU STEM Programs
NKU STEM Programs
 
A Tableau-based Federated Reasoning Algorithm for Modular Ontologies
A Tableau-based Federated Reasoning Algorithm for Modular OntologiesA Tableau-based Federated Reasoning Algorithm for Modular Ontologies
A Tableau-based Federated Reasoning Algorithm for Modular Ontologies
 
Top 10 neural networks
Top 10 neural networksTop 10 neural networks
Top 10 neural networks
 
International Perspectives: Visualization in Science and Education
International Perspectives: Visualization in Science and EducationInternational Perspectives: Visualization in Science and Education
International Perspectives: Visualization in Science and Education
 
Curriculum vitae 170521 updated new
Curriculum vitae 170521 updated newCurriculum vitae 170521 updated new
Curriculum vitae 170521 updated new
 
Curriculum Vitae
Curriculum VitaeCurriculum Vitae
Curriculum Vitae
 

Similar to Developing Data Services to Support eScience/eResearch

Data Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsData Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future Jobs
Jian Qin
 
The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)
The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)
The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)
Danube University Krems, Centre for E-Governance
 
Scientific data management (v2)
Scientific data management (v2)Scientific data management (v2)
Scientific data management (v2)
Jian Qin
 
Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...
IL Group (CILIP Information Literacy Group)
 
Supporting research life cycle librarians
Supporting research life cycle   librariansSupporting research life cycle   librarians
Supporting research life cycle librarians
Sherry Lake
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data Science
Philip Bourne
 
Bioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future PerspectivesBioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future Perspectives
University of Malaya
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
Paul Groth
 
Open Science
Open Science Open Science
Open Science
Andrea Miller-Nesbitt
 
ALT-C2012 Learning Analytics Symposium
ALT-C2012 Learning Analytics SymposiumALT-C2012 Learning Analytics Symposium
ALT-C2012 Learning Analytics Symposium
Simon Buckingham Shum
 
Context-Aware Adaptive and Personalized Mobile Learning
Context-Aware Adaptive and Personalized Mobile Learning Context-Aware Adaptive and Personalized Mobile Learning
Context-Aware Adaptive and Personalized Mobile Learning
Advanced Digital Systems and Services for Education and Learning (ASK)
 
Tacoma, WA 98422
Tacoma, WA 98422Tacoma, WA 98422
Tacoma, WA 98422
butest
 
Organizational Implications of Data Science Environments in Education, Resear...
Organizational Implications of Data Science Environments in Education, Resear...Organizational Implications of Data Science Environments in Education, Resear...
Organizational Implications of Data Science Environments in Education, Resear...
Victoria Steeves
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
Paul Groth
 
The Perils and Promise of Environmental Data Science
The Perils and Promise of Environmental Data ScienceThe Perils and Promise of Environmental Data Science
The Perils and Promise of Environmental Data Science
Dawn Wright
 
Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...
Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...
Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...
Lina Markauskaite
 
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
National Information Standards Organization (NISO)
 
UVA School of Data Science
UVA School of Data ScienceUVA School of Data Science
UVA School of Data Science
Philip Bourne
 
Neville Prendergast "E-Science - What is it?"
Neville Prendergast "E-Science - What is it?"Neville Prendergast "E-Science - What is it?"
Neville Prendergast "E-Science - What is it?"
The TMC Library
 
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
ASIS&T
 

Similar to Developing Data Services to Support eScience/eResearch (20)

Data Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsData Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future Jobs
 
The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)
The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)
The Internet, Science, and Transformations of Knowledge (Ralph Schroeder)
 
Scientific data management (v2)
Scientific data management (v2)Scientific data management (v2)
Scientific data management (v2)
 
Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...
 
Supporting research life cycle librarians
Supporting research life cycle   librariansSupporting research life cycle   librarians
Supporting research life cycle librarians
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data Science
 
Bioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future PerspectivesBioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future Perspectives
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
Open Science
Open Science Open Science
Open Science
 
ALT-C2012 Learning Analytics Symposium
ALT-C2012 Learning Analytics SymposiumALT-C2012 Learning Analytics Symposium
ALT-C2012 Learning Analytics Symposium
 
Context-Aware Adaptive and Personalized Mobile Learning
Context-Aware Adaptive and Personalized Mobile Learning Context-Aware Adaptive and Personalized Mobile Learning
Context-Aware Adaptive and Personalized Mobile Learning
 
Tacoma, WA 98422
Tacoma, WA 98422Tacoma, WA 98422
Tacoma, WA 98422
 
Organizational Implications of Data Science Environments in Education, Resear...
Organizational Implications of Data Science Environments in Education, Resear...Organizational Implications of Data Science Environments in Education, Resear...
Organizational Implications of Data Science Environments in Education, Resear...
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
The Perils and Promise of Environmental Data Science
The Perils and Promise of Environmental Data ScienceThe Perils and Promise of Environmental Data Science
The Perils and Promise of Environmental Data Science
 
Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...
Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...
Interdisciplinarity and Epistemic Fluency: What makes complex knowledge work ...
 
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
Sept 18 NISO Webinar: Research Data Curation, Part 2: Libraries and Big Data ...
 
UVA School of Data Science
UVA School of Data ScienceUVA School of Data Science
UVA School of Data Science
 
Neville Prendergast "E-Science - What is it?"
Neville Prendergast "E-Science - What is it?"Neville Prendergast "E-Science - What is it?"
Neville Prendergast "E-Science - What is it?"
 
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
 

Developing Data Services to Support eScience/eResearch

  • 1. Developing Data Services to Support eScience/eResearch
       2012 Priscilla M. Mayden Lecture: eScience and the Evolution of Library Services
       Jian Qin
       School of Information Studies, Syracuse University
       http://eslib.ischool.syr.edu/
       February 22, 2012
  • 2. The morning ahead
       An environmental scan
       • E-Science, cyberinfrastructure, and data
       • What do all these have to do with me?
       Case study: The gravitational wave research data management
       Group work: Role play in developing data management initiatives
       Priscilla M. Mayden Lecture 2012, Utah
  • 3. An environmental scan
       • E-Science, cyberinfrastructure, and data
       • What do all these have to do with me?
       Overview of E-Science and Data
       Characteristics of e-science
       Data sets, data collections, and data repositories
       Why does it matter to libraries?
  • 4. E-Science
       “In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet.”
       National e-Science Center. (2008). Defining e-Science. http://www.nesc.ac.uk/nesc/define.html
  • 5. Characteristics of e-science
       • Digital data driven
       • Distributed
       • Collaborative
       • Trans-disciplinary
       • Fuses pillars of science
         – Experiment
         – Theory
         – Model/simulation
         – Observation/correlation
       Greer, Chris. (2008). E-Science: Trends, Transformations & Responses. In: Reinventing Science Librarianship: Models for the Future, October 2008. http://www.arl.org/bm~doc/ff08greer.pps
  • 6. Shift in Science Paradigms
       • Thousand years ago: Science was empirical, describing natural phenomena
       • A few hundred years ago: Theoretical branch using models, generalizations
       • A few decades ago: A computational approach simulating complex phenomena
       • Today: Data exploration (eScience) unifies theory, experiment, and simulation
         – Data captured by instruments or generated by simulator
         – Processed by software
         – Information/knowledge stored in computer
         – Scientist analyzes database/files using data management and statistics
       Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
  • 7. X-Informatics
       • The evolution of X-Informatics and Computational-X for each discipline X
       • How to codify and represent our knowledge
       [Diagram labels: Experiments & Instruments, Other Archives, Literature, Simulations; facts, questions, answers]
       The Generic Problems
       • Data ingest
       • Managing a petabyte
       • Common schema
       • How to organize it
       • How to reorganize it
       • How to share with others
       • Query and visualization tools
       • Building and executing models
       • Integrating data and literature
       • Documenting experiments
       • Curation and long-term preservation
       Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
  • 8. Useful resources
       Part 2: Health and Wellbeing
       • The healthcare singularity and the age of semantic medicine
       • Healthcare delivery in developing countries: challenges and potential solutions
       • Discovering the wiring diagram of the brain
       • Toward a computational microscope for neurobiology
       • A unified modeling approach to data-intensive healthcare
       • Visualization in process algebra models of biological systems
       http://research.microsoft.com/en-us/collaboration/fourthparadigm/
  • 9. FUNDAMENTALS OF DATA
       What are data?
       What are some of the major data formats?
       Why data formats?
  • 10. What are data? (1)
       An artist’s conception depicts fundamental NEON observatory instrumentation and systems as well as potential spatial organization of the environmental measurements made by these instruments and systems. http://www.nsf.gov/pubs/2007/nsf0728/nsf0728_4.pdf
  • 11. What are data? (2)
  • 12. Medical and health data
       • Standardization
       • Compliance
       • Security
       http://www.weforum.org/issues/charter-health-data
  • 13. The multi-dimensions of data
       • Research orientation
       • Data types
       • Data formats
       • Levels of processing
  • 14. Scientific data formats
       • Common data format
       • Image formats
       • Matrix formats
       • Microarray file formats
       • Communication protocols
  • 15. Scientific & medical data formats
       • Medical and Physiological Data Formats
         – BDF: BioSemi data format (.bdf)
         – EDF: European data format (.edf)
       • Molecular Biology Data Formats
         – PDB: Protein Data Bank format (.pdb)
         – MMCIF: MMCIF 3D molecular model format (.cif)
       • Medical Imaging
         – DICOM: DICOM annotated medical images (.dcm, .dic)
       • Chemical Formats
         – XYZ: XYZ molecule geometry file (.xyz)
         – MOL: MDL MOL format (.mol)
         – MOL2: Tripos MOL2 format (.mol2)
         – SDF: MDL SDF format (.sdf)
         – SMILES: SMILES chemical format (.smi)
       • Bioinformatics Formats
         – GenBank: NCBI GenBank sequence format (.gb, .gbk)
         – FASTA: bioinformatics sequence format (.fasta, .fa, .fsa, .mpfa)
         – NEXUS: NEXUS phylogenetic data format (.nex, .ndk)
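The format list on slide 15 can be read as a lookup table from file extension to format. The following is a minimal sketch, assuming extension alone is enough to identify a file, which real curation workflows would not rely on. The names `FORMAT_BY_EXTENSION` and `guess_format` are illustrative and do not come from the lecture.

```python
from pathlib import Path

# Extension-to-format table transcribed from slide 15.
# Hypothetical helper for illustration; not part of the original lecture.
FORMAT_BY_EXTENSION = {
    ".bdf": "BDF (BioSemi data format)",
    ".edf": "EDF (European data format)",
    ".pdb": "PDB (Protein Data Bank format)",
    ".cif": "MMCIF (3D molecular model format)",
    ".dcm": "DICOM (annotated medical images)",
    ".dic": "DICOM (annotated medical images)",
    ".xyz": "XYZ (molecule geometry file)",
    ".mol": "MOL (MDL MOL format)",
    ".mol2": "MOL2 (Tripos MOL2 format)",
    ".sdf": "SDF (MDL SDF format)",
    ".smi": "SMILES (chemical format)",
    ".gb": "GenBank (NCBI sequence format)",
    ".gbk": "GenBank (NCBI sequence format)",
    ".fasta": "FASTA (sequence format)",
    ".fa": "FASTA (sequence format)",
    ".fsa": "FASTA (sequence format)",
    ".mpfa": "FASTA (sequence format)",
    ".nex": "NEXUS (phylogenetic data format)",
    ".ndk": "NEXUS (phylogenetic data format)",
}


def guess_format(filename: str) -> str:
    """Guess a data format from the file extension only (case-insensitive)."""
    extension = Path(filename).suffix.lower()
    return FORMAT_BY_EXTENSION.get(extension, "unknown")
```

A call such as `guess_format("scan.dcm")` looks up the `.dcm` entry; anything not on the slide falls through to `"unknown"`, which is a reminder that the list is a sample rather than a registry.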
  • 16. Why data formats?
       • Archiving
         – Preservation for posterity
       • Storage
         – Availability for “arbitrary” access
       • Analysis
         – Availability for processing
       • Transmission
         – Delivery across hardware, software, and administrative system boundaries
  • 17. Summary
       • Scientific data formats are closely tied to scientific computing
         – Data structure, model, and attributes
         – Self-descriptive with header/metadata
         – API for manipulating the data
         – Interoperability: conversion between different formats
       • No one-format-fits-all standard
       • Each standard has one or more tools for creating, editing, and annotating datasets
  • 18. DATASETS, METADATA, AND DATA MANAGEMENT
       What is a dataset?
       What are some of the metadata standards for describing datasets?
       What is data management?
  • 19. Dataset classification
       Volume
       • Large-volume
       • Small-volume
  • 20. Ecological data example: Instantaneous streamflow by watershed
       http://www.hubbardbrook.org/data/dataset.php?id=1
  • 21. Diabetes data and trends
       Country level estimates: http://apps.nccd.cdc.gov/DDT_STRS2/NationalDiabetesPrevalenceEstimates.aspx?mode=PHY
       Diabetes Data & Trends home page: http://apps.nccd.cdc.gov/ddtstrs/default.aspx
  • 22. Clinical trials data management
       http://www.clinicaltrials.gov/ct2/show/NCT00006286?term=TADS+NIMH&rank=1
  • 23. Common in the examples
       • Attributes of a dataset tell users/managers:
         – What the dataset is about
         – How the data were collected
         – To which project the data are related
         – Who was responsible for data collection
         – Whom you may contact to obtain the data
         – What publications the data have generated
         – ??
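The attributes listed on slide 23 map naturally onto a structured record. The sketch below is one possible shape, not a metadata standard: the class name, field names, and the `missing_fields` check are all assumptions made for illustration. The only value taken from the lecture is the title of the streamflow dataset on slide 20; every other field is deliberately left empty rather than guessed.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetDescription:
    """Hypothetical record holding the dataset attributes named on slide 23."""
    subject: str                    # what the dataset is about
    collection_method: str = ""     # how the data were collected
    project: str = ""               # project the data are related to
    responsible_parties: List[str] = field(default_factory=list)
    contact: str = ""               # whom to contact to obtain the data
    publications: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of attributes that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Only the subject is known from the slides; the rest is left blank.
record = DatasetDescription(subject="Instantaneous streamflow by watershed")
gaps = record.missing_fields()
```

Listing the empty attributes gives a librarian a concrete set of questions to take back to the research team.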
  • 24. Metadata standards in medical & health sciences
       Structure
       • Medical images: DICOM
       • Bioinformatics: GenBank
       • Healthcare: HL7
       Semantics
       • NCBI Taxonomy
       • NCBO Bioportal
       • UMLS
       • MeSH (Medical Subject Headings)
       • SNOMED CT (Systematized Nomenclature of Medicine--Clinical Terms)
  • 25. [No slide text]
  • 26. Research data collections
       Size                     | Metadata standards      | Management
       Larger, discipline-based | Multiple, comprehensive | Organized, institutionalized
       Smaller, team-based      | None or random          | Heroic individual inside the team
  • 27. Research collections
       • Limited processing or long-term management
       • Not conformed to any data standards
       • Varying sizes and formats of data files
       • Low level of processing, lack of plan for data products
       • Low awareness of metadata standards and data management issues
  • 28. Resource collections
       • Authored by a community of investigators, within a domain of science or engineering
       • Developed with community-level standards
       • Lifetime is between mid- and long-term
  • 29. Reference collection
       • Example: Global Biodiversity Information Facility
         – Created by large segments of the science community
         – Conforms to robust, well-established, and comprehensive standards, e.g.
           • ABCD (Access to Biological Collection Data)
           • Darwin Core
           • DiGIR (Distributed Generic Information Retrieval)
           • Dublin Core Metadata standard
           • GGF (Global Grid Forum)
           • Invasive Alien Species Profile
           • LSID (Life Sciences Identifier)
           • OGC (Open Geospatial Consortium)
  • 30. Datasets, data collections, and data repositories
       • Data repository: a system for storing, managing, preserving, and providing access to datasets
       • Data collections are built for larger segments of science and engineering
       • Datasets
         – are typically centered around an event or a study
         – contain a single file or multiple files in various formats
         – are coupled with documentation about the background of data collection and processing
       A repository may contain one or more data collections.
       A data collection may contain one or more datasets.
       A dataset may contain one or more data files.
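The containment rules on slide 30 (repository, then collection, then dataset, then data file) can be sketched as nested record types. This is an illustrative model only: the class names, the `count_files` helper, and the sample names below are hypothetical and do not describe any real repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str


@dataclass
class Dataset:
    title: str
    files: List[DataFile] = field(default_factory=list)   # one or more data files


@dataclass
class DataCollection:
    name: str
    datasets: List[Dataset] = field(default_factory=list)  # one or more datasets


@dataclass
class DataRepository:
    name: str
    collections: List[DataCollection] = field(default_factory=list)

    def count_files(self) -> int:
        """Count data files across every dataset in every collection."""
        return sum(
            len(dataset.files)
            for collection in self.collections
            for dataset in collection.datasets
        )


# Hypothetical sample content, used only to exercise the structure.
repo = DataRepository(
    name="example-repository",
    collections=[
        DataCollection(
            name="example-collection",
            datasets=[
                Dataset("study-a", [DataFile("a1.csv"), DataFile("a2.csv")]),
                Dataset("study-b", [DataFile("b1.csv")]),
            ],
        )
    ],
)
total_files = repo.count_files()
```

Keeping the levels as separate types makes it clear where each kind of metadata belongs: file-level format information, dataset-level documentation, and collection-level standards.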
  • 31. Data management for science research
       • Definition from Wikipedia: http://en.wikipedia.org/wiki/Data_management
       • Key concepts in data management:
         – Data ownership
         – Data collection
         – Data storage
         – Data protection
         – Data retention
         – Data analysis
         – Data sharing
         – Data reporting
       How do they relate to responsible conduct of research? http://ori.hhs.gov/images/ddblock/data.pdf
  • 32. An attempt to define DM
       • In the context of libraries:
         – Data management is a process in which librarians plan, design, and implement data services to support eScience/eResearch.
         – Data services that libraries may provide:
           • Institutional or community data repositories
           • Data management plans for pre- and post-award of grants
           • Metadata creation, linking, and discovery
           • Data archiving, preservation, and curation
           • Consultation for research groups’ data management projects
           • Data management and data literacy training for graduate students and faculty
  • 33. Initiatives in research libraries
       Data support and services in institutions / Libraries involved in supporting eScience: 73% / 45%
       • Pressure points:
         – Lack of resources
         – Difficulty acquiring the appropriate staff and expertise to provide eScience and data management or curation services
         – Lack of a unifying direction on campus
       Source: Soehner, C., Steeves, C. & Ward, J. (2010). E-Science and data support services: A study of ARL member institutions. http://www.arl.org/bm~doc/escience_report2010.pdf
  • 34. Data preservation challenges
       • Data formats
         – Vary in data types, e.g. vector and raster data types
         – Format conversions, e.g. from an old version to a newer one
       • Data relations
         – e.g. a digital map has data models, annotations, classification schemes, and symbolization files
       • Semantic issues
         – Naming datasets and attributes
  • 35. Data access challenges
       • Reliability
       • Authenticity
       • Leverage technology to make data access easier and more effective
         – Cross-database search
         – Integration applications
         – “Science-ready” datasets
  • 36. Supporting digital research data
       • Lifecycle of research data
         – Create: data creation/capture/gathering from laboratory experiments, field work, surveys, devices, media, simulation output…
         – Edit: organize, annotate, clean, filter…
         – Use/reuse: analyze, mine, model, derive additional data, visualize, input to instruments/computers
         – Publish: disseminate, create portals/databases, associate with literature
         – Preserve/destroy: store/preserve, store/replicate/preserve, store/ignore, destroy…
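The lifecycle on slide 36 can be written down as an ordered set of stages with their activities. The sketch below assumes a strictly linear order for simplicity; in practice use/reuse can feed back into create and edit, and the slide does not prescribe a sequence. `Stage`, `ACTIVITIES`, and `next_stage` are illustrative names.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Research data lifecycle stages as named on slide 36."""
    CREATE = 1
    EDIT = 2
    USE_REUSE = 3
    PUBLISH = 4
    PRESERVE_DESTROY = 5


# A few activities per stage, abridged from the slide text.
ACTIVITIES = {
    Stage.CREATE: ["capture", "gather"],
    Stage.EDIT: ["organize", "annotate", "clean", "filter"],
    Stage.USE_REUSE: ["analyze", "mine", "model", "visualize"],
    Stage.PUBLISH: ["disseminate", "associate with literature"],
    Stage.PRESERVE_DESTROY: ["store", "replicate", "preserve", "destroy"],
}


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the following stage, or None after the last one.

    Assumes a linear lifecycle, which is a simplification.
    """
    ordered = list(Stage)  # Enum members iterate in definition order
    position = ordered.index(stage)
    return ordered[position + 1] if position + 1 < len(ordered) else None
```

A library service catalogue could be keyed on the same stages, for example metadata help at the edit stage and repository deposit at the publish stage.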
  • 37. Supporting data management
       The data deluge
       • Numerical, image, video
       • Models, simulations, bit streams
       • XML, CSV, DB, HTML
       Researchers need:
       • Specialized search engines to discover the data they need
       • Powerful data mining tools to use and analyze the data
  • 38. Research data management
       [Diagram labels: Community, Institution, eScience librarian; Financial and policy support; Science domain, Data content idiosyncrasies, User requirements]
       Evolving and interconnecting:
       • Institutional repository
       • Community repository
       • National repository
       • International repository
  • 39. Implications for the scholarly communication process
       • Publishing: data publishing; new scholarly publishing models (open access, institutional and community repositories, self-publishing, library publishing, ...)
       • Curation: maintaining, preserving, and adding value to digital research data throughout its lifecycle
       • Archiving: the long-term storage, retrieval, and use of scientific data and methods
  • 40. Summary
       • E-Science development has raised expectations for research libraries
         – Working knowledge and skills in e-Science
         – Focus on process (data and team science) rather than product (reference services)
         – Proactive, collaborative, integrative, and interdisciplinary
  • 41. Case Study: Learning Data Management Needs from Scientists
  • 42. Gravitational Wave (GW) Research
  • 43. What is the problem?
       • Tracking data output and workflows is difficult due to lack of provenance data
       • Search of datasets is limited due to lack of specific options
       • Within the LIGO community, data sharing and reuse is difficult without provenance metadata
       Data provenance case study
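To make the provenance problem on slide 43 concrete, the sketch below shows how even a minimal provenance record (what a data product was derived from, and by which process) allows its lineage to be traced. The record structure, the function, and the product names such as `raw_strain` are hypothetical illustrations; they do not describe LIGO's actual pipelines or metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical minimal provenance entry for one data product."""
    output: str              # identifier of the data product
    inputs: Tuple[str, ...]  # identifiers of the products it was derived from
    process: str             # name of the step that produced it


def trace_lineage(output: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Return every upstream product of `output`, in discovery order."""
    lineage: List[str] = []
    seen = set()
    stack = [output]
    while stack:
        current = stack.pop()
        record = records.get(current)
        if record is None:
            continue  # no provenance recorded: the trail ends here
        for source in record.inputs:
            if source not in seen:
                seen.add(source)
                lineage.append(source)
                stack.append(source)
    return lineage


# Made-up example workflow with two processing steps.
records = {
    "candidate_events": ProvenanceRecord("candidate_events", ("filtered_strain",), "search"),
    "filtered_strain": ProvenanceRecord("filtered_strain", ("raw_strain",), "filter"),
}
upstream = trace_lineage("candidate_events", records)
```

Where a record is missing, the trace simply stops, which mirrors the situation the slide describes: without provenance metadata, outputs cannot be connected back to the data and workflow that produced them.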
  • 44. School  of  Information  Studies                                   Syracuse  University   Understand  the  research  workflow   •  Interview  the  scientist   –  Listening  (good  listening  skills)   –  Asking  questions  (don’t  be  afraid  of  asking   questions)   –  Use  your  librarian  brain  to  ingest  the   conversation:   •  How  does  the  research  _low  from  one  point  to  next?   •  What  consists  of  the  research  input  and  output  at  each   stage  of  research  in  terms  of  data?     Priscilla M. Mayden Lecture 2012, Utah 44
  • 45. Mapping out the knowledge v0.1
  • 46. Mapping out the knowledge v0.2
  • 47. Mapping out the knowledge v1.0
  • 48. Lessons learned
        •  Science is learnable even if you don’t have a subject background
           –  Learn enough to understand the research process and workflow
        •  Scientists are eager to get help
        •  Librarians need to be technical-minded
           –  Data, metadata, database
           –  Structures, models, workflows
        •  Librarians need to be good listeners while staying good conversation leaders
           –  Know when and how to lead the conversation to get what you need for data management planning and implementation
           –  Do your homework on the subject so that you can be an intelligent listener
  • 49. Case Discussions
  • 50. Case Study #1: To build or not to build a data repository?
        A university library has developed an institutional repository for preserving and providing access to the scholarly output of the researchers in this institution. Now a new challenge arises from e-science research: funding agencies demand data management plans, and authors and users demand links between publications and data. You already know that some faculty use their disciplinary data repository for submitting their datasets (e.g., GenBank for microbiology research data). The problem you face now is whether an institutional data repository should be built for those who do “small science” and have neither the funding nor the expertise to manage their data.
        Questions to be addressed:
        •  What are the strategies you will use to approach the problem?
        •  What are the possible solutions for the problem?
        •  What are some of the tradeoffs for the solutions you will adopt?
  • 51. Case study #2: Developing a data taxonomy
        The concept of research data management is unfamiliar to many faculty as well as your library staff. What is data? What is a data set? These seemingly simple terms can be very confusing and have different interpretations in different contexts and disciplines. As part of the data management strategies, you decide to develop an authoritative data taxonomy for the campus research community. This data taxonomy will benefit the creation and use of institutional data policies, data repository or repositories, and data management plans required by funding agencies.
        Questions to be addressed:
        •  What should the data taxonomy include?
        •  What form should it take, a database-driven website or a static HTML page?
        •  Who should be the constituencies in this process?
        •  Who will be the maintainer once the taxonomy is released?
  • 52. Case study #3: Developing a data policy
        Data policies play an important role in governing how data will be managed, shared, and accessed. They are also an instrument for fending off potential legal problems. There are several types of data policies: data access and use, data publishing, and data management. Your university’s Office of Sponsored Research has some existing policy on data, but it is neither systematic nor complete. Many of the terms were defined years ago and do not cover newer areas such as the embargo period of data. As the university has decided to build a data repository for managing and preserving datasets, a data policy has become one of the top priorities for both the institution and the data repository.
        Questions to be addressed:
        •  What should the data policy include?
        •  Who should be the constituencies in this process?
        •  Who will be the interpretation authority for the data policy?
  • 53. Case study #4: Cataloging datasets
        Describing datasets is the process of creating metadata for datasets. In scientific disciplines, several metadata standards have been developed, e.g., the Content Standard for Digital Geospatial Metadata (CSDGM), Darwin Core, and Ecological Metadata Language (EML). Each of these metadata standards contains hundreds of elements and requires both metadata and subject knowledge training in order to use it. Besides, creating one record using any of these standards requires a tremendous time investment. But your library has neither such specialized personnel nor the funds to hire new staff for the job. The existing staff has some general metadata skills such as Dublin Core.
        In deciding the metadata schema for your data repository, you need to address these questions:
        •  Should I adopt a scientific metadata standard or develop one tailored to our needs?
        •  How can I learn what metadata elements are critical to dataset submitters and searchers?
        •  What are some of the benefits and disadvantages of adopting a standard or developing a local schema?
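As a point of comparison for this case, the sketch below shows how small a dataset description can be when it uses only the 15 elements of simple Dublin Core, the kind of general metadata skill the existing staff already has. It is an illustration under assumptions, not a recommended schema: the wrapper element `record`, the helper `dublin_core_record`, and all field values are hypothetical placeholders, and the sketch says nothing about whether such a record would be sufficient for dataset submitters and searchers.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Build an XML record from a dict mapping element name -> list of values."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Placeholder values only; they do not describe a real dataset.
record = dublin_core_record({
    "title": ["Example field survey dataset"],
    "creator": ["Example, Researcher A.", "Example, Researcher B."],
    "subject": ["ecology"],
    "description": ["Placeholder description of what was measured and how."],
    "date": ["2012-02-22"],
    "type": ["Dataset"],
    "format": ["text/csv"],
    "rights": ["Placeholder rights statement"],
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record like this takes minutes to create, but it has no dedicated place for discipline-specific details such as sampling methods or taxonomic coverage, which is the tradeoff against standards like EML that the questions above ask you to weigh.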
  • 54. Case study #5: Evaluating data repository tools
        Research data as a driving force for e-science is inherently a tool-intensive field. Tools related to data management can be divided into two broad categories: those for creating metadata records and those for data repository management. An academic institution has decided to build its own data repository as part of the supporting service for researchers to meet the data management plan requirement of funding agencies. This data repository development task was handed down to the library. You, the library director, have to decide whether to develop an in-house system or use an off-the-shelf software system. As usual, you put together a taskforce to find a solution to this challenge.
        The questions to be addressed by the taskforce include:
        •  What are the options available to us?
        •  What evaluation criteria are the most important to our goal?
        •  What are the limitations for us to adopt one option or the other?
        •  How will this option interoperate with the existing institutional repository system? Or can the existing repository system be used for data repository purposes?