Making small data BIG
Insights from a Long-tail Geoscience Domain
Kerstin Lehnert
lehnert@ldeo.columbia.edu
Lamont-Doherty Earth Observatory of Columbia University
Palisades, NY, 10964
www.iedadata.org
Outline
• The (super-fast) Introduction to Geochemistry
• Achievements & Challenges in Geochemical Data Management
• Sustainable data infrastructure in the Long Tail
• EarthCube
Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 2
Geochemistry
• Puts real numbers on geologic
times.
• Fingerprints sources of material
involved in geological processes.
• Reveals the history of climate and
the circulations of the atmosphere
and ocean.
• Constrains theories of the Earth’s
deep interior
Geochemical Observations
• Hundreds of chemical properties of
different Earth materials
• elemental or oxide concentrations
• isotopes and isotopic ratios
• Thermodynamic properties
• Kinetics
Geochemical Data Types
• Analytical (observational)
• Sample-based measurements
• Sensor data
• Experimental data
• Derived data (models)
• (Samples)
Materials & Samples
Geochemistry Methods
How a Geochemist Generates Data:
“Did New Zealand Dust Influence the Last Ice Age?”
Bess Koffman, Michael Kaplan, Steven Goldstein, Gisela Winckler (LDEO), Natalie Mahowald (Cornell)
http://blogs.ei.columbia.edu/2014/03/13/did-new-zealand-dust-influence-the-last-ice-age/
Get Samples in the Field
Get Samples in the Lab/Repository
Analyze Samples in the Lab
The Data!
Note the number of data points generated in this study (the yellow dots) in light of the effort involved, from collecting samples in NZ to operating expensive equipment in the lab.
Data “Sharing”
Long-tail Research Data
• heterogeneous
• customized & optimized
for research questions
• lack of data standards
• data sharing limited
• lack of data
infrastructure (facilities)
The Value of Long-tail Data
“While the data volumes are small when viewed
individually, in total they represent a very significant
portion of the country’s scientific output.”
“The long tail is a breeding ground for new ideas and
never before attempted science.”
(Heidorn, B. 2008: “Shedding Light on the Dark Data in the Long Tail of Science”)
BUT:
Long-tail data have no value if they are not re-usable!
Monday’s Musings: Beyond The Three V’s of Big Data – Viscosity and Virality
Published on February 27, 2012 by R "Ray" Wang
http://blog.softwareinsider.org/2012/02/27/mondays-musings-beyond-the-three-vs-of-big-data-viscosity-and-virality/
What Makes Data BIG?
Value
The sixth ‘V’:
Adding VALUE
small data → BIG DATA
• findable: identification, persistence
• accessible: authorization, protocols
• interoperable: harmonized, machine-readable
• re-usable: context, provenance
“… data have no value or
meaning in isolation; they exist
within a knowledge
infrastructure — an ecology of
people, practices,
technologies, institutions,
material objects, and
relationships.”
C.L. Borgman
https://www.force11.org/group/fairgroup/fairprinciples
Generic Repositories
Domain Repositories
Domain-specific Data Facilities
Science Community
Domain-specific data facility
Libraries, Archives
CI, Computer Science
Publishers, editors
Metadata registration
Software (tool) development
Interoperability
Data policies
Persistent access
Bibliometrics
Data Curation
Data access & discovery (optimized for domain)
Data products (synthesis)
Data harmonization (standards)
User Support
Funding Agencies
Data Facilities
Registries
AGU FM 2014: IN14B-01
Small Data Gone BIG
IEDA Repositories
• >500,000 files
• 47 TB
• 4 × 10^6 samples
IEDA Syntheses
• 19 × 10^6 analytical values in EarthChem
• 2.63 × 10^6 miles of data from 808 cruises in the Global Multi-Resolution Topography (GMRT)
EarthChem: Big Data for Geochemistry
• EarthChem Library
• DOI registration
• Long-term archiving
• CC license
• Data templates & guidelines for data documentation
• QC by data managers
• Synthesis Databases (PetDB, EarthChem Portal)
• QA/QC by data managers
• Data & metadata harmonization
• Standards-compliant data model
• Service Oriented Architecture (ECP)
EarthChem Data Systems
Metadata
Data Data Data Data Data
EarthChem Library
Data Data Data
Search
Investigators
Data Repository
DOI to allow proper citation
Link to publications
Link to funding source
Data Templates
ECL Challenges
• Metadata guidelines/templates for an increasing diversity of
data
• Need extended metadata for meaningful searches
• Geospatial
• Variables
• Sample name
• Integration with publication workflow
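The search needs listed above (geospatial extent, variables, sample name) can be pictured with a small sketch. This is not the EarthChem Library's actual schema or search code: the `DatasetRecord` fields, the `search` function, and the two example records are all hypothetical and only illustrate why extended metadata is a precondition for meaningful searches.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical extended-metadata record for one archived dataset."""
    doi: str                                   # placeholder identifier, not a real DOI
    title: str
    bbox: tuple[float, float, float, float]    # (west, south, east, north) in degrees
    variables: set[str] = field(default_factory=set)
    sample_names: set[str] = field(default_factory=set)


def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, bbox=None, variable=None, sample_name=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if bbox is not None and not intersects(rec.bbox, bbox):
            continue
        if variable is not None and variable not in rec.variables:
            continue
        if sample_name is not None and sample_name not in rec.sample_names:
            continue
        hits.append(rec)
    return hits


# Made-up records for illustration only.
records = [
    DatasetRecord("doi:example/1", "Dust provenance, South Island",
                  (166.0, -47.0, 175.0, -40.0), {"87Sr/86Sr", "Nd"}, {"NZ-01", "NZ-02"}),
    DatasetRecord("doi:example/2", "Ridge basalt glasses",
                  (-45.0, 10.0, -30.0, 35.0), {"SiO2", "MgO"}, {"MAR-17"}),
]

hits = search(records, bbox=(160.0, -50.0, 180.0, -30.0), variable="87Sr/86Sr")
```

A record that lacks any of the three fields simply cannot be found by that criterion, which is the practical argument for templates that capture them at submission time.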
Coalition for Publishing Data in the Earth & Space
Sciences (COPDESS)
• Joint initiative of Earth Science publishers and Data Facilities to
help translate the aspirations of open, available, and useful
data from policy into practice.
• Reaffirm and ensure adherence to existing journal and publishing policies
and society position statements regarding open data sharing and
archiving of data, tools, and models.
• Ensure that Earth science data will, to the greatest extent possible, be
stored in community approved repositories that can provide additional
data services.
• Statement of Commitment signed by all major Earth & Space
Science publishers
• Build an online community directory of appropriate Earth
science community repositories for data, tools, and models
that meet leading standards on curation, quality, and access
www.copdess.org
Presentation at EarthCube workshop “Scope & Vision”, March 2015
EarthChem Data Systems
Metadata
Data Data Data Data Data
EarthChem Library
Data Data Data
Search
Data &
Metadata
Search
Data Data
Search
DB DB DB DB DB
Data & Metadata
[XML]
Investigators
[.xls]
EarthChem Data Managers
Data Repository
PetDB, SedDB EarthChem Portal
Data Synthesis
Example of success:
This study showed new relationships
between noble gases and the elemental and
isotope geochemistry of the deep mantle,
with implications for mantle structure and
evolution.
It was possible through a synthesis of the
global data set,
only because the scattered data were made
available by the online databases PetDB and
GEOROC.
This entire community now depends on this
cyberinfrastructure.
The PetDB Database
Map shows locations of
mafic volcanic rock samples.
Color of symbols is scaled
to the 87Sr/86Sr isotope ratio
in the rocks, illustrating the
difference in the
composition of the Earth’s
mantle under the Indian
and the Pacific Ocean.
Data are from >300
publications,
retrieved from the
PetDB database in
ca. 2 minutes.
PetDB Concept: BIG Data
• Data Mining
• Fine-grained data access: Database structure ‘disintegrates’ data sets
into individual values
• Context & provenance metadata to search and filter
• Harmonized data: controlled vocabularies, data compilation & QC by data
managers
• Data Integration
• User-defined across data sets
• By sample (use of unique sample ID)
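The concept above can be sketched in a few lines. This is an illustration under stated assumptions, not PetDB's data model: the `VOCAB` mapping, the function names, the table layout, and the two example tables are hypothetical. The sketch splits each table into individual values that carry provenance, harmonizes variable names through a controlled vocabulary, and then re-integrates values from different data sets by sample ID.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: maps spellings found in source tables
# to one harmonized variable name.
VOCAB = {
    "sio2": "SiO2",
    "SiO2 (wt%)": "SiO2",
    "Sr87/Sr86": "87Sr/86Sr",
    "87Sr/86Sr": "87Sr/86Sr",
}


def disintegrate(dataset_id, method, rows):
    """Split a table (list of dicts keyed by column name) into individual values.

    Each value keeps its sample ID plus provenance (source data set, method).
    """
    values = []
    for row in rows:
        sample_id = row["sample_id"]
        for column, value in row.items():
            if column == "sample_id" or value is None:
                continue
            values.append({
                "sample_id": sample_id,
                "variable": VOCAB.get(column, column),  # harmonize the name
                "value": value,
                "dataset": dataset_id,
                "method": method,
            })
    return values


def integrate_by_sample(values):
    """Regroup individual values from any number of data sets by sample ID."""
    merged = defaultdict(dict)
    for v in values:
        merged[v["sample_id"]][v["variable"]] = (v["value"], v["dataset"], v["method"])
    return dict(merged)


# Two made-up tables that report different measurements on the same sample.
majors = disintegrate("paper-A", "XRF", [{"sample_id": "S-001", "SiO2 (wt%)": 49.8}])
isotopes = disintegrate("paper-B", "TIMS", [{"sample_id": "S-001", "Sr87/Sr86": 0.7028}])
combined = integrate_by_sample(majors + isotopes)
```

Integration by sample only works when both sources use the same identifier for the same physical sample, which is why a unique sample ID matters.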
Data Mining: Search & Filter
Filter by method or
concentration
PetDB Impact
• 500 - 800 downloads per quarter
• >550 citations in the literature
• many fundamental new
discoveries & insights
• new scientific approaches
Meyzen et al., 2007, Isotopic portrayal of the Earth's upper mantle flow field. Nature 447, 1069
A. W. Hofmann: “Mantle Myths, Reservoirs, and Databases”, Goldschmidt Conf. 2008
Technical Challenges
• scalability/flexibility of database schema
• accommodate new sample and data types (time series, non-numeric
data, etc.)
• track relationships among samples
• diverse context for new sample and data types
• track provenance of metadata
• performance of search application
• usability & functionality of search application
• interoperability interfaces
• data ingestion & quality control
ODM2
ODM2 Team:
J S Horsburgh
A K Aufdenkampe
L Hsu
A Jones
K Lehnert
E Mayorga
L Song
D Tarboton
I Zaslavsky
Challenges:
• migration of db content
• new user interface
• new data entry & QA/QC tools
• resources
ODM2 Problem
from:
http://techdistrict.kirkk.com/2009/10/07/the-usereuse-paradox/
“In general, the more reusable
we choose to make a software
module, the more difficult that
same software module is to use.”
New User Interface (under development)
Challenge: User Expectations
C.H. Langmuir (Harvard): “Geochemical Databases: What is needed now?” Presentation at EarthCube
Domain End-user workshop for Petrology & Geochemistry, March 2013
Access to Samples is a Community Concern
• Poor and uneven access and management of sample collections
• Incomplete sample tracking and linking of samples to analyses in the
literature and databases
• Poor discoverability of existing samples
• insufficient or uneven sample density through space and time for most
geological terrains of interest
From Executive Summary of EarthCube Domain End-user Workshop Petrology & Geochemistry 2013
EarthCube Domain End-user Workshop for Petrology & Geochemistry
at the National Museum of Natural History, Smithsonian Institution, March 2013
The Internet of Samples
• Central or federated online catalogs for discovery & access of
samples.
• Best practices for sample identification, documentation, and citation.
• Software tools that support personal or institutional sample
management & curation.
(And facilities to
provide access to
curated samples!)
IGSN: International GeoSample Number
• persistent unique identifier for physical objects in the Earth
Sciences; centralized control mechanism via IGSN e.V.
• resolves to virtual sample representations (sample metadata
profiles) managed at federated IGSN Allocating Agents.
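The two bullets above describe a pattern: uniqueness is enforced centrally, while metadata lives with federated agents. A toy model of that pattern follows. It is not the IGSN service or its API: the class names, the in-memory storage, and the identifier `XXX0001` are hypothetical placeholders.

```python
class AllocatingAgent:
    """Toy stand-in for an allocating agent that stores sample metadata profiles."""

    def __init__(self, name):
        self.name = name
        self._profiles = {}

    def register(self, identifier, metadata):
        if identifier in self._profiles:
            raise ValueError(f"{identifier} is already registered")
        self._profiles[identifier] = dict(metadata)

    def profile(self, identifier):
        return self._profiles[identifier]


class CentralRegistry:
    """Toy central index: remembers which agent holds each identifier."""

    def __init__(self):
        self._agent_for = {}

    def register(self, identifier, agent, metadata):
        # Central control: an identifier can be assigned only once, by any agent.
        if identifier in self._agent_for:
            raise ValueError(f"{identifier} is already assigned")
        agent.register(identifier, metadata)
        self._agent_for[identifier] = agent

    def resolve(self, identifier):
        """Return the sample metadata profile from the agent that holds it."""
        return self._agent_for[identifier].profile(identifier)


registry = CentralRegistry()
agent_a = AllocatingAgent("agent-a")
registry.register("XXX0001", agent_a, {"name": "dredge rock 1", "material": "rock"})
profile = registry.resolve("XXX0001")
```

Resolving an identifier that was never registered raises `KeyError` in this sketch; a real resolver would return an appropriate error response instead.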
Use of the IGSN
IGSNs in data table resolve to
sample metadata in IGSN registry
SESAR (www.geosamples.org)
System for Earth Sample Registration
• Allocating Agent for individual investigators, sample
repositories, and science programs
• tools and services for users to catalog and manage sample metadata
(MySESAR)
• personal (authenticated) workspace
• metadata template creator
• label creation & printing (including QR code)
• transfer of sample ownership
• web services for client systems
• register sample metadata & obtain IGSNs
• access to IGSN metadata
• preservation & persistent access of sample metadata
• Global Sample Catalog (harvest metadata from other AAs)
Challenges:
• scalability of architecture for a rapidly growing
number of registrations
• service-oriented architecture
• handle registrations
• software tools that support investigators with
metadata capture in the field & lab
• flexibility for user specific metadata & new sample
types
• inclusion of sample images (storage!)
Institutions
Collection Mgmt
Public
‘Virtual Museum’
Investigators
Sample Mgmt
(storage, software solutions, & services)
Visualization
Publications
Data Systems
Sample Registries
APIs
GUIs
Internet of Samples Initiatives
• CODATA Task Group “Physical Samples in the Digital Era”
• SciColl: Scientific Collections International (Consortium)
• iSamples (Internet of Samples in the Earth Sciences)
• Funded EarthCube Research Coordination Network (RCN)
• advance access and re-use of physical samples through use of innovative
cyberinfrastructure
• DESC: Digital Environment for Sample Curation
• IGSN e.V.
• National Data Services test-bed
DATA FACILITIES
FOR THE LONG TAIL
Scalability, Sustainability
Many Earth Science Data Communities
• Atmospheric Chemistry
• Climate & Large Scale Dynamics
• Paleoclimate
• Meteorology
• Aeronomy
• Space Weather
• Magnetospheric Physics
• Solar Terrestrial
• Igneous Petrology & Volcanology
• Geo Ed & Workforce Training
• NCAR
• Geophysics & Geodynamics
• Geobiology & Paleontology
• Cryosphere & Ice Dynamics
• Critical Zone & Soil Science
• Chemical Oceanography
• Geomorphology
• Hydrology
• Sedimentology & Stratigraphy
• Marine Geophysics
• Physical Oceanography
• Marine Geology
• Biological Oceanography
• Ocean Education
• Ocean Drilling & Engineering
• Software & Modeling
• Bioinformatics
• Ecosystems Biology
• High Perf Computing
• Semantics & Ontologies
• Algorithms & Data Mining
• EarthCube CI
• Solid and Aqueous Geochemistry
IEDA: A “Long-Tail” Data Facility
www.iedadata.org
• Multiple core disciplines (focus: solid earth)
• High-T Geochemistry
• Low-T Geochemistry
• Petrology
• Marine Geophysics & Geology
• Geochronology
• Cross-disciplinary tools & services
• Sample registry SESAR
• IEDA Data Browser
• Portals (GeoPRISMs, USAP-DCC, etc.)
• GeoMapApp
• Data management support
From Research Data Collections to Data Facility
Formal Governance
Robust Infrastructure
Stable Expert Team
Accreditation
Adherence to
Community Standards
Scalable Infrastructure
The ALLIANCE Model
Alliance Development
Proposal “Interdisciplinary Earth Data Alliance as a Model for Integrating EC Technology
Resources and Engaging the Broad Community” submitted March 2015
MetPetDB
Mineral Physics
Deep Submergence
IcePod
Challenges:
• Social & organizational engineering
• Diversity of data needs
• Diversity of systems
• Business models
Conclusions
• Long-tail data can grow BIG through domain-
specific data curation.
• Partnerships among data efforts can provide
a solution for sustainability of data
infrastructure in long tail communities
• Partnerships with the computer and
information sciences are necessary to build
the cyberinfrastructure.
EarthCube Motivations
To transform geosciences research by supporting community-
driven cyberinfrastructure to integrate data and information.
Tech. Drivers
• Supports science and other User Needs
• Create a dynamic, community-driven cyberinfrastructure
• Open, evolvable, sustainable
• Easy interface with existing capabilities
Challenges
• Diversity of the geosciences
• Interdisciplinary Science Questions
• Big, Heterogeneous Data issues
• Communities that are poorly served/have no community resources
Towards an Architecture
for EarthCube
• Under purview of the EarthCube Technology and
Architecture Committee (TAC)
– Coordinating with Council of Data Facilities, Science
Committee, and Liaison Team
• Ongoing Working Groups (since Fall 2014):
– Architecture WG
– Standards WG
– Use Cases WG
– Funded Projects and Gap Analysis WG
– Testbed WG
EarthCube
• Building Blocks
• Architecture
• Governance
• Research Coordination Networks
• Funded Projects
EarthCube Funded Projects (2013 and 2014 Awards)
TAC Workshop (ongoing now)
Learn more at:
http://earthcube.org/group/technology-architecture-committee
http://earthcube.org/document/2014/earthcube-past-present-future

More Related Content

What's hot

Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Gridnoho
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceRobert Grossman
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science ServicesIan Foster
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Ian Foster
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 
LSST Education and Public Outreach (EPO)
LSST Education and Public Outreach (EPO) LSST Education and Public Outreach (EPO)
LSST Education and Public Outreach (EPO) Amanda Bauer
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 

What's hot (8)

Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Grid
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
LSST Education and Public Outreach (EPO)
LSST Education and Public Outreach (EPO) LSST Education and Public Outreach (EPO)
LSST Education and Public Outreach (EPO)
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 

Viewers also liked

It's about Small Data, stupid.
It's about Small Data, stupid.It's about Small Data, stupid.
It's about Small Data, stupid.Corecom Consulting
 
Small data vs. Big data : back to the basics
Small data vs. Big data : back to the basicsSmall data vs. Big data : back to the basics
Small data vs. Big data : back to the basicsAhmed Banafa
 
Martin Lindstrom - Small Data - full day presentation part 1 of 4 handout
Martin Lindstrom - Small Data - full day presentation part 1 of 4 handoutMartin Lindstrom - Small Data - full day presentation part 1 of 4 handout
Martin Lindstrom - Small Data - full day presentation part 1 of 4 handoutMarian Costache
 
Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Anna Kuhn
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 

Viewers also liked (6)

Big data hadoop
Big data hadoopBig data hadoop
Big data hadoop
 
It's about Small Data, stupid.
It's about Small Data, stupid.It's about Small Data, stupid.
It's about Small Data, stupid.
 
Small data vs. Big data : back to the basics
Small data vs. Big data : back to the basicsSmall data vs. Big data : back to the basics
Small data vs. Big data : back to the basics
 
Martin Lindstrom - Small Data - full day presentation part 1 of 4 handout
Martin Lindstrom - Small Data - full day presentation part 1 of 4 handoutMartin Lindstrom - Small Data - full day presentation part 1 of 4 handout
Martin Lindstrom - Small Data - full day presentation part 1 of 4 handout
 
Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 

Similar to Lehnert: Making Small Data Big, IACS, April2015

Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)Kerstin Lehnert
 
EarthCubeArchitectureWS_June2015
EarthCubeArchitectureWS_June2015EarthCubeArchitectureWS_June2015
EarthCubeArchitectureWS_June2015Kerstin Lehnert
 
Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)Kerstin Lehnert
 
Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...
Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...
Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...EarthCube
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014iedadata
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth
 
From Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationFrom Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationAndrew Treloar
 
The GAME project database – an example of interdisciplinary, open access envi...
The GAME project database – an example of interdisciplinary, open access envi...The GAME project database – an example of interdisciplinary, open access envi...
The GAME project database – an example of interdisciplinary, open access envi...Platforma Otwartej Nauki
 
MoonDB: Restoration & Synthesis of Planetary Geochemical Data
MoonDB: Restoration & Synthesis of Planetary Geochemical DataMoonDB: Restoration & Synthesis of Planetary Geochemical Data
MoonDB: Restoration & Synthesis of Planetary Geochemical DataKerstin Lehnert
 
Xiaobin Shen eScience2013 presentation
Xiaobin Shen eScience2013 presentationXiaobin Shen eScience2013 presentation
Xiaobin Shen eScience2013 presentationxiaobinshen
 
"Some Reflections on Data in the Public Sector" : Communia: The European Them...
"Some Reflections on Data in the Public Sector" : Communia: The European Them..."Some Reflections on Data in the Public Sector" : Communia: The European Them...
"Some Reflections on Data in the Public Sector" : Communia: The European Them...Tom Moritz
 
Astromat Update on Developments 2021-01-29
Astromat Update on Developments 2021-01-29Astromat Update on Developments 2021-01-29
Astromat Update on Developments 2021-01-29Kerstin Lehnert
 
Pablo Gomez - Solving Large-scale Challenges with ESA Datalabs
Pablo Gomez - Solving Large-scale Challenges with ESA DatalabsPablo Gomez - Solving Large-scale Challenges with ESA Datalabs
Pablo Gomez - Solving Large-scale Challenges with ESA DatalabsAdvanced-Concepts-Team
 
Sla2009 D Curation Heidorn
Sla2009 D Curation HeidornSla2009 D Curation Heidorn
Sla2009 D Curation HeidornBryan Heidorn
 
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...ExtremeEarth
 
2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptxvijayapraba1
 
Beyond Preservation: Situating Archaeological Data in Professional Practice
Beyond Preservation: Situating Archaeological Data in Professional PracticeBeyond Preservation: Situating Archaeological Data in Professional Practice
Beyond Preservation: Situating Archaeological Data in Professional PracticeEric Kansa
 
Jim Woolley - Name Registration: One Less Impediment to Taxonomy
Jim Woolley - Name Registration: One Less Impediment to TaxonomyJim Woolley - Name Registration: One Less Impediment to Taxonomy
Jim Woolley - Name Registration: One Less Impediment to TaxonomyICZN
 

Similar to Lehnert: Making Small Data Big, IACS, April2015 (20)

Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)
 
EarthCubeArchitectureWS_June2015
EarthCubeArchitectureWS_June2015EarthCubeArchitectureWS_June2015
EarthCubeArchitectureWS_June2015
 
Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)
 
Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...
Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...
Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero...
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
 
From Data to Data: One version of a History of Scholarly Communication
Lehnert: Making Small Data Big, IACS, April 2015

  • 1. Making small data BIG: Insights from a Long-tail Geoscience Domain • Kerstin Lehnert, lehnert@ldeo.columbia.edu • Lamont-Doherty Earth Observatory of Columbia University, Palisades, NY, 10964 • www.iedadata.org
  • 2. Outline • The (super-fast) Introduction to Geochemistry • Achievements & Challenges in Geochemical Data Management • Sustainable data infrastructure in the Long Tail • EarthCube
  • 3. Geochemistry • Puts real numbers on geologic times. • Fingerprints sources of material involved in geological processes. • Reveals the history of climate and the circulations of the atmosphere and ocean. • Constrains theories of the Earth’s deep interior.
  • 4. Geochemical Observations • Hundreds of chemical properties of different Earth materials • elemental or oxide concentrations • isotopes and isotopic ratios • Thermodynamic properties • Kinetics
  • 5. Geochemical Data Types • Analytical (observational) • Sample-based measurements • Sensor data • Experimental data • Derived data (models) • (Samples)
  • 6. Materials & Samples
  • 7. Geochemistry Methods
  • 8. How a Geochemist Generates Data: “Did New Zealand Dust Influence the Last Ice Age?” • Bess Koffman, Michael Kaplan, Steven Goldstein, Gisela Winckler (LDEO), Natalie Mahowald (Cornell) • http://blogs.ei.columbia.edu/2014/03/13/did-new-zealand-dust-influence-the-last-ice-age/
  • 9. Get Samples in the Field
  • 10. Get Samples in the Lab/Repository
  • 11. Analyze Samples in the Lab
  • 12. The Data! • Note how few data points this study generated (the yellow dots) relative to the effort involved, which ranged from collecting samples in NZ to operating expensive equipment in the lab.
  • 13. Data “Sharing”
  • 14. Long-tail Research Data • heterogeneous • customized & optimized for research questions • lack of data standards • data sharing limited • lack of data infrastructure (facilities)
  • 15. The Value of Long-tail Data • “While the data volumes are small when viewed individually, in total they represent a very significant portion of the country’s scientific output.” • “The long tail is a breeding ground for new ideas and never before attempted science.” (Heidorn, B. 2008: “Shedding Light on the Dark Data in the Long Tail of Science”) • BUT: Long-tail data have no value if they are not re-usable!
  • 16. What Makes Data BIG? • The sixth ‘V’: Value • Source: “Monday’s Musings: Beyond The Three V’s of Big Data – Viscosity and Virality”, published on February 27, 2012 by R "Ray" Wang, http://blog.softwareinsider.org/2012/02/27/mondays-musings-beyond-the-three-vs-of-big-data-viscosity-and-virality/
  • 17. Adding VALUE: from small data to BIG DATA • findable: identification, persistence • accessible: authorization, protocols • interoperable: harmonized, machine-readable • re-usable: context, provenance • Diagram labels: Generic Repositories; Domain Repositories • “… data have no value or meaning in isolation; they exist within a knowledge infrastructure — an ecology of people, practices, technologies, institutions, material objects, and relationships.” C.L. Borgman • https://www.force11.org/group/fairgroup/fairprinciples
  • 18. Domain-specific Data Facilities (AGU FM 2014: IN14B-01) • Diagram labels: Science Community; Domain-specific Data Facility; Libraries; Archives; CI, Computer Science; Publishers, editors; Funding Agencies; Data Facilities; Registries; Metadata registration; Software (tool) development; Interoperability; Data policies; Persistent access; Bibliometrics; Data Curation; Data access & discovery (optimized for domain); Data products (synthesis); Data harmonization (standards); User Support
  • 19. Small Data Gone BIG • IEDA Repositories: >500,000 files; 47 TB; 4 x 10^6 samples • IEDA Syntheses: 19 x 10^6 analytical values in EarthChem; 2.63 x 10^6 miles of data from 808 cruises in the Global Multi-Resolution Topography (GMRT)
  • 20. EarthChem: Big Data for Geochemistry • EarthChem Library • DOI registration • Long-term archiving • CC license • Data templates & guidelines for data documentation • QC by data managers • Synthesis Databases (PetDB, EarthChem Portal) • QA/QC by data managers • Data & metadata harmonization • Standards-compliant data model • Service Oriented Architecture (ECP)
  • 21. EarthChem Data Systems • Diagram labels: Investigators; Data; Metadata; EarthChem Library; Data Repository; Search
  • 22. DOI to allow proper citation • Link to publications • Link to funding source
  • 23. Data Templates
  • 24. ECL Challenges • Metadata guidelines/templates for an increasing diversity of data • Need extended metadata for meaningful searches • Geospatial • Variables • Sample name • Integration with publication workflow
  • 25. Coalition for Publishing Data in the Earth & Space Sciences (COPDESS) • Joint initiative of Earth Science publishers and Data Facilities to help translate the aspirations of open, available, and useful data from policy into practice. • Reaffirm and ensure adherence to existing journal and publishing policies and society position statements regarding open data sharing and archiving of data, tools, and models. • Ensure that Earth science data will, to the greatest extent possible, be stored in community approved repositories that can provide additional data services. • Statement of Commitment signed by all major Earth & Space Science publishers • Build an online community directory of appropriate Earth science community repositories for data, tools, and models that meet leading standards on curation, quality, and access • www.copdess.org
  • 26. Presentation at EarthCube workshop “Scope & Vision”, March 2015
  • 27. EarthChem Data Systems • Diagram labels: Investigators [.xls]; Data; Metadata; EarthChem Library; Data Repository; Search; EarthChem Data Managers; Data & Metadata [XML]; DB; PetDB, SedDB; EarthChem Portal; Data Synthesis; Data & Metadata Search
  • 28. Example of success: This study showed new relationships between noble gases and the elemental and isotope geochemistry of the deep mantle, with implications for mantle structure and evolution. It was possible through a synthesis of the global data set, only because the scattered data were made available by the online databases PetDB and GEOROC. This entire community now depends on this cyberinfrastructure.
  • 29. The PetDB Database • Map shows locations of mafic volcanic rock samples. Color of symbols is scaled to the 87Sr/86Sr isotope ratio in the rocks, illustrating the difference in the composition of the Earth’s mantle under the Indian and the Pacific Ocean. Data are from >300 publications, retrieved from the PetDB database in ca. 2 minutes.
  • 30. PetDB Concept: BIG Data • Data Mining • Fine-grained data access: Database structure ‘disintegrates’ data sets into individual values • Context & provenance metadata to search and filter • Harmonized data: controlled vocabularies, data compilation & QC by data managers • Data Integration • User-defined across data sets • By sample (use of unique sample ID)
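The PetDB concept on slide 30 (data sets broken down into individual values, filtered through their metadata, then re-integrated by unique sample ID) can be illustrated with a small sketch. This is not PetDB's schema or API: the record layout, the field names, the sample IDs, and the helper functions `filter_values` and `integrate_by_sample` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical flattened records: one row per analytical value, each carrying
# its sample ID, variable, method, and source (all values are made up).
values = [
    {"sample_id": "S1", "variable": "SiO2", "value": 50.1, "method": "XRF", "source": "Pub A"},
    {"sample_id": "S1", "variable": "87Sr/86Sr", "value": 0.7028, "method": "TIMS", "source": "Pub B"},
    {"sample_id": "S2", "variable": "SiO2", "value": 48.7, "method": "EMP", "source": "Pub A"},
    {"sample_id": "S2", "variable": "87Sr/86Sr", "value": 0.7035, "method": "TIMS", "source": "Pub C"},
]


def filter_values(rows, variable=None, method=None, min_value=None):
    """Keep rows that match the requested variable, method, and minimum value."""
    kept = []
    for row in rows:
        if variable is not None and row["variable"] != variable:
            continue
        if method is not None and row["method"] != method:
            continue
        if min_value is not None and row["value"] < min_value:
            continue
        kept.append(row)
    return kept


def integrate_by_sample(rows):
    """Regroup individual values into one record per unique sample ID."""
    samples = defaultdict(dict)
    for row in rows:
        samples[row["sample_id"]][row["variable"]] = row["value"]
    return dict(samples)


# Filter by method, then integrate values from different sources by sample.
xrf_silica = filter_values(values, variable="SiO2", method="XRF")
by_sample = integrate_by_sample(values)
```

Because every value keeps its own sample ID and provenance, values reported in different publications for the same sample end up in one record, which is the kind of user-defined integration the slide describes.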
  • 31. Data Mining: Search & Filter • Filter by method or concentration
  • 32. Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 32
  • 33. PetDB Impact • 500 - 800 downloads per quarter • >550 citations in the literature • many fundamental new discoveries & insights • new scientific approaches • Meyzen et al, 2007, Isotopic portrayal of the Earth's upper mantle flow field. Nature 447, 1069 • A. W. Hofmann: “Mantle Myths, Reservoirs, and Databases”, Goldschmidt Conf. 2008
  • 34. Technical Challenges • scalability/flexibility of database schema • accommodate new sample and data types (time series, non-numeric data, etc.) • track relationships among samples • diverse context for new sample and data types • track provenance of metadata • performance of search application • usability & functionality of search application • interoperability interfaces • data ingestion & quality control
  • 35. ODM2 • ODM2 Team: J S Horsburgh, A K Aufdenkampe, L Hsu, A Jones, K Lehnert, E Mayorga, L Song, D Tarboton, I Zaslavsky • Challenges: • migration of db content • new user interface • new data entry & QA/QC tools • resources
  • 36. ODM2 Problem • “In general, the more reusable we choose to make a software module, the more difficult that same software module is to use.” • from: http://techdistrict.kirkk.com/2009/10/07/the-usereuse-paradox/
  • 37. New User Interface (under development)
  • 38. Challenge: User Expectations • C.H. Langmuir (Harvard): “Geochemical Databases: What is needed now?” Presentation at EarthCube Domain End-user workshop for Petrology & Geochemistry, March 2013
  • 39. Access to Samples is a Community Concern • Poor and uneven access and management of sample collections • Incomplete sample tracking and linking of samples to analyses in the literature and databases • Poor discoverability of existing samples • Insufficient or uneven sample density through space and time for most geological terrains of interest • From the Executive Summary of the EarthCube Domain End-user Workshop for Petrology & Geochemistry at the National Museum of Natural History, Smithsonian Institution, March 2013
  • 40. The Internet of Samples • Central or federated online catalogs for discovery & access of samples. • Best practices for sample identification, documentation, and citation. • Software tools that support personal or institutional sample management & curation. • (And facilities to provide access to curated samples!)
  • 41. IGSN: International GeoSample Number • persistent unique identifier for physical objects in the Earth Sciences; centralized control mechanism via IGSN e.V. • resolves to virtual sample representations (sample metadata profiles) managed at federated IGSN Allocating Agents.
  • 42. Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 42 Use of the IGSN IGSNs in data table resolve to sample metadata in IGSN registry
  • 43. SESAR (www.geosamples.org) System for Earth Sample Registration • Allocating Agent for individual investigators, sample repositories, and science programs • tools and services for users to catalog and manage sample metadata (MySESAR) • personal (authenticated) workspace • metadata template creator • label creation & printing (including QR code) • transfer of sample ownership • web services for client systems • register sample metadata & obtain IGSNs • access to IGSN metadata • preservation & persistent access of sample metadata • Global Sample Catalog (harvest metadata from other AAs Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 43
  • 44. Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 44 Challenges: • scalability of architecture for a rapidly growing number of registrations • service-oriented architecture • handle registrations • software tools that support investigators with metadata capture in the field & lab • flexibility for user specific metadata & new sample types • inclusion of sample images (storage!)
  • 45. Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 45 Institutions Collection Mgmt Public ‘Virtual Museum’ Investigators Sample Mgmt (storage, software solutions, & services) Visualization Publications Data Systems Sample Registries APIsGUIs
  • 46. Internet of Samples Initiatives • CODATA Task Group “Physical Samples in the Digital Era” • SciColl: Scientific Collections International (Consortium) • iSamples (Internet of Samples in the Earth Sciences) • Funded EarthCube Research Coordination Network (RCN) • advance access and re-use of physical samples through use of innovative cyberinfrastructure • DESC: Digital Environment for Sample Curation • IGSN e.V. • National Data Services test-bed Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 46
  • 47. DATA FACILITIES FOR THE LONG TAIL Scalability, Sustainability Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain" 47
  • 48. Many Earth Science Data Communities 48 Atmo- spheric Chemistry Climate & Large Scale Dynamics Paleo- Climate Meteor- ology Aeronomy Space Weather Magneto- spheric Physics Solar Terrestrial Igneous Petrology & Volcan- ology Geo Ed & Workforce Training NCAR Geophysi cs & Geody- namics Geobiology & Paleoen- tology Cryosphere & Ice Dynamics Critical Zone & Soil Science Chemical Ocean- ography Geomor- phology Hydrology Sediment -ology & Strati- graphy Marine Geophysics Physical Ocean- ography Marine Geology Biological Ocean- ography Ocean Education Ocean Drilling & Engineer- ing Software & Modeling Bio- informatics Ecosystems Biology High Perf Computing Semantics & Ontologies Algorithm s & Data Mining EarthCube CI Solid and Aqueous Geochem -istry Kerstin Lehnert: "Making small data BIG: Insights from a Long-tail Geoscience Domain"
  • 49. IEDA: A “Long-Tail” Data Facility (www.iedadata.org) • Multiple core disciplines (focus: solid earth) • High-T Geochemistry • Low-T Geochemistry • Petrology • Marine Geophysics & Geology • Geochronology • Cross-disciplinary tools & services • Sample registry SESAR • IEDA Data Browser • Portals (GeoPRISMS, USAP-DCC, etc.) • GeoMapApp • Data management support
  • 50. From Research Data Collections to Data Facility • Formal Governance • Robust Infrastructure • Stable Expert Team • Accreditation • Adherence to Community Standards
  • 51. Scalable Infrastructure: The ALLIANCE Model
  • 52. Alliance Development • Proposal “Interdisciplinary Earth Data Alliance as a Model for Integrating EC Technology Resources and Engaging the Broad Community” submitted March 2015 • [Diagram labels: MetPetDB • Mineral Physics • Deep Submergence • IcePod] Challenges: • Social & organizational engineering • Diversity of data needs • Diversity of systems • Business models
  • 54. Conclusions • Long-tail data can grow BIG through domain-specific data curation. • Partnerships among data efforts can provide a solution for sustainability of data infrastructure in long-tail communities. • Partnerships with the computer and information sciences are necessary to build the cyberinfrastructure.
  • 55. EarthCube Motivations: To transform geosciences research by supporting community-driven cyberinfrastructure to integrate data and information. Tech. Drivers: • Supports science and other user needs • Create a dynamic, community-driven cyberinfrastructure • Open, evolvable, sustainable • Easy interface with existing capabilities Challenges: • Diversity of the geosciences • Interdisciplinary science questions • Big, heterogeneous data issues • Communities that are poorly served/have no community resources
  • 56. Towards an Architecture for EarthCube • Under purview of the EarthCube Technology and Architecture Committee (TAC) – Coordinating with Council of Data Facilities, Science Committee, and Liaison Team • Ongoing Working Groups (since Fall 2014): – Architecture WG – Standards WG – Use Cases WG – Funded Projects and Gap Analysis WG – Testbed WG [Diagram: EarthCube Funded Projects (2013 and 2014 Awards): Building Blocks • Architecture • Governance • Research Coordination Networks]
  • 57. TAC Workshop (ongoing now) Learn more at: http://earthcube.org/group/technology-architecture-committee http://earthcube.org/document/2014/earthcube-past-present-future