IEDA: Making Small Data BIG Through Interdisciplinary Partnerships Among Long-tail Domains
1. Making small Data BIG
THROUGH INTERDISCIPLINARY
PARTNERSHIPS AMONG LONG-TAIL DOMAINS
AGU FM 2014: IN14B-01
K. Lehnert 1, S. Carbotte 1, R. Arko 1, V. L. Ferrini 1, L. Hsu 1, L. Song 1,
M. Ghiorso 2, J. D. Walker 3
1 Lamont-Doherty Earth Observatory, Columbia University, Palisades, NY
2 OFM Research, Seattle, WA
3 University of Kansas, Lawrence, KS
2. DATA FACILITIES IN A BIG DATA
WORLD
Figure: small data vs. BIG data, plotted by data volume (x axis) and data size (y axis).
Labels: Data Centers & Facilities; Distributed datasets; Research Data Collections
3. HOW WE DEFINE ‘BIG’
Volume
Velocity
Variety
Veracity
VALUE
“The long tail is a
breeding ground for new
ideas and never before
attempted science.”
(Heidorn, B. 2008: “Shedding
Light on the Dark Data in the
Long Tail of Science”)
4. ADDING VALUE
citable
small data
BIG DATA
accessible
integrated
digital data collection
trustworthy repositories
domain standards
interoperable
APIs, OLP,
5. DATA FACILITIES
“acquire, curate, preserve, and/or disseminate data, software,
and/or models for one or more defined communities or
disciplines”
need to adhere to standards (e.g. ISO 16363, ICSU-WDS)
such as
• governance and organizational viability
• organizational structure and staffing
• procedural accountability and preservation policy framework
• financial sustainability
• contracts, licenses, and liabilities
6. DOMAIN-SPECIFIC
DATA FACILITIES
“With both content-area and
digital curation expertise, domain
repositories are uniquely capable
of ensuring that data and other
research products are adequately
preserved, enhanced, and made
available for replication,
collaboration, and cumulative
knowledge building.”
“Sustaining Domain Repositories for Digital Data: A Call for
Change from an Interdisciplinary Working Group of Domain
Repositories”
Inter-university Consortium for Political and Social Research
(ICPSR), 2013
7. IEDA: A MULTI-DISCIPLINARY DATA
FACILITY FOR LONG-TAIL SCIENCE
• Many disciplines
• geochemistry, marine geophysics, marine geology, geochronology, and more
• Many data types
• sensor data and sample-based observations & experiments
• raw data (e.g. multi-beam), field data, lab data, derived data, samples
• gridded data, point data, time-series data, maps, photos, and more
• File sizes varying from a few kilobytes to terabytes
9. FROM RESEARCH DATA COLLECTIONS
TO DATA FACILITY
“This Cooperative Agreement
converts a series of
proposal/award-driven activities
into a community-based facility
that serves to support, sustain,
and advance the geosciences by
providing a centralized location
for the registry of and access to
data essential for research in the
solid-earth and polar sciences.”
LDEO Data projects funded by NSF OCE,
EAR, OPP that were merged into IEDA
10. FROM RESEARCH DATA COLLECTIONS
TO DATA FACILITY
Formal Governance
Robust Infrastructure
Stable Expert Team
Accreditation
Adherence to
Community Standards
11. IEDA: small data gone BIG
IEDA Syntheses
19 x 10^6 analytical values in EarthChem
2.63 x 10^6 miles of data from 808 cruises in the
Global Multi-Resolution Topography (GMRT)
IEDA Repositories
>500,000 files
47 TB
4 x 10^6 samples
12. LAYERED SERVICES:
THE EUDAT MODEL
Discipline-specific Services
Users
Common Services
- data publication (DOI)
- data submission
- data management (investigator) support
- integrated data access & visualization
- interoperability (web services, RDF linked data, etc.)
- community governance
- community liaison (E&O)
- Data capture (templates, software tools)
- Domain-specific workflows & GUIs
- Data products (syntheses)
- Community standards
- User support & training
13. IEDA: SCOPE & PARTNERS
EarthChem MGDS
Users (Data contribution & retrieval)
Geochron
IEDA Common Services
Solid Earth Observational Data
Areas of expertise: Sensor data & Sample data
15. PARTNERS
ROLES & RESPONSIBILITIES
Operation of partner systems & services
• Day-to-day operation (except sys admin)
• Planning improvements & new capabilities
• (supported by and in coordination with IEDA Implementation Team)
• Align partner systems with IEDA Common Services
• Plan & oversee budget for their activities
• Interact with their specific user communities (user support, training,
feedback, etc.)
Participate in IEDA Partner Assembly
• Contribute to strategic planning & development
• Contribute to planning & prioritization of IEDA developments & activities
• Recommend new opportunities & partnerships
• Participate in IEDA governance
• Participate in annual face-to-face meeting
16. EXAMPLE
IEDA
Repository
IEDA
Sample
Registry
IEDA Sys Op
J.D. Walker (KU):
- metadata schemas
- user interfaces
- web services
- community liaison
Geochron
IEDA Common
Services
17. EXAMPLE
IEDA
Repository
IEDA
Sample
Registry
IEDA Sys Op
M. Ghiorso
(OFM-Research):
- metadata schemas
- user interface
- web services
- community liaison
LEPR
IEDA Common
Services
18. A SCALABLE MODEL
EarthChem MGDS
Users (Data contribution & retrieval)
Geochron LEPR
IEDA Common Services
XX YY
. . . . . .
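The scalable model on this slide can be read as a simple layering: every partner system sits on the same set of common services and contributes its own domain-specific ones. The following is only an illustrative sketch of that idea, not IEDA software. The class names (PartnerSystem, Facility) and the method names are hypothetical; the service labels are taken from the slides and notes, and the assumption that every partner contributes the same four domain services is made purely for brevity.

```python
from dataclasses import dataclass, field

# Labels borrowed from the slides and notes; the grouping is an assumption.
COMMON_SERVICES = frozenset({
    "data publication (DOI)",
    "data submission",
    "long-term archiving",
})


@dataclass
class PartnerSystem:
    """A domain-specific data effort (hypothetical model)."""
    name: str
    domain_services: frozenset = frozenset()


@dataclass
class Facility:
    """A facility offering shared services to any number of partners."""
    common_services: frozenset = COMMON_SERVICES
    partners: dict = field(default_factory=dict)

    def add_partner(self, partner: PartnerSystem) -> None:
        # Adding a partner never changes the common layer.
        self.partners[partner.name] = partner

    def services_for(self, partner_name: str) -> frozenset:
        # A partner's users see the common layer plus the domain layer.
        partner = self.partners[partner_name]
        return self.common_services | partner.domain_services


# Assumed for illustration: each partner supplies these four services.
DOMAIN = frozenset({
    "metadata schemas", "user interfaces", "web services", "community liaison",
})

facility = Facility()
for partner_name in ("EarthChem", "MGDS", "Geochron", "LEPR"):
    facility.add_partner(PartnerSystem(partner_name, DOMAIN))
```

In this sketch, adding a further partner (the "XX" or "YY" placeholders on the slide) is a single add_partner call and leaves the common services untouched.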
19. ‘EXTERNAL’ PARTNERSHIPS
Partner: Funded through the Cooperative Agreement
Partner: Funded outside the CA; contract with IEDA
Users (Data contribution & retrieval)
IEDA Common Services
20. CONCLUSION
Data facilities can grow small data through partnerships
among data efforts in long tail communities
• Maintain the expertise and community liaison of domain-specific
data efforts
• Leverage data curation expertise & infrastructure of data
facilities
Interdisciplinary
Earth Data Alliance
21. THE NEW IEDA
“IEDA strives to be a leading-edge inter-disciplinary data facility
for solid earth data and information,
founded in domain-specific data resources,
to deliver integrated and streamlined data services that advance
Ocean, Earth and Polar science and education.”
Editor's Notes
It is ironic that we are starting this session on ‘Data Facilities in a Big Data world’ talking about small data.
The data world generally distinguishes between big data and small data based on data size and data volume, meaning the number of datasets. The BIG data world in the Earth Sciences is represented by disciplines that generate massive volumes of observational or computed data using large-scale, shared instrumentation such as global sensor networks, satellites, or high-performance computing facilities. These data are typically managed and curated by well-supported community data facilities.
The small data world is the one where data are primarily acquired by individual investigators or small teams; these data are known as ‘long-tail data’. They are usually poorly shared and integrated, and lack data repositories that ensure persistent access, quality control, long-term archiving, standardization, and interoperability. But often these communities have Research Data Collections.
Disciplines with small, PI-generated, distributed data often lack domain-specific data facilities.
No consensus on data practices.
Insufficient funding to support data facilities.
But often served by “Research Data Collections” of substantial scientific value.
mostly short-term funded (research grants)
PI driven (‘single point of failure’)
lack resources to implement repository standards
enabling transformational science
fostering technological development
promoting educational opportunities
IEDA is a data facility that hosts observational solid earth data and
tools from the marine, terrestrial, and polar environments.
• Multiple diverse data systems that were developed independently, serving both
• sensor data from large collaborative cruise programs
• sample-based measurements from unique analytical laboratories
• IEDA data systems enable the data to be discovered and reused.
Gain “Trustworthiness” through IEDA’s shared repository services
DOI registration
long-term archiving solutions
linking to literature & awards
Augment usage & value through integration into IEDA’s multi-disciplinary data services
Single point data submission
Sample registration
Link to scientific literature
Data access & visualization (GeoMapApp)
Ensure representation in relevant data curation, data publication, and informatics communities
Improve sustainability