Crawling for EarthCube
Ruth Duerr, Luis Lopez, Abeve Tayachow, Erik Mingo
Outline
• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community
NSIDC: An overview
NSIDC affiliations and sponsorship
• University of Colorado Boulder
• Cooperative Institute for Research in Environmental Sciences
Main sponsors:
• National Science Foundation
• NASA
• National Oceanic and Atmospheric Administration
The National Snow and Ice Data Center…
• Provides tools for data access
• Researches the cryosphere and data science
• Educates the public about the cryosphere
• Supports data users
• Manages and distributes scientific data
• Supports local and traditional knowledge
Why not let Google do it?
• What's their incentive?
• The schema.org route for data has extreme limitations
Ways to build a comprehensive catalog
• Ask folks to register their data and services
• Build your catalog by hand
• Automate discovery of data and services
Preparing Data for Ingest, presented 10/27/09 by R. Duerr
LID590DCL Foundations of Data Curation
What if...
Advertising your data so that everyone could find them were as simple as...
1 - Filling out a web form
2 - Saving it to your website
3 - Adding its link to your site
Well... It can be!
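A minimal sketch of how a crawler might pick up step 3, assuming the link is published as an ordinary HTML `<link>` or `<a>` element on the provider's page. The slides do not specify the mechanism, so the class name, function name, sample page, and URLs below are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AdvertisedLinkParser(HTMLParser):
    """Collect href targets of <link> and <a> elements from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # (absolute URL, value of the type attribute or None)

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # Resolve relative links against the page they were found on.
            absolute = urljoin(self.base_url, href)
            self.links.append((absolute, attributes.get("type")))


def find_advertised_links(html_text, base_url):
    """Return every link found in html_text, resolved against base_url."""
    parser = AdvertisedLinkParser(base_url)
    parser.feed(html_text)
    return parser.links


# Hypothetical provider page that advertises a feed in its <head>.
SAMPLE_PAGE = """
<html><head>
<link rel="alternate" type="application/atom+xml" href="/casts/datasets.xml">
</head><body>
<a href="about.html">About us</a>
</body></html>
"""

found = find_advertised_links(SAMPLE_PAGE, "http://example.org/data/")
for url, media_type in found:
    print(url, media_type)
```

The `type` attribute is kept alongside each URL because it gives a crawler a cheap first hint about which links are worth fetching.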
Why not let Google do it?
Crawler Big Picture
(Diagram labels: BCube Crawler, BCube Broker, CINERGI)
Crawler Architecture
Things we are going to search for
• OpenSearch
• OAI-PMH, ESIP data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL
But what else should we look for?
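As a rough illustration of how the listed targets could be told apart once a document has been fetched, the sketch below guesses the kind of resource from the XML root element and falls back to a simple text check for directory listings. This is not the crawler's actual logic: the function names, the mapping table, and the "index of" heuristic are assumptions, and a real classifier would also want to check namespaces and HTTP content types.

```python
import xml.etree.ElementTree as ET

# Root element local names for the document kinds listed above.
# Assumption: matching on the root element alone is only a heuristic;
# "catalog" and "application" in particular are generic names.
ROOT_TAG_TO_KIND = {
    "OpenSearchDescription": "OpenSearch",
    "OAI-PMH": "OAI-PMH",
    "feed": "Atom feed (possible ESIP cast)",
    "catalog": "THREDDS catalog",
    "application": "WADL",
    "definitions": "WSDL",  # WSDL 1.1 root element
}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def classify(body):
    """Guess what kind of resource a fetched response body describes."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        root = None  # not well-formed XML, e.g. an ordinary HTML page
    if root is not None:
        kind = ROOT_TAG_TO_KIND.get(local_name(root.tag))
        if kind:
            return kind
    # Server-generated directory listings commonly carry an "Index of" title.
    if "index of" in body.lower():
        return "Web-enabled folder"
    return "unknown"


# Hypothetical response bodies for illustration.
SAMPLES = [
    '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">'
    "<ShortName>demo</ShortName></OpenSearchDescription>",
    '<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/'
    'InvCatalog/v1.0" name="demo"/>',
    "<html><head><title>Index of /data</title></head><body><hr></body></html>",
    "just some plain text",
]

for sample in SAMPLES:
    print(classify(sample))
```

Any new target the community suggests would, in this sketch, amount to one more entry in the mapping table or one more fallback check.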
Questions/Comments

AHM 2014: Crawling for EarthCube