2. Outline
• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community
3. NSIDC: An overview
NSIDC affiliations and sponsorship
• Affiliation: Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado Boulder
• Main sponsors: National Science Foundation, NASA, National Oceanic and Atmospheric Administration
4. The National Snow and Ice Data Center…
• Provides tools for data access
• Researches the cryosphere and data science
• Educates the public about the cryosphere
• Supports data users
• Manages and distributes scientific data
• Supports local and traditional knowledge
6. Why not let Google do it?
• What's their incentive?
• The schema.org route for data has extreme limitations
7. Ways to build a comprehensive catalog
• Ask folks to register their data and services
• Build your catalog by hand
• Automate discovery of data and services
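The third option, automated discovery, starts with a crawl. Below is a minimal sketch, assuming a breadth-first crawl that stays on the seed hosts. `crawl`, `LinkExtractor`, the `fetch` callback and the `max_pages` limit are hypothetical names for illustration, not the Libre crawler's actual code. `fetch` is passed in so the sketch runs without network access; in practice it would wrap an HTTP client.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> and <link> tags in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> List[str]:
    """Breadth-first crawl restricted to the hosts of the seed URLs.

    `fetch` (hypothetical) returns a page's text, or None if it could not
    be retrieved. Returns the successfully fetched URLs in visit order.
    """
    seeds = list(seeds)
    allowed_hosts = {urlparse(s).netloc for s in seeds}
    queue = deque(seeds)
    seen = set(seeds)          # URLs already queued, to avoid revisiting
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        body = fetch(url)
        if body is None:       # unreachable page: skip it, keep crawling
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc in allowed_hosts and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

For example, with an in-memory dictionary standing in for the web, `crawl(["http://example.org/"], pages.get)` follows relative links within example.org and ignores links to other hosts. A production crawler would also honor robots.txt, rate-limit its requests, and check the Content-Type header before downloading large binary files.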
8. Preparing Data for Ingest, presented 10/27/09 by R. Duerr
LID590DCL Foundations of Data Curation
What if advertising your data so that everyone could find them were as simple as...
1 - Filling out a web form
2 - Saving it to your website
3 - Adding its link to your site
Well... it can be!
13. Things we are going to search for
• OpenSearch
• OAI-PMH, ESIP Data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL
But what else should we look for?
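Once a page is fetched, the crawler has to decide whether it is one of the targets listed above. One simple approach, sketched here under stated assumptions, is to read the namespace of the XML root element, since OpenSearch description documents, OAI-PMH responses, THREDDS catalogs, WADL and WSDL files each declare a published namespace. `classify` and `KNOWN_NAMESPACES` are hypothetical names. Two further assumptions are not rules from the talk: any Atom feed is treated as a possible ESIP data or service cast, and "web-enabled folders" are taken to be HTTP directory listings titled "Index of /...".

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Root-element namespaces for each target type. The Atom entry is an
# assumption: ESIP casts are Atom-based, but not every Atom feed is a cast.
KNOWN_NAMESPACES = {
    "http://a9.com/-/spec/opensearch/1.1/": "OpenSearch description",
    "http://www.openarchives.org/OAI/2.0/": "OAI-PMH response",
    "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0": "THREDDS catalog",
    "http://wadl.dev.java.net/2009/02": "WADL",
    "http://schemas.xmlsoap.org/wsdl/": "WSDL 1.1",
    "http://www.w3.org/ns/wsdl": "WSDL 2.0",
    "http://www.w3.org/2005/Atom": "Atom feed (possible data/service cast)",
}


def classify(body: bytes) -> Optional[str]:
    """Guess which kind of discoverable resource a fetched document is.

    Takes raw bytes so the parser honors any XML encoding declaration.
    Returns None when the document matches no known target.
    """
    # Heuristic (assumption): Apache/nginx autoindex pages are titled "Index of /...".
    if b"<title>Index of /" in body:
        return "Web-enabled folder"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not well-formed XML, e.g. ordinary HTML
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return KNOWN_NAMESPACES.get(uri)
    return None
```

Root-namespace matching is cheap but coarse. An OpenSearch response, for example, is typically an Atom or RSS feed, so it would be reported as Atom or not recognized at all; a fuller classifier would also inspect child elements.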