CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. 2011)

Published in: Technology, Education

  • How many people in this audience have an institutional repository? Are you using it to publish data?
  • Transcript

    • 1. SEAD: Sustainable Environment – Actionable Data
        • CNI Fall Members Meeting, Arlington, VA, 12/12/2011
        • Margaret Hedstrom, SEAD PI/Project Director; Professor & Associate Dean, UM School of Information
        • Robert H. McDonald, SEAD Sr. Personnel; Assoc. Dean/Associate Director, Indiana University
    • 2. NSF DataNet Program
        • new types of organizations that integrate library & archival sciences, cyberinfrastructure, computer & information sciences, & domain science expertise
        • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline
        • continuously anticipate and adapt to changes in technologies and in user needs and expectations
        • engage in research to drive the leading edge forward
        • serve as component elements of an interoperable data preservation and access network
    • 3. Partners
        • SEAD’s Unique Contributions
            – Address domain-driven needs & requirements
            – Serve scientists and researchers in the “long tail”
            – Integrate existing technologies, tools & services (rather than build new from scratch)
    • 4. Sustainability Science
        • [Diagram labels: Science, Cooperation, Technology, Policy, Economics, Poverty & Justice]
    • 5. Data challenges
        • Heterogeneity of all kinds
        • Multiple scales
        • Multidisciplinary
        • Many small datasets
    • 6. The long tail of scientific research
        • Small and derived data sets
        • Heterogeneous data
        • Multiple sources of data
        • Short-lived data with long-term value
        • Value of data grows when combined & integrated
    • 7. SEAD’s Goals
        • Provide data services that address the needs of researchers working toward sustainability
        • Integrate these services into a generalizable “Active and Social Curation” infrastructure suited to the social structure and economics of long-tail research communities
        • Develop capabilities to package and migrate the most valuable datasets to a federated repository infrastructure for long-term preservation
        • Education, outreach, & training to disseminate SEAD’s contributions to other projects & communities
    • 8. SEAD’s Strategy
        • Leverage social media for discovery of data, interest, and expertise
        • Move data curation upstream in the data life cycle
        • Involve domain scientists in setting priorities for evolution of data and services
        • Take advantage of existing infrastructures (Institutional Repositories, ICPSR) for long-term preservation
    • 9. Active and Social Curation
        • Engage researchers during projects, not at the end
        • Automatically capture metadata as defined by the data producers
        • Provide facilities for commentary, recommendations, and mark-up of data
        • Further reduce costs by re-engineering curation processes to leverage this rich metadata and volunteered effort
    • 10. Active Curation Model
        • [Diagram labels: Active Curation, Social Media, Workflows, Data Review, Rating, Commenting, Metadata]
    • 11. SEAD Status
        • Phase 1 (Months 1-18): Develop SEAD Prototype
        • Phase 2 (Years 3-5): Grow SEAD users, data, and functionality
        • SEAD start date: 10/1/2011
        • In other words, SEAD is not ready to accept your data!
    • 12. SEAD Personnel
        • Margaret Hedstrom, PI (Michigan)
        • Praveen Kumar, co-PI (Illinois)
        • Jim Myers, co-PI (RPI)
        • Beth Plale, co-PI (Indiana)
        • Ann Zimmerman, co-PI/Project Manager (Michigan)
        • George Alter (ICPSR)
        • Bryan Beecher (ICPSR)
        • Katy Börner (Indiana)
        • Robert McDonald (Indiana)
        • Jude Yew, Post-doc (Michigan)
        • + many more to come
    • 13.
    • 14. SEAD TEAM
        • University of Michigan: Margaret Hedstrom (UM PI), Ann Zimmerman (Co-PI and Project Manager), George Alter, Bryan Beecher, Charles Severance, Karen Woollams, Jude Yew.
        • Indiana University: Beth Plale (IU PI), Katy Börner, Robert H. McDonald, Kavitha Chandrasekar, Robert Ping, Stacy Kowalczyk, Robert Light.
        • University of Illinois: Praveen Kumar (UIUC PI), Rob Kooper, Luigi Marini, Terry McLaren.
        • Rensselaer Polytechnic Institute: Jim Myers (RPI PI), Ram Prasanna Govind Krishnan, Lindsay Todd, Adam Wilson.
    • 15. SEAD Cyberinfrastructure
        • An international resource for sustainability science
        • Novel technical and business approaches to supporting the long tail of research data
        • Lifecycle support: actionable data services integrated with curation and preservation infrastructure
    • 16. Key Challenges for SEAD Cyberinfrastructure
        • Managed data storage and services are expensive!
        • Begging for metadata doesn’t work!
        • Curation and preservation are time consuming!
        • The long tail is not standardized!
        • Data collections are always missing something valuable!
        • Data models evolve!
        • Cyberinfrastructure is obsolete by the time you build it!
        • Building community as you leverage cyberinfrastructure
    • 17. SEAD: Social Networking
        • Co-authorship
        • Co-funding
        • Micro-citation
        • Shared project repositories
        • Shared tags
        • Threaded discussions
        • Quoting, forwarding, …
    • 18. Linked Data and Repositories
        • Tag and annotate data
        • Overlay it with reference data
        • Organize it in domain terminology
        • Link it to people, papers, projects, conversations…
    • 19. Using Science of Science to Link Repositories
    • 20. KEY SEAD Questions
        • What could SEAD capture when?
        • How can SEAD provide direct value to data producers, users, and curators?
        • How can robust web-services and social computing lower barriers and reduce/realign costs?
    • 21. SEAD: Active Content Repository
        • With the ‘Big Picture’ graph in hand, curators can:
            ▫ Focus on what to curate and when
            ▫ Automate parts of the process
            ▫ Use existing/emerging technologies for packaging and preserving datasets
            ▫ Better manage federated repositories
    • 22. SEAD: Leveraging Existing Resources
        • Cyberinfrastructure
            ▫ IU Data Capacitor/HPC Capabilities
            ▫ UIUC/NCSA HPC Capabilities
            ▫ Rensselaer CCNI Capabilities
        • Repositories
            ▫ UM Deep Blue
            ▫ IU ScholarWorks
            ▫ ICPSR Repository
            ▫ UIUC IDEALS
    • 23. SEAD LayerCake View
        • Services over an active content layer that is backed by/harvested into a federated archive infrastructure based on institutional resources
        • [Diagram layers: Network of Data Producers; Web User Interface; Active Content Repository and the services it provides; Virtual Archives; Institutional Repositories (IU, RPI, UIUC, UM, ICPSR, Data Conservancy); User Network]
    • 24. CI Technical Approach
        • [Diagram: Active and Social Curation and an OAIS-compliant Digital Repository Federation, separated by a Curation Boundary]
        • [Active and Social Curation labels: Automated Curation; Data Acquisition, Analysis and Simulation; Metadata Management (DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …); Workflow/Rule Engine (operates on metadata, content and trigger events); Scholarly Objects and Communication; Compound Objects - OAI-ORE; VIVO/Linked Data; Active Content Repository]
        • [Repository federation labels: Ingest scripts (fixity, integrity, authentication, transformation); Ingest, AIPs; Appraisal and Selection; Preservation Actions; Dissemination Packages; Wide-Area File System; Migration and Emulation]
        • [Access labels: Search, Browse, Annotation, Visualization; Access Mechanisms and E-Scholarship; Use, Reuse, Repurposing; Contributor Tools; User Tools; Services Tools]
    • 25. Toward PetaScale Data
        • Internet2 upgrade:
            ▫ Total bandwidth from 100 Gbps to 8.8 Tbps
            ▫ Moving a petabyte of data will go from 10 days to 25 hrs
    • 26. SEAD 18 Month Prototype Targets for Cyberinfrastructure
        • Active and Social Content Curation
            ▫ Pilot Active Content Repository, VIVO deployments
            ▫ Exemplar services for Data Ingest, Discovery, Re-use, Curation
        • CI for Long-term Access
            ▫ Data model, protocol design/development
            ▫ Pilot Federated Repository infrastructure
    • 27. SEAD CI QuickView
        • SEAD will quickly build a repository and data services infrastructure for sustainability research that can be responsively adapted based on community feedback – Community Agile Development
        • SEAD will leverage existing tools and emerging practices to dramatically enhance the interactions of researchers and data librarians – Active Curation
        • SEAD’s focus on the long tail will force an emphasis on ease of use and low costs that is critical for long-term sustainability – Leverage Existing Institution Resources for Long-term Access
        • SEAD will leverage experiences in the sustainability research community to provide guidance for other long-tail communities making the transition to an interdisciplinary, systems-oriented approach to research – Sustainability and Resource Growth Partnership and Collaboration
    • 28. Acknowledgments
        • SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824
        • For more on SEAD go to:
        • Follow us on Twitter @SEADdatanet
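
The transfer times on slide 25 can be sanity-checked with a little idealized arithmetic. The sketch below is not from the presentation. It assumes a decimal petabyte (10^15 bytes), a link that is fully utilized with no protocol overhead, and it adds a 10 Gbps rate that the slide does not mention, purely for comparison. The function and constant names are illustrative.

```python
# Idealized transfer-time estimate: time = size in bits / link rate.
# Assumptions (not from the slides): decimal units, 100% link utilization,
# no protocol overhead. The 10 Gbps row is an added comparison point.

PETABYTE = 10**15      # bytes, decimal petabyte
GBPS = 10**9           # bits per second
TBPS = 10**12          # bits per second


def transfer_time_seconds(size_bytes: float, rate_bits_per_second: float) -> float:
    """Return the ideal time in seconds to move size_bytes at the given rate."""
    return size_bytes * 8 / rate_bits_per_second


if __name__ == "__main__":
    rates = [
        ("10 Gbps", 10 * GBPS),    # hypothetical comparison rate
        ("100 Gbps", 100 * GBPS),  # from slide 25
        ("8.8 Tbps", 8.8 * TBPS),  # from slide 25
    ]
    for label, rate in rates:
        seconds = transfer_time_seconds(PETABYTE, rate)
        print(f"{label:>9}: {seconds / 3600:8.2f} hours ({seconds / 86400:.2f} days)")
```

Under these assumptions a petabyte takes roughly 9.3 days at 10 Gbps, about 22 hours at 100 Gbps, and about 15 minutes at 8.8 Tbps. The slide's "10 days" and "25 hrs" are therefore close to what a single 10 Gbps and a single 100 Gbps path would deliver, which may mean they describe per-path rates rather than the aggregate capacity, though the slide does not say so.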