A collection of heterogeneous files. Users can tag and comment on the entire collection, and can also tag and comment on the individual objects in it. Note: Extraction services and previewers are all driven by the file MIME type. Extraction services are customizable and are designed to automatically generate derived data products from the file being uploaded. Examples follow…
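The MIME-type-driven dispatch described in the note can be sketched roughly as follows. This is a minimal illustration, not Medici's actual code: the `EXTRACTORS` registry, the step names, and `extractors_for` are all hypothetical, and the MIME type is guessed from the file name with Python's standard `mimetypes` module.

```python
import mimetypes
from typing import Dict, List

# Hypothetical registry: MIME type -> names of extraction steps to run.
# The step names are illustrative labels, not Medici identifiers.
EXTRACTORS: Dict[str, List[str]] = {
    "image/png": ["thumbnail", "image_pyramid", "header_metadata"],
    "image/tiff": ["thumbnail", "image_pyramid", "header_metadata"],
    "text/plain": ["text_preview"],
    "application/pdf": ["page_images"],
}


def extractors_for(filename: str) -> List[str]:
    """Return the extraction steps registered for the file's MIME type."""
    # guess_type returns (type, encoding); type is None for unknown files.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return EXTRACTORS.get(mime_type, [])


if __name__ == "__main__":
    for name in ("lidar_scan.png", "slides.pdf", "mystery.unknownext"):
        print(name, extractors_for(name))
```

Keeping the mapping in a plain dictionary means a new file type is supported by registering an entry rather than by changing the dispatch logic, which is one way to read "customizable" above.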
Lidar data saved as .png. The image extraction service does the following:
- Creates the thumbnail and preview image
- Creates an image pyramid of the image (zoom/pan large images without downloading the entire image, via the SeaDragon webapp)
- Extracts all header information from the image file, including Exif, GPS, Interoperability, etc.
Extracted data is viewed by clicking on the “Extracted Information” section.
A data set saved as a simple ASCII text file.
- Users can preview the first 80 lines of the text file.
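A text preview of this kind can be sketched in a few lines. This is an assumption about how such a previewer might work, not the Medici implementation; `preview_lines` is a hypothetical helper, and the 80-line limit comes from the note above.

```python
from itertools import islice
from typing import List


def preview_lines(path: str, limit: int = 80) -> List[str]:
    """Return at most `limit` lines from the start of a text file."""
    # errors="replace" keeps the preview usable if a stray non-ASCII byte appears.
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        # islice stops reading after `limit` lines, so large files are not loaded whole.
        return [line.rstrip("\n") for line in islice(handle, limit)]
```

Because the file is read lazily, the cost of the preview depends on the limit rather than on the size of the uploaded file.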
Preview the contents of .csv files
Simple map image
- User-defined information
- Image is part of multiple collections
- Image is tagged
3 images (3 clicks)
- Standard Medici info
- Scroll down to show location and annotation
This image file also contains geolocation data, which becomes visible under “Location”. Geolocation can be extracted from the image Exif data, or authors can add a geolocation to any file in the repository. Note the creator tag and VIVO reference.
TIFF support: a relatively large 71 MB file
Clicks…
- Click Zoom to enable SeaDragon and explore the details of the file via zoom and pan with the mouse.
- Click the lower-right icon to enable full screen. Use the + or – key (or the mouse wheel) to zoom; click the image and drag to pan.
- Click the lower-right icon again to return to the embedded window in Medici.
Image file that contains GPS data which is extracted by Medici as part of the upload process.
MPEG file uploads: the extraction service creates a Flash version of the file for preview.
PDF files: the extraction service generates an image per page of the file. In this case, the file is a slide set from a presentation. Click ‘Pages’ to enable the slide set mode, then click the left or right arrows to navigate the pages. 2 images – click to advance slide.
.shp files
- The components of a shapefile are uploaded to Medici as a zip.
- Medici saves the zip blob, and the extraction service registers the contents of the .shp file with GeoServer.
- OpenStreetMap displays the contents of the zip.
- Layers are on by default but can be toggled by clicking the ‘show’ button.
- Opacity of layers can be varied using the opacity scale.
- (WIP) We plan to embed OpenStreetMap in Medici as a previewer for .shp and .kml.
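Before a zipped shapefile is handed to a map server, an extraction step could check that each layer in the archive is complete. The sketch below is an assumption about such a check, not part of Medici: `shapefile_layers` and `complete_layers` are hypothetical helpers, and the required set (.shp, .shx, .dbf) reflects the mandatory components of the shapefile format. Registration with GeoServer is not shown.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Dict, List, Set

# A shapefile layer needs at least these three component files.
REQUIRED_SUFFIXES = {".shp", ".shx", ".dbf"}


def shapefile_layers(zip_bytes: bytes) -> Dict[str, Set[str]]:
    """Map each base path in the archive to the set of suffixes found for it."""
    layers: Dict[str, Set[str]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            base = str(path.with_suffix(""))
            layers.setdefault(base, set()).add(path.suffix.lower())
    return layers


def complete_layers(zip_bytes: bytes) -> List[str]:
    """Return the base paths that carry all required shapefile components."""
    return sorted(
        base
        for base, suffixes in shapefile_layers(zip_bytes).items()
        if REQUIRED_SUFFIXES <= suffixes
    )
```

Working on the saved zip blob in memory matches the note that Medici stores the zip as uploaded; only layers that pass the check would be candidates for registration.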
All layers off except Illinois Flood Zone map. Map zoomed into the Champaign region of interest.
SEAD DataNet and Sustainability Science
1. NSF DataNet Overview
2. SEAD Overview
3. SEAD Active/Social Curation
4. SEAD Virtual Archive Repository
Robert H. McDonald, Deputy Director/Associate Dean, Data to Insight Center/IU Libraries
SC12 | Salt Lake City, UT | November 12, 2012
http://www.sead-data.net @SEADdatanet
SEAD DataNet and Sustainability Science
http://www.sead-data.net
http://slidesha.re/TAk3ht
@SEADdatanet
SEAD DataNet Home
SEAD TEAMS
Michigan: Margaret Hedstrom-PI, Marietta Van Buhler, Karen Woollams, George Alter (ICPSR), Bryan Beecher (ICPSR)
Indiana: Beth Plale-Co-PI, Katy Börner, Robert H. McDonald, Robert Light, Kavitha Chandrasekar, Stacy Kowalczyk, Inna Kouper, Robert Ping, Ryan Cobine
Rensselaer: James Myers-Co-PI, Ram Prasanna Govind Krishnan, Lindsay Todd
Illinois: Praveen Kumar-Co-PI, Terry McLaren (NCSA), Rob Kooper (NCSA), Luigi Marini (NCSA)
NSF DataNet Program
Motivation: “… one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams.”
Response: DataNet creates “a set of exemplar national and global data research infrastructure organizations” to address this challenge.
Current NSF DataNet Projects
SEAD • http://sead-data.net
DataONE • http://www.dataone.org
DataNet Federation Consortium • http://datafed.org
Terra Populus • https://www.pop.umn.edu/terra_pop
SEAD’s Approach
SEAD Partners - http://sead-data.net
• Contribute infrastructure to the DataNet vision that supports data access, sharing, reuse, and preservation for the long tail
• Develop a data access and preservation environment that supports the research, technical, and economic requirements for data management in the long tail
• Enable Active and Social Curation
• Utilize emerging preservation and access infrastructures
Long Tail Data Challenges
[Figure: bytes per day, with scale labels Giga Bytes, Tera Bytes, Peta Bytes, Exa Bytes; annotation: Many smaller datasets…]
CI for the Long Tail
What is the “long tail” of scientific research and why does it matter?
• Diverse set of researchers, questions, data, and methodologies, etc.
• Diverse set of requirements for instrumentation, data collection, models, analysis, etc.
• Little standardization, no common denominator
• Most researchers and most research dollars go to researchers in the long tail
• The long tail is underserved by current CI
Long Tail Example: Sustainability Research
Many dimensions, many coordinate systems, many scales, many data collection and analysis tools, many formats, a long tail of providers and users, …
SEAD 18-Month Pilot Phase
Domain Engagement:
• National Center for Earth-Surface Dynamics (NCED), Illinois River Basin Observatory
• Requirements, Use Cases, Prioritization of Data Types and Services
Active and Social Curation
• Pilot Active Content Repository, VIVO deployments
• Exemplar services for Data Ingest, Discovery, Re-use, Curation (Tupelo/Medici)
CI for Long-term Access (Virtual Archive)
• Data model, protocol design/development
• Pilot Federated Repository infrastructure
Education, Outreach, and Training
• Post-doc mentoring
• Web site, training materials, meetings, workshops, …
Project Oversight
• Management, reporting, committees
• Business model development
NCED Collection Access
NCED collections in SEAD-ACR
• (20 Top-level Collections, 454K files, 2.25M objects, 1.6 TB data)
• NCED Repository Interface
• Support for hierarchy
• Support for collection annotation
• View/add NCED/domain specific Terms
• New Large Server with Virtual Machine ACR instances
• Ingest tools and procedures
• csv2rdf4LOD
• Archiving, Citation, DOI assignment, …
NCED users can (with an account) go from web page to previews and downloads (w/o cart), can add annotations, and can browse and search by text (any fields and content), tags, etc.
SEAD Notions of Defined Data Phases
Phases of the data lifecycle acknowledge and accommodate the difference between public data and data a researcher is still working on.
Research Data Phase: the data set is a research data collection, owned by an individual and under their control.
• Data need not be licensed at this time because it is not ready for broader release
• Data need not have permanent IDs because it is still a work in progress
• Corresponds to first existence in the Active Curation Repository
Published Phase: the owner of the research data collection determines that the dataset is ready for publication.
• License terms set
• Persistent ID
• Made available as part of public profile in VIVO
• Activated by user-controlled publish event
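The two phases and the user-controlled publish event can be modeled as a small state transition. This is a sketch under stated assumptions, not SEAD's data model: the `Phase` and `DataSet` names, the field names, and the example identifier are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    RESEARCH = "research"    # owned by an individual, still work in progress
    PUBLISHED = "published"  # license set, persistent ID assigned


@dataclass
class DataSet:
    title: str
    owner: str
    phase: Phase = Phase.RESEARCH
    license_terms: Optional[str] = None   # not required in the research phase
    persistent_id: Optional[str] = None   # not required in the research phase

    def publish(self, license_terms: str, persistent_id: str) -> None:
        """User-controlled publish event: research phase -> published phase."""
        if self.phase is Phase.PUBLISHED:
            raise ValueError("data set is already published")
        if not license_terms or not persistent_id:
            raise ValueError("publishing requires license terms and a persistent ID")
        self.license_terms = license_terms
        self.persistent_id = persistent_id
        self.phase = Phase.PUBLISHED


if __name__ == "__main__":
    data_set = DataSet(title="Example collection", owner="researcher")
    # The identifier below is a made-up placeholder, not a real DOI.
    data_set.publish(license_terms="CC-BY", persistent_id="doi:10.0000/example")
    print(data_set.phase)
```

Making the license and persistent ID optional until `publish` is called mirrors the slide: neither is needed while the owner still controls a work in progress, and both are fixed at the moment of publication.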
SEAD Active/Social Curation Repository
SEAD/NCED Data Social Network
NCED Data Social Network in SEAD-VIVO
• Mary Power, NCED PI and Professor, University of California
• William Dietrich, NCED PI and Professor, University of California
• Collin Bode, NCED Data Technician
NCED Social Network Connections Based on Data Authorship
Angelo Basic GIS Coverage Data Set
SEAD Data Set Publishing Workflow
[Diagram: four stages with annotations]
Stages:
• NCED Data Set Ingested to ACR
• NCED Data Set Ingested to VA
• NCED Data Set Published to VIVO
• NCED Data Set Deposited with IR
Annotations:
• Data content used within ACR
• Researcher Profile Established in VIVO
• DataCite minted DOI attached to finalized Data Set
• Data Set ready to publish
• DOI Resolution to designated IR
Published NCED Data Set in IR (IU ScholarWorks)
Virtual Archive Features
Usability consistent with research user expectations
• Additional metadata fields for scientific datasets
• Ability to ingest data with data previewing
Repository tracking: tracking member Institutional Repositories (IRs) and their stored content
• Not just a link to the repository, but an extensive cataloging tool (metadata and other additional information)
• Allows users to search for data in a particular IR or across all IRs
Low-cost replication: cloud-based storage for reliability
• Proof of concept uses Amazon S3 to maintain a copy of files and collections. Amazon Glacier is low-cost, secure, and durable, and is optimized for cold storage. Other solutions exist.
Component Interactions: Virtual Archive and ACR
[Diagram stages]
• Data Set Uploaded to ACR
• Data Set Ingested to Virtual Archive
• Data Set Published to VIVO
• Data Set Deposited with Institutional Repository
ACR – VA Interaction Protocol
[Sequence diagram, time running downward. Participants: Researcher (ACR UI), Curator (VA UI), Active Curation Repository, Virtual Archive. Interfaces: Metadata Endpoint (SPARQL), SWORD.]
Messages recoverable from the diagram:
• Mark Data For Publication (and Accept Licensing Terms)
• Curator Request for Preview
• Query Metadata / Return Metadata
• Curator Preview
• Ingest Data To VA
• User Queries VA for DOI
• Query Metadata update and View DOI
Virtual Archive Workflow
[Workflow diagram. Steps recoverable from the diagram:]
• Accept Repository Agreement in ACR
• Preview Data
• Upload Data to VA
• Run Virus Checking
• File Characterization
• Index Metadata
• Mint DOI
• Deposit to IR
• Ready to Publish
To be completed by March 2013:
• Large Dataset
• Index Scientific Metadata
• Version Data
• IR Matchmaker
• Policy Decision
Key Questions for SEAD Prototype
• What could SEAD capture when?
• How can SEAD provide direct value to data producers, users, and curators?
• How can web 2.0/3.0 and social computing lower barriers and reduce/realign costs?
Towards A Shared Data Future
[Figure: layered data infrastructure; side labels: Data Curation, Trust]
• Data Generators, Users: User functionalities, data capture & transfer, virtual research environments
• Community Support Services: Data discovery & navigation, workflow generation, annotation, interpretability
• Common Data Services: Persistent storage, identification, authenticity (provenance), workflow execution, data mining
Source: EU HLEG Report on Data Deluge: Riding the Wave, pg 31, 2010
Data Interoperability and SEAD
• NSF OCI: DataNet and INTEROP now DIBBs
• EUDAT
• Research Data Alliance
• IETF Research Data Identifier BOF
• NCED Data Network
Acknowledgements
SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824
• For more on SEAD go to: http://sead-data.net
• Follow us on Twitter @SEADdatanet