Since 1962, ICPSR has been an integral part of the infrastructure of social science research, with a vast digital archive supporting over 700 member institutions worldwide. With the release of our new digital asset management system, “Archonnex,” ICPSR continues this tradition by extending our expertise and digital technology capabilities as a service to the larger community. For the first time, researchers, institutions, organizations, and even nations will be able to host their own repositories and set up data services for their members. We call it RaaS (Repository as a Service).
3. ICPSR
➢ Research data management organization.
➢ Hosts archives for various national agencies.
➢ Professional curation staff.
➢ Enabling secondary use of research data.
➢ Research and development of new technologies.
4. Our Organization
➢ Primary Research Staff
➢ Professional curation staff
➢ Data Librarians and Archivists
➢ Web and Social Media content designers
➢ Computing & Network Services
5. Archonnex Guiding Principles
➢ Comprehensive Digital Asset Management Platform.
➢ OAIS model compliant.
➢ Multi-tenancy.
➢ Secure: data encryption at rest and in transit.
➢ Service-oriented, scalable and modular.
➢ Open source technologies.
➢ Standards-based metadata harvesting and data exports.
➢ Cohesive technology choices.
➢ Flexible UI components enabling developer-to-developer (D2D) integration.
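The slides do not say which cipher Archonnex uses for encryption at rest. As a minimal sketch, assuming AES-256-GCM through Node.js's built-in `crypto` module, a stored file could be encrypted and authenticated as below. The helper names are hypothetical and key management is out of scope; encryption in transit would normally be handled separately by TLS.

```javascript
const crypto = require("crypto");

// Hypothetical helper: encrypt a buffer for storage with AES-256-GCM.
// The random IV, the ciphertext and the auth tag must all be stored.
function encryptAtRest(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit IV, the recommended size for GCM
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Hypothetical helper: decrypt and verify; final() throws if the
// ciphertext or tag was altered.
function decryptAtRest({ iv, ciphertext, tag }, key) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Usage: in practice the 256-bit key would come from a key management service.
const key = crypto.randomBytes(32);
const stored = encryptAtRest(Buffer.from("id,value\n1,42\n"), key);
const restored = decryptAtRest(stored, key);
console.log(restored.toString("utf8") === "id,value\n1,42\n"); // true
```

GCM is chosen here because it detects tampering as well as hiding content, which suits long-term storage where silent modification matters.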
7. Data Ingestion
➢ Ability to upload and organize digital objects.
➢ Bulk file uploads from local/NFS drives.
➢ FTP/SFTP uploads.
➢ Email uploads.
➢ API pull/push mechanisms.
➢ Imports from ZIP bundles.
➢ Custom extracts
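Organizing uploaded objects, for example the entries of an imported ZIP bundle, comes down to turning a flat list of relative paths into a folder hierarchy. A minimal sketch, assuming `/`-separated paths; `buildTree` and the entry shape are hypothetical, not part of the Archonnex API.

```javascript
// Hypothetical helper: turn flat relative paths (e.g. ZIP entries) into a
// nested folder tree. Folders become objects; each file maps to its size.
function buildTree(entries) {
  const root = {};
  for (const { path, size } of entries) {
    // Skip directory entries; folders are created implicitly from file paths.
    if (path.endsWith("/")) continue;
    const parts = path.split("/").filter((part) => part.length > 0);
    if (parts.length === 0) continue;
    let node = root;
    // Walk (or create) each folder along the path.
    for (const dir of parts.slice(0, -1)) {
      node[dir] = node[dir] || {};
      node = node[dir];
    }
    node[parts[parts.length - 1]] = size; // leaf entry for the file itself
  }
  return root;
}

const tree = buildTree([
  { path: "study123/data/survey.csv", size: 2048 },
  { path: "study123/docs/", size: 0 },
  { path: "study123/docs/codebook.pdf", size: 51200 },
  { path: "study123/data/weights.csv", size: 512 },
]);
console.log(JSON.stringify(tree));
// {"study123":{"data":{"survey.csv":2048,"weights.csv":512},"docs":{"codebook.pdf":51200}}}
```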
8. Data Curation
➢ Metadata generated by researchers/producers
➢ Metadata generated by data librarians/curation staff
➢ Machine-generated metadata for born-digital files
○ Video, audio, images, tabular, geospatial, etc.
➢ Custom metadata extraction tools
○ SPSS, Stata, Apache Tika, bulk_extractor, others
○ Integration with DDI, JSON, XML, RDF, etc.
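For tabular files, machine-generated metadata can start with something as simple as inferring each column's type and missing-value count from its values. The sketch below assumes a small CSV with a header row and no quoted commas (real extraction would rely on a proper parser or the tools listed above); `inferColumns` and its output shape are hypothetical, loosely resembling a per-variable description.

```javascript
// Hypothetical helper: infer per-column metadata from a simple CSV string.
// Assumes a header row and no quoted fields that contain commas.
function inferColumns(csvText) {
  const [header, ...rows] = csvText
    .trim()
    .split(/\r?\n/)
    .map((line) => line.split(","));
  return header.map((name, i) => {
    // Empty cells are treated as missing values.
    const values = rows.map((r) => r[i]).filter((v) => v !== undefined && v !== "");
    const numeric = values.length > 0 && values.every((v) => !Number.isNaN(Number(v)));
    return {
      name,
      type: numeric ? "numeric" : "character",
      validCount: values.length,
      missingCount: rows.length - values.length,
    };
  });
}

const meta = inferColumns("id,age,state\n1,34,MI\n2,,OH\n3,51,MI\n");
console.log(meta);
```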
9. Data Discovery
➢ Search using Apache Solr
➢ Custom indexes
➢ New ways to discover data
○ Visualization dashboards
○ Data search across multiple tenants (repositories)
➢ Persistent digital object identifiers
➢ Customizable resource views
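Archonnex's Solr schema is not described in these slides. As a minimal sketch, assuming a hypothetical core URL and a hypothetical `tenant` field on each indexed record, a search scoped to one or several tenants could be expressed with Solr's standard `q`, `fq`, `rows` and `wt` request parameters:

```javascript
// Hypothetical helper: build a Solr select URL restricted to one or more tenants.
// The base URL and the "tenant" field name are assumptions for illustration.
function buildSolrQuery(baseUrl, text, tenants, rows = 10) {
  const params = new URLSearchParams();
  params.set("q", text);
  // A filter query (fq) narrows the result set without affecting relevance scores.
  params.set("fq", `tenant:(${tenants.map((t) => `"${t}"`).join(" OR ")})`);
  params.set("rows", String(rows));
  params.set("wt", "json");
  return `${baseUrl}/select?${params.toString()}`;
}

// Cross-tenant search: the same query over two hypothetical repositories.
const url = buildSolrQuery(
  "https://solr.example.org/solr/studies",
  "unemployment",
  ["openicpsr", "tenantB"]
);
console.log(url);
```

Passing a single tenant gives a repository-specific search; passing several gives the cross-tenant search mentioned above, with no change to the free-text query.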
10. Data Preservation
➢ Provenance information (version management)
➢ Creation of durable preservation formats
➢ Periodic fixity checks (raw data validation)
➢ Replication (digital copies)
➢ Ability to easily locate assets within the preservation area
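A periodic fixity check recomputes each file's checksum and compares it with the value recorded at ingest. A minimal sketch using SHA-256 from Node.js's `crypto` module; the slides do not say which checksum algorithm Archonnex uses, and the manifest shape and function names are assumptions.

```javascript
const crypto = require("crypto");

// SHA-256 digest of a buffer as a hex string.
function sha256(buf) {
  return crypto.createHash("sha256").update(buf).digest("hex");
}

// Hypothetical fixity check: compare current contents with the checksums
// recorded at ingest. Returns the paths that are missing or have changed.
function fixityCheck(manifest, readFile) {
  const failures = [];
  for (const [path, expected] of Object.entries(manifest)) {
    const data = readFile(path);
    if (data === undefined || sha256(data) !== expected) failures.push(path);
  }
  return failures;
}

// Usage with an in-memory Map standing in for the preservation area.
const store = new Map([
  ["a.csv", Buffer.from("x,y\n1,2\n")],
  ["b.txt", Buffer.from("notes")],
]);
const manifest = { "a.csv": sha256(store.get("a.csv")), "b.txt": sha256(store.get("b.txt")) };
store.set("b.txt", Buffer.from("notes (silently corrupted)"));
console.log(fixityCheck(manifest, (p) => store.get(p))); // [ 'b.txt' ]
```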
12. Hosting your repository
Developer-to-Developer (D2D) Integration
○ Users don't leave your website.
○ Seamlessly embed GUI components as JS plugins.
○ A few snippets of JS code get you going.
○ Your website can use WordPress, Drupal, PHP, ColdFusion, ASP or anything else; it doesn't matter.
13. Setting up a new tenant
➢ Workspace
➢ Search
➢ Resource Views
➢ Data sensitivity and security
➢ Custom metadata extractors
➢ Your website URLs
➢ Identify administrators/management team
*ICPSR will provide a checklist
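The checklist itself is not reproduced in these slides. Purely as a hypothetical sketch, the items above could be captured in a configuration object and checked for completeness before a tenant goes live; every field name here is invented for illustration and is not an Archonnex schema.

```javascript
// Hypothetical tenant setup fields mirroring the checklist items above.
const REQUIRED_FIELDS = [
  "workspace", "search", "resourceViews", "dataSensitivity",
  "metadataExtractors", "websiteUrls", "administrators",
];

// Returns the checklist items that are still missing or empty.
function missingItems(tenantConfig) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = tenantConfig[field];
    if (value === undefined || value === null) return true;
    if (Array.isArray(value)) return value.length === 0;
    if (typeof value === "object") return Object.keys(value).length === 0;
    return value === "";
  });
}

const draft = {
  workspace: { name: "openicpsr" },
  search: { facets: ["subject", "year"] },
  resourceViews: ["study"],
  dataSensitivity: "restricted",
  metadataExtractors: [],
  websiteUrls: ["https://www.example.edu/data"],
  administrators: ["admin@example.edu"],
};
console.log(missingItems(draft)); // [ 'metadataExtractors' ]
```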
16. Configuring Search (D2D)
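<div id="search"></div>
<script type="text/javascript">
// Assumes React, ReactDOM and the SearchPage component from the
// ICPSR search plugin are already loaded on this page.
var archive = "openicpsr";
var searchManagerUrl = "https://search.icpsr.umich.edu/search";
var searchConfig = { /* .... */ };

// Returns the HTML for the header row of the results list.
// Single quotes outside so the double-quoted attributes stay intact.
var buildSearchResultsHeader = function () {
  var headerString = '<div class="row" id="columnHeadings">.....</div>';
  return headerString;
};

// Renders one search result; val holds that result's fields.
// Plain scripts cannot contain JSX, so the element is built with createElement.
var buildSearchResult = function (val) {
  return React.createElement("div", null, ".....");
};

var saveSearchResult = false;
var customActions = [];

// Mount the search UI for this tenant into the #search container.
ReactDOM.render(
  React.createElement(SearchPage, { tenant: "openicpsr", archive: archive }),
  document.getElementById("search")
);
</script>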
17. Dissemination (D2D)
Build a sample HTML page and apply your CSS and themes.
Map its content to the metadata in the repository.
We will convert it to FreeMarker templates.
“Apache FreeMarker™ is a template engine: a Java library to generate text output (HTML web pages, e-mails, configuration files, source code, etc.) based on templates and changing data”
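The mapping step pairs placeholders in the sample HTML with metadata fields. FreeMarker itself is a Java library; only to illustrate the idea in this deck's JavaScript, the sketch below fills `${field}` placeholders (the same interpolation syntax FreeMarker uses) from a metadata record. It is a toy stand-in, not FreeMarker, and the field names are hypothetical.

```javascript
// Escape a value before inserting it into HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Toy template filler: replaces ${name} placeholders with values from a
// metadata record. Unknown fields are left visible so mapping gaps show up.
function render(template, metadata) {
  return template.replace(/\$\{(\w+)\}/g, (match, field) =>
    Object.prototype.hasOwnProperty.call(metadata, field)
      ? escapeHtml(metadata[field])
      : match
  );
}

const sample = '<h1 class="study-title">${title}</h1><p>${principalInvestigator}</p>';
const html = render(sample, { title: "Example Study & Panel", principalInvestigator: "Doe, J." });
console.log(html);
// <h1 class="study-title">Example Study &amp; Panel</h1><p>Doe, J.</p>
```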
18. Administration & Reporting
➢ Access to your repository through an Admin GUI
➢ Usage statistics and reports with charts and visualizations
➢ Google Analytics enabled
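Usage reports like these typically start by aggregating raw access events into counts per period. A minimal sketch, assuming a hypothetical event shape with an ISO 8601 timestamp and an action name; it is not the Archonnex reporting code.

```javascript
// Hypothetical aggregation: count events per month ("YYYY-MM") and per action.
function monthlyCounts(events) {
  const counts = {};
  for (const { timestamp, action } of events) {
    const month = timestamp.slice(0, 7); // "2017-03-02T10:15:00Z" -> "2017-03"
    counts[month] = counts[month] || {};
    counts[month][action] = (counts[month][action] || 0) + 1;
  }
  return counts;
}

const report = monthlyCounts([
  { timestamp: "2017-03-02T10:15:00Z", action: "download" },
  { timestamp: "2017-03-20T08:00:00Z", action: "view" },
  { timestamp: "2017-03-21T09:30:00Z", action: "download" },
  { timestamp: "2017-04-01T12:00:00Z", action: "view" },
]);
console.log(JSON.stringify(report));
// {"2017-03":{"download":2,"view":1},"2017-04":{"view":1}}
```

The resulting per-month table is the shape most charting libraries expect for a time-series bar or line chart.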
19. Quantitative data tools
➢ We specialize in quantitative data.
➢ Supports easy-to-use online statistical analysis using R packages.
➢ Supported formats include CSV, SPSS, SAS, Stata and R.
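The online analysis itself runs on R packages. The JavaScript below only illustrates the kind of computation involved: a weighted frequency table for one categorical variable, which survey data usually requires. The variable and weight names are hypothetical.

```javascript
// Weighted frequency table: sums case weights per category and expresses
// each sum as a percentage of the total weight.
function weightedFrequencies(rows, variable, weight) {
  const totals = new Map();
  let grand = 0;
  for (const row of rows) {
    const w = weight ? Number(row[weight]) : 1; // unweighted when no weight column is given
    totals.set(row[variable], (totals.get(row[variable]) || 0) + w);
    grand += w;
  }
  return [...totals].map(([category, sum]) => ({
    category,
    weightedN: sum,
    percent: grand > 0 ? (100 * sum) / grand : 0,
  }));
}

const table = weightedFrequencies(
  [
    { employed: "yes", wt: 1.5 },
    { employed: "no", wt: 0.5 },
    { employed: "yes", wt: 1.0 },
    { employed: "no", wt: 1.0 },
  ],
  "employed",
  "wt"
);
console.log(table);
```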
21. Infrastructure
AWS Cloud
➢ EC2 compute with EBS storage (S3 in the future)
➢ Backups synchronized back to Ann Arbor
➢ S3 storage for StatSnap
➢ EC2 compute with autoscaling and ELB for StatSnap
➢ VPCs with a VPN to campus for legacy system access
Replication
➢ Encrypted tape copy at an offsite Ann Arbor location
➢ Staging copy on the Perry server
➢ ITS MiStorage, replicated to North Campus
➢ DuraCloud synchronizes two copies
○ Amazon S3 and Glacier (each with its own redundancy)
➢ Digital Preservation Network (DPN), planned
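With several replicated copies, a routine audit compares each copy's checksum manifest against the primary's. A minimal sketch, assuming each copy can report a `{ path: checksum }` manifest; DuraCloud's own audit tooling is not shown, and the function name is hypothetical.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical audit: compare a replica's manifest with the primary's.
// Reports files missing from the replica, files whose checksums differ,
// and files that exist only in the replica.
function compareManifests(primary, replica) {
  const missing = [];
  const mismatched = [];
  for (const [path, checksum] of Object.entries(primary)) {
    if (!has(replica, path)) missing.push(path);
    else if (replica[path] !== checksum) mismatched.push(path);
  }
  const extra = Object.keys(replica).filter((path) => !has(primary, path));
  return { missing, mismatched, extra };
}

// Checksums shortened for readability.
const primary = { "s1/data.csv": "ab12", "s1/codebook.pdf": "cd34", "s2/data.csv": "ef56" };
const s3Copy = { "s1/data.csv": "ab12", "s1/codebook.pdf": "ffff", "s3/data.csv": "9999" };
console.log(JSON.stringify(compareManifests(primary, s3Copy)));
// {"missing":["s2/data.csv"],"mismatched":["s1/codebook.pdf"],"extra":["s3/data.csv"]}
```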
22. Pricing Model
➢ Hardware Cost
➢ Data Usage and Storage Cost
➢ Processing Cost
➢ Networking
➢ IT Personnel Cost
○ Base scope of work needed
*Savings gained by using standard repository features
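The slide lists cost components without figures. One way to read it, offered only as an assumption about how the components combine, is an additive model in which $S$ stands for the savings gained by using standard repository features:

```latex
C_{\text{total}} = C_{\text{hardware}} + C_{\text{data usage and storage}} + C_{\text{processing}} + C_{\text{networking}} + C_{\text{IT personnel}} - S_{\text{standard features}}
```

No term is quantified on the slide, so the model only shows the structure of the estimate, not its size.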