Repository as a Service (RaaS)
by ICPSR
Agenda
➢ Introduction
➢ Repository Services
○ Ingestion
○ Curation
○ Discovery
○ Preservation
➢ Demo
➢ Developer 2 Developer Integration
○ Workspace
○ Search
○ Dissemination
ICPSR
➢ Research data management organization.
➢ Hosts archives for various national agencies.
➢ Professional curation staff.
➢ Enabling secondary use of research data.
➢ Research and development of new technologies.
Our Organization
➢ Primary Research Staff
➢ Professional curation staff
➢ Data Librarians and Archivists
➢ Web and Social Media content designers
➢ Computing & Network Services
Archonnex Guiding Principles
➢ Comprehensive Digital Asset Management Platform.
➢ OAIS Model compliant
➢ Multi-tenancy.
➢ Secure. Data encryption at rest and in transit.
➢ Service Oriented, Scalable and Modular.
➢ Open Source technologies
➢ Standards-based metadata harvesting and data exports.
➢ Cohesive technology choices.
➢ Flexible UI components enabling D2D integration.
Data Ingestion
➢ Ability to upload and organize digital objects.
➢ Bulk file uploads from local/NFS drives.
➢ FTP/SFTP uploads.
➢ Email uploads.
➢ API pull/push mechanisms.
➢ Imports from ZIP bundles.
➢ Custom extracts
Data Curation
➢ Metadata - researcher/producer generated
➢ Data Librarian/curation staff generated
➢ Machine generated for born-digital files
○ Video, Audio, Images, Tabular, Geospatial, etc
➢ Custom metadata extraction tools
○ SPSS, Stata, Apache-Tika, BulkExtractor, others
○ Integration with DDI, JSON, XML, RDF, etc
Data Discovery
➢ Search using Apache Solr
➢ Custom indexes
➢ Interesting ways to discover data
○ Visualization Dashboards
○ Data search across multiple tenants (repositories)
➢ Persistent digital object identifiers
➢ Customizable resource views
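
As a rough illustration of searching across multiple tenants with Apache Solr, the sketch below builds a query URL for Solr's standard select handler. The `q`, `fq`, `rows` and `wt` parameters are standard Solr query parameters. The core name (`studies`), the `tenant` field and the base URL are assumptions made for this example and are not taken from the Archonnex configuration.

```javascript
// Minimal sketch: build a Solr select URL that searches one or more tenants.
// The core name and the "tenant" field are hypothetical.
function buildSolrQueryUrl(baseUrl, core, text, tenants, rows) {
  const params = new URLSearchParams();
  params.set("q", text);
  if (tenants.length > 0) {
    // Filter query restricting results to the listed tenants.
    params.set("fq", "tenant:(" + tenants.join(" OR ") + ")");
  }
  params.set("rows", String(rows));
  params.set("wt", "json");
  return baseUrl + "/" + encodeURIComponent(core) + "/select?" + params.toString();
}

// Usage: search two tenants at once (assumed local Solr instance).
const solrUrl = buildSolrQueryUrl(
  "http://localhost:8983/solr", "studies", "voter turnout", ["openicpsr", "demo"], 10);
console.log(solrUrl);
```

Keeping the tenant restriction in a filter query rather than in `q` leaves the user's search text untouched and lets Solr cache the filter separately.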
Data Preservation
➢ Provenance information (Version Management)
➢ Creating preservation formats that are durable
➢ Periodic Fixity checks (Raw Data Validation)
➢ Replication (Digital Copies)
➢ Ability to easily locate assets within preservation area.
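
A periodic fixity check can be sketched as follows: record a checksum when a file is ingested, then recompute it later and compare. This is only an illustration under assumptions. The choice of SHA-256, the function names and the temporary file are hypothetical and do not describe how ICPSR implements its checks.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Compute the SHA-256 digest of a file as a hex string.
function sha256OfFile(filePath) {
  return crypto.createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

// A file passes the fixity check when its current digest matches the recorded one.
function checkFixity(filePath, recordedDigest) {
  return sha256OfFile(filePath) === recordedDigest;
}

// Usage: record a digest at ingest time, then re-check after the file changes.
const fixityDir = fs.mkdtempSync(path.join(os.tmpdir(), "fixity-"));
const fixityFile = path.join(fixityDir, "study.csv");
fs.writeFileSync(fixityFile, "id,value\n1,42\n");
const recordedDigest = sha256OfFile(fixityFile);
console.log(checkFixity(fixityFile, recordedDigest)); // true
fs.appendFileSync(fixityFile, "2,43\n");
console.log(checkFixity(fixityFile, recordedDigest)); // false
```

For large files a streaming hash would avoid reading the whole file into memory; the synchronous read here keeps the sketch short.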
Demo
Core Systems
Deposit Manager
Search Manager
Public Content Manager
Tenants
Hosting your repository
Developer 2 Developer Integration
○ Users don't leave your website.
○ Seamlessly embed GUI as JS plugins.
○ A few snippets of JS code get you going.
○ Your website could be in WordPress, Drupal, PHP, ColdFusion, ASP… it doesn't matter.
Setting up a new tenant
➢ Workspace
➢ Search
➢ Resource Views
➢ Data sensitivity and security
➢ Custom metadata extractors
➢ Your website URLs
➢ Identify administrators/management team
*ICPSR will provide a checklist
User Authentication and Authorization
➢ ORCID
➢ Google
➢ Facebook
➢ LinkedIn
➢ ICPSR MyData
➢ OAuth2
➢ Duo Enabled
➢ Integration with U-M IAM (Future)
Configuring workspace (d2d)
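<div id="workspace"></div>
<script type="text/javascript">
// ${jwtToken} is a server-side template placeholder. It is quoted here on the
// assumption that it renders to a bare token; drop the quotes if the template already emits them.
var jwtToken = "${jwtToken}";
// Attach the JWT as a Bearer token to every jQuery AJAX request.
$(document).ajaxSend(function(event, jqxhr, settings) {
  jqxhr.setRequestHeader("Authorization", "Bearer " + jwtToken);
});
var depositConfig = {
  bibliographyServerUrl: "https://bibliography.icpsr.umich.edu/bibliography",
  depositServerUrl: "https://deposit.icpsr.umich.edu/deposit",
  tenant: "openicpsr",
  // Return the custom actions for a given path and level.
  actions: function(path, level) { return [ /* custom code here */ ]; }
};
ReactDOM.render(React.createElement(Workspace, null), document.getElementById("workspace"));
</script>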
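
Because the workspace sends the JWT with every request, a host page may want to know whether the token has already expired before rendering the widget. The sketch below decodes the payload of a compact JWT and inspects the standard `exp` claim (seconds since the epoch, per RFC 7519). It does not verify the signature, which remains the server's job. The function names and the sample token are hypothetical, and the code assumes a Node.js runtime with `base64url` support in `Buffer`.

```javascript
// Decode the payload (second segment) of a compact JWT without verifying it.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("not a compact JWT");
  }
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// A token counts as expired when it carries a numeric exp claim at or before "now".
function isExpired(token, nowSeconds) {
  const payload = decodeJwtPayload(token);
  return typeof payload.exp === "number" && payload.exp <= nowSeconds;
}

// Usage with a hand-built, unsigned sample token (illustration only).
const encodeSegment = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const sampleToken = [
  encodeSegment({ alg: "HS256", typ: "JWT" }),
  encodeSegment({ sub: "demo-user", exp: 1700000000 }),
  "signature-not-checked"
].join(".");
console.log(isExpired(sampleToken, 1600000000)); // false
console.log(isExpired(sampleToken, 1800000000)); // true
```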
Configuring Search (d2d)
<div id="search"></div>
<script type="text/javascript">
var archive = "openicpsr";
var searchManagerUrl = "https://search.icpsr.umich.edu/search";
var searchConfig = {....};
// Returns the HTML for the column headings above the result list.
var buildSearchResultsHeader = function() {
  var headerString = '<div class="row" id="columnHeadings">.....</div>';
  return headerString;
};
// Returns the markup for one search result (JSX, so the page needs a JSX transform such as Babel).
var buildSearchResult = function(val) {
  return (<div>.....</div>);
};
var saveSearchResult = false;
var customActions = [];
ReactDOM.render(React.createElement(SearchPage, {tenant: "openicpsr", archive: archive}),
  document.getElementById("search"));
</script>
Dissemination (d2d)
Build a sample HTML page and apply your CSS and themes.
Establish a mapping between the page and the metadata in the repository.
We will convert the page to FreeMarker templates.
“Apache FreeMarker™ is a template engine: a Java library to generate text output (HTML web pages, e-mails, configuration files, source code, etc.) based on templates and changing data”
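
The mapping between a sample HTML page and repository metadata can be pictured with a small sketch. FreeMarker itself is a Java library and uses `${name}` interpolations. The JavaScript below only imitates that one idea to show how metadata fields might fill placeholders. The function names and the metadata fields (`title`, `summary`) are hypothetical.

```javascript
// Escape characters that are significant in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replace each ${field} placeholder with the matching metadata value.
// Placeholders without a matching field are left untouched.
function renderTemplate(template, metadata) {
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(metadata, name)
      ? escapeHtml(String(metadata[name]))
      : match);
}

// Usage: single quotes keep ${...} literal instead of triggering JS interpolation.
const pageTemplate = '<h1>${title}</h1><p>${summary}</p>';
const rendered = renderTemplate(pageTemplate, { title: "Voting & Turnout" });
console.log(rendered); // <h1>Voting &amp; Turnout</h1><p>${summary}</p>
```

Leaving unmatched placeholders visible makes gaps in the mapping easy to spot while the sample page is still being reviewed.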
Administration & Reporting
➢ Access to your repository through an Admin GUI
➢ Usage statistics and reports with charts and visualization
➢ Google Analytics enabled
Quantitative data tools
➢ We specialize in quantitative data.
➢ Supports easy-to-use online statistical analysis using R packages.
➢ Supported formats include CSV, SPSS, SAS, Stata and R.
Technology Stack
Infrastructure
AWS Cloud
➢ EC2 compute w/ EBS storage - S3 in future
➢ Backups synchronized back to Ann Arbor
➢ S3 storage for StatSnap
➢ EC2 compute w/ autoscaling & ELB for StatSnap
➢ VPCs with VPN to campus for legacy system access
Replication
➢ Tape copy (encrypted) offsite Ann Arbor location
➢ Staging copy on Perry server
➢ ITS MiStorage + Replicated to North Campus
➢ Duracloud synchronizes two copies
○ Amazon S3 and Glacier (Each has redundancies)
➢ Digital Preservation Network (DPN) - future
Pricing Model
➢ Hardware Cost
➢ Data Usage and Storage Cost
➢ Processing Cost
➢ Networking
➢ IT Personnel Cost
○ base scope of work needed
*Savings gained by using standard repository features
Questions?
Thank You
Thomas Murphy
tomurphy@umich.edu
Harsha Ummerpillai
harshau@umich.edu
