Repository as a Service (RaaS)
by ICPSR
Agenda
➢ Introduction
➢ Repository Services
○ Ingestion
○ Curation
○ Discovery
○ Preservation
➢ Demo
➢ Developer-to-Developer (D2D) Integration
○ Workspace
○ Search
○ Dissemination
ICPSR (Inter-university Consortium for Political and Social Research)
➢ Research data management organization.
➢ Hosts archives for various national agencies.
➢ Professional curation staff.
➢ Enabling secondary use of research data.
➢ Research and development of new technologies.
Our Organization
➢ Primary Research Staff
➢ Professional curation staff
➢ Data Librarians and Archivists
➢ Web and Social Media content designers
➢ Computing & Network Services
Archonnex Guiding Principles
➢ Comprehensive digital asset management platform.
➢ Compliant with the OAIS (Open Archival Information System) reference model.
➢ Multi-tenancy.
➢ Secure: data encrypted at rest and in transit.
➢ Service-oriented, scalable and modular.
➢ Open-source technologies.
➢ Standards-based metadata harvesting and data exports.
➢ Cohesive technology choices.
➢ Flexible UI components enabling D2D integration.
Data Ingestion
➢ Ability to upload and organize digital objects.
➢ Bulk file uploads from local/NFS drives.
➢ FTP/SFTP uploads.
➢ Email uploads.
➢ API pull/push mechanisms.
➢ Imports from ZIP bundles.
➢ Custom extracts
Data Curation
➢ Metadata generated by researchers/producers
➢ Metadata generated by data librarians/curation staff
➢ Machine-generated metadata for born-digital files
○ Video, audio, images, tabular, geospatial, etc.
➢ Custom metadata extraction tools
○ SPSS, Stata, Apache Tika, BulkExtractor, others
○ Integration with DDI, JSON, XML, RDF, etc.
Data Discovery
➢ Search using Apache Solr
➢ Custom indexes
➢ Alternative ways to discover data
○ Visualization dashboards
○ Data search across multiple tenants (repositories)
➢ Persistent digital object identifiers (DOIs)
➢ Customizable resource views
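A discovery front end usually talks to Solr through its standard `select` request handler. The sketch below builds such a query URL from Solr's common query parameters (`q`, `fq`, `rows`, `start`, `wt`). The base URL, the core name and the `tenant` field used for cross-tenant filtering are hypothetical; the slides do not describe ICPSR's actual index schema.

```javascript
// Build a Solr "select" query URL. The base URL, core name and the "tenant"
// field are hypothetical placeholders, not ICPSR's real schema.
// Tenant ids are assumed not to contain double quotes.
function buildSolrQuery(baseUrl, core, { text, tenants = [], rows = 10, start = 0 }) {
  const params = new URLSearchParams();
  params.set("q", text || "*:*"); // main query; *:* matches every document
  if (tenants.length > 0) {
    // Filter query restricting results to one or more tenants (repositories)
    params.set("fq", "tenant:(" + tenants.map((t) => '"' + t + '"').join(" OR ") + ")");
  }
  params.set("rows", String(rows)); // page size
  params.set("start", String(start)); // offset used for paging
  params.set("wt", "json"); // request a JSON response
  return baseUrl.replace(/\/$/, "") + "/" + core + "/select?" + params.toString();
}
```

Passing several tenants corresponds to the cross-tenant search listed above; omitting `tenants` searches the whole core.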
Data Preservation
➢ Provenance information (version management)
➢ Conversion to durable preservation formats
➢ Periodic fixity checks (raw data validation)
➢ Replication (digital copies)
➢ Ability to easily locate assets within the preservation area
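A fixity check recomputes a checksum for each stored file and compares it with the value recorded at ingest. The slides do not say which algorithm ICPSR uses; this minimal sketch assumes SHA-256 via Node.js's built-in `crypto` module and a hypothetical manifest of `{ path, sha256 }` entries.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Hex-encoded SHA-256 digest of a buffer.
function sha256Hex(buffer) {
  return crypto.createHash("sha256").update(buffer).digest("hex");
}

// True if the content still matches the digest recorded at ingest time.
function verifyFixity(buffer, recordedHex) {
  return sha256Hex(buffer) === recordedHex.toLowerCase();
}

// Check every { path, sha256 } entry of a (hypothetical) manifest and return
// the paths whose current contents no longer match. A missing file makes
// fs.readFileSync throw, so it surfaces as an error rather than a silent pass.
function findFixityFailures(manifest) {
  return manifest
    .filter((entry) => !verifyFixity(fs.readFileSync(entry.path), entry.sha256))
    .map((entry) => entry.path);
}
```

Reading each file fully into memory keeps the sketch short; large preservation files would normally be hashed from a stream (`fs.createReadStream`) instead.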
Demo
Core Systems
Deposit Manager
Search Manager
Public Content Manager
Tenants
Hosting your repository
Developer-to-Developer (D2D) Integration
○ Users don't leave your website.
○ Seamlessly embed GUI as JS plugins.
○ A few snippets of JS code get you going.
○ Your website can run on WordPress, Drupal, PHP, ColdFusion, ASP or anything else; the platform doesn't matter.
Setting up a new tenant
➢ Workspace
➢ Search
➢ Resource Views
➢ Data sensitivity and security
➢ Custom metadata extractors
➢ Your website URLs
➢ Identify administrators/management team
*ICPSR will provide a checklist.
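To make the checklist concrete, the sketch below represents a tenant setup request as a plain object and reports which checklist items are still unanswered. The field names and the example values are hypothetical and only mirror the bullets above; they are not an ICPSR configuration format.

```javascript
// Hypothetical field names mirroring the tenant checklist items.
const REQUIRED_FIELDS = [
  "workspace", "search", "resourceViews", "dataSensitivity",
  "metadataExtractors", "websiteUrls", "administrators",
];

// Illustrative tenant request; all values are placeholders.
const exampleTenant = {
  tenant: "openicpsr",
  workspace: true,
  search: true,
  resourceViews: ["study"],
  dataSensitivity: "public",
  metadataExtractors: ["SPSS", "Stata"],
  websiteUrls: ["https://www.example.org/data"],
  administrators: ["admin@example.org"],
};

// Return the checklist items that are missing or empty. A boolean false
// counts as an answer (for example, "no search widget wanted").
function missingChecklistItems(config) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = config[field];
    if (value === undefined || value === null) return true;
    if (Array.isArray(value)) return value.length === 0;
    if (typeof value === "string") return value.trim() === "";
    return false;
  });
}
```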
User Authentication and Authorization
➢ ORCID
➢ Google
➢ Facebook
➢ LinkedIn
➢ ICPSR MyData
➢ OAuth2
➢ Duo two-factor authentication
➢ Integration with U-M IAM (planned)
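The workspace configuration sends a JWT as a Bearer token, so a page embedding the widgets may want to check whether that token has expired before using it. This sketch decodes the JWT payload and reads the standard `exp` claim (seconds since the Unix epoch, RFC 7519). It uses Node.js's `Buffer` (a browser would use `atob`), and it does not verify the signature; that remains the server's job.

```javascript
// Decode the payload segment of a JWT (header.payload.signature).
// No signature verification is performed here.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Not a JWT: expected three dot-separated segments");
  }
  // JWT segments are base64url-encoded; convert to standard base64 with padding.
  let b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  return JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
}

// True if the "exp" claim lies in the past. Tokens are treated as expired
// slightly early (skewSeconds) to allow for clock differences.
function isJwtExpired(token, nowMs = Date.now(), skewSeconds = 30) {
  const { exp } = decodeJwtPayload(token);
  if (typeof exp !== "number") return false; // no expiry claim present
  return nowMs / 1000 >= exp - skewSeconds;
}
```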
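Configuring the Workspace (D2D)
<div id="workspace"></div>
<script type="text/javascript">
  // Assumes jQuery, React, ReactDOM and the Workspace component are loaded.
  // ${jwtToken} is substituted server-side; quote it so the result is a string.
  var jwtToken = "${jwtToken}";
  // Send the token as a Bearer header on every jQuery AJAX request.
  $(document).ajaxSend(function (event, jqxhr, settings) {
    jqxhr.setRequestHeader("Authorization", "Bearer " + jwtToken);
  });
  // Global tenant configuration (not passed as a prop below, so the
  // Workspace component presumably reads it from the page).
  var depositConfig = {
    bibliographyServerUrl: "https://bibliography.icpsr.umich.edu/bibliography",
    depositServerUrl: "https://deposit.icpsr.umich.edu/deposit",
    tenant: "openicpsr",
    // Return the custom actions to offer for a given path and level.
    actions: function (path, level) {
      return [ /* custom code here */ ];
    }
  };
  ReactDOM.render(React.createElement(Workspace, null),
    document.getElementById("workspace"));
</script>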
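The jQuery `ajaxSend` hook above only covers requests made through jQuery. If your own page code calls ICPSR services with `fetch`, a small wrapper can attach the same header. This is a sketch, not part of the ICPSR snippet; it assumes headers are passed as a plain object, and it does not change how the embedded Workspace component sends its own requests.

```javascript
// Wrap a fetch-style function so each request carries the Bearer token.
// getToken is a callback so a refreshed token is picked up automatically.
// Assumes options.headers, if present, is a plain object (not a Headers instance).
function withBearerToken(fetchImpl, getToken) {
  return function authorizedFetch(url, options = {}) {
    const headers = Object.assign({}, options.headers, {
      Authorization: "Bearer " + getToken(),
    });
    return fetchImpl(url, Object.assign({}, options, { headers: headers }));
  };
}

// Usage in a browser page (the URL is a placeholder):
// const apiFetch = withBearerToken(window.fetch.bind(window), () => jwtToken);
// apiFetch("https://example.org/api/resource").then((r) => r.json());
```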
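Configuring Search (D2D)
<div id="search"></div>
<script type="text/javascript">
  // Assumes React, ReactDOM and the SearchPage component are already loaded.
  var archive = "openicpsr";
  var searchManagerUrl = "https://search.icpsr.umich.edu/search";
  var searchConfig = { /* .... */ };
  // Header row markup. Single quotes outside let the HTML keep its
  // double-quoted attributes.
  var buildSearchResultsHeader = function () {
    var headerString = '<div class="row" id="columnHeadings">.....</div>';
    return headerString;
  };
  // Renders one search result. JSX is not valid in a plain text/javascript
  // block without a compiler, so the element is built with createElement.
  var buildSearchResult = function (val) {
    return React.createElement("div", null /* ..... */);
  };
  var saveSearchResult = false;
  var customActions = [];
  ReactDOM.render(
    React.createElement(SearchPage, { tenant: "openicpsr", archive: archive }),
    document.getElementById("search")
  );
</script>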
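`buildSearchResultsHeader` assembles its markup as a string, so any value inserted into it should be HTML-escaped first; React escapes text children of elements created with `React.createElement`, but not hand-built strings. A minimal sketch, with hypothetical column labels and a hypothetical `col` CSS class:

```javascript
// Escape the characters that are significant in HTML text and attribute
// values, so data cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the header row from a list of column labels (labels are hypothetical;
// the real header content is elided in the slide).
function buildHeaderHtml(columns) {
  const cells = columns
    .map((label) => '<div class="col">' + escapeHtml(label) + "</div>")
    .join("");
  return '<div class="row" id="columnHeadings">' + cells + "</div>";
}
```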
Dissemination (D2D)
Build a sample HTML page and apply your CSS and themes.
Establish a mapping between the page and the metadata in the repository.
We will convert it to FreeMarker templates.
“Apache FreeMarker™ is a template engine: a Java library to generate text output (HTML web pages, e-mails, configuration files, source code, etc.) based on templates and changing data”
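The mapping step can be pictured as a table from placeholders in your sample page to metadata fields in the repository. The sketch below is a JavaScript stand-in, not FreeMarker: it borrows FreeMarker's `${name}` interpolation syntax, and the placeholder names and metadata field paths are hypothetical.

```javascript
// Hypothetical mapping from page placeholders to metadata field paths.
const fieldMapping = {
  studyTitle: "title",
  studyDoi: "identifiers.doi",
  pubYear: "dates.published",
};

// Look up a dot-separated path such as "identifiers.doi" in a record.
function getPath(record, path) {
  return path.split(".").reduce((obj, key) => (obj == null ? undefined : obj[key]), record);
}

// Replace ${placeholder} markers with mapped metadata values; unmapped
// placeholders and missing values become empty strings.
function renderSample(templateText, mapping, record) {
  return templateText.replace(/\$\{(\w+)\}/g, (_, name) => {
    const path = mapping[name];
    const value = path ? getPath(record, path) : undefined;
    return value == null ? "" : String(value);
  });
}
```

This toy renderer performs no HTML escaping, so it only illustrates the mapping; the production templates are the FreeMarker ones ICPSR generates.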
Administration & Reporting
➢ Access to your repository through an Admin GUI
➢ Usage statistics and reports with charts and visualizations
➢ Google Analytics enabled
Quantitative data tools
➢ We specialize in quantitative data.
➢ Supports easy-to-use online statistical analysis using R packages.
➢ Supported formats include CSV, SPSS, SAS, Stata and R.
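A tool that accepts the formats above has to route each file to a suitable reader. This minimal sketch classifies files by extension; the table lists common extensions for these formats and is an assumption, not ICPSR's actual list, and real ingest would also inspect file contents because extensions can be wrong or missing.

```javascript
// Common extensions for the supported statistical formats (illustrative).
const FORMAT_BY_EXTENSION = {
  csv: "CSV",
  sav: "SPSS",     // SPSS system file
  por: "SPSS",     // SPSS portable file
  sas7bdat: "SAS", // SAS data set
  xpt: "SAS",      // SAS transport file
  dta: "Stata",
  rds: "R",
  rdata: "R",
  rda: "R",
};

// Return the format name for a file name, or null if it is not recognized.
function detectFormat(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0 || dot === fileName.length - 1) return null; // no usable extension
  const ext = fileName.slice(dot + 1).toLowerCase();
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(FORMAT_BY_EXTENSION, ext)
    ? FORMAT_BY_EXTENSION[ext]
    : null;
}
```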
Technology Stack
Infrastructure
AWS Cloud
➢ EC2 compute with EBS storage (S3 planned)
➢ Backups synchronized back to Ann Arbor
➢ S3 storage for StatSnap
➢ EC2 compute with autoscaling and ELB for StatSnap
➢ VPCs with a VPN to campus for legacy system access
Replication
➢ Encrypted tape copy at an offsite Ann Arbor location
➢ Staging copy on the Perry server
➢ ITS MiStorage, replicated to North Campus
➢ DuraCloud synchronizes two copies
○ Amazon S3 and Glacier (each with its own redundancy)
➢ Digital Preservation Network (DPN) (planned)
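With several replicas in play, an independent audit can compare checksum manifests taken from two copies and report where they disagree. This sketch does not use any DuraCloud API; how each manifest (an object mapping path to hex digest) is obtained is left open and is an assumption of the example.

```javascript
// Own-property check, so keys like "toString" are not treated as present.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Compare two manifests ({ path: hexDigest }) and list missing or differing entries.
function compareReplicas(primary, replica) {
  const report = { missingInReplica: [], missingInPrimary: [], mismatched: [] };
  for (const [path, digest] of Object.entries(primary)) {
    if (!has(replica, path)) report.missingInReplica.push(path);
    else if (replica[path] !== digest) report.mismatched.push(path);
  }
  for (const path of Object.keys(replica)) {
    if (!has(primary, path)) report.missingInPrimary.push(path);
  }
  return report;
}
```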
Pricing Model
➢ Hardware Cost
➢ Data Usage and Storage Cost
➢ Processing Cost
➢ Networking
➢ IT Personnel Cost
○ Base scope of work needed
*Savings are gained by using standard repository features.
Questions?
Thank You
Thomas Murphy
tomurphy@umich.edu
Harsha Ummerpillai
harshau@umich.edu
