by ICPSR
Repository as a Service (RaaS)
Agenda
➢ Introduction
➢ Repository Services
○ Ingestion
○ Curation
○ Discovery
○ Preservation
➢ Demo
➢ Developer 2 Developer Integration
○ Workspace
○ Search
○ Dissemination
ICPSR (Inter-university Consortium for Political and Social Research)
➢ Research data management organization.
➢ Hosts archives for various national agencies.
➢ Professional curation staff.
➢ Enabling secondary use of research data.
➢ Research and development of new technologies.
Our Organization
➢ Primary Research Staff
➢ Professional curation staff
➢ Data Librarians and Archivists
➢ Web and Social Media content designers
➢ Computing & Network Services
Archonnex Guiding Principles
➢ Comprehensive digital asset management platform.
➢ OAIS reference model compliant.
➢ Multi-tenancy.
➢ Secure: data encryption at rest and in transit.
➢ Service-oriented, scalable and modular.
➢ Open source technologies.
➢ Standards-based metadata harvesting and data exports.
➢ Cohesive technology choices.
➢ Flexible UI components enabling developer-to-developer (D2D) integration.
Data Ingestion
➢ Ability to upload and organize digital objects.
➢ Bulk file uploads from local/NFS drives.
➢ FTP/SFTP uploads.
➢ Email uploads.
➢ API pull/push mechanisms.
➢ Imports from ZIP bundles.
➢ Custom extracts.
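Whatever the upload channel, a useful first step is to inventory what arrived. The Node.js sketch below, assuming a bulk upload has already been copied to a local or NFS-mounted directory, walks that directory and records each file's relative path and size. `buildManifest` is a hypothetical helper for illustration, not part of Archonnex.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: recursively list every regular file under rootDir
// with its path relative to rootDir and its size in bytes.
function buildManifest(rootDir) {
  const entries = [];
  const walk = (dir) => {
    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const fullPath = path.join(dir, dirent.name);
      if (dirent.isDirectory()) {
        walk(fullPath);
      } else if (dirent.isFile()) {
        entries.push({
          // Forward slashes keep the manifest identical across platforms.
          path: path.relative(rootDir, fullPath).split(path.sep).join("/"),
          bytes: fs.statSync(fullPath).size,
        });
      }
    }
  };
  walk(rootDir);
  // Sort by code point so the manifest is deterministic.
  entries.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  return entries;
}
```

The resulting list can be stored alongside the deposit and later compared against what curation and preservation steps produce.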
Data Curation
➢ Metadata generated by researchers/producers
➢ Metadata generated by data librarians/curation staff
➢ Machine-generated metadata for born-digital files
○ Video, audio, images, tabular, geospatial, etc.
➢ Custom metadata extraction tools
○ SPSS, Stata, Apache Tika, bulk_extractor and others
○ Integration with DDI, JSON, XML, RDF, etc.
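As a toy illustration of machine-generated metadata for tabular files, the sketch below derives variable names, a rough numeric/string type and missing-value counts from a CSV string. `extractCsvMetadata` is hypothetical and assumes a header row, comma delimiters and no quoted fields; production extraction would rely on dedicated tools such as those listed above.

```javascript
// Hypothetical helper: simple variable-level metadata from CSV text.
// Assumes a header row, comma delimiters and no quoted fields.
function extractCsvMetadata(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    return { rowCount: 0, variables: [] };
  }
  const names = lines[0].split(",").map((name) => name.trim());
  const rows = lines.slice(1).map((line) => line.split(","));
  const variables = names.map((name, i) => {
    const values = rows.map((row) => (row[i] ?? "").trim());
    const nonEmpty = values.filter((v) => v !== "");
    // Numeric only if every non-empty value parses as a number.
    const numeric =
      nonEmpty.length > 0 && nonEmpty.every((v) => !Number.isNaN(Number(v)));
    return {
      name,
      type: numeric ? "numeric" : "string",
      missing: values.length - nonEmpty.length,
    };
  });
  return { rowCount: rows.length, variables };
}
```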
Data Discovery
➢ Search using Apache Solr
➢ Custom indexes
➢ Additional ways to discover data
○ Visualization dashboards
○ Data search across multiple tenants (repositories)
➢ Persistent digital object identifiers (DOIs)
➢ Customizable resource views
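Solr searches are typically issued as HTTP requests to a core's `/select` handler. The sketch below assumes a hypothetical core name and a hypothetical `tenant` field in the index, and shows how a filter query (`fq`) could restrict one search to one or more tenants. `q`, `fq`, `start`, `rows` and `wt` are standard Solr request parameters; `buildSolrSearchUrl` is an illustrative helper.

```javascript
// Build a Solr /select URL. The base URL, core name and "tenant" field
// are hypothetical; q, fq, start, rows and wt are standard Solr parameters.
function buildSolrSearchUrl(baseUrl, core, { query, tenants = [], start = 0, rows = 10 }) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const url = new URL(`${root}/${encodeURIComponent(core)}/select`);
  url.searchParams.set("q", query);
  if (tenants.length > 0) {
    // Quote each tenant value, escaping quotes and backslashes, then
    // OR them together: tenant:("a" OR "b").
    const clause = tenants
      .map((t) => `"${t.replace(/(["\\])/g, "\\$1")}"`)
      .join(" OR ");
    url.searchParams.set("fq", `tenant:(${clause})`);
  }
  url.searchParams.set("start", String(start));
  url.searchParams.set("rows", String(rows));
  url.searchParams.set("wt", "json");
  return url.toString();
}
```

Keeping the tenant restriction in `fq` rather than in `q` lets Solr cache the filter independently of the user's search terms.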
Data Preservation
➢ Provenance information (version management)
➢ Conversion to durable preservation formats
➢ Periodic fixity checks (raw data validation)
➢ Replication (multiple digital copies)
➢ Ability to easily locate assets within the preservation area.
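A fixity check recomputes each file's checksum and compares it with the value recorded when the file was ingested. The Node.js sketch below uses SHA-256 and assumes the recorded digests are available as a plain object mapping file paths to hex strings. The function names are hypothetical, and the checksum algorithm ICPSR actually uses is not stated in these slides.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Stream a file through SHA-256 so large files need not fit in memory.
function sha256OfFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash("sha256");
    fs.createReadStream(filePath)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Compare current checksums with those recorded at ingest time.
// `recorded` maps file path -> hex digest; returns the paths that fail.
async function fixityCheck(recorded) {
  const failures = [];
  for (const [filePath, expected] of Object.entries(recorded)) {
    try {
      const actual = await sha256OfFile(filePath);
      if (actual !== expected) failures.push(filePath);
    } catch {
      // A missing or unreadable file also counts as a fixity failure.
      failures.push(filePath);
    }
  }
  return failures;
}
```

Running `fixityCheck` on a schedule against each replica is one way to detect silent corruption before it spreads to other copies.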
Demo
Core Systems
Deposit Manager
Search Manager
Public Content Manager
Tenants
Hosting your repository
Developer 2 Developer Integration
○ Users don't leave your website.
○ Embed the GUI seamlessly as JavaScript plugins.
○ A few snippets of JavaScript get you going.
○ Your website can be built with WordPress, Drupal, PHP, ColdFusion, ASP or anything else; it doesn't matter.
Setting up a new tenant
➢ Workspace
➢ Search
➢ Resource Views
➢ Data sensitivity and security
➢ Custom metadata extractors
➢ Your website URLs
➢ Identify administrators/management team
*ICPSR will provide a checklist
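The checklist items could be captured in a configuration object and checked before onboarding. The sketch below is purely hypothetical: the keys, the rule that at least one administrator is required and the HTTPS-only rule for website URLs are assumptions for illustration, not ICPSR's checklist.

```javascript
// Hypothetical tenant settings; none of these keys come from Archonnex.
const REQUIRED_KEYS = ["tenant", "websiteUrls", "administrators", "dataSensitivity"];

// Return a list of human-readable problems; an empty list means "looks OK".
function validateTenantConfig(config) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    const value = config[key];
    if (value === undefined || value === null || value === "") {
      problems.push(`missing ${key}`);
    }
  }
  const urls = Array.isArray(config.websiteUrls) ? config.websiteUrls : [];
  for (const u of urls) {
    try {
      // Assumed policy: embedding sites must be served over HTTPS.
      if (new URL(u).protocol !== "https:") problems.push(`not https: ${u}`);
    } catch {
      problems.push(`invalid URL: ${u}`);
    }
  }
  if (Array.isArray(config.administrators) && config.administrators.length === 0) {
    problems.push("at least one administrator is required");
  }
  return problems;
}
```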
User Authentication and Authorization
➢ ORCID
➢ Google
➢ Facebook
➢ LinkedIn
➢ ICPSR MyData
➢ OAuth2
➢ Duo two-factor authentication enabled
➢ Integration with U-M IAM (future)
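Identity providers such as ORCID are typically reached through an OAuth 2.0 authorization-code flow, which begins by redirecting the user to the provider's authorize endpoint. The sketch below builds that URL for ORCID; the client ID and redirect URI are placeholders, and `buildAuthorizeUrl` is a hypothetical helper. The random `state` value should be kept in the user's session and compared when the provider redirects back.

```javascript
const crypto = require("crypto");

// Build the URL that starts an OAuth 2.0 authorization-code flow.
function buildAuthorizeUrl({ authorizeEndpoint, clientId, redirectUri, scope }) {
  // Unguessable state value for CSRF protection; store it server-side
  // and reject the callback if the returned state does not match.
  const state = crypto.randomBytes(16).toString("hex");
  const url = new URL(authorizeEndpoint);
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("scope", scope);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("state", state);
  return { url: url.toString(), state };
}

// Example for ORCID (placeholders for the client ID and redirect URI).
const orcidLogin = buildAuthorizeUrl({
  authorizeEndpoint: "https://orcid.org/oauth/authorize",
  clientId: "APP-XXXXXXXXXXXXXXXX", // placeholder
  redirectUri: "https://repository.example.edu/oauth/callback", // placeholder
  scope: "/authenticate",
});
```

After the redirect back, the server exchanges the returned `code` for tokens at the provider's token endpoint; that step is not shown here.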
Configuring workspace (d2d)
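<div id="workspace"></div>
<script type="text/javascript">
  // Assumes jQuery, React, ReactDOM and the ICPSR Workspace component are already loaded.
  // ${jwtToken} is filled in server-side by the page's template engine; the quotes make it a JS string.
  var jwtToken = "${jwtToken}";
  // Send the token as a bearer credential on every jQuery AJAX request.
  $(document).ajaxSend(function(event, jqxhr, settings) {
    jqxhr.setRequestHeader("Authorization", "Bearer " + jwtToken);
  });
  // Not passed as a prop below, so Workspace presumably reads this global.
  var depositConfig = {
    bibliographyServerUrl: "https://bibliography.icpsr.umich.edu/bibliography",
    depositServerUrl: "https://deposit.icpsr.umich.edu/deposit",
    tenant: "openicpsr",
    // Return the custom actions to offer for a given path and level.
    actions: function(path, level) { return [ /* custom code here */ ]; }
  };
  ReactDOM.render(React.createElement(Workspace, null), document.getElementById("workspace"));
</script>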
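The `ajaxSend` hook only affects requests made through jQuery, so any code on the host page that calls `fetch()` would not send the token. A minimal sketch of a hypothetical `withBearer` helper that adds the same header to a `fetch()` options object, assuming `headers` is either absent or a plain object (not a `Headers` instance):

```javascript
// Hypothetical helper: return a copy of fetch() options with an
// Authorization: Bearer header added. Assumes init.headers is a plain object.
function withBearer(init = {}, token) {
  return {
    ...init,
    headers: { ...(init.headers || {}), Authorization: `Bearer ${token}` },
  };
}
```

For example, `fetch(url, withBearer({ method: "GET" }, jwtToken))`, where `url` is whatever endpoint the page is calling. Returning a copy leaves the caller's options object unchanged.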
Configuring Search (d2d)
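<div id="search"></div>
<script type="text/javascript">
  // Assumes React, ReactDOM and the ICPSR SearchPage component are already loaded.
  // The globals below are presumably read by SearchPage; "...." marks elided settings.
  var archive = "openicpsr";
  var searchManagerUrl = "https://search.icpsr.umich.edu/search";
  var searchConfig = { /* .... */ };
  var buildSearchResultsHeader = function() {
    // Single quotes outside so the double-quoted attributes don't end the string.
    var headerString = '<div class="row" id="columnHeadings">.....</div>';
    return headerString;
  };
  var buildSearchResult = function(val) {
    // JSX: needs a JSX transform (for example Babel) before it can run in a browser.
    return (<div>.....</div>);
  };
  var saveSearchResult = false;
  var customActions = [];
  ReactDOM.render(
    React.createElement(SearchPage, { tenant: "openicpsr", archive: archive }),
    document.getElementById("search")
  );
</script>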
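`buildSearchResultsHeader` builds HTML as a string. If search results are also built as strings rather than JSX, metadata values should be escaped so a title containing `<` or `&` cannot inject markup (JSX escapes embedded text automatically). The sketch below is a hypothetical string-based `buildSearchResultHtml`; the `title`, `url` and `year` fields are assumptions about the result object. Escaping does not validate URL schemes, so untrusted links still need checking.

```javascript
// Escape text before inserting it into an HTML string.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical string-based result builder; field names are assumptions.
function buildSearchResultHtml(val) {
  return (
    '<div class="row">' +
    `<a href="${escapeHtml(val.url)}">${escapeHtml(val.title)}</a>` +
    (val.year ? ` <span>(${escapeHtml(val.year)})</span>` : "") +
    "</div>"
  );
}
```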
Dissemination (d2d)
Build a sample HTML page and apply your CSS and themes.
Establish a mapping between the page and the metadata in the repository.
We will convert the page into FreeMarker templates.
“Apache FreeMarker™ is a template engine: a Java library to generate text output (HTML web pages, e-mails, configuration files, source code, etc.) based on templates and changing data.”
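FreeMarker itself runs in Java, so the JavaScript sketch below does not use it. It only illustrates what the metadata mapping captures: which placeholder in the sample HTML is filled from which repository field. The placeholder names, field paths and `renderPreview` helper are hypothetical; the `${name}` syntax merely mimics FreeMarker's interpolation syntax.

```javascript
// Hypothetical mapping: placeholder in the sample HTML -> metadata field path.
const mapping = {
  studyTitle: "title",
  principalInvestigator: "creators.0.name",
  doi: "identifiers.doi",
};

// Look up a dot-separated path such as "creators.0.name" in an object.
function getPath(obj, dottedPath) {
  return dottedPath
    .split(".")
    .reduce((acc, key) => (acc == null ? undefined : acc[key]), obj);
}

// Replace ${placeholder} tokens using the mapping. Unmapped or missing
// values render as an empty string; values are NOT HTML-escaped here.
function renderPreview(template, metadata, fieldMapping) {
  return template.replace(/\$\{(\w+)\}/g, (_, name) => {
    const field = fieldMapping[name];
    const value = field === undefined ? undefined : getPath(metadata, field);
    return value == null ? "" : String(value);
  });
}
```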
Administration & Reporting
➢ Access to your repository through an Admin GUI
➢ Usage statistics and reports with charts and visualizations
➢ Google Analytics enabled
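Reports such as downloads per study per month can be aggregated from repository usage events. The sketch below assumes a hypothetical event shape `{ studyId, type, timestamp }` and produces nested counts that a charting library could consume; `summarizeUsage` is illustrative only.

```javascript
// Count events of one type per study and per calendar month (UTC).
// The event shape { studyId, type, timestamp } is an assumption.
function summarizeUsage(events, type = "download") {
  const byStudy = new Map();
  for (const event of events) {
    if (event.type !== type) continue;
    // "YYYY-MM"; an invalid timestamp makes toISOString() throw.
    const month = new Date(event.timestamp).toISOString().slice(0, 7);
    const months = byStudy.get(event.studyId) ?? new Map();
    months.set(month, (months.get(month) ?? 0) + 1);
    byStudy.set(event.studyId, months);
  }
  // Convert nested Maps to plain objects for JSON or charting.
  return Object.fromEntries(
    [...byStudy].map(([studyId, months]) => [studyId, Object.fromEntries(months)])
  );
}
```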
Quantitative data tools
➢ We specialize in quantitative data.
➢ Supports easy-to-use online statistical analysis using R packages.
➢ Supported formats include CSV, SPSS, SAS, Stata and R.
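Before analysis, each uploaded file has to be routed to a reader for its format. A minimal sketch, assuming detection by file extension alone (real systems should also inspect file contents, since extensions can be wrong); the extension table is illustrative and `detectFormat` is hypothetical.

```javascript
// Common extensions for the formats listed above; illustrative, not exhaustive.
const FORMAT_BY_EXTENSION = {
  ".csv": "CSV",
  ".sav": "SPSS",
  ".por": "SPSS", // SPSS portable file
  ".sas7bdat": "SAS",
  ".xpt": "SAS", // SAS transport file
  ".dta": "Stata",
  ".rds": "R",
  ".rdata": "R",
  ".rda": "R",
};

// Return the format name for a file name, or null if unrecognized.
function detectFormat(fileName) {
  const lower = fileName.toLowerCase();
  const dot = lower.lastIndexOf(".");
  if (dot === -1) return null;
  return FORMAT_BY_EXTENSION[lower.slice(dot)] ?? null;
}
```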
Technology Stack
Infrastructure
AWS Cloud
➢ EC2 compute with EBS storage (S3 planned)
➢ Backups synchronized back to Ann Arbor
➢ S3 storage for StatSnap
➢ EC2 compute with autoscaling and Elastic Load Balancing (ELB) for StatSnap
➢ VPCs with VPN to campus for legacy system access
Replication
➢ Encrypted tape copy at an offsite Ann Arbor location
➢ Staging copy on the Perry server
➢ ITS MiStorage, replicated to North Campus
➢ DuraCloud synchronizes two copies
○ Amazon S3 and Glacier (each with its own redundancy)
➢ Digital Preservation Network (DPN) (planned)
Pricing Model
➢ Hardware Cost
➢ Data Usage and Storage Cost
➢ Processing Cost
➢ Networking
➢ IT Personnel Cost
○ Base scope of work needed
*Savings are gained by using standard repository features.
Questions?
Thank You
Thomas Murphy
tomurphy@umich.edu
Harsha Ummerpillai
harshau@umich.edu
