Welcome to
Technical Data Infrastructure Frameworks
Archonnex @ ICPSR
Data Science Management For All
Harsha Ummerpillai, Architect / Software Lead
Tom Murphy, Director of Computing and Network Services
About ICPSR
Mission
ICPSR advances and expands social and behavioral research, acting as a
global leader in data stewardship and providing rich data resources and
responsive educational opportunities.
About
An international consortium of more than 700 academic institutions and
research organizations, ICPSR provides leadership and training in data
access, curation, and methods of analysis for the social science research
community.
ICPSR maintains a data archive of more than 500,000 files of research in
the social sciences. It hosts 16 specialized collections of data in
education, aging, criminal justice, substance abuse, terrorism and other
areas of social research.
Introduction
Archonnex is a Digital Asset Management System (DAMS)
architecture defined to transition ICPSR to a newer
technology stack meeting core and emerging business needs
of the organization. It aims to build a digital technology
platform that leverages ICPSR expertise and open source
technologies that are proven and well supported by Open
Source communities.
Guiding Principles
• Comprehensive Digital Asset Management platform.
• Compliant with the Open Archival Information System (OAIS) model.
• Multi-tenancy. ICPSR needs to support multiple archives and agencies.
• Secure. Privacy and security are primary concerns for social research data.
• Service oriented and modular.
• Scalable. Able to handle large datasets and peak activity spikes.
• Open source technologies with good community engagement.
• Enables standards-based metadata harvesting and data exports.
• Cohesive technology choices.
• Flexible UI components that can be reused and enable faster development.
Message based Integration
• Apache ActiveMQ is the messaging server.
• Apache Camel provides a simplified implementation of the most common
Enterprise Integration Patterns.
• Figure: High-level view of Camel's architecture (from Camel in Action).
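A content-based router is one of the Enterprise Integration Patterns that Camel implements. The sketch below only illustrates the idea in plain Python, with in-memory queues standing in for ActiveMQ destinations. The message types and queue names are hypothetical and are not taken from the Archonnex design.

```python
import queue

# Hypothetical mapping from message type to destination queue name.
ROUTES = {
    "deposit.received": "virus-scan",
    "scan.clean": "metadata-extract",
}


def route(message):
    """Content-based router: pick a destination from the message type."""
    return ROUTES.get(message["type"], "dead-letter")


def dispatch(messages):
    """Place each message on its destination queue (in-memory stand-in for a broker)."""
    queues = {}
    for msg in messages:
        queues.setdefault(route(msg), queue.Queue()).put(msg)
    return queues


if __name__ == "__main__":
    result = dispatch([
        {"type": "deposit.received", "file": "survey.sav"},
        {"type": "unknown"},
    ])
    for name, q in result.items():
        print(name, q.qsize())
```

Unrecognized message types go to a dead-letter queue rather than being dropped, which keeps failures visible to operators.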
Infrastructure
Figure: logical components. Repository Engine, Virus Scanner, Message Queuing System, RDBMS, Cloud Storage, Deposit Manager, Single Sign On, Open API, Data Analysis Engine, Web / Use Analytics, Search Engine, Metadata Manager, Geo Tagger, Admin UI, Search, Web UI Components, NAS Storage, Researcher Interface Agent, Batch Jobs, Subscription Manager, Payment Portal, Reports, Alerts, Image Processor, Workflow BPM.
Infrastructure
Figure: components mapped to chosen technologies. Fedora, Virus Scanner, ActiveMQ, Oracle, AWS S3, Deposit Manager, SSO, Open API, Elastic Search, Kibana, Solr, Fuseki, Geo Tagger, Admin, Search, Web UI Components, Isilon, SEAD Agent, Batch Jobs, Subscription Manager, Payment Portal, Reports, Alerts, Image Processor, Preservation Manager, Activiti.
Multi-tenancy
All service components support multiple tenants.
Supports tenant specific configuration & preferences.
Web aspects of service components are embeddable
within respective tenant Web apps
Workspace Manager and Search Manager are two
examples of UI Plugins that are embeddable.
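One common way to support tenant-specific configuration is to layer per-tenant overrides on top of shared defaults. The sketch below assumes that approach. The setting names and values are hypothetical, and only the tenant name openICPSR comes from the slides.

```python
# Shared defaults applied to every tenant (hypothetical settings).
DEFAULTS = {
    "theme": "default",
    "max_upload_mb": 2048,
    "search_facets": ["subject", "year"],
}

# Per-tenant overrides (hypothetical values).
TENANT_OVERRIDES = {
    "openicpsr": {"theme": "open", "max_upload_mb": 30720},
}


def tenant_config(tenant_id):
    """Return the defaults with any overrides for this tenant applied on top."""
    merged = dict(DEFAULTS)
    merged.update(TENANT_OVERRIDES.get(tenant_id, {}))
    return merged


if __name__ == "__main__":
    print(tenant_config("openicpsr"))
    print(tenant_config("some-other-archive"))
```

A tenant with no overrides simply receives the defaults, so adding a new archive does not require a complete configuration up front.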
Single Sign On & ID Management
Central authentication
and identification system.
Supports ORCID and social IDs (Google,
Facebook and LinkedIn).
Authorization management will support
role-based access controls.
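Role-based access control can be reduced to a lookup from roles to permissions. The sketch below is a minimal illustration. The role and permission names are hypothetical and do not describe ICPSR's actual authorization model.

```python
# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS = {
    "depositor": {"deposit:create", "deposit:read"},
    "curator": {"deposit:read", "deposit:publish"},
}


def is_allowed(user_roles, permission):
    """True if any of the user's roles grants the requested permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in user_roles
    )


if __name__ == "__main__":
    print(is_allowed(["curator"], "deposit:publish"))
    print(is_allowed(["depositor"], "deposit:publish"))
```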
Deposit Manager
Supports Submission Information Package (SIP) data ingest and storage,
and coordinates virus scanning,
statistical file validation, variable extraction,
image processing and
metadata extraction.
Easy-to-use workspace and file management.
Ability to publish at a granular level.
Embeddable UI plugin supporting
tenant-specific configuration.
Supported ingest protocols: HTTP, SFTP and email.
Integrates with the BPM/workflow engine.
Preservation Manager
Implements transcoding of data specific to MIME types
Generates Archival Information Package (AIP) &
Dissemination Information Package (DIP).
Replicates the AIP to storage systems for long-term preservation.
Performs Fixity checks periodically.
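A fixity check typically recomputes a file's checksum and compares it with the digest recorded at ingest. The sketch below assumes SHA-256, since the slides do not name an algorithm, and reads from a file-like object so it can be tried without touching disk.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Compute a SHA-256 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(stream, recorded_digest):
    """True if the content still matches the digest recorded at ingest."""
    return sha256_of_stream(stream) == recorded_digest


if __name__ == "__main__":
    recorded = sha256_of_stream(io.BytesIO(b"archived content"))
    print(fixity_ok(io.BytesIO(b"archived content"), recorded))
    print(fixity_ok(io.BytesIO(b"corrupted content"), recorded))
```

For files on disk, the same functions can be passed a handle opened in binary mode, for example `open(path, "rb")`.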
Search Manager
Full featured text search using
Apache Solr.
Embeddable UI Plugin supporting
tenant specific configuration
Coverage includes, but is not limited to, keywords, metadata,
text and geospatial fields.
Exploring GeoBlacklight for search and dissemination
of geographical data.
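A tenant-scoped search against Solr can be expressed with its standard `q`, `fq`, `rows` and `wt` request parameters on the `select` handler. The sketch below only builds the request URL. The core name `studies`, the field name `tenant` and the host are hypothetical.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, text, tenant, rows=10):
    """Build a Solr select URL with a filter query restricting results to one tenant."""
    params = [
        ("q", text),                  # main full-text query
        ("fq", f"tenant:{tenant}"),   # filter query (hypothetical field name)
        ("rows", rows),               # page size
        ("wt", "json"),               # response format
    ]
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_select_url("http://localhost:8983/solr", "studies", "aging health", "icpsr"))
```

Putting the tenant restriction in `fq` rather than in `q` keeps it out of relevance scoring and lets Solr cache the filter separately.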
Anti-Virus Scanner
Antivirus scanning as a service.
Supports ClamAV and Sophos.
Capable of expanding to
multiple nodes allowing horizontal scalability
to support scanning large data sets.
SPSS Processor
Performs additional processing of IBM SPSS files.
Analyzes and reports potential missing
variables and inconsistencies.
Extracts variables and stores them
for online analysis tools.
Open API
RESTful services to
enable metadata harvesting and exports
using industry wide standards and formats.
Example: RDF, JSON-LD, DDI XML…
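As an illustration of what a JSON-LD export might look like, the sketch below builds a minimal dataset description. It assumes the schema.org `Dataset` vocabulary, which the slides do not specify, and the identifier and title are made up.

```python
import json


def dataset_jsonld(identifier, title, keywords):
    """Describe a dataset as JSON-LD using schema.org terms (an assumed vocabulary)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "identifier": identifier,
        "name": title,
        "keywords": keywords,
    }


if __name__ == "__main__":
    doc = dataset_jsonld("example-0001", "Example Survey, 2016", ["aging", "education"])
    print(json.dumps(doc, indent=2))
```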
Workflow Engine
Central workflow management providing unified
action list for users.
Ability to model business process flows and
integrate with technical components.
Activiti is the chosen technology.
Reports & Analytics
Captures all system & user activities within the components
enabling effective provenance data collection.
Central consolidated storage for all the logs.
Ability to discover, visualize and report on data collected.
ElasticSearch & Kibana
Google Analytics (Client & Server side)
Content Specific Processors
Add on modules that can derive and extract
custom attributes. These modules can be
invoked using messages and added to the processing
pipelines.
For example image files can produce
thumbnails for easier display on GUI.
Image Processor module performs this function.
Geospatial data is published to a
GeoServer instance.
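Add-on processors that are invoked by messages can be modeled as a registry keyed by content type. The sketch below assumes dispatch on a MIME type prefix. The message fields and the thumbnail result are hypothetical stand-ins for the real Image Processor.

```python
# Registry of processors keyed by MIME type prefix.
PROCESSORS = {}


def processor(mime_prefix):
    """Decorator that registers a function as the processor for a MIME prefix."""
    def register(fn):
        PROCESSORS[mime_prefix] = fn
        return fn
    return register


@processor("image/")
def image_processor(message):
    # Placeholder: a real module would generate the thumbnail file here.
    return {"action": "thumbnail", "source": message["path"]}


def handle(message):
    """Run the first registered processor whose prefix matches, if any."""
    for prefix, fn in PROCESSORS.items():
        if message["mime_type"].startswith(prefix):
            return fn(message)
    return None


if __name__ == "__main__":
    print(handle({"mime_type": "image/png", "path": "scan01.png"}))
    print(handle({"mime_type": "text/plain", "path": "readme.txt"}))
```

New content types can then be supported by registering another function, without changing the dispatch code.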
Geo Tagger
Add on module that can derive and process
geographical information from inputs
like street address, IP address, shapes on a map
or markers on a map.
Will generate geo tag information for display
and support search capabilities.
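For a map marker, the derived geo tag can be represented as a GeoJSON point feature. The sketch below assumes GeoJSON as the output format, which the slides do not state. Note that GeoJSON orders coordinates as longitude, then latitude.

```python
def geo_tag(label, latitude, longitude):
    """Turn a map marker into a GeoJSON Feature for display and indexing."""
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinates out of range")
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON coordinate order is [longitude, latitude].
            "coordinates": [longitude, latitude],
        },
        "properties": {"label": label},
    }


if __name__ == "__main__":
    print(geo_tag("Ann Arbor, MI", 42.28, -83.74))
```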
Research Data Integration
Ability to integrate with external data producers.
For example SEAD, OSF…
Web UI
HTML, JavaScript, CSS, Twitter Bootstrap, jQuery and plugins, Facebook ReactJS, Advanced REST Client
Protocol: HTTPS, HTTPS/REST/JSON/JSON-LD, HTTPS/SOAP/XML *
Middleware: J2EE application servers (Jetty, Apache Tomcat), Spring MVC, Groovy/Grails, Ruby/Rails
Desktop UI
Java Swing, Java Web Start
Protocol: HTTPS, HTTPS/REST/JSON, Java Network Launch Protocol (JNLP)
Middleware: J2EE application servers (Jetty, Apache Tomcat), Spring MVC, Spring Remoting & Web Services
Batch Automation UI
UC4 *, Control-M *
Protocol: SSH, Java RMI
Scripting/Orchestration
Shell programming, Perl, Ruby, Groovy
Storage/Databases
Network file storage, Oracle/MySQL/PostgreSQL, Amazon Cloud, Duraspace Cloud
ESB/Message Brokers
Apache ActiveMQ, RabbitMQ, Apache Camel
Source Code Management
Git, CVS *
Productivity Tools
Drupal/Confluence/Google Sites, JIRA, Bamboo, Fisheye, Crucible, Stash, Microsoft Office
Operating Systems
Servers (Linux), Desktop (Linux/Windows/Mac)
Build Tools
Ant, Maven
openICPSR is scheduled to be
released by the end of July 2016 on
the new Archonnex platform.
Questions?
Thank you
Thomas Murphy
tomurphy@umich.edu
CNS Director
Harsha Ummerpillai
harshau@umich.edu
Software Architect

Archonnex at ICPSR

  • 1. Welcome to Technical Data Infrastructure Frameworks Archonnex @ ICPSR Data Science Management For All Harsha Ummerpillai, Architect / Software Lead Tom Murphy, Director of Computing and Network Services
  • 2. About ICPSR Mission ICPSR advances and expands social and behavioral research, acting as a global leader in data stewardship and providing rich data resources and responsive educational opportunities. About An international consortium of more than 700 academic institutions and research organizations, ICPSR provides leadership and training in data access, curation, and methods of analysis for the social science research community. ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism and other areas of social research.
  • 3. Introduction: Archonnex is a Digital Asset Management System (DAMS) architecture defined to transition ICPSR to a newer technology stack that meets the core and emerging business needs of the organization. It aims to build a digital technology platform that leverages ICPSR expertise and open source technologies that are proven and well supported by open source communities.
  • 4. Guiding Principles: Comprehensive Digital Asset Management platform. Open Archival Information System (OAIS model) compliant. Multi-tenancy: ICPSR needs to support multiple archives and agencies. Secure: privacy and security are primary concerns for social research data. Service oriented and modular. Scalable: able to handle large datasets and peak activity spikes. Open source technologies with good community engagement. Enables standards-based metadata harvesting and data exports. Cohesive technology choices. Flexible UI components that can be reused and enable faster development.
  • 5. Message-based Integration: Apache ActiveMQ is the messaging server. Apache Camel provides a simplified implementation of the most common Enterprise Integration Patterns. Figure: High-level view of Camel's architecture (from Camel in Action).
  • 7. Infrastructure (diagram labels): Repository Engine, Virus Scanner, Message Queuing System, RDBMS, Cloud Storage, Deposit Manager, Single Sign On, Open API, Data Analysis Engine, Web / Use Analytics, Search Engine, Metadata Manager, Geo Tagger, Admin UI, Search, Web UI Components, NAS Storage, Researcher Interface, Agent, Batch Jobs, Subscription Manager, Payment Portal, Reports, Alerts, Image Processor, Workflow BPM
  • 8. Infrastructure (diagram labels): Fedora, Virus Scanner, ActiveMQ, Oracle, AWS S3, Deposit Manager, SSO, Open API, Elasticsearch, Kibana, Solr, Fuseki, Geo Tagger, Admin, Search, Web UI Components, Isilon, SEAD, Agent, Batch Jobs, Subscription Manager, Payment Portal, Reports, Alerts, Image Processor, Preservation Manager, Activiti
  • 9. Multi-tenancy: All service components support multiple tenants, including tenant-specific configuration and preferences. The web aspects of service components are embeddable within the respective tenant web apps. Workspace Manager and Search Manager are two examples of embeddable UI plugins.
  • 10. Single Sign On & ID Management: Central authentication and identification system. Supports ORCID and social IDs (Google, Facebook, and LinkedIn). Authorization management will support role-based access controls.
  • 11. Deposit Manager: Supports Submission Information Package (SIP) data ingest and storage; coordinates virus scanning, statistical file validation, variable extraction, image processing, and metadata extraction. Easy-to-use workspace and file management. Ability to publish at a granular level. Embeddable UI plugin supporting tenant-specific configuration. Supported ingest protocols: HTTP, SFTP, and email. Integrates with the BPM/workflow engine.
  • 12. Preservation Manager: Implements transcoding of data specific to MIME types. Generates the Archival Information Package (AIP) and Dissemination Information Package (DIP). Replicates the AIP to storage systems for long-term preservation. Performs fixity checks periodically.
  • 13. Search Manager: Full-featured text search using Apache Solr. Embeddable UI plugin supporting tenant-specific configuration. Coverage includes, but is not limited to, keywords, metadata, text, and geospatial fields. Exploring GeoBlacklight for search and dissemination of geographical data.
  • 14. Anti-Virus Scanner: Anti-virus scanning as a service. Supports ClamAV and Sophos. Can expand to multiple nodes, allowing horizontal scaling to support scanning of large data sets.
  • 15. SPSS Processor: Performs additional processing of IBM SPSS files. Analyzes and reports potential missing variables and inconsistencies. Extracts variables and stores them for online analysis tools.
  • 16. Open API: RESTful services to enable metadata harvesting and exports using industry-wide standards and formats. Examples: RDF, JSON-LD, DDI XML…
  • 17. Workflow Engine: Central workflow management providing a unified action list for users. Ability to model business process flows and integrate them with technical components. Activiti is the chosen technology.
  • 18. Reports & Analytics: Captures all system and user activities within the components, enabling effective provenance data collection. Central consolidated storage for all logs. Ability to discover, visualize, and report on the data collected. Technologies: Elasticsearch and Kibana; Google Analytics (client and server side).
  • 19. Content-Specific Processors: Add-on modules that can derive and extract custom attributes. These modules can be invoked using messages and added to the processing pipelines. For example, the Image Processor module produces thumbnails from image files for easier display in the GUI. Geospatial data is published to a GeoServer instance.
  • 20. Geo Tagger: Add-on module that can derive and process geographical information from inputs such as a street address, an IP address, or shapes and markers on a map. It will generate geo tag information for display and to support search capabilities.
  • 21. Research Data Integration: Ability to integrate with external data producers, for example SEAD, OSF…
  • 22. Web UI: HTML, JavaScript, CSS, Twitter Bootstrap, jQuery and plugins, Facebook ReactJS, Advanced REST Client. Protocol: HTTPS, HTTPS/REST/JSON/JSON-LD, HTTPS/SOAP/XML *. Middleware: J2EE application servers (Jetty, Apache Tomcat), Spring MVC, Groovy/Grails, Ruby/Rails. Desktop UI: Java Swing, Java Web Start. Batch Automation UI: UC4 *, Control-M *. Protocol: HTTPS, HTTPS/REST/JSON, Java Network Launch Protocol (JNLP). Middleware: J2EE application servers (Jetty, Apache Tomcat), Spring MVC, Spring Remoting & Web Services. Protocol: SSH, Java RMI. Scripting/Orchestration: shell programming, Perl, Ruby, Groovy. Storage/Databases: network file storage, Oracle/MySQL/PostgreSQL, Amazon Cloud, DuraSpace Cloud. ESB/Message Brokers: Apache ActiveMQ, RabbitMQ, Apache Camel. Source Code Management: Git, CVS *. Productivity Tools: Drupal/Confluence/Google Sites, JIRA, Bamboo, FishEye, Crucible, Stash, Microsoft Office. Operating Systems: servers (Linux), desktop (Linux/Windows/Mac). Build Tools: Ant, Maven.
  • 24. openICPSR is scheduled to be released on the new Archonnex platform by the end of July 2016.
  • 26. Thank you. Thomas Murphy, CNS Director, tomurphy@umich.edu; Harsha Ummerpillai, Software Architect, harshau@umich.edu
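
Illustrative note on slide 12: the slides say the Preservation Manager performs fixity checks periodically but do not describe how. The following is a minimal Python sketch of one common approach, assuming a SHA-256 checksum manifest recorded when a package is created and re-verified later. It is not Archonnex code: the function names, the manifest layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Record a checksum for every file in a (hypothetical) AIP directory."""
    return {
        str(p.relative_to(package_dir)): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }


def fixity_failures(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = package_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway package, check it, alter one file, check again."""
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp)
        (package / "data.csv").write_text("id,value\n1,42\n")
        (package / "codebook.txt").write_text("value: example variable\n")
        manifest = build_manifest(package)
        before = fixity_failures(package, manifest)
        # Simulate silent corruption of one file after the manifest was made.
        (package / "data.csv").write_text("id,value\n1,43\n")
        after = fixity_failures(package, manifest)
    return before, after


if __name__ == "__main__":
    print(demo())
```

In a setup like the one the slides describe, a scheduled batch job could call something like `fixity_failures` for each replicated copy and raise an alert for any path it returns; that wiring is likewise an assumption, not something the deck specifies.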