Lessons Learned from Modernizing USCIS Data Analytics Platform
A Journey To Modernization
Shawn Benjamin & Prabha Rajendran
Problems Faced
• Informatica ETL pipeline was brittle
• Lengthy Informatica ETL development cycle
• Lengthy load times for data ingestion workflows
• No capability for real-time or near-real-time data
• No data science platform
Legacy Architecture
[Diagram: the data warehouse supplies extracted data to data scientists, statisticians, and business analysts, who use R / Python code to produce business insight.]
[Diagram: January 2016 snapshot of the legacy data environment. Labeled components: source systems (C3 CON, C3 LANS, C4, eCIMS, NFTS, ELIS, NASS, CPMS, Pay.gov, AR-11, RAPS, MFAS, VSS, ATS, PCTS, SNAP, RNACS, and others); active and decommissioned ODSes; the eCISCOR ODS; data marts (Benefits, Scheduler, Payment, Validation); SMART subject areas; SAS libraries; direct connects; BI tools; and users. Counts shown on the slide include 36, 2, 66, 2,354, and 28 ETL processes.]
Successful Proof of Concept
• Implemented a Databricks private cloud VPC (26 nodes) in AWS
• Connected the Databricks cluster to the Oracle database
• Created all relevant database tables in the Hive metastore, pointing to the Oracle database
• Copied relevant tables from the Oracle database to S3 using Scala code. Data is stored in Apache Parquet columnar format. For context, a table with 120 million rows and 83 columns can be dumped to S3 in just 10 minutes.
• Identified an appropriate partition scheme: large tables were partitioned to optimize Spark query performance
• Created multiple notebooks to perform data analysis and visualize the results, e.g. a histogram of case life cycle duration
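The partitioning step in the proof of concept can be illustrated with a minimal sketch. It assumes Hive-style `column=value` directory partitioning, the layout Spark produces when writing partitioned Parquet. The column name `receipt_year`, the sample rows, and the `s3://example-bucket/cases` prefix are hypothetical and do not come from the deck.

```python
from collections import defaultdict

def partition_path(base, partition_cols, row):
    """Build the Hive-style directory for one row, e.g. base/receipt_year=2019."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([base.rstrip("/")] + parts)

def group_by_partition(base, partition_cols, rows):
    """Group rows by the directory they would be written to."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_path(base, partition_cols, row)].append(row)
    return dict(groups)

def prune(groups, base, partition_cols, wanted):
    """Keep only the partition that a filter on the partition columns must read."""
    target = partition_path(base, partition_cols, wanted)
    return {path: rows for path, rows in groups.items() if path == target}

# Hypothetical sample data.
rows = [
    {"case_id": 1, "receipt_year": 2018, "days_open": 120},
    {"case_id": 2, "receipt_year": 2019, "days_open": 95},
    {"case_id": 3, "receipt_year": 2019, "days_open": 210},
]
BASE = "s3://example-bucket/cases"
groups = group_by_partition(BASE, ["receipt_year"], rows)
selected = prune(groups, BASE, ["receipt_year"], {"receipt_year": 2019})
```

In Spark the corresponding write is `df.write.partitionBy("receipt_year").parquet(path)`. A query that filters on the partition column then only needs to read the matching directories, which is where the performance gain comes from.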
Current Databricks Implementation
[Diagram: the S3 data lake and LakeHouse serve data scientists, statisticians, and business analysts, who produce business insight.]
• 75 Data Sources
• 35 Application Interfaces
• 7 Data Marts
• 4 BI Tools
• 6,086 Tableau Dashboards
• 118 SMART Subject Areas
• 56 SAS Libraries
• 6,233 Users
Databricks Accomplishments
• Implementation of Delta Lake
• Easy integration with OBIEE, SAS, and Tableau through native connectors
• Integration with GitHub for continuous integration and deployment
• Automated account provisioning
• Machine learning (MLflow) integration
Change Data Capture using Delta Lake
Databricks Delta: Success Factors
• Faster ingestion of CDC changes
• Resiliency
• Improved data quality, reporting availability, and runtime performance
• Schema evolution: adding columns without rewriting the entire table
Databricks Delta: Lessons Learned
• Storage requirements increased
• Running VACUUM and OPTIMIZE is mandatory to maintain performance
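To make the change data capture flow concrete, here is a minimal, library-free sketch of the upsert semantics a CDC merge applies: each change record is matched to the target by key, then inserted, updated, or deleted. It does not use the Delta Lake API. The record shape (`op`, `key`, `data`) and the operation codes `I`/`U`/`D` are assumptions for illustration, not the authors' schema.

```python
def apply_cdc(target, changes):
    """Apply an ordered list of change records to a keyed table snapshot.

    target  -- dict mapping primary key -> row dict
    changes -- iterable of dicts with 'op' ('I', 'U', or 'D'), 'key', and 'data'
    Returns a new dict; the input snapshot is left untouched.
    """
    merged = {key: dict(row) for key, row in target.items()}
    for change in changes:
        op, key = change["op"], change["key"]
        if op == "D":
            merged.pop(key, None)        # deleting a missing key is a no-op
        elif op in ("I", "U"):
            row = merged.get(key, {})
            row.update(change["data"])   # new columns are added, existing ones overwritten
            merged[key] = row
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return merged

# Hypothetical snapshot and change feed.
snapshot = {
    101: {"status": "PENDING"},
    102: {"status": "APPROVED"},
}
changes = [
    {"op": "U", "key": 101, "data": {"status": "APPROVED"}},
    {"op": "D", "key": 102, "data": {}},
    {"op": "I", "key": 103, "data": {"status": "PENDING", "office": "NSC"}},
]
result = apply_cdc(snapshot, changes)
```

On Delta Lake the same match-then-update/insert/delete logic is expressed with a `MERGE INTO` statement against the target table. The insert for key 103 also shows the schema evolution case in miniature: a row can carry a column the earlier rows did not have.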
Unified Data Analytical Platform: Tableau
Unified Data Analytical Platform: OBIEE/SAS
Data Science Experiments using ML
[Figures: ML graphs produced after running the models. Panels: Prediction Model Samples; Text and Log Mining; Sentiment Analysis (axis from 0 to 1, negative sentiment to positive sentiment).]
Time Series Models and H2O Integration
Integrated H2O with Databricks and built a traditional time series forecasting model that predicts the count of "no shows" on N-400, in order to anticipate inefficiencies in normal day-to-day planning and operations.
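As a rough illustration of traditional time series forecasting, the sketch below computes a seasonal-naive baseline with a weekly cycle: each future day is forecast as the mean of the same weekday over recent weeks. It is not the H2O model described in the deck. The daily counts, the 7-day season, and the function name are assumptions made for the example.

```python
from statistics import mean

def seasonal_naive_forecast(history, season=7, horizon=7, weeks=2):
    """Forecast `horizon` steps ahead by averaging the value at the same
    position in each of the last `weeks` complete seasons of `history`."""
    if len(history) < season * weeks:
        raise ValueError("history must cover at least `weeks` full seasons")
    forecast = []
    for step in range(horizon):
        # Distance from the end of history to the value one season before this step.
        offset = season - (step % season)
        samples = [history[-offset - season * w] for w in range(weeks)]
        forecast.append(mean(samples))
    return forecast

# Hypothetical daily no-show counts for two consecutive weeks.
history = [12, 9, 11, 10, 14, 3, 2,
           14, 11, 9, 12, 16, 5, 4]
next_week = seasonal_naive_forecast(history)
```

A baseline like this is mainly useful as a yardstick: a fitted forecasting model should beat it on held-out weeks before it is trusted for planning.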
Enabling Security & Governance
Access Control (ACL), Credentials Passthrough, Secrets Management
• Controlled user access to data using the Databricks view-based access control model (table- and schema-level ACLs)
• Controlled user access to clusters that are not enabled for table access control
• Enforced data object privileges at the onboarding phase
• Used the Databricks secrets manager to store credentials and reference them in notebooks
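The secrets pattern can be sketched as follows. In a Databricks notebook the lookup would be `dbutils.secrets.get(scope=..., key=...)`; so that the example runs anywhere, the lookup is passed in as a function. The scope name `oracle-prod`, the key names, the table name, and the JDBC URL are hypothetical.

```python
def jdbc_options(get_secret, scope, url, table):
    """Build JDBC reader options without hard-coding credentials.

    get_secret -- callable taking (scope, key) and returning the secret string;
                  in a notebook: lambda s, k: dbutils.secrets.get(scope=s, key=k)
    """
    return {
        "url": url,
        "dbtable": table,
        "user": get_secret(scope, "username"),
        "password": get_secret(scope, "password"),
    }

# Stand-in secret store for local runs; real values would live in a secret scope.
_fake_store = {
    ("oracle-prod", "username"): "etl_user",
    ("oracle-prod", "password"): "not-a-real-password",
}

options = jdbc_options(
    lambda scope, key: _fake_store[(scope, key)],
    scope="oracle-prod",
    url="jdbc:oracle:thin:@//db.example.internal:1521/ORCL",
    table="CASES",
)
```

In a notebook the resulting dictionary could be handed to `spark.read.format("jdbc").options(**options).load()`, so the credentials never appear in the notebook source.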
Databricks Management API Usage
• Cluster/jobs management: create, delete, and manage clusters, and get the execution status of daily scheduled jobs, which helped automate monitoring.
• Library/secret management: easy upload of any third-party libraries, and management of encrypted scopes/credentials to connect to source and target endpoints.
• Integration and deployments: API integration with Git and Jenkins for continuous integration and continuous deployment.
• Enabled the MLflow Tracking API for our data science experiments.
Integrated the Databricks Management API with Jenkins and other scripting tools to automate all our administration and management tasks.
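A monitoring script of the kind described might start from a sketch like this. It builds, but does not send, authenticated requests for two Databricks REST endpoints, `GET /api/2.0/clusters/list` and `GET /api/2.1/jobs/runs/list`. The workspace URL and token are placeholders, and the helper name `databricks_get` is hypothetical.

```python
import urllib.parse
import urllib.request

def databricks_get(host, token, path, params=None):
    """Prepare a GET request against a Databricks workspace REST endpoint."""
    url = host.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "dapi-placeholder"                      # placeholder; load from a secret store in practice

list_clusters = databricks_get(HOST, TOKEN, "/api/2.0/clusters/list")
completed_runs = databricks_get(
    HOST, TOKEN, "/api/2.1/jobs/runs/list", {"completed_only": "true"}
)

# To execute a prepared request from a scheduler such as Jenkins:
#   import json
#   with urllib.request.urlopen(list_clusters) as resp:
#       payload = json.load(resp)
```

Keeping request construction separate from execution makes the script easy to test without a live workspace, and the same helper can be reused for other read-only endpoints.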
Lessons learned through this Journey
• Training plan
• Cloud-based experience
• Subject matter expertise
• Automation
Success Strategy
Success Criteria: Benefits
Performance:
• Auto-scalability leveraging on-demand and spot instances
• Efficient processing of larger datasets, comparable to RDBMS
• Scalable read/write performance on S3
Support for a variety of statistical programming languages:
• Data science platform (R, Python, Scala, and SQL)
• Supports MLlib: machine learning and deep learning
Integration with existing tools:
• Allows connections to industry-standard technologies via ODBC/JDBC connections and built-in connectors
Easily integrate new data sources:
• Supports seamless integration with data streaming technologies such as Kafka/Kinesis using Spark Streaming, for both structured and unstructured data
• Leverages S3 extensively
Secure:
• Supports integration with multiple single sign-on platforms
• Supports native encryption/decryption features (AES-256 and KMS)
• Supports access control lists (ACLs)
• Implemented in the USCIS private cloud
Questions!
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 

Lessons Learned from Modernizing USCIS Data Analytics Platform

  • 1. A Journey To Modernization. Shawn Benjamin & Prabha Rajendran
  • 2. Problems Faced: Informatica ETL pipeline was brittle; lengthy Informatica ETL development cycle; lengthy load-time workflows for data ingestion; lack of ability for real-time/near-real-time data; lack of a data science platform.
  • 3. Legacy Architecture (diagram). Labels: Source Systems, Data Warehouse, Extracted Data, R / Python Code, Business Insight. Users: Data Scientists, Statisticians, Business Analysts.
  • 4. January 2016 Snapshot (architecture diagram). Components shown: data sources, active and decommissioned ODSes, eCISCOR, SMART subject areas, SAS libraries, data marts (Benefits, Scheduler, Payment, Validation), direct connects, BI tools, ETL processes, and users.
  • 5. Successful Proof of Concept: implemented a Databricks private cloud VPC (26 nodes) in AWS; connected the Databricks cluster to the Oracle database; created all relevant database tables in Hive metadata pointing to the Oracle database; copied relevant tables from the Oracle database to S3 using Scala code, storing the data in Apache Parquet columnar format (for context, a 120-million-row, 83-column table can be dumped to S3 in just 10 minutes); identified an appropriate partition scheme, partitioning large tables to optimize Spark query performance; created multiple notebooks to perform data analysis and visualize the results, e.g. a histogram of case life cycle duration.
  • 6. Current Databricks Implementation (diagram). Labels: S3 Data Lake, LakeHouse, Business Insight. Users: Data Scientists, Statisticians, Business Analysts.
  • 7. 75 Data Sources • 35 Application Interfaces • 7 Data Marts • 4 BI Tools • 6,086 Tableau Dashboards • 118 SMART Subject Areas • 56 SAS Libraries • 6,233 Users
  • 8. Databricks Accomplishments: implementation of Delta Lake; easy integration with OBIEE, SAS, and Tableau through native connectors; integration with GitHub for continuous integration and deployment; automated account provisioning; machine learning (MLflow) integration.
  • 9. Change Data Capture using Delta Lake. Databricks Delta success factors: faster ingestion of CDC changes; resiliency; improved data quality, reporting availability, and runtime performance; schema evolution (adding columns without rewriting the entire table). Databricks Delta lessons learned: storage requirements increased; running VACUUM and OPTIMIZE is mandatory to improve performance.
  • 10. Unified Data Analytical Platform: Tableau
  • 11. Unified Data Analytical Platform: OBIEE/SAS
  • 12. Data Science Experiments using ML: ML graphs processed after running the models; prediction model samples.
  • 13. Text and Log Mining: Sentiment Analysis (chart of negative and positive sentiment on a 0 to 1 scale).
  • 14. Time Series Models and H2O Integration: integrated H2O with Databricks and built a model that predicts the count of ‘No show’ on N400 using traditional time series forecasting, to predict inefficiencies in normal day-to-day planning and operations.
  • 15. Enabling Security & Governance: Access Control (ACL), Credentials Passthrough, Secrets Management. Controlled user access to data using the Databricks view-based access control model (table- and schema-level ACLs); controlled user access to clusters that are not enabled for table access control; enforced data object privileges at the onboarding phase; used the Databricks secrets manager to store credentials and reference them in notebooks.
  • 16. Databricks Management API Usage. Cluster/Jobs management: create, delete, and manage clusters and get the execution status of daily scheduled jobs, which helped automate monitoring. Library/Secret management: easy upload of any third-party libraries and management of encrypted scopes/credentials to connect to source and target endpoints. Integration and deployments: API with Git and Jenkins for continuous integration and continuous deployment. Enabled the MLflow Tracking API for our data science experiments. Integrated the Databricks Management API with Jenkins and other scripting tools to automate all our administration and management tasks.
  • 17. Lessons learned through this Journey: training plan; cloud-based experience; subject matter expertise; automation.
  • 18. Success Strategy (success criteria and benefits). Performance: auto-scalability leveraging on-demand and spot instances; efficient processing of larger datasets, comparable to RDBMS systems; scalable read/write performance on S3. Support for a variety of statistical programming languages: data science platform (R, Python, Scala, and SQL); supports MLlib for machine learning and deep learning. Integration with existing tools: allows connections to industry-standard technologies via ODBC/JDBC connections and built-in connectors. Easily integrate new data sources: supports seamless integration with data streaming technologies such as Kafka/Kinesis using Spark Streaming, for both structured and unstructured data; leverages S3 extensively. Secure: supports integration with multiple single sign-on platforms; supports native encryption-decryption features (AES-256 and KMS); supports access control lists (ACLs); implemented in the USCIS private cloud.
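
As a rough illustration of the job monitoring described in slide 16, the Python sketch below builds authenticated requests for two Databricks REST endpoints (`/api/2.0/clusters/list` and `/api/2.1/jobs/runs/list`) and picks failed runs out of a response. The slides do not say which endpoints or tooling were used, so the host name, token, job ID, helper names, and sample payload are all placeholders, and the response shape is assumed to follow the Jobs API 2.1 `runs/list` format.

```python
import json
import urllib.parse
import urllib.request


def build_request(host, token, path, params=None):
    """Build (but do not send) an authenticated GET request for a REST endpoint."""
    query = "?" + urllib.parse.urlencode(params) if params else ""
    return urllib.request.Request(
        f"https://{host}{path}{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_clusters_request(host, token):
    """Request for the cluster listing endpoint."""
    return build_request(host, token, "/api/2.0/clusters/list")


def job_runs_request(host, token, job_id, limit=5):
    """Request for the most recent runs of one scheduled job."""
    return build_request(
        host, token, "/api/2.1/jobs/runs/list", {"job_id": job_id, "limit": limit}
    )


def fetch_json(request):
    """Send a request and decode the JSON body (needs network access and a valid token)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def failed_runs(runs_payload):
    """Return the run IDs of finished runs whose result was not SUCCESS.

    Assumes each run carries state.life_cycle_state and state.result_state.
    """
    failed = []
    for run in runs_payload.get("runs", []):
        state = run.get("state", {})
        finished = state.get("life_cycle_state") == "TERMINATED"
        if finished and state.get("result_state") != "SUCCESS":
            failed.append(run["run_id"])
    return failed


# Hypothetical payload in the assumed runs/list shape, used instead of a live call.
sample_payload = {
    "runs": [
        {"run_id": 1, "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
        {"run_id": 2, "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"}},
        {"run_id": 3, "state": {"life_cycle_state": "RUNNING"}},
    ]
}
print(failed_runs(sample_payload))
```

In a scheduled script or a Jenkins step, `fetch_json(job_runs_request(...))` would replace the sample payload, and a non-empty result from `failed_runs` could trigger an alert.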