SlideShare a Scribd company logo
HIPAA Compliant
Deployment of Apache
Spark on AWS
Nitin Panjwani, Christian Nuss
Collective Health
#Ent8SAIS
2
What do we do
Collective Health provides a platform for
employers to provide healthcare benefits to
their workforce.
Objective
Data Driven approach to reduce the cost for
the employer and provide a high quality
service to members.
#Ent8SAIS
About Collective Health
About Collective Health
#Ent8SAIS 3
Data Footprints
#Ent8SAIS 4
• Diverse user backgrounds: SQL, Python, Java
• Big Benchmark Data
• Shared Notebooks (Zeppelin) for Big Data
• Advanced ML models
– Risk scores: regression models
– Classifiers to find members on our
proprietary recommendation engine
#Ent8SAIS 5
Analytics Platform Requirements
Our Compliance Landscape
#Ent8SAIS 6
HIPAA and PHI
• Health Insurance Portability and
Accountability Act
Safeguards
• Protected Health Information
(PHI)
SOC 2
• Service Organization Control
Type 2
Ensures
• Privacy
• Security
• Availability
• Integrity
• Confidentiality
• Encryption
• Authentication
• Authorization
• Identity
7#Ent8SAIS
• Access Logging
• Auditability
• Scalability
• Repeatability
Our Requirements
HIPAA and SOC 2
8#Ent8SAIS
Data Science Challenges
• Access to data is not easily available due to regulatory
compliance
• Systems in cloud should have access to data
• We need Cloud version of Jupyter/Zeppelin
• Bringing in new technologies is not easy
– Framework should be compliant
• Data footprint should be auditable
What is Amazon EMR?
Elastic MapReduce
"Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast
amounts of data across dynamically scalable Amazon EC2 instances."
"You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink."
9#Ent8SAIS
What does EMR do?
10#Ent8SAIS
Infrastructure
Management
Software
Installation
Configuration
Management
Monitoring &
Administration
EMR Services
11#Ent8SAIS
Flink Ganglia Hadoop HBase HCatalog
Hive Hue Livy Mahout MXNet
Oozie Phoenix Pig Presto Spark
Sqoop Tez Zeppelin ZooKeeper
EMR Services
12#Ent8SAIS
Flink Ganglia Hadoop HBase HCatalog
Hive Hue Livy Mahout MXNet
Oozie Phoenix Pig Presto Spark
Sqoop Tez Zeppelin ZooKeeper
Customizing EMR
13#Ent8SAIS
General
Settings
Bootstrap
Actions
Configurations
Start-Up
Steps
Customizing EMR
General Settings
14#Ent8SAIS
• Security Groups
• IAM Roles
• Logging
• Encryption
• Monitoring
• Auto-Scaling
Customizing EMR
Bootstrap Actions
15#Ent8SAIS
• Shell scripts in S3
• Runs on all nodes
• Runs prior to EMR
installs
Examples:
• Set up SSO on Zeppelin
• Additional Python or Java
packages
• Custom monitoring
Customizing EMR
Configurations
16#Ent8SAIS
• JSON Format
• Overrides defaults
• Difficult to understand
and verify
Examples:
• Spark Defaults
• Yarn Container Limits
• Add TLS to Zeppelin Web UI
• Verbose Logging
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-5x.html
Customizing EMR
Start-Up Steps
17#Ent8SAIS
• Hadoop JAR
• AWS "script-runner.jar"
• Runs any job once
cluster is ready
Examples:
• Test connectivity
• Set Zeppelin Interpreter settings
• Run any arbitrary script against a
running cluster
PSA: Ephemeral Infrastructure
18
• Most changes require the cluster to recreate
• Treat EMR clusters as immutable
• HDFS data is not durable
• Cluster changes take 5-10 minutes
#Ent8SAIS
19#Ent8SAIS
General
Settings
Bootstrap
Actions
Configurations
Start-Up
Steps
UI or API Script JSON Hadoop JAR
EMR Customizations
Complex and Fragmented
Our Solution
20#Ent8SAIS
Amazon
EMR
What is Terraform?
21#Ent8SAIS
Infrastructure
As Code
State
Reproducible
Infrastructure
22#Ent8SAIS
What is Terraform?
Declarative Configuration Files
resource "aws_s3_bucket" "bucket" {
bucket = "${var.cluster_name}-${terraform.env}"
acl = "private"
}
resource "aws_emr_cluster" "cluster" {
name = "${var.cluster_name}"
release_label = "emr-5.13.0"
applications = ["Spark", "Zeppelin", ...]
…
log_uri =
"s3n://${aws_s3_bucket.bucket.id}/logs"
…
master_instance_type = "c4.large"
core_instance_type = "c4.large"
core_instance_count = 3
}
23#Ent8SAIS
What is Terraform?
API Calls, Codified!
$> terraform apply
aws_s3_bucket.bucket: Creating...
…
bucket: "" => "cnuss-test-us-west-1-default"
…
aws_s3_bucket.bucket: Creation complete after 3s (ID: cnuss-test-us-west-1-default)
aws_emr_cluster.cluster: Creating...
…
aws_emr_cluster.cluster: Creation complete after 4m23s (ID: j-3CUR3J8Y6NYE9)
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
https://github.com/hashicorp/terraform
24#Ent8SAIS
What is Terraform?
Changes, As Code!
resource "aws_s3_bucket" "bucket" {
bucket = "${var.cluster_name}-${terraform.env}"
acl = "private"
versioning {
enabled = true
}
}
resource "aws_emr_cluster" "cluster" {
name = "${var.cluster_name}"
release_label = "emr-5.13.0"
applications = ["Spark", "Zeppelin", ...]
…
log_uri =
"s3n://${aws_s3_bucket.bucket.id}/logs"
…
master_instance_type = "c4.large"
core_instance_type = "c4.large"
core_instance_count = 3
}
25#Ent8SAIS
What is Terraform?
Stateful
$> terraform apply
aws_s3_bucket.bucket: Refreshing state... (ID: cnuss-test-us-west-1-default)
aws_emr_cluster.cluster: Refreshing state... (ID: j-3CUR3J8Y6NYE9)
aws_s3_bucket.bucket: Modifying... (ID: cnuss-test-us-west-1-default)
versioning.0.enabled: "false" => "true"
aws_s3_bucket.bucket: Modifications complete after 2s (ID: cnuss-test-us-west-1-default)
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
https://github.com/hashicorp/terraform
26#Ent8SAIS
What is Terraform?
Also, dangerous…
$> terraform destroy
Terraform will perform the following actions:
- aws_emr_cluster.cluster
- aws_s3_bucket.bucket
aws_emr_cluster.cluster: Destroying... (ID: j-3CUR3J8Y6NYE9)
aws_emr_cluster.cluster: Destruction complete after 1m48s (ID: j-3CUR3J8Y6NYE9)
aws_s3_bucket.bucket: Destroying... (ID: cnuss-test-us-west-1-default)
aws_s3_bucket.bucket: Destruction complete after 0s
Destroy complete! Resources: 1 destroyed.
https://github.com/hashicorp/terraform
Typical Terraform Workflow
27#Ent8SAIS
State File
Check-In Notify Run Change
Read/Write
EMR + Terraform
Our Customizations
28#Ent8SAIS
• TLS Everywhere (HDFS, Zeppelin)
• All Logs to S3 for Log Analysis
• S3 Encryption and Versioning
• Zeppelin + SSO + 2FA
• Custom Monitoring Agents Installed
• Autoscaling
• Terraform Code in GitHub, Jenkins runs Terraform
Next Steps
• Identity and logging of spark-submit commands
• Expose spark-submit to other applications and
engineers
• Jupyter Access to Spark
• Use EMRFS instead of HDFS (waiting on Terraform)
• Additional Monitoring and Autoscaling
29#Ent8SAIS
https://eng.collectivehealth.com
Sample Code / Starter Kit
30#Ent8SAIS
https://twitter.com/hashtag/Ent8SAIS https://linkedin.com/in/christiannuss
Key Takeaways
• Big Data Analytics in Healthcare is very challenging
• EMR "out-of-the-box" requires extensive customizations
to comply with regulations
– Need to add encryption, logging, identity
• Terraform(API for infra) adds value
– Simplifies confusing configuration options
– Repeatability
– All configuration is code
31#Ent8SAIS
Our Results So Far...
• Developed demographic factors
– Variation in average per capita cost due to age and gender
• Geo cost variation factors based on MSA
– Variations in the average per capita cost due to cost and use of
medical practice by metropolitan statistical areas (MSAs)
• Industry factors
– Variations due to industry
• Cost variations by clinical conditions
• Disease prevalence
• Risk Scores (regression model)
Thank You!
Q&A?
33#Ent8SAIS
Christian Nuss
Senior SRE
Collective Health
github.com/cnuss
https://collectivehealth.com/jobsNitin Panjwani
Principal Data Scientist
Collective Health
linkedin.com/in/npanj

More Related Content

What's hot

Chickens & Eggs: Managing secrets in AWS with Hashicorp Vault
Chickens & Eggs: Managing secrets in AWS with Hashicorp VaultChickens & Eggs: Managing secrets in AWS with Hashicorp Vault
Chickens & Eggs: Managing secrets in AWS with Hashicorp Vault
Jeff Horwitz
 
Elk ruminating on logs
Elk ruminating on logsElk ruminating on logs
Elk ruminating on logs
Mathew Beane
 
Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh
clevernetsystemsgeneva
 
Watch How The Giants Fall: Learning from Bug Bounty Results
Watch How The Giants Fall: Learning from Bug Bounty ResultsWatch How The Giants Fall: Learning from Bug Bounty Results
Watch How The Giants Fall: Learning from Bug Bounty Results
jtmelton
 
Testing at Stream-Scale
Testing at Stream-ScaleTesting at Stream-Scale
Testing at Stream-Scale
All Things Open
 
Syntribos API Security Test Automation
Syntribos API Security Test AutomationSyntribos API Security Test Automation
Syntribos API Security Test Automation
Matthew Valdes
 
HashiCorp's Vault - The Examples
HashiCorp's Vault - The ExamplesHashiCorp's Vault - The Examples
HashiCorp's Vault - The Examples
Michał Czeraszkiewicz
 
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, LucidworksState of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
Lucidworks
 
Fosdem17 honeypot your database server
Fosdem17 honeypot your database serverFosdem17 honeypot your database server
Fosdem17 honeypot your database server
Georgi Kodinov
 
Boston elasticsearch meetup October 2012
Boston elasticsearch meetup October 2012Boston elasticsearch meetup October 2012
Boston elasticsearch meetup October 2012
imotov
 
BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery
BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery
BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery
BlueHat Security Conference
 
Invoke-Obfuscation DerbyCon 2016
Invoke-Obfuscation DerbyCon 2016Invoke-Obfuscation DerbyCon 2016
Invoke-Obfuscation DerbyCon 2016
Daniel Bohannon
 
Introducing Gridiron Security and Compliance Management Platform and Enclave ...
Introducing Gridiron Security and Compliance Management Platform and Enclave ...Introducing Gridiron Security and Compliance Management Platform and Enclave ...
Introducing Gridiron Security and Compliance Management Platform and Enclave ...
Aptible
 
BGOUG 2014 Decrease Your MySQL Attack Surface
BGOUG 2014 Decrease Your MySQL Attack SurfaceBGOUG 2014 Decrease Your MySQL Attack Surface
BGOUG 2014 Decrease Your MySQL Attack Surface
Georgi Kodinov
 
DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security
DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security
DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security
DevOpsDays Riga
 
Hug #9 who's keeping your secrets
Hug #9 who's keeping your secretsHug #9 who's keeping your secrets
Hug #9 who's keeping your secrets
Cameron More
 
Hashicorp Vault ppt
Hashicorp Vault pptHashicorp Vault ppt
Hashicorp Vault ppt
Shrey Agarwal
 
Using Vault to decouple MySQL Secrets
Using Vault to decouple MySQL SecretsUsing Vault to decouple MySQL Secrets
Using Vault to decouple MySQL Secrets
Derek Downey
 
Red Team vs Blue Team on AWS - RSA 2018
Red Team vs Blue Team on AWS - RSA 2018Red Team vs Blue Team on AWS - RSA 2018
Red Team vs Blue Team on AWS - RSA 2018
Teri Radichel
 
Secure360 - Beyond xp cmdshell - Owning the Empire through SQL Server
Secure360 - Beyond xp cmdshell - Owning the Empire through SQL ServerSecure360 - Beyond xp cmdshell - Owning the Empire through SQL Server
Secure360 - Beyond xp cmdshell - Owning the Empire through SQL Server
Scott Sutherland
 

What's hot (20)

Chickens & Eggs: Managing secrets in AWS with Hashicorp Vault
Chickens & Eggs: Managing secrets in AWS with Hashicorp VaultChickens & Eggs: Managing secrets in AWS with Hashicorp Vault
Chickens & Eggs: Managing secrets in AWS with Hashicorp Vault
 
Elk ruminating on logs
Elk ruminating on logsElk ruminating on logs
Elk ruminating on logs
 
Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh
 
Watch How The Giants Fall: Learning from Bug Bounty Results
Watch How The Giants Fall: Learning from Bug Bounty ResultsWatch How The Giants Fall: Learning from Bug Bounty Results
Watch How The Giants Fall: Learning from Bug Bounty Results
 
Testing at Stream-Scale
Testing at Stream-ScaleTesting at Stream-Scale
Testing at Stream-Scale
 
Syntribos API Security Test Automation
Syntribos API Security Test AutomationSyntribos API Security Test Automation
Syntribos API Security Test Automation
 
HashiCorp's Vault - The Examples
HashiCorp's Vault - The ExamplesHashiCorp's Vault - The Examples
HashiCorp's Vault - The Examples
 
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, LucidworksState of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
 
Fosdem17 honeypot your database server
Fosdem17 honeypot your database serverFosdem17 honeypot your database server
Fosdem17 honeypot your database server
 
Boston elasticsearch meetup October 2012
Boston elasticsearch meetup October 2012Boston elasticsearch meetup October 2012
Boston elasticsearch meetup October 2012
 
BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery
BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery
BlueHat v17 || Detecting Compromise on Windows Endpoints with Osquery
 
Invoke-Obfuscation DerbyCon 2016
Invoke-Obfuscation DerbyCon 2016Invoke-Obfuscation DerbyCon 2016
Invoke-Obfuscation DerbyCon 2016
 
Introducing Gridiron Security and Compliance Management Platform and Enclave ...
Introducing Gridiron Security and Compliance Management Platform and Enclave ...Introducing Gridiron Security and Compliance Management Platform and Enclave ...
Introducing Gridiron Security and Compliance Management Platform and Enclave ...
 
BGOUG 2014 Decrease Your MySQL Attack Surface
BGOUG 2014 Decrease Your MySQL Attack SurfaceBGOUG 2014 Decrease Your MySQL Attack Surface
BGOUG 2014 Decrease Your MySQL Attack Surface
 
DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security
DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security
DevOpsDaysRiga 2018: Andrew Martin - Continuous Kubernetes Security
 
Hug #9 who's keeping your secrets
Hug #9 who's keeping your secretsHug #9 who's keeping your secrets
Hug #9 who's keeping your secrets
 
Hashicorp Vault ppt
Hashicorp Vault pptHashicorp Vault ppt
Hashicorp Vault ppt
 
Using Vault to decouple MySQL Secrets
Using Vault to decouple MySQL SecretsUsing Vault to decouple MySQL Secrets
Using Vault to decouple MySQL Secrets
 
Red Team vs Blue Team on AWS - RSA 2018
Red Team vs Blue Team on AWS - RSA 2018Red Team vs Blue Team on AWS - RSA 2018
Red Team vs Blue Team on AWS - RSA 2018
 
Secure360 - Beyond xp cmdshell - Owning the Empire through SQL Server
Secure360 - Beyond xp cmdshell - Owning the Empire through SQL ServerSecure360 - Beyond xp cmdshell - Owning the Empire through SQL Server
Secure360 - Beyond xp cmdshell - Owning the Empire through SQL Server
 

Similar to HIPAA Compliant Deployment of Apache Spark on AWS for Healthcare Nitin Panjwani and Christian Nuss

AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...
AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...
AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...
Amazon Web Services
 
Automated Intrusion Detection and Response on AWS
Automated Intrusion Detection and Response on AWSAutomated Intrusion Detection and Response on AWS
Automated Intrusion Detection and Response on AWS
Teri Radichel
 
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital OneMicroservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Noriaki Tatsumi
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
Docker, Inc.
 
(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive
Amazon Web Services
 
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
Amazon Web Services Korea
 
Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...
Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...
Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...
RootedCON
 
Automate or die! Rootedcon 2017
Automate or die! Rootedcon 2017Automate or die! Rootedcon 2017
Automate or die! Rootedcon 2017
Toni de la Fuente
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
Amazon Web Services
 
RIoT (Raiding Internet of Things) by Jacob Holcomb
RIoT  (Raiding Internet of Things)  by Jacob HolcombRIoT  (Raiding Internet of Things)  by Jacob Holcomb
RIoT (Raiding Internet of Things) by Jacob Holcomb
Priyanka Aash
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
Peter Clapham
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
Peter Clapham
 
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Amazon Web Services
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
Amazon Web Services
 
Cassandra
CassandraCassandra
Cassandra
exsuns
 
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar SeriesLog Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Amazon Web Services
 
Iac d.damyanov 4.pptx
Iac d.damyanov 4.pptxIac d.damyanov 4.pptx
Iac d.damyanov 4.pptx
Dimitar Damyanov
 
Hack proof your aws cloud cloudcheckr_040416
Hack proof your aws cloud cloudcheckr_040416Hack proof your aws cloud cloudcheckr_040416
Hack proof your aws cloud cloudcheckr_040416
Jarrett Plante
 
Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...
Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...
Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...
Andrea Dottor
 
Make stateful apps in Kubernetes a no brainer with Pure Storage and GitOps
Make stateful apps in Kubernetes a no brainer with Pure Storage and GitOpsMake stateful apps in Kubernetes a no brainer with Pure Storage and GitOps
Make stateful apps in Kubernetes a no brainer with Pure Storage and GitOps
Weaveworks
 

Similar to HIPAA Compliant Deployment of Apache Spark on AWS for Healthcare Nitin Panjwani and Christian Nuss (20)

AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...
AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...
AWS re:Invent 2016: Life Without SSH: Immutable Infrastructure in Production ...
 
Automated Intrusion Detection and Response on AWS
Automated Intrusion Detection and Response on AWSAutomated Intrusion Detection and Response on AWS
Automated Intrusion Detection and Response on AWS
 
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital OneMicroservices, Continuous Delivery, and Elasticsearch at Capital One
Microservices, Continuous Delivery, and Elasticsearch at Capital One
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
 
(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive(DAT407) Amazon ElastiCache: Deep Dive
(DAT407) Amazon ElastiCache: Deep Dive
 
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵 [AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
[AWS Dev Day] 실습워크샵 | Amazon EKS 핸즈온 워크샵
 
Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...
Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...
Toni de la Fuente - Automate or die! How to survive to an attack in the Cloud...
 
Automate or die! Rootedcon 2017
Automate or die! Rootedcon 2017Automate or die! Rootedcon 2017
Automate or die! Rootedcon 2017
 
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch ServiceAWS October Webinar Series - Introducing Amazon Elasticsearch Service
AWS October Webinar Series - Introducing Amazon Elasticsearch Service
 
RIoT (Raiding Internet of Things) by Jacob Holcomb
RIoT  (Raiding Internet of Things)  by Jacob HolcombRIoT  (Raiding Internet of Things)  by Jacob Holcomb
RIoT (Raiding Internet of Things) by Jacob Holcomb
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
 
Cassandra
CassandraCassandra
Cassandra
 
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar SeriesLog Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
 
Iac d.damyanov 4.pptx
Iac d.damyanov 4.pptxIac d.damyanov 4.pptx
Iac d.damyanov 4.pptx
 
Hack proof your aws cloud cloudcheckr_040416
Hack proof your aws cloud cloudcheckr_040416Hack proof your aws cloud cloudcheckr_040416
Hack proof your aws cloud cloudcheckr_040416
 
Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...
Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...
Come sta la nostra applicazione? Un viaggio alla scoperta degli Health Check ...
 
Make stateful apps in Kubernetes a no brainer with Pure Storage and GitOps
Make stateful apps in Kubernetes a no brainer with Pure Storage and GitOpsMake stateful apps in Kubernetes a no brainer with Pure Storage and GitOps
Make stateful apps in Kubernetes a no brainer with Pure Storage and GitOps
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 

Recently uploaded (20)

Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 

HIPAA Compliant Deployment of Apache Spark on AWS for Healthcare Nitin Panjwani and Christian Nuss

  • 1. HIPAA Compliant Deployment of Apache Spark on AWS Nitin Panjwani, Christian Nuss Collective Health #Ent8SAIS
  • 2. 2 What do we do Collective Health provides a platform for employers to provide healthcare benefits to their workforce. Objective Data Driven approach to reduce the cost for the employer and provide a high quality service to members. #Ent8SAIS About Collective Health
  • 5. • Diverse user backgrounds: SQL, Python, Java • Big Benchmark Data • Shared Notebooks (Zeppelin) for Big Data • Advanced ML models – Risk scores: regression models – Classifiers to find members on our proprietary recommendation engine #Ent8SAIS 5 Analytics Platform Requirements
  • 6. Our Compliance Landscape #Ent8SAIS 6 HIPAA and PHI • Health Insurance Portability and Accountability Act Safeguards • Protected Health Information (PHI) SOC 2 • Service Organization Control Type 2 Ensures • Privacy • Security • Availability • Integrity • Confidentiality
  • 7. • Encryption • Authentication • Authorization • Identity 7#Ent8SAIS • Access Logging • Auditability • Scalability • Repeatability Our Requirements HIPAA and SOC 2
  • 8. 8#Ent8SAIS Data Science Challenges • Access to data is not easily available due to regulatory compliance • Systems in cloud should have access to data • We need Cloud version of Jupyter/Zeppelin • Bringing in new technologies is not easy – Framework should be compliant • Data footprint should be auditable
  • 9. What is Amazon EMR? Elastic MapReduce "Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances." "You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink." 9#Ent8SAIS
  • 10. What does EMR do? 10#Ent8SAIS Infrastructure Management Software Installation Configuration Management Monitoring & Administration
  • 11. EMR Services 11#Ent8SAIS Flink Ganglia Hadoop HBase HCatalog Hive Hue Livy Mahout MXNet Oozie Phoenix Pig Presto Spark Sqoop Tez Zeppelin ZooKeeper
  • 12. EMR Services 12#Ent8SAIS Flink Ganglia Hadoop HBase HCatalog Hive Hue Livy Mahout MXNet Oozie Phoenix Pig Presto Spark Sqoop Tez Zeppelin ZooKeeper
  • 14. Customizing EMR General Settings 14#Ent8SAIS • Security Groups • IAM Roles • Logging • Encryption • Monitoring • Auto-Scaling
  • 15. Customizing EMR Bootstrap Actions 15#Ent8SAIS • Shell scripts in S3 • Runs on all nodes • Runs prior to EMR installs Examples: • Set up SSO on Zeppelin • Additional Python or Java packages • Custom monitoring
  • 16. Customizing EMR Configurations 16#Ent8SAIS • JSON Format • Overrides defaults • Difficult to understand and verify Examples: • Spark Defaults • Yarn Container Limits • Add TLS to Zeppelin Web UI • Verbose Logging https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-5x.html
  • 17. Customizing EMR Start-Up Steps 17#Ent8SAIS • Hadoop JAR • AWS "script-runner.jar" • Runs any job once cluster is ready Examples: • Test connectivity • Set Zeppelin Interpreter settings • Run any arbitrary script against a running cluster
  • 18. PSA: Ephemeral Infrastructure 18 • Most changes require the cluster to recreate • Treat EMR clusters as immutable • HDFS data is not durable • Cluster changes take 5-10 minutes #Ent8SAIS
  • 19. 19#Ent8SAIS General Settings Bootstrap Actions Configurations Start-Up Steps UI or API Script JSON Hadoop JAR EMR Customizations Complex and Fragmented
  • 21. What is Terraform? 21#Ent8SAIS Infrastructure As Code State Reproducible Infrastructure
  • 22. 22#Ent8SAIS What is Terraform? Declarative Configuration Files resource "aws_s3_bucket" "bucket" { bucket = "${var.cluster_name}-${terraform.env}" acl = "private" } resource "aws_emr_cluster" "cluster" { name = "${var.cluster_name}" release_label = "emr-5.13.0" applications = ["Spark", "Zeppelin", ...] … log_uri = "s3n://${aws_s3_bucket.bucket.id}/logs" … master_instance_type = "c4.large" core_instance_type = "c4.large" core_instance_count = 3 }
  • 23. 23#Ent8SAIS What is Terraform? API Calls, Codified! $> terraform apply aws_s3_bucket.bucket: Creating... … bucket: "" => "cnuss-test-us-west-1-default" … aws_s3_bucket.bucket: Creation complete after 3s (ID: cnuss-test-us-west-1-default) aws_emr_cluster.cluster: Creating... … aws_emr_cluster.cluster: Creation complete after 4m23s (ID: j-3CUR3J8Y6NYE9) Apply complete! Resources: 2 added, 0 changed, 0 destroyed. https://github.com/hashicorp/terraform
  • 24. 24#Ent8SAIS What is Terraform? Changes, As Code! resource "aws_s3_bucket" "bucket" { bucket = "${var.cluster_name}-${terraform.env}" acl = "private" versioning { enabled = true } } resource "aws_emr_cluster" "cluster" { name = "${var.cluster_name}" release_label = "emr-5.13.0" applications = ["Spark", "Zeppelin", ...] … log_uri = "s3n://${aws_s3_bucket.bucket.id}/logs" … master_instance_type = "c4.large" core_instance_type = "c4.large" core_instance_count = 3 }
  • 25. 25#Ent8SAIS What is Terraform? Stateful $> terraform apply aws_s3_bucket.bucket: Refreshing state... (ID: cnuss-test-us-west-1-default) aws_emr_cluster.cluster: Refreshing state... (ID: j-3CUR3J8Y6NYE9) aws_s3_bucket.bucket: Modifying... (ID: cnuss-test-us-west-1-default) versioning.0.enabled: "false" => "true" aws_s3_bucket.bucket: Modifications complete after 2s (ID: cnuss-test-us-west-1-default) Apply complete! Resources: 0 added, 1 changed, 0 destroyed. https://github.com/hashicorp/terraform
  • 26. 26#Ent8SAIS What is Terraform? Also, dangerous… $> terraform destroy Terraform will perform the following actions: - aws_emr_cluster.cluster - aws_s3_bucket.bucket aws_emr_cluster.cluster: Destroying... (ID: j-3CUR3J8Y6NYE9) aws_emr_cluster.cluster: Destruction complete after 1m48s (ID: j-3CUR3J8Y6NYE9) aws_s3_bucket.bucket: Destroying... (ID: cnuss-test-us-west-1-default) aws_s3_bucket.bucket: Destruction complete after 0s Destroy complete! Resources: 1 destroyed. https://github.com/hashicorp/terraform
  • 27. Typical Terraform Workflow 27#Ent8SAIS State File Check-In Notify Run Change Read/Write
  • 28. EMR + Terraform Our Customizations 28#Ent8SAIS • TLS Everywhere (HDFS, Zeppelin) • All Logs to S3 for Log Analysis • S3 Encryption and Versioning • Zeppelin + SSO + 2FA • Custom Monitoring Agents Installed • Autoscaling • Terraform Code in GitHub, Jenkins runs Terraform
  • 29. Next Steps • Identity and logging of spark-submit commands • Expose spark-submit to other applications and engineers • Jupyter Access to Spark • Use EMRFS instead of HDFS (waiting on Terraform) • Additional Monitoring and Autoscaling 29#Ent8SAIS
  • 30. https://eng.collectivehealth.com Sample Code / Starter Kit 30#Ent8SAIS https://twitter.com/hashtag/Ent8SAIS https://linkedin.com/in/christiannuss
  • 31. Key Takeaways • Big Data Analytics in Healthcare is very challenging • EMR "out-of-the-box" requires extensive customizations to comply with regulations – Need to add encryption, logging, identity • Terraform(API for infra) adds value – Simplifies confusing configuration options – Repeatability – All configuration is code 31#Ent8SAIS
  • 32. Our Results So Far... • Developed demographic factors – Variation in average per capita cost due to age and gender • Geo cost variation factors based on MSA – Variations in the average per capita cost due to cost and use of medical practice by metropolitan statistical areas (MSAs) • Industry factors – Variations due to industry • Cost variations by clinical conditions • Disease prevalence • Risk Scores (regression model)
  • 33. Thank You! Q&A? 33#Ent8SAIS Christian Nuss Senior SRE Collective Health github.com/cnuss https://collectivehealth.com/jobsNitin Panjwani Principal Data Scientist Collective Health linkedin.com/in/npanj