Just Enough DevOps for Data Scientists
abida@salesforce.com
@anyabida1
Anya Bida, SRE at Salesforce
About Anya
Sr. Member of Technical Staff (SRE)
Salesforce Production Engineering
Salesforce Einstein Platform
Co-organizer SF Big Analytics
Spark Tuning
• Cheat-sheet
• Talks
Previously at Alpine Data, SRI
PhD Mayo Clinic, BS Johns Hopkins
@anyabida1
What I am going to talk about
What is DevOps
Salesforce Einstein Scales
Our goal
Top 10 tips
What’s next?
What is DevOps?
Software Development
Network & Security
Infrastructure
Build & Release
Data Science
• Awesome library on SparkML
• Spark clusters
• Microservices
• Cluster, Containers
Fastest Growing Top 5 Enterprise Software Company
Revenue by fiscal year:
• FY11: $1.7B
• FY12: $2.3B
• FY13: $3.1B
• FY14: $4.1B
• FY15: $5.4B
• FY16: $6.7B
• FY17: $8.4B
• FY18 Q2: $2.56B
2009 • 2010 • 2011 • 2012 • 2013 • 2014 • 2015 • 2016 • 2017
September 2016
2011 • 2012 • 2013 • 2014 • 2015 • 2016 • 2017
The world’s most innovative companies
“Innovator of the Decade”
Our Goal
[Chart over time: Number of Predictions, Infrastructure Costs]
Tip 1: Plan for Failure
Take off that Data Scientist hat now.
https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead
Simple Dashboard with KPIs
• Request & error rates
• Longest response times - upper 95th & 99th percentile
• Capacity
• Events
Jos Boumans, Salesforce DMP (slides)
Collect metrics from every machine.
Troubleshoot with all the metrics at your disposal.
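The dashboard KPIs above can be computed from raw request records. The sketch below is one possible way to do it, not the approach from the linked slides: the `Request` record, the `summarize` function, and the nearest-rank percentile definition are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""
    latency_ms: float
    ok: bool


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(requests, window_seconds):
    """Request rate, error rate, and tail latencies for one time window."""
    latencies = [r.latency_ms for r in requests]
    errors = sum(1 for r in requests if not r.ok)
    return {
        "request_rate": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


# 100 requests in a 10-second window; requests 50 and 100 fail.
sample = [Request(latency_ms=float(i), ok=(i % 50 != 0)) for i in range(1, 101)]
kpis = summarize(sample, window_seconds=10)
```

Tail percentiles are used rather than the mean because a few slow requests can hide behind a healthy-looking average.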
Tip 2: Blue Green Deployments
https://docs.mobingi.com/official/guide/bg-deploy
Blue Machine
(old)
Green Machine
(new)
Users
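A minimal sketch of the blue/green idea, assuming both environments run side by side and a router decides which one users reach. The `Router` class, the health check, and the two stand-in services are hypothetical; a real deployment would switch a load balancer or DNS entry instead.

```python
class Router:
    """Sends all user traffic to whichever environment is currently live."""

    def __init__(self, blue, green):
        self.environments = {"blue": blue, "green": green}
        self.live = "blue"  # blue is the old, known-good version

    def handle(self, request):
        return self.environments[self.live](request)

    def switch_to(self, name, health_check):
        """Flip traffic only if the target environment passes its health check."""
        if not health_check(self.environments[name]):
            return False  # keep serving from the current environment
        self.live = name
        return True


def blue(request):
    # Stand-in for the old version of the service.
    return f"v1:{request}"


def green(request):
    # Stand-in for the new version of the service.
    return f"v2:{request}"


router = Router(blue, green)
before = router.handle("predict")
switched = router.switch_to("green", lambda env: env("ping") == "v2:ping")
after = router.handle("predict")
```

Because the blue environment keeps running, rolling back is the same operation in the other direction.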
Tip 3: Assume people make mistakes
Technical debt
• Every manual change
• Duplicate metrics
Scale down resources
• Terminate unused machines
• Janitor Monkey
• Understand the cost per job
• Jobs should not accumulate files on disk
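Terminating unused machines is easier to get right when a script picks the candidates. The sketch below is not Janitor Monkey's API; it assumes a hypothetical inventory that maps each machine name to the time it was last used, and a seven-day idle threshold chosen only for illustration.

```python
from datetime import datetime, timedelta


def find_unused(machines, now, max_idle=timedelta(days=7)):
    """Return the names of machines idle for longer than max_idle, sorted for stable output."""
    return sorted(
        name for name, last_used in machines.items()
        if now - last_used > max_idle
    )


# Hypothetical inventory: machine name -> last time a job ran on it.
now = datetime(2017, 9, 1)
inventory = {
    "spark-worker-1": datetime(2017, 8, 31),
    "spark-worker-2": datetime(2017, 8, 1),
    "notebook-old": datetime(2017, 6, 15),
}
candidates = find_unused(inventory, now)
```

Printing or logging the candidate list before terminating anything gives a person a chance to catch a mistake.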
Tip 4: Changes should be auditable
Schaper - the tool to compare schemas
https://www.linkedin.com/in/huqixiu/
Qixiu “Q” Hu
-- Original schema
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);

-- Schema after adding the discount_code column
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    discount_code text,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
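The kind of comparison a schema tool makes can be sketched in a few lines. This is not Schaper's implementation, which the slides do not show; it assumes each schema has already been parsed into a hypothetical mapping of column name to type, with the new column typed as CQL `text`.

```python
def diff_columns(old, new):
    """Report columns that were added, removed, or changed type between two schemas."""
    return {
        "added": {c: t for c, t in new.items() if c not in old},
        "removed": {c: t for c, t in old.items() if c not in new},
        "changed": {
            c: (old[c], new[c])
            for c in old
            if c in new and old[c] != new[c]
        },
    }


# Columns of myConferences before and after the change.
before = {
    "name": "text",
    "city": "text",
    "early_bird": "timeuuid",
    "late_bird": "timeuuid",
}
after = dict(before, discount_code="text")

schema_diff = diff_columns(before, after)
```

Storing each diff alongside the change that produced it gives an audit trail of who altered the schema and when.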
Tip 5: Configuration management
Network Connectivity
• 20 parameters
User Access
• 50 parameters
Deploy cluster (e.g. Mesos)
• 20 non-default parameters
Deploy a microservice
• 50 parameters
Schedule a job
• 3 parameters
SUM × 3 regions × 20 metrics
Approx. 6,000
Templates for Automation
Service discovery
Creating dashboards
• Prod, non-prod, …
Log queries
Cost analysis
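With thousands of parameters, templates keep each region and environment consistent. The sketch below uses Python's standard `string.Template`; the template text, the `scoring` service, and the region and environment names are all hypothetical.

```python
from itertools import product
from string import Template

# Hypothetical dashboard definition, written once and rendered per environment and region.
DASHBOARD = Template(
    "dashboard: $service-$environment-$region\n"
    "query: $service.$environment.$region.*.error_rate"
)


def render_all(service, environments, regions):
    """Render one dashboard definition per (environment, region) pair."""
    return {
        (env, region): DASHBOARD.substitute(
            service=service, environment=env, region=region
        )
        for env, region in product(environments, regions)
    }


dashboards = render_all(
    "scoring",
    environments=["prod", "non-prod"],
    regions=["us-west", "us-east", "eu-west"],
)
```

`substitute` raises `KeyError` when a placeholder is missing, so an incomplete configuration fails at render time rather than in production.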
Tip 6: Pick a naming convention
<service>.<environment>.<region>.<hostname>.<metric>
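A fixed field order means names can be built and parsed by code instead of by hand. The sketch below assumes that the first four fields contain no dots (so a short hostname rather than a fully qualified one) and that only the metric part may; the `MetricName` type and the example values are hypothetical.

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    service: str
    environment: str
    region: str
    hostname: str
    metric: str

    def __str__(self):
        return ".".join(self)


def parse(name):
    """Split a dotted name into its five fields; the metric part may itself contain dots."""
    parts = name.split(".", 4)
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"expected 5 non-empty fields, got: {name!r}")
    return MetricName(*parts)


parsed = parse("scoring.prod.us-west.host01.latency.p99")
rebuilt = str(parsed)
```

The same parsed fields can drive service discovery, dashboards, log queries, and cost analysis, which is why the convention is worth picking early.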
Tip 7: Permissions
Every user, service, & job should have specific, auditable permissions.
Cluster Manager
Scheduler
IAM
IAM Roles
• User has an IAM Role
• Job has an IAM Role
• IAM Roles determine read / write access to data
IAM
Out
Logs
IAM
In
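The role idea can be illustrated without any cloud provider. The sketch below is not the AWS IAM API; the `ROLES` table, the principal names, and the dataset names are hypothetical. It shows two properties from this tip: access is denied unless a role grants it, and every decision is recorded.

```python
# Hypothetical roles: which datasets each principal may read or write.
ROLES = {
    "analyst": {"read": {"predictions"}, "write": set()},
    "training-job": {"read": {"features"}, "write": {"models"}},
}

# Every decision is appended here so access can be audited later.
audit_log = []


def is_allowed(principal, action, dataset):
    """Deny by default; allow only what the principal's role explicitly grants."""
    granted = ROLES.get(principal, {}).get(action, set())
    allowed = dataset in granted
    audit_log.append((principal, action, dataset, allowed))
    return allowed


decision = is_allowed("training-job", "write", "models")
```

Unknown users, services, and jobs fall through to an empty grant set, so forgetting to define a role results in a denial rather than open access.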
Understanding Memory Management in Spark For Fun And Profit: Shivnath Babu (Duke University, Unravel Data Systems) and Mayuresh Kunjir (Duke University)
Tip 8: Understand resource allocation
[Diagram: an 8 GB container placed inside node memory]
[Diagram: three nodes, labeled "4 GB used" and "8 GB total"]
Can my 8 GB container launch on this cluster?
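The question can be answered with a small check. The sketch assumes the diagram's labels apply to each of three nodes (8 GB total, 4 GB used) and that a container must fit entirely on one node; the `Node` type is hypothetical and ignores any memory the cluster manager reserves for itself.

```python
from dataclasses import dataclass


@dataclass
class Node:
    total_gb: float
    used_gb: float

    @property
    def free_gb(self):
        return self.total_gb - self.used_gb


def can_launch(container_gb, nodes):
    """A container cannot span nodes, so at least one node needs enough free memory."""
    return any(node.free_gb >= container_gb for node in nodes)


# Assumed reading of the diagram: three nodes, each 8 GB total with 4 GB used.
cluster = [Node(total_gb=8, used_gb=4) for _ in range(3)]

total_free_gb = sum(node.free_gb for node in cluster)
fits_8gb = can_launch(8, cluster)
fits_4gb = can_launch(4, cluster)
```

Under these assumptions the cluster has 12 GB free in total, yet the 8 GB container cannot launch because no single node has 8 GB available.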
Tip 9: Monitor multiple viewpoints
https://light.co/camera
Tip 9: Monitor multiple viewpoints
Connectivity Viewer
https://www.linkedin.com/in/vaibhavt/
Vaibhav Tandon
Getting started tips:
1. Plan for failure
2. Blue / Green Deployments
3. Assume people make mistakes
4. Changes should be auditable
5. Configuration management
6. Pick a naming convention
7. Permissions
• user, service, job
8. Understand resource allocation
9. Monitor multiple viewpoints
10. Infrastructure as Code
Did we just automate ourselves out of our jobs?
Nope. Now we have time to take on new projects and grow…
More info:
• Jos Boumans, Salesforce DMP (slides)
• Site Reliability Engineering: How Google Runs Production Systems (book)
• James Ward, Engineering & Open Source Ambassador at Salesforce
• High Performance Spark (book)
More info:
Real Time ML Pipelines in Multi-Tenant Environments
Director of Engineering Karl Skucha & Lead Engineer Yan Yang
Introduction to Machine Learning
Engineering & Open Source Ambassador James Ward
Fantastic ML apps and how to build them
Principal Engineer, Matthew Tovbin
Fireworks - lighting up the sky with millions of Sparks
Director of Engineering Thomas Gerber
Functional Linear Algebra in Scala
Engineer & Professor Vlad Patryshev
Panel: Functional Programming for Machine Learning
Saturday @ 2:10pm: Complex Machine Learning Pipelines Made Easy
Machine Learning Engineers Till Bergmann & Chris Rupley
abida@salesforce.com
@anyabida1
Anya Bida, SRE at Salesforce
Questions?
Extra, unused slides
JustEnoughDevOpsForDataScientists

More Related Content

What's hot

Does Your Stuff Scale?
Does Your Stuff Scale?Does Your Stuff Scale?
Does Your Stuff Scale?
stevenh0lmes
 
Embrace Chaos - Introducing Chaos Engineering to your Organization
Embrace Chaos - Introducing Chaos Engineering to your OrganizationEmbrace Chaos - Introducing Chaos Engineering to your Organization
Embrace Chaos - Introducing Chaos Engineering to your Organization
Paul Osman
 
Spark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System AdministratorsSpark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System Administrators
Anya Bida
 
Where Node.JS Meets iOS
Where Node.JS Meets iOSWhere Node.JS Meets iOS
Where Node.JS Meets iOS
Sam Rijs
 
Navigating the Incubator at the Apache Software Foundation
Navigating the Incubator at the Apache Software FoundationNavigating the Incubator at the Apache Software Foundation
Navigating the Incubator at the Apache Software FoundationBrett Porter
 
How Shopify Scales Rails
How Shopify Scales RailsHow Shopify Scales Rails
How Shopify Scales Railsjduff
 
Building REST APIs using gRPC and Go
Building REST APIs using gRPC and GoBuilding REST APIs using gRPC and Go
Building REST APIs using gRPC and Go
Alvaro Viebrantz
 
Scrum Control or Kanban Agility? You Can Have both, Using Metrics
Scrum Control or Kanban Agility? You Can Have both, Using MetricsScrum Control or Kanban Agility? You Can Have both, Using Metrics
Scrum Control or Kanban Agility? You Can Have both, Using Metrics
Atlassian
 
Evoloution of Ideas
Evoloution of IdeasEvoloution of Ideas
Evoloution of Ideas
Wooga
 
Devoxx 2014 Monitoring
Devoxx 2014 Monitoring Devoxx 2014 Monitoring
Devoxx 2014 Monitoring
Claude Falguiere
 
Web Operations101
Web Operations101Web Operations101
Web Operations101
Nell Shamrell-Harrington
 
Agile long term planning כנס הארגון האג'ילי
Agile long term planning כנס הארגון האג'ילי Agile long term planning כנס הארגון האג'ילי
Agile long term planning כנס הארגון האג'ילי
Chai Forsher
 
Rust, Redis, and Protobuf - Oh My!
Rust, Redis, and Protobuf - Oh My!Rust, Redis, and Protobuf - Oh My!
Rust, Redis, and Protobuf - Oh My!
Nell Shamrell-Harrington
 

What's hot (13)

Does Your Stuff Scale?
Does Your Stuff Scale?Does Your Stuff Scale?
Does Your Stuff Scale?
 
Embrace Chaos - Introducing Chaos Engineering to your Organization
Embrace Chaos - Introducing Chaos Engineering to your OrganizationEmbrace Chaos - Introducing Chaos Engineering to your Organization
Embrace Chaos - Introducing Chaos Engineering to your Organization
 
Spark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System AdministratorsSpark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System Administrators
 
Where Node.JS Meets iOS
Where Node.JS Meets iOSWhere Node.JS Meets iOS
Where Node.JS Meets iOS
 
Navigating the Incubator at the Apache Software Foundation
Navigating the Incubator at the Apache Software FoundationNavigating the Incubator at the Apache Software Foundation
Navigating the Incubator at the Apache Software Foundation
 
How Shopify Scales Rails
How Shopify Scales RailsHow Shopify Scales Rails
How Shopify Scales Rails
 
Building REST APIs using gRPC and Go
Building REST APIs using gRPC and GoBuilding REST APIs using gRPC and Go
Building REST APIs using gRPC and Go
 
Scrum Control or Kanban Agility? You Can Have both, Using Metrics
Scrum Control or Kanban Agility? You Can Have both, Using MetricsScrum Control or Kanban Agility? You Can Have both, Using Metrics
Scrum Control or Kanban Agility? You Can Have both, Using Metrics
 
Evoloution of Ideas
Evoloution of IdeasEvoloution of Ideas
Evoloution of Ideas
 
Devoxx 2014 Monitoring
Devoxx 2014 Monitoring Devoxx 2014 Monitoring
Devoxx 2014 Monitoring
 
Web Operations101
Web Operations101Web Operations101
Web Operations101
 
Agile long term planning כנס הארגון האג'ילי
Agile long term planning כנס הארגון האג'ילי Agile long term planning כנס הארגון האג'ילי
Agile long term planning כנס הארגון האג'ילי
 
Rust, Redis, and Protobuf - Oh My!
Rust, Redis, and Protobuf - Oh My!Rust, Redis, and Protobuf - Oh My!
Rust, Redis, and Protobuf - Oh My!
 

Similar to JustEnoughDevOpsForDataScientists

SAP & Open Souce - Give & Take
SAP & Open Souce - Give & TakeSAP & Open Souce - Give & Take
SAP & Open Souce - Give & Take
Jan Penninkhof
 
DevDay 2013 - Building Startups and Minimum Viable Products
DevDay 2013 - Building Startups and Minimum Viable ProductsDevDay 2013 - Building Startups and Minimum Viable Products
DevDay 2013 - Building Startups and Minimum Viable Products
Ben Hall
 
Chris Mathias Presents Advanced API Design Considerations at LA CTO Forum
Chris Mathias Presents Advanced API Design Considerations at LA CTO ForumChris Mathias Presents Advanced API Design Considerations at LA CTO Forum
Chris Mathias Presents Advanced API Design Considerations at LA CTO Forum
Chris Mathias
 
Developer Night - Opticon18
Developer Night - Opticon18Developer Night - Opticon18
Developer Night - Opticon18
Optimizely
 
Value streammapping cascadiait2014-mceniry
Value streammapping cascadiait2014-mceniryValue streammapping cascadiait2014-mceniry
Value streammapping cascadiait2014-mceniry
Chris McEniry
 
Infrastructure is development
Infrastructure is developmentInfrastructure is development
Infrastructure is development
stahnma
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning Products
Andrew Musselman
 
Standardizing and Managing Your Infrastructure - MOSC 2011
Standardizing and Managing Your Infrastructure - MOSC 2011Standardizing and Managing Your Infrastructure - MOSC 2011
Standardizing and Managing Your Infrastructure - MOSC 2011
Brian Ritchie
 
Atmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOpsAtmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOps
PROIDEA
 
Stapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San FranciscoStapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San Francisco
Christian Heilmann
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
DataKitchen
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
Konstantin Gredeskoul
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
MongoDB
 
Extending SAP SuccessFactors in the Cloud and how not to do it
Extending SAP SuccessFactors in the Cloud and how not to do itExtending SAP SuccessFactors in the Cloud and how not to do it
Extending SAP SuccessFactors in the Cloud and how not to do it
Chris Paine
 
Surviving a Hackathon and Beyond
Surviving a Hackathon and BeyondSurviving a Hackathon and Beyond
Surviving a Hackathon and Beyond
imoneytech
 
Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet
Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet
Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet
Puppet
 
Achieving Continuous Delivery with Puppet
Achieving Continuous Delivery with PuppetAchieving Continuous Delivery with Puppet
Achieving Continuous Delivery with Puppet
Devoteam Revolve
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Spark Summit
 
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Keeping Your DevOps Transformation From Crushing Your Ops Capacity Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Rundeck
 
Innovate Better Through Machine data Analytics
Innovate Better Through Machine data AnalyticsInnovate Better Through Machine data Analytics
Innovate Better Through Machine data Analytics
Hal Rottenberg
 

Similar to JustEnoughDevOpsForDataScientists (20)

SAP & Open Souce - Give & Take
SAP & Open Souce - Give & TakeSAP & Open Souce - Give & Take
SAP & Open Souce - Give & Take
 
DevDay 2013 - Building Startups and Minimum Viable Products
DevDay 2013 - Building Startups and Minimum Viable ProductsDevDay 2013 - Building Startups and Minimum Viable Products
DevDay 2013 - Building Startups and Minimum Viable Products
 
Chris Mathias Presents Advanced API Design Considerations at LA CTO Forum
Chris Mathias Presents Advanced API Design Considerations at LA CTO ForumChris Mathias Presents Advanced API Design Considerations at LA CTO Forum
Chris Mathias Presents Advanced API Design Considerations at LA CTO Forum
 
Developer Night - Opticon18
Developer Night - Opticon18Developer Night - Opticon18
Developer Night - Opticon18
 
Value streammapping cascadiait2014-mceniry
Value streammapping cascadiait2014-mceniryValue streammapping cascadiait2014-mceniry
Value streammapping cascadiait2014-mceniry
 
Infrastructure is development
Infrastructure is developmentInfrastructure is development
Infrastructure is development
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning Products
 
Standardizing and Managing Your Infrastructure - MOSC 2011
Standardizing and Managing Your Infrastructure - MOSC 2011Standardizing and Managing Your Infrastructure - MOSC 2011
Standardizing and Managing Your Infrastructure - MOSC 2011
 
Atmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOpsAtmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOps
 
Stapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San FranciscoStapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San Francisco
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
Extending SAP SuccessFactors in the Cloud and how not to do it
Extending SAP SuccessFactors in the Cloud and how not to do itExtending SAP SuccessFactors in the Cloud and how not to do it
Extending SAP SuccessFactors in the Cloud and how not to do it
 
Surviving a Hackathon and Beyond
Surviving a Hackathon and BeyondSurviving a Hackathon and Beyond
Surviving a Hackathon and Beyond
 
Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet
Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet
Puppet Camp Paris 2014: Achieving Continuous Delivery and DevOps with Puppet
 
Achieving Continuous Delivery with Puppet
Achieving Continuous Delivery with PuppetAchieving Continuous Delivery with Puppet
Achieving Continuous Delivery with Puppet
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
 
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Keeping Your DevOps Transformation From Crushing Your Ops Capacity Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
 
Innovate Better Through Machine data Analytics
Innovate Better Through Machine data AnalyticsInnovate Better Through Machine data Analytics
Innovate Better Through Machine data Analytics
 

Recently uploaded

Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

JustEnoughDevOpsForDataScientists

  • 1.
  • 2. Just Enough DevOps for Data Scientists abida@salesforce.com @ anyabida1 Anya Bida, SRE at Salesforce
  • 3. About Anya Sr. Member of Technical Staff (SRE) Salesforce Production Engineering Salesforce Einstein Platform Co-organizer SF Big Analytics Spark Tuning • Cheat-sheet • Talks Previously at Alpine Data, SRI PhD Mayo Clinic, BS Johns Hopkins @anyabida1
  • 4. What I am going to talk about What is DevOps Salesforce Einstein Scales Our goal Top 10 tips What’s next?
  • 5. What is DevOps? Software Development Network & SecurityInfrastructure Build & Release
  • 6. What is DevOps? Software Development Network & SecurityInfrastructure Build & Release Data Science
  • 7. What is DevOps? Software Development Network & SecurityInfrastructure Build & Release Data Science • Awesome library on SparkML • Spark clusters • Microservices • Cluster, Containers
  • 8. Fastest Growing Top 5 Enterprise Software Company $5.4B FY15 $4.1B FY14 $3.1B FY13 $6.7B FY16 $2.3B FY12 $1.7B FY11 $2.56BFY18Q2 revenue $8.4BFY17 revenue 2009 • 2010 • 2011 2012 • 2013 • 2014 2015 • 2016 • 2017 September 2016 2011 • 2012 • 2013 2014 • 2015 • 2016 • 2017 The world’s most innovative companies “Innovator of the Decade”
  • 9.
  • 10. Our Goal Time Number of Predictions Infrastructure Costs
  • 11. Tip 1: Plan for Failure Take off that Data Scientist hat now.
  • 12. Simple Dashboard with KPIs Tip 1: Plan for Failure Take off that Data Scientist hat now.
  • 13. Tip 1: Plan for Failure Take off that Data Scientist hat now. https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead Simple Dashboard with KPIs • Request & error rates • Longest response times - upper 95th & 99th percentile • Capacity • Events Jos Boumans, Salesforce DMP slides
  • 14. Tip 1: Plan for Failure Take off that Data Scientist hat now. https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead Simple Dashboard with KPIs • Request & error rates • Longest response times - upper 95th & 99th percentile • Capacity • Events Collect metrics from every machine. Troubleshoot with all the metrics at your disposal
  • 15. Tip 2: Blue Green Deployments https://docs.mobingi.com/official/guide/bg-deploy Blue Machine (old) Green Machine (new) Users
  • 16. Tip 3: Assume people make mistakes Technical debt • Every manual change • Duplicate metrics Scale down resources • Terminate unused machines • Janitor Monkey • Understand the cost per job • Jobs should not accumulate files on disk
  • 17. Tip 4: Changes should be auditable Schaper - the tool to compare schemas https://www.linkedin.com/in/huqixiu/ Qixiu “Q” Hu
  • 18. Tip 4: Changes should be auditable Schaper - the tool to compare schemas https://www.linkedin.com/in/huqixiu/ Qixiu “Q” Hu CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC); CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC);
  • 19. Tip 4: Changes should be auditable Schaper - the tool to compare schemas https://www.linkedin.com/in/huqixiu/ Qixiu “Q” Hu CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC); CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, discount_code text, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC);
  • 20. Tip 5: Configuration management Network Connectivity • 20 parameters User Access • 50 parameters Deploy cluster (e.g. Mesos) • 20 non-default parameters Deploy a microservice • 50 parameters Schedule a job • 3 parameters SUM × 3 regions × 20 metrics Approx. 6000
  • 21. Tip 6: Pick a naming convention <service>.<environment>.<region>.<hostname>.<metric> Templates for Automation Service discovery Creating dashboards • Prod, non-prod, … Log queries Cost analysis
  • 22. Tip 7: Permissions Every user, service, & job should have specific, auditable permissions. IAM Roles • User has an IAM Role • Job has an IAM Role • IAM Roles determine read / write access to data (Diagram labels: Cluster Manager, Scheduler, Logs, IAM In, IAM Out)
  • 23. Tip 8: Understand resource allocation Node Memory Container Memory 8 GB Node Memory Container Memory 8 GB (Source: Understanding Memory Management in Spark For Fun And Profit, Shivnath Babu (Duke University, Unravel Data Systems), Mayuresh Kunjir (Duke University))
  • 25. Tip 9: Monitor multiple viewpoints https://light.co/camera
  • 26. Tip 9: Monitor multiple viewpoints Connectivity Viewer https://www.linkedin.com/in/vaibhavt/ Vaibhav Tandon
  • 29. Getting started tips: 1. Plan for failure 2. Blue / Green Deployments 3. Assume people make mistakes 4. Changes should be auditable 5. Configuration management 6. Pick a naming convention 7. Permissions • user, service, job 8. Understand resource allocation 9. Monitor multiple viewpoints
  • 30. Getting started tips: 1. Plan for failure 2. Blue / Green Deployments 3. Assume people make mistakes 4. Changes should be auditable 5. Configuration management 6. Pick a naming convention 7. Permissions • user, service, job 8. Understand resource allocation 9. Monitor multiple viewpoints 10. Infrastructure as Code
  • 31. Did we just automate ourselves out of our jobs? Nope. Now we have time to take on new projects and grow…
  • 32. More info: Jos Boumans, Salesforce DMP (slides); SRE: How Google Runs Production Systems (book); James Ward, Engineering & Open Source Ambassador at Salesforce; High Performance Spark (book)
  • 33. More info: Real Time ML Pipelines in Multi-Tenant Environments, Director of Engineering Karl Skucha & Lead Engineer Yan Yang; Introduction to Machine Learning, Engineering & Open Source Ambassador James Ward; Fantastic ML apps and how to build them, Principal Engineer Matthew Tovbin; Fireworks - lighting up the sky with millions of Sparks, Director of Engineering Thomas Gerber; Functional Linear Algebra in Scala, Engineer & Professor Vlad Patryshev; Panel: Functional Programming for Machine Learning; Saturday @ 2:10pm: Complex Machine Learning Pipelines Made Easy, Machine Learning Engineers Till Bergmann & Chris Rupley
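
Tips 1 and 6 fit together: once every metric follows the `<service>.<environment>.<region>.<hostname>.<metric>` convention, dashboards and queries can be templated. The sketch below is one possible illustration, not the tooling used in the talk. It builds a name in that shape and formats it as a StatsD datagram (`<name>:<value>|<type>`). The function names, the sample service and host values, and the agent address `127.0.0.1:8125` are all assumptions made for the example.

```python
import socket


def metric_name(service, environment, region, hostname, metric):
    """Build a dotted metric name in the Tip 6 order."""
    parts = [service, environment, region, hostname, metric]
    # Dots separate the levels, so dots inside a part (e.g. an FQDN
    # hostname) are replaced to keep the number of levels fixed.
    return ".".join(part.replace(".", "_") for part in parts)


def statsd_line(name, value, metric_type="c"):
    """Format one StatsD datagram: c = counter, ms = timer, g = gauge."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a StatsD agent (address is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical service and host names, for illustration only.
    name = metric_name("scoring", "prod", "us-west", "host01", "errors")
    send_metric(statsd_line(name, 1))
```

Because the position of each field is fixed, a dashboard template can swap only the environment or region segment to produce the prod and non-prod views listed on the slide.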

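The schema audit in Tip 4 can be illustrated with a small comparison routine. This is not Schaper, which the speaker notes say has not been open sourced. It is a minimal sketch that assumes each replica's schema has already been read into a plain column-to-type mapping. The `diff_schemas` name and the `east`/`west` variables are hypothetical, and the new `discount_code` column from the slide is typed as `text` here.

```python
def diff_schemas(reference, candidate):
    """Report columns added, removed, or retyped in candidate vs. reference."""
    added = {col: typ for col, typ in candidate.items() if col not in reference}
    removed = {col: typ for col, typ in reference.items() if col not in candidate}
    changed = {
        col: (reference[col], candidate[col])
        for col in reference
        if col in candidate and reference[col] != candidate[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Columns of the myConferences table from the slides, one mapping per replica.
east = {
    "name": "text",
    "city": "text",
    "early_bird": "timeuuid",
    "late_bird": "timeuuid",
}
# The second replica gained a column (typed as text for this example).
west = dict(east, discount_code="text")

report = diff_schemas(east, west)
if any(report.values()):
    print("Schema drift detected:", report)
```

A non-empty report is the point at which an alert or an automated action could be triggered.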
Editor's Notes

  1. What DevOps actually is: a cross section of infrastructure. Here are all the things data scientists need to support themselves at scale.
  2. What DevOps actually is: a cross section of infrastructure. Here are all the things data scientists need to support themselves at scale.
  3. What DevOps actually is: a cross section of infrastructure. Here are all the things data scientists need to support themselves at scale.
  4. We need to build infrastructure that scales at the pace of Salesforce.
  5. Salesforce Einstein is serving 475 million predictions per day, and growing. So how do we do this from an infra perspective?
  6. Even if you do everything right, machines WILL fail.
  7. Collect metrics by installing statsd on every machine.
  8. Should I automate the file removal? Better: keep your files in a distributed, versioned storage system. The infra team will monitor disk usage.
  9. Let's say I have a database with one replica on the east coast and one replica on the west coast.
  10. My database schema, here represented as a table, is as follows. Right now my schemas are identical across data centers.
  11. But if someone changes the schema for one of my replicas, I want to know immediately, so my schemas should be auditable. Q on our SRE team built the tool Schaper to compare schemas. Schaper is generic: it supports Elasticsearch, Cassandra, MongoDB, etc., and provides a report when there is a schema change. I NEED TO KNOW when my schema changes. Obviously this could be very important information. Wink, wink. Schaper is also modular: it’s plug-n-play. So this is an example of how we ensure changes are auditable. (Cassandra: keyspaces, database replication.) Schaper is one example of the type of tool that could be built to audit changes. From the audit, we can automate some action, depending on the particular change or … We haven’t open sourced this tool yet; it's just an example.
  12. When to automate? Any task that’s done 10x per year should be automated. IaC should be correct, comprehensible, and composable. How can the number of clicks be so big? 20 clicks per cluster × 3 regions × 20 metrics. IaC covers the networking layer, provisioning, build and deploy, monitoring, and management.
  13. IAM definition: Identity and Access Management. Authorization & authentication.
  14. OK, so I’ve got my container, which uses maybe 8 GB of RAM. Now I want to know if my container can launch on my cluster.
  15. So my cluster has 3 nodes, let’s say, with 8 GB total RAM on each node. CAN MY 8 GB CONTAINER LAUNCH ON THIS CLUSTER? Since 4 GB of RAM is used on each node, the cluster memory available is 4 × 3 = 12 GB, which looks like enough. But a container has to fit on a single node, and no node has more than 4 GB free. So if I only monitor cluster-level metrics, I won't see why my container fails to launch.
  16. The image above shows sample connectivity for development, staging, and production environments. It helps us verify there are no unintended rules, etc. Mention the three lone servers: should we review these? Are these supposed to be there? This tool is not open sourced; it's just an example of the internal tools we build, and you can too!
  17. Double clicking a node shows its connectivity. This is useful for debugging issues.
  18. We can filter by resource type, names, tags etc.
  19. Taken together, hopefully I’ve convinced you that each piece of your infra should be deployed and managed as code.
  20. This has been “Just enough devops for data scientists”
  21. This has been “Just enough devops for data scientists”
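
The memory example in notes 14 and 15 can be worked through in a few lines. The sketch below uses the numbers from the notes (3 nodes, 8 GB each, 4 GB already used per node, one 8 GB container). It shows why a cluster-level total is misleading: the container must fit on a single node. The function names and the dictionary layout are assumptions made for illustration and do not come from any particular cluster manager.

```python
def cluster_free_gb(nodes):
    """Cluster-level view: total free memory summed across all nodes."""
    return sum(node["total_gb"] - node["used_gb"] for node in nodes)


def can_launch(container_gb, nodes):
    """Node-level view: a container needs one node with enough free memory."""
    return any(node["total_gb"] - node["used_gb"] >= container_gb for node in nodes)


# Three nodes with 8 GB each, 4 GB already in use on every node.
nodes = [{"total_gb": 8, "used_gb": 4} for _ in range(3)]

print("Cluster free memory (GB):", cluster_free_gb(nodes))
print("8 GB container can launch:", can_launch(8, nodes))
print("4 GB container can launch:", can_launch(4, nodes))
```

The cluster-level sum suggests there is room for an 8 GB container, while the per-node check shows there is not. That gap is the reason to monitor both viewpoints.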