Let's say you're a data scientist, and you've been asked to build infrastructure. Here I've distilled some best practices as an introduction for people who are new to DevOps.
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ... (Anya Bida)
Abstract: Imagine we have Ada, our data science intern. Let's run through a very simple word-count Spark job and find a handful of potential failure points. Dozens of failures can and should be expected when running Spark jobs on commodity hardware. Given a basic foundation of infrastructure-level expectations, this talk gives Ada tools to ensure her job isn’t caught dead. Once the simple example job runs reliably, with the potential to scale, our data scientist can apply the same toolset to focus on some more interesting algorithms. Turn SNAFUs into successes by anticipating and handling infra failures gracefully.
Note: this talk is a Spark-focused extension of Part I, "Just Enough DevOps For Data Scientists", from Scale by The Bay 2018.
https://www.youtube.com/watch?v=RqpnBl5NgW0&t=19s
Bio: Anya Bida (https://www.linkedin.com/in/anyabida/)
Just Enough DevOps for Data Scientists
2. Just Enough DevOps for Data Scientists
abida@salesforce.com
@anyabida1
Anya Bida, SRE at Salesforce
3. About Anya
Sr. Member of Technical Staff (SRE)
Salesforce Production Engineering
Salesforce Einstein Platform
Co-organizer SF Big Analytics
Spark Tuning
• Cheat-sheet
• Talks
Previously at Alpine Data, SRI
PhD Mayo Clinic, BS Johns Hopkins
@anyabida1
4. What I am going to talk about
What is DevOps
Salesforce Einstein Scales
Our goal
Top 10 tips
What’s next?
11-14. Tip 1: Plan for Failure
Take off that Data Scientist hat now.
Simple Dashboard with KPIs
• Request & error rates
• Longest response times - upper 95th & 99th percentile
• Capacity
• Events
Collect metrics from every machine.
Troubleshoot with all the metrics at your disposal.
Source: Jos Boumans, Salesforce DMP, slides: https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead
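As a rough illustration of the KPIs on this slide, the sketch below computes an error rate and upper-percentile response times from a batch of request records. The record layout `(status_code, latency_ms)`, the function names, and the nearest-rank percentile method are assumptions made for this example; they do not come from the talk or from any particular monitoring tool.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct is an int, 1..100)."""
    rank = (pct * len(sorted_values) + 99) // 100  # integer ceiling of pct% of n
    return sorted_values[max(rank, 1) - 1]


def summarize_requests(requests):
    """requests: list of (status_code, latency_ms) tuples (hypothetical layout)."""
    latencies = sorted(latency for _, latency in requests)
    errors = sum(1 for status, _ in requests if status >= 500)
    return {
        "request_count": len(requests),
        "error_rate": errors / len(requests),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


# 100 requests with latencies 1..100 ms; the two slowest fail with HTTP 500.
sample = [(500 if i > 98 else 200, i) for i in range(1, 101)]
summary = summarize_requests(sample)
```

In practice these numbers would usually come from a metrics system rather than an in-memory list; the point is only that averages hide the slow tail that the 95th and 99th percentiles expose.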
15. Tip 2: Blue Green Deployments
https://docs.mobingi.com/official/guide/bg-deploy
Blue Machine (old)
Green Machine (new)
Users
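One way to picture the blue/green switch is a router that sends all users to a single "live" environment and only flips once the new one looks healthy. The class below is a toy model under that assumption; `BlueGreenRouter`, `route`, `switch`, and the health-check callback are hypothetical names, not part of any deployment tool mentioned in the talk.

```python
class BlueGreenRouter:
    """Toy router: all user traffic goes to whichever environment is live."""

    def __init__(self):
        self.live = "blue"   # old version, currently serving users
        self.idle = "green"  # new version is deployed here first

    def route(self, request):
        return f"{self.live} handled {request}"

    def switch(self, health_check):
        """Promote the idle environment only if it passes its health check."""
        if not health_check(self.idle):
            return False  # users stay on the current environment
        self.live, self.idle = self.idle, self.live
        return True


router = BlueGreenRouter()
failed = router.switch(lambda env: False)   # unhealthy green: nothing changes
before = router.route("GET /")
promoted = router.switch(lambda env: True)  # healthy green: flip traffic
after = router.route("GET /")
```

Because the old environment stays up as the idle one, rolling back is just another call to `switch`.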
16. Tip 3: Assume people make mistakes
Technical debt
• Every manual change
• Duplicate metrics
Scale down resources
• Terminate unused machines
• Janitor Monkey
• Understand the cost per job
• Jobs should not accumulate files on disk
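The last bullet can be enforced with a small cleanup step that runs after (or alongside) each job. This is a minimal sketch, assuming that "stale" means "not modified within a given age"; the function name `remove_stale_files` and the one-day threshold are illustrative choices, not something the slides prescribe.

```python
import os
import tempfile
import time


def remove_stale_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in list(os.scandir(directory)):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


# Demo in a throwaway directory: one old file, one fresh file.
with tempfile.TemporaryDirectory() as scratch:
    old_path = os.path.join(scratch, "old.tmp")
    new_path = os.path.join(scratch, "new.tmp")
    for path in (old_path, new_path):
        open(path, "w").close()
    two_days_ago = time.time() - 2 * 24 * 3600
    os.utime(old_path, (two_days_ago, two_days_ago))  # backdate the old file
    removed = remove_stale_files(scratch, max_age_seconds=24 * 3600)
    remaining = sorted(os.listdir(scratch))
```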
17-19. Tip 4: Changes should be auditable
Schaper - the tool to compare schemas
Qixiu “Q” Hu, https://www.linkedin.com/in/huqixiu/
Schema 1:
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
Schema 2 (adds discount_code):
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    discount_code text,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
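To make "compare schemas" concrete, here is a small sketch that diffs the column lists of two CREATE TABLE statements. It is not Schaper and makes no claim about how Schaper works; `parse_columns` and `diff_schemas` are hypothetical helpers that only understand simple one-column-per-line definitions, and the `discount_code` column is written with CQL's `text` type.

```python
import re


def parse_columns(create_statement):
    """Return {column_name: type} from a simple CQL CREATE TABLE statement."""
    body = create_statement[create_statement.index("(") + 1:]
    columns = {}
    for line in body.splitlines():
        # Matches lines like "name text," but not PRIMARY KEY or WITH clauses.
        match = re.match(r"\s*(\w+)\s+(\w+)\s*,?\s*$", line)
        if match:
            columns[match.group(1)] = match.group(2)
    return columns


def diff_schemas(old, new):
    """Report columns that were added, removed, or changed type."""
    old_cols, new_cols = parse_columns(old), parse_columns(new)
    shared = old_cols.keys() & new_cols.keys()
    return {
        "added": sorted(set(new_cols) - set(old_cols)),
        "removed": sorted(set(old_cols) - set(new_cols)),
        "changed": sorted(c for c in shared if old_cols[c] != new_cols[c]),
    }


SCHEMA_V1 = """CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);"""

SCHEMA_V2 = SCHEMA_V1.replace(
    "late_bird timeuuid,", "late_bird timeuuid,\n    discount_code text,"
)

changes = diff_schemas(SCHEMA_V1, SCHEMA_V2)
```

Storing the output of such a diff next to each deployment is one simple way to keep schema changes auditable.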
20. Tip 5: Configuration management
Network Connectivity
• 20 parameters
User Access
• 50 parameters
Deploy cluster (e.g. Mesos)
• 20 non-default parameters
Deploy a microservice
• 50 parameters
Schedule a job
• 3 parameters
SUM × 3 regions × 20 metrics: approx. 6000
21. Templates for Automation
Service discovery
Creating dashboards
• Prod, non-prod, …
Log queries
Cost analysis
Tip 6: Pick a naming convention
<service>.<environment>.<region>.<hostname>.<metric>
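A naming convention pays off once code can build and parse names mechanically, which is what makes the templated dashboards and log queries above possible. The sketch below encodes the five-part pattern from this slide; the `MetricName` class and the example values are hypothetical, and it assumes that only the final metric segment may itself contain dots (so hostnames are short names, not fully qualified ones).

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    service: str
    environment: str
    region: str
    hostname: str
    metric: str

    def render(self):
        """Join the fields as <service>.<environment>.<region>.<hostname>.<metric>."""
        return ".".join(self)

    @classmethod
    def parse(cls, dotted):
        # maxsplit=4 keeps any dots inside the trailing metric segment.
        parts = dotted.split(".", 4)
        if len(parts) != 5:
            raise ValueError(f"expected 5 dot-separated fields, got {len(parts)}")
        return cls(*parts)


name = MetricName("wordcount", "prod", "us-west-2", "worker-01", "cpu.user")
rendered = name.render()
parsed = MetricName.parse(rendered)
```

With names in this shape, a dashboard template can select, for example, every metric whose environment field is `prod` without knowing the individual hosts.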
22. Tip 7: Permissions
Every user, service, & job should have specific, auditable permissions.
IAM Roles
• User has an IAM Role
• Job has an IAM Role
• IAM Roles determine read / write access to data
Diagram labels: Cluster Manager, Scheduler, IAM, In, Out, Logs
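The idea that every user and job carries a role, and that the role alone decides read or write access, can be modeled in a few lines. This is a toy in-memory model, not a real IAM API: the role names, dataset names, and the `is_allowed` helper are all made up for illustration. It also records each decision, since the slide asks for permissions to be auditable.

```python
# role name -> {dataset: set of allowed actions}; every name here is hypothetical
ROLE_POLICIES = {
    "analyst-user": {"raw-events": {"read"}},
    "wordcount-job": {
        "raw-events": {"read"},
        "wordcount-output": {"read", "write"},
    },
}

audit_log = []


def is_allowed(role, action, dataset):
    """Check a role's access and record the decision so it can be audited."""
    allowed = action in ROLE_POLICIES.get(role, {}).get(dataset, set())
    audit_log.append((role, action, dataset, allowed))
    return allowed


job_can_write = is_allowed("wordcount-job", "write", "wordcount-output")
user_can_write = is_allowed("analyst-user", "write", "raw-events")
```

Unknown roles and unknown datasets fall through to "denied", so access has to be granted explicitly.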
23. Tip 8: Understand resource allocation
Source: "Understanding Memory Management in Spark For Fun And Profit", Shivnath Babu (Duke University, Unravel Data Systems) and Mayuresh Kunjir (Duke University)
Figure labels: Node Memory; Container Memory, 8 GB
29. Getting started tips:
1. Plan for failure
2. Blue / Green Deployments
3. Assume people make mistakes
4. Changes should be auditable
5. Configuration management
6. Pick a naming convention
7. Permissions
• user, service, job
8. Understand resource allocation
9. Monitor multiple viewpoints
30. Getting started tips (continued):
10. Infrastructure as Code
31. Did we just automate ourselves out of our jobs?
Nope. Now we have time to take on new projects and grow…
32. More info:
• Jos Boumans, Salesforce DMP slides
• "Site Reliability Engineering: How Google Runs Production Systems" book
• James Ward, Engineering & Open Source Ambassador at Salesforce
• "High Performance Spark" book
33. More info:
Real Time ML Pipelines in Multi-Tenant Environments
Director of Engineering Karl Skucha & Lead Engineer Yan Yang
Introduction to Machine Learning
Engineering & Open Source Ambassador James Ward
Fantastic ML apps and how to build them
Principal Engineer, Matthew Tovbin
Fireworks - lighting up the sky with millions of Sparks
Director of Engineering Thomas Gerber
Functional Linear Algebra in Scala
Engineer & Professor Vlad Patryshev
Panel: Functional Programming for Machine Learning
Saturday @ 2:10pm - Complex Machine Learning Pipelines Made Easy
Machine Learning Engineers Till Bergmann & Chris Rupley
What DevOps actually is:
-- a cross section of infrastructure
-- all the things data scientists need to support themselves at scale
We need to build an infra that scales at the pace of Salesforce.
Salesforce Einstein is serving 475 million predictions per day, and growing. So how do we do this from an infra perspective?
Even if you do everything right, machines WILL fail.
Collect metrics by installing statsd on every machine.
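For a sense of what a job sends to statsd, here is a small sketch using only the Python standard library. It relies on the plain-text statsd line format (name:value|type, sent over UDP) and the conventional port 8125; the metric name and the host are assumptions, and in practice a maintained statsd client library is likely the better choice.

```python
import socket

# Metric types from the basic statsd line format: counter, gauge, timer.
STATSD_TYPES = ("c", "g", "ms")


def format_statsd(name: str, value, metric_type: str) -> bytes:
    """Build one statsd datagram of the form b'name:value|type'."""
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type="g", host="127.0.0.1", port=8125):
    """Fire-and-forget a single metric to a statsd daemon over UDP."""
    payload = format_statsd(name, value, metric_type)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical metric name; assumes a statsd daemon listening locally.
    send_metric("wordcount.prod.us-east.node01.disk.used_pct", 71, "g")
```

Because the transport is UDP, a missing or overloaded daemon does not slow down the job that emits the metric.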
Should I automate the file removal?
Better: keep your files in a distributed, versioned storage system
Infra team will monitor disk usage
Let's say I have a database with one replica on the east coast, and one replica on the west coast.
My database schema, here represented as a table, is as follows.
Right now my schemas are identical across data centers.
But if someone changes the schema for one of my replicas, I want to know immediately.
So my schemas should be auditable.
Q on our SRE team built the tool Schaper to compare schemas. Schaper is generic - it supports Elasticsearch, Cassandra, MongoDB, etc., and provides a report when there is a schema change. I NEED TO KNOW when my schema changes. Obviously this could be very important information. Wink, wink.
Schaper is also modular - it's plug-n-play. So this is an example of how we ensure changes are auditable.
Cassandra: Keyspaces
Database replication
Schaper is one example of the type of tools that could be built to audit changes. From the audit, we can automate some action, depending on the particular change or …
We haven't open sourced this tool yet; it's just an example.
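Since Schaper itself is not public, here is a rough sketch of the underlying idea only, not of Schaper. It assumes each replica's schema has already been read into a plain dict of column name to type; the east and west dicts mirror the myConferences table from the slides.

```python
def diff_schemas(reference: dict, candidate: dict) -> dict:
    """Report columns added, removed, or retyped in candidate relative to reference."""
    added = {col: typ for col, typ in candidate.items() if col not in reference}
    removed = {col: typ for col, typ in reference.items() if col not in candidate}
    changed = {
        col: (reference[col], candidate[col])
        for col in reference
        if col in candidate and reference[col] != candidate[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    east = {
        "name": "text",
        "city": "text",
        "early_bird": "timeuuid",
        "late_bird": "timeuuid",
    }
    # The west replica has gained a column that the east replica lacks.
    west = dict(east, discount_code="text")

    report = diff_schemas(east, west)
    if any(report.values()):
        print("schema drift detected:", report)
```

A scheduled run of a check like this, with the report sent to the team, is one way to find out about a schema change immediately rather than after a job fails.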
When to automate? Any task that’s done 10x per year should be automated.
IaC (Infrastructure as Code) should be correct, comprehensible, and composable.
How the number of clicks can be so big: 20 clicks per cluster x 3 regions x 20 metrics
IaC
-- networking layer
-- provisioning
-- build and deploy
-- monitoring
-- manage
IAM definition: Identity and access management
Authorization & Authentication
Ok, so I've got my container, which uses maybe 8 GB of RAM. Now I want to know if my container can launch on my cluster.
So my cluster has 3 nodes, let's say, and 8 GB total RAM on each node. CAN MY 8 GB CONTAINER LAUNCH ON THIS CLUSTER?
Since 4 GB of RAM is already used on each node, the cluster-level memory available is 4 x 3 = 12 GB, which looks like plenty for an 8 GB container. But a container has to fit on a single node, and no node has more than 4 GB free, so my container will fail to launch. If I only monitor cluster-level metrics, I won't see why.
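The difference between the two viewpoints fits in a few lines. This sketch uses the numbers from the example (3 nodes, 4 GB free on each, an 8 GB container) and assumes a container must fit entirely on one node; the function names are my own.

```python
def cluster_free_gb(free_per_node):
    """Cluster-level view: total free memory across all nodes."""
    return sum(free_per_node)


def can_launch(container_gb, free_per_node):
    """Node-level view: the container must fit on at least one single node."""
    return any(free >= container_gb for free in free_per_node)


if __name__ == "__main__":
    free = [4, 4, 4]   # GB free on each of the 3 nodes (8 GB total, 4 GB in use)
    container = 8      # GB requested by the container

    print(cluster_free_gb(free) >= container)  # True: the cluster view looks fine
    print(can_launch(container, free))         # False: no single node has room
```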
The image above shows sample connectivity for development, staging and production environments. It helps us verify there are no unintended rules, etc.
Mention the three lone servers - should we review these? Are these supposed to be there?
This tool is not open sourced, but just an example of the internal tools we build - and you can too!
Double clicking a node shows its connectivity. This is useful for debugging issues.
We can filter by resource type, names, tags etc.
Taken together, hopefully I’ve convinced you that each piece of your infra should be deployed and managed as code.
This has been “Just enough devops for data scientists”