Scale Out your team
Building your technology team for scale
SVP Platform Engineering & Operations
Shai Peretz
Provide people with the
most interesting, relevant
and trusted content
Audience First.
Our Lighthouse
Widget Examples
Traffic: >25 Billion PVs per month
>8 Billion recs per day
Reach: >550M users globally
Data: multiple petabytes (dist.)
Servers: >4000 physical nodes
Monitoring: >4m metrics per minute
Team: ~130 Engineers (Dev + Ops)
High growth rate
Scaling Vectors
- We design and build our own data centers (optimization, cost)
- Colocation (fewer clouds on the horizon :)
- Active/Active approach
- Rely on external services when needed (DNS, CDN)
Operational Decisions
- No SPOF (n+x)
- Vendor diversity
- Flexible architecture
- Commodity hardware
- Scale out – no central devices
- Open source
Design guidelines
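The "No SPOF (n+x)" guideline can be illustrated with a small sizing calculation. This is a minimal sketch, assuming that n is the number of nodes needed to carry peak load and x is the number of spare nodes kept on top of that. The function name, parameters, and figures are hypothetical and do not come from the slides.

```python
import math


def nodes_required(peak_load: float, per_node_capacity: float, spare_nodes: int) -> int:
    """Return n + x: nodes needed for peak load plus spare nodes.

    All inputs are illustrative assumptions, not figures from the talk.
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    if spare_nodes < 0:
        raise ValueError("spare_nodes must not be negative")
    # n: smallest node count whose combined capacity covers the peak
    n = math.ceil(peak_load / per_node_capacity)
    # x: extra nodes so the pool survives x simultaneous failures
    return n + spare_nodes


if __name__ == "__main__":
    # 10,000 requests/s at peak, 1,200 requests/s per node, 2 spares
    print(nodes_required(10_000, 1_200, 2))
```

With these example numbers, nine nodes cover the peak and two more absorb failures, so no single node is critical.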
- Automate using Chef
- Configuration as code
(Source control)
- Log changes automatically
Configuration Management
Architecture – tolerance
Service owner responsibility
War rooms Ops + Dev
Production Party
Open communication with
business
Disaster Recovery
Ops:
- Facilities
- Network and Infra
- Visibility
- Data systems
- Production Engineering
Platform Group Structure
Engineering:
- Data delivery & processing
- App Infrastructure
- Build/Dev tools
- Ops tools
Skilled Ops engineers
Sit with product development teams
Product/Business KPIs
What? – from Product; How? – from Ops
PE team lead – sync, training, implementations
Two way communication
Production Engineering
- Very short release cycles (>100 per day)
- Micro services
- Easy to find issue (fix or rollback)
- Automated deployment process
- Testing & monitoring
- Work procedures and culture
Continuous Deployment
Continuous Deployment
- Ownership & Trust
- Product Developers own their Services
- Platform owns Infrastructure, Hardware & Network services
- Ownership
- Trust
- Good communication
- Learning
Values
- Face to Face
- Sync and Share
- Hipchat – always on
- Open channels
Communication
- Prevention (anomaly detection, trends)
- MTTD, MTTR, MTTS
Technology will eventually fail,
we promise to fix it ASAP!
Stability Goals
- Graph everything
- Self serve
- Combination of internal and external tools:
  Collectd/Graphite/Nagios
  Logstash/Elasticsearch/Kibana
  New Relic/Boundary
  Keynote/Pingdom/Catchpoint
- Dashboards – Graphitus/Grafana
- Escalation of critical alerts via PagerDuty
Visibility
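"Graph everything" with "self serve" implies that any service can push its own metrics. As a minimal sketch, assuming a Graphite Carbon listener that accepts the plaintext protocol (one `path value timestamp` line per metric, port 2003 by default), an application could emit a metric as follows. The host name and metric path are hypothetical placeholders, not details from the talk.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines, host="graphite.example.internal", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener.

    The host is a made-up placeholder; 2003 is Carbon's default plaintext port.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Print instead of sending so the sketch runs without a Graphite server
    print(graphite_line("myservice.requests.count", 42), end="")
```

Keeping formatting separate from sending makes the metric line easy to test without a running server.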
Prevention
Immune system
- Unit tests (10k every 10m)
- Integration and Regression
- Self tests
- Monitoring system
- Alerts
Keys to success:
- Self serve
- Eliminate false alarms (signal-to-noise ratio)
Automatic full coverage
Immune System
To NOC or not to NOC?
Mean Time To Detect
Ops on shift
Engineer on call
Escalation policy on PagerDuty (PD)
Manage on HipChat/Phone
Mean Time To Recover
Escalate only critical issues
Measure time to resolve
Blameless learning from events (Take-Ins)
Respect your team’s sleep!
Mean Time To Sleep
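Mean time to detect and mean time to recover can be computed from incident timestamps. The sketch below is one possible way to do it, assuming each incident records when it started, when it was detected, and when it was resolved, and assuming recovery time is measured from detection to resolution (some teams measure from the start of the incident instead). The `Incident` record and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record with the three timestamps we need."""
    started: datetime
    detected: datetime
    resolved: datetime


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def mttd_minutes(incidents):
    """Mean Time To Detect: incident start until detection."""
    return mean_minutes([i.detected - i.started for i in incidents])


def mttr_minutes(incidents):
    """Mean Time To Recover: detection until resolution (an assumption)."""
    return mean_minutes([i.resolved - i.detected for i in incidents])


if __name__ == "__main__":
    t0 = datetime(2015, 1, 1, 12, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=24)),
        Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=46)),
    ]
    print(mttd_minutes(incidents), mttr_minutes(incidents))
```

Tracking both numbers per event gives the follow-up review a concrete baseline to improve on.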
After each event
Blameless
Action Items
Publish
Follow up
Take ins
Order 2-3 times a year
Load testing + Prediction
Elasticity for engineering
Automatic provisioning
Capacity planning
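The "Prediction" part of capacity planning can be sketched as a trend projection. This is a minimal illustration, assuming monthly peak load grows roughly linearly and that hardware must be ordered before the fitted trend reaches installed capacity. The talk does not describe the prediction method actually used, so the least-squares fit, the function names, and the sample numbers are all assumptions.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(monthly_peaks, capacity):
    """Months from the latest sample until the fitted trend reaches capacity.

    Returns None when the trend is flat or shrinking. Needs at least two samples.
    """
    xs = list(range(len(monthly_peaks)))
    slope, intercept = linear_fit(xs, monthly_peaks)
    if slope <= 0:
        return None
    month_full = (capacity - intercept) / slope
    return max(0.0, month_full - xs[-1])


if __name__ == "__main__":
    # Hypothetical peak load per month, in arbitrary units
    print(months_until_full([100, 110, 120, 130], capacity=160))
```

If the projected runway is shorter than the lead time of the next hardware order, the order needs to move forward.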
Weekly tech talks
IL TechTalks (+ techtalk week)
Reversim podcast and summit
Internal/External Lectures
Sunday School
Learning
Do you guys ever work??
Two weeks dedicated to the
needs of the technical teams
Quality Time
Thank You
shai@outbrain.com
Shai Peretz, SVP Platform Engineering & Operations
And Yes, we are hiring…

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 

Recently uploaded (20)

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 

ScaleOut your team - Building a technology team for scale in a DevOps culture

  • 2. Scale Out your team: Building your technology team for scale. Shai Peretz, SVP Platform Engineering & Operations
  • 3. Our Lighthouse: Provide people with the most interesting, relevant and trusted content. Audience First.
  • 8. Scaling Vectors: Traffic: >25 Billion PVs per month, >8 Billion recs per day; Reach: >550M users globally; Data: multiple petabytes (dist.); Servers: >4000 physical nodes; Monitoring: >4M metrics per minute; Team: ~130 Engineers (Dev + Ops); High growth rate
  • 9. Operational Decisions: We design and build our own data centers (optimization, cost); Colocation (fewer clouds on the horizon :)); Active/Active approach; Rely on external services when needed (DNS, CDN)
  • 10. Design guidelines: No SPOF (n+x); Vendor diversity; Flexible architecture; Commodity hardware; Scale out – no central devices; Open source
  • 11. Configuration Management: Automate using Chef; Configuration as code (source control); Log changes automatically
  • 12. Disaster Recovery: Architecture – tolerance; Service owner responsibility; War rooms Ops + Dev; Production Party; Open communication with business
  • 13. Platform Group Structure: Ops: Facilities, Network and Infra, Visibility, Data systems, Production Engineering; Engineering: Data delivery & processing, App Infrastructure, Build/Dev tools, Ops tools
  • 14. Production Engineering: Skilled Ops engineers; Sit with product development teams; Product/Business KPIs; What? – from Product, How? – from Ops; PE team lead – sync, training, implementations; Two-way communication
  • 15. Continuous Deployment: Very short release cycles (>100 per day); Microservices; Easy to find issue (fix or rollback); Automated deployment process; Testing & monitoring; Work procedures and culture
  • 16. Continuous Deployment: Ownership & Trust; Product Developers own their Services; Platform owns Infrastructure, Hardware & Network services
  • 17. Values: Ownership; Trust; Good communication; Learning
  • 18. Communication: Face to Face; Sync and Share; HipChat – always on; Open channels
  • 19. Stability Goals: Prevention (anomaly detection, trends); MTTD, MTTR, MTTS; Technology will eventually fail, we promise to fix it ASAP!
  • 20. Visibility: Graph everything; Self serve; Combination of internal and external tools: Collectd/Graphite/Nagios, Logstash/Elasticsearch/Kibana, New Relic/Boundary, Keynote/Pingdom/Catchpoint; Dashboards – Graphitus/Grafana; Escalation of critical alerts via PagerDuty
  • 21. Prevention: Immune system: Unit tests (10k every 10m); Integration and Regression; Self tests; Monitoring system; Alerts
  • 22. Immune System: Keys to success: Self serve; Eliminate false alarms (signal-to-noise ratio); Automatic full coverage
  • 23. Mean Time To Detect: To NOC or not to NOC?
  • 24. Mean Time To Recover: Ops on shift; Engineer on call; Escalation policy on PD; Manage on HipChat/Phone
  • 25. Mean Time To Sleep: Escalate only critical issues; Measure time to resolve; Blameless learning from events (Take-Ins); Respect your team’s sleep!
  • 26. Take-Ins: After each event; Blameless; Action Items; Publish; Follow up
  • 27. Capacity planning: Order 2-3 times a year; Load testing + Prediction; Elasticity for engineering; Automatic provisioning
  • 28. Learning: Weekly tech talks; IL TechTalks (+ techtalk week); Reversim podcast and summit; Internal/External Lectures; Sunday School. Do you guys ever work??
  • 29. Quality Time: Two weeks dedicated to the needs of the technical teams
  • 30. Thank You. Shai Peretz, SVP Platform Engineering & Operations, shai@outbrain.com. And yes, we are hiring…