This document discusses the high performance infrastructure behind Server Density: 150 servers, running since June 2009 and migrated from MySQL to MongoDB, storing 25TB of data per month. Key performance topics are fast networks such as 10 Gigabit Ethernet on AWS, enough memory, SSDs rather than spinning disks, and replication lag as a function of location. The document also compares cloud, dedicated servers and colocation, and covers monitoring, backups, dealing with outages and other operational aspects.
NoSQL databases are often touted for their performance and whilst it's true that they usually offer great performance out of the box, it still really depends on how you deploy your infrastructure. Dedicated vs cloud? In memory vs on disk? Spindle vs SSD? Replication lag. Multi data centre deployment.
This talk considers all the infrastructure requirements of a successful high performance infrastructure with hints and tips that can be applied to any NoSQL technology. It includes things like OS tweaks, disk benchmarks, replication, monitoring and backups.
Presented at NoSQL Roadshow Berlin 2013 by David Mytton.
10. Performance
• Fast network
EC2 10 Gigabit Ethernet
- Cluster Compute
- High Memory Cluster
- Cluster GPU
- High I/O
- High Storage
- Network cards
- VLAN separation
11. Performance
• Fast network
Workload: Read/Write?
What is being stored?
Result set size
- Read / write: adds to replication oplog
- Images? Web pages? Tiny documents?
- What is being returned? Optimised to return certain fields?
12. Performance
• Fast network
Use                          Network Throughput
Normal                       0-100Mb/s
Replication (Initial Sync)   Burst +100Mb/s
Replication (Oplog)          0-100Mb/s
Backup                       Initial Sync + Oplog
15. Performance
• Fast network
Location          Ping RTT Latency
Within USA        40-80ms
Trans-Atlantic    100ms
Trans-Pacific     150ms
Europe - Japan    300ms
Ping - low overhead
Important for replication
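To make the replication point concrete, here is a rough sketch that turns the RTT figures above into an upper bound on sequential acknowledged writes. It assumes a simplified model in which each write waits exactly one round trip for a remote acknowledgement; the function name is hypothetical, and real figures also depend on disk and apply time on the replica.

```python
# Ping RTTs from the slide, in milliseconds ("Within USA" uses the top of its 40-80ms range).
RTT_MS = {
    "Within USA": 80,
    "Trans-Atlantic": 100,
    "Trans-Pacific": 150,
    "Europe - Japan": 300,
}


def max_acked_writes_per_second(rtt_ms):
    """Upper bound on sequential writes per second from one client thread
    when every write waits one network round trip (simplified model)."""
    return 1000.0 / rtt_ms


for location, rtt in RTT_MS.items():
    bound = max_acked_writes_per_second(rtt)
    print(f"{location}: {rtt}ms RTT -> at most {bound:.1f} acked writes/s per thread")
```

The bound only applies to writes that wait for the remote member; writes acknowledged locally are not limited this way.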
26. Eventual Consistency
Use Case        Needs consistency?
Graphs          No
User profile    Yes
Statistics      Depends
Alert config    Yes
Statistics - depends on when they’re updated
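One way to act on this table is to pick a read preference per use case. The sketch below is an illustration only: the mapping and the `recently_updated` flag are assumptions, and the function just returns the MongoDB read preference mode names "primary" and "secondaryPreferred" as strings rather than calling a driver.

```python
# Use cases from the slide. True = needs consistent reads, False = does not,
# None = "depends" (here decided by a hypothetical recently_updated flag).
NEEDS_CONSISTENCY = {
    "graphs": False,
    "user profile": True,
    "statistics": None,
    "alert config": True,
}


def read_preference(use_case, recently_updated=False):
    """Return a read preference mode name for the given use case."""
    needs = NEEDS_CONSISTENCY[use_case]
    if needs is None:
        # "Depends on when they're updated": treat fresh data as needing the primary.
        needs = recently_updated
    return "primary" if needs else "secondaryPreferred"


for case in NEEDS_CONSISTENCY:
    print(case, "->", read_preference(case))
```

Reads that tolerate eventual consistency can then be served by secondaries, while reads that need the latest write stay on the primary.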
40. Performance
SSD vs Spinning
However, CPU usage for SSDs is higher. This may be a driver issue, so it is worth testing your own hardware. Tests done using Bonnie.
66. Tips: rand()
•Field names
•Covered indexes
•Collections / databases
- Dropping collections faster than remove()
- Split use cases across databases to avoid locking
- Put databases onto different disks / types e.g. SSDs
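The field names tip comes from the fact that BSON stores every field name inside every document as a null-terminated string. A minimal sketch of the saving follows; the example field names and the document count are made up for illustration, and the estimate ignores any storage-engine compression.

```python
def field_name_bytes(field_names):
    """Bytes taken by field names in one BSON document: each name is stored
    as UTF-8 followed by a single null terminator."""
    return sum(len(name.encode("utf-8")) + 1 for name in field_names)


# Hypothetical field names for a metrics document.
long_names = ["timestamp", "hostname", "cpu_usage_percent"]
short_names = ["t", "h", "c"]

docs = 1_000_000  # assumed number of documents
saved = (field_name_bytes(long_names) - field_name_bytes(short_names)) * docs
print(f"Roughly {saved / 1024 / 1024:.1f} MiB saved across {docs:,} documents")
```

Shorter names cost readability, so the trade-off is usually only worth it for collections with very many small documents.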
72. david@asriel ~: scp david@stelmaria:~/local/local.11 .
local.11    100% 2047MB    6.8MB/s    05:01
Restore time
- Needed to resync a database server across the US
- Took too long; oplog not large enough
- Fast internal network but slow internet
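The arithmetic behind this slide can be checked quickly. The sketch below reproduces the 05:01 copy time for the 2047MB file at 6.8MB/s, then applies the same rate to a hypothetical data set and oplog window (the 500GB and 12 hour figures are assumptions, not numbers from the talk) to show how a resync can outlast the oplog.

```python
def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to copy size_mb megabytes at a sustained rate of rate_mb_per_s."""
    return size_mb / rate_mb_per_s


def as_mmss(seconds):
    """Format a duration in seconds as MM:SS, truncating fractions."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def resync_fits_oplog(data_mb, rate_mb_per_s, oplog_window_hours):
    """True if the initial copy finishes before the oplog window is used up."""
    return transfer_seconds(data_mb, rate_mb_per_s) < oplog_window_hours * 3600


# The single file from the slide: 2047MB at 6.8MB/s.
print(as_mmss(transfer_seconds(2047, 6.8)))  # prints 05:01, matching the scp output

# Hypothetical: 500GB of data files and an oplog covering 12 hours of writes.
print(resync_fits_oplog(500 * 1024, 6.8, 12))  # prints False
```

If the check comes back False, the options are a larger oplog, a faster link, or seeding the new member from a backup taken closer to it.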
79. Monitoring tools
Run yourself
Ganglia
So Server Density is the tool my company produces, but if you don’t like it, want to run your own tools locally or just want to try some others, then that’s fine.
81. Dealing with humans
On-call
- Sharing out the responsibility
- Determining level of response: 24/7 real monitoring or first responder
- 24/7 real monitoring for HA environments, real people at a screen at all times
- First responder: people at the end of a phone
82. Dealing with humans
On-call
1) Ops engineer
- During working hours our dedicated ops engineers take the first level
- Avoids interrupting product engineers for initial fire fighting
83. Dealing with humans
On-call
1) Ops engineer
2) All engineers
- Out of hours we rotate every engineer, product and ops
- Rotation every 7 days on a Tuesday
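A weekly rotation that changes over on Tuesdays is easy to compute from the date alone. This is a minimal sketch, not the tooling used by the team: the engineer names and the starting date are placeholders.

```python
from datetime import date, timedelta


def rotation_start(day):
    """Most recent Tuesday on or before `day` (Monday is weekday 0, Tuesday is 1)."""
    return day - timedelta(days=(day.weekday() - 1) % 7)


def on_call(day, engineers, first_rotation):
    """Engineer on call on `day`, rotating every 7 days from `first_rotation`."""
    weeks = (rotation_start(day) - rotation_start(first_rotation)).days // 7
    return engineers[weeks % len(engineers)]


engineers = ["alice", "bob", "carol"]  # placeholder names
first_rotation = date(2013, 4, 16)     # a Tuesday, chosen arbitrarily

for day in (date(2013, 4, 22), date(2013, 4, 23), date(2013, 5, 7)):
    print(day, on_call(day, engineers, first_rotation))
```

Deriving the schedule from the date means nobody has to maintain a calendar by hand, though holidays and swaps still need a manual override.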
84. Dealing with humans
On-call
1) Ops engineer
2) All engineers
3) Ops engineer
- Always have a secondary
- This is always an ops engineer
- Thinking is if the issue needs to be escalated then it’s likely a bigger problem that needs additional systems expertise
85. Dealing with humans
On-call
1) Ops engineer
2) All engineers
3) Ops engineer
4) Others
- Support from design / frontend engineering
- Have to press a button to get them involved
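The four levels above can be sketched as a small escalation function. The 15 minute timeout is an assumed value (the slides do not give one), and the role labels are descriptive strings rather than anything from a real paging tool.

```python
def paged(working_hours, minutes_unacknowledged, timeout_minutes=15, request_others=False):
    """Return the list of roles paged so far for an unacknowledged alert."""
    # Levels 1 and 2: ops engineer during working hours, otherwise whoever is on rotation.
    responders = ["ops engineer" if working_hours else "rotating engineer"]
    # Level 3: the secondary is always an ops engineer, paged after the timeout.
    if minutes_unacknowledged >= timeout_minutes:
        responders.append("secondary ops engineer")
    # Level 4: design / frontend support is only involved on explicit request.
    if request_others:
        responders.append("design / frontend support")
    return responders


print(paged(working_hours=True, minutes_unacknowledged=5))
print(paged(working_hours=False, minutes_unacknowledged=20))
print(paged(working_hours=False, minutes_unacknowledged=20, request_others=True))
```

Keeping the last level manual mirrors the "press a button" step: people outside the rotation are never paged automatically.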
88. Dealing with humans
Uptime reporting
- Weekly internal report on G+
- Gives visibility to entire company about any incidents
- Allows us to discuss incidents to get to that 100% uptime
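For the weekly report, uptime can be derived from the total incident duration. A minimal sketch, with made-up incident durations:

```python
WEEK_MINUTES = 7 * 24 * 60  # 10080 minutes in a week


def weekly_uptime_percent(downtime_minutes):
    """Uptime for one week as a percentage, given total downtime in minutes."""
    return 100.0 * (WEEK_MINUTES - downtime_minutes) / WEEK_MINUTES


incidents = [12, 3.5]  # hypothetical incident durations in minutes
print(f"{weekly_uptime_percent(sum(incidents)):.3f}% uptime this week")
```

For scale, 99.9% weekly uptime leaves just over 10 minutes of downtime.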
89. Dealing with humans
Social issues
- How quickly can you get to a computer?
- Are they out drinking on a Friday?
- What happens if someone is ill?
- What if there’s a sudden emergency: accident? family emergency?
- Do they have enough phone battery?
- Can you hear the ringtone?
90. Dealing with humans
Backup responder
- Backup responder
- Time out the initial responder
- Escalate difficult problems
- Essentially human redundancy: phone provider, geographic area, internet connectivity
91. Dealing with outages
Expected
- Outages are going to happen, especially at the beginning
- Costs money for redundancy
- How you deal with them
92. Dealing with outages
Communication
Externally
- Telling people what is happening
- Frequently
- Dependent on audience - we can go into more detail because our customers are techies
- GitHub do a good job of providing incident writeups but don’t provide a good idea of what is happening right now
- Generally Amazon and Heroku are good and go into more detail
93. Dealing with outages
Communication
Internally
- Open Skype conferences between the responders
- Usually mostly silence or the sound of the keyboard, but simulates being in the situation room
- Faster than typing
94. Dealing with outages
Really test your vendors
- Shows up flaws in vendor support processes
- Frustrating when waiting on someone else
- You want as much information as possible
- Major outage? Everyone will be calling them
95. Dealing with outages
Simulations
- Try and avoid unnecessary problems
- Do servers come back up from boot?
- Can hot spares handle the load?
- Test failover: databases, HA firewalls
- Regularly reboot servers
- Wargames can happen at another stage: startups are usually too focused on building things first