Welcome to the Website Emergency Room: Find
and Pinpoint Problems When Everything Falls
Apart
Julia Kulla-Mader
Jackson River
@juliakm
It’s 6am. Your website is down. People care.
You are the webmaster and it is your job to
get the site back up.
The usual solution (restarting the server)
isn’t working.
You need a way to diagnose and
troubleshoot difficult website problems.
Look for clues to avert a website crisis.
1. Check for obvious causes like a service outage
Is it down for everyone?
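One way to answer this question is to request the site from a second vantage point, such as a different network or a remote server. The sketch below is a minimal example using only Python's standard library. The function names `fetch_status` and `describe` and the example.org URL are hypothetical placeholders, not something from the talk.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Request the URL once and return a (kind, detail) pair."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ("response", resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx or 5xx status code.
        return ("response", exc.code)
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP answer at all: DNS failure, refused connection, timeout.
        return ("unreachable", str(getattr(exc, "reason", exc)))


def describe(kind, detail):
    """Turn a fetch_status() result into a short verdict."""
    if kind == "unreachable":
        return f"no response ({detail})"
    if detail < 400:
        return f"up (HTTP {detail})"
    if detail == 503:
        return "responding, but unavailable (HTTP 503)"
    return f"responding with an error (HTTP {detail})"


# Example (placeholder URL, substitute your own site):
# print(describe(*fetch_status("https://example.org/")))
```

If the script reports the site as up from elsewhere while your own browser cannot load it, the problem is more likely on your local network than on the server.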
Is the ISP down?
Is DNS down?
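A quick way to test this is to ask whether the domain name still resolves to an IP address. The sketch below is a minimal example, assuming Python's standard `socket` module. The function name `resolve` and its `resolver` parameter are illustrative choices; the parameter exists only so the lookup can be swapped out when testing.

```python
import socket


def resolve(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror, a subclass of OSError, signals a failed lookup.
        return None


# Example (placeholder domain, substitute your own):
# print(resolve("example.org") or "name did not resolve")
```

A failed lookup here only shows that the resolver on this machine could not find the name, so it is worth repeating the check from another network before blaming the domain's DNS host.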
Did someone do something by
accident? (User error)
Are all databases online? (Example of
deleting a DB)
Is Apache on?
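From outside the server, a rough first check is whether anything accepts connections on the web port at all. The sketch below assumes Python's standard `socket` module; `port_is_open` is a hypothetical helper name, and the host in the example is a placeholder.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections and timeouts both mean nothing answered.
        return False


# Example (placeholder host): check the standard HTTP and HTTPS ports.
# for port in (80, 443):
#     print(port, port_is_open("example.org", port))
```

An open port only proves that some process is listening, not that it is Apache or that it is healthy. If you can log in to the server, its service manager will tell you more (the service is usually named `apache2` or `httpd`, depending on the distribution).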
Is the site “offline”?
Can you reproduce the issue?
vs
Browser?
Operating system?
Mobile?
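Once you know which browser, operating system, or device shows the problem, it helps to know what share of your visitors use it. The sketch below is one rough way to estimate that from a web server access log. It assumes the common "combined" log format, where the user agent is the last double-quoted field. The marker list and the function names are illustrative assumptions and deliberately coarse.

```python
from collections import Counter

# Checked in order; the first matching marker wins.
# Android and iPhone come before Linux and Mac OS X because their
# user agents contain those strings too.
MARKERS = [
    ("BlackBerry", "BlackBerry"),
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("iPad", "iOS"),
    ("Windows", "Windows"),
    ("Mac OS X", "macOS"),
    ("Linux", "Linux"),
]


def platform_of(user_agent):
    """Map a user agent string to a coarse platform label."""
    for marker, label in MARKERS:
        if marker in user_agent:
            return label
    return "other"


def user_agent_of(log_line):
    """Extract the last double-quoted field of a combined-format log line."""
    parts = log_line.rstrip().rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def platform_share(log_lines):
    """Return {platform: percent of requests}, rounded to one decimal."""
    counts = Counter(platform_of(user_agent_of(line)) for line in log_lines)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

This counts requests rather than people, so treat the percentages as a rough guide to how much attention a device-specific bug deserves.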
2. Check with the beat cops (tools)
Are there any server red flags?
High memory usage
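On a Linux server, memory figures are exposed as plain text in `/proc/meminfo`. The sketch below is a minimal example of turning that text into a single "percent in use" number. It assumes a kernel recent enough to report `MemAvailable`; the function name is hypothetical.

```python
def memory_used_percent(meminfo_text):
    """Percent of RAM in use, computed from the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name] = int(fields[0])  # sizes are reported in kB
    total = values["MemTotal"]
    available = values["MemAvailable"]  # missing on very old kernels
    return round(100 * (total - available) / total, 1)


# Example, on a Linux server:
# with open("/proc/meminfo") as f:
#     print(memory_used_percent(f.read()))
```

A figure that stays close to 100 percent is the kind of red flag this slide is about.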
Full disk
[Chart: disk space, free vs. used]
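To catch a filling disk before backups start failing, you can check free space from a script. The sketch below uses `shutil.disk_usage` from Python's standard library. The 90 percent threshold and the function names are assumptions for illustration; pick a threshold that suits your server.

```python
import shutil


def disk_used_percent(path="/"):
    """Percent of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return round(100 * usage.used / usage.total, 1)


def disk_warning(percent_used, threshold=90.0):
    """True once usage reaches the (assumed) warning threshold."""
    return percent_used >= threshold


# Example:
# used = disk_used_percent("/")
# print(f"{used}% used", "WARNING" if disk_warning(used) else "ok")
```

Running something like this on a schedule gives you a warning while there is still room to clean up.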
High CPU
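CPU percentages are easy to misread on multi-core machines, so a little arithmetic helps. The sketch below shows how one saturated core translates into a share of the whole machine, and how to compare the load average with the number of cores. The function names are hypothetical, and `os.getloadavg` is only available on Unix-like systems.

```python
import os


def share_of_total_capacity(busy_cores, total_cores):
    """Percent of the whole machine in use when busy_cores are saturated."""
    return 100.0 * busy_cores / total_cores


def load_per_core():
    """One-minute load average divided by the number of cores (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


# One saturated core on a dual core machine is half the machine:
# share_of_total_capacity(1, 2)  ->  50.0
```

As a rough rule of thumb, a load per core that stays above 1 suggests work is queuing up faster than the processor can handle it.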
Are all of your security patches
up-to-date?
Open Source Code (Drupal/WordPress)
What does Google say?
Google Webmaster tools says the site is
hacked
10 percent drop or increase in traffic
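The 10 percent rule is simple to apply to any two traffic counts, for example page views this week and last week. The sketch below is a minimal example; the function names are hypothetical and the default threshold simply mirrors the figure on the slide.

```python
def percent_change(previous, current):
    """Change from previous to current, as a percentage of previous."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return 100.0 * (current - previous) / previous


def is_unusual(previous, current, threshold=10.0):
    """True if traffic rose or fell by at least threshold percent."""
    return abs(percent_change(previous, current)) >= threshold


# Example: 1,000 visits last week and 850 this week is a 15 percent drop.
# is_unusual(1000, 850)  ->  True
```

Applying the same check per page, not just site-wide, makes it easier to spot a single page that is being targeted or has become inaccessible.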
Odd search queries – View the site as a bot
3. Interview everyone connected to the event
How do staff web
administrators use your site?
Maintain a description of common
operations on the site. Then walk through
the process.
Did you change anything recently?
How do website visitors outside of
your organization use the site?
Look at the most popular parts of your site
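If your analytics tool is unavailable during an outage, the server's own access log can give a rough version of the same list. The sketch below is a minimal example that counts requested paths. It assumes each log line contains a quoted request field such as "GET /blog HTTP/1.1", as in the common and combined log formats; the function names are hypothetical.

```python
from collections import Counter


def requested_path(log_line):
    """Return the path from the quoted request field, or None if absent."""
    try:
        request = log_line.split('"')[1]  # e.g. GET /blog HTTP/1.1
        return request.split()[1]
    except IndexError:
        return None


def top_pages(log_lines, n=5):
    """Return the n most requested paths as (path, count) pairs."""
    paths = (requested_path(line) for line in log_lines)
    return Counter(p for p in paths if p is not None).most_common(n)


# Example (the log path is a placeholder and varies by server):
# with open("/var/log/apache2/access.log") as f:
#     for path, hits in top_pages(f):
#         print(hits, path)
```

This counts every request, including bots, so expect the numbers to differ from what an analytics dashboard reports.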
Neighbors: How do the other sites
sharing your IP address use the site?
Reverse IP lookup
Nc tech4 good_presentation_2014_up

Editor's Notes

  1. The scenarios described today have been slightly changed to protect the innocent. Let’s imagine that you are the IT manager for a small nonprofit. You woke up to a flood of text messages from Pingdom alerting you that your site is down and has been since about 2 am. Even worse, the organization’s annual conference kicks off in a few hours.
  2. While it would be great to be able to call someone else to fix this, it is clearly your responsibility. Part of your job is making sure that the website stays up.
  3. You have some tech experience. You’ve tried to log into the server and couldn’t. So, you restarted the server. The site came back up for a couple of minutes and then once again crashed. You tried restarting two more times with the same result – a couple of minutes of uptime and then a crash.
  4. Restarting isn’t working. Your executive director will be coming online any minute. How can you troubleshoot the problem? Photo credit: pablo lizardo, http://www.flickr.com/photos/24480842@N03/3040615449/
  5. The goal today is to step back and look for clues like a good detective. This is often incredibly hard to do when it feels like everything is falling apart. But for today, pretend that you are Sherlock Holmes or Olivia Benson.
  6. In any crime show, they always start with the most obvious cause first. The first suspect almost always ends up being ruled out, but we still need to start with that person. I once worked for an organization where the website suddenly went down and no one knew why. It turned out that the server was in someone’s basement and their cat had sat on the off switch. As you gather clues you will begin to deduce the root cause of the issue, by developing a working theory, or theories.
  7. Is it down for everyone or just me? (Website example) The best case scenario is that your website is just down for you. For example, since you are at a conference, perhaps there is something off with the wireless network. A good way to check whether this is true is to visit downforeveryoneorjustme. On this site, you can enter a website to check if it is down. Sometimes a site can be down just for your local network or for your router.
  8. Another, less positive, possibility is that your Internet service provider is down. ISPs almost always have a website (hosted off of their own servers) that you can check to see whether the service is down. Twitter is also a good place to look.
  9. Another, less likely, possibility is that your domain host (the company providing your DNS) is down while your ISP is up. This has happened with Network Solutions and AT&T in the past few years. If this happens, you don’t have a lot of options. Your site will be accessible via its IP address, but the domain name itself will not work. Source: http://media.infospesial.net/image/p/2013/08/tips-jika-dns-google-down_a9729.jpg
  10. Another unique possibility is that someone did something by accident. For example, we recently had a problem on a client site at work where the client accidentally managed to disable the Drupal panel powering the homepage, which led to a blank page.
  11. Sometimes your databases will go down. Sometimes they will get accidentally deleted. You can easily check whether the database is up via phpMyAdmin or another tool.
  12. Another easy to spot issue is that Apache, or Nginx or another web server powering your site, is just offline. I’ve experienced this a couple of times. A server will have been automatically restarted and for one reason or another, Apache will simply have never been turned on.
  13. Another possibility is that your site is in the equivalent of maintenance mode. For example, in Drupal, you can take your site offline so that users can’t enter content. You need to remember to bring it back afterwards, which is sometimes easy to forget in the heat of the moment. When I was IT manager, we once had site visitors calling in because they couldn’t see anything. We could see everything fine because we were logged in. Little did we know that the site was in maintenance mode.
  14. Provided a real problem exists, the next important thing to do is to try to reproduce it.
  15. Different browsers will behave in different ways. Okay, that was polite. The real thing is that a lot of the time IE will not display in standard ways. When you are checking for browsers, check IE from the version your site supports. Then, check the other browsers.
  16. Is this issue only for Linux? Android? You need to identify what type of computer or phone the person is using. The next step is often to figure out how many people who visit your site use this type of device or operating system.
  17. If this is a mobile issue, it’s good to identify quickly. As before, one of the first things you should do is check how many people are using this device. For example, we had a client where someone called in reporting a major problem with the donation form. It was for a very old version of Blackberry.
  18. We’ve checked for obvious causes, now it’s time to check with all the tools that track information about your site.
  19. There are a number of ways that a server can tell you it is unhappy. You do not necessarily have to be a systems administrator to read these tea leaves. You can just know when something is awry.
  20. What your server is used for will determine how much memory and storage you need. Memory determines how fast you can retrieve information on your server: how fast Apache can return a page, or how many applications (for example, websites) you can run on one server at the same time. When memory is running out on your server, everything will slow down and applications like PHP powering your site may start to crash. Anyone who has ever had a computer with subpar memory knows what it feels like when your machine runs out of memory – it slows to a crawl.
  21. Disk space or “storage” is the size of your hard drive. You want to avoid having your server fill to capacity without any warning. You can mitigate this with per-user storage limits. For example, everyone except for the super admin may be limited to 1 GB. When your server is full, maintenance services like database backups will often stop running. This is an early sign. Later on, your whole server may crash again and again, leading to hard drive corruption. This is something you want to avoid.
  22. High CPU typically means that you have a service running at 100 percent of processing capacity. Depending on how many cores your server has, this could be 100 percent of the processor power of your machine or less. For example, on a dual core machine, a service that saturates one core shows up as 50 percent of total processing capacity in use. High CPU use – over 50 percent – can point to a performance problem. For example, high httpd use can crash Apache. CPU typically also corresponds to power use. More CPUs cost more money. TODO: Verify above with Ben/Matt.
  23. Another scenario is that your code is running fine until some type of security vulnerability is exposed. You need to make sure that everything is up to date on your server.
  24. If you are using a CMS, you also need to make sure that your core code and contributed plugins and modules are up-to-date. The benefit and drawback of Open Source is that you have thousands of people using the same code as you. This is great because problems are found and fixed by people who don’t necessarily work for you. It can be problematic because hackers can find security holes and exploit them. To really take advantage of Open Source, you need to be frequently updating your code.
  25. Google offers a number of great resources for debugging.
  26. Anyone who has had Google Webmaster Tools identify their site as hacked likely remembers the experience. You start getting emails from Google indicating that your site will be removed from their search engine, and all of your visitors see the scary message “This site may be hacked”. “This site may be hacked” means that Google detected some suspicious links or pages on your site that are not malware that would infect your users, but that still should not be there. “Visiting this site may harm your computer” appears if Google believes that a site can download malicious software onto your computer.
  27. If you see a big increase or decrease in traffic, particularly to one page, it may indicate a problem. It could be that one page is inaccessible or that spammers are targeting one page in particular. At a previous job, we had a huge increase in traffic and a crashing site because hidden links to “Justin Timberlake naked” had been planted everywhere.
  28. In the Timberlake example, we could have also seen that there was a lot of incoming search traffic for “Justin Timberlake naked” as opposed to sustainability, the focus of the site.
  29. Like a good detective, the next step is to talk to people if we haven’t solved the problem yet. Put on your detective hat and let’s get started.
  30. First, we want to look at how anyone making changes to the site interacts with the website.
  31. The key thing to know in advance is how administrators work with your site. Do they typically make lots of changes each day? What content are they editing?
  32. Once we’ve gone over the basic options, we want to know what each person on the team has done recently. Example: a base64 encoding issue. We ran into a problem recently where someone was dragging and dropping images in Firefox, which was triggering a problem with the SpamSpan module.
  33. Once we’ve made sure that no one internal is the cause of the problem, it’s time to look outside at who is visiting your site and why. Source: https://www.digitalgov.gov/files/2014/03/Kidsgov-Usability-test-IMG_9987a-600-x-400.jpg
  34. Since a typical site has thousands of visitors, interviewing each one may be challenging. Let’s try looking at the most popular pages as reported by Google Analytics first. Here in GA we can see that these are the most popular parts of the site. It looks like there’s a lot of incoming traffic to this page in particular. If we retrace our steps, it could be that Huffington Post has made this blog post their headline story. That explains the increase.
  35. If we’ve eliminated everything else, it may be time to look at even more external factors. For example, I once had an Oscar predicting blog. Each Oscars, the site would eventually crash due to traffic, taking down every other site hosted on the same server.
  36. You can do a reverse IP lookup to see what other websites may share your IP address.
  37. You’ll remember that we started out today by looking at what to do when your site is down, people care, and the easy solutions like restarting your server aren’t working.
  38. You need a way to diagnose and troubleshoot difficult website problems.
  39. Today I hope we’ve shown you that you can look for clues to systematically avert a website crisis. In particular, you can start out by looking for obvious causes, then check with the beat cops (tools like server red flags and Google), and then talk to everyone connected to the event (people internally and externally).