The Importance of Downtime Management
Recently a major service we use (not naming names) went down for almost the entire day. We get that downtime can happen unexpectedly. However, that didn’t change the fact that we had no access to several core functions our team uses daily. Thankfully, all issues were resolved within a day.
This does highlight, though, why resilient hosting with enough redundancy is important.
A Good Example of Things Out of Your Control
Here is part of the (modified to keep names private) email we received explaining the issue mentioned earlier:
The application and website are securely hosted on AWS, which normally provides an exemplary level of service.
Unfortunately, AWS incorrectly believed there was an administrative issue with our account that temporarily suspended all access.
As AWS only provides ticket-based support with minimal escalation options this took longer than expected to resolve.
Will this happen again?
Access has been reinstated, and we’re in the process of rectifying with AWS their information.
Additionally, we now have a dedicated point of contact in place to ensure any issues in the future can be resolved more promptly.
We do not expect this particular issue to happen again.
We know that access to your processes and tasks is critical to how you manage your business.
Why Mention This?
You may be wondering, why mention any of this?
We’re not looking to name & shame anyone. We of all people know that things can go wrong unexpectedly.
Instead, we thought it would be good to highlight the importance of making sure the service providers you use have the appropriate measures in place for when things do go wrong. Here are a couple of key things we think you should have when downtime occurs:
Ideally, you want clear & prompt communication with your service provider. Often a lot of the frustration in these situations can come from the lack of information about why you can’t access the services you need – especially if the service in question impacts your customers.
Some transparency, in a timely manner, goes a long way in our experience.
Estimates and Updates
From our own experience, we’ve found that providing regular updates on how the fix is coming along helps a lot with managing frustration and the feeling of waiting around.
If possible, getting an estimated time for when a fix might be in place can do wonders. This means you can let your customers know when they can expect normal service to resume.
However, from first-hand experience, these estimates can often change as work is done, so take them with a pinch of salt. Sometimes fixes are quicker than expected and sometimes things are more complicated than first thought.
What We Try to Do at HA Hosting
As much as we’d like to avoid it, downtime does happen to us sometimes. When it does, this is what we like to do:
- Update the Status Page to acknowledge the downtime.
- Post regular updates to the Status Page, giving progress reports on any fixes and explaining what the actual issues were in the first place.
- When possible, give rough timeframes for how long a fix might take.
- Update those timeframes as and when we know a fix might take longer.
The reason we like to make use of the Status Page is that it gives all our customers one place they know they can go to. This means we can update it quickly, and get on with fixing the issues at hand.
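For the more technically minded, here is a rough sketch of how the update routine described above could be modelled. It is purely illustrative and not a description of how our Status Page actually works: the `Incident` and `StatusUpdate` names, their fields, and the example dates and messages are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class StatusUpdate:
    """One entry on a (hypothetical) status page."""
    posted_at: datetime
    message: str
    # None when no estimate can be given yet
    estimated_fix_at: Optional[datetime] = None


@dataclass
class Incident:
    """A single period of downtime and the updates posted about it."""
    title: str
    updates: List[StatusUpdate] = field(default_factory=list)
    resolved: bool = False

    def post_update(self, message: str, posted_at: datetime,
                    estimated_fix_at: Optional[datetime] = None) -> None:
        # Each progress report is appended, so readers see the full history.
        self.updates.append(StatusUpdate(posted_at, message, estimated_fix_at))

    def current_estimate(self) -> Optional[datetime]:
        # The most recent estimate wins, since estimates change as work is done.
        for update in reversed(self.updates):
            if update.estimated_fix_at is not None:
                return update.estimated_fix_at
        return None

    def resolve(self, message: str, posted_at: datetime) -> None:
        # Final update: say what happened and mark the incident as closed.
        self.updates.append(StatusUpdate(posted_at, message))
        self.resolved = True


# Illustrative timeline (dates and messages are invented for the example)
start = datetime(2024, 1, 1, 9, 0)
incident = Incident("Control panel unavailable")
incident.post_update("We are investigating the outage.", start)
incident.post_update("Cause identified, fix in progress.",
                     start + timedelta(minutes=30),
                     estimated_fix_at=start + timedelta(hours=2))
incident.post_update("The fix is taking longer than expected.",
                     start + timedelta(hours=2),
                     estimated_fix_at=start + timedelta(hours=3))
revised_estimate = incident.current_estimate()
incident.resolve("Service restored.", start + timedelta(hours=3))
```

The point of keeping every update, rather than overwriting the last one, is that customers can see both the latest estimate and how the situation has developed.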