As an IT leader, you’ve likely experienced, or at least observed firsthand, the operational devastation caused by a system or software failure. We all know the question is not “if something will go wrong” but “when something will go wrong.” That looming possibility may have driven you to practice meditation, pop antacids, or adopt some other coping mechanism.
Surprisingly, many IT departments have neither an organized on-call application support team nor a structured process for handling application emergencies, including disaster recovery, beyond off-site data backup. They haven’t thought about on-call application support as a separate business unit with its own end-to-end process, so they haven’t covered all the bases. For example:
- What are the procedures for communicating internally with the rest of your organization: executives, departments, and the broader tech team?
- What are the procedures for notifying customers?
- How quickly can compromised applications be back up and running?
- What happens when people leave for greener pastures or retire?
- Are application support procedures documented in detail to ensure knowledge transfer?
- How are support responsibilities impacting developer productivity?
3 Real-World Examples
Without the right people and processes, an on-call application support plan can go south:
Most software development groups supporting existing software rely on one or two key experts. Some support teams consist of a few retired part-timers. While they have the knowledge required for the task, this arrangement raises questions about their availability, dedication, and future. Is there detailed documentation for future teams? Who’s minding your on-call support?
An e-commerce organization with a formidable eight-person on-call and maintenance support team admits that its “plan” is reactive, not proactive, and that the team is inefficient. It has fallen into the trap of overstaffing to compensate for insufficient procedures. Are your support processes proactive or reactive?
A retail supplier sends invoices nightly via its 30-year-old software. When the system went down for just one night, the company lost the ability to collect millions of dollars in revenue, which upended cash flow. How does your support effort uncover and remediate emerging issues before they impair operations?
Use our “cheat sheet” to grade or validate your Tier 3 on-call application support preparedness. At the very least, review your mission-critical enterprise applications and compare your current Tier 3 on-call application support activities to our lists in:
Application Monitoring (a complete list of these activities is available here)
Issue Remediation (a complete list of these activities is available here)
Disaster Recovery (a complete list of these activities is available here)