How to Deal with an IT Crisis Efficiently and Successfully

The real action in IT departments happens when an incident occurs, whether it is a service crash, a human error, an unexpected process or load spike or, increasingly, a cyber threat.

Picture a typical IT department; you probably know one from the inside. If not, a couple of episodes of ‘The IT Crowd’ will give you the impression that the most stereotyped technology teams leave on the rest of a company's mortals: a group of ‘geeks’ with no social skills and a light workload … except when something fails.

Fiction aside, IT departments really do come alive when something goes wrong, and the pressure on them grows with every minute the business is affected.

At those moments these professionals spring into action, and a race against the clock begins to get the company's operations back to normal.

It is a moment of stress and heavy pressure, but one that can be managed effectively by following some basic guidelines, such as these five compiled from the real experiences of IT directors:

1. Define the importance of an incident

When a problem has a large business impact on several users, we can classify it as a major incident. Strictly speaking, a major incident is one that forces the organization to deviate from its existing incident management processes. High-priority incidents are often mistaken for major incidents even when their business impact is modest. This usually happens because there are no clear guidelines for assessing urgency, impact and severity accurately.
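The difference between a high-priority incident and a major one can be made explicit in a small classification rule. The following is only a minimal sketch: the `Level` scale, the `Incident` fields, the `MAJOR_USER_THRESHOLD` value and the rule that priority is the higher of urgency and impact are all assumptions for illustration, and each organization would set its own.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Incident:
    title: str
    urgency: Level       # how quickly a fix is needed
    impact: Level        # how much of the business is affected
    severity: Level      # how badly the service itself is degraded
    users_affected: int


# Hypothetical threshold; each organization would define its own.
MAJOR_USER_THRESHOLD = 50


def priority(incident: Incident) -> Level:
    """Assumed rule: priority is the higher of urgency and impact."""
    return max(incident.urgency, incident.impact)


def is_major(incident: Incident) -> bool:
    """Major only when impact and severity are both high and many users are hit."""
    return (
        incident.impact == Level.HIGH
        and incident.severity == Level.HIGH
        and incident.users_affected >= MAJOR_USER_THRESHOLD
    )


# An urgent request from one executive: high priority, but not a major incident.
vip_laptop = Incident("Executive laptop will not boot", Level.HIGH, Level.LOW, Level.HIGH, 1)
# A payment outage affecting hundreds of users: high priority and major.
payments_down = Incident("Payment gateway outage", Level.HIGH, Level.HIGH, Level.HIGH, 400)
```

Keeping `priority` and `is_major` as separate functions mirrors the point above: an incident can be urgent without justifying the major incident process.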

2. Dedicated workflows

A clear workflow helps us quickly restore a service that has been interrupted. Keeping separate workflows for major incidents lets us automate and simplify routine work, which frees up time, space and resources to solve the problem. The resolution follows a chain of steps that must be followed to the letter:

Identify the major incident
Communicate with stakeholders and affected users
Assign the right people
Track the major incident throughout its life cycle
Escalate when an SLA (service level agreement) is breached
Resolve and close the incident
Generate and analyze the final reports
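
One way to make sure the steps above are followed in order is to model them as an ordered set of stages. This is a minimal sketch under that assumption; the `Stage` and `MajorIncidentWorkflow` names are hypothetical, and a real process would treat escalation as conditional on an actual SLA breach rather than as a mandatory step.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    IDENTIFY = "Identify the major incident"
    COMMUNICATE = "Communicate with stakeholders and affected users"
    ASSIGN = "Assign the right people"
    TRACK = "Track the incident throughout its life cycle"
    ESCALATE = "Escalate when an SLA is breached"
    RESOLVE_AND_CLOSE = "Resolve and close the incident"
    REPORT = "Generate and analyze the final reports"


# Enum members iterate in definition order, which is the required order.
ORDER = list(Stage)


class MajorIncidentWorkflow:
    """Tracks completed stages and rejects any stage taken out of order."""

    def __init__(self) -> None:
        self.completed: List[Stage] = []

    def next_stage(self) -> Optional[Stage]:
        """Return the stage that must be done next, or None when finished."""
        if len(self.completed) == len(ORDER):
            return None
        return ORDER[len(self.completed)]

    def complete(self, stage: Stage) -> None:
        expected = self.next_stage()
        if stage is not expected:
            raise ValueError(f"expected {expected}, got {stage}")
        self.completed.append(stage)
```

Calling `complete(Stage.RESOLVE_AND_CLOSE)` before the earlier stages raises a `ValueError`, which is the point: nobody can close the incident without first communicating and assigning it.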

Some organizations have a team dedicated exclusively to major incidents, led by an incident manager, while others assemble a dynamic, ad hoc team of experts from various departments.

3. Configure strict SLAs and escalation hierarchies

We must also ensure that major incidents have strict SLAs that specify clear escalation points for any breach. In addition, we need a manual escalation process for cases where the assigned technician lacks the experience to resolve the incident, with a backup technician always on standby.
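
Escalation points are easiest to enforce when they are written down as data. The sketch below assumes three time limits and three roles purely for illustration; the values in `ESCALATION_POINTS`, the role names and the `Assignment` class are hypothetical and not taken from any particular SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical escalation points: elapsed time since opening -> role to notify.
ESCALATION_POINTS = [
    (timedelta(minutes=30), "incident manager"),
    (timedelta(hours=1), "IT director"),
    (timedelta(hours=2), "CIO"),
]


def escalation_target(opened_at: datetime, now: datetime) -> Optional[str]:
    """Return the highest role whose time limit has passed, or None."""
    elapsed = now - opened_at
    target = None
    for limit, role in ESCALATION_POINTS:
        if elapsed >= limit:
            target = role
    return target


@dataclass
class Assignment:
    primary: str
    backup: str  # a backup technician is always named

    def escalate_manually(self) -> str:
        """Hand the incident to the backup and return the new primary."""
        self.primary, self.backup = self.backup, self.primary
        return self.primary
```

For example, an incident opened at 09:00 and still unresolved at 10:30 would return `"IT director"` under these assumed limits.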

4. Information first

Throughout the life cycle of a major incident, we must keep every level of the organization informed at all times by sending official communications, notifications and status updates to stakeholders and affected users. Notices on the intranet or corporate social network will also help prevent chaos and alarm, as well as the creation of duplicate support tickets. It is also a good idea to have a dedicated telephone line to support affected users.

5. Simplify and document

We have to simplify work processes as much as possible before a major incident hits the IT department. To do so, we can create simple templates that capture critical details, such as the type of major incident, similar past cases, the resources required and the steps taken to resolve it. Likewise, we have to document and analyze every major incident so that we can identify areas for improvement.

In addition, we can generate the following reports to help in making efficient decisions:

Number of major incidents raised and closed each month
Average resolution time for major incidents
Percentage of downtime caused by major incidents
Problems and changes related to major incidents
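
The first three reports can be computed from a simple incident log. This is a rough sketch, assuming each documented incident records when it was opened, when it was closed and how much downtime it caused; the `IncidentRecord` fields and function names are hypothetical, and the fourth report is left out because it depends on how problems and changes are linked in each ticketing tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class IncidentRecord:
    opened_at: datetime
    closed_at: Optional[datetime]  # None while the incident is still open
    downtime: timedelta            # service downtime attributed to the incident


def monthly_counts(records: List[IncidentRecord], year: int, month: int) -> Tuple[int, int]:
    """Return (raised, closed) counts of major incidents for the given month."""
    raised = sum(
        1 for r in records
        if (r.opened_at.year, r.opened_at.month) == (year, month)
    )
    closed = sum(
        1 for r in records
        if r.closed_at is not None
        and (r.closed_at.year, r.closed_at.month) == (year, month)
    )
    return raised, closed


def average_resolution_time(records: List[IncidentRecord]) -> Optional[timedelta]:
    """Mean time from opening to closing, ignoring incidents still open."""
    durations = [r.closed_at - r.opened_at for r in records if r.closed_at is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def downtime_percentage(records: List[IncidentRecord], total_downtime: timedelta) -> float:
    """Share of all downtime in the period that major incidents caused."""
    if total_downtime <= timedelta(0):
        return 0.0
    major = sum((r.downtime for r in records), timedelta())
    return 100 * (major / total_downtime)
```

Returning `None` when no incident has been closed avoids reporting a misleading average of zero.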

Major incidents are inevitable, and each one is a learning experience for your team. Adhering to these practices could be your first step toward mastering the art of handling them.