Bryan Hamman: The next outage is coming – is your organisation prepared?

By: Bryan Hamman

Over the past year and a half, a series of major global IT disruptions – from a routine software update gone wrong, to widespread DNS failures and configuration errors that brought down X, Zoom, Spotify, Canva and ChatGPT, amongst others – has demonstrated a simple truth.

No organisation is immune to these challenges, not even the world’s largest technology providers.

For African businesses, the impact of such incidents can be even more pronounced. Many organisations across the continent rely heavily on cloud services, operate in hybrid environments, or already contend with unstable electricity supply and bandwidth constraints. When a global outage hits, the ripple effect can be immediate and severe.

The lesson is clear: IT disruptions are unavoidable. The real question is: how prepared is your organisation to detect, diagnose and recover from them?

Four steps to prepare for an IT outage in your network

In the wake of a major network outage, enterprises should pause, take stock of the business impact and evaluate their own networks to determine how they can prevent or rapidly respond to a similar situation.

While outages in global service provider environments are inevitable, companies can proactively strengthen their own resilience, response and recovery capabilities.

Here are four steps every enterprise IT and network operations team can prioritise to prepare for the next outage:

Step 1: Implement true observability – not just monitoring

Monitoring may tell you what is broken, but observability helps you understand why and where. Many African organisations rely on fragmented toolsets – a log here, an alert there – resulting in slow root-cause identification during crises. Why is it a drawn-out process? Because they’re missing context.

Context often comes from deep packet inspection (DPI). DPI-based observability reveals the actual traffic flows across the infrastructure, showing the interactions between applications, services and networks in real time.

For instance, when DNS or an update fails, DPI can help pinpoint whether it’s a local configuration issue, a third-party dependency or a network path problem. 

DPI can help to reduce the mean time to knowledge (MTTK), the time it takes to understand why a problem exists, and in turn lower the overall mean time to restore (MTTR) services in the environment.
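As a rough illustration of how these two measures can be tracked, the sketch below computes MTTK and MTTR from a list of incident records. The records, field names and timestamps are hypothetical, and it assumes both measures are timed from the moment an incident is detected; an organisation may define its starting point differently.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and timestamps are illustrative only.
incidents = [
    {"detected": "2025-01-10T08:00", "cause_known": "2025-01-10T08:45", "restored": "2025-01-10T10:00"},
    {"detected": "2025-02-03T14:00", "cause_known": "2025-02-03T14:15", "restored": "2025-02-03T15:00"},
]

def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60

def mean_minutes(records, start_field, end_field) -> float:
    """Average elapsed minutes between two fields across all records."""
    return mean(minutes_between(r[start_field], r[end_field]) for r in records)

# Assumption: both measures start at detection time.
mttk = mean_minutes(incidents, "detected", "cause_known")
mttr = mean_minutes(incidents, "detected", "restored")
print(f"MTTK: {mttk:.0f} min, MTTR: {mttr:.0f} min")
```

Tracking the two figures separately shows how much of the total restoration time is spent simply working out what went wrong.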

Step 2: Establish incident readiness processes

Incident response takes preparation and strategy, and having the proper tools is only one part of this. Clear processes need to be outlined, escalation paths defined and cross-functional teams aligned before organisations can effectively deal with outages.

It is also essential to establish maintenance, upgrade and application update procedures.

Steps to avoid potential issues, such as last year’s software update outage, might include:

· Testing updates in controlled environments;

· Establishing go/no-go decision criteria for the update;

· Defining clear escalation paths;

· Outlining rapid root-cause investigation steps; and 

· Developing a communications plan for stakeholders and executives, should it be required.
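One way to make go/no-go criteria explicit is to write them down as a simple check that returns both a decision and the reasons behind it. The sketch below is illustrative only: the criteria, the field names and the 1% error-rate threshold are assumptions, not a prescribed standard.

```python
from dataclasses import dataclass

@dataclass
class UpdateCandidate:
    """Hypothetical facts gathered about an update before rollout."""
    passed_staging_tests: bool
    rollback_plan_tested: bool
    canary_error_rate: float  # fraction of failed requests on a small pilot group
    inside_change_window: bool

# Hypothetical threshold; each organisation would set its own.
MAX_CANARY_ERROR_RATE = 0.01

def go_no_go(update: UpdateCandidate) -> tuple[bool, list[str]]:
    """Return (go, blockers): go is True only when no blocker is found."""
    blockers = []
    if not update.passed_staging_tests:
        blockers.append("staging tests failed")
    if not update.rollback_plan_tested:
        blockers.append("rollback plan untested")
    if update.canary_error_rate > MAX_CANARY_ERROR_RATE:
        blockers.append("canary error rate too high")
    if not update.inside_change_window:
        blockers.append("outside approved change window")
    return (not blockers, blockers)

decision, reasons = go_no_go(UpdateCandidate(True, False, 0.03, True))
print("GO" if decision else "NO-GO", reasons)
```

Returning the list of blockers alongside the decision gives the escalation path and the communications plan something concrete to report.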

Although it is impossible to avoid every potential outage, measures can be put in place to ensure that the corporate and IT response is swift and confident when it hits.

Step 3: Understand what you can and can’t control

Every IT environment is a complex tangle of dependencies, some of which the business controls and some of which it doesn’t, particularly those provided by strategic technology partners. 

This is true with software-as-a-service (SaaS) platforms, DNS providers, content delivery networks (CDNs), cloud services and internal microservices, to name a few.


Many of these systems sit outside the direct control of IT when an outage occurs, yet they are critical for banking, telecoms, government services and e-commerce across the continent.

Enterprise-wide visibility is a powerful control that provides essential information about your user community, network and applications. Modern observability platforms are available to track not just the corporate environment, but also key third-party dependencies.

Being aware of the services your users rely on, and of how those services are connected, gives organisations an edge when time is of the essence.
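A simple dependency inventory can make these connections visible before an outage happens. The sketch below assumes a hypothetical map of services and their direct dependencies, all with invented names, and works out which services would be affected if one dependency failed.

```python
from collections import deque

# Hypothetical inventory: each service lists what it depends on directly.
DEPENDS_ON = {
    "online-banking": ["payments-api", "cdn"],
    "payments-api": ["cloud-db", "dns-provider"],
    "staff-portal": ["saas-identity"],
    "cdn": ["dns-provider"],
}

# Hypothetical set of dependencies run by external providers.
THIRD_PARTY = {"cdn", "dns-provider", "cloud-db", "saas-identity"}

def impacted_by(failed: str) -> set[str]:
    """Services that depend on `failed`, directly or through other services."""
    # Invert the map: dependency -> services that use it directly.
    dependants = {}
    for service, deps in DEPENDS_ON.items():
        for dep in deps:
            dependants.setdefault(dep, set()).add(service)

    # Breadth-first walk up the chain of dependants.
    seen, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependants.get(current, ()):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return seen

failed = "dns-provider"
owner = "third party" if failed in THIRD_PARTY else "internal"
print(f"{failed} ({owner}) affects: {sorted(impacted_by(failed))}")
```

Even a small map like this shows how a single third-party failure can reach user-facing services through more than one path.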

Step 4: Build collaboration across teams and vendors

In a major outage, silos slow everything down. NetOps, SecOps, CloudOps and application teams must collaborate in real time to avoid losing valuable minutes on finger-pointing.

This requires shared data, a common language and tools that bridge visibility gaps across different domains.

It is equally important to build strong, collaborative vendor relationships before the storm hits. Know who to call, which service level agreements (SLAs) apply and how your vendors will support you under fire.

An outage is not the time to work out who is responsible; it is the time for action.

DPI-backed observability provides the shared evidence needed for swift collaboration and faster service restoration.

Are you ready to respond?

Disruptions don’t wait for IT teams to be ready; they can stem from the most routine operations. What matters is your ability to detect, respond and recover fast.

Enterprise environments are complex, and many factors are outside the control of corporate IT organisations.

But with DPI-driven observability, well-practised incident processes, clear visibility across external dependencies and coordinated collaboration, organisations have the power to control their readiness.

So, are you ready to respond to the next IT disruption?

For more information on NETSCOUT’s observability solutions, visit here.

The author is NETSCOUT’s regional director for Africa.
