The Optus outage in Australia last year was immediately on my mind when, on Friday afternoon, a similar event swept across the world. As with Optus, a software update caused the problem, this time from the global security software provider CrowdStrike. The culprit appears to be an update to the CrowdStrike Falcon platform, a security monitoring tool widely deployed by businesses and organisations on Microsoft Windows desktop computers and notebooks. As there is already plenty of information on the extent of the disruption and the cause of the chaos, I will concentrate my analysis on its implications.
My analysis of this event has a lot in common with my reflections on the Optus outage last year. The incident underscores the critical issue of resilience in IT infrastructure, particularly in systems that lack diversity. The reliance on a single software stack, in this case Microsoft’s Windows operating system coupled with CrowdStrike’s Falcon platform, has highlighted the risks of a monoculture approach. As in the Optus case, the cascading effects of a single point of failure have caused significant operational disruptions, underscoring the need for more robust and diverse IT strategies.
Operationally, this event points to a possible bypass of continuous integration testing in the update process, which should serve as a cautionary tale for software providers about the importance of thorough testing. Tactically, the need for physical access to fix the issue, even in cases where cloud-based systems required manual intervention, reveals a glaring vulnerability in current IT practices. We saw the same in the Optus outage, where modems had to be reset manually. Some of the findings of the government review that followed the Optus outage may also be relevant to the CrowdStrike event.
Strategically, the incident has broader implications for the global economy. Business models, cost considerations and profit expectations are being prioritised over what are now becoming national and international existential issues.
The extensive downtime and the overtime required to rectify the issues could reduce productivity and economic output. The outage also raises questions about liability and the potential for consequential loss claims, as businesses grapple with the financial fallout of the disruption.
To prevent future failures, the IT industry must prioritise resilience and diversity. This means not only diversifying software and hardware providers but also implementing more robust testing and update procedures to ensure that a single point of failure does not bring down entire systems.
As businesses and governments work to recover from this significant outage, the incident serves as a stark reminder of the vulnerabilities inherent in modern ICT infrastructure and the urgent need for more resilient and diversified systems. The lessons learned from this event will be crucial in shaping the future of ICT resilience and security. While this was ultimately a human error, it also shows what we can expect to happen more often if we continue down the path of cyberwarfare. You no longer need bombs to bring a country to its knees.