The CrowdStrike aftermath: Observations and lessons learned


CrowdStrike, a global leader in endpoint security, incident response and cybersecurity, recently deployed an update to its Falcon sensor for Microsoft Windows systems. This update, designed to enhance the detection of novel threats, inadvertently caused significant malfunctions in the Windows operating system, leading to widespread crashes and system instability.

Notably, Mac and Linux operating systems were unaffected by this issue.

What happened?

Despite concerns, it’s important to clarify that this incident was not the result of a hack, security breach, or malicious attack. Here are three key factors that led to the CrowdStrike chaos:

Faulty internal update: The problem stemmed from an internal update error rather than external tampering.

Elevated privileges: As security software, CrowdStrike Falcon has high privileges and integrates with the Microsoft Windows kernel.

Global impact: The impact was particularly severe because CrowdStrike’s software is deeply integrated into critical infrastructure across large corporations and government agencies.

This integration, while essential for detecting and neutralizing high-level threats, also meant that when the faulty update was rolled out, it led to immediate and widespread disruptions.

The impact

CrowdStrike is widely used among enterprises and state, local and federal government agencies, so the scale of the disruption was enormous. Delta Air Lines, for instance, has engaged high-profile attorney David Boies as it faces potential losses exceeding $300 million due to the incident. While many other organizations of similar size recovered within hours, Delta experienced operational disruptions lasting several days, sparking industry debate over whether the fault lay with CrowdStrike’s update or with Delta’s recovery plan and preparedness.

This incident triggered what may be the largest technology outage on record caused by a misconfiguration or bug, with estimated damages reaching into the billions, a figure that continues to climb. The fallout was massive: thousands of flights were delayed or cancelled, reservation systems halted worldwide, and disruptions cascaded across the globe. At least 8.5 million computers were affected, leading to unprecedented operational chaos.

It is indeed ironic that CrowdStrike, a company renowned for its expertise in incident response, found itself at the center of such a significant episode. This event underscores the complexities and challenges that even the most well-regarded firms can face, as well as the importance of recovery plans and response preparedness.

CrowdStrike’s response

In the face of this unprecedented incident, CrowdStrike responded with prompt and decisive action. The company swiftly deployed a fix to address the issue and subsequently released a statement outlining a series of commitments aimed at preventing a recurrence. While the list of actions was thorough and comprehensive, much of it aligned with existing industry-standard practices. However, CrowdStrike notably pledged to revise its update deployment processes, a critical change expected to enhance the reliability and safety of future updates.
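CrowdStrike's statement is not reproduced here, so the following is only a generic illustration of one common way to make update deployment safer: a staged (ring-based) rollout that stops as soon as an early ring shows too many failures. The names `Ring`, `staged_rollout`, `deploy` and `max_failure_rate` are hypothetical and do not describe CrowdStrike's actual process.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    """A group of hosts that receives the update at the same stage."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> List[str]:
    """Deploy ring by ring and halt when a ring exceeds the failure budget.

    `deploy` is a caller-supplied function (an assumption of this sketch)
    that installs the update on one host and returns True if the host is
    healthy afterwards. Returns the names of the rings that completed.
    """
    completed: List[str] = []
    for ring in rings:
        failures = sum(1 for host in ring.hosts if not deploy(host))
        failure_rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if failure_rate > max_failure_rate:
            # Halt here so later, larger rings never receive the bad update.
            break
        completed.append(ring.name)
    return completed
```

Ordering the rings from a small canary group to the broad fleet means a faulty update is caught while only a fraction of machines is exposed, at the cost of a slower full rollout.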

Observations and lessons learned

The CrowdStrike outage serves as a reminder for organizations of all sizes to review their processes and ensure steps are in place to help mitigate the impact of future incidents. It is not enough to have a plan; the plan must also be tested to confirm that it works.

Steps that organizations should have in place include:

1. Ensure Robust Backup and Disaster Recovery Plans: It seems simple, but it is crucial to have well-defined backup, business continuity, and disaster recovery plans in place. Equally important is the regular testing of these plans through actual walkthroughs to ensure they function effectively when needed.

2. Be Cautious with Privileged Software: Any software with privileged access to your systems can potentially cause significant disruptions. While this incident was not a security breach, it is a stark reminder that security tools, like any other software, can be a source of breaches or downtime.

3. Maintain Heightened Vigilance During Outages: Large-scale outages create an attractive opportunity for attackers. Amid the noise and disruption, malicious actors can easily slip in undetected and steal data. It is essential to maintain heightened security awareness during such events to prevent opportunistic exploitation.

4. Avoid Knee-Jerk Reactions: While the instinct may be to switch vendors after an incident like this, it’s important to proceed with caution. Quick, unplanned changes can lead to even bigger problems. Any transition to a new vendor should be approached as a phased project, not an overnight swap. This is especially critical for organizations handling sensitive data, such as those involved in national security.
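As a small illustration of the first step above, testing that a backup actually restores, here is a minimal sketch that restores a backup copy into a scratch directory and compares its SHA-256 checksum with the live file. The function names are hypothetical, and the `shutil.copy2` call is only a stand-in for whatever restore procedure an organization really uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_source(source: Path, backup: Path) -> bool:
    """Restore `backup` into a scratch directory and compare it to `source`."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / source.name
        # Stand-in for the real restore step (assumption of this sketch).
        shutil.copy2(backup, restored)
        return sha256_of(restored) == sha256_of(source)
```

Run on a schedule against a sample of files, a check like this turns "we have backups" into "we have restored from backups recently", which is the point of the walkthroughs described above.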

In conclusion, the CrowdStrike incident highlights the importance of robust systems, cautious planning, and the readiness to respond to even the most unexpected challenges.

This has become a reminder that in the realm of cybersecurity, even the leaders in the field are not immune to significant disruptions, nor to causing them. Being ready for when they happen may be the difference between swift resolution and loss of business.


This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro