Cybersecurity firm CrowdStrike said on Wednesday that the major outage last Friday that crashed millions of Windows devices was caused by a flaw in its content validation system.
“On Friday, July 19, 2024 at 4:09 a.m. UTC, as part of normal business operations, CrowdStrike released a content configuration update for Windows sensors to collect telemetry on possible emerging threat techniques,” the company said in a preliminary post-incident review (PIR).
“These updates are a normal part of the Falcon platform’s dynamic protection mechanism. A problematic Rapid Response Content configuration update caused the Windows system to crash.”
This incident affected Windows hosts running sensor version 7.11 and later that came online and received the update between 04:09 UTC and 05:27 UTC on July 19, 2024. Apple macOS and Linux systems were not affected.
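The impact criteria above can be written as a small triage check. This is only a sketch: the function name, the OS labels, and the dotted version-string format are assumptions made for illustration, and it simplifies "came online and received the update" to a single timestamp per host.

```python
from datetime import datetime, timezone

# Impact window and minimum sensor version as stated in the review.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_SENSOR_VERSION = (7, 11)


def possibly_affected(os_name, sensor_version, online_at):
    """Return True if a host matches the stated impact criteria.

    os_name:        hypothetical label such as "windows", "macos", "linux"
    sensor_version: dotted string such as "7.11" (format is an assumption)
    online_at:      timezone-aware datetime at which the host was online
    """
    if os_name.lower() != "windows":
        return False
    # Compare only major.minor, e.g. "7.15.1" -> (7, 15).
    version = tuple(int(part) for part in sensor_version.split(".")[:2])
    if version < MIN_SENSOR_VERSION:
        return False
    return WINDOW_START <= online_at <= WINDOW_END
```

A host that matches is only a candidate for investigation; the check does not confirm that the update was actually received.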
CrowdStrike says it delivers security content configuration updates in two ways: via Sensor Content, which ships with the Falcon Sensor, and via Rapid Response Content, which can flag new threats using a variety of behavioral pattern matching techniques.
The crash is said to be the result of a Rapid Response Content update that contained a previously undetected error. Such updates are delivered as template instances, each of which corresponds to a specific behavior and maps to a specific template type, to enable new telemetry and detection.
Template instances are created using a content configuration system, then deployed to sensors via the cloud through a mechanism called channel files, and finally written to disk on Windows machines. The system also includes a content validation component that performs validation checks on content before it is published.
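The create, validate, and publish flow described above can be sketched as follows. CrowdStrike has not published its validator, so every name here (`TemplateType`, `TemplateInstance`, `validate`, `publish`) and both checks are hypothetical, and the channel file is modeled as a plain in-memory list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TemplateType:
    """Hypothetical schema: how many fields an instance must supply."""
    name: str
    field_count: int


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical instance: concrete field values for one template type."""
    template_type: TemplateType
    fields: Tuple[str, ...]


def validate(instance: TemplateInstance) -> List[str]:
    """Return a list of problems; an empty list means the instance passes."""
    problems = []
    expected = instance.template_type.field_count
    if len(instance.fields) != expected:
        problems.append(f"expected {expected} fields, got {len(instance.fields)}")
    for i, value in enumerate(instance.fields):
        if not value:
            problems.append(f"field {i} is empty")
    return problems


def publish(instance: TemplateInstance, channel_file: list) -> bool:
    """Add the instance to the channel file only if validation passes."""
    if validate(instance):
        return False
    channel_file.append(instance)
    return True
```

The point of the sketch is the ordering: nothing reaches the channel file unless the validator returns no problems, so any gap in the validator's checks becomes a gap in what reaches sensors.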
“Rapid Response Content provides visibility and detection capabilities on sensors without requiring changes to the sensor code,” the company explains.
“This capability is used by threat detection engineers to collect telemetry, identify indicators of adversary behavior, and perform detection and prevention. Rapid Response Content is a behavioral heuristic that is separate from CrowdStrike’s on-sensor AI prevention and detection capabilities.”
These updates are parsed by the Falcon sensor’s content interpreter and help the sensor detection engine detect or prevent malicious activity.
While each new template type is stress tested on various parameters such as resource utilization and performance impact, CrowdStrike said the root cause of the issue can be traced back to the February 28, 2024 rollout of the Inter-Process Communication (IPC) template type, which was introduced to flag attacks on named pipes.
The timeline of events is as follows:
- February 28, 2024 – CrowdStrike releases sensor 7.11 to customers with the new IPC template type
- March 5, 2024 – The IPC template type passes stress testing and is validated for use
- March 5, 2024 – The first IPC template instance is released to production via channel file 291
- April 8-24, 2024 – Three more IPC template instances are deployed to production
- July 19, 2024 – Two additional IPC template instances are deployed, one of which passes validation despite containing problematic content data
“Based on the testing performed prior to the initial deployment of the template type (March 5, 2024), the reliability of the checks performed by the content validator, and the successful deployment of previous IPC template instances, these instances were deployed into production,” CrowdStrike said.
“When received by the sensor and loaded into the content interpreter, the problematic content in channel file 291 caused an out-of-bounds memory read, triggering an exception. This unexpected exception could not be handled gracefully and caused the Windows operating system to crash (BSoD).”
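The failure mode in that statement, a read past the end of the available data, can be illustrated with a short sketch of a bounds-checked read. This is not CrowdStrike's interpreter: `ContentError`, `read_field`, and `evaluate` are hypothetical names, and the idea that content refers to values by index is an assumption made for the example. Python raises an `IndexError` on such an access, whereas native code may read memory it does not own.

```python
class ContentError(Exception):
    """Raised when content refers to data the interpreter does not have."""


def read_field(values, index):
    """Bounds-checked read: reject bad content instead of reading past the end."""
    if not 0 <= index < len(values):
        raise ContentError(
            f"field index {index} out of range for {len(values)} values"
        )
    return values[index]


def evaluate(field_indexes, sensor_values):
    """Evaluate one hypothetical template instance.

    Returns the values read, or None when the content is malformed, so the
    caller can skip the instance rather than crash.
    """
    try:
        return [read_field(sensor_values, i) for i in field_indexes]
    except ContentError:
        return None
```

Returning `None` for malformed content is one way to handle the error gracefully: the bad instance is skipped and the rest of the system keeps running.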
Responding to the widespread disruption caused by the incident, the Texas-based company said it has improved its testing processes and strengthened error handling mechanisms in its content interpreter to prevent a recurrence, and also plans to implement a phased rollout strategy for Rapid Response Content.
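A phased rollout of the kind CrowdStrike says it plans can be sketched as a loop that deploys to a growing share of hosts and stops at the first sign of trouble. The stage fractions, the `deploy` and `healthy` callbacks, and the function name are all assumptions for illustration; the company has not described its mechanism in this detail.

```python
def phased_rollout(hosts, stages, deploy, healthy):
    """Deploy to growing fractions of hosts, halting at the first unhealthy stage.

    hosts:   list of host identifiers
    stages:  increasing fractions of the fleet, e.g. [0.01, 0.1, 1.0]
    deploy:  callable(host) that delivers the update (hypothetical)
    healthy: callable(host) -> bool, checked after each stage (hypothetical)
    Returns (completed, deployed_hosts).
    """
    deployed = []
    for fraction in stages:
        # Always include at least one host so the first stage is never empty.
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[len(deployed):target]
        for host in batch:
            deploy(host)
        deployed.extend(batch)
        if not all(healthy(host) for host in batch):
            return False, deployed
    return True, deployed
```

With 100 hosts and stages of 1%, 10%, and 100%, a faulty update that makes hosts unhealthy would stop after reaching a single host instead of the whole fleet.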