- cross-posted to:
- technology@lemmy.zip
No validation, in the driver or the updater software.
No validation or automated testing on publish.
No staged rollouts.
Just utterly irresponsible all around.
No staged rollouts.
I read somewhere that CS does allow for staged rollouts but some updates deliberately ignore them.
A coworker of mine has worked with CrowdStrike in the past; I haven’t. He said that the releases he was familiar with were all staged into groups, and customers were encouraged to test internally before applying them. Not sure if this is a different product or what, but it seems like a big step backwards if what he’s saying is right.
I first dealt with them more than 10 years ago, and at the time they had no ability to do staged or targeted rollouts. We got updates when they said we did, with no choice or control. We had to resort to updating our firewall to restrict the download endpoint and only open it in groups to do a phased update.
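For anyone wondering what "open it in groups" could look like in practice, here is a minimal sketch of splitting a fleet into stable rollout waves. This is not how CrowdStrike or our firewall actually did it; the function names, hostnames, and wave count are all made up for illustration. The idea is just to hash each hostname into a wave so the same machine always lands in the same group, then allow the download endpoint only for waves that have been opened so far.

```python
import hashlib


def wave_for(hostname: str, n_waves: int) -> int:
    """Map a hostname to a stable wave index in [0, n_waves)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the hash keeps assignment stable
    # across runs and machines, unlike Python's built-in hash().
    return int.from_bytes(digest[:8], "big") % n_waves


def allowed_hosts(hostnames, n_waves, waves_open):
    """Return hosts whose wave index is below waves_open (waves opened so far)."""
    return [h for h in hostnames if wave_for(h, n_waves) < waves_open]


if __name__ == "__main__":
    # Hypothetical fleet; real hostnames would come from an inventory system.
    fleet = [f"host-{i:03d}" for i in range(20)]
    for step in range(1, 5):
        opened = allowed_hosts(fleet, n_waves=4, waves_open=step)
        print(f"step {step}: {len(opened)} hosts allowed")
```

The list returned for each step would then feed whatever generates the firewall allow rules. Each step is a superset of the previous one, so you can stop opening waves as soon as the first group shows problems.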
Interesting! Sounds like they may have changed things a few times, or maybe my co-worker’s memory has some gaps.
When I worked there six years ago, the company motto was “two feet on the gas pedal” because the CEO was a race car driver. I bailed after 10 months, giving up pre-IPO shares. The management for my team was nonexistent, and I was on the build and release team. People were doing releases manually. They’ve improved the automation some from what I hear, but it looks like the motto finally hit them.
I should also say their metrics were absolutely staggering. The log aggregator was doing something like 2 trillion requests a week. All backed by Splunk. I never heard what they were paying, but it must have been fucking nuts.
Preach it
This is the best summary I could come up with:
CrowdStrike’s faulty update caused a worldwide tech disaster that affected 8.5 million Windows devices on Friday, according to Microsoft.
Microsoft says that’s “less than one percent of all Windows machines,” but it was enough to create problems for retailers, banks, airlines, and many other industries, as well as everyone who relies on them.
Separately, the technical breakdown from CrowdStrike released Friday explains more about what happened and why so many systems were affected all at once.
CrowdStrike’s breakdown explains the configuration file that was at the heart of the issue:
CrowdStrike explained that the file is not a kernel driver but is responsible for “how Falcon evaluates named pipe execution on Windows systems.” Security researcher and Objective-See founder Patrick Wardle says that the explanation aligns with the earlier analysis he and others provided about the cause of the crash, as the problem file “C-00000291-” “triggered a logic error that resulted in an OS crash” (via CSAgent.sys).
CrowdStrike’s channel file updates were pushed to computers regardless of any settings meant to prevent such automatic updates, Wardle noted.
The original article contains 193 words, the summary contains 175 words. Saved 9%. I’m a bot and I’m open source!
How many Windows updates have bricked PCs over the years?
As if the borked update wasn’t bad enough, it was also forced on users that explicitly said not to install it.
CrowdStrike’s channel file updates were pushed to computers regardless of any settings meant to prevent such automatic updates
From my reading this is misleading at best and likely wrong. I don’t work with CrowdStrike Falcon but have installed and maintained very similar EDR tools in enterprise environments and the channel updates referenced are the modern version of definition updates for a classic AV engine. Being up to date is the entire point and so typically there are only global options to either grab those updates from the vendor or host them internally on a central server but you wouldn’t want to slow roll or stage those updates since that fundamentally reduces the protection from zero days and novel attacks that the product is specifically there to detect and stop. These are not engine updates in that they don’t change the code that is running, they give the code new information about what an attack will look like to allow it to detect malicious activity as soon as CrowdStrike knows what the IoCs look like.
In this case it appears that one of these updates pointed to a bad memory location, which caused the engine to crash the OS, but it wasn’t a code update that did it (like a software patch). That should have been caught in QA checks prior to the channel update being pushed out, but it’s in CrowdStrike’s interest to push these updates to all of their customers’ PCs as quickly as they can to allow detection of novel attacks.
Being up to date is the entire point and so typically there are only global options to either grab those updates from the vendor or host them internally on a central server but you wouldn’t want to slow roll or stage those updates since that fundamentally reduces the protection from zero days and novel attacks that the product is specifically there to detect and stop.
That’s not your, or CrowdStrike’s, decision to make. If organizations have applied settings to not install updates automatically then that’s what they expect to happen and you need to honour it. You don’t “know best”. They do.
I’m getting real sick of companies acting like rapists and society just accepting it, if not justifying it for them.
No means no. Plain and simple.
Being up to date is the entire point
No, it isn’t. The point is to keep systems safe and operational. Blindly rolling out untested updates is not a good strategy for that. I have seen entire systems shut down due to false alerts from updated antivirus software. Luckily only test environments, before these updates were rolled out to production. It does not take much to test updates like this before rolling them out to your entire organisation.
Our organization is configured to install N-1 of current release specifically to avoid this type of stuff. Does it matter? No, we got hit just like everyone else.
That should have been caught in QA checks prior to the channel update being pushed out…
I work in QA, and part of the job is justifying why it’s necessary to keep a team of people that doesn’t actually “produce” anything. Either their QA team is now in the hot seat, or CrowdStrike is now realizing why they need one.
Either way, it sounds like a basic smoke test would have uncovered the issue, and the fact that nobody found this means nobody bothered to do one of the most basic tests: turn it on and see if it “catches fire.”
God, even if they didn’t have QA test it, they should have had continuous integration running to test all new channel updates against all versions of their program, considering the update will affect all of them. What an epic process failure.
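To make the "test all new channel updates against all versions" idea concrete, here is a rough sketch of a matrix check that blocks publishing if any combination fails. Everything in it is hypothetical: the two toy parsers, the `CHNL` header, and the version labels are stand-ins I invented, not anything from Falcon. A real pipeline would have to load each file into the actual sensor inside a disposable VM, since a kernel crash can't be caught with a `try`/`except`.

```python
from typing import Callable, Dict, List, Tuple

Parser = Callable[[bytes], None]


def parser_v1(blob: bytes) -> None:
    """Toy stand-in for an older engine: only checks a made-up header."""
    if blob[:4] != b"CHNL":
        raise ValueError("bad header")


def parser_v2(blob: bytes) -> None:
    """Toy stand-in for a newer engine: header check plus a minimum length."""
    parser_v1(blob)
    if len(blob) < 8:
        raise ValueError("content too short")


def run_matrix(
    parsers: Dict[str, Parser], updates: Dict[str, bytes]
) -> List[Tuple[str, str, str]]:
    """Try every update against every engine version and collect failures."""
    failures = []
    for version, parse in parsers.items():
        for name, blob in updates.items():
            try:
                parse(blob)
            except Exception as exc:  # any failure should block publishing
                failures.append((version, name, repr(exc)))
    return failures


if __name__ == "__main__":
    engines = {"1.0": parser_v1, "2.0": parser_v2}
    candidates = {"good": b"CHNL1234", "short": b"CHNL", "bad": b"\x00" * 4}
    problems = run_matrix(engines, candidates)
    for version, name, error in problems:
        print(f"engine {version} rejected {name}: {error}")
    # Gate: refuse to publish if anything in the matrix failed.
    raise SystemExit(1 if problems else 0)
```

The point of running the full matrix is that a file can pass on one engine version and fail on another, which a single-version smoke test would miss.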
The distinction between that and a malicious hack consists entirely of intent.
Well that’s just terrorism then
Terrorism would require a political angle.
This is malicious incompetence.
One can argue that there is a very niche political angle to this - teaching Windows users the fear of God, so that they’d see the error of their ways. But it works in our favor, so let’s not concentrate attention on it.
I doubt it was that few.
For reals. Their self reporting is just trying to mitigate damages from the mistake