CrowdStrike says a Falcon sensor configuration update on Windows triggered a logic error that resulted in a system crash and BSOD, remediated after 78 minutes
Thankfully, Macs weren't affected by last week's catastrophic …
Anthony Ha / TechCrunch : TechCrunch Minute: What caused last week's major tech outage?
CrowdStrike : Likely eCrime Actor Uses Filenames Capitalizing on July 19, 2024, Falcon Sensor Content Issues in Operation Targeting LATAM-Based CrowdStrike Customers
BBC : ‘Significant number’ of devices fixed - CrowdStrike
Cybersecurity and Infrastructure Security Agency : Widespread IT Outage Due to CrowdStrike Update
CrowdStrike : Remediation and Guidance Hub: Falcon Content Update for Windows Hosts
Lance Whitney / ZDNET : Microsoft releases a CrowdStrike recovery tool - here's how it works
Brendan Gregg : No More Blue Fridays — In the future, computers will not crash due to bad software updates …
Jordan Novet / CNBC : CrowdStrike shares tumble as fallout from global tech outage continues
Wayne Williams / BetaNews : Amusing BSoDs from around the world following CrowdStrike's disastrous Windows outage
Josh Norem / ExtremeTech : Microsoft Releases CrowdStrike ‘Issue’ Recovery Tool
Martin Brinkmann / Ghacks : CrowdStrike in a nutshell: how a faulty software update took down millions of Windows PCs
Nick Evanson / PC Gamer : It's not just Windows PCs that have gone belly up, as CrowdStrike's Falcon software has been b0rking Linux-powered computers, too
David DiMolfetta / Nextgov/FCW : How the CrowdStrike outage carved out new opportunities for hackers
Ben Lovejoy / 9to5Mac : CrowdStrike aftermath: Microsoft claims it cannot legally implement the same protections as Apple
SANS Internet Storm Center, InfoCON : The Monday After, (Mon, Jul 22nd)
Becky Bracken / Dark Reading : Fallout From Faulty Friday CrowdStrike Update Persists
Danny D'Cruze / Business Today : ‘We apologise for the disruption’: CrowdStrike says significant number of affected Microsoft devices are operational
Maximiliano Contieri / HackerNoon : Code Smell 260 - Crowdstrike NULL
Mathew J. Schwartz / HealthcareInfoSecurity.com : Microsoft Sees 8.5M Systems Hit by Faulty CrowdStrike Update
Ben Thompson / Stratechery : Crashes and Competition — I've long maintained that if the powers-that-be understood …
Ernestas Naprys / Cybernews.com : Disastrous CrowdStrike update disrupts 8.5 million systems, Microsoft offers recovery tool
Rafly Gilang / MSPoweruser : CrowdStrike says “a significant number” of impacted Windows devices are now back online
Benedict Collins / TechRadar : CrowdStrike outlines just what went wrong with its update — as many systems around the world are now back up
Nino Bucci / The Guardian : Crowdstrike tells Australian government it is ‘close to rolling out automatic fix’ after global outage
Pradeep Viswanathan / Neowin : CrowdStrike broke Debian and Rocky Linux months ago, but no one noticed
Wall Street Journal : Microsoft says it cannot wall off its OS due to a 2009 deal with the EC to give security software makers the same level of access to Windows that Microsoft gets
Wes Davis / The Verge : CrowdStrike's faulty update crashed 8.5 million Windows devices, says Microsoft
David Weston / The Official Microsoft Blog : Microsoft estimates that CrowdStrike's update affected 8.5M Windows devices, or less than 1% of all Windows machines

Threads:
Ben Weston / @benweston88 : Also — directly on CrowdStrike's process: 1. How the fuck did QA testing not pick up a bug with a 100% success rate at killing its target system? 2. Why the fuck don't a company with the userbase size and value they have operate a staged rollout policy?! This isn't 2005. …

X:
Tavis Ormandy / @taviso : This strange tweet got >25k retweets. The author sounds confident, and he uses lots of hex and jargon. There are red flags though... like what's up with the DEI stuff, and who says “stack trace dump”? Let's take a closer look... 🧵1/n [image]
Rakesh Agrawal / @rakeshsfnyc : Absurd that Crowdstrike is claiming they had a resolution in one hour. While that *may* be technically true, having left customers computers in a state where they couldn't install the fix makes it realistically untrue.
Tae Kim / @firstadopter : CrowdStrike deserves the blame. They failed basic testing QA, which is unacceptable at their customer scale and kernel access. No one has been more critical (and correct) about Microsoft's poor gaming practices and strategy than me, but Microsoft isn't at fault here. [image]
Steven Sinofsky / @stevesi : Cause—coding error, testing oversight, specification incorrect, operator confusion, etc. Mechanism—divide by zero, pointer out of bounds, illegal operation, resource limit/contention, incorrect directions to operator, etc. Manner—software failed, hardware broke, networking
Jaana Dogan / @rakyll : This gives insights why a typical staged rollout didn't catch the bug. CrowdStrike made a compromise to roll out config changes faster. In my experience, config changes are no different from code changes. And they are usually more error-prone than code. https://x.com/...
Matthew Prince / @eastdakota : @IAmDougLewis @CrowdStrike I guarantee you they have tight controls on code roll out. They have looser controls on config rollout. It's tough as a security company because you see a new threat and you want to fix it fast. You don't expect your own config to explode. But sometimes it does.
Tom Warren / @tomwarren : this isn't the first time that CrowdStrike's csagent.sys kernel driver has caused Windows BSODs. I'd imagine many executives are waking up this morning and immediately looking at moving away from CrowdStrike. It's very hard to win back trust after an event like this
Patrick Wardle / @patrickwardle : I don't do Windows but here are some (initial) details about why the CrowdStrike's CSAgent.sys crashed Faulting inst: mov r9d, [r8] R8: unmapped address ...taken from an array of pointers (held in RAX), index RDX (0x14 * 0x8) holds the invalid memory address @_JohnHammond [image]
Steven Sinofsky / @stevesi : Airlines already use mobile devices for gate checkin, lounges, and kiosks. Hotels are the same. Even TSA. Hospitals already use connected systems via browser and/or Citrix. From now, the only strategy that is not negligence is to move critical infrastructure to mobile devices.
Steven Sinofsky / @stevesi : There needs to be a post outlining the manner, cause, and mechanism of the failure. Then the specific remediation. It feels like they are saying there was a corrupt descriptor file (mechanism = failure of format)—though these files are more than data and are likely a
@perpetualmaniac : Crowdstrike Analysis: It was a NULL pointer from the memory unsafe C++ language. Since I am a professional C++ programmer, let me decode this stack trace dump for you. [image]
@jperlow : The beatings will continue until morale improves
Frank X. Shaw / @fxshaw : Helping our customers through the CrowdStrike outage https://blogs.microsoft.com/ ...
@norootcause : I gotta admit, named pipes is not something that comes up often in incident write-ups. Didn't even know that Windows supported them! https://www.crowdstrike.com/ ...
Vangelis Koukis / @vkoukis : It's a shame that the technical bulletin on the global @CrowdStrike incident avoids being explicit about what the root cause was. So, let's embark on a bit of guessing. The bulletin, for context: https://www.crowdstrike.com/ ... [Thread ⬇️]
@0xtib3rius : Interesting line from the #CrowdStrike writeup: “This is not related to null bytes contained within Channel File 291 or any other Channel File.” (Channel Files are the .sys files which numerous people reported null bytes in) https://www.crowdstrike.com/ ...
Toby Murray / @tobycmurray : CrowdStrike has since “clarified” ( https://www.crowdstrike.com/ ...): 1. It was not a “driver” but a (kernel loaded) “configuration file” that updated how Falcon “evaluated named pipe execution” 2. It was not related to null bytes (i.e. zeros) in the file Clear?
Andrew Dwyer / @drandrewdwyer : Here's CrowdStrike's technical analysis... which says little about *how* or *why* this happened. I'm sure we'll find out in due course. https://www.crowdstrike.com/ ...
Rob Mensching / @robmen : The technical details provided by Crowdstrike thus far refute some of the worst takes on Twitter. That's some goodness. Now we wait for the root cause analysis to answer the core question: Why wasn't this caught earlier (testing/staging/etc.)? Learning. https://www.crowdstrike.com/ ...
Kevin Beaumont / @gossithedog : Here's CrowdStrike's mini root cause analysis of what happened yesterday: https://www.crowdstrike.com/ ... It's basically exactly as commonly thought, i.e. a bad content update was pushed which caused the CrowdStrike driver to crash Bunch of clear learnings for CrowdStrike, e.g. testing
Spencer / @techspence : Ok so cs says despite the .sys it was not a kernel driver. I missed that part. Also calling it a logic error which makes it sound trivial. What am I missing? https://www.crowdstrike.com/ ...
Jamie Bartlett / @jamiejbartlett : Criminals now looking to exploit this IT outage by claiming to be IT professionals ready to help. This is the most common trick in the book - as I wrote about here Be VERY wary of anyone turning up unannounced saying they'll help!
George Kurtz / @george_kurtz : As CrowdStrike continues to work with customers and partners to resolve this incident, our team has written a technical overview of today's events. We will continue to update our findings as the investigation progresses. https://www.crowdstrike.com/ ...
@arekfurt : If you haven't seen it, per Crowdstrike here's the concise explanation on how its bad updates actually wound up breaking Windows: (No more official technical detail at this time on what the “logic error” actually did at low-levels.) https://www.crowdstrike.com/ ... [image]
@loxyflo : Anyone know how Liz Truss's first day at Microsoft is going?
@jason : I guess crowdstrike doesn't do staged rollouts?
Matthew Prince / @eastdakota : We should be careful creating incentives for systems' designers where when something goes wrong the right answer to satisfy the lawyers is to fail open. #thatsnotsecurity
Scott Hanselman / @shanselman : Here's the thing folks. I've been coding 32 years. When something like this happens it's an organizational failure. Yes, some human wrote a bad line. Someone can “git blame” and point to a human and it's awful. But it's the testing, the CI/CD, the A/B testing, the metered
Steven Sinofsky / @stevesi : Kernel mode is *the* problem. In 2024 changing software from third parties via a private update channel is about the highest risk setup and should not be a generally available capability. And if it is it should not be used in critical systems.
@k8em0 : On the CrowdStrike outage: Most organizations of a certain size test software updates before deployment. They do not test “content updates” from OS or security software, but set them to automatically update because they are viewed as safe. IT departments just got a new daily task
@hackerfantastic : Are we *sure* the @CrowdStrike crash wasn't deliberate? They pushed a file full of NULL bytes to their agents which caused the BSoD...

LinkedIn:
CrowdStrike : CrowdStrike continues to focus on restoring all systems as soon as possible. Of the approximately 8.5 million Windows devices that were impacted …
Mahmoud Marzouk : 4 statements in 24 hours!! — Taking full responsibility of the disruption, providing transparency to the customers …
Mikhail Sosonkin : CrowdStrike memes have been a good laugh! But, I giggled with a grain of fear because the reality is if you built production code, you've been in this position. …

Forums:
Hacker News : CrowdStrike's Falcon Sensor also linked to Linux kernel panics and crashes
r/technews : CrowdStrike's Falcon Sensor also linked to Linux kernel panics and crashes
r/microsoft : CrowdStrike's faulty update crashed 8.5 million Windows devices, says Microsoft
r/technews : CrowdStrike's faulty update crashed 8.5 million Windows devices, says Microsoft
Lobsters : Technical Details: Falcon Update for Windows Hosts | CrowdStrike