Anthropic's Mythos AI Broke Apple macOS in 5 Days. Here's the B2B Take.

Calif researchers used Anthropic's Mythos AI to break Apple's Memory Integrity Enforcement on M5 in 5 days. Apple will patch it. The harder question for B2B SaaS founders is what happens to vendor due diligence when finding zero-days gets ten times cheaper, and how sellers stay deal-ready.

May 23, 2026



Saturday morning, coffee, phone, WSJ. I scrolled past three other headlines and stopped on the Mythos one. Read it through, put the phone down, picked it back up an hour later, and read it again. The second read was after a customer call where someone on the buy side asked us (for what felt like the tenth time this month) what our patching cadence looks like for upstream dependencies.

I'm going to try to make sense of the story here, because I don't think most B2B founders or security leads have absorbed what just changed.

The short version, if you missed it. Researchers at a small Palo Alto firm called Calif used an early version of Anthropic's Mythos AI model to find two unknown bugs in macOS. They chained those bugs together with some other techniques and bypassed Apple's Memory Integrity Enforcement on an M5 Mac. Started as an unprivileged local user, ended with a root shell. Five days from "we think we found something" to a working exploit.

The Wall Street Journal broke it on May 14. Apple is reviewing the 55-page report Calif hand-delivered to Cupertino, and the spokesperson's statement was the usual: "Security is our top priority, and we take reports of potential vulnerabilities very seriously."

Apple sank close to five years into building MIE. (Reportedly a billion dollars, though that number is from WSJ and I'd hold it loosely.) Calif's team broke through it in five days using an AI that most of the public can't even get access to.

Here's why that matters for the rest of us.

What Actually Happened. The Facts, Not the Hot Takes.

Mythos is the AI model Anthropic has been keeping under wraps because its own engineers have said it's too good at finding software bugs to release. About 40 partner organizations get controlled access through something called Project Glasswing. Apple is one of those partners, which is its own interesting wrinkle.

Calif tested Mythos against macOS 26.4.1 running on bare-metal M5. The exploit chain stitches two vulnerabilities together with several techniques to corrupt kernel memory. Privilege escalation. From a normal local user account up to root, using only standard system calls. According to Calif, this is the first public macOS kernel memory corruption exploit demonstrated against MIE on M5.

One nuance most of the coverage glossed over, and I think it's important. This wasn't autonomous AI hacking a Mac. Calif's CEO, Thai Duong, told the Journal the exploit "couldn't have been pulled off by Mythos alone and leveraged the very human cybersecurity expertise of some of Calif's hackers." His read on Mythos is that it's currently strong at recognizing and combining patterns from known attack categories. Not yet at inventing brand new ones.

Five days, though. With humans doing maybe 60% of the work and the AI doing the rest. (My estimate, not Calif's.)

Mythos has form on this. Earlier this year it reportedly surfaced over 100 high-severity vulnerabilities in Mozilla Firefox in two weeks. It also found a flaw in OpenBSD that had been hiding for 27 years. Some folks in the research community are calling this moment "Bugmageddon." A little dramatic, sure. But I get the impulse.

Jon's Take: What Defenders Should Actually Be Worried About

I sent the story to Jon on Saturday afternoon. He'd already read it. We talked through it on Monday morning before our standup, and I want to share his take because I think it's clearer than mine.

The Mythos news isn't really an Apple story. Apple has world-class security engineering, an unreal R&D budget, and a vertically integrated hardware stack. They're one of the toughest targets on the planet. So if a five-person team plus a frontier AI can get an exploit chain together on Apple's hardened kernel in under a week, Jon's question is what happens when someone points that same capability at the average B2B SaaS company with a half-audited dependency graph running on someone else's cloud.

His answer (paraphrasing): the cost of finding zero-days just collapsed. Probably by an order of magnitude. Maybe more.

That cost used to be the defensive moat for almost everyone except nation-states and well-resourced criminal groups. Hard problems took months of unpaid expertise. Now the expertise is being augmented by something that pattern-matches across the entire public corpus of known exploit techniques at machine speed. Threat actors who used to recycle known CVEs can now find their own.

Jon's actual advice for security teams reading this, in the order he gave it to me on the call:

Patch hygiene is no longer something you can defer to next quarter. Mythos and tools like it will drive a lot of newly disclosed CVEs over the next twelve months. If you're more than a few weeks behind on critical patches across your stack, that gap is going to start mattering in a way it didn't six months ago.

Pull your incident response runbook off the shelf and actually run a tabletop. Jon suggests gaming out a vendor compromise specifically, because that's where he sees most B2B SaaS programs running thin. Most people have a runbook for "ransomware hits our laptop fleet." Far fewer have one for "the auth provider we use just had a P0 disclosed."

Re-look at detection coverage. AI-discovered bugs sometimes get exploited in ways that don't trigger signature-based rules, because the technique combos haven't been seen in the wild yet. Behavioral detection becomes more useful.

And then this last one is mine more than Jon's. Talk to your revenue team about what your security review SLA looks like. Because the next conversation is the one I want to have now.
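To make the patch-hygiene point concrete, here is a minimal Python sketch that flags critical fixes left unapplied for too long. The `PendingPatch` record, the component names, and the 21-day threshold (my reading of "a few weeks") are all assumptions for illustration, not anything from Jon's or Calif's tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PendingPatch:
    component: str   # hypothetical dependency name
    severity: str    # e.g. "critical", "high", "medium"
    released: date   # date the upstream fix became available


def overdue_patches(pending, today, max_age_days=21, severity="critical"):
    """Return unapplied patches of the given severity older than max_age_days, oldest first."""
    late = [
        p for p in pending
        if p.severity == severity and (today - p.released).days > max_age_days
    ]
    return sorted(late, key=lambda p: p.released)


# Illustrative data only; these components and dates are made up.
backlog = [
    PendingPatch("auth-sdk", "critical", date(2026, 4, 10)),
    PendingPatch("image-lib", "high", date(2026, 3, 1)),
    PendingPatch("tls-proxy", "critical", date(2026, 5, 15)),
]

today = date(2026, 5, 23)
for patch in overdue_patches(backlog, today):
    age_days = (today - patch.released).days
    print(f"{patch.component}: critical fix unapplied for {age_days} days")
```

In practice the backlog would come from whatever dependency scanner or asset inventory you already run. The point of the sketch is only that "a few weeks behind" should be a number someone checks, not a feeling.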

What This Means If You're Selling Software to Enterprises

I want to switch from the security frame to the founder frame for a minute, because that's where I sit and that's the version of this story I'm not seeing other people write.

Your buyers' security teams are reading the same article. They're not panicking. But they are recalibrating what "vendor due diligence" needs to look like in a world where any vendor's software might have AI-discoverable bugs that haven't been disclosed yet. Vendor questionnaires get longer. Trust Center expectations get harder. SOC 2 Type 2 stops being a finish line and becomes table stakes.

The deals that close in the next two quarters will be the ones where the seller can move at the buyer's new pace.

That's the part most founders aren't ready for. Buyers aren't about to stop buying software. They'll just buy faster from vendors who can answer hard security questions in three days instead of three weeks, and slower (or not at all) from everybody else.

This is the certified-to-deal-ready gap. I talk about it a lot. Probably too much. But it's the thing I keep watching kill deals at the late stage, especially with security-conscious buyers. You can have every certification on the wall (SOC 2 Type 2, ISO 27001, HIPAA) and still lose to a competitor because your team takes nine business days to return a due diligence questionnaire and your legal redlines come back two weeks after the prospect's MSA hits your inbox.
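If you want to measure that turnaround rather than guess at it, a rough Python sketch looks like this. It counts weekdays only and ignores holidays, and the function name, the dates, and the three-day target are assumptions for illustration, not a description of how any particular tool does it.

```python
from datetime import date, timedelta


def business_days_elapsed(received: date, returned: date) -> int:
    """Count weekdays after `received`, up to and including `returned`.

    Holidays are ignored in this sketch.
    """
    days = 0
    current = received
    while current < returned:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def meets_sla(received: date, returned: date, target_days: int = 3) -> bool:
    """True if the questionnaire came back within the target number of business days."""
    return business_days_elapsed(received, returned) <= target_days


# Hypothetical example: a questionnaire received on Monday, June 1, 2026.
received = date(2026, 6, 1)
slow = business_days_elapsed(received, date(2026, 6, 12))  # returned the following Friday
fast = business_days_elapsed(received, date(2026, 6, 4))   # returned that Thursday
print(slow, meets_sla(received, date(2026, 6, 12)))
print(fast, meets_sla(received, date(2026, 6, 4)))
```

Tracking this per questionnaire gives security and revenue a shared number to argue about, which is most of the battle.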

The Mythos story is going to widen that gap for any vendor who doesn't get ahead of it.

How Cyberbase and YSecurity Actually Fit Here

I'll be upfront about why I'm writing this. I have a horse in the race, two of them, and I think it's better to just say so than to pretend this is a neutral piece of analysis.

Cyberbase exists because Jon and I lived this problem from opposite sides. He spent years inside enterprise security teams running third-party reviews. I spent years on the seller side watching deals stall on security and legal cycles that nobody had figured out how to compress. The thing we built is the thing we wanted back then.

What Cyberbase actually does:

We're a deal accelerator. The product handles AI contract redlining, due diligence automation, and the Trust Center. Our customer Augment Code processed 155 contracts and saved 743 hours running their playbook through our product. That's the number I quote because it's real, and because hours saved is a better metric than the kind of magical ROI numbers most security tools market with.
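For a sense of scale, the per-contract average falls straight out of those two figures. This is just the arithmetic, not a number Augment Code reported separately:

```python
contracts = 155    # contracts processed, per the figure above
hours_saved = 743  # total hours saved, per the figure above

hours_per_contract = hours_saved / contracts
print(f"{hours_per_contract:.1f} hours saved per contract on average")
```

That is a bit under five hours per contract averaged across the whole set. Individual contracts will land above or below that.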

YSecurity is Jon's separate advisory practice. Different company. Same founders. When a customer needs hands-on senior security work (program design, penetration testing, IR readiness, board-level guidance), that's where it lives. The two firms work next to each other. Cyberbase is the software. YSecurity is the senior human judgment.

In a world where the cost of finding bugs has just collapsed, both halves of that matter more, not less. Tools alone won't save you. Neither will headcount alone. The teams I see weathering this transition well are the ones who pair good software with experienced people and let each do what it's actually good at.

The Quiet Part of the Calif Story

Here's the thing I keep coming back to.

The AI didn't do this alone. It needed Bruce Dang, who found the bugs on April 25th. It needed Dion Blazakis, who joined two days later. It needed Josh Maine to build the tooling. Five days of intense human work on top of an AI that did the pattern recognition.

Same shape on defense. The companies that get through the next stretch won't be the ones with the most security tools or the longest list of certifications. They'll be the ones who pair tools that actually work with people who actually know what they're doing.

The Calif researchers were so excited by what they'd built that they drove to Apple Park to deliver the report in person. There's something I love about that. Humans who care about the problem, using a tool that finally matches the scale of it, and doing the right thing with what they found.

Apple will patch this. Duong told the Journal he thinks the bugs "will likely be fixed pretty quickly," and I believe him on that.

The bigger story isn't this exploit at all, though. It's what happens to your buyer's vendor risk program in six months, and whether your security and revenue teams have had the conversation they need to have. If they haven't yet, this week is a fine week to put the meeting on the calendar.

Want to see how Cyberbase cuts security review cycles for your sales team? Book a 15-minute walkthrough.

Working on a security program that needs senior advisory support? YSecurity provides hands-on security leadership for B2B software companies. Jon also hosts The Security Podcast of Silicon Valley.

Frequently Asked Questions

What is Anthropic's Mythos AI model?

Mythos is an Anthropic AI model that hasn't been released publicly because Anthropic's own engineers have said it's too capable at finding software vulnerabilities to put in the wild. About 40 partner organizations have controlled access through Anthropic's Project Glasswing program. Apple, Google, and Microsoft are among them.

Did the AI hack Apple by itself?

No. Calif CEO Thai Duong told the Wall Street Journal the exploit could not have been built by Mythos alone and required significant human cybersecurity expertise. Mythos is effective at recognizing and combining known attack patterns. Humans did the design work.

What is Apple's Memory Integrity Enforcement (MIE)?

MIE is a hardware-assisted security feature Apple built into its M5 silicon to protect kernel memory from corruption attacks. According to WSJ, Apple invested close to five years of engineering into MIE. Calif's exploit is reportedly the first public demonstration of bypassing MIE on M5 hardware.

Are Mac users at immediate risk right now?

No widely exploitable threat has been confirmed. The exploit requires local access (an unprivileged user account on the machine) and additional capabilities to be useful in a real attack chain. Apple is reviewing Calif's 55-page report and is expected to ship a patch.

What should B2B SaaS security teams do in response?

Get patch hygiene tight across your supply chain. Run an incident response tabletop this quarter, ideally one that games out a vendor compromise. And start the conversation between your security and revenue teams about how vendor due diligence is going to change in the next two quarters.

How does AI change the vulnerability disclosure landscape?

It lowers the cost of finding zero-day vulnerabilities, possibly by an order of magnitude. Mythos has reportedly found over 100 high-severity vulnerabilities in Mozilla Firefox in two weeks, plus a 27-year-old flaw in OpenBSD. Defenders should expect a sustained increase in disclosed vulnerabilities and prepare patching and detection workflows accordingly.

How does Cyberbase help with vendor security and trust?

Cyberbase is a deal accelerator that closes the certified-to-deal-ready gap with AI contract redlining, due diligence automation, and a Trust Center. Customers like Augment Code have saved 743 hours and processed 155 contracts with Cyberbase.
