Apple unveils new Apple Foundation Models: two on-device models, including a 20B-parameter multimodal model called AFM 3 Core Advanced, and three cloud models

Apple Machine Learning Research 2026-06-09

Context & Ripple Effects

Apple’s model work has moved from the MM1 multimodal research series to Apple Intelligence models split between on-device execution and Apple-silicon server capacity. In 2025, Apple also opened its on-device model to developers through the Foundation Models framework.

The new lineup extends that trajectory with a larger multimodal on-device model alongside multiple cloud models, making Apple’s model stack, not only its assistant features, a more explicit platform layer.

First-order effects

Apple gains a broader in-house model portfolio for routing AI tasks between devices and its cloud infrastructure, with multimodal capability available on-device through AFM 3 Core Advanced.
Developers using the Foundation Models framework have a clearer basis for building Apple Intelligence-adjacent experiences around a range of Apple-provided models, rather than treating the on-device model as a single fixed capability.

Second-order effects

The expanded device-and-server lineup increases pressure on rival device platforms to offer developers similarly integrated AI access while preserving a coherent path across local and cloud inference.
More capable on-device multimodal models can shift some app AI workloads away from third-party cloud APIs, while cloud models remain relevant for tasks that exceed device constraints.

Third-order effects

If Apple continues pairing developer access with progressively stronger local and cloud models, AI differentiation may increasingly be determined by control of the device, operating system, developer framework, and inference infrastructure together.
The split architecture points toward a durable hybrid AI model: local execution where device capability and data handling matter, with cloud models serving the workloads that require greater capacity.

The trend: A shift from standalone chatbot models toward vertically integrated, hybrid on-device-and-cloud AI platforms embedded in major operating systems.

Discussion

@artisny @artisny on bluesky
Two on-device models are enough for me. And personally, I understood the vision presented in the Keynote. No more “AI for AI's sake”—cheers to that.
@awnihannun Awni Hannun on x
It's very cool that Apple shipped a 20B parameter on-device. You can't put 20B parameters in RAM at any reasonable precision. To make it work they are using pretty exotic architecture by today's standards. A small model predicts from the query (or prompt) which experts to load [i…
@jukan05 Jukan on x
I'm curious whether Apple's FFN NAND-like approach reduces the mobile DRAM requirement needed for on-device AI. If so... why doesn't Nvidia use this kind of technology? Wouldn't that mean the 128GB in N1X is overkill? [image]
@eric_seufert Eric Seufert on x
Apple announced major upgrades to the Foundation Model Framework today, but the framework itself is not new: Apple has made its own proprietary, 3B LLM available through the Foundation Model Framework at last year's WWDC. What is new is that the FMF now serves as a model router
@mweinbach Max Weinbach on x
AFM Core Advanced is just for the iPhone 17 Pro, M3+ Mac, and M4+ iPad It's a sparse model, fully multimodal, and unlike any other on-device model AFM Core is for other devices, a dense on-device model
@anemll @anemll on x
https://machinelearning.apple.com/ ... Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND). Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing deci…
@kautukkundan @kautukkundan on x
Apple is not squeezing a small model to be good, They're making a large model behave small! AFM 3 Core Advanced (Can I call it AFM 3 pro?) is a 20B parameter model that activates only 1-4B parameters at inference time, stored in flash and loaded into DRAM on demand. Not cloud.
@kimmonismus @kimmonismus on x
Apple's new foundation models are genuinely exciting. The standout is AFM 3 Core Advanced, a 20-billion (!) parameter model that runs entirely on-device. Read that again. 20-billion, on-device, iPhone 17 Pro. It pulls this off by keeping the full model in flash memory and [image]
@zephyr_z9 @zephyr_z9 on x
Interesting approach from Apple They are storing the shared attention block in the DRAM While the FFN weights stay in NAND and are loaded in the DRAM, depending on the request Apple is facing 3 constraints - 1) Limited DRAM size 2) Large model size (20B params) 3) Slow NAND read
@mweinbach Max Weinbach on x
More on each of the new Apple Foundation Models AFM Core Advanced is likely the most impressive on-device model available https://machinelearning.apple.com/ ... [image]
@timkellogg.me Tim Kellogg on bluesky
whoah this Apple tech is cool — they run a 20B model in-memory, but that's far too big to actually fit in memory, so they use a tiny classifier to select which experts to load, once per inference instead of per output token — machinelearning.apple.com/research/ int... [image…
r/apple r/apple on reddit
Introducing the Third Generation of Apple's Foundation Models
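
Several of the posts above describe the same idea: a 20B-parameter model kept in flash, with only about 1-4B parameters loaded into DRAM, and the choice of experts made once per request rather than per output token. The Python sketch below works through the rough memory arithmetic and a prompt-level selection step. The bit widths, the expert names, the router scores, and both function names are illustrative assumptions, not figures or APIs from Apple.

```python
# Illustrative arithmetic only. Precisions and router scores are assumed,
# not taken from Apple's published material.

def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def select_experts(prompt_scores: dict[str, float], budget: int) -> list[str]:
    """Pick the top-`budget` experts once per request (hypothetical router output)."""
    ranked = sorted(prompt_scores, key=prompt_scores.get, reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Full 20B model at a few assumed precisions.
    for bits in (16, 8, 4):
        print(f"20B total at {bits}-bit: {weight_gb(20, bits):.1f} GB")

    # Active subset of 1B to 4B parameters at an assumed 4-bit precision.
    for active in (1, 4):
        print(f"{active}B active at 4-bit: {weight_gb(active, 4):.1f} GB")

    # Hypothetical per-prompt scores; experts are chosen before decoding starts,
    # so flash reads happen once per request instead of once per token.
    scores = {"expert_a": 0.10, "expert_b": 0.90, "expert_c": 0.50}
    print("load from flash:", select_experts(scores, budget=2))
```

Under these assumptions the full model needs tens of GB at 16-bit and about 10 GB even at 4-bit, while a 1-4B active subset at 4-bit needs roughly 0.5 to 2 GB, which is the gap the flash-plus-selective-loading design is described as closing.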

Chronicles