The most interesting part of Apple's WWDC26 AI announcement isn't that there are new models. It's where they run. Apple has built its entire AI pitch around privacy and on-device processing, yet one of its five new Apple Foundation Models now runs on Nvidia chips inside Google's data centres. For a company that spent two years insisting your data never leaves a trusted environment, that's a notable shift, and it tells you a lot about where Apple's AI plans have actually landed.
The five models break down into a clear split between what your iPhone handles itself and what gets sent off to a server. Working out which is which matters, because it shapes everything from how fast a feature responds to how Apple can keep making its privacy claims. Here's how the third generation of Apple Foundation Models is structured, and why the cloud side is the part worth paying attention to.
What the five new Apple Foundation Models actually do
Apple announced five models in total at WWDC26, and 9to5Mac has a useful breakdown of how they fit together. Two run on your device and three run on servers.
On the device side you get AFM 3 Core and AFM 3 Core Advanced. Core is the next version of the roughly 3-billion-parameter model that has been doing the everyday work since 2024. Core Advanced is the headline act: a 20-billion-parameter model that runs locally and is natively multimodal, which is what powers features like more expressive voices and higher-accuracy dictation.
On the server side there are three: AFM 3 Cloud, ADM 3 Cloud (Image), and AFM 3 Cloud Pro. Cloud is the general-purpose server model tuned for speed and efficiency. The Image model (the D stands for diffusion) handles image generation and editing, including the all-new Image Playground. Cloud Pro is the heavyweight, built for demanding work such as complex reasoning and agentic tool use.
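If it helps to see the split laid out as data, here's a minimal sketch of the lineup described above. The `Model` class, the `runs_on` labels, and the `by_location` helper are my own illustrative names, not anything Apple ships; the role descriptions are just paraphrased from the breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    runs_on: str  # "device" or "server" (labels are illustrative)
    role: str


# The five third-generation models, as described in the article.
LINEUP = [
    Model("AFM 3 Core", "device", "successor to the roughly 3B everyday model"),
    Model("AFM 3 Core Advanced", "device", "20B parameters, natively multimodal"),
    Model("AFM 3 Cloud", "server", "general purpose, tuned for speed and efficiency"),
    Model("ADM 3 Cloud (Image)", "server", "image generation and editing"),
    Model("AFM 3 Cloud Pro", "server", "complex reasoning and agentic tool use"),
]


def by_location(lineup):
    """Group model names by where they run."""
    groups = {}
    for model in lineup:
        groups.setdefault(model.runs_on, []).append(model.name)
    return groups


if __name__ == "__main__":
    for location, names in by_location(LINEUP).items():
        print(f"{location}: {', '.join(names)}")
```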
How a 20-billion-parameter model runs on your iPhone
A 20-billion-parameter model running on a phone sounds like it shouldn't be possible, and in a naive setup it wouldn't be. Most on-device models aimed at regular users sit in the low single-digit billions for exactly this reason: memory and battery.
Apple gets around this with a sparse architecture. Rather than firing up all 20 billion parameters for every request, the model activates somewhere between 1 and 4 billion at a time, picking the relevant ones depending on what you've asked. It's conceptually close to the Mixture of Experts approach you'll have seen elsewhere, but Apple says it's using its own pruning technique that it published in research a year ago.
The practical upshot is that you get something near the quality of a much larger model without the device grinding to a halt. That's the genuinely clever bit of engineering here, and it's the kind of thing Apple's silicon advantage is built for.
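To make the sparse idea concrete, here's a toy sketch of generic top-k expert routing, the textbook Mixture of Experts trick. To be clear, this is not Apple's pruning technique, which I haven't reproduced here. The 20 experts of 1 billion parameters each, the random router scores, and every function name are assumptions chosen only so the arithmetic lines up with "1 to 4 billion active out of 20 billion".

```python
import math
import random

NUM_EXPERTS = 20                    # hypothetical: 20 experts
PARAMS_PER_EXPERT = 1_000_000_000   # hypothetical: 1B parameters each


def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def active_fraction(k):
    """Share of total parameters used when k experts are active."""
    return (k * PARAMS_PER_EXPERT) / (NUM_EXPERTS * PARAMS_PER_EXPERT)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for a learned router: random scores, one per expert.
    scores = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]
    for k in (1, 4):
        chosen = route(scores, k)
        print(f"k={k}: experts {[i for i, _ in chosen]}, "
              f"{active_fraction(k):.0%} of parameters active")
```

On those assumed numbers, each request touches between 5% and 20% of the model's weights for compute. All 20 billion parameters still have to be stored somewhere, so sparsity on its own helps speed and battery more than it helps storage.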
The part that bends Apple's own rules: Google and Nvidia
Now to the bit that made me sit up. AFM 3 Cloud Pro, the most capable model of the lot, doesn't run on Apple silicon at all. It runs on Nvidia GPUs hosted in Google Cloud.
This follows the Gemini partnership Apple confirmed earlier this year, where Google's models became the backbone of Apple's reworked AI effort. I wrote about what that meant for Siri when it was announced, and this is the infrastructure sitting underneath it. The new Siri leans on this stack, even though Apple is keen to stress the result is not simply Gemini in a trench coat.
For context, this is a real departure. When Apple launched Private Cloud Compute in 2024, the entire selling point was that cloud AI ran in Apple's own data centres, on Apple's own chips, with privacy guarantees that outside researchers could verify. Keeping it in-house was the whole argument.
How Apple is trying to keep the privacy promise
To make the Google arrangement work, Apple has extended Private Cloud Compute to third-party infrastructure for the first time. Its security team published the technical detail, but the short version is that Apple treats Google's hardware as part of its own trusted computing base rather than just trusting Google's word for it.
A few things stand out. Apple keeps a cryptographically verifiable, append-only ledger of every piece of Google Cloud hardware in the fleet, so nothing can be quietly swapped in. For components that could leak data if compromised, the attestation is rooted in at least two independent vendors rather than one. And the same patterns from Apple silicon PCC carry over: request parsing is isolated, shared inference software is recycled on a short timer, and keys sit in a separate confidential VM cut off from external inputs.
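The ledger and the two-vendor rule are easier to picture with a small example. What follows is a deliberately simplified sketch, assuming a plain SHA-256 hash chain and a simple "at least two vendors agree" check. Apple's actual design is described in its security write-up and will be far more involved, so treat every name here (`Ledger`, `attested_by_quorum`, the record fields, the vendor labels) as hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


class Ledger:
    """Append-only log: each entry commits to everything before it."""

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self):
        """Recompute the chain; any edited or swapped entry breaks it."""
        prev = GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


def attested_by_quorum(attestations, minimum=2):
    """True if at least `minimum` independent vendors vouch for a component."""
    return sum(1 for ok in attestations.values() if ok) >= minimum


if __name__ == "__main__":
    ledger = Ledger()
    ledger.append({"component": "node-001"})
    ledger.append({"component": "node-002"})
    print(ledger.verify())

    # Quietly swapping a record without recomputing the chain is detectable.
    ledger.entries[0] = ({"component": "node-999"}, ledger.entries[0][1])
    print(ledger.verify())

    print(attested_by_quorum({"vendor_a": True, "vendor_b": False}))
```

The point of chaining the hashes is that you can't alter an old entry without invalidating every hash after it, which is what makes "append-only" something an outsider can check rather than take on trust.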
Whether that satisfies people who chose Apple specifically to avoid Google is a separate question. Technically it's a serious piece of work. Reputationally, "your most demanding AI requests run on Google's servers" is a sentence Apple would rather you didn't dwell on.
Where the training data comes from
One question I always get on posts like this is what these models were trained on. Apple says it used a mix of publicly available information, data licensed or bought from third parties, open-source data, data from dedicated studies, and synthetic data.
It's also explicit that the training did not use your personal data or your interactions with the features, which is consistent with what it has said before. Web publishers can opt out of having their content used for foundation model training, too. None of that is new for Apple, but it's worth restating given how much scrutiny training data gets these days.
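For publishers wondering what the opt-out looks like in practice, it's a robots.txt rule. The sketch below assumes the `Applebot-Extended` user agent, which is the name Apple's Applebot documentation gives for this purpose (the sources for this post don't spell it out), and uses Python's standard `urllib.robotparser` to check how a given robots.txt would be read. The `allows_training_use` helper and the example.com URL are mine, purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt out of training use, leave everything else open.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allows_training_use(robots_txt, url, agent="Applebot-Extended"):
    """Return True if the rules in robots_txt permit `agent` for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/post"
    print(allows_training_use(ROBOTS_TXT, page))                    # training opt-out
    print(allows_training_use(ROBOTS_TXT, page, agent="Applebot"))  # regular crawler
```

With that file, the training-specific agent is disallowed while the ordinary Applebot crawler falls through to the catch-all rule, so opting out of training shouldn't affect how your pages show up in Apple's search features.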
What this tells us about Apple's AI direction
Strip away the model names and the picture is fairly clear. Apple has decided it can't win the frontier-model race on its own timeline, so it's split the difference: keep as much as possible on-device where its silicon genuinely leads, and lean on Google for the heavy cloud workloads it can't yet match in-house.
The 20-billion-parameter on-device model is Apple playing to its strength. The Google-hosted Cloud Pro model is Apple admitting a weakness and wrapping it in enough security engineering that the privacy story still holds. If you've been treating Apple's AI strategy as a slow, deliberate build rather than a sprint, this is exactly the kind of pragmatic compromise you'd expect.
For most people, none of this will be visible. You'll ask Siri something, and a feature will work or it won't. But the architecture underneath is the most honest signal yet of where Apple actually sits in the AI race, and it's a more complicated position than the keynote let on.
FAQ
What are Apple Foundation Models?
Apple Foundation Models (AFM) are the AI models that power Apple Intelligence features across iPhone, iPad, and Mac. The third generation, announced at WWDC26, includes five models split between on-device processing and cloud servers.
Does Apple Intelligence run on Google's servers now?
Partly. One of the five third-generation models, AFM 3 Cloud Pro, runs on Nvidia GPUs hosted in Google Cloud rather than Apple's own data centres. Apple has extended Private Cloud Compute to cover that third-party hardware while trying to maintain its existing privacy guarantees.
Is my data used to train Apple's models?
Apple says no. It states that training did not include user data or interactions, and that the models were trained on a mix of licensed, publicly available, open-source, study-based, and synthetic data instead.
Source: 9to5Mac, with detail from Apple's Machine Learning Research and Security blogs.