PyTorch for AMD lands on Windows 11 — the CUDA monopoly cracks

Learn CAPTAIN LIN RAUCH

AMD's PyTorch on Windows 11 public preview lets a Radeon workstation run Llama 3.2 locally — no WSL, no cloud, FADP-clean by default for Swiss studios.

Captain Lin Rauch

4 June 2026 · 06:50

AMD published the missing piece this week: PyTorch for AMD on Windows is now in public preview, and a Radeon RX 7900 XTX in a Windows 11 box can run Llama 3.2 1B natively — no WSL, no dual-boot, no Linux migration. The GPUOpen guide documents it as a beginner recipe: python -m venv, pip install, and a Hugging Face transformers snippet that loads the model and answers a prompt.

←TODAY: PyTorch for AMD ROCm on Windows entered public preview in 2026; a single Radeon workstation now runs Llama 3.2 locally, FADP and EU AI Act compliant by default.
→3012: In Zurich-3012, every studio holds its own model. The sovereign-compute baseline is one box per desk, not one API per pocket.
Fulcrum: The CUDA monopoly cracked because the demand side — architects, hospitals, ministries — refused to ship their briefs to someone else’s data centre.

The dependency graph you couldn’t draw before

Read the supported list AMD prints: Radeon RX 9070 XT down to the RX 9060 XT, the workstation PRO W7900, and the Ryzen AI Max+ 395 APU. All Windows 11. ROCm — Radeon Open Compute, AMD’s open counterpart to CUDA — has existed since 2016, but lived inside a Linux-only sandbox for nearly a decade. The architecture and engineering desktop market is overwhelmingly Windows. Until this preview, that mismatch was the single point of failure that kept Swiss BIM studios on either NVIDIA hardware or rented cloud inference.

The bottleneck logic also shifted. KOG.ai reframed it cleanly: single-request LLM decoding is memory-bandwidth-bound, not FLOPS-bound. A Radeon RX 7900 XTX ships with ~960 GB/s of bandwidth and 24 GB of VRAM — the same regime as some datacentre cards, at single-workstation price. Stop chasing FLOPS. Chase the bus.

What you actually get on a desk this morning

The MindStudio team, writing independently of AMD, confirms what the GPUOpen post stops short of saying: in 2026 ROCm usably supports PyTorch, Ollama, LM Studio, and ComfyUI. If venv-and-pip is more friction than you want, Ollama and LM Studio now run on the same Radeon stack with a one-click installer. Pick your altitude.

An honest comparison since AMD won’t print it: fine-tuning is still smoother on CUDA. Inference parity is real; training parity isn’t yet. Plan accordingly.

Atelier: A Zurich practice that indexes its IFC exports, RFI history, and SIA 380/1 compliance PDFs against a local Llama-class model — running inside the same workstation as Archicad and Grasshopper — keeps every line of client material off OpenAI’s and Anthropic’s data centres by construction. This is the PAZ-GPT doctrine in hardware form: one GPU on the desk, one model per studio, one compliance story that fits on a single sheet for FADP and the EU AI Act both.

Hack: This Hack teaches you to print your real PyTorch GPU stack, so you know whether ROCm, CUDA, or a silent CPU fallback is actually doing the work — before you burn a weekend on a setup you misread. The domain is Workflow; the medium is Python; the move is one diagnostic. Paste this inside your activated llm-pyt environment:

import torch

print("torch:", torch.__version__)
print("cuda_available:", torch.cuda.is_available())
if torch.cuda.is_available():
    i = 0
    p = torch.cuda.get_device_properties(i)
    print("device:", torch.cuda.get_device_name(i))
    print("VRAM_GB:", round(p.total_memory / 1e9, 1))
    print("HIP_build:", torch.version.hip)   # not None => ROCm under the hood

If HIP_build prints a version string, you are on ROCm even though the namespace still says “cuda” — PyTorch keeps the CUDA names for portability. If cuda_available returns False, the install fell back to CPU and the warning you ignored was load-bearing.

The warning a Cartographer leaves on the desk

From the late 2070s, the single point of failure I keep returning to is not the GPU and not the model. It is the cooling, the bandwidth, and the people who remember how the old system worked. The lesson of our decade was that compute centralised faster than the grid kept up. Drawing your own dependency graph this week — not the architecture diagram you show the client, the dependency graph you actually run on — is the move that buys years of optionality.

Open cmd. Create the venv. Run the diagnostic above. If the HIP line prints, your studio now holds its own inference stack. That is the action.

Sources & Further Reading

Primary: AMD GPUOpen — A beginner’s guide to deploying LLMs with AMD on Windows using PyTorch
Reinforcing: MindStudio — Running Local AI on AMD: ROCm, Ollama, and LM Studio Performance in 2026
Reinforcing: KOG.ai — Real-time LLM Inference on Standard Datacenter GPUs

FILED FROM

Captain Lin Rauch

CO-SIGNERS

PAZ Academy

CONFIDENCE

HIGH

REPRINTS

SOURCE · ↗

			⚑ REPORT AN ERROR · SUBMIT A CORRECTION		

◂ BACK TO FRONT PAGE · PAZ KAFFI

PAZ Kaffi