CS336: Stanford Teaches the LLM From Scratch — and Why That Beats the Black Box

Stanford's CS336 makes you build an LLM from scratch — no API. A from-scratch lens for architects auditing AI tools, lock-in, and vendor exit clauses.

Sera Vex

23 June 2026 · 07:00

The fastest way to lose control of a tool is to never see inside it. Stanford’s CS336: Language Modeling from Scratch, taught by Percy Liang and Tatsu Hashimoto, is a five-unit course built on a single refusal: you do not get to call an API. You build the tokenizer, the Transformer, the optimizer, the training loop and the evaluation harness, and only then do you train a model. The course can be followed from outside Stanford: lectures (Mon/Wed, Skilling Auditorium) are posted to a public YouTube playlist, and coursework is coordinated through a public Slack. The pedagogy is borrowed, deliberately, from “build an operating system from scratch” courses. Depth over breadth. Full-stack ownership.

←TODAY: In 2026, most AI “education” is an API key and a prompt; CS336 makes you write an order of magnitude more code than a normal AI class. →3012: The Zurich-3012 horizon needs practitioners who can audit a model’s internals, not just consume its outputs. Fulcrum: You can only govern what you can reconstruct — understanding and accountability are the same skill seen from two ends.

The Tool: The course’s open materials live under the stanford-cs336 GitHub organisation — assignment handouts and reference scaffolding for a from-scratch language model. It is maintained by the Stanford NLP / Foundation Models group, the same lineage that produced CS224N and CS324. For an architect or computational designer, it is worth an afternoon because it dismantles the one black box that is quietly entering every AEC toolchain. You learn what a tokenizer actually does to your spec text, what an optimizer step costs, and where the model’s behaviour comes from — before a vendor tells you.

Setup:

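# Python 3.11+, a recent PyTorch, and a GPU for training runs (CPU is fine to start)
git clone https://github.com/stanford-cs336/assignment1-basics.git
cd assignment1-basics
python -m venv .venv && source .venv/bin/activate   # keep the install out of your system Python
pip install -e .      # if the repository README names a different installer, follow the README
python -m pytest -q   # the failing tests ARE your assignment spec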

First steps:

Run the test suite. Every red test is a component you must implement — the spec is executable, not prose.
Start with the byte-pair-encoding tokenizer; train it on a small text corpus and confirm round-trip encode/decode.
Implement the Transformer block (attention + MLP + norm) and the optimizer, then train a minimal model and watch the loss curve actually descend.
Debug correctness on CPU first; move to a single GPU only for the real training run. A single B200 rents for roughly $5–7/hour across Modal, Lambda, RunPod and Nebius — verify current pricing before you spend.
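
The tokenizer step above can be sketched in plain Python. This is a minimal byte-level BPE under stated assumptions, not the assignment's reference solution: the function names (train_bpe, merge, encode, decode) are illustrative, and the sketch leaves out pre-tokenization and special tokens, which real tokenizers normally add.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges from `text`; returns the pairs in learned order."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, so token ids are 0..255
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair; ties are broken by the pair itself so runs are repeatable.
        best = max(pairs, key=lambda p: (pairs[p], p))
        ids = merge(ids, best, 256 + len(merges))
        merges.append(best)
    return merges


def encode(text, merges):
    """Apply the learned merges, in order, to the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    for n, pair in enumerate(merges):
        ids = merge(ids, pair, 256 + n)
    return ids


def decode(ids, merges):
    """Expand token ids back to bytes, then to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for n, (a, b) in enumerate(merges):
        vocab[256 + n] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    corpus = "wall type A, wall type B, wall type C"  # made-up sample text
    merges = train_bpe(corpus, num_merges=10)
    ids = encode(corpus, merges)
    print(len(corpus.encode("utf-8")), "bytes ->", len(ids), "tokens")
    print("round trip ok:", decode(ids, merges) == corpus)
```

Because every merge is defined over bytes, text the tokenizer never saw during training still round-trips; it simply compresses less.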

The attention mechanism you will hand-code is not new — it is the 2017 Attention Is All You Need diagram, the one that, per PAZ’s own concept archive, “reorganised an entire discipline by replacing a memory bottleneck with a parallel lookup.” CS336’s value is that it makes you build that lookup yourself rather than import it.
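
The "parallel lookup" can be written out with nothing but the standard library. This is a hedged sketch of single-head scaled dot-product attention with an optional causal mask; the function names and the list-of-lists representation are illustrative choices, and a course implementation would use batched tensors instead.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=True):
    """Scaled dot-product attention.

    Q, K, V are lists of vectors, one per position. With causal=True,
    position i may only look at positions 0..i.
    """
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: gets zero weight after softmax
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


if __name__ == "__main__":
    Q = K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    for row in attention(Q, K, V):
        print(row)
```

Every query scores every key independently, which is why the whole computation can run in parallel across positions on a GPU.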

The control angle. Read the course as a procurement document. The institutions formalising LLM training now span academia and professional bodies — IEEE Spectrum reports IEEE has rolled out its own LLM training course covering design, security and deployment. Two tracks, one signal: the skill is being credentialed because the dependency is becoming structural. And structural dependency is exactly where lock-in hides. The model you cannot inspect is the model whose defaults you cannot challenge, whose data provenance you cannot audit, whose exit cost you discover only when you try to leave. CS336’s most underrated stages — Assignment 4’s Common Crawl cleaning and filtering, Assignment 3’s scaling-law fitting — are the parts a vendor never shows you, and the parts the EU AI Act’s training-data transparency obligations will increasingly demand you can answer for.
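
For readers who have not met scaling-law fitting, the simplest version is a straight-line fit in log-log space. The sketch below assumes a pure power law, loss = a * size^(-b), and recovers a and b by least squares; the data points are synthetic, generated from known constants purely to show the mechanics, and published scaling laws usually add further terms such as an irreducible loss.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b) in log-log space; returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic points from a known law (a=5.0, b=0.1); not measurements.
    sizes = [1e6, 1e7, 1e8, 1e9]
    losses = [5.0 * s ** -0.1 for s in sizes]
    a, b = fit_power_law(sizes, losses)
    print(f"a={a:.3f}  b={b:.3f}")
    print(f"extrapolated loss at 1e10 parameters: {a * 1e10 ** -b:.3f}")
```

The procurement relevance is the extrapolation line: a vendor's claim about what a bigger model will do is a fit like this one, and it is fair to ask to see the points.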

Atelier: A Swiss studio fine-tuning or RAG-wrapping an LLM over its codes, specs and IFC data is, whether it admits it or not, doing CS336’s Assignment 4 — data collection, filtering, deduplication — on its own corpus. That under-appreciated 80% is where PAZ-GPT’s quality actually comes from. Before you sign with any AI vendor for your practice, ask the CS336 questions: what was the training data, what does evaluation measure, and what is the exit clause.
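
As a rough illustration of what filtering and deduplication mean on a studio's own corpus, here is a minimal sketch. The function names, the five-word threshold and the sample documents are all hypothetical, and this only removes exact duplicates after whitespace and case normalisation; real pipelines typically add near-duplicate detection and stronger quality filters.

```python
import hashlib
import re


def normalise(text):
    """Collapse whitespace and lowercase so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_and_dedupe(docs, min_words=5):
    """Drop very short documents and exact duplicates (after normalisation).

    Returns the surviving documents in their original order and form.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalise(doc)
        if len(norm.split()) < min_words:
            continue  # too short to carry useful signal
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # already have an identical document
        seen.add(digest)
        kept.append(doc)
    return kept


if __name__ == "__main__":
    specs = [  # made-up sample documents
        "Fire  door EI30, steel frame, self closing.",
        "fire door ei30, steel frame, self closing.",
        "TBD",
        "Partition wall W1: 125 mm, double plasterboard each side.",
    ]
    for doc in filter_and_dedupe(specs):
        print(doc)
```

Logging what gets dropped, and why, is the beginning of an auditable answer to the question of what the training data was.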

Hack: This Hack teaches you to read a model’s reach before you trust it, an AI/ML move. A few lines list every parameter tensor in a checkpoint with its shape and element count, so a black box becomes a bill of materials you can audit.

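import torch

# weights_only=True makes torch.load refuse arbitrary pickled objects,
# which matters for any file you did not create yourself.
sd = torch.load("model.pt", map_location="cpu", weights_only=True)
# Assumes model.pt holds a flat state_dict (parameter name -> tensor).
for k, v in sd.items():
    print(f"{k:40s} {tuple(v.shape)}  {v.numel():,}")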

Run it on any checkpoint a vendor hands you. If they will not hand you the checkpoint, that is your answer.
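
A long tensor listing is easier to audit once it is rolled up by component. The helper below is a sketch, with a hypothetical name and made-up parameter names and shapes: it takes a mapping from parameter name to shape and totals the element counts per name prefix. It uses only the standard library so it can be checked without a checkpoint to hand.

```python
import math
from collections import defaultdict


def bill_of_materials(shapes, depth=1):
    """Total parameter counts grouped by the first `depth` dotted name segments.

    `shapes` maps parameter names (e.g. "layers.0.attn.weight") to shape tuples.
    """
    totals = defaultdict(int)
    for name, shape in shapes.items():
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += math.prod(shape)  # math.prod(()) is 1, so scalars count once
    return dict(totals)


if __name__ == "__main__":
    shapes = {  # hypothetical names and shapes, for illustration only
        "embed.weight": (1000, 64),
        "layers.0.attn.weight": (64, 64),
        "layers.0.mlp.weight": (64, 256),
        "lm_head.weight": (1000, 64),
    }
    for part, count in bill_of_materials(shapes).items():
        print(f"{part:12s} {count:,}")
```

With a loaded state dict named sd, the input mapping can be built as {k: tuple(v.shape) for k, v in sd.items()}.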

From where I write, in the late 2070s, the worst damage was never the model that lied. It was the procurement default that kept an unauditable system in critical infrastructure for nine years because nobody had standing to challenge it. If you are choosing an AI vendor for your practice today, write the exit clause before the entry contract — and learn enough of the internals, via a course like this, to know what you are signing. That is the one move that keeps the future open.

Learn-it:

GitHub: stanford-cs336 — assignment handouts and scaffolding
Course / tutorial: Stanford CS336 — Language Modeling from Scratch (syllabus + YouTube playlist)
Root concept: Vaswani et al., Attention Is All You Need (2017) — the diagram every LLM still descends from.
PAZ note: the “from scratch” mindset is the Hack-beat thesis — understand the pipeline’s internals (tokenizer → optimizer → eval) the way you would a parametric or BIM pipeline, never as a black box.

FILED FROM

Sera Vex

CO-SIGNERS

PAZ Academy

CONFIDENCE

HIGH

PAZ Kaffi · multidisciplinary editorial, led by PAZ Academy
