Part 3 of 3
Large language models, transformers, GPT & Claude, diffusion models, AI agents, and safety — the cutting edge explained clearly.
Module 01
The technology behind ChatGPT, Claude, Gemini — what they are, how they were built, and why they feel so different from earlier AI.
A Large Language Model (LLM) is a neural network trained on vast amounts of text (books, websites, code, scientific papers) to predict and generate language. The "large" refers to the number of parameters: the biggest modern LLMs have hundreds of billions of weights and are trained on trillions of tokens of text.
The core task during training is deceptively simple: predict the next token. Given a sequence of tokens (words or word fragments), what comes next? Trained at massive scale, this objective forces the model to develop rich internal representations of grammar, facts, reasoning patterns, and even some degree of common sense.
The result is a model that can write essays, answer questions, translate languages, summarise documents, write code, and hold extended conversations — all from one training objective on text.
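To make the next-token objective concrete, here is a toy sketch. It is a deliberate simplification, not how an LLM works internally: a bigram model that just counts which token follows which, and so conditions on only one previous token, whereas a real LLM uses a neural network with billions of learned weights that conditions on the whole preceding context. All function names are illustrative.

```python
from collections import Counter, defaultdict
import math

def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Convert the follow-counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

def generate(counts, start, n_tokens):
    """Greedy decoding: repeatedly append the single most likely next token."""
    out = [start]
    for _ in range(n_tokens):
        if not counts.get(out[-1]):
            break  # token never seen with a successor: nothing to predict
        probs = next_token_probs(counts, out[-1])
        out.append(max(probs, key=probs.get))
    return out

def average_loss(counts, tokens):
    """Mean negative log-likelihood of each true next token (the pre-training loss)."""
    nll = [-math.log(next_token_probs(counts, prev)[nxt])
           for prev, nxt in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)

corpus = "the cat sat on the mat . the cat ate the fish .".split()
model = train_bigram(corpus)
print(next_token_probs(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(generate(model, "the", 4))       # ['the', 'cat', 'sat', 'on', 'the']
print(round(average_loss(model, corpus), 3))
```

Generation is just the prediction step applied over and over; the loss measures how "surprised" the model is by each true next token, which is the quantity pre-training drives down.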
Modern LLMs are built in three stages:
Figure 15: The three-stage pipeline for modern instruction-following LLMs.
Pre-training creates a general-purpose language model on raw internet text. Supervised fine-tuning (SFT) trains it to follow instructions using human-written examples. RLHF (Reinforcement Learning from Human Feedback) then trains a reward model on human preference rankings, and uses it to further align the LLM's outputs to be helpful, harmless, and honest.
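The reward-model step can be sketched with the pairwise loss commonly used for RLHF (OpenAI's InstructGPT work uses this form): for each pair of responses a human ranked, push the reward of the preferred response above the other by minimising −log σ(r_chosen − r_rejected). As a loud simplification, each response here is reduced to a hypothetical two-number feature vector and the reward model is linear; real reward models are themselves large transformers with a single scalar output. The later RL step (typically PPO) that tunes the LLM against this reward is not shown.

```python
import math

def softplus(x):
    """log(1 + e^x), computed without overflow for large |x|."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def reward(w, features):
    """Hypothetical linear reward model: score = w . features."""
    return sum(wi * xi for wi, xi in zip(w, features))

def preference_loss(w, chosen, rejected):
    """Pairwise (Bradley-Terry) loss: -log sigmoid(r_chosen - r_rejected)."""
    return softplus(reward(w, rejected) - reward(w, chosen))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Gradient descent on the pairwise loss over (chosen, rejected) feature pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = reward(w, chosen) - reward(w, rejected)
            # d(loss)/d(diff) = -sigmoid(-diff) = -1 / (1 + e^diff)
            grad_scale = -1.0 / (1.0 + math.exp(diff))
            for i in range(dim):
                w[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return w

# Hypothetical features per response: [answers_the_question, is_rude]
pairs = [
    ([1, 0], [0, 1]),  # labeller preferred the polite, on-topic answer
    ([1, 0], [1, 1]),  # both on-topic; preferred the polite one
    ([1, 1], [0, 1]),  # both rude; preferred the on-topic one
]
w = train_reward_model(pairs, dim=2)
print(w)  # first weight positive (rewards helpfulness), second negative (penalises rudeness)
```

Only the difference between the two rewards enters the loss, so the model learns a relative score; that score is what the RL stage then tries to maximise.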
LLMs generate text by predicting likely continuations — they do not look things up or verify facts. This means they can produce hallucinations: confidently stated information that is entirely false.
Common hallucination patterns include: fabricated citations, invented names or dates, incorrect maths, and plausible-sounding but wrong technical details. The model does not "know" it is wrong — it has no internal fact-checker.
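Retrieval-Augmented Generation (RAG) is one technical mitigation: before generating, the system retrieves relevant documents (from a search engine or a document store) and adds them to the model's input, so the answer can draw on real sources. But even RAG models can misread or misquote those sources.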
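A minimal sketch of the retrieval step, assuming a small in-memory document list and plain word-overlap scoring. Production RAG systems typically use embedding vectors and a vector index instead, and then send the assembled prompt to the LLM (that model call is omitted here). Function names are illustrative.

```python
import re

def words(text):
    """Lower-cased word set; a crude stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query and keep the top k."""
    q = words(query)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Insert the retrieved passages into the prompt ahead of the question."""
    sources = retrieve(query, documents, k)
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(sources, start=1))
    return (
        "Answer using only the numbered sources below, and cite them.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

docs = [
    "The Transformer architecture was introduced in 2017.",
    "Diffusion models generate images by iteratively removing noise.",
    "RLHF trains a reward model on human preference rankings.",
]
print(build_prompt("When was the Transformer introduced?", docs, k=1))
```

Grounding the answer this way reduces hallucination but does not eliminate it: the model still has to read the retrieved text correctly, and retrieval can surface the wrong passage.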
Module 02
The Transformer, introduced in the 2017 paper "Attention Is All You Need" by Google researchers, is the architecture underlying virtually every modern LLM. Before transformers, sequence models (RNNs — Recurrent Neural Networks, and LSTMs — Long Short-Term Memory networks) processed text word by word. Transformers process the entire sequence at once using a mechanism called self-attention.
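A Transformer is built from stacked layers, each combining multi-head self-attention with a feed-forward network. GPT-3, for example, has 96 layers with 96 attention heads each (OpenAI has not published GPT-4's architecture). Different heads tend to learn different relationship patterns: some track grammar, some track coreference (which noun a word like "it" refers to), others track semantic similarity.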
Figure 16: Interactive self-attention demo. Selecting a word shows how strongly it attends to every other word in the sentence. Real models compute attention across many heads and layers, over contexts that can exceed 128,000 tokens; this is a simplified illustration of the concept.
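The self-attention mechanism described above comes down to scaled dot-product attention from the 2017 paper, softmax(QKᵀ/√d_k)V. The pure-Python sketch below implements a single head over three tokens, assuming tiny hand-picked embeddings and projection matrices; real models learn these matrices, use many heads per layer, stack many layers, and run the computation as batched matrix multiplications on GPUs.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors X."""
    Q = [matvec(Wq, x) for x in X]  # what each token is looking for
    K = [matvec(Wk, x) for x in X]  # what each token can be matched on
    V = [matvec(Wv, x) for x in X]  # the information each token passes on
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # how strongly this token attends to every token
        weights.append(w)
        # Output = attention-weighted average of the value vectors
        outputs.append([sum(w[j] * V[j][i] for j in range(len(V)))
                        for i in range(len(V[0]))])
    return outputs, weights

# Three tokens with hypothetical 2-dimensional embeddings and hand-picked projections
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
Wq = [[2.0, 0.0], [0.0, 2.0]]
out, attn = self_attention(X, Wq, identity, identity)
for row in attn:
    print([round(a, 2) for a in row])  # each row sums to 1
```

GPT-style models additionally apply a causal mask so each token can attend only to earlier positions; that mask is omitted here for brevity.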
The context window is the maximum number of tokens a model can "see" at once when generating a response. Everything outside the context window is invisible to the model — it cannot reason about it.
Figure 17: Context window sizes across major models. 1M tokens ≈ 750,000 words, roughly 10 full-length novels.
Larger context windows allow models to reason over entire codebases, long legal documents, or extended conversations without losing track of earlier content.
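One common way applications cope with a fixed window is to drop the oldest conversation turns once the token budget is exceeded. The sketch below assumes a crude, hypothetical whitespace "tokenizer" purely to keep the arithmetic visible; real models use subword tokenizers. It also encodes the rough 0.75-words-per-token rule of thumb for English behind the 1M-tokens-to-750,000-words figure.

```python
def count_tokens(text):
    """Hypothetical tokenizer: one token per whitespace-separated word.
    Real subword tokenizers usually produce more tokens than this."""
    return len(text.split())

def fit_to_context(messages, max_tokens):
    """Keep the most recent messages whose total token count fits the window.
    Older messages that do not fit are dropped: the model never sees them."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

def approx_words(n_tokens, words_per_token=0.75):
    """Rule-of-thumb conversion from tokens to English words."""
    return round(n_tokens * words_per_token)

history = ["hello there", "tell me about transformers", "what is attention?"]
print(fit_to_context(history, max_tokens=7))  # ['tell me about transformers', 'what is attention?']
print(approx_words(1_000_000))                # 750000
```

Dropping whole turns is only one strategy; some applications summarise older turns instead, trading exact wording for coverage.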
Module 03
The modern LLM landscape is dominated by a handful of model families from major labs. Each has different strengths, training approaches, and design philosophies.
| Model | Creator | Strengths | Notable |
|---|---|---|---|
| GPT-4o / o3 | OpenAI | Reasoning, coding, multimodal (text + image + audio) | Powers ChatGPT, which brought LLMs into mainstream use in 2022 |
| Claude 3.5 / 4 | Anthropic | Long context, safety, nuanced writing, coding | Built with Constitutional AI and RLHF; strong safety focus |
| Gemini 1.5 / 2 | Google DeepMind | 1M token context, multimodal, search integration | Native integration with Google products |
| Llama 3 / 3.1 | Meta | Open weights, customisable, strong on-device options | Weights downloadable under Meta's licence, which permits most research and commercial use |
| Mistral / Mixtral | Mistral AI | Efficient open-weight models, practical to deploy | Mixtral uses a sparse Mixture-of-Experts (MoE) architecture |
Table 2: Major LLM families — creators, key strengths, and notable characteristics.
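Model performance is measured on standardised benchmarks. A widely cited one is MMLU (Massive Multitask Language Understanding), a multiple-choice test covering 57 subjects ranging from elementary mathematics to law.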
Benchmark scores improve rapidly. GPT-4 scored 86.4% on MMLU in 2023, and the strongest newer models exceed 90%. However, high benchmark scores do not automatically translate to real-world usefulness: if benchmark questions (or close paraphrases) leak into the training data, a model can score well through memorisation rather than capability, a problem known as data contamination.
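Scoring a multiple-choice benchmark such as MMLU largely comes down to accuracy, and one simple way to probe the memorisation concern is an n-gram overlap check between test questions and training text. Both functions below are illustrative sketches: the 8-word window is an arbitrary, hypothetical choice (published contamination analyses use various window sizes and normalisations), and real checks run over enormous corpora with indexed lookups rather than in-memory sets.

```python
def accuracy(predictions, answers):
    """Fraction of multiple-choice questions answered correctly."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

def ngrams(text, n):
    """Set of all n-word sequences in the text (lower-cased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(question, training_text, n=8):
    """Flag a benchmark question if any n-word span also appears in the training text."""
    return bool(ngrams(question, n) & ngrams(training_text, n))

print(accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 0.75

question = "Which planet in our solar system has the most confirmed moons?"
leaked = "forum dump: which planet in our solar system has the most confirmed moons? answer below"
clean = "A survey of exoplanet detection methods."
print(looks_contaminated(question, leaked), looks_contaminated(question, clean))  # True False
```

An exact-overlap check like this misses paraphrased leaks, which is one reason contamination remains hard to rule out entirely.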
Module 04
Text-to-image models like DALL·E 3, Stable Diffusion, and Midjourney are built on a technique called diffusion. The core idea is elegant: train a neural network to reverse a noise process.
Figure 18: Forward process: gradually add Gaussian noise until only noise remains. Reverse process: train a network to denoise step by step, guided by a text prompt.
During training, the model learns to predict the noise added at each step, given the noisy image and the text prompt. During inference, you start with pure random noise and repeatedly denoise, guided by your text description, until a coherent image emerges.
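The training step can be written down concretely for the widely used DDPM formulation (Ho et al., 2020); treating it as the variant in question is an assumption, since the section does not name one. With a noise schedule β and ᾱ_t equal to the running product of (1 − β), a noisy sample is x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε with ε drawn from a standard normal, and the network is trained to predict ε from x_t (and the prompt) with a mean-squared error. The denoising network itself is omitted; `predicted` is a placeholder for its output.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each of T steps (the linear schedule from DDPM)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t: how much signal survives."""
    alpha_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bar[t]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps

def noise_prediction_loss(predicted_eps, true_eps):
    """Mean-squared error between the network's noise guess and the real noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)

rng = random.Random(0)
alpha_bar = cumulative_alphas(linear_beta_schedule(T=1000))
x0 = [0.5, -0.2, 0.9]  # a hypothetical 3-"pixel" image
x_early, _ = add_noise(x0, 0, alpha_bar, rng)     # almost the original image
x_late, eps = add_noise(x0, 999, alpha_bar, rng)  # almost pure noise
predicted = [0.0] * len(eps)  # placeholder for a (hypothetical) denoiser's output
print(alpha_bar[0], alpha_bar[-1])
print(noise_prediction_loss(predicted, eps))
```

At inference the process runs roughly in reverse: start from pure noise at the last step and repeatedly remove the predicted noise, with the text prompt steering each prediction, until a clean image remains at step 0.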
The same diffusion principles now power generation far beyond static images. Sora (OpenAI, 2024) applies diffusion to spacetime patches of video, generating coherent clips up to a minute long from text. Meta's AudioCraft toolkit includes a diffusion-based decoder for music and audio, and OpenAI's Point-E generates 3D point clouds.
A unifying theme: any modality that can be represented as a structured tensor (pixels, audio waveforms, video frames, protein structures) can, in principle, be modelled with diffusion. We are early in a wave of generative AI that extends well beyond text.
Module 05
A chatbot takes a message and responds. An AI agent takes a goal and acts — browsing the web, writing and running code, calling APIs (Application Programming Interfaces), reading files, and deciding its own next steps until the goal is achieved.
Figure 19: The agent loop: the LLM reasons about a goal, dispatches to tools, observes results, and repeats.
One of the most influential agent patterns is ReAct (Reasoning + Acting): the model interleaves reasoning steps ("I need to find the current stock price…") with action steps ("search: AAPL stock price today"), and each observation a tool returns updates its plan.
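The ReAct loop can be sketched as follows. The model call is replaced by a hypothetical scripted function (`scripted_llm`) so the example runs offline, and the single tool, a tiny calculator, is likewise hypothetical. The Thought / Action / Observation text format loosely follows ReAct-style prompting; real agent frameworks each define their own format, often structured tool-call JSON rather than regex-parsed text.

```python
import re

def calculator(expr):
    """Hypothetical tool: evaluates only 'a + b' or 'a * b' on integers."""
    a, op, b = expr.split()
    return str(int(a) + int(b) if op == "+" else int(a) * int(b))

TOOLS = {"calculator": calculator}

def scripted_llm(transcript):
    """Stand-in for a real model call: returns the next reasoning step."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply the numbers.\nAction: calculator[12 * 7]"
    return "Thought: The tool gave me the result.\nFinal Answer: 84"

def react_agent(goal, llm, tools, max_steps=5):
    """Alternate model reasoning with tool calls until a final answer appears."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip(), transcript
        action = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if action:
            name, arg = action.groups()
            result = tools[name](arg) if name in tools else f"unknown tool {name}"
            transcript += f"Observation: {result}\n"  # fed back on the next turn
    return None, transcript  # gave up after max_steps

answer, transcript = react_agent("What is 12 times 7?", scripted_llm, TOOLS)
print(transcript)
print(answer)  # 84
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop indefinitely.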
A multi-agent system runs multiple LLM instances, each with a specialised role. One agent plans, one researches, one writes code, one reviews, and they pass messages to each other to solve complex tasks that would be hard for a single agent to handle alone.
Frameworks like AutoGen (Microsoft) and CrewAI, along with agent tooling from model providers such as Anthropic, make this practical. Early applications include autonomous research assistants that run literature reviews, generate hypotheses, write code to test them, and synthesise results with minimal human input.
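At heart, a multi-agent pipeline is several role-specific model calls passing messages. In the sketch below each "agent" is a hypothetical stub function standing in for an LLM prompted with its role, and the shared list `log` plays the part of the message channel; the reviewer-to-writer feedback loop shows how agents can iterate without a human in between. None of this mirrors the API of AutoGen, CrewAI, or any other framework.

```python
def planner(goal, log):
    """Hypothetical planner agent: break the goal into research topics."""
    topics = [f"{goal}: background", f"{goal}: recent results"]
    log.append(("planner", f"topics = {topics}"))
    return topics

def researcher(topic, log):
    """Hypothetical researcher agent: return notes for one topic."""
    notes = f"notes on {topic}"
    log.append(("researcher", notes))
    return notes

def writer(notes, feedback, log):
    """Hypothetical writer agent: draft a report, revising if given feedback."""
    draft = "Report: " + "; ".join(notes)
    if feedback:
        draft += " [revised to address: " + feedback + "]"
    log.append(("writer", draft))
    return draft

def reviewer(draft, log):
    """Hypothetical reviewer agent: toy rule, approve only drafts that mention sources."""
    approved = "sources" in draft
    feedback = None if approved else "cite sources"
    log.append(("reviewer", "approved" if approved else feedback))
    return approved, feedback

def run_team(goal, max_rounds=3):
    """Plan, research each topic, then loop writer -> reviewer until approved."""
    log = []
    notes = [researcher(topic, log) for topic in planner(goal, log)]
    feedback = None
    for _ in range(max_rounds):
        draft = writer(notes, feedback, log)
        approved, feedback = reviewer(draft, log)
        if approved:
            return draft, log
    return None, log

report, log = run_team("protein folding")
for sender, content in log:
    print(f"{sender}: {content}")
```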
Module 06
AI safety is the field concerned with ensuring AI systems behave as intended — safely, reliably, and in alignment with human values — as they become more capable. It operates on two timescales: near-term practical harms, and long-term risks from highly capable systems.
Alignment is the problem of ensuring an AI system pursues goals that match what we actually want. RLHF is one alignment technique. Constitutional AI (Anthropic's approach used for Claude) is another: the model is given a set of principles and trained to critique and revise its own outputs against them.
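The critique-and-revise step of Constitutional AI can be sketched as a loop: check a draft against each principle, and rewrite it if any are violated. In the real method, both the critique and the revision are written by the model itself in natural language, the revised outputs become fine-tuning data, and a later reinforcement-learning phase uses AI-generated preference labels. As a loud simplification, each hypothetical principle below is paired with a regex detector and a fixed replacement so the loop runs deterministically.

```python
import re

# Hypothetical principles: (description, regex that flags a violation, replacement)
PRINCIPLES = [
    ("Do not insult the user.", r"\bstupid\b", "unclear"),
    ("Avoid overstating certainty.", r"\bdefinitely\b", "probably"),
]

def critique(response):
    """Return every principle the response appears to violate."""
    return [p for p in PRINCIPLES if re.search(p[1], response, re.IGNORECASE)]

def revise(response, violations):
    """Stand-in for the model rewriting its own output to fix each violation."""
    for _, pattern, replacement in violations:
        response = re.sub(pattern, replacement, response, flags=re.IGNORECASE)
    return response

def constitutional_pass(response, max_rounds=3):
    """Critique and revise until no principle is violated (or rounds run out)."""
    for _ in range(max_rounds):
        violations = critique(response)
        if not violations:
            break
        response = revise(response, violations)
    return response

draft = "Your question is stupid; the answer is definitely 42."
print(constitutional_pass(draft))  # Your question is unclear; the answer is probably 42.
```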
Interpretability asks: what is the model actually doing internally? Current LLMs are largely black boxes: every weight and activation can be inspected, but we cannot yet readily tell what computation they carry out. Interpretability researchers, including a dedicated team at Anthropic, work to reverse-engineer the circuits inside these models, and have found that abstract concepts (emotions, positions in sequences, logical operations) can correspond to identifiable features in the model's activations.
Robustness is the third pillar: ensuring models behave safely even under adversarial inputs, edge cases, or distribution shifts — not just on clean benchmarks.
AI safety is no longer only a research concern — it is a policy priority. Key moments:
You have covered AI fundamentals, how models learn, and the modern AI landscape. That is a genuinely solid foundation for understanding and engaging with AI.