Glossary of Key Terms

52 key terms from this course.

Artificial General Intelligence (AGI)
A hypothetical AI system capable of performing any intellectual task a human can. Does not yet exist — distinct from today's narrow AI systems.
Artificial Intelligence (AI)
The field of computer science concerned with building systems that perform tasks requiring human-like intelligence — recognising images, understanding language, making decisions.
Bias (algorithmic)
Systematic errors in a model's outputs caused by skewed or historically inequitable training data. A model trained on biased data produces biased predictions.
Bias (model parameter)
A learnable constant added to a neuron's weighted sum of inputs, written as b in the linear equation y = wx + b. Allows the model to shift its predictions independently of the input features.
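As a minimal sketch of where the bias term sits, the snippet below computes a linear prediction for one example. The function name `linear_predict` and all the numbers are made up for illustration.

```python
def linear_predict(weights, features, bias):
    """Return w1*x1 + w2*x2 + ... + b for a single example."""
    weighted_sum = sum(w * x for w, x in zip(weights, features))
    return weighted_sum + bias


# Hypothetical weights and feature values; only the bias changes.
weights = [2.0, 3.0]
features = [1.0, 1.0]
print(linear_predict(weights, features, bias=0.0))  # 5.0
print(linear_predict(weights, features, bias=0.5))  # 5.5
```

Changing only the bias shifts the prediction by the same amount for every input.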
Classification
A supervised learning task where the model predicts a category. Binary = 2 classes; multi-class = 3+. Examples: spam detection, disease diagnosis, image labelling.
Clustering
An unsupervised technique that groups data points by similarity without predefined labels. K-Means is one of the most widely used algorithms. Used for customer segmentation and document grouping.
Computer Vision
A branch of AI enabling machines to interpret images and video. Used in facial recognition, autonomous vehicles, medical imaging, and manufacturing quality control.
Data Leakage
When information from the test set accidentally reaches the training process, making the model appear more accurate than it really is when deployed.
Deep Learning
Machine learning using neural networks with many hidden layers. The depth lets the model learn abstract representations. Powers image recognition, language models, and speech systems.
Dimensionality Reduction
Compressing high-dimensional data into fewer dimensions while preserving meaningful structure. Used for visualisation and preprocessing. Key techniques: PCA, t-SNE, UMAP.
Example
A single data instance — one row in a dataset — with feature values and (in supervised learning) a label. Labelled examples train models; unlabelled ones are what the model predicts on.
Feature
An input variable used by a model to make predictions. For house prices: square footage, bedrooms, neighbourhood. Good features encode information genuinely predictive of the label.
Feature Engineering
Transforming raw data into informative, well-scaled model inputs. Includes normalisation, one-hot encoding, binning, and feature crosses. Often the highest-impact step in building a good model.
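Two of the transformations named above, one-hot encoding and binning, can be sketched in a few lines. The helper names, the category list, and the bin edges below are hypothetical and chosen only for illustration.

```python
def one_hot(value, categories):
    """Encode a categorical value as a list containing a single 1."""
    return [1 if value == c else 0 for c in categories]


def bin_index(value, edges):
    """Return the index of the first bin whose upper edge exceeds value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)  # value is at or above the last edge


neighbourhoods = ["north", "south", "east"]  # hypothetical categories
print(one_hot("south", neighbourhoods))      # [0, 1, 0]
print(bin_index(85, edges=[50, 100, 150]))   # 1
```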
AI Agent
An AI system that takes a goal and acts autonomously — browsing the web, writing and running code, calling APIs, and deciding its own next steps until the goal is achieved. Distinct from a chatbot, which only responds to messages.
Alignment
The challenge of ensuring AI systems behave in accordance with human values and intentions. Includes both near-term concerns (bias, misuse) and long-term concerns (AI systems pursuing goals humans didn't intend).
Backpropagation
The algorithm used to train neural networks. After a forward pass computes the loss, backpropagation calculates how much each weight contributed to the error — using the chain rule of calculus — so gradient descent can update them.
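To make the chain rule concrete, here is a small sketch for a toy two-weight network (h = w1 * x, prediction = w2 * h, squared-error loss). The network shape and the numbers are assumptions chosen for illustration; the analytic gradient is compared against a finite-difference estimate as a sanity check.

```python
def forward(w1, w2, x, y):
    """Forward pass: return hidden value, prediction, and squared-error loss."""
    h = w1 * x
    y_hat = w2 * h
    loss = (y_hat - y) ** 2
    return h, y_hat, loss


def backward(w1, w2, x, y):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y_hat, _ = forward(w1, w2, x, y)
    d_yhat = 2 * (y_hat - y)  # dLoss/dy_hat
    d_w2 = d_yhat * h         # dLoss/dw2
    d_h = d_yhat * w2         # dLoss/dh
    d_w1 = d_h * x            # dLoss/dw1
    return d_w1, d_w2


def numeric_grad_w1(w1, w2, x, y, eps=1e-6):
    """Central-difference estimate of dLoss/dw1, used only as a check."""
    loss_plus = forward(w1 + eps, w2, x, y)[2]
    loss_minus = forward(w1 - eps, w2, x, y)[2]
    return (loss_plus - loss_minus) / (2 * eps)


d_w1, d_w2 = backward(w1=0.5, w2=-1.5, x=2.0, y=1.0)
print(d_w1, d_w2)
print(numeric_grad_w1(0.5, -1.5, 2.0, 1.0))
```

The two printed values for w1 should agree closely, which is a common way to check a hand-written backward pass.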
Bias–Variance Tradeoff
The tension between two sources of model error. High bias = underfitting (model too simple). High variance = overfitting (model too sensitive to training data). Good models balance both.
Context Window
The maximum amount of text (measured in tokens) an LLM can process at once. The original GPT-3 handled about 2K tokens; modern models like Gemini 1.5 support up to 1M tokens, roughly 750,000 words.
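The token-to-word conversion is simple arithmetic. The sketch below assumes a rule of thumb of about 0.75 English words per token; the real ratio depends on the tokenizer and the language, and the function names are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # rule-of-thumb assumption, not a fixed constant


def approx_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many English words fit in a given number of tokens."""
    return int(tokens * words_per_token)


def fits_in_context(prompt_tokens, context_window):
    """Return True if the prompt fits within the model's context window."""
    return prompt_tokens <= context_window


print(approx_words(1_000_000))                # 750000
print(fits_in_context(5_000, 4_096))          # False
print(fits_in_context(5_000, 1_000_000))      # True
```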
Diffusion Model
A generative model that learns to reverse a noise process. During training, noise is gradually added to images; the model learns to denoise them step by step. At inference, it starts from pure noise and generates a coherent image guided by a text prompt. Powers DALL·E 3, Stable Diffusion, and Midjourney.
Embedding
A dense numerical representation of discrete objects (words, sentences, images) in a continuous vector space. Similar items are close together in that space. Enables models to reason about relationships between concepts.
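"Close together" is often measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors; real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # lower
```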
Generalisation
A model's ability to perform well on data it was not trained on. The ultimate goal of ML. A model that generalises has learned the true underlying pattern, not just memorised examples.
Generative AI
AI systems that produce new content — text, images, audio, video, or code — by learning the distribution of training data. Examples: ChatGPT, DALL·E, Stable Diffusion, Claude.
Gradient Descent
The optimisation algorithm used to train ML models. Iteratively adjusts weights in the direction that reduces the loss function — like rolling a ball downhill to find the lowest valley.
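A minimal sketch of the idea, assuming a toy one-parameter loss (w - 3)^2 whose minimum is at w = 3. Real models have many weights and the gradient is computed from data, but the update rule has the same shape.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_final = gradient_descent(start=0.0)
print(w_final, loss(w_final))  # w ends up very close to 3
```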
Hallucination
When a language model generates confident-sounding but factually incorrect information. Occurs because models predict statistically likely text, not necessarily true text. A key limitation of LLMs.
Hyperparameter
A configuration value set before training begins — unlike model parameters (weights) which are learned during training. Examples: learning rate, number of layers, batch size, epochs.
Label
The correct answer for a training example in supervised learning — the thing the model tries to predict. Spam/not-spam for email filtering; sale price for house price prediction.
Large Language Model (LLM)
A neural network trained on vast amounts of text to predict the next token. LLMs like GPT-4, Claude, and Gemini show broad capabilities in writing, translation, coding, and reasoning.
Learning Rate
A hyperparameter controlling step size in gradient descent. Too small: training is very slow. Too large: training diverges. One of the most important tuning decisions in ML.
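The three regimes can be seen on a toy loss f(w) = w^2, whose minimum is at w = 0. This is a sketch under that assumption; the specific learning rates are picked only to show each behaviour.

```python
def final_weight(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # gradient of w**2 is 2w
    return w


print(final_weight(0.001))  # too small: w has barely moved from 1.0
print(final_weight(0.1))    # reasonable: w is close to 0
print(final_weight(1.5))    # too large: |w| has blown up
```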
Loss Function
A function measuring how wrong the model's predictions are. Training minimises this value. MSE is common for regression; cross-entropy is common for classification.
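The two losses mentioned above can be written out directly. This is a plain-Python sketch for small lists; the binary cross-entropy version assumes labels of 0 or 1 and clips probabilities to avoid log(0).

```python
import math


def mean_squared_error(targets, predictions):
    """Average squared difference, commonly used for regression."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def binary_cross_entropy(targets, probabilities, eps=1e-12):
    """Average negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
print(binary_cross_entropy([1, 0], [0.9, 0.1]))    # small: confident and right
print(binary_cross_entropy([1, 0], [0.1, 0.9]))    # large: confident and wrong
```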
Machine Learning (ML)
A subfield of AI where systems learn patterns from data rather than following explicit rules. The dominant approach in modern AI. Includes supervised, unsupervised, and reinforcement learning.
Model
A mathematical function with learnable parameters that maps inputs to predictions. After training, the model encodes patterns from data. Can range from a simple linear equation to a billion-parameter neural network.
Natural Language Processing (NLP)
AI focused on understanding, generating, and reasoning about human language. Powers search engines, translation, chatbots, autocomplete, and document summarisation.
Neural Network
A machine learning model loosely inspired by the human brain — layers of interconnected nodes with learnable weights. Training adjusts these weights to minimise the loss function.
Normalisation
Scaling numerical features to a comparable range (0–1 or mean=0, std=1) before training. Prevents large-scale features from dominating small-scale ones.
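Both scalings can be sketched with the standard library. The helper names are hypothetical, and the sketch assumes the feature is not constant (otherwise the divisions below would be by zero).

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


square_feet = [1000, 2000, 3000]  # made-up large-scale feature
print(min_max_scale([0, 5, 10]))  # [0.0, 0.5, 1.0]
print(standardise(square_feet))
```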
Fine-tuning
Continuing to train a pre-trained model on a smaller, task-specific dataset. Adapts general knowledge to a specific domain (e.g., medical text, legal documents) without training from scratch. Stage 2 of the modern LLM pipeline.
GAN (Generative Adversarial Network)
A generative model with two competing networks: a generator that creates fake samples, and a discriminator that tries to tell real from fake. The competition drives both to improve. Used for image synthesis and data augmentation.
Inference
Using a trained model to make predictions on new data. Distinct from training. When you send a prompt to ChatGPT, the model is doing inference — applying what it already learned.
K-Means
An unsupervised clustering algorithm. Assigns data points to K clusters by iteratively placing centroids and reassigning points to their nearest centroid until the clusters stabilise.
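The assign-then-update loop can be sketched as follows. This is a simplified illustration, not a production implementation: it initialises centroids from the first K points (an assumption made here for determinism; real libraries use smarter initialisation), and the sample points are made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100):
    """Return (centroids, assignments) after the clusters stabilise."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic initialisation
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the clusters are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean_point(members)
    return centroids, assignments


# Two made-up, well-separated groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
print(centroids)
```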
Overfitting
When a model learns training data too well — including noise — and fails to generalise. Indicated by a large gap between training accuracy and validation/test accuracy.
Parameter (Weight)
A numerical value inside a model adjusted during training to reduce loss. A linear model has one weight per feature plus a bias. A large language model may have hundreds of billions of parameters.
Regression
A supervised learning task predicting a continuous numerical value. Examples: house price, temperature, expected revenue. Evaluated with metrics like MSE or mean absolute error (MAE).
Reinforcement Learning (RL)
A type of ML where an agent learns by taking actions and receiving rewards or penalties. Used for game-playing AI (AlphaGo, chess), robot control, and recommendation systems.
Stochastic Gradient Descent (SGD)
A variant of gradient descent using one random example (or mini-batch) per step rather than the full dataset. Much faster in practice and often converges to equally good solutions.
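A minimal mini-batch sketch, assuming a one-weight model y = w * x and noise-free toy data generated with w = 2. The function name, data, and default settings are all illustrative choices.

```python
import random


def sgd_fit(xs, ys, learning_rate=0.1, epochs=200, batch_size=2, seed=0):
    """Fit y = w * x with mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit examples in a new random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the squared error, averaged over the mini-batch only.
            grad = sum(2 * xs[i] * (w * xs[i] - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad
    return w


xs = [i / 10 for i in range(1, 11)]  # toy inputs 0.1 .. 1.0
ys = [2.0 * x for x in xs]           # toy targets from y = 2x
print(sgd_fit(xs, ys))               # close to 2.0
```

Each update looks at only `batch_size` examples, which is what makes a step cheap compared with a full pass over the dataset.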
Supervised Learning
ML where every training example has a known label. The model learns the mapping from features to labels. The most common ML paradigm — used for classification, regression, translation, and diagnosis.
Test Set
A held-out portion of data used for final model evaluation, ideally only once, after all training and tuning. Evaluating on it repeatedly and adjusting the model in response leaks test information into your choices and produces over-optimistic accuracy estimates.
Training Set
The data on which model parameters are directly optimised. The model sees this data repeatedly across epochs and adjusts its weights to minimise loss on it.
Transformer
A neural network architecture from the 2017 paper "Attention Is All You Need". The foundation of nearly all modern LLMs. Uses an attention mechanism to capture relationships across long sequences.
Underfitting
When a model is too simple to capture true patterns, performing poorly on both training and test sets. Caused by too few parameters, too little training, or overly aggressive regularisation.
Unsupervised Learning
ML without labels — the algorithm finds structure on its own. Used when labels are unavailable or expensive. Key techniques: clustering, dimensionality reduction, anomaly detection.
RNN (Recurrent Neural Network)
A neural network architecture designed for sequential data — text, time series, audio. Processes inputs one step at a time, maintaining a hidden state that carries information forward. Largely superseded by Transformers for language tasks.
VAE (Variational Autoencoder)
A generative model that learns a compressed latent representation of training data and can generate new samples by decoding points in that latent space. Used for image generation and anomaly detection.
Validation Set
Data held out during training to tune hyperparameters and detect overfitting — distinct from the test set. Hyperparameter choices are influenced by validation performance but the model does not train on it.
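One way to carve out training, validation, and test sets is a shuffled three-way split. The 70/15/15 proportions and the function name below are assumptions for illustration; suitable ratios depend on how much data is available.

```python
import random


def train_val_test_split(examples, val_percent=15, test_percent=15, seed=0):
    """Shuffle the examples and split them into three disjoint lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = n * test_percent // 100
    n_val = n * val_percent // 100
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Weights are fitted on `train`, hyperparameters are compared on `val`, and `test` is kept aside for the final evaluation.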

Course Complete!

You've covered what AI is, how it learns, core ML mechanics, neural networks, responsible use, and the future of AI. Plus a full glossary to refer back to. That's a genuinely solid foundation.
