Part 1 of 3
What AI is, how machines learn, the different types, where you see it, and how to use it responsibly.
Module 01
Artificial Intelligence (AI) is the ability of a computer system to perform tasks that, when done by humans, would require intelligence — things like recognising speech, understanding language, making decisions, or spotting patterns in data.
The word "artificial" simply means it's built by humans. The word "intelligence" is where it gets interesting. Researchers disagree on exactly what intelligence means, but for practical purposes, AI is any system that can perceive its environment, reason about it, and act in ways that achieve some goal.
The term was coined by computer scientist John McCarthy in the 1955 proposal for the Dartmouth workshop, which took place in the summer of 1956, but the ideas behind it go back centuries: to philosophers asking what it means to think, and to mathematicians like Alan Turing, who asked whether machines could ever be intelligent.
Think of teaching a child to recognise a dog. You don't write a rule book ("four legs, fur, barks"). You show hundreds of examples: "this is a dog, this is a cat, this is a dog…" The child learns the pattern on their own.
Modern AI works almost the same way. Instead of being hand-programmed with rules, it's shown millions of examples and finds the patterns itself. That's the core idea behind machine learning (ML) — the dominant form of AI today.
Figure 1: Traditional programming requires humans to write the rules explicitly. Machine learning discovers the rules automatically from data.
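The contrast in Figure 1 can be made concrete with a toy sketch. It assumes a made-up task (flagging a message as spam from the number of links it contains); the data, the function names, and the single-threshold "model" are all hypothetical and chosen only to show a rule written by hand next to a rule found from examples.

```python
# Toy contrast: a hand-written rule vs a rule learned from labelled examples.
# All data and names here are hypothetical.

def handwritten_rule(num_links: int) -> bool:
    """Traditional programming: a human picks the threshold."""
    return num_links > 5


def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Machine learning in miniature: try each candidate threshold and keep
    the one that classifies the most training examples correctly."""
    candidates = sorted({links for links, _ in examples})
    best_threshold, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum((links > t) == is_spam for links, is_spam in examples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


# (number of links, is_spam) pairs
training_data = [(0, False), (1, False), (2, False), (3, True), (4, True), (7, True)]
learned = learn_threshold(training_data)


def learned_rule(num_links: int) -> bool:
    """The rule discovered from data rather than written by a person."""
    return num_links > learned
```

On this toy data the learned threshold differs from the hand-picked one, so the two rules disagree on a message with three links. Real ML models learn many parameters at once, not a single threshold.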
Module 02
Machine learning is the process by which AI systems improve through experience. Instead of following hand-written rules, an ML model finds patterns in data. Here's how the cycle works:
Figure 2: A typical supervised machine learning pipeline — from raw data collection through to a deployed model. The "Label Data" step is specific to supervised learning; unsupervised approaches skip it.
The key ingredient is data. Broadly, the more good-quality, representative examples a model sees, the better it learns. This is why tech companies invest so heavily in collecting data: it's the raw material of AI.
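As a rough illustration of the pipeline in Figure 2, here is a minimal sketch in plain Python. The stage names in the comments, the exam-result data, and the nearest-centroid model are assumptions made for this example; they are not taken from the figure.

```python
# A toy supervised pipeline: labelled data -> split -> train -> evaluate -> predict.
# Data, features and the nearest-centroid model are hypothetical.

# 1. Collected and labelled data: (hours_studied, hours_slept) -> "pass" / "fail"
labelled = [
    ((1.0, 4.0), "fail"), ((2.0, 5.0), "fail"), ((1.5, 6.0), "fail"), ((2.5, 4.5), "fail"),
    ((7.0, 7.0), "pass"), ((8.0, 6.5), "pass"), ((6.5, 8.0), "pass"), ((9.0, 7.5), "pass"),
]

# 2. Split: hold out every fourth example for testing.
test_set = labelled[3::4]
train_set = [ex for i, ex in enumerate(labelled) if i % 4 != 3]


# 3. Train: compute the average feature vector (centroid) of each class.
def train(examples):
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


# 4. Predict: choose the class whose centroid is closest to the input.
def predict(model, features):
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


model = train(train_set)

# 5. Evaluate on held-out data the model never saw during training.
accuracy = sum(predict(model, f) == y for f, y in test_set) / len(test_set)

# 6. "Deploy": predict for a new, unlabelled example.
new_prediction = predict(model, (7.5, 7.0))
```

Holding out a test set matters because accuracy measured on the training examples themselves would overstate how well the model handles unseen inputs.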
Figure 3: Supervised learning uses labelled data; unsupervised learning discovers structure on its own.
Supervised learning is by far the most widely used form of ML today. The word "supervised" refers to the fact that a human has provided the correct answers — the labels — for every training example. The algorithm's job is to learn the mapping from inputs to those labels well enough to predict labels on new, unseen inputs.
Think of it like studying with an answer key. You practise on thousands of exam questions where you already know the correct answer. After enough practice you can answer new questions you've never seen before.
Every supervised learning problem is built from the same three building blocks. Understanding these terms precisely will make every other concept in this course easier to follow:
Label: The thing you're trying to predict. For spam detection it's spam / not spam. For house prices it's the sale price. For weather forecasting it's tomorrow's temperature. Every training row has a known label; in production the model must predict it.
Features: The input variables the model uses to make its prediction. For spam: word count, sender address, links present. For house prices: square footage, bedrooms, neighbourhood. The model learns which features matter most and by how much.
Example: A single row of data, that is, one instance with its features and (during training) its label. Labelled examples train the model. Unlabelled examples are what the deployed model must predict on in the real world.
Table 1: A supervised learning dataset. Each row is an example. Green columns are features; the gold column is the label. The last row is unlabelled — what the trained model must predict.
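One way to picture the rows described in Table 1 is as a small data structure. The sketch below is only an illustration: the `Example` class, the feature names, and all the numbers are hypothetical and are not the values from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """One row: a set of features plus an optional label."""
    features: dict[str, float]
    label: Optional[float] = None  # None means the example is unlabelled


# Hypothetical house-price rows; the last one has no label yet.
dataset = [
    Example({"square_feet": 1200, "bedrooms": 2}, label=250_000),
    Example({"square_feet": 2000, "bedrooms": 4}, label=410_000),
    Example({"square_feet": 1500, "bedrooms": 3}),
]

# Labelled rows are used for training; unlabelled rows are what the
# trained model is asked to predict.
labelled = [ex for ex in dataset if ex.label is not None]
unlabelled = [ex for ex in dataset if ex.label is None]
```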
The two most common supervised learning task types are regression and classification. They differ only in the kind of label being predicted:
Regression: The output is a real number on a continuous scale. Examples: house price, tomorrow's temperature, expected exam score. Evaluated with metrics like MSE (Mean Squared Error) or MAE (Mean Absolute Error).
Classification: The output is one of a fixed set of classes. Examples: spam/not spam, cat/dog/bird, disease present/absent. Binary means 2 classes; multi-class means 3 or more. Evaluated with metrics like accuracy, precision, and recall.
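The evaluation metrics named above are simple to compute by hand. The following sketch uses the standard definitions; the sample predictions are made up for illustration, and in practice a library such as scikit-learn would normally be used instead.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; punishes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; in the same units as the label."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) for one 'positive' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged items, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of true positives, how many were found
    return accuracy, precision, recall


# Regression: hypothetical house prices in thousands.
prices_true = [200, 300, 400, 500]
prices_pred = [210, 290, 420, 500]
mse = mean_squared_error(prices_true, prices_pred)
mae = mean_absolute_error(prices_true, prices_pred)

# Classification: hypothetical spam predictions.
labels_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
labels_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
accuracy, precision, recall = classification_metrics(labels_true, labels_pred, "spam")
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are usually reported alongside it.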
During training, the model makes a prediction on each labelled example. It then compares its prediction to the true label and receives a supervision signal — a measure of how wrong it was. This signal is used to adjust the model's internal parameters so the next prediction is a little better. Repeat this millions of times and the model gradually improves.
Figure 4: The supervised learning feedback loop. The model predicts, the loss measures error against the true label, and the weights are updated. This cycle repeats until loss is minimised.
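The predict, measure, update loop in Figure 4 can be sketched with about the simplest possible model: a straight line fitted by gradient descent. The data, the learning rate, and the number of steps are assumptions chosen for this illustration; real models have far more parameters but follow the same cycle.

```python
# Fit y = w * x + b by gradient descent on mean squared error.
# Toy data generated from y = 2x + 1, so the "right answer" is w = 2, b = 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0          # the model's parameters (weights), initially uninformed
learning_rate = 0.05     # how big a correction to make each step
losses = []

for step in range(2000):
    # 1. Predict with the current parameters.
    preds = [w * x + b for x in xs]
    # 2. Compare with the true labels: this is the supervision signal.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e ** 2 for e in errors) / len(xs)
    losses.append(loss)
    # 3. Work out which direction reduces the loss (the gradient) ...
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # 4. ... and nudge the parameters a small step that way.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
```

After the loop, `w` and `b` sit very close to 2 and 1, and the values in `losses` shrink toward zero as training proceeds.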
Email spam filtering
Label: spam / not spam
Features: words in subject, sender, links present
Training data: millions of emails manually tagged by users clicking "Mark as spam"
Medical imaging
Label: tumour present / absent
Features: pixel intensities in an X-ray or MRI
Training data: scans labelled by radiologists over years of clinical practice
Machine translation
Label: correct translation in target language
Features: words and context in source language
Training data: millions of human-translated document pairs
Music recommendation
Label: did the user play / skip this song?
Features: listening history, song tempo, genre, time of day
Training data: billions of play/skip events from real users
Speech recognition
Label: the correct text transcript
Features: audio waveform frequencies over time
Training data: thousands of hours of recorded speech paired with human transcripts
Self-driving perception
Label: object type (car / pedestrian / sign)
Features: camera pixels, LiDAR point clouds
Training data: millions of manually annotated driving video frames
Supervised learning is a good fit when:
In unsupervised learning there are no labels. The algorithm is given raw data and asked to find structure on its own — groupings, patterns, compressions, or anomalies — without being told what to look for. This mirrors how humans often learn: by observing the world and finding categories without an explicit teacher.
Unsupervised learning is harder to evaluate (there's no "correct answer" to check against), but it's enormously useful when labelling is expensive, impossible, or when you want to discover something genuinely unknown in your data.
Clustering: Algorithms like K-Means partition data into K groups so that items in the same group are more similar to each other than to items in other groups. Used for: customer segmentation, document grouping, image compression, gene expression analysis.
Dimensionality reduction: Techniques like PCA and t-SNE take high-dimensional data (e.g. 1,000 features) and represent it in 2–3 dimensions while preserving meaningful relationships. Used for: data visualisation, noise removal, preprocessing before supervised learning.
Anomaly detection: Learn what "normal" looks like from unlabelled data, then flag anything that deviates significantly. Used for: credit card fraud detection, network intrusion detection, manufacturing defect spotting, medical outlier detection.
Generative models: Models like VAEs (Variational Autoencoders) and GANs (Generative Adversarial Networks) learn the underlying distribution of training data well enough to generate new examples. Generative modelling of this kind underpins AI image generation, text generation, and drug molecule design.
Figure 5: K-Means clustering demo. Each stage is shown in turn: placing centroids, assigning points, and moving centroids until clusters are discovered.
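The stages in Figure 5 (place centroids, assign points, move centroids) can be sketched in a few lines of plain Python. The points and the starting centroids below are hypothetical; real implementations usually choose starting centroids randomly or with a smarter seeding scheme, and are normally taken from a library.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iters=100):
    """Repeat assign/update until the centroids stop moving."""
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Two visually obvious groups; no labels are provided.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
initial_centroids = [points[0], points[1]]  # deliberately poor starting guess

final_centroids, final_clusters = kmeans(points, initial_centroids)
```

Even though both starting centroids sit inside the same group, the assign and update steps pull one of them across to the other group within a few iterations.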
Real datasets often have hundreds or thousands of features. Dimensionality reduction compresses these into 2–3 dimensions so humans (and other algorithms) can understand them. The key insight is that much real-world high-dimensional data lies on or near a much lower-dimensional manifold, so a large share of the apparent complexity is redundancy.
Figure 6: Dimensionality reduction with PCA or t-SNE. 1,000 features are compressed to 2 dimensions, making hidden cluster structure immediately visible.
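To show the redundancy idea on a very small scale, the sketch below finds only the first principal component of a three-feature dataset using power iteration, in plain Python. The data is hypothetical and deliberately built so that all three features are driven by one hidden quantity; the function name is an assumption for this example, t-SNE is not covered, and real work would typically use a library PCA implementation.

```python
def first_principal_component(rows, iters=200):
    """Return (direction, projected values) for the top principal component."""
    n, d = len(rows), len(rows[0])
    # Centre each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise; the vector converges to the direction of greatest variance.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project every centred row onto that single direction.
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, projected


# Three features, but each is just a multiple of one hidden value t.
rows = [[t, 2 * t, -t] for t in (1.0, 2.0, 3.0, 4.0, 5.0)]
direction, projected = first_principal_component(rows)
```

Here one number per row (`projected`) captures all of the variation in the original three features, which is the sense in which the data lies on a lower-dimensional structure.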
Module 08
AI systems are often categorised by their breadth — the range of tasks they can handle.
Narrow AI: Designed for one specific task. Excellent at it, but can't do anything else. Every AI you use today is narrow AI, including chess engines, spam filters, image classifiers, voice assistants, and language models.
Artificial General Intelligence (AGI): A system that can learn and perform any intellectual task a human can. Does not yet exist. Researchers debate whether it is possible, and if so, when it might arrive.
Artificial Superintelligence (ASI): An AI that surpasses the best human intellect in every domain. A theoretical concept, not something that exists today. Often the subject of philosophical and safety debates.
Large language models (LLMs): AI systems trained on vast amounts of text. They can write, translate, summarise, code, and reason about language. Examples: GPT-4, Claude, Gemini. A type of narrow AI, but with surprisingly broad language capabilities.
Natural Language Processing (NLP): Helps computers understand and generate human language. Powers search engines, chatbots, translation services, autocomplete, and document summarisation.
Computer vision: Enables machines to interpret images and video. Used in facial recognition, medical imaging, self-driving cars, quality control in manufacturing, and satellite analysis.
Generative AI: Creates new content such as text, images, music, video, and code. Models like DALL·E, Stable Diffusion, and ChatGPT are examples. Learns distributions in data and samples from them.
Reinforcement learning: An agent learns by taking actions in an environment and receiving rewards. Has achieved superhuman performance in chess, Go, and many video games. Also used in robotics and recommendation systems.
Recommender systems: Predict what a user will like based on past behaviour and similar users. Power Netflix, Spotify, Amazon, and TikTok. Use techniques like collaborative filtering, matrix factorisation, and, increasingly, deep learning.
Anomaly detection: Identifies unusual patterns that deviate from the norm. Used for fraud detection, network intrusion, manufacturing defects, and medical outliers. Trains on "normal" data, then flags anything statistically surprising.
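A very simple version of "train on normal data, then flag anything statistically surprising" is a z-score check. The transaction amounts and the threshold of three standard deviations are assumptions for this sketch; production systems use richer models and many more features.

```python
from statistics import mean, stdev

# Hypothetical "normal" card transactions (amounts in one currency).
normal_amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 18.5, 22.0]

# "Training": summarise what normal looks like.
mu = mean(normal_amounts)
sigma = stdev(normal_amounts)


def is_anomaly(amount, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    z_score = abs(amount - mu) / sigma
    return z_score > threshold


typical = is_anomaly(21.5)    # close to the usual range
unusual = is_anomaly(250.0)   # far outside anything seen in training
```

No example of fraud was needed to build this detector, which is what makes the approach useful when anomalies are rare or unlabelled.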
Module 09
You interact with AI dozens of times every day — often without realising it. Here's a map of where AI shows up in a typical day:
Module 10
AI systems are only as good as the data they're trained on and the goals they're given. Several well-documented failure modes deserve your attention as an AI user and citizen.
If training data reflects historical inequalities, the model will tend to perpetuate them. A famous example: early commercial facial analysis systems had much higher error rates for darker-skinned faces, and for darker-skinned women in particular, which has been widely attributed to datasets skewed toward lighter-skinned faces.
Figure 7: Illustrative data based on MIT Media Lab's "Gender Shades" study (Buolamwini & Gebru, 2018). Early commercial facial recognition systems showed large disparities in accuracy.
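Disparities like those in Figure 7 only become visible when error rates are broken down by group instead of being reported as one overall number. The sketch below shows that calculation; the records and group names are entirely hypothetical and are not data from the Gender Shades study.

```python
from collections import defaultdict

# (group, true_label, predicted_label): hypothetical evaluation records.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]


def error_rate_by_group(records):
    """Return {group: fraction of that group's predictions that were wrong}."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        errors[group] += truth != pred  # True counts as 1
    return {group: errors[group] / totals[group] for group in totals}


rates = error_rate_by_group(records)
overall = sum(truth != pred for _, truth, pred in records) / len(records)
```

In this toy data the overall error rate looks moderate, yet every mistake falls on one group, which a single aggregate figure would hide.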
Large language models can generate confident-sounding but factually wrong information — a phenomenon called hallucination. They predict likely-sounding text, not necessarily true text. Always verify important claims from an AI chatbot using authoritative sources.
Many AI services are trained on or learn from user data. When you use a free AI product, consider: what data is collected, who owns it, how it's stored, and whether it's used to train future models.
Here are practical habits for anyone using AI tools today:
Module 11
AI is developing faster than almost any technology in history. Here's a look at where things stand and where they're heading:
Test Yourself
5 questions · 20 points each · 100 total.