A complete plan of Interactive Visual Explainers covering Deep Learning fundamentals — from a single neuron through activation functions, loss, backpropagation, optimization, and modern training techniques.
Why nonlinearity, Sigmoid, Tanh, ReLU, Leaky ReLU, GELU — interactive curve explorer with live gradient f′(z), vanishing gradient in saturation zones, GELU deep dive, activation function summary table
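The curves and live gradients f′(z) above can be sketched in a few lines of plain Python (function names like `sigmoid_grad` are my own, and the GELU uses the common tanh approximation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # f'(z) = f(z)(1 - f(z)), max 0.25 at z = 0

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (z + 0.044715 * z ** 3)))

# saturation zones: sigmoid/tanh gradients collapse for large |z|, ReLU's does not
for z in (-4.0, 0.0, 4.0):
    print(z, sigmoid_grad(z), tanh_grad(z), relu_grad(z))
```

Evaluating `sigmoid_grad(6.0)` gives roughly 0.0025, which is the vanishing-gradient behaviour the explorer visualises.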
What is loss, loss landscape with interactive contour map, gradient descent algorithm, weight update rule w ← w − η·∂L/∂w, learning rate effect (too small / too large / just right)
✓ Live
6
Optimizers Explainer
SGD, SGD with Momentum, RMSProp, Adam, AdamW — adaptive learning rates, momentum, bias correction, comparison on a loss landscape, when to use each in practice
Planned
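As a sketch of what this explainer would compare, here are minimal SGD-with-momentum and Adam updates on a toy quadratic (hyperparameters are illustrative defaults, not recommendations):

```python
import math

def grad(w):                       # gradient of L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def sgd_momentum(w, lr=0.1, beta=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)     # velocity accumulates past gradients
        w = w - lr * v
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g              # first moment (mean)
        v = b2 * v + (1 - b2) * g * g          # second moment
        m_hat = m / (1 - b1 ** t)              # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive step
    return w

print(sgd_momentum(0.0), adam(0.0))   # both approach the minimum w* = 3
```

AdamW would differ from `adam` only by subtracting `lr * wd * w` as a decoupled weight-decay term in the update.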
Backprop · Backpropagation
7
Chain Rule & Computation Graph Explainer
The chain rule of calculus, computation graphs, forward pass vs backward pass, local gradients at each node, interactive graph where you click nodes to see gradient flow
Planned
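The forward/backward idea on a tiny computation graph can be written out by hand (the graph z = w·x + b, a = tanh(z), L = (a − y)² and all numbers are an assumed toy example):

```python
import math

x, w, b, y = 1.5, 0.8, 0.2, 1.0

# forward pass: each node computes and stores its output
z = w * x + b
a = math.tanh(z)
L = (a - y) ** 2

# backward pass: multiply local gradients along the path (chain rule)
dL_da = 2 * (a - y)               # local grad of (a - y)^2
da_dz = 1 - a ** 2                # local grad of tanh
dL_dz = dL_da * da_dz
dL_dw = dL_dz * x                 # z = w*x + b  ->  dz/dw = x
dL_db = dL_dz * 1.0               # dz/db = 1

# finite-difference check of dL/dw
h = 1e-6
L_plus = (math.tanh((w + h) * x + b) - y) ** 2
num = (L_plus - L) / h
print(dL_dw, num)                 # the two should agree closely
```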
8
Backward Pass & Full Network Backpropagation Explainer
Backpropagating through a full network layer by layer, gradient accumulation, weight updates end to end, delta rule, numerical gradient verification
Planned
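A layer-by-layer backward pass with a numerical gradient check might look like this minimal sketch (a 2-2-1 tanh network with squared loss; all weights and names are illustrative):

```python
import math

def forward_backward(W1, b1, W2, b2, x, y):
    # forward: hidden h = tanh(W1 x + b1), output o = W2 . h + b2
    z = [sum(W1[i][j] * x[j] for j in range(len(x))) + b1[i] for i in range(len(b1))]
    h = [math.tanh(zi) for zi in z]
    o = sum(W2[i] * h[i] for i in range(len(h))) + b2
    L = 0.5 * (o - y) ** 2

    # backward: propagate deltas layer by layer (delta rule)
    d_o = o - y                                             # dL/do = dL/db2
    dW2 = [d_o * h[i] for i in range(len(h))]
    d_h = [d_o * W2[i] for i in range(len(h))]              # into hidden layer
    d_z = [d_h[i] * (1 - h[i] ** 2) for i in range(len(h))] # through tanh
    dW1 = [[d_z[i] * x[j] for j in range(len(x))] for i in range(len(h))]
    return L, dW1, dW2, d_o

W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
W2 = [0.5, -0.3]
b2 = 0.2
x, y = [1.0, 2.0], 0.7

L, dW1, dW2, d_b2 = forward_backward(W1, b1, W2, b2, x, y)

# numerical verification of one entry of dW1
eps = 1e-6
W1p = [row[:] for row in W1]
W1p[0][1] += eps
Lp, _, _, _ = forward_backward(W1p, b1, W2, b2, x, y)
print(dW1[0][1], (Lp - L) / eps)   # analytic vs numerical gradient
```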
Gradients · Gradient flow problems
9
Vanishing & Exploding Gradients Explainer
Why gradients shrink or explode with depth, sigmoid saturation causing vanishing, gradient norms across layers, gradient clipping, depth slider showing gradient magnitude decay
Planned
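The depth-slider effect can be previewed numerically: in a chain of sigmoid layers, each backward step multiplies the gradient by w·s(1−s) ≤ 0.25·|w|, so the gradient shrinks geometrically with depth (the chain below is an assumed toy setup):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_through_depth(depth, w=1.0, x=0.5):
    # forward through `depth` sigmoid layers, recording each activation
    a = x
    acts = []
    for _ in range(depth):
        a = sigmoid(w * a)
        acts.append(a)
    # backward: each layer contributes a factor w * s * (1 - s) <= 0.25 * |w|
    g = 1.0
    for a in reversed(acts):
        g *= w * a * (1.0 - a)
    return g

for d in (1, 5, 10, 20):
    print(d, gradient_through_depth(d))   # magnitude decays with depth
```

Gradient clipping addresses the opposite failure mode by capping the norm, e.g. `g = max(min(g, c), -c)` for a scalar.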
10
Weight Initialization Explainer
Why initialization matters, zero-init failure (symmetry never breaks, so all units learn identically), random init problems, Xavier/Glorot initialization, He initialization, interactive visualisation of activation distributions at initialization
Planned
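A rough numerical version of the activation-distribution demo: push one unit-variance input through a stack of ReLU layers and watch the activation scale either die out (tiny init) or stay roughly constant (He init, std = √(2/n)). Sizes and depth below are arbitrary illustrative choices:

```python
import random, math

random.seed(0)

def deep_relu_std(n, depth, weight_std):
    # one input vector of unit variance pushed through `depth` ReLU layers
    a = [random.gauss(0, 1) for _ in range(n)]
    for _ in range(depth):
        W = [[random.gauss(0, weight_std) for _ in range(n)] for _ in range(n)]
        z = [sum(W[i][j] * a[j] for j in range(n)) for i in range(n)]
        a = [max(0.0, zi) for zi in z]
    m = sum(a) / n
    return math.sqrt(sum((ai - m) ** 2 for ai in a) / n)

n, depth = 100, 8
tiny = deep_relu_std(n, depth, 0.01)                # signal vanishes
he = deep_relu_std(n, depth, math.sqrt(2.0 / n))    # scale roughly preserved
print(tiny, he)
```

Xavier/Glorot init (std = √(1/n), or √(2/(n_in + n_out))) plays the same role for tanh/sigmoid layers.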
Architecture · Building a multi-layer network
11
From Neuron to Network Explainer
From one neuron to a layer, layer types (input, hidden, output), fully connected layers & weight matrices, depth vs width trade-off, interactive network builder
Planned
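"Fully connected layers & weight matrices" boils down to a weighted sum per output unit; a minimal sketch of a 3-4-2 network (all weights are arbitrary illustrative numbers):

```python
def dense(W, b, x):
    # one fully connected layer: each output unit is a weighted sum plus bias
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu_vec(v):
    return [max(0.0, vi) for vi in v]

# input 3 -> hidden 4 (ReLU) -> output 2
W1 = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]]
b1 = [0.1, 0.0, -0.1, 0.2]
W2 = [[0.3, -0.2, 0.1, 0.4], [0.1, 0.2, -0.1, 0.0]]
b2 = [0.0, 0.1]

x = [1.0, 0.5, -1.0]
h = relu_vec(dense(W1, b1, x))   # hidden layer activations
y = dense(W2, b2, h)             # network output
print(y)
```

Depth adds more `dense` + nonlinearity stages; width grows the per-layer matrices.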
12
Universal Approximation Theorem Explainer
What the UAT says and what it does not, width vs depth, practical implications for network design, interactive demo fitting arbitrary functions
Planned
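One classic intuition behind the UAT is that a difference of two sigmoid units approximates an indicator "bump", and sums of such bumps approximate arbitrary continuous functions; a tiny sketch (interval, grid, and steepness values are assumed for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, k):
    # difference of two steep sigmoids ~ indicator of the interval [1, 2]
    return sigmoid(k * (x - 1.0)) - sigmoid(k * (x - 2.0))

def target(x):
    return 1.0 if 1.0 <= x <= 2.0 else 0.0

def max_err(k):
    pts = [i / 100 for i in range(301)]   # grid on [0, 3]
    # measure away from the jumps, where any finite k must blur the edge
    return max(abs(bump(x, k) - target(x))
               for x in pts if abs(x - 1) > 0.1 and abs(x - 2) > 0.1)

print(max_err(10), max_err(100))   # error shrinks as the units steepen
```

The theorem guarantees existence of such approximations for wide-enough single hidden layers; it says nothing about learnability or sample efficiency, which is the "what it does not say" half.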
Architecture · Normalization
13
Batch Normalization Explainer
Internal covariate shift, normalizing activations per mini-batch, learnable scale & shift parameters γ and β, training vs inference behaviour, effect on gradient flow
Planned
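The train-vs-inference distinction can be sketched directly: training normalizes with the mini-batch's own statistics and updates running averages; inference reuses the stored running statistics (the momentum value and data are illustrative):

```python
import math

def batchnorm_train(x, gamma, beta, running, momentum=0.1, eps=1e-5):
    # normalize with this mini-batch's statistics, then scale & shift
    m = sum(x) / len(x)
    v = sum((xi - m) ** 2 for xi in x) / len(x)
    out = [gamma * (xi - m) / math.sqrt(v + eps) + beta for xi in x]
    # keep running statistics for inference time
    running["mean"] = (1 - momentum) * running["mean"] + momentum * m
    running["var"] = (1 - momentum) * running["var"] + momentum * v
    return out

def batchnorm_eval(x, gamma, beta, running, eps=1e-5):
    # inference: use stored running statistics, not the batch's own
    return [gamma * (xi - running["mean"]) / math.sqrt(running["var"] + eps) + beta
            for xi in x]

running = {"mean": 0.0, "var": 1.0}
batch = [2.0, 4.0, 6.0, 8.0]
out = batchnorm_train(batch, gamma=1.0, beta=0.0, running=running)
print(out)   # zero mean, unit variance within the batch
```

The learnable γ and β then let the network undo the normalization where that helps.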
Training · Regularization
14
Dropout & Weight Decay Explainer
Overfitting, dropout mechanism and stochastic regularization, train vs test mode difference, weight decay (L2 regularization), interactive overfitting demo
Planned
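The train/test-mode difference is easiest to see with inverted dropout, which rescales surviving units so expected activations match between modes (drop probability and data are illustrative):

```python
import random

random.seed(0)

def dropout(x, p, train):
    # inverted dropout: scale kept units by 1/(1-p) so the expected
    # activation is the same in train and test mode
    if not train:
        return list(x)            # test mode: identity
    keep = 1.0 - p
    return [xi / keep if random.random() < keep else 0.0 for xi in x]

def l2_penalty(weights, lam):
    # weight decay as a loss term: (lam/2) * sum of squared weights,
    # which contributes lam * w to each weight's gradient
    return 0.5 * lam * sum(wi * wi for wi in weights)

x = [1.0] * 10000
dropped = dropout(x, p=0.3, train=True)
print(sum(dropped) / len(dropped))      # close to 1.0 in expectation
print(dropout(x[:5], p=0.3, train=False))
```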
Training · Training loop & convergence
15
Training Loop & Convergence Explainer
The full training loop, mini-batches and epochs, overfitting vs underfitting, bias-variance trade-off, train/val/test split, early stopping
Planned
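All the pieces above fit into one loop; here is a compact sketch on synthetic 1-D linear regression with a train/val split, mini-batches, epochs, and early stopping (data-generating function, patience, and all hyperparameters are assumed for illustration):

```python
import random

random.seed(0)

# synthetic data: y = 2x + 1 plus noise; held-out validation split
data = [(x / 50.0, 2.0 * (x / 50.0) + 1.0 + random.gauss(0, 0.1)) for x in range(200)]
random.shuffle(data)
train, val = data[:160], data[160:]

def mse(w, b, batch):
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)

w, b, lr, batch_size = 0.0, 0.0, 0.1, 16
best_val, patience, bad_epochs = float("inf"), 3, 0

for epoch in range(100):                        # one epoch = one pass over train
    random.shuffle(train)
    for i in range(0, len(train), batch_size):  # mini-batches
        batch = train[i:i + batch_size]
        gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        w, b = w - lr * gw, b - lr * gb
    v = mse(w, b, val)
    if v < best_val - 1e-5:
        best_val, bad_epochs = v, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:              # early stopping
            break

print(w, b, best_val)   # w near 2, b near 1, val loss near the noise floor
```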
16
Learning Rate Schedules Explainer
Constant LR, step decay, cosine annealing, warmup + cosine (modern default), cyclical LR — all visualised as interactive schedule curves with live loss simulation
Planned
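The warmup + cosine schedule is simple enough to state exactly: linear ramp from 0 to the base LR over the warmup steps, then cosine decay to a floor (step counts and rates below are illustrative):

```python
import math

def warmup_cosine(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    # linear warmup from 0 to base_lr, then cosine decay to min_lr
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

schedule = [warmup_cosine(s, total_steps=100, warmup_steps=10, base_lr=0.1)
            for s in range(101)]
print(schedule[0], schedule[10], schedule[100])   # 0.0 -> peak 0.1 -> 0.0
```

Step decay and cyclical LR are variations on the same idea: a function from step number to learning rate, visualised as a curve.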
Evaluation
17
Evaluation Metrics Explainer
Accuracy, precision, recall, F1 score, ROC curve, AUC — interactive confusion matrix, when each metric matters and when it misleads (class imbalance)
Planned
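The class-imbalance trap is easy to demonstrate from the confusion matrix alone: on 95 negatives and 5 positives, a classifier that always predicts 0 scores 95% accuracy with zero recall and F1 (the toy labels below are assumed for illustration):

```python
def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real positives, how many found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1] * 5 + [0] * 95
always_zero = [0] * 100
print(metrics(y_true, always_zero))   # (0.95, 0.0, 0.0, 0.0)
```

ROC/AUC extend this by sweeping the decision threshold and plotting true-positive rate against false-positive rate.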
Design principles
Each explainer introduces exactly one new idea — no explainer tries to cover an entire area at once.
All explainers follow the same header, footer, and style conventions established in the series.
Every explainer is a self-contained single HTML file — no external dependencies, no frameworks, no install required.