A complete plan of Interactive Visual Explainers covering Deep Learning fundamentals — from a single neuron through activation functions, loss, backpropagation, optimization, and modern training techniques.
Why nonlinearity, Sigmoid, Tanh, ReLU, Leaky ReLU, GELU — interactive curve explorer with live gradient f′(z), vanishing gradient in saturation zones, GELU deep dive, activation function summary table
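The curves and live gradients f′(z) above can be sketched in a few lines of plain Python (function names like `sigmoid_grad` are my own, and the GELU uses the common tanh approximation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # f'(z) = f(z)(1 - f(z)), max 0.25 at z = 0

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (z + 0.044715 * z ** 3)))

# saturation zones: sigmoid/tanh gradients collapse for large |z|, ReLU's does not
for z in (-4.0, 0.0, 4.0):
    print(z, sigmoid_grad(z), tanh_grad(z), relu_grad(z))
```

Evaluating `sigmoid_grad(6.0)` gives roughly 0.0025, which is the vanishing-gradient behaviour the explorer visualises.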
What is loss, loss landscape with interactive contour map, gradient descent algorithm, weight update rule w ← w − η·∂L/∂w, learning rate effect (too small / too large / just right)
✓ Live
6
Optimizers Explainer
SGD, SGD with Momentum, RMSProp, Adam, AdamW — adaptive learning rates, momentum, bias correction, comparison on a loss landscape, when to use each in practice
Planned
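As a sketch of what this explainer would compare, here are minimal SGD-with-momentum and Adam updates on a toy quadratic (hyperparameters are illustrative defaults, not recommendations):

```python
import math

def grad(w):                       # gradient of L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def sgd_momentum(w, lr=0.1, beta=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)     # velocity accumulates past gradients
        w = w - lr * v
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g              # first moment (mean)
        v = b2 * v + (1 - b2) * g * g          # second moment
        m_hat = m / (1 - b1 ** t)              # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive step
    return w

print(sgd_momentum(0.0), adam(0.0))   # both approach the minimum w* = 3
```

AdamW would differ from `adam` only by subtracting `lr * wd * w` as a decoupled weight-decay term in the update.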
Backprop · Backpropagation
7
Chain Rule & Computation Graph Explainer
The chain rule of calculus, computation graphs, forward pass vs backward pass, local gradients at each node, interactive graph where you click nodes to see gradient flow
Planned
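The forward/backward idea on a tiny computation graph can be written out by hand (the graph z = w·x + b, a = tanh(z), L = (a − y)² and all numbers are an assumed toy example):

```python
import math

x, w, b, y = 1.5, 0.8, 0.2, 1.0

# forward pass: each node computes and stores its output
z = w * x + b
a = math.tanh(z)
L = (a - y) ** 2

# backward pass: multiply local gradients along the path (chain rule)
dL_da = 2 * (a - y)               # local grad of (a - y)^2
da_dz = 1 - a ** 2                # local grad of tanh
dL_dz = dL_da * da_dz
dL_dw = dL_dz * x                 # z = w*x + b  ->  dz/dw = x
dL_db = dL_dz * 1.0               # dz/db = 1

# finite-difference check of dL/dw
h = 1e-6
L_plus = (math.tanh((w + h) * x + b) - y) ** 2
num = (L_plus - L) / h
print(dL_dw, num)                 # the two should agree closely
```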
8
Backward Pass & Full Network Backpropagation Explainer
Backpropagating through a full network layer by layer, gradient accumulation, weight updates end to end, delta rule, numerical gradient verification
Planned
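A layer-by-layer backward pass with a numerical gradient check might look like this minimal sketch (a 2-2-1 tanh network with squared loss; all weights and names are illustrative):

```python
import math

def forward_backward(W1, b1, W2, b2, x, y):
    # forward: hidden h = tanh(W1 x + b1), output o = W2 . h + b2
    z = [sum(W1[i][j] * x[j] for j in range(len(x))) + b1[i] for i in range(len(b1))]
    h = [math.tanh(zi) for zi in z]
    o = sum(W2[i] * h[i] for i in range(len(h))) + b2
    L = 0.5 * (o - y) ** 2

    # backward: propagate deltas layer by layer (delta rule)
    d_o = o - y                                             # dL/do = dL/db2
    dW2 = [d_o * h[i] for i in range(len(h))]
    d_h = [d_o * W2[i] for i in range(len(h))]              # into hidden layer
    d_z = [d_h[i] * (1 - h[i] ** 2) for i in range(len(h))] # through tanh
    dW1 = [[d_z[i] * x[j] for j in range(len(x))] for i in range(len(h))]
    return L, dW1, dW2, d_o

W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
W2 = [0.5, -0.3]
b2 = 0.2
x, y = [1.0, 2.0], 0.7

L, dW1, dW2, d_b2 = forward_backward(W1, b1, W2, b2, x, y)

# numerical verification of one entry of dW1
eps = 1e-6
W1p = [row[:] for row in W1]
W1p[0][1] += eps
Lp, _, _, _ = forward_backward(W1p, b1, W2, b2, x, y)
print(dW1[0][1], (Lp - L) / eps)   # analytic vs numerical gradient
```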
Gradients · Gradient flow problems
9
Vanishing & Exploding Gradients Explainer
Why gradients shrink or explode with depth, sigmoid saturation causing vanishing, gradient norms across layers, gradient clipping, depth slider showing gradient magnitude decay
Planned
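The depth-slider effect can be previewed numerically: in a chain of sigmoid layers, each backward step multiplies the gradient by w·s(1−s) ≤ 0.25·|w|, so the gradient shrinks geometrically with depth (the chain below is an assumed toy setup):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_through_depth(depth, w=1.0, x=0.5):
    # forward through `depth` sigmoid layers, recording each activation
    a = x
    acts = []
    for _ in range(depth):
        a = sigmoid(w * a)
        acts.append(a)
    # backward: each layer contributes a factor w * s * (1 - s) <= 0.25 * |w|
    g = 1.0
    for a in reversed(acts):
        g *= w * a * (1.0 - a)
    return g

for d in (1, 5, 10, 20):
    print(d, gradient_through_depth(d))   # magnitude decays with depth
```

Gradient clipping addresses the opposite failure mode by capping the norm, e.g. `g = max(min(g, c), -c)` for a scalar.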
10
Weight Initialization Explainer
Why initialization matters, zero-init failure (symmetry never breaks, so all units learn identically), random init problems, Xavier/Glorot initialization, He initialization, interactive visualisation of activation distributions at initialization
Planned
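A rough numerical version of the activation-distribution demo: push one unit-variance input through a stack of ReLU layers and watch the activation scale either die out (tiny init) or stay roughly constant (He init, std = √(2/n)). Sizes and depth below are arbitrary illustrative choices:

```python
import random, math

random.seed(0)

def deep_relu_std(n, depth, weight_std):
    # one input vector of unit variance pushed through `depth` ReLU layers
    a = [random.gauss(0, 1) for _ in range(n)]
    for _ in range(depth):
        W = [[random.gauss(0, weight_std) for _ in range(n)] for _ in range(n)]
        z = [sum(W[i][j] * a[j] for j in range(n)) for i in range(n)]
        a = [max(0.0, zi) for zi in z]
    m = sum(a) / n
    return math.sqrt(sum((ai - m) ** 2 for ai in a) / n)

n, depth = 100, 8
tiny = deep_relu_std(n, depth, 0.01)                # signal vanishes
he = deep_relu_std(n, depth, math.sqrt(2.0 / n))    # scale roughly preserved
print(tiny, he)
```

Xavier/Glorot init (std = √(1/n), or √(2/(n_in + n_out))) plays the same role for tanh/sigmoid layers.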
Architecture · Building a multi-layer network
11
From Neuron to Network Explainer
From one neuron to a layer, layer types (input, hidden, output), fully connected layers & weight matrices, depth vs width trade-off, interactive network builder
Planned
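"Fully connected layers & weight matrices" boils down to a weighted sum per output unit; a minimal sketch of a 3-4-2 network (all weights are arbitrary illustrative numbers):

```python
def dense(W, b, x):
    # one fully connected layer: each output unit is a weighted sum plus bias
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu_vec(v):
    return [max(0.0, vi) for vi in v]

# input 3 -> hidden 4 (ReLU) -> output 2
W1 = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]]
b1 = [0.1, 0.0, -0.1, 0.2]
W2 = [[0.3, -0.2, 0.1, 0.4], [0.1, 0.2, -0.1, 0.0]]
b2 = [0.0, 0.1]

x = [1.0, 0.5, -1.0]
h = relu_vec(dense(W1, b1, x))   # hidden layer activations
y = dense(W2, b2, h)             # network output
print(y)
```

Depth adds more `dense` + nonlinearity stages; width grows the per-layer matrices.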
12
Universal Approximation Theorem Explainer
What the UAT says and what it does not, width vs depth, practical implications for network design, interactive demo fitting arbitrary functions
Planned
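One classic intuition behind the UAT is that a difference of two sigmoid units approximates an indicator "bump", and sums of such bumps approximate arbitrary continuous functions; a tiny sketch (interval, grid, and steepness values are assumed for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, k):
    # difference of two steep sigmoids ~ indicator of the interval [1, 2]
    return sigmoid(k * (x - 1.0)) - sigmoid(k * (x - 2.0))

def target(x):
    return 1.0 if 1.0 <= x <= 2.0 else 0.0

def max_err(k):
    pts = [i / 100 for i in range(301)]   # grid on [0, 3]
    # measure away from the jumps, where any finite k must blur the edge
    return max(abs(bump(x, k) - target(x))
               for x in pts if abs(x - 1) > 0.1 and abs(x - 2) > 0.1)

print(max_err(10), max_err(100))   # error shrinks as the units steepen
```

The theorem guarantees existence of such approximations for wide-enough single hidden layers; it says nothing about learnability or sample efficiency, which is the "what it does not say" half.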
Architecture · Normalization
13
Batch Normalization Explainer
Internal covariate shift, normalizing activations per mini-batch, learnable scale & shift parameters γ and β, training vs inference behaviour, effect on gradient flow
Planned
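The train-vs-inference distinction can be sketched directly: training normalizes with the mini-batch's own statistics and updates running averages; inference reuses the stored running statistics (the momentum value and data are illustrative):

```python
import math

def batchnorm_train(x, gamma, beta, running, momentum=0.1, eps=1e-5):
    # normalize with this mini-batch's statistics, then scale & shift
    m = sum(x) / len(x)
    v = sum((xi - m) ** 2 for xi in x) / len(x)
    out = [gamma * (xi - m) / math.sqrt(v + eps) + beta for xi in x]
    # keep running statistics for inference time
    running["mean"] = (1 - momentum) * running["mean"] + momentum * m
    running["var"] = (1 - momentum) * running["var"] + momentum * v
    return out

def batchnorm_eval(x, gamma, beta, running, eps=1e-5):
    # inference: use stored running statistics, not the batch's own
    return [gamma * (xi - running["mean"]) / math.sqrt(running["var"] + eps) + beta
            for xi in x]

running = {"mean": 0.0, "var": 1.0}
batch = [2.0, 4.0, 6.0, 8.0]
out = batchnorm_train(batch, gamma=1.0, beta=0.0, running=running)
print(out)   # zero mean, unit variance within the batch
```

The learnable γ and β then let the network undo the normalization where that helps.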
Training · Regularization
14
Dropout & Weight Decay Explainer
Overfitting, dropout mechanism and stochastic regularization, train vs test mode difference, weight decay (L2 regularization), interactive overfitting demo
Planned
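The train/test-mode difference is easiest to see with inverted dropout, which rescales surviving units so expected activations match between modes (drop probability and data are illustrative):

```python
import random

random.seed(0)

def dropout(x, p, train):
    # inverted dropout: scale kept units by 1/(1-p) so the expected
    # activation is the same in train and test mode
    if not train:
        return list(x)            # test mode: identity
    keep = 1.0 - p
    return [xi / keep if random.random() < keep else 0.0 for xi in x]

def l2_penalty(weights, lam):
    # weight decay as a loss term: (lam/2) * sum of squared weights,
    # which contributes lam * w to each weight's gradient
    return 0.5 * lam * sum(wi * wi for wi in weights)

x = [1.0] * 10000
dropped = dropout(x, p=0.3, train=True)
print(sum(dropped) / len(dropped))      # close to 1.0 in expectation
print(dropout(x[:5], p=0.3, train=False))
```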
Training · Training loop & convergence
15
Training Loop & Convergence Explainer
The full training loop, mini-batches and epochs, overfitting vs underfitting, bias-variance trade-off, train/val/test split, early stopping
Planned
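All the pieces above fit into one loop; here is a compact sketch on synthetic 1-D linear regression with a train/val split, mini-batches, epochs, and early stopping (data-generating function, patience, and all hyperparameters are assumed for illustration):

```python
import random

random.seed(0)

# synthetic data: y = 2x + 1 plus noise; held-out validation split
data = [(x / 50.0, 2.0 * (x / 50.0) + 1.0 + random.gauss(0, 0.1)) for x in range(200)]
random.shuffle(data)
train, val = data[:160], data[160:]

def mse(w, b, batch):
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)

w, b, lr, batch_size = 0.0, 0.0, 0.1, 16
best_val, patience, bad_epochs = float("inf"), 3, 0

for epoch in range(100):                        # one epoch = one pass over train
    random.shuffle(train)
    for i in range(0, len(train), batch_size):  # mini-batches
        batch = train[i:i + batch_size]
        gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        w, b = w - lr * gw, b - lr * gb
    v = mse(w, b, val)
    if v < best_val - 1e-5:
        best_val, bad_epochs = v, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:              # early stopping
            break

print(w, b, best_val)   # w near 2, b near 1, val loss near the noise floor
```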
16
Learning Rate Schedules Explainer
Constant LR, step decay, cosine annealing, warmup + cosine (modern default), cyclical LR — all visualised as interactive schedule curves with live loss simulation
Planned
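The warmup + cosine schedule is simple enough to state exactly: linear ramp from 0 to the base LR over the warmup steps, then cosine decay to a floor (step counts and rates below are illustrative):

```python
import math

def warmup_cosine(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    # linear warmup from 0 to base_lr, then cosine decay to min_lr
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

schedule = [warmup_cosine(s, total_steps=100, warmup_steps=10, base_lr=0.1)
            for s in range(101)]
print(schedule[0], schedule[10], schedule[100])   # 0.0 -> peak 0.1 -> 0.0
```

Step decay and cyclical LR are variations on the same idea: a function from step number to learning rate, visualised as a curve.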
Evaluation
17
Evaluation Metrics Explainer
Accuracy, precision, recall, F1 score, ROC curve, AUC — interactive confusion matrix, when each metric matters and when it misleads (class imbalance)
Planned
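The class-imbalance trap is easy to demonstrate from the confusion matrix alone: on 95 negatives and 5 positives, a classifier that always predicts 0 scores 95% accuracy with zero recall and F1 (the toy labels below are assumed for illustration):

```python
def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real positives, how many found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1] * 5 + [0] * 95
always_zero = [0] * 100
print(metrics(y_true, always_zero))   # (0.95, 0.0, 0.0, 0.0)
```

ROC/AUC extend this by sweeping the decision threshold and plotting true-positive rate against false-positive rate.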
Design principles
Each explainer introduces exactly one new idea — no explainer tries to cover an entire area at once.
All explainers follow the same header, footer, and style conventions established in the series.
Every explainer is a self-contained single HTML file — no external dependencies, no frameworks, no install required.