Writing

Essays on strategy, AI, and the craft of sound, across three pillars and one signature series.


Every essay

Data & AI · 31 May 2026 · Private

MPS, Metal, MLX & CUDA

The GPU-compute landscape for machine learning: why GPUs, what CUDA, Metal and MLX each are, unified memory on Apple Silicon, and what runs locally on a Mac versus what needs a cloud GPU.

Data & AI · 30 May 2026 · Private

The Modern Transformer Anatomy

The models you actually fine-tune (Llama, Qwen) are decoder-only Transformers with five upgrades over the 2017 original: RoPE, RMSNorm, SwiGLU, GQA and (sometimes) Mixture-of-Experts. Same core idea, much better engineering.

Data & AI · 30 May 2026 · Private

RLHF & Alignment

How a raw next-token predictor becomes a helpful assistant: the three-stage RLHF pipeline behind InstructGPT and ChatGPT, and the simpler DPO alternative that followed.

Data & AI · 30 May 2026 · Private

GPT, Decoder-Only Models & In-Context Learning

Why "predict the next token" (scaled up) became the recipe for general-purpose AI. Causal attention, autoregressive generation, and the in-context learning that emerged with GPT-3.

Data & AI · 30 May 2026 · Private

What Came After the Transformer

The Transformer didn't lead to one next thing: it branched, then scaled, then got aligned. A map of the lineage from 2017 to today's LLMs, tying the key papers into a single timeline.

Data & AI · 30 May 2026 · Private

Attention Is All You Need

The 2017 paper that replaced recurrence with attention and made modern LLMs possible: self-attention, Q/K/V, multi-head, positional encoding, and the full encoder–decoder architecture.
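For a flavour of the core mechanism, here is a minimal plain-Python sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V. It assumes Q, K and V are already given as lists of row vectors; the learned projections, multiple heads, masking and positional encoding are all left out, and the function names are illustrative only.

```python
import math


def softmax(scores):
    # Shift by the max so math.exp never overflows; the shift cancels out.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v, as lists of row vectors."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # attention weights over the keys, summing to 1
        # Output row = attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

Because the query matches the first key more closely, the output leans towards the first value row; if all keys were identical it would be the plain average of the rows of V.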

Data & AI · 29 May 2026 · Private

Statistics & Inference

Sampling and practical sampling strategies, the Central Limit Theorem, hypothesis testing, and the z / t / chi-square tests: when to use each, what a p-value really means, and where it all applies.
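As a small worked example of a z-test and its p-value, the sketch below runs a two-sided one-sample z-test, assuming the population standard deviation σ is known. The numbers are made up for illustration; when σ has to be estimated from a small sample, the t-test is the usual choice instead.

```python
import math


def normal_cdf(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sample_z_test(sample_mean, mu0, sigma, n):
    # Test statistic: how many standard errors the sample mean sits from mu0.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value: probability, under H0, of a |Z| at least this large.
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical data: sample mean 103 from n = 100, H0 mean 100, known sigma 15.
z, p = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

Here p < 0.05, so H0 would be rejected at the 5% level. The p-value is the probability of data this extreme if H0 were true, not the probability that H0 is true.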

Data & AI · 29 May 2026 · Private

Probability Concepts

Conditional probability, independence, the Law of Total Probability, and Bayes' Theorem, with the intuitive examples that make them stick and why they sit at the heart of machine learning.
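The classic medical-test example shows the Law of Total Probability and Bayes' Theorem working together. This is a minimal sketch; the prevalence, sensitivity and false-positive rate are hypothetical figures chosen for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' Theorem."""
    # Law of Total Probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D).
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' Theorem: P(D|+) = P(+|D)P(D) / P(+).
    return sensitivity * prior / p_positive


# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
print(round(posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05), 3))  # 0.167
```

Even with a 99%-sensitive test, a positive result means only about a 1-in-6 chance of disease, because the true positives (0.99 × 1% ≈ 0.0099) are outnumbered by the false positives (0.05 × 99% ≈ 0.0495).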

Data & AI · 29 May 2026 · Private

Cross-Entropy

The loss function behind virtually every classifier and every LLM pre-training run. Where it comes from (surprise & coding theory), why it punishes confident wrong predictions so brutally, and why it pairs so cleanly with softmax.
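A minimal sketch of two claims in that summary, assuming a single example with an integer class label: the loss is -log of the probability given to the true class (so a confident wrong prediction is penalised heavily), and when it is combined with softmax its gradient with respect to the logits collapses to p - y. The function names are illustrative, not a particular library's API.

```python
import math


def softmax(logits):
    # Max-shifted exponentials, normalised to sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy_from_logits(logits, target):
    # -log softmax(logits)[target] = logsumexp(logits) - logits[target].
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def grad_wrt_logits(logits, target):
    # For softmax + cross-entropy the gradient simplifies to p - one_hot(target).
    p = softmax(logits)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]


logits = [-3.0, 3.0, -3.0]
print(cross_entropy_from_logits(logits, target=1))  # tiny: confident and right
print(cross_entropy_from_logits(logits, target=0))  # roughly 6: confident and wrong
print(grad_wrt_logits(logits, target=0))
```

The simple p - y gradient is a large part of why softmax and cross-entropy are used as a pair: the exponential in softmax and the logarithm in the loss cancel cleanly.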

Data & AI · 29 May 2026 · Private

Boltzmann / Gibbs Distribution

The physics-born distribution that quietly powers softmax, the temperature knob in LLM sampling, attention weights, and every energy-based model. Where it comes from, what it means, and where it shows up in modern deep learning.
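To make the link to softmax and the temperature knob concrete, here is a sketch of the Boltzmann (Gibbs) distribution p_i = exp(-E_i / T) / Z over a few made-up energy levels, with Boltzmann's constant folded into T. Setting E_i = -logit_i turns it into the temperature-scaled softmax used when sampling from an LLM.

```python
import math


def boltzmann(energies, temperature):
    """p_i = exp(-E_i / T) / Z, with Boltzmann's constant absorbed into T."""
    scaled = [-e / temperature for e in energies]
    m = max(scaled)  # shift for numerical stability; it cancels in the ratio
    weights = [math.exp(s - m) for s in scaled]
    partition = sum(weights)  # Z, scaled by the same shift factor
    return [w / partition for w in weights]


energies = [0.0, 1.0, 2.0]  # hypothetical energy levels
for t in (0.1, 1.0, 100.0):
    print(t, [round(p, 3) for p in boltzmann(energies, t)])
```

Low temperature piles almost all the probability onto the lowest-energy state; high temperature spreads it towards uniform.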

Data & AI · 29 May 2026 · Private

The Softmax Function

How a vector of arbitrary scores becomes a probability distribution: the formula, why the exponential, temperature, numerical stability, the gradient, and why softmax + cross-entropy is the standard classifier head.
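A minimal plain-Python sketch of the numerical-stability and temperature points: subtracting the largest score before exponentiating changes nothing mathematically, because the factor e^-m cancels between numerator and denominator, but it stops math.exp from overflowing on large scores. The temperature parameter is included for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    # Temperature rescales the scores: T < 1 sharpens, T > 1 flattens.
    scaled = [s / temperature for s in scores]
    # Subtract the max so every exponent is <= 0; the shift cancels in the ratio.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive math.exp(1000.0) would overflow
print(softmax([1.0, 2.0, 3.0], temperature=0.5))
```

In Python, calling math.exp(1000.0) directly raises OverflowError, which is exactly the failure the max-shift avoids.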

Data & AI · 29 May 2026 · Private

Artificial Neural Networks: A Refresher

Neurons, activations, the forward pass, loss, gradient descent and backpropagation, optimisers, and the families of neural networks: a bridge from classical ANN theory to the modern deep-learning era.