Writing

Essays on strategy, AI, and the craft of sound — across three pillars and one signature series.

All writing

Every essay

Data & AI · 30 May 2026 · Private

The Modern Transformer Anatomy

The models you actually fine-tune (Llama, Qwen) are decoder-only Transformers with five upgrades over the 2017 original — RoPE, RMSNorm, SwiGLU, GQA, and (sometimes) Mixture-of-Experts. Same core idea, much better engineering.

Data & AI · 30 May 2026 · Private

RLHF & Alignment

How a raw next-token predictor becomes a helpful assistant — the three-stage RLHF pipeline behind InstructGPT and ChatGPT, and the simpler DPO alternative that followed.

Data & AI · 30 May 2026 · Private

GPT, Decoder-Only Models & In-Context Learning

Why "predict the next token" — scaled up — became the recipe for general-purpose AI. Causal attention, autoregressive generation, and the in-context learning that emerged with GPT-3.

Data & AI · 30 May 2026 · Private

What Came After the Transformer

The Transformer didn't lead to one next thing — it branched, then scaled, then got aligned. A map of the lineage from 2017 to today's LLMs, tying the key papers into a single timeline.

Data & AI · 30 May 2026 · Private

Attention Is All You Need

The 2017 paper that replaced recurrence with attention and made modern LLMs possible — self-attention, Q/K/V, multi-head, positional encoding, and the full encoder–decoder architecture.

Data & AI · 29 May 2026 · Private

Statistics & Inference

Sampling and practical sampling strategies, the Central Limit Theorem, hypothesis testing, and the z / t / chi-square tests — when to use each, what a p-value really means, and where it all applies.

Data & AI · 29 May 2026 · Private

Probability Concepts

Conditional probability, independence, the Law of Total Probability, and Bayes' Theorem — with the intuitive examples that make them stick, and why they sit at the heart of machine learning.

Data & AI · 29 May 2026 · Private

Cross-Entropy

The loss function behind virtually every classifier and every LLM pre-training run. Where it comes from (surprise & coding theory), why it punishes confident wrong predictions so brutally, and why it pairs so cleanly with softmax.

Data & AI · 29 May 2026 · Private

Boltzmann / Gibbs Distribution

The physics-born distribution that quietly powers softmax, the temperature knob in LLM sampling, attention weights, and every energy-based model. Where it comes from, what it means, and where it shows up in modern deep learning.

Data & AI · 29 May 2026 · Private

The Softmax Function

How a vector of arbitrary scores becomes a probability distribution — the formula, why the exponential, temperature, numerical stability, the gradient, and why softmax + cross-entropy is the standard classifier head.

Data & AI · 29 May 2026 · Private

Artificial Neural Networks — A Refresher

Neurons, activations, the forward pass, loss, gradient descent and backpropagation, optimisers, and the families of neural networks — a bridge from classical ANN theory to the modern deep-learning era.

Strategy · 28 Jul 2020

System Thinking Approach to Data Analytics

Why analytics must be a tool that assists decision-making, not an end in itself — and a five-dimension framework for thinking about data inside a real organisation.
