Writing

Essays on strategy, AI, and the craft of sound — across three pillars and one signature series.

Reading tracks

Browse by series

Series · 🔒 Private · 6 lessons

Foundations

Core mathematics and concepts behind modern ML — neural-net fundamentals, the softmax + cross-entropy stack, statistical inference. Each note stands alone; together they form the shared language of everything else in Data.

View the series →

Series · 🔒 Private · 1 lesson

Hardware & Compute

Where the maths meets the silicon. GPUs, CUDA, Apple's MPS/Metal/MLX stack, and what runs locally vs. needs a rented cloud GPU.

View the series →

Series · 🔒 Private · 5 lessons

Transformer Architectures

From the 2017 paper to today's LLMs. Self-attention and the QKV trio, the GPT decoder-only branch, RLHF and alignment, and the modern transformer anatomy you'd actually fine-tune.

View the series →

All writing

Every essay

Data & AI · 31 May 2026 · Private

MPS, Metal, MLX & CUDA

The GPU-compute landscape for machine learning — why GPUs, what CUDA / Metal / MLX each are, unified memory on Apple Silicon, and what runs locally on a Mac versus what needs a cloud GPU.

Data & AI · 30 May 2026 · Private

The Modern Transformer Anatomy

The models you actually fine-tune (Llama, Qwen) are decoder-only Transformers with five upgrades over the 2017 original — RoPE, RMSNorm, SwiGLU, GQA, and (sometimes) Mixture-of-Experts. Same core idea, much better engineering.

Data & AI · 30 May 2026 · Private

RLHF & Alignment

How a raw next-token predictor becomes a helpful assistant — the three-stage RLHF pipeline behind InstructGPT and ChatGPT, and the simpler DPO alternative that followed.

Data & AI · 30 May 2026 · Private

GPT, Decoder-Only Models & In-Context Learning

Why "predict the next token" — scaled up — became the recipe for general-purpose AI. Causal attention, autoregressive generation, and the in-context learning that emerged with GPT-3.

Data & AI · 30 May 2026 · Private

What Came After the Transformer

The Transformer didn't lead to one next thing: it branched, then scaled, then got aligned. A map of the lineage from 2017 to today's LLMs, tying the key papers into a single timeline.

Data & AI · 30 May 2026 · Private

Attention Is All You Need

The 2017 paper that replaced recurrence with attention and made modern LLMs possible — self-attention, Q/K/V, multi-head, positional encoding, and the full encoder–decoder architecture.

Data & AI · 29 May 2026 · Private

Statistics & Inference

Sampling and practical sampling strategies, the Central Limit Theorem, hypothesis testing, and the z / t / chi-square tests — when to use each, what a p-value really means, and where it all applies.

Data & AI · 29 May 2026 · Private

Probability Concepts

Conditional probability, independence, the Law of Total Probability, and Bayes' Theorem — with the intuitive examples that make them stick, and why they sit at the heart of machine learning.

Data & AI · 29 May 2026 · Private

Cross-Entropy

The loss function behind virtually every classifier and every LLM pre-training run. Where it comes from (surprise & coding theory), why it punishes confident wrong predictions so brutally, and why it pairs so cleanly with softmax.

Data & AI · 29 May 2026 · Private

Boltzmann / Gibbs Distribution

The physics-born distribution that quietly powers softmax, the temperature knob in LLM sampling, attention weights, and every energy-based model. Where it comes from, what it means, and where it shows up in modern deep learning.

Data & AI · 29 May 2026 · Private

The Softmax Function

How a vector of arbitrary scores becomes a probability distribution — the formula, why the exponential, temperature, numerical stability, the gradient, and why softmax + cross-entropy is the standard classifier head.

Data & AI · 29 May 2026 · Private

Artificial Neural Networks — A Refresher

Neurons, activations, the forward pass, loss, gradient descent and backpropagation, optimisers, and the families of neural networks — a bridge from classical ANN theory to the modern deep-learning era.