
The Modern Transformer Anatomy

The models you actually fine-tune (Llama, Qwen) are decoder-only Transformers with five upgrades over the 2017 original: RoPE, RMSNorm, SwiGLU, GQA, and (sometimes) Mixture-of-Experts. Same core idea, much better engineering.
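To make two of those upgrades concrete, here is a minimal sketch of RMSNorm and a SwiGLU feed-forward block in plain Python. It is illustrative only: the function names (`rms_norm`, `swiglu_ffn`, `matvec`), the `eps` default, and the list-of-lists weight layout are assumptions made for this example, not code from Llama, Qwen, or any particular library.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned per-feature gain.
    Unlike LayerNorm, there is no mean subtraction and no bias term."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) ), with no biases."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Hypothetical toy inputs, chosen only to exercise the functions.
normed = rms_norm([3.0, 4.0], [1.0, 1.0])
out = swiglu_ffn(normed, [[0.5, -0.5]], [[1.0, 1.0]], [[1.0], [2.0]])
```

With a gain of all ones, the output of `rms_norm` has a root-mean-square of approximately 1, which is the property the normalization is meant to enforce. In the SwiGLU block, the gate branch decides how much of the up-projected signal passes through to the down projection.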
