The Modern Transformer Anatomy
The models you actually fine-tune (Llama, Qwen) are decoder-only Transformers with five upgrades over the 2017 original: RoPE, RMSNorm, SwiGLU, GQA, and (sometimes) Mixture-of-Experts. Same core idea, much better engineering.