Attention Is All You Need
The 2017 paper that replaced recurrence with attention and made modern LLMs possible — self-attention, Q/K/V, multi-head, positional encoding, and the full encoder–decoder architecture.
— Access required
This piece is private.
Some series on Black Strat — notes on reinsurance, the data foundations, transformer architectures, hardware & compute — are restricted to readers with the access password.
If GK has shared the password with you, enter it once and your browser will stay unlocked for 30 days.
Unlock with password →Don’t have the password? Write to GK.