Data · Transformer Architectures · Private

GPT, Decoder-Only Models & In-Context Learning

Why "predict the next token" — scaled up — became the recipe for general-purpose AI. Causal attention, autoregressive generation, and the in-context learning that emerged with GPT-3.
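Two of the mechanisms named above, causal attention and autoregressive generation, fit in a few lines of code. The following is a minimal plain-Python sketch under stated assumptions: a single attention head with no learned projection matrices (queries, keys and values are passed in directly), and greedy decoding. The names `causal_attention`, `generate` and `toy_logits` are hypothetical. `toy_logits` in particular is a made-up stand-in for a trained decoder, not a real model.

```python
import math
import random


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats). Position t only
    scores positions 0..t, which is equivalent to setting the scores of
    future positions to -inf before the softmax.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))]
        )
    return out


def generate(next_token_logits, prompt, max_new_tokens):
    """Greedy autoregressive decoding: score the next token, append the argmax, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


VOCAB = 5


def toy_logits(tokens):
    # Hypothetical stand-in for a trained decoder: favors (last token + 1) mod VOCAB.
    nxt = (tokens[-1] + 1) % VOCAB
    return [1.0 if i == nxt else 0.0 for i in range(VOCAB)]


print(generate(toy_logits, [3], 4))  # [3, 4, 0, 1, 2]

# Causality check: outputs for the first two positions do not change
# when a third token is appended to the sequence.
random.seed(0)
x = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
full = causal_attention(x, x, x)
prefix = causal_attention(x[:2], x[:2], x[:2])
assert full[:2] == prefix
```

Greedy argmax is only one decoding choice. Sampling from the softmax of the logits, often with a temperature, is the other common option. The causality check shows why the mask matters for generation. Position t never sees later tokens, so a prefix's representations stay fixed as new tokens are appended. That fixed prefix is what makes one-token-at-a-time decoding consistent with how the model was trained.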


This piece is private.

Some series on Black Strat — notes on reinsurance, the data foundations, transformer architectures, hardware & compute — are restricted to readers with the access password.

If GK has shared the password with you, enter it once and your browser will stay unlocked for 30 days.


Don’t have the password? Write to GK.