Cross-Entropy
The loss function behind virtually every classifier and every LLM pre-training run. Where it comes from (surprise & coding theory), why it punishes confident wrong predictions so brutally, and why it pairs so cleanly with softmax.
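Two of the claims above can be checked numerically. The following is a minimal sketch, not the author's code: the helper names (softmax, cross_entropy, grad_wrt_logits) are illustrative assumptions, and it uses natural logarithms, so losses are in nats. It shows that the loss for a single example is the negative log of the probability assigned to the true class, which grows without bound as that probability approaches zero, and that when the probabilities come from a softmax, the gradient with respect to the logits reduces to "predicted probabilities minus one-hot target".

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities; subtract the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Loss for one example: the 'surprise' -log p assigned to the true class."""
    return -math.log(probs[target])

def grad_wrt_logits(logits, target):
    """Gradient of cross_entropy(softmax(logits), target) w.r.t. the logits.

    The softmax and log terms cancel, leaving probs - one_hot(target).
    """
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

if __name__ == "__main__":
    # The less probability the model gives the true class, the steeper the penalty.
    for p in (0.9, 0.5, 0.1, 0.01, 0.001):
        print(f"p(true class) = {p:<6} loss = {-math.log(p):.3f} nats")

    # Hypothetical logits for a 3-class problem; class 0 is the true label.
    logits = [2.0, -1.0, 0.5]
    print("probs:", softmax(logits))
    print("grad :", grad_wrt_logits(logits, target=0))
```

Each tenfold drop in the probability given to the true class adds about 2.3 nats of loss, so a confidently wrong prediction costs far more than a hesitant one. The gradient components sum to zero: the true class's logit is pushed up by exactly as much as the other logits are pushed down.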