Eric Jang – Building AlphaGo from scratch · Dwarkesh Podcast
Science, Technology & Innovation · May 15, 2026
Jang argues that AlphaGo shows how a small neural network can amortize an intractably deep search: the value network compresses future playouts into a single win-probability estimate. The implication is that many ‘hard’ problems are macroscopically compressible, so AI should prioritize approximation quality over exact worst-case optimality.
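A minimal sketch of that amortization, assuming hypothetical state and value_net interfaces (none of these names are from the episode): a depth-limited search hands its frontier to the value network instead of playing every game out to the end.

    def evaluate(state, value_net, depth):
        """Negamax with a learned leaf evaluator: the value network
        stands in for all playouts below the depth limit, compressing
        them into one scalar estimate for the player to move."""
        if state.is_terminal():
            return state.winner_score()   # exact outcome, e.g. +1 win / -1 loss
        if depth == 0:
            return value_net(state)       # amortized estimate in [-1, 1]
        # Recurse over legal moves, flipping sign for the opponent.
        return max(-evaluate(state.play(move), value_net, depth - 1)
                   for move in state.legal_moves())

The strength of the overall search then tracks the approximation quality of value_net, which is exactly the trade Jang highlights.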
LLM coding agents (e.g., Claude Opus 4.6/4.7) can automate and speed up execution, debugging, and reporting in research but struggle with high-level experimental steering—deciding when to abandon or reframe lines of inquiry—so humans still handle outer-loop judgment.
Open-source algorithmic advances (notably KataGo) plus LLM-assisted coding have collapsed the compute and engineering cost of AlphaGo-style Go research, so individuals can now reproduce and iterate on strong Go systems for thousands—not millions—of dollars (e.g., Eric Jang’s ~$10K budget).
Jang argues that while Go is a useful model system for reasoning research, AlphaGo-style MCTS/PUCT is unlikely to transfer directly to language models: language’s vast, open-ended action space, nondeterministic transitions, and the near-impossibility of revisiting identical children break the visit-count and exploration–exploitation assumptions. Future LLM search should therefore pursue new forward-simulation or structured-reasoning approaches that preserve local improvement without Go-like discreteness.
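For context, a minimal sketch of the PUCT selection rule those assumptions live in (the Child fields and the c_puct default are illustrative, not from the episode). The exploration bonus decays with a child's visit count, which presupposes that the identical child can be reached repeatedly, something open-ended language generation rarely allows.

    import math
    from dataclasses import dataclass

    @dataclass
    class Child:
        P: float  # prior probability from the policy network
        Q: float  # mean value of simulations through this child
        N: int    # visit count

    def puct_select(children, c_puct=1.5):
        """AlphaGo-style PUCT: exploit high-Q children, explore
        high-prior, rarely visited ones. The bonus term only carries
        information when visit counts can grow past one."""
        total_visits = sum(child.N for child in children)
        def score(child):
            bonus = c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)
            return child.Q + bonus
        return max(children, key=score)

With Go's deterministic transitions and a few hundred legal moves, the search revisits children constantly; with token-level branching in the tens of thousands plus sampling noise, visit counts stay near one and the bonus never differentiates anything.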
AlphaGo gains efficiency by using MCTS to produce improved per-move action distributions as supervised labels—converting reinforcement learning into repeated supervised learning with dense, low-variance training signals instead of relying on sparse trajectory rewards.
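A minimal sketch of that conversion, using hypothetical names (not from the episode): normalized visit counts become the per-move target, and the policy network is trained against it with ordinary cross-entropy.

    import numpy as np

    def mcts_policy_target(visit_counts, temperature=1.0):
        """Visit counts concentrate on moves the search found strong,
        so the normalized counts form an improved policy distribution."""
        counts = np.asarray(visit_counts, dtype=np.float64) ** (1.0 / temperature)
        return counts / counts.sum()

    def policy_loss(predicted_log_probs, target_distribution):
        """Cross-entropy against the MCTS target: a dense, low-variance
        supervised signal at every move, rather than one sparse reward
        at the end of the trajectory."""
        return -np.sum(target_distribution * predicted_log_probs)

Every self-play move thus yields a full distribution as its label, which is what turns the RL loop into repeated supervised learning.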