
Eric Jang – Building AlphaGo from scratch

Dwarkesh Podcast

May 15, 2026


Small Neural Networks Amortize Deep Computations To Produce Practical Predictions Without Relying On Exact Solutions

Eric Jang – Building AlphaGo from scratch · Dwarkesh Podcast

Science, Technology & Innovation · May 15, 2026

Jang argues that AlphaGo shows how small neural networks can amortize intractably deep search: a value network compresses the outcome of many future playouts into a single win-probability estimate. This implies that many ‘hard’ problems are macroscopically compressible, so AI should prioritize approximation quality over exact worst-case optimality.
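The amortization idea can be sketched in a few lines: an expensive rollout-averaging evaluator is distilled into a tiny network that predicts the same win probability in one forward pass. Everything below (the toy playout dynamics, the network size, the training schedule, names like `rollout_value` and `value_net`) is illustrative, not from the episode:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "exact" evaluator: average the outcome of many random playouts from
# a position. The dynamics are a stand-in (win probability is a simple
# function of the features), but the cost scales with playout count.
def rollout_value(position, n_playouts=1000):
    p_win = 1.0 / (1.0 + np.exp(-position.sum()))
    return rng.binomial(1, p_win, size=n_playouts).mean()

# Amortized evaluator: a tiny one-hidden-layer value network that predicts
# the same win probability in a single cheap forward pass.
DIM, HIDDEN, LR = 4, 16, 0.5
params = {
    "W1": rng.normal(0, 0.5, (DIM, HIDDEN)), "b1": np.zeros(HIDDEN),
    "W2": rng.normal(0, 0.5, (HIDDEN, 1)),   "b2": np.zeros(1),
}

def value_net(x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return 1.0 / (1.0 + np.exp(-(h @ params["W2"] + params["b2"])))

# Distill the expensive rollout estimates into the network: full-batch
# gradient descent on binary cross-entropy against the rollout targets.
X = rng.normal(size=(512, DIM))
y = np.array([rollout_value(x) for x in X])[:, None]
for _ in range(2000):
    h = np.tanh(X @ params["W1"] + params["b1"])
    p = 1.0 / (1.0 + np.exp(-(h @ params["W2"] + params["b2"])))
    d = (p - y) / len(X)                      # dLoss/dlogit for cross-entropy
    dh = (d @ params["W2"].T) * (1.0 - h**2)  # backprop through tanh
    params["W2"] -= LR * h.T @ d
    params["b2"] -= LR * d.sum(0)
    params["W1"] -= LR * X.T @ dh
    params["b1"] -= LR * dh.sum(0)
```

After training, one forward pass stands in for a thousand playouts, which is the compression-over-exactness trade the summary describes.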



Automated Research Agents Accelerate Execution And Diagnosis But Struggle With High-Level Experimental Steering And Experiment Selection

Eric Jang – Building AlphaGo from scratch · Dwarkesh Podcast

Science, Technology & Innovation · May 15, 2026

LLM coding agents (e.g., Claude Opus 4.6/4.7) can automate and speed up execution, debugging, and reporting in research, but they struggle with high-level experimental steering, such as deciding when to abandon or reframe a line of inquiry, so humans still handle the outer-loop judgment.



Go Research Is Now Accessible To Hobbyists And Small Labs Through Algorithmic Efficiency Gains And AI Assisted Coding

Eric Jang – Building AlphaGo from scratch · Dwarkesh Podcast

Science, Technology & Innovation · May 15, 2026

Open-source algorithmic advances (notably KataGo) plus LLM-assisted coding have collapsed the compute and engineering cost of AlphaGo-style Go research, so individuals can now reproduce and iterate on strong Go systems for thousands of dollars rather than millions (e.g., Eric Jang’s ~$10K budget).



Go-Like Tree Search Does Not Directly Transfer To Language Models And Points To New Forms Of Forward Simulation Or Structured Reasoning

Eric Jang – Building AlphaGo from scratch · Dwarkesh Podcast

Science, Technology & Innovation · May 15, 2026

Jang argues that while Go is a useful model for reasoning research, AlphaGo-style MCTS/PUCT is unlikely to transfer directly to language models. Language’s vast, open-ended action space, nondeterministic transitions, and the near-impossibility of revisiting identical child states break the visit-count and exploration–exploitation assumptions of the search, so future LLM search should pursue new forward-simulation or structured-reasoning approaches that preserve local policy improvement without Go-like discreteness.
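One concrete way to see the brittleness: PUCT’s child-selection rule mixes a value estimate Q with an exploration bonus driven by per-child visit counts N(s, a), which is only meaningful if the search returns to the same child states many times. A minimal sketch of the rule (the constant, the dict layout, and the function name are illustrative):

```python
import math

# AlphaGo-style PUCT selection at one node. Each child carries a prior
# probability P, a visit count N, and an accumulated value W. The bonus
# term shrinks as a child is revisited -- the bookkeeping that breaks
# when an open-ended language action space makes exact revisits rare.
def puct_select(children, c_puct=1.5):
    total_n = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0       # mean value
        u = c_puct * ch["P"] * math.sqrt(total_n) / (1 + ch["N"])
        return q + u                                        # exploit + explore

    return max(range(len(children)), key=lambda i: score(children[i]))
```

An unvisited high-prior child wins on its exploration bonus at first, then loses out once its visit count grows and its mean value proves poor; without repeatable child states, neither half of that dynamic applies.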



Search-Based Policy Improvement With Dense Supervision Enables Efficient Reinforcement Learning By Turning Self-Play Into Repeated Supervised Learning

Eric Jang – Building AlphaGo from scratch · Dwarkesh Podcast

Science, Technology & Innovation · May 15, 2026

AlphaGo gains efficiency by using MCTS to produce improved per-move action distributions as supervised labels, converting reinforcement learning into repeated supervised learning with dense, low-variance training signals instead of sparse trajectory rewards.
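The conversion step can be sketched directly: root visit counts from MCTS are normalized into an improved distribution π, and the policy network is fit to π with cross-entropy, yielding a full per-move label at every turn rather than one sparse outcome per game. The temperature parameter and function names here are illustrative, not from the transcript:

```python
import numpy as np

# Turn MCTS root visit counts into a search-improved policy target pi.
# Lower temperature sharpens the target toward the most-visited move.
def visits_to_target(visit_counts, temperature=1.0):
    counts = np.asarray(visit_counts, dtype=float) ** (1.0 / temperature)
    return counts / counts.sum()

# Cross-entropy between the improved target pi and the network's softmax
# policy -- the dense supervised signal that replaces sparse rewards.
def policy_loss(logits, pi):
    z = logits - logits.max()            # stabilized log-softmax
    log_p = z - np.log(np.exp(z).sum())
    return -(pi * log_p).sum()
```

Because π is a distribution over every legal move, each position contributes a full gradient signal, which is the low-variance supervision the summary contrasts with sparse trajectory rewards.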