Free the PhD
Every major AI lab is racing to build an AI scientist, and the approach is consistent across all of them. Take what human scientists do, remove the bottlenecks, multiply the throughput. Hypothesis generation, experimental design, data analysis, paper writing. Do it all faster, at scale, without the need for sleep, grant applications, or in some cases even wet lab skills if autonomous cloud labs succeed.
This will undoubtedly accelerate progress, but it also feels like a race to scale human flaws. Maybe the goal isn't to replicate human science faster, but to reimagine what science could become.
Human science is shaped by human constraints. We read slowly and synthesize imperfectly. We hold limited information in working memory. We get tired, territorial, and risk-averse. We overfit to fashionable problems because careers depend on legibility and gatekeepers. These limitations have shaped every institution we've built: PhD programs, peer review, journals, funding agencies. The entire apparatus of modern science exists because of human constraints.
Evolution is a useful guide. What exists in nature is routinely mistaken for what's optimal, but organisms and their traits are local maxima. They're products of historical accident and path dependency. The human eye has a blind spot because the optic nerve needs to cross the retina to reach the brain. It works, but nobody would design it this way from scratch.
Human science has its own blind spots. While we've made extraordinary progress given the narrow channels our limitations permit, the scientific method is a workaround for human cognitive biases. Peer review exists because we can be unreliable judges of our own reasoning. Replication is needed because we can be unreliable judges of our own results. The whole elaborate structure is a scaffold built to compensate for our weaknesses, or to train us to become incrementally better at navigating them.
When AI labs set out to build systems that replicate human scientists, they're importing human constraints into systems that face entirely different ones. To be clear, this isn't meant as a pessimistic view of human-driven science. We've accomplished extraordinary things. But we are optimizing for local maxima, and there's an opportunity to aim higher if we don't assume the same constraints.
The Wrong Muscle
A useful example of human-constrained science is PhD training in the life sciences at major US research universities, where I spent my career as a graduate student, postdoc, and faculty member. The first two years of a typical program involve several activities: exposure to a broad range of science and scientists to facilitate laboratory selection, coursework to level the playing field for people coming from other fields, and technical skills development. But underlying all of this is a massive undertaking. Students must immerse themselves in the literature, build a mental map of what's known and unknown along with the tools to interrogate both, and then propose research that extends the boundary.
It takes years to reach the frontier of knowledge from what appears in undergraduate textbooks. The exercise itself is valuable, but the timeline is an artifact of constraint.
Finding the boundaries of knowledge in a field is now, functionally, an AI search. Even with human audits and fact-checking, what took two years of reading can now happen far more quickly. And AI-assisted content absorption needn't be superficial. These tools can contextualize and interrogate literature with a rigor that's hard to achieve manually. If we can accelerate context and content absorption, we can spend more of our time asking deeper questions about the source material itself. This opens a genuine opportunity to rethink what those two years could become.
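To make this concrete, here's a minimal sketch of what that kind of AI search can look like. The Semantic Scholar Graph API used for retrieval is real (and rate-limited); the model choice, prompt, and overall workflow are illustrative assumptions on my part, not a recommended pipeline.

```python
# A sketch of AI-assisted literature mapping, not a prescribed workflow.
# Assumes: the public Semantic Scholar Graph API for retrieval and an
# OpenAI-style chat model for synthesis; swap in whatever tools you trust.
import requests
from openai import OpenAI

def fetch_abstracts(query: str, limit: int = 20) -> list[dict]:
    """Pull papers matching a query from the Semantic Scholar Graph API."""
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit,
                "fields": "title,abstract,year,citationCount"},
        timeout=30,
    )
    resp.raise_for_status()
    return [p for p in resp.json().get("data", []) if p.get("abstract")]

def map_the_frontier(query: str) -> str:
    """Ask a model to separate the settled from the open, grounded in abstracts."""
    corpus = "\n\n".join(
        f"[{p.get('year')}] {p['title']}\n{p['abstract']}"
        for p in fetch_abstracts(query)
    )
    prompt = (
        "Using ONLY the abstracts below, summarize (1) what appears settled, "
        "(2) what is contested, and (3) which open questions would be most "
        "informative to resolve. Flag any claim you cannot ground in the text.\n\n"
        + corpus
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whichever model you prefer
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

print(map_the_frontier("targeted protein degradation"))
```

The "flag anything you can't ground" instruction is the important design choice: restricting synthesis to retrieved abstracts is the simplest guard against a model inventing citations, and it keeps the whole exercise auditable.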
The real muscle a scientist needs is strategic. Which unknowns matter most? Which questions would reduce uncertainty across an entire field regardless of outcome?
Most PhD programs, despite their best intentions, are thin on training this capacity. Qualifying exams, which gate the transition to independent research, are supposed to test it rigorously, but almost no one fails. With so much time spent on content acquisition or other laborious but routine tasks, students get to practice strategic thinking at a macro level only a handful of times during their entire doctorate, if ever. Some never do it independently if their advisors, with the best of intentions, are heavily involved. Students learn to find something unknown and study it, but they rarely learn to identify the unknown that would be most valuable to resolve.
When I started graduate school more than two decades ago, it was more common to have students reason through problems outside the lab's research interests, partly to create space for independent thinking and partly to seed transferable skills that aren't topic-specific. That practice is becoming rare, and many programs now involve far more advisor involvement and structured support. Even when truly independent project ideation happens, the opportunities are few.
Academia's main vehicle for practicing these skills is the grant proposal, often written so it can be reused to obtain funding, which means proposals tend to be safe, de-risked, and incremental. There are many systemic reasons why novel ideas or promising leads outside one's deep technical expertise get suppressed, and this propagates down to graduate training. Even though we know the value of designing experiments that are rigorous and informative even if they fail, grant proposals too often depend on hefty preliminary data and likely success.
While not the explicit intent, it's the predictable result of a system designed around human constraints. But those constraints are now shifting, and what was once impossible to change becomes possible to reimagine.
A Different Model
What would it look like to change the approach? We could automate the literature synthesis and teach students to interrogate that literature like it is living, breathing work. We could give students many more shots on goal when it comes to honing their macro-level judgment on problem selection and more rational, data-driven exploration of the hypothesis space. Instead of one or two high-stakes strategic exercises across five years, we could make strategic thinking the core of the curriculum.
The reps matter. Right now, students aren't getting enough of them.
Alpha School offers an interesting precedent. It's an experimental K–12 program backed by some of the most successful builders in tech, built around a simple wager: if AI can handle content delivery, school should focus on everything else. Children there compress content absorption into AI-assisted morning sessions, including in more personalized ways that improve retention, and then spend the rest of the day on harder problems like judgment, creativity, collaboration, and strategic thinking. The PhD equivalent would be students who arrive at the frontier in weeks rather than years, and then spend the bulk of their time doing what we currently compress into a handful of exercises: identifying which problems matter most, designing studies that are informative regardless of outcome, and learning to think across fields.
If we're willing to run an experiment to upend education with eight-year-olds, PhD programs have an extraordinary opportunity to lead.
For PhD students, getting to the boundary of known science is an essential task, but it's also misleading. Many modern fields are young and most of the information space is completely untapped. The corpus can feel overwhelming in its volume, but it's insignificant in the grand scheme of things, which means not all novel work beyond what's known is of equal value. The strategic question isn't just "what's unknown?" but "which unknowns, if resolved, would reshape the most understanding?"
The Moment
Universities move slowly, and curricula are sticky. But students are already navigating the creakiness of the system on their own. Many are finding ways to leverage new tools within their existing programs, and others are abandoning the traditional path entirely because they're convinced that cutting-edge research can't happen in institutions this slow to change.
Having reviewed applications from thousands of PhDs seeking research positions both in and out of academia, I worry about how many students are gaining credentials but lack strong strategic capabilities. The current system produces experts in content, but it does not reliably produce experts in judgment. Or it produces graduates who have learned to optimize within the system and mistaken that for thinking strategically about science itself. Educators aren't oblivious to these issues and have to some degree been experimenting with solutions. Our current moment, however, makes the problem more urgent than ever.
The gap between those paying attention to AI and those who aren't is widening fast, and scientists who haven't grappled with what's coming may be caught off guard. Many are still discussing the same questions they discussed years ago, in the same echo chambers, with the same assumptions. And frankly, those who are guarded and resistant aren't going to be the best at training the next generation to be scientifically fearless.
The AI labs face their own version of this challenge. They're so focused on replicating human science that they've stopped asking whether human science is the right benchmark. The stated goal is often to "reach human level" first and then go beyond, but there's no law requiring that sequence. It's just the path of least resistance, because we know what human science looks like and can measure progress against it. The benchmark is legible even if it's limited.
The same logic applies to how we generate data, not just how we consume it. More data isn't always better when the data itself reflects the biases of how humans have chosen to generate it. Biological systems contain massive redundancy from shared evolutionary ancestry, and human-generated datasets compound that by oversampling the same organisms, pathways, and questions. Models trained on this learn the well-trodden parts of biology deeply and the rest barely at all. This is perhaps the most underappreciated problem in AI for science: the training data isn't just incomplete, it's systematically skewed by the same human constraints the AI is supposed to transcend. Generating more experiments, more data, and more papers without correcting for this inherits all the pathologies of human science and adds the pathologies of scale.
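To see the skew in miniature, here's a toy sketch with invented counts: a handful of model organisms dominate the corpus, and one standard (admittedly partial) corrective is to reweight samples by inverse frequency so rare taxa aren't drowned out during training. Reweighting can't conjure missing biology, which is why the deeper fix is choosing which experiments to run in the first place.

```python
# Toy illustration of corpus skew; the counts below are invented.
from collections import Counter

corpus = Counter({
    "E. coli": 40_000, "M. musculus": 35_000, "S. cerevisiae": 15_000,
    "D. melanogaster": 8_000, "tardigrade": 120, "archaeal extremophile": 45,
})

total = sum(corpus.values())

# Inverse-frequency weights: rare organisms get sampled more heavily,
# so a model sees more than the well-trodden corners of biology.
weights = {org: total / (len(corpus) * n) for org, n in corpus.items()}

for org, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{org:>24}: {corpus[org]/total:6.2%} of corpus -> weight {w:7.2f}")
```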
The Invitation
The real opportunity is to change what we're aiming for, and it's an exciting one.
We can build AI that operates without human constraints rather than AI that replicates human scientists. We can build PhD programs that spend those early years on strategic thinking, hard judgment calls, and deep problem selection rather than on knowledge absorption alone. We can train humans to be strategic thinkers who deploy AI for the mechanical parts, freeing them to pursue the most information-rich unknowns rather than accumulating and overindexing on what's already known.
Scientists have long lamented that new research often ignores advances in other fields, neglects foundational literature, or fails to engage with the most recent work. These problems become tractable when the tools for comprehensive synthesis actually exist, and richer context improves the quality of the strategic exercise itself.
The protected time of a doctorate remains genuinely valuable. Five funded years to think deeply about hard problems without commercial pressure is rare and precious. But too much of that time currently goes to activities that AI handles trivially. We have a chance to reclaim it for judgment, for strategy, and for the questions that only a human, freed from drudgery, can think to ask.
I don't believe systemic change must precede individual change. If entire graduate programs have to transform before practices can evolve, we'd be waiting forever. There are as many ways to do science as there are scientists. I'd encourage students to leverage the latest tools to quickly gain context, interrogate prior knowledge rigorously, and iterate on their own ideas to develop their own judgment, whether or not their program requires it. Think of your program's requirements as the low water mark, not the high water mark. When the rulebook for how to do science is changing by the day, students should feel empowered to chart the future rather than chained to the past.
The future of science lives in humans who finally get to become the scientists we always could have been. As AI handles the mechanical parts, we get to level up strategically. What gets lost in all the anxiety is that this will be fun. The tedium is disappearing, and what remains is the work that made us fall in love with science in the first place.
