Those who have worked in a particular MIT building claim that the whiteboards there are never completely erased. Layers of notation accumulate like sediment, and half-finished diagrams bleed into new ones. It is a small detail, but it keeps coming to mind when one considers how researchers at MIT’s CSAIL have approached software engineering in recent years. Ideas compound. Problems are not so much solved as folded into the next question.
For more than ten years, researchers from MIT, Carnegie Mellon University, and the University of Michigan have been forming a loose but significant collaboration. It is not a formal consortium; there was no press release and no ribbon-cutting. Instead, researchers cite each other’s work, build on each other’s frameworks, and sometimes appear on the same conference programs, creating something closer to a shared intellectual environment. As a result, some of the hardest problems in programming have been quietly reconsidered.

Let’s start with what Armando Solar-Lezama of MIT has been working on. In a recent paper co-authored with colleagues from MIT CSAIL, he argues that the public discourse on AI and coding has been embarrassingly limited. “Everyone is talking about how we don’t need programmers anymore,” he has said. Yet, the paper argues, no current benchmark adequately captures the true scope of software engineering, which includes migrating legacy systems, finding concurrency bugs, and auditing code for security flaws. SWE-bench, the de facto industry standard, essentially asks an AI to fix a bug reported on GitHub. That’s helpful. It’s also a bit like judging a surgeon by how well they complete insurance paperwork.
What is striking is that CMU and Michigan researchers have been approaching the same blind spots from different angles. Cyrus Omar’s Future of Programming Lab at Michigan has spent years rethinking not what AI can do with code, but what code should look like when several people are working on it at once. Grove, his most recent project, was unveiled earlier this year at one of the major programming language conferences, and it almost entirely dismantles the conventional version control model. In contrast to the cycle of diff, patch, and pray familiar to Git users, Grove records changes as they occur and represents them as a graph. When conflicts arise, they show up as extra edges in that graph rather than as mysterious error messages.
Grove’s strategy might not survive the messy world of industrial codebases. That’s a legitimate worry. Still, as the work progresses, there’s a sense that it is tackling a problem that everyone in collaborative software had quietly accepted as unsolvable. When teams lose hours to merge conflicts, that is not a quirk of their workflow. It is a structural defect in the tools themselves.
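To make the graph idea a little more concrete, here is a toy sketch in Python. It is not Grove’s actual data structure or API. Every name in it (`Edge`, `EditGraph`, `replay`) is hypothetical, and it assumes, purely for illustration, that each edit adds or deletes an edge between a parent node’s named slot and a child node, with deletions always winning. Under those assumptions, edits can be replayed in any order and land in the same state, and a conflict is simply a slot that ends up with more than one child.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    parent: str  # id of the parent node (hypothetical naming)
    slot: str    # named child position on the parent, e.g. "body"
    child: str   # id of the node placed in that slot


@dataclass
class EditGraph:
    live: set = field(default_factory=set)     # edges currently present
    deleted: set = field(default_factory=set)  # edges that were ever deleted

    def apply(self, op: str, edge: Edge) -> None:
        # Assumption for this sketch: a deletion is permanent, so "add" and
        # "delete" of the same edge give the same result in either order.
        if op == "delete":
            self.deleted.add(edge)
            self.live.discard(edge)
        elif op == "add":
            if edge not in self.deleted:
                self.live.add(edge)
        else:
            raise ValueError(f"unknown op: {op}")

    def conflicts(self) -> dict:
        # A conflict is just a slot holding more than one live child,
        # i.e. an "extra edge" in the graph.
        by_slot = {}
        for e in self.live:
            by_slot.setdefault((e.parent, e.slot), set()).add(e.child)
        return {k: v for k, v in by_slot.items() if len(v) > 1}


def replay(ops) -> EditGraph:
    """Apply a sequence of (op, edge) pairs to a fresh graph."""
    graph = EditGraph()
    for op, edge in ops:
        graph.apply(op, edge)
    return graph


if __name__ == "__main__":
    # Two collaborators fill the same slot at the same time.
    alice = ("add", Edge("fn", "body", "return_x"))
    bob = ("add", Edge("fn", "body", "return_y"))
    # Either arrival order yields the same graph and the same conflict.
    print(replay([alice, bob]).conflicts() == replay([bob, alice]).conflicts())
```

In this toy model nothing ever fails to merge. The two competing children simply sit side by side in the `body` slot until someone deletes one of the edges, which is the sense in which a conflict becomes data in the graph instead of an error.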
CMU, meanwhile, has traditionally approached program synthesis rigorously and skeptically, with an eye toward what is truly verifiable rather than what looks good in a demo. This is directly related to the benchmarking issue Solar-Lezama’s team is raising at MIT. Researchers at CMU have long maintained that claims about automated programming must be verifiable, repeatable, and backed by evidence. Even when CMU’s influence isn’t immediately apparent in the final paper, this tradition has shaped how the larger research community thinks about evaluating AI-generated code.
None of this is unfolding in a straight line. Research seldom does. Still, there is something noteworthy about the convergence: Michigan is redesigning the editing environment from the ground up, MIT is mapping the full, inconvenient scope of the problem, and CMU is holding the field accountable to what it can actually prove. Whether any of this reaches the typical developer in five years or fifty is still up in the air. But the whiteboards are filling up. That usually means something.

