When you’ve seen enough AI announcements come and go (the breathless promises, the product launches, the pivot decks), a certain kind of exhaustion sets in. That makes it worth pausing when a different narrative begins to emerge from one of the institutions that may do the most to shape where this technology actually goes.
Two computer scientists at Stanford are working on a seemingly modest project: building artificial intelligence that is truly human-centered. Ludwig Schmidt and Diyi Yang, assistant professors in Stanford’s School of Engineering, spend their days on questions that much of the industry seems willing to ignore. What does a good training dataset actually look like? How does language carry culture? What makes you trust an AI system when it gives you advice?
Schmidt focuses on the data problem, which is both more interesting and more urgent than it sounds. Today’s large language models are trained on a huge share of the text scraped from the internet, a vast and flawed corpus that captures the noise, biases, and gaps of the web as it is. “Eventually, we will run out of internet data to train on,” Schmidt has said, which raises the question of what comes next. His lab develops methods for curating training datasets that span trillions of words, with the aim of making the foundation AI is built on genuinely strong rather than merely vast. Many of AI’s most persistent errors, the kind that make it unreliable for demanding real-world tasks, trace back to deficiencies in training data. That is where Schmidt works.
Yang came to the problem from a different direction. As an international student, she discovered, like many others, that language could be confusing in ways that went beyond vocabulary. It carried cultural assumptions, textures, and registers that no dictionary entry could adequately capture. That experience shapes her research today. AI models are trained disproportionately on English-language data, and even within English, different regional varieties are handled very differently. Two people, one speaking the English of Atlanta and the other the English of San Francisco, may have very different experiences with the same AI system. To close that gap, Yang’s Social and Language Technologies Lab is building systems that understand more languages and a wider range of human experience.
The most important AI research may not be happening on product teams racing toward the next capability benchmark. Spending time with this work leaves the impression that the field has been building faster than it has been understanding what it builds, and that Schmidt and Yang are, in effect, helping it catch up.

Beneath all of this is the question of trust. Research from Stanford’s Causality in Cognition Lab examined a question that has received surprisingly little attention: whether people use advice differently depending on whether they believe it came from a human or an AI. The results cast doubt on the conventional narrative. Whether people choose to follow advice depends largely on their prior beliefs about whether humans or AI are better suited to the task at hand. But once they decide to act on it, they incorporate the advice in much the same way regardless of its source. Still, the black-box nature of current AI systems remains a real obstacle to adoption, especially in high-stakes fields like medicine, where an AI diagnostic recommendation carries real weight and real risk.
All of this is reinforced by Stanford’s broader institutional push. Over the past academic year, the university has expanded its infrastructure, computing capacity, and interdisciplinary collaboration. In late 2024, a new GPU-based supercomputer called Marlowe came online. The model, with engineers working alongside sociologists and humanists alongside medical researchers, reflects a growing recognition that building capable AI and building principled AI are not separate endeavors. They have to be the same project.
As this work unfolds, it’s hard to ignore how much the conversation around AI has begun to shift. The early framing of AI as pure capability, or as a product, is giving way to something more nuanced and more honest. The question is no longer only what these systems can do. It is also whom they serve, whose language they understand, whose mistakes they make, and whether the people using them have a legitimate basis for trust. Stanford is not the only place asking these questions, but some of the most thoughtful answers may come from there.

