A math Olympiad has a subtle stubbornness to it. Every July, teenagers from all over the world gather in a hotel ballroom, sharpen their pencils, and tackle problems that most adults wouldn't know how to begin.
Each nation arrives with little booklets, often poorly photocopied, filled with the most inventive problems its mathematicians could devise. The booklets pass from delegation to delegation in hallway exchanges, then vanish into desk drawers and private libraries. For decades, no one bothered to collect them in one place.
| Field | Detail |
|---|---|
| Project Name | MathNet |
| Lead Institution | MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) |
| Collaborating Partners | KAUST, HUMAIN |
| Lead Author | Shaden Alshammari, MIT PhD candidate |
| Total Problems | More than 30,000, all expert-authored |
| Countries Covered | 47 countries across six continents |
| Languages | 17 |
| Competitions Indexed | 143 |
| Source Material | 1,595 PDF volumes, 25,000+ pages |
| Conference Debut | International Conference on Learning Representations (ICLR), Brazil |
| Validation Team | 30+ evaluators from Armenia, Russia, Ukraine, Vietnam, Poland |
| Public Announcement | Reported by MIT News |
That neglect is what makes MathNet feel a little overdue, and a little personal. Assembled by researchers at MIT's CSAIL, KAUST, and the company HUMAIN, it is the largest high-quality dataset of proof-based math problems ever put together: more than 30,000 problems and solutions spanning 47 countries, 17 languages, and 143 competitions. It is about five times larger than the next largest dataset of its kind and will be presented at ICLR in Brazil later this month. The numbers are striking, but the backstory is more interesting than the headline figure suggests.
A sizable portion of the archive came from one man. Navid Safaei, a co-author on the paper and a longtime member of the IMO community, has been gathering and scanning these booklets by hand since 2006. Twenty years of quiet, largely unseen work now form a large part of the project's foundation. There is something faintly melancholy about that: a worldwide benchmark for artificial intelligence resting, in part, on the endurance of a single man with a scanner.

The team had to track down 1,595 PDF volumes totaling over 25,000 pages, ranging from crisp digital files to scans of documents older than most of the students who would use them. Translating, cleaning, and standardizing the content across more than a dozen languages took the kind of meticulous work that is difficult to photograph. Most existing math datasets draw on resources like Art of Problem Solving, where answers are typically brief and informal. MathNet instead uses official national booklets with peer-reviewed solutions that sometimes run for multiple pages and walk through several approaches to the same problem. That depth is what gives it teeth.
It is hard to ignore why this matters. There aren't many mathematical benchmarks left for AI models to surpass. The models have learned the shape, if not the content, of the older datasets, which makes those benchmarks too predictable. Olympiad problems are different: they demand ingenuity, multi-step abstraction, and a patient inventiveness that is genuinely hard to imitate. Put a Romanian geometry problem from 1994 in front of these models, and the gap between what they claim to do and what they can actually do becomes much more apparent.
The lead author, Shaden Alshammari, competed in the IMO herself. "I recall a lot of students who had to work alone. No one in their nation was preparing them for this kind of competition," she says. The dataset is publicly available: a research team at DeepMind can now access the same archive as a teenager in Tashkent or Lagos. That seems like the more significant change, even if it makes for a smaller headline.
Whether MathNet truly advances reasoning models remains to be seen. Tanish Patil, deputy leader of Switzerland's IMO team, speculates that it may eventually help settle a question that has dogged the Olympiad community for years: whether a purportedly novel problem is genuinely new or merely a faint echo of something written in Bulgaria in 1987. Researchers and investors seem to believe that harder, stranger, more honest tests will drive the next breakthrough in AI reasoning. This one fits the description.
