Mira Murati’s Thinking Machines Achieves Major Breakthrough In LLM Nondeterminism

While large language models (LLMs) have become enormously popular, the problem of getting different outputs for the same prompt has persisted.

In a significant milestone for large language models (LLMs), AI company Thinking Machines has identified the primary cause of nondeterminism. Thinking Machines is led by Mira Murati, who previously served as CTO of OpenAI. Nondeterminism is a common issue in LLMs: the same prompt can produce varying outputs across runs. Thinking Machines has not only identified the root cause of the problem but has also developed a fix. Here are the details of this development.


The problem of nondeterminism in Large Language Models (LLMs)

Large language models (LLMs) are designed to generate consistent outputs for a given input, especially when randomness (controlled by a parameter called temperature) is disabled. Ideally, submitting the same prompt multiple times should return the same result. In practice, however, LLMs often generate different outputs for the same input. This phenomenon is referred to as nondeterminism.

This inconsistency poses challenges for developers, researchers, and users who depend on consistent model behavior. Experts previously attributed nondeterminism to minor computational factors, such as rounding errors in floating-point arithmetic or the parallel execution patterns of graphics processing units (GPUs).
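To see why rounding was a plausible suspect, note that floating-point addition is not associative: the order in which values are combined changes the low-order bits. A minimal Python sketch of the effect (our illustration, not Thinking Machines’ code):

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 1e16, -1e16

left = (a + b) + c    # 0.1 is absorbed into 1e16 before the cancellation
right = a + (b + c)   # the large terms cancel first, preserving 0.1

print(left)   # 0.0
print(right)  # 0.1
```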

It was believed that these small discrepancies accumulated into slight changes in the output. While such factors do contribute, they do not fully explain the inconsistent behavior observed in LLMs, which prompted researchers to dig deeper for the root cause.

New insights into batch invariance

Thinking Machines, founded by former OpenAI CTO Mira Murati, has now made the breakthrough. According to the company, the main cause of nondeterminism in LLMs is the presence or absence of batch invariance. Batch invariance is an operation’s ability to produce the same result for a given request, irrespective of how many other requests are processed in the same batch.

In most present-day LLM inference stacks, the numerical behavior of core operations varies with batch size. Examples include attention mechanisms, matrix multiplication, and normalization layers. The variations are tiny at each step, but they compound layer by layer, eventually producing different outputs for the same input submitted at different times.
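The mechanism can be mimicked in plain Python. The sketch below (an illustration of the idea, not actual GPU kernel code) sums the same data with different chunk sizes, standing in for the different reduction strategies a kernel may choose at different batch sizes; the totals typically differ in their low-order bits:

```python
import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

def chunked_sum(xs, chunk):
    # Sum each chunk, then sum the partial results -- a stand-in for the
    # split reductions a GPU kernel might pick at different batch sizes.
    partials = [sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk)]
    return sum(partials)

# Same data, different "batch-dependent" strategies, slightly different sums.
for chunk in (1, 128, 4096):
    print(chunk, repr(chunked_sum(values, chunk)))
```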

This finding by Thinking Machines is important because it identifies a fundamental design issue in how LLM inference is implemented, rather than an unavoidable hardware quirk. Where earlier explanations focused on hardware-related errors, the cause is now precisely understood, and that understanding will play a crucial role in improving AI inference architecture.

Solution provided by Thinking Machines

As a solution, Thinking Machines has developed ‘batch-invariant’ versions of the core LLM operations. These ensure that the mathematical computations, and hence the results, are identical regardless of batch size. This eliminates the variability introduced by batch processing and helps create more stable and reliable LLM systems.
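In spirit, the fix pins down one reduction strategy and uses it unconditionally. A simplified sketch of that idea (our simplification, not the published kernels):

```python
FIXED_CHUNK = 256  # chosen once; never depends on batch size

def invariant_sum(xs):
    """Reduce xs with the same tree shape on every call."""
    partials = [sum(xs[i:i + FIXED_CHUNK])
                for i in range(0, len(xs), FIXED_CHUNK)]
    return sum(partials)

def invariant_batch_sum(batch):
    # Each row is reduced independently with the identical fixed strategy,
    # so a row's result cannot change when more rows join the batch.
    return [invariant_sum(row) for row in batch]
```

The real kernels apply the same principle inside attention, matrix multiplication, and normalization: the per-request arithmetic must not depend on what else happens to be in the batch.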

Thinking Machines’ system has been thoroughly tested and proved successful. With a standard Qwen-3-8B setup, the same input submitted 1,000 times produced 80 different outputs. With Thinking Machines’ batch-invariant kernels, all 1,000 runs returned the identical result.
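A test of this kind is straightforward to reproduce against any text-generation function. The harness below is a hedged sketch: the `generate` callable and the prompt are placeholders, not Thinking Machines’ actual setup:

```python
from collections import Counter

def count_unique_completions(generate, prompt, runs=1000):
    """Call `generate` repeatedly with the same prompt; count distinct outputs."""
    outputs = Counter(generate(prompt) for _ in range(runs))
    return len(outputs)

# With nondeterministic serving one might see dozens of unique completions;
# with batch-invariant kernels the expected count is exactly 1.
# print(count_unique_completions(my_generate, "What is batch invariance?"))
```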

This shows that nondeterminism can be eliminated without major changes to the core of the model. The approach is practical and easy to adopt wherever exact reproducibility is required.

Further improvements needed

While Thinking Machines’ solution is effective, it runs slower than standard LLM inference. The team has optimized some of the core operations, such as the attention mechanism, but the batch-invariant kernels still lag behind the standard implementations in speed.

However, Thinking Machines argues that even with the slower speed, the benefits far outweigh the cost. This is especially true where reproducibility is paramount, in areas such as debugging, AI safety testing, and scientific research.

Reproducibility is critical to scientific processes, enabling consistent testing, validation, and comparison of results. Current LLMs lack guaranteed consistency, which complicates all of these.

Thinking Machines’ innovation could be a game changer, as it prioritizes reliability and reproducibility, and it can pave the way for future LLM systems. As more critical applications come to depend on AI, delivering reliability alongside performance will be crucial, and Thinking Machines’ solution is a pioneering step toward that balance.
