Large language models (LLMs) can learn complex reasoning tasks without relying on large datasets, according to a new study by researchers at Shanghai Jiao Tong University. Their findings show that with just a small batch of well-curated examples, you can train an LLM for tasks that were thought to require tens of thousands of training instances.
This efficiency stems from the rich knowledge that modern LLMs acquire during pre-training. As training methods become more data- and compute-efficient, enterprises may be able to create customized models without requiring access to the resources of large AI labs.
Less is more (LIMO)
In their study, the researchers challenge the assumption that you need large amounts of data to train LLMs for reasoning tasks. They introduce the concept of “less is more” (LIMO). Their work builds on previous research showing that LLMs can be aligned with human preferences using just a few examples.
In their experiments, they demonstrated that they could build a LIMO dataset for complex mathematical reasoning tasks with just a few hundred training examples. An LLM fine-tuned on the dataset was able to generate complex chain-of-thought (CoT) reasoning chains, enabling it to accomplish the tasks at a very high success rate.
For example, a Qwen2.5-32B-Instruct model fine-tuned on 817 training examples selected according to LIMO principles reached 57.1% accuracy on the highly challenging AIME benchmark and 94.8% on MATH, outperforming models trained on a hundred times more examples. It also outscored reasoning models such as QwQ-32B-Preview (a version of the Qwen model trained specifically for reasoning) and OpenAI o1-preview, both of which were trained with far more data and compute.
Moreover, LIMO-trained models generalize to examples drastically different from their training data. For example, on the OlympiadBench scientific benchmark, the LIMO model outperformed QwQ-32B-Preview, and on the challenging GPQA benchmark it achieved 66.7% accuracy, close to OpenAI o1-preview’s leading score of 73.3%.
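To make the idea concrete, here is a minimal sketch of what LIMO-style supervised fine-tuning could look like with the Hugging Face trl library. The file name and field names are hypothetical stand-ins for the curated data, and the paper’s actual training setup may differ.

```python
# A minimal sketch of LIMO-style fine-tuning: a few hundred curated
# problem/solution pairs instead of tens of thousands of examples.
# Assumes the Hugging Face `datasets` and `trl` libraries; the file
# limo_examples.jsonl and its fields are hypothetical stand-ins.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each record pairs a hard problem with a long chain-of-thought solution.
raw = load_dataset("json", data_files="limo_examples.jsonl", split="train")

# Fold each pair into a single training string so the model learns to
# emit the full reasoning chain, not just the final answer.
dataset = raw.map(
    lambda ex: {"text": f"Problem: {ex['problem']}\n\nSolution: {ex['solution']}"}
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # base model used in the paper
    train_dataset=dataset,              # a few hundred curated examples
    args=SFTConfig(output_dir="limo-sft", num_train_epochs=3),
)
trainer.train()
```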
What does it mean for enterprise AI?
Customizing LLMs is an attractive use case for enterprise applications. Thanks to techniques such as retrieval-augmented generation (RAG) and in-context learning, LLMs can be customized to use bespoke data or perform new tasks without the need for expensive fine-tuning.
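As an illustration, in-context learning can be as simple as placing a few worked examples in the prompt. In the sketch below, the `complete` function is a hypothetical stand-in for any text-completion API; the task and labels are invented for the example.

```python
# Minimal sketch of in-context learning: a few worked examples in the
# prompt steer the model at inference time, with no fine-tuning.
FEW_SHOT = """You classify customer tickets for a billing system.

Ticket: "I was charged twice this month."
Label: duplicate_charge

Ticket: "How do I download last year's invoices?"
Label: invoice_request

Ticket: "{ticket}"
Label:"""

def classify(ticket: str, complete) -> str:
    # The model imitates the demonstrated pattern for the new ticket,
    # with no gradient updates involved.
    return complete(FEW_SHOT.format(ticket=ticket)).strip()
```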
However, reasoning tasks often require training and fine-tuning LLMs. The widely held belief has been that such tasks require large volumes of training examples with highly detailed reasoning chains and solutions. Creating such datasets is slow and impractical for many applications and companies.
More recently, researchers have shown that pure reinforcement learning approaches can enable models to train themselves for reasoning tasks by generating many solutions and choosing the ones that work best. While this approach requires less manual effort, it still demands expensive compute resources that are beyond the reach of many enterprises.
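The selection loop at the heart of these approaches can be sketched in a few lines. Here, `generate` and `check_answer` are hypothetical stand-ins for a sampling API and an answer verifier; real pipelines add reward signals and repeated training rounds, which is where the compute cost comes from.

```python
# Minimal sketch of the self-training loop described above: sample many
# candidate solutions per problem and keep only those a verifier accepts.
def collect_verified_solutions(problems, generate, check_answer, n_samples=16):
    accepted = []
    for problem in problems:
        # Sample several independent reasoning chains per problem.
        candidates = [generate(problem) for _ in range(n_samples)]
        # Keep only chains whose final answer passes verification; these
        # become the training targets for the next fine-tuning round.
        accepted += [(problem, c) for c in candidates if check_answer(problem, c)]
    return accepted
```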
On the other hand, crafting a few hundred examples is an endeavor that many companies can tackle, bringing specialized reasoning models within the reach of a wider range of organizations.
“This discovery has profound implications for artificial intelligence research: It suggests that even competition-level complex reasoning abilities can be effectively elicited through minimal but curated training samples,” the researchers write.
Why LIMO works
In their experiments, the researchers identify two key reasons why LLMs can learn complex reasoning tasks with fewer examples.
First, state-of-the-art foundation models have been trained on a very large amount of mathematical content and code during pre-training. This means that these LLMs already possess rich reasoning knowledge in their parameters that can be activated through carefully crafted examples.
Second, new post-training techniques have shown that allowing models to generate extended reasoning chains significantly improves their reasoning ability. In essence, giving the models more time to “think” allows them to unpack and apply their pre-trained knowledge more effectively.
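The inference-time side of this is easy to illustrate: the same model, given a prompt that invites extended deliberation and a larger generation budget, can surface far more of its latent knowledge. The `call_llm` function below is a hypothetical completion API, not part of any specific library.

```python
# Illustrative contrast between a terse prompt and one that encourages
# an extended reasoning chain; `call_llm(prompt, max_tokens)` is assumed.
def quick_answer(question, call_llm):
    # A short-answer prompt with a tight generation budget.
    return call_llm(f"Answer concisely: {question}", max_tokens=64)

def deliberate_answer(question, call_llm):
    # Invite step-by-step reasoning and leave room to produce it.
    prompt = (f"{question}\n\nThink step by step, showing all intermediate "
              "reasoning before stating the final answer.")
    return call_llm(prompt, max_tokens=4096)
```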
“We hypothesize that successful reasoning emerges from the synergy of these two factors: rich pre-trained knowledge and sufficient computational resources at inference time,” the researchers write. “These developments collectively suggest a striking possibility: If models possess rich reasoning knowledge and are given adequate computational space, then activating their reasoning capabilities may require only a small number of high-quality training samples that encourage extended deliberation, rather than massive fine-tuning datasets.”
According to the researchers’ findings, creating useful LIMO datasets hinges on choosing the right problems and solutions. Data curators should prioritize challenging problems that require complex reasoning chains, diverse thought processes and knowledge integration. The problems should also deviate from the model’s training distribution to encourage new reasoning approaches and force it toward generalization.
Accordingly, solutions should be clear and well organized, with the reasoning steps adapted to the complexity of the problem. High-quality solutions should also provide strategic educational support, gradually building understanding through carefully structured explanations.
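As a rough illustration of these criteria, a curation pass might filter candidate problems as in the sketch below. The scoring helpers and thresholds are hypothetical, since the paper states the criteria qualitatively rather than as a fixed algorithm.

```python
# Illustrative LIMO-style curation filter; `base_model_solves`,
# `difficulty` and `reasoning_steps` are assumed scoring helpers.
def curate(candidates, base_model_solves, difficulty, reasoning_steps,
           max_examples=800):
    selected = []
    for problem, solution in candidates:
        # Skip problems the base model already solves reliably, so the
        # set pushes it beyond its training distribution.
        if base_model_solves(problem):
            continue
        # Prefer hard problems with long, well-structured reasoning chains.
        if difficulty(problem) < 0.8 or reasoning_steps(solution) < 5:
            continue
        selected.append((problem, solution))
        if len(selected) >= max_examples:
            break
    return selected
```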
“By focusing on a minimal yet meticulously curated set of reasoning chains, we embody the core principle of LIMO: High-quality demonstrations, rather than sheer data volume, are key to unlocking complex reasoning capabilities,” the researchers write.
The researchers have released the code and data used to train the LIMO models in their experiments. In the future, they plan to expand the concept to other domains and applications.