MiniMax is perhaps best known in the U.S. today as the Singaporean company behind Hailuo, a realistic, high-resolution generative AI video model that competes with Runway, OpenAI’s Sora and Luma AI’s Dream Machine.
But the company has far more tricks up its sleeve: Today, for instance, it announced the release and open-sourcing of the MiniMax-01 series, a new family of models built to handle ultra-long contexts and enhance AI agent development.
The series includes MiniMax-Text-01, a foundation large language model (LLM), and MiniMax-VL-01, a visual multi-modal model.
A massive context window
MiniMax-Text-01 is of particular note for enabling up to 4 million tokens in its context window — equivalent to a small library’s worth of books. The context window is how much information the LLM can handle in one input/output exchange, with words and concepts represented as numerical “tokens,” the LLM’s own internal mathematical abstraction of the data it was trained on.
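To make that scale concrete, here is a rough back-of-the-envelope sketch in Python. It assumes roughly four characters of English text per token and a ballpark book length; both are common heuristics, not figures from MiniMax:

```python
# Rough illustration of what a 4-million-token context window can hold.
# Assumes ~4 characters per token and ~500,000 characters per book; both
# are ballpark heuristics, not MiniMax's tokenizer or any official figure.

CONTEXT_WINDOW_TOKENS = 4_000_000
CHARS_PER_TOKEN = 4
CHARS_PER_BOOK = 500_000  # roughly a 300-page novel

approx_chars = CONTEXT_WINDOW_TOKENS * CHARS_PER_TOKEN
approx_books = approx_chars / CHARS_PER_BOOK

print(f"~{approx_chars:,} characters, or about {approx_books:.0f} novels in one prompt")
# ~16,000,000 characters, or about 32 novels in one prompt
```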
And while Google previously led the pack with its Gemini 1.5 Pro model and its 2-million-token context window, MiniMax has remarkably doubled that.
As MiniMax posted on its official X account today: “MiniMax-01 efficiently processes up to 4M tokens — 20 to 32 times the capacity of other leading models. We believe MiniMax-01 is poised to support the anticipated surge in agent-related applications in the coming year, as agents increasingly require extended context handling capabilities and sustained memory.”
The models are available now for download on Hugging Face and GitHub under a custom MiniMax license, to try directly on Hailuo AI Chat (a ChatGPT/Gemini/Claude competitor), and through MiniMax’s application programming interface (API), where third-party developers can link their own unique apps to them.
MiniMax is offering APIs for text and multi-modal processing at competitive rates:
- $0.2 per 1 million input tokens
- $1.1 per 1 million output tokens
For comparison, OpenAI’s GPT-4o costs $2.50 per 1 million input tokens through its API, a staggering 12.5X more expensive.
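As a quick sanity check on those rates, here is the arithmetic for a single long-context request. The token counts are made up for illustration, and the comparison covers input pricing only, since that is the GPT-4o figure quoted above (GPT-4o also cannot actually accept a 4-million-token prompt, so this is purely a rate comparison):

```python
# Cost comparison at the published per-million-token rates cited above.
# Token counts are hypothetical; input-only comparison for GPT-4o.

MINIMAX_INPUT_PER_M = 0.20   # USD per 1M input tokens
MINIMAX_OUTPUT_PER_M = 1.10  # USD per 1M output tokens
GPT4O_INPUT_PER_M = 2.50     # USD per 1M input tokens

input_tokens = 4_000_000     # one full 4M-token context
output_tokens = 10_000

minimax_cost = (input_tokens / 1e6) * MINIMAX_INPUT_PER_M \
             + (output_tokens / 1e6) * MINIMAX_OUTPUT_PER_M
gpt4o_input_cost = (input_tokens / 1e6) * GPT4O_INPUT_PER_M

print(f"MiniMax-Text-01:     ${minimax_cost:.2f}")       # $0.81
print(f"GPT-4o (input only): ${gpt4o_input_cost:.2f}")   # $10.00
```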
MiniMax has also integrated a mixture of experts (MoE) framework with 32 experts to optimize scalability. This design balances computational and memory efficiency while maintaining competitive performance on key benchmarks.
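The announcement doesn’t spell out the routing scheme, but the core MoE idea is that a small gating network scores all 32 experts for each token and only the top few actually run, which is how a very large model keeps per-token compute modest. A minimal, hypothetical NumPy sketch of top-2 routing (illustrative only, not MiniMax’s implementation):

```python
import numpy as np

# Minimal mixture-of-experts routing sketch: a gating network scores 32
# experts per token and only the top 2 run, so most parameters stay idle.
# Purely illustrative dimensions and routing; not MiniMax's implementation.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 32, 2

experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.02

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):                                  # tokens: (n_tokens, d_model)
    gate_probs = softmax(tokens @ gate_w)               # (n_tokens, n_experts)
    top = np.argsort(gate_probs, axis=-1)[:, -top_k:]   # top-2 expert indices per token
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        weights = gate_probs[i, top[i]]
        weights = weights / weights.sum()                # renormalize over chosen experts
        for w, e_idx in zip(weights, top[i]):
            out[i] += w * (tok @ experts[e_idx])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): each token touched only 2 of 32 experts
```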
Striking new ground with Lightning Attention Architecture
At the heart of MiniMax-01 is a Lightning Attention mechanism, an innovative alternative to standard transformer attention.
This design significantly reduces computational complexity. The models consist of 456 billion parameters, of which 45.9 billion are activated per token during inference.
Unlike earlier architectures, Lightning Attention employs a mix of linear and traditional SoftMax layers, achieving near-linear complexity for long inputs. SoftMax, for those like myself who are new to the concept, is the transformation of input values into probabilities that add up to 1, so that the LLM can approximate which meaning of the input is likeliest.
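Two ideas are packed into that sentence, and both are easy to show in a few lines: softmax turns arbitrary scores into probabilities that sum to 1, and linear attention applies a feature map and reorders the computation so the n-by-n score matrix never has to be built. The sketch below uses a generic non-negative feature map; it illustrates the general linear-attention trick, not MiniMax’s published Lightning Attention kernels:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# 1) Softmax itself: arbitrary scores become probabilities summing to 1.
print(softmax(np.array([2.0, 1.0, 0.1])))        # ~[0.659 0.242 0.099]

# 2) Standard softmax attention: builds an n x n score matrix (quadratic in n).
n, d = 6, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
attn = softmax(Q @ K.T / np.sqrt(d)) @ V          # O(n^2 * d)

# 3) Linear attention: apply a non-negative feature map and reorder the
#    matmuls so the n x n matrix never materializes. Generic sketch only,
#    not the actual Lightning Attention kernels.
phi = lambda x: np.maximum(x, 0) + 1e-6           # simple positive feature map
num = phi(Q) @ (phi(K).T @ V)                     # O(n * d^2)
den = phi(Q) @ phi(K).T.sum(axis=1, keepdims=True)
linear_attn = num / den

print(attn.shape, linear_attn.shape)              # both (6, 8)
```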
MiniMax has rebuilt its training and inference frameworks to support the Lightning Attention architecture. Key improvements include:
- MoE all-to-all communication optimization: Reduces inter-GPU communication overhead.
- Varlen ring attention: Minimizes computational waste for long-sequence processing (see the sketch after this list).
- Efficient kernel implementations: Tailored CUDA kernels improve Lightning Attention performance.
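MiniMax hasn’t published the details of these optimizations beyond its technical report, but the intuition behind variable-length (“varlen”) processing is straightforward: rather than padding every sequence in a batch to the longest one, sequences are packed back-to-back with offset markers, so no compute is spent attending over padding. A tiny, hypothetical sketch of the bookkeeping (the ring-attention distribution across GPUs is omitted):

```python
import numpy as np

# Why variable-length ("varlen") packing saves compute: compare the token
# count of a padded batch against a packed one. Sequence lengths are
# hypothetical; this is not MiniMax's actual implementation.

seq_lens = [120_000, 4_000, 37_000, 900]          # tokens per sequence in one batch

padded_tokens = len(seq_lens) * max(seq_lens)     # pad everything to the longest
packed_tokens = sum(seq_lens)                     # pack sequences back-to-back

# Cumulative offsets tell the attention kernel where each sequence starts/ends.
cu_seqlens = np.cumsum([0] + seq_lens)
print("offsets:", cu_seqlens)                     # [0 120000 124000 161000 161900]

waste = 1 - packed_tokens / padded_tokens
print(f"padding would waste {waste:.0%} of the batch")  # ~66%
```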
These advancements make MiniMax-01 models accessible for real-world applications, while maintaining affordability.
Performance and Benchmarks
On mainstream text and multi-modal benchmarks, MiniMax-01 rivals top-tier models like GPT-4 and Claude-3.5, with especially strong results on long-context evaluations. Notably, MiniMax-Text-01 achieved 100% accuracy on the Needle-In-A-Haystack task with a 4-million-token context.
The models also demonstrate minimal performance degradation as input length increases.
MiniMax plans regular updates to expand the models’ capabilities, including code and multi-modal enhancements.
The company views open-sourcing as a step toward building foundational AI capabilities for the evolving AI agent landscape.
With 2025 predicted to be a transformative year for AI agents, the need for sustained memory and efficient inter-agent communication is increasing. MiniMax’s innovations are designed to meet these challenges.
Open to collaboration
MiniMax invites developers and researchers to explore the capabilities of MiniMax-01. Beyond open-sourcing, its team welcomes technical suggestions and collaboration inquiries at [email protected].
With its commitment to cost-effective and scalable AI, MiniMax positions itself as a key player in shaping the AI agent era. The MiniMax-01 series offers an exciting opportunity for developers to push the boundaries of what long-context AI can achieve.