OpenInfer has raised $8 million in funding to redefine AI inference for edge applications.
It’s the brainchild of Behnam Bastani and Reza Nourai, who spent nearly a decade building and scaling AI systems together at Meta’s Reality Labs and Roblox.
Through their work at the forefront of AI and system design, Bastani and Nourai witnessed firsthand how deep system architecture enables continuous, large-scale AI inference. However, today’s AI inference remains locked behind cloud APIs and hosted systems—a barrier for low-latency, private, and cost-efficient edge applications. OpenInfer changes that. The company aims to be agnostic to the types of devices at the edge, Bastani said in an interview with GamesBeat.
By enabling the seamless execution of large AI models directly on devices—from SoCs to the cloud—OpenInfer removes these barriers, making it possible to run inference without compromising performance.
The implication? Imagine a world where your phone anticipates your needs in real time — translating languages instantly, enhancing photos with studio-quality precision, or powering a voice assistant that truly understands you. With AI inference running directly on your device, users can expect faster performance, greater privacy, and uninterrupted functionality no matter where they are. This shift eliminates lag and brings intelligent, high-speed computing to the palm of your hand.
Building the OpenInfer Engine: AI Agent Inference Engine
Since founding the company six months ago, Bastani and Nourai have assembled a team of seven, including former colleagues from their time at Meta. While at Meta, they had built Oculus Link together, showcasing their expertise in low-latency, high-performance system design.
Bastani previously served as Director of Architecture at Meta’s Reality Labs and led teams at Google focused on mobile rendering, VR, and display systems. Most recently, he was Senior Director of Engineering for Engine AI at Roblox. Nourai has held senior engineering roles in graphics and gaming at industry leaders including Roblox, Meta, Magic Leap, and Microsoft.

OpenInfer is building the OpenInfer Engine, what they call an “AI agent inference engine” designed for unmatched performance and seamless integration.
To accomplish the first goal of unmatched performance, the first release of the OpenInfer Engine delivers 2-3x faster inference compared to Llama.cpp and Ollama for distilled DeepSeek models. This boost comes from targeted optimizations, including streamlined handling of quantized values, improved memory access through enhanced caching, and model-specific tuning—all without requiring modifications to the models.
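The article doesn’t spell out those optimizations, but the quantized-value handling it mentions is the kind of hot path inference engines tune. Below is a generic, illustrative sketch of block-wise 4-bit dequantization, assuming a llama.cpp-style format with per-block scales; the function, block size, and layout here are assumptions for illustration, not OpenInfer’s actual code.

```python
import numpy as np

# Illustrative only: block-wise 4-bit dequantization in the spirit of
# common llama.cpp-style quantized formats. Not OpenInfer's implementation.
BLOCK = 32  # weights per quantization block (assumed)

def dequantize_q4(packed: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """packed: uint8 array with two 4-bit weights per byte, one row per block.
    scales: one float scale per block."""
    lo = (packed & 0x0F).astype(np.int8) - 8   # low nibble -> signed [-8, 7]
    hi = (packed >> 4).astype(np.int8) - 8     # high nibble -> signed [-8, 7]
    q = np.stack([lo, hi], axis=-1).reshape(scales.shape[0], BLOCK)
    return q * scales[:, None]                 # rescale each block to floats

# One block of 32 weights packed into 16 bytes, with a single fp32 scale:
packed = np.random.randint(0, 256, size=(1, BLOCK // 2), dtype=np.uint8)
scales = np.array([0.05], dtype=np.float32)
print(dequantize_q4(packed, scales).shape)  # (1, 32)
```

Keeping this unpack-and-rescale step cheap, and caching its results well, is where engines typically find the kind of speedups described above.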
To accomplish the second goal of seamless integration with effortless deployment, the OpenInfer Engine is designed as a drop-in replacement, allowing users to switch endpoints simply by updating a URL. Existing agents and frameworks continue to function seamlessly, without any modifications.
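If the engine exposes an OpenAI-compatible API, as many local inference servers do, the endpoint swap described above could look like the following sketch. The base URL, API key, and model name are hypothetical placeholders; OpenInfer’s actual endpoint details aren’t given in this article.

```python
# Minimal sketch of a drop-in endpoint swap, assuming an OpenAI-compatible
# API. The base_url and model below are hypothetical placeholders.
from openai import OpenAI

# Before: requests go to a hosted cloud API.
# client = OpenAI(base_url="https://api.openai.com/v1", api_key="...")

# After: point the same client at a local inference endpoint.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical local endpoint
    api_key="not-needed-locally",         # placeholder; local servers often ignore it
)

response = client.chat.completions.create(
    model="deepseek-r1-distill",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize edge inference in one sentence."}],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, the surrounding agent code stays as-is, which is the sense in which existing agents and frameworks keep working without modification.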
“OpenInfer’s advancements mark a major leap for AI developers. By significantly boosting inference speeds, Behnam and his team are making real-time AI applications more responsive, accelerating development cycles, and enabling powerful models to run efficiently on edge devices. This opens new possibilities for on-device intelligence and expands what’s possible in AI-driven innovation,” said Ernestine Fu Mak, Managing Partner at Brave Capital and an investor in OpenInfer.
OpenInfer is pioneering hardware-specific optimizations to drive high-performance AI inference on large models—outperforming industry leaders on edge devices. By designing inference from the ground up, they are unlocking higher throughput, lower memory usage, and seamless execution on local hardware.
Future roadmap: Seamless AI inference across all devices
OpenInfer’s launch is well-timed, especially in light of recent DeepSeek news. As AI adoption accelerates, inference has overtaken training as the primary driver of compute demand. While innovations like DeepSeek reduce computational requirements for both training and inference, edge-based applications still struggle with performance and efficiency due to limited processing power. Running large AI models on consumer devices demands new inference methods that enable low-latency, high-throughput performance without relying on cloud infrastructure, creating significant opportunities for companies optimizing AI for local hardware.
“Without OpenInfer, AI inference on edge devices is inefficient due to the absence of a clear hardware abstraction layer. This challenge makes deploying large models on compute-constrained platforms incredibly difficult, pushing AI workloads back to the cloud—where they become costly, slow, and dependent on network conditions. OpenInfer revolutionizes inference on the edge,” said Gokul Rajaram, an investor in OpenInfer. Rajaram is an angel investor and currently a board member of Coinbase and Pinterest.
In particular, OpenInfer is uniquely positioned to help silicon and hardware vendors enhance AI inference performance on devices. Enterprises needing on-device AI for privacy, cost, or reliability can leverage OpenInfer, with key applications in robotics, defense, agentic AI, and model development.
In mobile gaming, OpenInfer’s technology enables ultra-responsive gameplay with real-time adaptive AI. On-device inference reduces latency and allows for smarter in-game dynamics. Players will enjoy smoother graphics, AI-powered personalized challenges, and a more immersive experience that evolves with every move.
“At OpenInfer, our vision is to seamlessly integrate AI into every surface,” said Bastani. “We aim to establish OpenInfer as the default inference engine across all devices—powering AI in self-driving cars, laptops, mobile devices, robots, and more.”
OpenInfer has raised an $8 million seed round, its first financing. Investors include Brave Capital, Cota Capital, Essence VC, Operator Stack, StemAI, Oculus VR co-founder and former CEO Brendan Iribe, Google DeepMind chief scientist Jeff Dean, Microsoft Experiences and Devices chief product officer Aparna Chennapragada, angel investor Gokul Rajaram, and others.
“The current AI ecosystem is dominated by a few centralized players who control access to inference through cloud APIs and hosted services. At OpenInfer, we are changing that,” said Bastani. “Our name reflects our mission: we are ‘opening’ access to AI inference—giving everyone the ability to run powerful AI models locally, without being locked into expensive cloud services. We believe in a future where AI is accessible, decentralized, and truly in the hands of its users.”