OmniHuman: ByteDance’s new AI creates realistic videos from a single photo

ByteDance researchers have developed an artificial intelligence system that transforms single photographs into realistic videos of people speaking, singing and moving naturally — a breakthrough that could reshape digital entertainment and communications.

The new system, called OmniHuman, generates full-body videos showing people gesturing and moving in ways that match their speech, surpassing previous AI models that could only animate faces or upper bodies.

How OmniHuman uses 18,700 hours of training data to create realistic motion

“End-to-end human animation has undergone notable advancements in recent years. However, existing methods still struggle to scale up as large general video generation models, limiting their potential in real applications,” the researchers wrote in a paper published on arXiv.

The team trained OmniHuman on more than 18,700 hours of human video data using a novel approach that combines multiple types of inputs: text, audio, and body movements. This “omni-conditions” training strategy lets the model learn from much larger and more diverse datasets than previous methods could use.

AI video generation breakthrough shows full-body movement and natural gestures

“Our key insight is that incorporating multiple conditioning signals, such as text, audio, and pose, during training can significantly reduce data wastage,” the research team explained.
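To make the “data wastage” point concrete, here is a toy Python sketch. It is not the researchers' code or method: the `Clip` record, the `usable_hours` helper and every number in it are hypothetical, chosen only to show why accepting any of several conditioning signals keeps more footage in play than demanding a single one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical training clip with flags for which signals are usable."""
    hours: float
    has_text: bool
    has_audio: bool
    has_pose: bool


def usable_hours(clips, required=None):
    """Total hours of clips that pass the filter.

    required: name of the one signal every clip must have ("text",
    "audio" or "pose"), or None to accept a clip with any usable signal.
    """
    total = 0.0
    for clip in clips:
        signals = {
            "text": clip.has_text,
            "audio": clip.has_audio,
            "pose": clip.has_pose,
        }
        keep = any(signals.values()) if required is None else signals[required]
        if keep:
            total += clip.hours
    return total


# Made-up dataset for illustration only; not figures from the paper.
clips = [
    Clip(4.0, has_text=True, has_audio=True, has_pose=True),
    Clip(3.0, has_text=True, has_audio=False, has_pose=False),
    Clip(2.0, has_text=True, has_audio=False, has_pose=True),
    Clip(1.0, has_text=False, has_audio=False, has_pose=False),
]

audio_only = usable_hours(clips, required="audio")
mixed = usable_hours(clips)
print(f"audio-only filter keeps {audio_only} hours; mixed conditions keep {mixed} hours")
```

In this made-up dataset, a filter that insists on clean audio keeps only one clip, while a filter that accepts any signal keeps three of the four. How OmniHuman actually weights or mixes its conditions during training is not described here.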

The technology marks a significant advance in AI-generated media, with demonstrations ranging from videos of people delivering speeches to clips of subjects playing musical instruments. In testing, OmniHuman outperformed existing systems across multiple quality benchmarks.

Tech giants race to develop next-generation video AI systems

The development emerges amid intensifying competition in AI video generation, with companies like Google, Meta, and Microsoft pursuing similar technology. ByteDance’s breakthrough could give the TikTok parent company an advantage in this rapidly evolving field.

Industry experts say such technology could transform entertainment production, educational content creation, and digital communications. However, it also raises concerns about potential misuse in creating synthetic media for deceptive purposes.

The researchers will present their findings at an upcoming computer vision conference, though they have not yet specified which one.

The post OmniHuman: ByteDance’s new AI creates realistic videos from a single photo appeared first on VentureBeat.