Friday, 2024 October 11

Baidu launches Sora-like text-to-video feature on Xiling platform

September 2024 is turning out to be a landmark month for tech launches, with a clear focus on multimodal artificial intelligence and the escalating race to outperform OpenAI’s Sora video generation model.

The momentum kicked off on September 19 at Alibaba Cloud’s Apsara Conference, where the Tongyi Wanxiang video generation model was unveiled. Just a few days later, ByteDance followed suit, debuting its Doubao model during the Volcano Engine AI Innovation Tour on September 24.

Baidu has long been a frontrunner in developing general AI models, but its response to AI video generation models such as Sora has been notably restrained. So far, it has refrained from releasing a standalone video generation model. Its UniVG method, introduced in January 2024, remains largely academic: a research tool with limited practical applications.

But the landscape shifted on September 25 when Baidu quietly rolled out a text-to-video feature at its AI Cloud Summit. Rather than launching an independent model, this new capability is embedded within Baidu’s upgraded Xiling platform, now in its 4.0 version. The update enables Xiling to generate 3D digital humans and 3D video content from simple text commands.

According to Baidu, these digital avatars do more than just exist—they evolve, adjusting their appearance and style in response to user prompts, making them adaptable to various scenarios. What once required hours to produce could now take a mere five minutes, thanks to significant advances in AI-driven production.

Equally impressive is the cost reduction. The price of creating a realistic 3D digital avatar has dropped from thousands of RMB to just RMB 199 (USD 27.9).

Shen Dou, executive vice president and head of Baidu AI Cloud. Photo courtesy: Baidu Inc.

Baidu’s 2024 summit took a practical approach. In contrast to the 2023 event, which introduced 11 new AI applications, this year’s message was clear: Baidu is focused on selling the infrastructure needed by downstream users to scale their businesses. In essence, Baidu has positioned itself as the modern-day shovel seller in the ongoing AI gold rush.

Meanwhile, work on scaling laws, the empirical observation that model performance improves predictably as parameters, data, and compute increase, continues to push boundaries.

Large AI models, now housing billions or even trillions of parameters, are increasingly powered by GPUs rather than CPUs. GPU clusters that once numbered in the thousands are now scaling into the tens of thousands. Shen Dou, Baidu’s executive vice president and head of Baidu AI Cloud, expects scaling law to continue driving this trend, with GPU clusters surpassing 100,000 units in the near future.

To manage computation at this scale, Baidu has introduced the latest version of Baige, its AI heterogeneous computing platform. According to Baidu, Baige can maintain a training efficiency above 99.5% on clusters with 10,000 GPUs. It can also improve the efficiency of trillion-parameter mixture of experts (MoE) models by 30%.
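For a sense of scale, the training efficiency figure can be turned into hours. The sketch below is illustrative only: the article does not define "training efficiency," so it assumes the term means the share of wall-clock time spent on productive training. The 30-day run length and the 95% comparison point are hypothetical, and the function names are made up for this example.

```python
# Assumption: "training efficiency" is read as the fraction of wall-clock
# time spent on productive training. The article does not define the term.


def lost_hours(run_days: float, efficiency: float) -> float:
    """Wall-clock hours of a run that are not spent on productive training."""
    return run_days * 24 * (1.0 - efficiency)


def lost_gpu_hours(run_days: float, efficiency: float, gpus: int) -> float:
    """Idle GPU-hours across the whole cluster for the same run."""
    return lost_hours(run_days, efficiency) * gpus


if __name__ == "__main__":
    run_days = 30    # hypothetical run length, not from the article
    gpus = 10_000    # cluster size cited in the article
    # 0.995 is the cited figure; 0.95 is a hypothetical comparison point.
    for efficiency in (0.995, 0.95):
        hours = lost_hours(run_days, efficiency)
        gpu_hours = lost_gpu_hours(run_days, efficiency, gpus)
        print(f"{efficiency:.1%}: {hours:.1f} h lost, {gpu_hours:,.0f} GPU-hours idle")
```

Under these assumptions, 99.5% efficiency over a 30-day run corresponds to roughly 3.6 hours of lost time, compared with about 36 hours at 95%.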

Cost efficiency and accessibility remain at the heart of Baidu’s strategy. The price of its flagship Ernie model has been reduced by 90%, while Ernie Speed and Ernie Lite are now freely available. Shen noted that Ernie processes over 700 million requests daily, with more than 700,000 enterprise-level applications developed through Baidu’s Qianfan platform.

Pricing of Baidu ERNIE LLM announced in July 2024. Source: Baidu Inc.

Baidu’s transformation into a foundational infrastructure provider for businesses was underscored by updates to two key products: the Keyue smart customer service platform and the Baidu Comate coding assistant.

Baidu said it has enhanced Keyue to support multimodal interactions—voice, video, and more—resulting in a task completion rate of 92%, significantly higher than the industry average of 80%. Several industry experts told 36Kr that AI models have become particularly adept at handling code, more so than longform text, making coding an ideal application for these models.

With that in mind, Baidu Comate has introduced two new features aimed at enterprises: code structure explanations and code reviews. These features assist developers in navigating unfamiliar codebases and provide intelligent corrections based on a comprehensive understanding of the project.

Baidu has made its direction clear: infrastructure is its core mission. Looking downstream, CEO Robin Li remains optimistic about the future of AI agents. He likened them to the early days of the internet—an accessible entry point with unlimited potential.

KrASIA Connection
KrASIA Connection features translated and adapted high-quality insights published on 36Kr.com, the largest and most influential technology portal in Chinese language with over 150 million readers across the globe.