SenseTime, a Chinese AI company, released its upgraded SenseNova 5.5 LLM at the 2024 World Artificial Intelligence Conference & High-Level Meeting on Global AI Governance (WAIC 2024).
The new release includes China’s first real-time multimodal model, SenseNova 5o, which offers streaming interaction capabilities comparable to OpenAI’s GPT-4o.
The SenseNova 5.5 upgrade boasts a 30% improvement in overall performance compared to its predecessor, SenseNova 5.0, released just two months ago. Key enhancements include improved mathematical reasoning, English proficiency, and instruction-following abilities, which SenseTime says put it on par with GPT-4o in terms of interactivity and core indicators.
Dr. Xu Li, Chairman of the Board and CEO of SenseTime, emphasised the significance of this release, saying, “This is a critical year for large models as they evolve from unimodal to multimodal. In line with users’ needs, SenseTime is also focused on boosting interactivity.”
The company has also introduced a cost-effective edge-side large model, reducing the cost per device to as low as RMB 9.90 per year. This move aims to facilitate widespread deployment across various IoT devices, including smartphones, tablets, and in-vehicle computers.
SenseTime has expanded its suite of applications with the release of Vimi, an AI avatar video generator capable of creating short video clips with precise control over facial expressions and upper body movements from a single photo. The company has also upgraded its SenseTime Raccoon Series, improving coding precision and response speed in the Code Raccoon tool.
To lower entry barriers for enterprise users, SenseTime launched the “Project $0 Go” scheme, offering a free onboarding bundle for new enterprise users migrating from the OpenAI platform.
The SenseNova Large Model has already been deployed by more than 3,000 government and corporate customers across various industries, including technology, healthcare, finance, and programming.
SenseTime continues to develop AI applications for vertical industries such as finance, agriculture, cultural tourism, and healthcare, aiming to boost productivity and cost-efficiency in these sectors.
Kyutai, a French non-profit AI research laboratory, has introduced Moshi, a real-time native multimodal foundational AI model. This open-source project features a voice-enabled AI assistant offering capabilities that rival OpenAI’s GPT-4o and Google’s Astra.
Meanwhile, Anthropic’s latest model, Claude 3.5 Sonnet, continues to challenge GPT-4o, having recently dethroned it to take the top spot in both the Coding Arena and Hard Prompts Arena.