# Multimodal AI
Latest news and articles about Multimodal AI
Total: 10 articles found

## iFlytek’s ‘Robot Hyper Brain’ Becomes the Neural Center for China’s Robotics Explosion
iFlytek's 'Robot Hyper Brain' platform has expanded its reach to over 500 robot manufacturers, providing the AI foundations for humanoid and service robots. This growth underscores the company's strategic pivot toward becoming the primary neural provider for China's rapidly expanding robotics industry.

## OpenAI Stakes Its Claim in the Voice Economy with Real-time API Pricing
OpenAI has unveiled its pricing for real-time audio APIs, positioning itself to lead the move from text-based AI to live voice interactions. With costs ranging from $32 to $64 per million tokens, the new models aim to enable low-latency translation and transcription for global developers.
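The per-million-token pricing lends itself to a quick back-of-the-envelope cost estimate. A minimal sketch in Python, assuming the $32 figure applies to input tokens and $64 to output tokens — the article gives only the price range, not the input/output breakdown — with hypothetical token counts and a hypothetical function name:

```python
def session_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 32.0, output_rate: float = 64.0) -> float:
    """Return the USD cost of a session, given token counts and
    per-million-token rates (rates are assumptions, not confirmed pricing)."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Hypothetical session: 50k input tokens, 20k output tokens.
print(round(session_cost(50_000, 20_000), 2))  # ≈ 2.88 (USD)
```

Even under these assumed rates, the sketch shows why low-latency voice sessions are priced per token rather than per minute: cost scales directly with how much audio the model consumes and produces.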

## Thinking with Coordinates: DeepSeek’s Move Toward ‘System 2’ Multimodal Intelligence
DeepSeek has released a technical framework that enables AI models to use spatial coordinates as 'visual primitives' in their reasoning process. This innovation bridges the referential gap in multimodal AI, allowing for more precise visual reasoning and industry-leading token efficiency.

## NVIDIA’s Omni-Vision: Setting New Benchmarks for the Era of Autonomous AI Agents
NVIDIA has launched Nemotron 3 Nano Omni, a multimodal AI model utilizing a Mixture-of-Experts architecture to deliver 9x the efficiency of competing open models. Designed for autonomous agents, the model integrates text, video, and audio reasoning to enable real-time digital interaction and lower deployment costs.

## Alibaba Expands Generative AI Frontier with ‘HappyHorse’ Video Model Beta
Alibaba has launched a beta test for its new AI video generation model, HappyHorse, within the Tongyi Qianwen mobile application. The move aims to solidify Alibaba's position in the generative video market and provide a competitive response to international AI video tools.

## From Pixels to Logic: OpenAI’s ChatGPT Images 2.0 and the Dawn of Visual Reasoning
OpenAI has launched ChatGPT Images 2.0, a transformative update that incorporates a reasoning and searching phase into image generation. This shift allows the model to produce consistent multi-image narratives and accurate non-Latin typography, moving AI art from simple generation toward a comprehensive visual reasoning system.

## Musk’s xAI Broadens Grok’s Reach: The Strategic Move into Voice APIs
Elon Musk's xAI has launched voice-to-text and text-to-voice APIs for Grok, shifting the platform toward an infrastructure play for developers. The move focuses on high-fidelity, low-latency interactions, positioning Grok to compete with OpenAI in the multimodal AI market and potentially integrate with Tesla's hardware.

## Meta’s Closed-Source Pivot: Zuckerberg Launches ‘Muse Spark’ to Regain AI Supremacy
Meta has launched Muse Spark, a native multimodal AI model that marks the company's high-stakes transition from open-source to proprietary technology. Integrated into Meta's massive social ecosystem, the model aims to compete directly with OpenAI and Google by offering superior efficiency and sophisticated visual reasoning capabilities.

## Meta’s Closed-Source Gambit: The ‘Muse Spark’ and the Pivot Toward Superintelligence
Meta has launched Muse Spark, its first 'superintelligence' model, marking a strategic pivot toward closed-source, proprietary AI. Led by Alexandr Wang, the model introduces advanced reasoning modes and aims to commercialize Meta's AI breakthroughs through a new API-centric business model.

## Beyond the Text: Zhipu AI’s GLM-5V-Turbo Aims to Redefine the Multimodal Coding Landscape
Zhipu AI has released GLM-5V-Turbo, a multimodal foundation model designed to bridge the gap between visual design and software coding. The release highlights China's shift toward high-efficiency, specialized AI tools that prioritize developer experience and enterprise deployment.