# Multimodal AI

Latest news and articles about Multimodal AI

Total: 10 articles found

Technology

iFlytek’s ‘Robot Hyper Brain’ Becomes the Neural Center for China’s Robotics Explosion

iFlytek's 'Robot Hyper Brain' platform has expanded its reach to over 500 robot manufacturers, providing the AI foundations for humanoid and service robots. This growth underscores the company's strategic pivot toward becoming the primary neural provider for China's rapidly expanding robotics industry.

NeTe, May 12, 2026, 14:53
#iFlytek #Robotics #Humanoid Robots
Technology

OpenAI Stakes Its Claim in the Voice Economy with Real-time API Pricing

OpenAI has unveiled its pricing for real-time audio APIs, positioning itself to lead the move from text-based AI to live voice interactions. With costs ranging from $32 to $64 per million tokens, the new models aim to enable low-latency translation and transcription for global developers.

NeTe, May 7, 2026, 17:53
#OpenAI #GPT-Realtime #Artificial Intelligence
Technology

Thinking with Coordinates: DeepSeek’s Move Toward ‘System 2’ Multimodal Intelligence

DeepSeek has released a technical framework that enables AI models to use spatial coordinates as 'visual primitives' in their reasoning process. This innovation bridges the referential gap in multimodal AI, allowing for more precise visual reasoning and industry-leading token efficiency.

NeTe, May 1, 2026, 00:58
#DeepSeek #Artificial Intelligence #Multimodal AI
Technology

NVIDIA’s Omni-Vision: Setting New Benchmarks for the Era of Autonomous AI Agents

NVIDIA has launched Nemotron 3 Nano Omni, a multimodal AI model utilizing a Mixture-of-Experts architecture to deliver 9x the efficiency of competing open models. Designed for autonomous agents, the model integrates text, video, and audio reasoning to enable real-time digital interaction and lower deployment costs.

NeTe, April 28, 2026, 20:58
#NVIDIA #Nemotron 3 #Multimodal AI
Technology

Alibaba Expands Generative AI Frontier with 'HappyHorse' Video Model Beta

Alibaba has launched a beta test for its new AI video generation model, HappyHorse, within the Tongyi Qianwen mobile application. The move aims to solidify Alibaba's position in the generative video market and provide a competitive response to international AI video tools.

NeTe, April 27, 2026, 12:28
#Alibaba #Tongyi Qianwen #HappyHorse
Technology

From Pixels to Logic: OpenAI’s ChatGPT Images 2.0 and the Dawn of Visual Reasoning

OpenAI has launched ChatGPT Images 2.0, a transformative update that incorporates a reasoning and searching phase into image generation. This shift allows the model to produce consistent multi-image narratives and accurate non-Latin typography, moving AI art from simple generation toward a comprehensive visual reasoning system.

NeTe, April 22, 2026, 02:28
#OpenAI #ChatGPT Images 2.0 #Generative AI
Technology

Musk’s xAI Broadens Grok’s Reach: The Strategic Move into Voice APIs

Elon Musk's xAI has launched voice-to-text and text-to-voice APIs for Grok, shifting the platform toward an infrastructure play for developers. The move focuses on high-fidelity, low-latency interactions, positioning Grok to compete with OpenAI in the multimodal AI market and potentially integrate with Tesla's hardware.

NeTe, April 18, 2026, 00:58
#xAI #Grok #Elon Musk
Technology

Meta’s Closed-Source Pivot: Zuckerberg Launches ‘Muse Spark’ to Regain AI Supremacy

Meta has launched Muse Spark, a native multimodal AI model that marks the company's high-stakes transition from open-source to proprietary technology. Integrated into Meta's massive social ecosystem, the model aims to compete directly with OpenAI and Google by offering superior efficiency and sophisticated visual reasoning capabilities.

NeTe, April 9, 2026, 02:28
#Meta #Muse Spark #Mark Zuckerberg
Technology

Meta’s Closed-Source Gambit: The ‘Muse Spark’ and the Pivot Toward Superintelligence

Meta has launched Muse Spark, its first 'superintelligence' model, marking a strategic pivot toward closed-source, proprietary AI. Led by Alexandr Wang, the model introduces advanced reasoning modes and aims to commercialize Meta's AI breakthroughs through a new API-centric business model.

NeTe, April 8, 2026, 19:58
#Meta #Muse Spark #Superintelligence
Technology

Beyond the Text: Zhipu AI’s GLM-5V-Turbo Aims to Redefine the Multimodal Coding Landscape

Zhipu AI has released GLM-5V-Turbo, a multimodal foundation model designed to bridge the gap between visual design and software coding. The release highlights China's shift toward high-efficiency, specialized AI tools that prioritize developer experience and enterprise deployment.

NeTe, April 2, 2026, 01:29
#Zhipu AI #GLM-5V-Turbo #Multimodal AI