Alibaba's new Qwen3.5-Omni multimodal model, which processes text, audio, images, and video, is proprietary, marking a shift away from its open-source strategy

Alibaba Group has released the new generation of its large language model that can understand text, audio, images and video.

The Information 2026-03-31 Juro Osawa

Discussion

@chickenlady @chickenlady on bluesky
As per usual. — Subscriptions are extortion.
@alibaba_qwen @alibaba_qwen on x
🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: ‘Audio-Visual Vibe Coding’. [i…
@kimmonismus @kimmonismus on x
Alibaba's Qwen3.5-Omni just dropped with script-level captioning, audio-visual vibe coding, and real-time web search built in. However, there is a catch: Omni here doesn't mean *creating* image or voice, but rather interpreting it. So, a caveat. Open access via Hugging. [image]
@bowang87 Bo Wang on x
Qwen3.5-Omni might be the strongest multimodal frontier model right now. What impressed me most: audio-visual vibe coding. Point your camera at something, describe what you want, and it turns that into working code. Really hope this gets open-sourced soon.
@alibaba_qwen @alibaba_qwen on x
Demo2：Audio-Visual Vibe Coding [video]
@alibabagroup @alibabagroup on x
🚀 Introducing Qwen3.5-Omni, the latest fully omnimodal LLM in the family. With exceptional full-modality perception and generation capabilities, it's built to drive the next generation of AI applications. #AlibabaAI #Qwen
@adinayakup Adina Yakup on x
Qwen @Alibaba_Qwen just released Qwen3.5-Omni 🔥 Weights are not released ( yet?), but you can try the demos: ✨ Online demo https://huggingface.co/... ✨ Offline demo https://huggingface.co/...
@alibaba_qwen @alibaba_qwen on x
Demo1：Audio-Visual Captioning [video]
@ali_tongyilab @ali_tongyilab on x
1/10 🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: Audio-Visual Vibe [image]
r/singularity on reddit
Qwen3.5 Omni - Qwen's latest generation of fully omnimodal LLM
