Alibaba releases its Qwen3.5-Omni omnimodal LLM with support for 10+ hours of audio input, saying the Plus variant surpasses Gemini 3.1 Pro on audio benchmarks

Qwen3.5-Omni is the latest generation of Qwen's fully omnimodal LLM, supporting understanding of text, images, audio, and audio-visual content.

Qwen 2026-03-31

Discussion

@alibaba_qwen on X
🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: ‘Audio-Visual Vibe Coding’. [i…
@kimmonismus on X
Alibaba's Qwen3.5-Omni just dropped with script-level captioning, audio-visual vibe coding, and real-time web search built in. However, there is a catch: Omni here doesn't mean *creating* image or voice, but rather interpreting it. So, a caveat. Open access via Hugging. [image]
@alibaba_qwen on X
Demo2：Audio-Visual Vibe Coding [video]
@adinayakup Adina Yakup on X
Qwen @Alibaba_Qwen just released Qwen3.5-Omni 🔥 Weights are not released ( yet?), but you can try the demos: ✨ Online demo https://huggingface.co/... ✨ Offline demo https://huggingface.co/...
@alibaba_qwen on X
Demo1：Audio-Visual Captioning [video]
@alibabagroup on X
🚀 Introducing Qwen3.5-Omni, the latest fully omnimodal LLM in the family. With exceptional full-modality perception and generation capabilities, it's built to drive the next generation of AI applications. #AlibabaAI #Qwen
@bowang87 Bo Wang on X
Qwen3.5-Omni might be the strongest multimodal frontier model right now. What impressed me most: audio-visual vibe coding. Point your camera at something, describe what you want, and it turns that into working code. Really hope this gets open-sourced soon.
@ali_tongyilab on X
1/10 🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: Audio-Visual Vibe [image]
r/singularity on Reddit
Qwen3.5 Omni - Qwen's latest generation of fully omnimodal LLM
