Alibaba releases its Qwen3.5-Omni omnimodal LLM with support for 10+ hours of audio input, saying the Plus variant surpasses Gemini 3.1 Pro on audio benchmarks
Qwen3.5-Omni is Qwen's latest generation of fully omnimodal LLM, supporting the understanding of text, images, audio, and audio-visual content.
Qwen
Related Coverage
- Alibaba Qwen Team Releases Qwen3.5 Omni: A Native Multimodal Model for Text, Audio, Video, and Realtime Interaction MarkTechPost · Asif Razzaq
- Qwen 3.5 Omni: Alibaba's AI Model Can Now Hear, Watch, and Clone Your Voice Decrypt · Jose Antonio Lanz
Discussion
- @alibaba_qwen on X: 🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: ‘Audio-Visual Vibe Coding’. [i…
- @kimmonismus on X: Alibaba's Qwen3.5-Omni just dropped with script-level captioning, audio-visual vibe coding, and real-time web search built in. However, there is a catch: Omni here doesn't mean *creating* image or voice, but rather interpreting it. So, a caveat. Open access via Hugging. [image]
- @alibaba_qwen on X: Demo 2: Audio-Visual Vibe Coding [video]
- Adina Yakup (@adinayakup) on X: Qwen @Alibaba_Qwen just released Qwen3.5-Omni 🔥 Weights are not released (yet?), but you can try the demos: ✨ Online demo https://huggingface.co/... ✨ Offline demo https://huggingface.co/...
- @alibaba_qwen on X: Demo 1: Audio-Visual Captioning [video]
- @alibabagroup on X: 🚀 Introducing Qwen3.5-Omni, the latest fully omnimodal LLM in the family. With exceptional full-modality perception and generation capabilities, it's built to drive the next generation of AI applications. #AlibabaAI #Qwen
- Bo Wang (@bowang87) on X: Qwen3.5-Omni might be the strongest multimodal frontier model right now. What impressed me most: audio-visual vibe coding. Point your camera at something, describe what you want, and it turns that into working code. Really hope this gets open-sourced soon.
- @ali_tongyilab on X: 1/10 🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: Audio-Visual Vibe [image]
- r/singularity on Reddit: Qwen3.5 Omni - Qwen's latest generation of fully omnimodal LLM