
Chronicles

The story behind the story


Alibaba releases QvQ-72B-Preview, an experimental research model focused on “enhancing visual reasoning capabilities”, built on Qwen2-VL-72B

QVQ-72B-Preview is an experimental research model developed by the Qwen team …

QwenLM on GitHub : Qwen2-VL — Introduction: After a year's relentless efforts, today we are thrilled to release Qwen2-VL!

Asif Razzaq / MarkTechPost : Qwen Team Releases QvQ: An Open-Weight Model for Multimodal Reasoning

X: @alibaba_qwen : 🎄 Happy holidays and we wish you enjoy this year. Before moving to 2025, Qwen has the last gift for you, which is QVQ! 🎉 This may be the first open-weight model for visual reasoning. It is called QVQ, where V stands for vision. It just reads an image and an instruction, starts thinking, reflects while it should, keeps reasoning, and finally it generates its prediction with confidence! …

Balaji / @balajis : Among decentralized models: Llama is best generalist, Qwen is best multimodal/multilingual, Mistral has lightest footprint. But the space moves so fast that may change tomorrow.

Junyang Lin / @justinlin610 : 🎄 Happy holidays! This is something fun for you to play with. QVQ, the new model, can think and reason on images!

Binyuan Hui / @huybery : 👀 We've explored inference-time scaling for visual multimodal tasks and introduced QVQ, the first open multimodal o1-like model, which can be seen as the visual counterpart to QwQ. Much like QwQ, QVQ demonstrates intriguing thought processes and has achieved promising results on … [image]

Adina Yakup / @adinayakup : QvQ-72B-Preview 🎄 an open-weight model for visual reasoning just released by @Alibaba_Qwen 🎉 https://huggingface.co/... ✨ Combines visual understanding & language reasoning. ✨ Scores 70.3 on MMMU. ✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving. [image]

Vittorio / @iterintellectus : babe, wake up. qwen gave us a visual reasoning and open-weight model for christmas [image]

Casper Hansen / @casper_hansen_ : I hate to be that guy, but you can't retroactively update the license. There will now forever be an Apache 2.0 licensed version of QvQ that you can git checkout [image]

Merve / @mervenoyann : QwQ can see 🔥 @Alibaba_Qwen released QvQ, a vision LM with reasoning 😱 it outperforms proprietary VLMs on several benchmarks, comes with open weights and a demo! in the next one ⬇️ [image]

Junyang Lin / @justinlin610 : Limitations include biases on English prompts, so I advise you to test with prompts in English. Also, due to some concerns, we did not extend the reasoning capabilities to all domains, and sometimes you will meet the model's rejection to answer your questions or its providing very …

Simon Willison / @simonw : Here are my notes on trying out the brand new QvQ "visual reasoning model" released today (Apache 2 license) by Alibaba's Qwen team https://simonwillison.net/...

Simon Willison / @simonw : A note on the license: when the QvQ repo went live yesterday the LICENSE file listed Apache 2, but last night it was updated to the custom Qwen license instead https://simonwillison.net/...

Simon Willison / @simonw : I thought we were done for big new model releases in 2024, but evidently not! Here's a new Apache 2 licensed vision LLM from Qwen to round out the year

Junyang Lin / @justinlin610 : 72B is too large.

@_akhaliq : Qwen QVQ is now available in anychat. This may be the first open-weight model for visual reasoning. It is called QVQ, where V stands for vision. It just reads an image and an instruction, starts thinking, reflects while it should, keeps reasoning, and finally it generates its … [image]

@cocktailpeanut : Qwen QVQ 72B. Wow this is on another level. I gave it a photo of an NY subway train, and asked "Should I get off if i'm headed to Chinatown?" Watch as I scroll through the response. So many thoughts going on simultaneously, and in the end, decides to correctly get off the train. [video]

@cocktailpeanut : Finally got Qwen QVQ-72B to work on my Macbook M1 Max 64G. Used @LMStudio, and used the MLX version. The 8-bit version didn't fit in memory, so I downloaded the 3-bit one. And the 3-bit one works just fine! [image]

Adi / @adonis_singh : qwen is literally the only good (overall) open-source model that you can run without selling a kidney on hardware

Julian Togelius / @togelius : I asked QvQ to reason about this picture of my son and me in a café and it confidently proceeded to give a good description of the scene where it was in a café with its son 😂 [image]
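Why the 8-bit quantization fails on a 64 GB machine while the 3-bit one fits follows from a rough weight-only estimate: bytes ≈ parameters × bits ÷ 8. This sketch is back-of-envelope only; it ignores activation memory, KV cache, and per-group quantization overhead, so real footprints run somewhat higher.

```python
def approx_weights_gb(n_params_billions: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GB: params * bits / 8."""
    return n_params_billions * bits_per_weight / 8

# 72B parameters at 8 bits per weight: ~72 GB, more than a 64 GB machine offers.
print(approx_weights_gb(72, 8))  # 72.0

# 72B parameters at 3 bits per weight: ~27 GB, comfortably within 64 GB.
print(approx_weights_gb(72, 3))  # 27.0
```

On Apple Silicon the full amount must fit in unified memory shared with the OS, which is why even the ~72 GB 8-bit weights alone exceed a 64 GB M1 Max.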

Qwen