Meta debuts “quantized” versions of Llama 3.2 1B and 3B models, designed to run on low-powered devices and developed in collaboration with Qualcomm and MediaTek
@aiatmeta: So today we're releasing new quantized versions of Llama 3.2 1B & 3B that deliver 2-4x increases in inference speed and, on average, a 56% reduction in model size and a 41% reduction in memory footprint. Details [image]

Yuchen Jin / @yuchenj_uw: @AIatMeta keep shipping!

Steve Jarrett / @stevejarrett: Bravo @AIatMeta! Small, fast, and high-quality on-device LLMs will unlock many new use cases. Excited for all of what lies ahead for open-weight models like Llama.

@trawasthi_ai: This is a game changer for on-device AI! Quantized models without sacrificing performance or accuracy? That's a huge win for developers working with limited resources. Llama 3.2 is setting new standards in portability, speed, and efficiency. Can't wait to see what innovations come out of this!

@trawasthi_ai: @AIatMeta The combination of LoRA adaptors for accuracy and SpinQuant for portability is such a clever approach. Now developers can choose the model that best fits their needs, whether it's high-performance tasks or lightweight deployment. Llama 3.2 just keeps impressing!

Cristiano R. Amon / @cristianoamon: One of my favorite parts of #SnapdragonSummit is experiencing the demos utilizing our latest technology. Stepping into the @Snapdragon Elite Cockpit via the @Meta Quest 3 was quite impressive, and I'm always blown away by the creative applications our teams and partners develop [video]

Fred Del Vecchio / @fremdelve: #AI Amazing news for the open-source #LLMs community: the light on-device models from AI at Meta are getting even lighter. Exciting times to be in the sector for sure.

JM Rothberg / @jmrothberg: Amazing opportunity to put models right on the edge, especially if you make medical or laboratory instruments :). Everything is getting smarter. Fast. @ButterflyNetInc @Hyperfine @IdentifeyeHLTH @Quantum_Si

Yam Peleg / @yampeleg: Note: these are different from the normal quants you know and use every day. First, they used quantization-aware training on the original dataset. Alright, very useful, thank you! BUT they also did it with LoRA. First time I see this. VERY interesting idea, lots of potential

@arm: We're excited to see the new quantized Llama 3.2 1B and 3B models available on the Arm compute platform. 👉 Developers, get ready to seamlessly integrate these new models into your applications #onArm with no additional modifications or optimizations, saving time and resources

@trawasthi_ai: @AIatMeta Bringing high-performance LLMs to resource-constrained devices is the future of AI. Meta's quantized Llama 3.2 models are the perfect balance between speed, memory, and accuracy! It's exciting to think how this could transform real-time applications across industries.

Hamza / @thegenioo: Exciting news from @AIatMeta @Meta The release of quantized Llama 3.2 models marks a significant leap in AI development. With up to 4x faster inference speeds and a 56% reduction in model size, developers can now enjoy enhanced efficiency without sacrificing accuracy. This breakthrough...ensures quality and safety on resource-constrained devices. Kudos to Meta's collaboration with industry leaders...

@aiatmeta: We used two different techniques for quantizing these models: Quantization-Aware Training with LoRA adaptors, which prioritizes accuracy, and SpinQuant, a post-training quantization method that prioritizes portability. Both versions are available for download as part of this release. [image]

@mediatek: @AIatMeta Congrats Meta. 🙌 MediaTek is a key partner and collaborates with Meta to enable the Llama 3.2 Quantized Models on MediaTek Dimensity SoCs.

Ronan Naughton / @ronantech: 📰 AI News of the day! ➡️ @AIatMeta release quantized Llama 3.2 models ➡️ Accelerated by Arm's #KleidiAI library 🤔 Find out more: https://community.arm.com/... A big thank you to the entire Executorch team at AI at Meta for this great collaboration. 🙏

@aiatmeta: Thanks to close work with @arm, @mediatek and @qualcomm, these new models are ready to deploy on even more mobile CPUs. We are also currently collaborating with partners to utilize NPUs for these quantized models for even greater performance. [image]

Vimal Gorasiya / @vimalgorasiya: @AIatMeta's latest quantized Llama models (1B & 3B) boost inference speed by 2-4x, reduce model size by 56%, and cut memory by 41%, all while keeping nearly full-precision accuracy. Big leap forward in efficiency! https://ai.meta.com/... #AI #MachineLearning #LlamaModels [image]

@qualcomm: Qualcomm and @Meta, along with countless ecosystem partners, are pushing the boundaries of what's possible in #AI and #XR. #SnapdragonSummit [video]

LinkedIn: Joseph Spisak: Following on our Llama 3.2 release with our 'Baby Llama' models, we are now releasing quantized versions with optimizations …

Forums: r/singularity: Meta releases new quantized versions of Llama 3.2 1B & 3B that deliver up to 2-4x increases in inference speed and, on average, a 56% reduction in model size and a 41% reduction in memory footprint
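For readers unfamiliar with the two techniques named in the reactions above, here is a minimal, self-contained sketch of the core operation both share: mapping float weights to int8 with a scale factor. This is only an illustration of plain round-to-nearest quantization, not Meta's actual implementation; SpinQuant additionally applies learned rotations to the weights before this step, and the QAT-with-LoRA variant runs the quantize-then-dequantize "fake quant" pass during training so the model learns to tolerate the rounding error. All function names here are hypothetical.

```python
# Illustrative sketch only (not Meta's code): symmetric per-tensor
# round-to-nearest int8 quantization, the basic step that post-training
# schemes like SpinQuant build on (SpinQuant's rotations are omitted).

def quantize_int8(weights):
    """Map floats to int8 values plus a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

def fake_quant(weights):
    """QAT-style forward pass: quantize then dequantize, so a training
    loop sees (and adapts to) the rounding error it will face at int8."""
    q, s = quantize_int8(weights)
    return dequantize(q, s)

weights = [0.5, -1.2, 0.03, 0.9, -0.77]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# int8 stores 1 byte per weight vs. 4 for fp32: a 75% cut for the
# quantized tensors. The headline 56% average is smaller because some
# layers (and the LoRA adaptors) remain in higher precision.
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(q, round(max_err, 4))
```

With a single per-tensor scale, the worst-case rounding error is bounded by half the scale, which is why outlier weights (here, -1.2) dominate the precision of everything else; techniques like SpinQuant's rotations exist largely to tame such outliers before rounding.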