VOICE ARCHIVE

Maxime Labonne

@maximelabonne
8 posts
2025-04-06
🦙 Llama 4 is here! → Llama 4 introduces three models: Scout (17B active parameters/16 experts), Maverick (17B active parameters/128 experts), and Behemoth (288B active parameters/16 experts), with only Scout and Maverick being released now. → These are Meta's first natively [image]
Meta

Meta launches Llama 4 Maverick with 400B parameters and Scout with 109B parameters and a 10M context window, and previews Behemoth with 2T total parameters

Takeaways — We're sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences.
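The mixture-of-experts split quoted above is what makes these sizes comparable: per-token compute is set by the active parameters, while the expert count drives the total memory footprint. A minimal sketch of that ratio, using only the figures reported in these posts (actives and expert counts from the tweet, totals from the linked Meta coverage):

```python
# Parameter figures as reported above; active counts and expert counts
# come from the tweet, total counts from the headline (2T = 2000B).
models = {
    "Scout":    {"active_b": 17,  "experts": 16,  "total_b": 109},
    "Maverick": {"active_b": 17,  "experts": 128, "total_b": 400},
    "Behemoth": {"active_b": 288, "experts": 16,  "total_b": 2000},
}

for name, m in models.items():
    frac = m["active_b"] / m["total_b"]
    print(f"{name:9s} {m['experts']:4d} experts  "
          f"{m['active_b']}B active / {m['total_b']}B total "
          f"({frac:.0%} of weights used per token)")
```

Note how Scout and Maverick share the same 17B active count, so their per-token cost is similar even though Maverick's 128 experts put its total near 4x Scout's.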

Llama 4 is here with 2T, 400B, and 109B MoEs. This bad boy just got outllamaed. https://ai.meta.com/... [image]

Llama 4's new license comes with several limitations:
- Companies with more than 700 million monthly active users must request a special license from Meta, which Meta can grant or deny at its sole discretion.
- You must prominently display “Built with Llama” on websites, [image]

2024-10-01
This is the proudest release of my career :) At @LiquidAI_, we're launching three LLMs (1B, 3B, 40B MoE) with SOTA performance, based on a custom architecture. Minimal memory footprint & efficient inference bring long context tasks to edge devices for the first time! [image]
VentureBeat

MIT spinoff Liquid AI debuts its non-transformer AI models LFM-1B, LFM-3B, and LFM-40B MoE, claiming they achieve “state-of-the-art performance at every scale”

Liquid AI, a startup co-founded by former researchers from the Massachusetts Institute of Technology (MIT) …

2024-09-07
This is super cool but I have a lot of questions. First, reflection = CoT on steroids. It means you can't compare these scores at all. Remember when people made fun of Gemini for providing CoT results for MMLU? This is a lot worse. Secondly, if you don't parse the output and
VentureBeat

HyperWrite CEO unveils Reflection 70B, based on Llama 3.1 70B Instruct and trained using reflection-tuning, and says it beats GPT-4o in all benchmarks tested

There's a new king in town: Matt Shumer, co-founder and CEO of AI writing startup HyperWrite, today unveiled Reflection 70B …

2024-07-24
I made the closed-source vs. open-weight models figure for this moment. [image]
Meta

Mark Zuckerberg argues that “open source AI” is the path forward, closed models are vulnerable to vendor lock-in and state-backed espionage, and more

RE: https://www.threads.net/...
Dare Obasanjo / @carnage4life: You can find @zuck's full post here https://www.facebook.com/...
Dare Obasanjo / @carnage4life: Mark Zuckerberg has...

2024-04-19
Llama 3's vocabulary size is much bigger (32000 => 128256). Also note 11008 => 14336. [image]
The Verge

Meta details Llama 3: 8B- and 70B-parameter models, a focus on reducing false refusals, and an upcoming model trained on 15T+ tokens that has 400B+ parameters

What To Know About ‘Llama 3’ Model
Marcus Gopolang Moloko / Memeburn: Meta AI with built in Llama 3 is on WhatsApp in South Africa
Hamsat Abdurasheed / News.ng: Meta releases Lla...
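The two jumps in that tweet can be sized with quick arithmetic. Assuming a hidden size of 4096 for both the Llama 2 7B and Llama 3 8B configs (the published values; the tweet itself only gives the vocabulary and FFN intermediate sizes), the larger vocabulary roughly quadruples the token-embedding matrix, and the wider SwiGLU intermediate grows each FFN layer by about 30%:

```python
# Back-of-the-envelope sketch. The hidden size of 4096 is an assumption
# taken from the published 7B/8B configs, not from the tweet.
hidden = 4096

llama2 = {"vocab": 32000,  "ffn_intermediate": 11008}
llama3 = {"vocab": 128256, "ffn_intermediate": 14336}

for name, cfg in (("Llama 2 7B", llama2), ("Llama 3 8B", llama3)):
    embed = cfg["vocab"] * hidden                    # token embedding matrix
    # SwiGLU FFN uses gate, up, and down projections: 3 matrices per layer
    ffn = 3 * hidden * cfg["ffn_intermediate"]
    print(f"{name}: {embed / 1e6:.0f}M embedding params, "
          f"{ffn / 1e6:.0f}M FFN params per layer")
```

That is roughly 131M vs 525M embedding parameters, which is why the parameter count grew from 7B to 8B even though the core architecture barely changed.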

2024-03-29
I played a little with Jamba: it looks like an amazing model. In terms of architecture, the MoE implementation is very close to Mixtral's. What's great about it is that it hasn't been fine-tuned. Curious to see how much improvement we can get through SFT. I made a little... [image]
TechCrunch

AI21 Labs launches Jamba, an AI model that integrates two architectures: transformer and Mamba, which is based on the Structured State Space model

Increasingly, the AI industry is moving toward generative AI models with longer contexts.  But models with large context windows tend to be compute-intensive.