VOICE ARCHIVE

Awni Hannun

@awnihannun
23 posts
2026-03-04
M5 Max is a local AI powerhouse in a laptop form factor. So awesome to see this thing released. Up to 8x faster prefill / image generation compared to M1 Max. Benchmarks done with MLX / mlx-lm. [image]
2026-03-04 View on X
9to5Mac

Apple unveils the M5 MacBook Air in 13" and 15" sizes, boosting its starting price by $100 to $1,099 and doubling base storage to 512GB, shipping from March 11

You now get more storage and better performance …

Apple

Apple refreshes the 14" and 16" MacBook Pro with M5 Pro and M5 Max: up to 4x faster LLM prompt processing, up to 2x faster SSD speeds, and 1TB of base storage

The world's best pro laptop raises the bar again with blazing-fast CPU and GPU performance, plus up to 2x faster SSD speeds and 1TB of starting storage


2026-02-17
Qwen3.5 runs quite well in mlx-lm. Awesome that we have a frontier-level hybrid model. The context gets longer but the inference speed and memory use barely change. Here's the Q4 generating a space invaders game on an M3 Ultra. Generated 4,120 tokens at 37.6 tok/s. [video]
2026-02-17 View on X
Reuters

Alibaba debuts Qwen3.5, a 397B-parameter open-weight multimodal AI model that it says is 60% cheaper to use and 8x better at large workloads than Qwen3
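The "memory use barely changes" claim is easiest to sanity-check with a weight-only footprint estimate. This is a rough rule of thumb, not a measurement: it counts quantized weights only and ignores KV cache and activations.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory footprint in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Qwen3.5 at 397B parameters:
q4 = weight_gb(397e9, 4)      # 4-bit (Q4): 198.5 GB
bf16 = weight_gb(397e9, 16)   # unquantized bf16: 794 GB
```

At Q4 the weights land just under the 256 GB tier of Apple silicon desktops, which is why 4-bit is the practical precision for a model this size locally.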

2025-12-23
GLM-4.7 runs quite well on an M3 Ultra with mlx-lm, even at a near lossless precision (6-bit here). It generated the best space invaders game I've seen yet for a local model (even included sound effects!). Generated 6600 tokens and ran at 16 tok/s. [video]
2025-12-23 View on X
Z.ai

Chinese AI startup Z.ai releases GLM-4.7, an open-weight model that Z.ai says delivers significant improvements in coding performance compared to GLM-4.6


2025-11-07
I'm a bit giddy over the fact that this is by all visible measures a frontier level model, if not THE frontier model, for agentic tasks. And you can run it. In its native precision. On 2 M3 Ultras. Pretty fast. In MLX.
2025-11-07 View on X
CNBC

Chinese startup Moonshot releases Kimi K2 Thinking, an open-weight model it claims beats GPT-5 in agentic capabilities; source: the model cost $4.6M to train

Chinese startup Moonshot on Thursday released its latest generative artificial intelligence model which claims to beat OpenAI's ChatGPT in …

The new 1 Trillion parameter Kimi K2 Thinking model runs well on 2 M3 Ultras in its native format - no loss in quality! The model was quantization aware trained (qat) at int4. Here it generated ~3500 tokens at 15 toks/sec using pipeline-parallelism in mlx-lm: [video]
2025-11-07 View on X
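The arithmetic behind "2 M3 Ultras" is worth spelling out. At the model's native int4 (QAT) precision, a 1-trillion-parameter model's weights alone occupy about 500 GB, which leaves almost no headroom on a single 512 GB M3 Ultra for the KV cache, activations, and the OS; hence the two-machine pipeline-parallel setup. A minimal sketch:

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight-only footprint in GB: params * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9

weights = weight_gb(1e12, 4)   # 1T params at int4: 500.0 GB
headroom = 512 - weights       # only ~12 GB left on one 512 GB machine
```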

2025-10-15
I'm super excited about M5. It's going to help a lot with compute-bound workloads in MLX. For example:
- Much faster prefill. In other words, time-to-first-token will go down.
- Faster image / video generation
- Faster fine-tuning (LoRA or otherwise)
- Higher throughput for [image]
2025-10-15 View on X
Apple

Apple debuts its M5 chip, with a 10-core GPU, a Neural Accelerator in each core, offering 4x+ the performance of M4, and a 10-core CPU with six efficiency cores

M5 delivers over 4x the peak GPU compute performance for AI compared to M4, featuring a next-generation GPU with a Neural Accelerator …
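The compute-bound vs. bandwidth-bound split the post describes can be sketched with back-of-envelope numbers. Every figure below (model size, TFLOPS, bandwidth) is an illustrative assumption, not an Apple spec: prefill does roughly 2 FLOPs per parameter per prompt token and so scales with GPU compute, while single-stream decode streams the full weights once per token and so scales with memory bandwidth.

```python
def prefill_seconds(n_params: float, prompt_tokens: int, flops_per_s: float) -> float:
    # Prefill is compute-bound: ~2 * params FLOPs per prompt token.
    return 2 * n_params * prompt_tokens / flops_per_s

def decode_tok_per_s(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    # Decode is bandwidth-bound: each token streams the weights once.
    return bandwidth_bytes_per_s / weight_bytes

# Hypothetical 8B model, 4-bit weights (~4 GB), 4k-token prompt:
t = prefill_seconds(8e9, 4096, 10e12)  # at an assumed 10 TFLOPS: ~6.55 s
r = decode_tok_per_s(4e9, 400e9)       # at an assumed 400 GB/s: 100 tok/s
```

This is why a Neural Accelerator that multiplies GPU compute mostly moves the prefill/time-to-first-token number, while decode speed tracks memory bandwidth.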

2025-09-24
Just for fun, here's what 32 simultaneous long-context generations with Qwen3 Next 80B looks like on an M3 Ultra. Using the new batch generation in mlx-lm. Context size for each is about 5k tokens: [video]
2025-09-24 View on X
Bloomberg

Alibaba's Hong Kong-listed shares hit a nearly four-year high after CEO Eddie Wu announced plans to increase AI spending beyond the $53B target over three years

Alibaba Group Holding Ltd.'s shares surged to their highest in nearly four years after revealing plans to ramp up AI spending past …
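For a sense of why 32 simultaneous long-context streams is notable, here is the standard KV-cache size formula with a hypothetical dense-attention config. The numbers are not Qwen3 Next's actual architecture (it is a hybrid model, so its real cache is smaller than this estimate), just an illustration of the memory that batch generation has to hold on top of the weights.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, n_streams: int, dtype_bytes: int = 2) -> float:
    # Per token per layer, the cache stores K and V:
    # 2 * n_kv_heads * head_dim values.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * n_streams * dtype_bytes / 1e9)

# Hypothetical config: 48 layers, 8 KV heads of dim 128, fp16 cache,
# 32 streams of 5k tokens each -> ~31.5 GB of cache.
cache = kv_cache_gb(48, 8, 128, 5000, 32, 2)
```

Batching also amortizes the per-token weight read across all 32 streams, which is why aggregate throughput scales far better than running 32 sequential generations.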

Simon Willison's Weblog

Alibaba releases the Qwen3-VL vision models, the Qwen3Guard “safety moderation” models, and three closed-weight models, including Qwen3-Max with 1T+ parameters


2025-07-23
A perfect coding model for MLX on Apple silicon. Qwen delivered again. Runs quite fast on an M3 Ultra. Running the 4-bit quantized version with mlx-lm: [video]
2025-07-23 View on X
Qwen

Alibaba debuts the Qwen3-Coder model for agentic coding, including a 480B-parameter MoE variant, and open sources Qwen Code, a CLI tool adapted from Gemini CLI


VentureBeat

Alibaba releases its new Qwen3-235B-A22B-Instruct-2507 model on Hugging Face, improving on Qwen 3's reasoning, accuracy, and multilingual understanding

Chinese e-commerce giant Alibaba has made waves globally in the tech and business communities with its own family of “Qwen” …

2025-03-24
The new Deep Seek V3 0324 in 4-bit runs at > 20 toks/sec on a 512GB M3 Ultra with mlx-lm! [video]
2025-03-24 View on X
Simon Willison's Weblog

DeepSeek releases MIT-licensed DeepSeek-V3-0324, the latest version of their enormous DeepSeek v3 model; the previous DeepSeek v3 version had a custom license

deepseek-ai/DeepSeek-V3-0324. Chinese AI lab DeepSeek just released the latest version of their enormous DeepSeek v3 model …

2025-03-07
QwQ-32B evals on par with Deep Seek R1 680B but runs fast on a laptop. Delivery accepted. Here it is running nicely on a M4 Max with MLX. A snippet of its 8k token long thought process: [video]
2025-03-07 View on X
VentureBeat

Alibaba releases open-source reasoning model QwQ-32B on Hugging Face and ModelScope, claiming comparable performance to DeepSeek-R1 but with lower compute needs

QwQ is the reasoning model of the Qwen series.

2025-01-28
The DeepSeek V3 model file is ~450 lines of code in MLX LM. Includes pipeline-parallelism and all. Good way to see how it all works. [image]
2025-01-28 View on X
Wired

Hands-on with DeepSeek's free chatbot: the R1 model is powerful, but suffers from rampant hallucinations and lacks some ChatGPT tools like the memory feature

DeepSeek's chatbot with the R1 model is a stunning release from the Chinese startup. While it's an innovation in training efficiency, hallucinations still run rampant.

2024-12-07
Llama 3.3 70B 4-bit runs nicely on a 64GB M3 Max in MLX LM (~10 toks/sec). Would be even faster on an M4 Max. Yesterday's server-only 405B is today's laptop 70B: [video]
2024-12-07 View on X
TechCrunch

Meta announces Llama 3.3 70B, a text-only model that Meta claims can deliver the performance of its largest Llama model at a lower cost
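The ~10 tok/s figure lines up with a simple bandwidth-bound decode estimate: each generated token streams the full quantized weights once, so tokens/sec is roughly memory bandwidth divided by weight bytes. The 400 GB/s used below is an assumed M3 Max-class bandwidth, not a measured number.

```python
def decode_tok_per_s(n_params: float, bits: float, bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed: bandwidth / quantized weight size."""
    weight_gb = n_params * bits / 8 / 1e9   # 70B at 4-bit -> 35 GB
    return bandwidth_gb_s / weight_gb

rate = decode_tok_per_s(70e9, 4, 400)       # ~11.4 tok/s upper bound
```

An estimate of ~11 tok/s against an observed ~10 tok/s is typical; real decode runs slightly under the bandwidth ceiling due to KV-cache reads and compute overhead.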


2024-04-25
Cool new work from some colleagues at Apple: more accurate LLMs with fewer parameters and fewer pre-training tokens. Also has MLX support out of the box! Code here: https://github.com/...
2024-04-25 View on X
VentureBeat

Apple researchers share OpenELM, a family of LLMs with 270M to 3B parameters, designed to run on-device, and pre-trained and fine-tuned on public datasets


2024-04-11
New Mixtral 8x22B runs nicely in MLX on an M2 Ultra. 4-bit quantized model in the 🤗 MLX Community: https://huggingface.co/... h/t @Prince_Canuma for MLX version and v2ray for HF version https://huggingface.co/v2ray [video]
2024-04-11 View on X
VentureBeat

Mistral AI launches Mixtral 8x22B, its latest sparse mixture-of-experts model, after releasing Mixtral 8x7B in December 2023

As Google unleashed a barrage of artificial intelligence announcements at its Cloud Next conference, Mistral AI decided to jump into action with the launch …

2024-01-30
4-bit quantized Code Llama models already in the 🤗 MLX Community! {70, 13, 7}B models here: https://huggingface.co/...
1. pip install mlx-lm
2. python -m mlx_lm.generate --model mlx-community/CodeLlama-13b-Python-4bit --prompt "write a quick sort in C++"
Thanks to...
2024-01-30 View on X
VentureBeat

Meta releases Code Llama 70B, a new version of its code generation model, featuring improved code correctness, a variant optimized for Python, and more

available under the same license as previous Code Llama models. Download the models ➡️ https://ai.meta.com/... • CodeLlama-70B • CodeLlama-70B-Python • CodeLlama-70B-Instruct [image]