Subquadratic launches with a $29M seed and debuts SubQ, an LLM that uses a subquadratic sparse attention architecture to achieve a 12M-token context window

SiliconANGLE 2026-05-06 Kyt Dotson

Context & Ripple Effects

Subquadratic joins a growing set of LLM-architecture efforts that aim to extend usable context while cutting compute cost. Related coverage includes Alibaba's Qwen3-Next, which is likewise positioned around long-context understanding and efficiency, a sign that model architecture, not only model scale, is becoming a competitive focus.

The $29M seed round gives the company resources to turn SubQ's sparse-attention claim into a product and developer offering. Related coverage of Contextual AI also reflects continued investment in LLM systems aimed at differentiated, practical use cases.

First-order effects

Subquadratic now has seed capital and a flagship model, SubQ, with a stated 12M-token context window based on sparse attention.
The company must immediately validate that its architecture delivers useful long-context performance and efficiency beyond the headline context-window figure.

Second-order effects

Long-context model providers face added pressure to differentiate on the cost and quality of retrieving and reasoning over very large inputs, rather than competing solely on parameter scale or nominal context length.
Potential enterprise users and model builders gain another architectural option for workloads involving large document collections, codebases, or persistent records, subject to independent performance validation.

Third-order effects

If sparse-attention approaches prove able to sustain very long contexts efficiently across independent evaluations, LLM competition could shift toward architecture and systems design that reduce the cost of serving large-context workloads.
The meaningful industry benchmark may increasingly become effective long-context reasoning and retrieval quality, not maximum advertised token capacity alone.

The trend: This is one data point in the move toward more compute-efficient LLM architectures designed to make long-context AI practical at scale.

Discussion

@alex_whedon Alexander Whedon on x
Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - [v…
@willdepue Will Depue on x
nevermind no longer trying to give them the benefit of the doubt here: they claimed O(n) and ‘subq is linear vs quadratic’ which is pretty ridiculous the speedup numbers in their announcement video don't seem to line up with this? and just 12M context with O(n) scaling? this is […
@daniel_mac8 Dan McAteer on x
SubQ is either the biggest breakthrough since the Transformer... > 52x faster than FlashAttention at 1mm tok context > 20x cheaper than Opus ...or it's AI Theranos. Requested early access so hopefully can investigate soon. [image]
@ashleymayer Ashley Mayer on x
With so much capital concentrating in so few private companies, and Anthropic and OpenAI breaking all “startup” growth norms, it's easy to forget that we are still incredibly early in this AI wave. On that note, I am THRILLED @subquadratic is now out of stealth. This is a
@zephyr_z9 @zephyr_z9 on x
“early access” Scammy vibes If it's really a sub-quadratic sparse attention arch (SSA), then serving this should be really cheap No point in putting this behind early access
@rileybrown Riley Brown on x
I hardly ever say this... but my spidey senses are tingling... something is off about this. mark my words.
@phequals7 @phequals7 on x
does not pass my smell test.. > a breakthrough like this gets published at ICML/NeurIPS/ICLR - not with a startup launch video - would love to read a preprint atleast (technical report coming soon is v SUS) > usual suspects engagement boosting this tweet was the final straw
@dorialexander Alexander Doria on x
Very welcome to see more research in that space but a bit puzzling until the report clarifies: *Clearly a continuous pretrain of an open weight model (totally fair for this but we'll need a before and after). *No actually long evals (>1M) even though RULER could be extrapolated.
@subquadratic @subquadratic on x
The numbers behind the SubQ announcement: Speed: 52x faster than Flash Attention SWE Bench Verified: 81.8% Ruler (128K): 95% MRCR V2: 65.9% Get early access at https://subq.ai/
@artemr Artem Russakovskii on x
A 12-million-token context window at 1,000x less compute capable of coding for weeks at a time and filing hundreds of PRs in the process. 🤯 If you thought AI can run laps around us now, the rate of progress in the next few years will become exponential.
@stevesi Steven Sinofsky on x
Will be exciting to see how this plays out.
@bidiptas13 Bidipta Sarkar on x
Extremely hard to believe. The only plausible case is RADLADS-style weight init from frontier open source model + major benchmaxxing You can't get this high in SWE Verified legitimately from scratch with a new model architecture on their budget
@austen Austen Allred on x
And now it's time to see what my little brother has been working on for the past couple years: An AI model fully built on sub-quadratic sparse-attention architecture. Result? 12 million token reasoning model 150 tokens/second 1/5 the cost of Opus
@nielsrogge Niels Rogge on x
After checking his LinkedIn, the chances of it being a scam went up subquadratically [image]
@eliebakouch Elie on x
hard to know if this is real since the efficiency numbers look really insane (52x faster than Flash Attention at 1M) but eval numbers don't seem unbelievable if we think it's a continual pretrain of one of the recent oss models one red flag imo is the fact that they did paid [ima…
@willdepue Will Depue on x
if youre really subquadratic homie why are you only serving 12M context. if its n log n or n^1.25 let's see some 100M at least for a demo my guy
@willdepue Will Depue on x
Let's read the technical report. TLDR; No real answers on how their method works. Doesn't make me feel better about it. They seem to understand the problem: “[Attention] is expensive for the same reason: every query compares against every key. The result is an all-pairs
@vhmth Vinay Hiremath on x
Two things: 1. I hope Will is wrong and the team cooked and did some wild shit. 2. We need way more technical critical discourse like this from Will. There are so many out of pocket things being claimed on the timeline these days. And hacker news is untrustable because everyone
@willdepue Will Depue on x
my first take, and a good lesson on good research epistemics here: what can we infer from ~82% SWE-Bench? it's possible they (1) they trained a new model, from scratch, that is unlike a regular transformer but i've never heard of this company before, and checking their funding [i…
@tenobrus @tenobrus on x
sub 5% chance we hear anything about this model ever again
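
Several of the posts above turn on whether the quoted speedup is consistent with the claimed scaling. As a rough aid, the Python sketch below extrapolates a speedup measured at one context length to another, under assumptions that neither the announcement nor the posts confirm: the baseline attention cost grows as n^2, the new method grows as n^p or n log n, constant factors stay fixed, and attention dominates total cost. The function names are hypothetical and this is not Subquadratic's method.

```python
import math


def extrapolated_speedup(base_speedup, base_tokens, target_tokens, exponent):
    """Speedup over a quadratic baseline at target_tokens.

    Assumption: baseline cost ~ n**2 and the new method costs ~ n**exponent,
    so the speedup grows as n**(2 - exponent).
    """
    scale = target_tokens / base_tokens
    return base_speedup * scale ** (2.0 - exponent)


def extrapolated_speedup_nlogn(base_speedup, base_tokens, target_tokens):
    """Same extrapolation, assuming the new method costs ~ n * log(n).

    The speedup then grows as n / log(n).
    """
    scale = target_tokens / base_tokens
    return base_speedup * scale * math.log(base_tokens) / math.log(target_tokens)


if __name__ == "__main__":
    base_speedup = 52          # figure quoted in the announcement at 1M tokens
    base_tokens = 1_000_000
    target_tokens = 12_000_000  # advertised context window

    for label, p in [("linear, O(n)", 1.0), ("O(n^1.25)", 1.25), ("O(n^1.5)", 1.5)]:
        s = extrapolated_speedup(base_speedup, base_tokens, target_tokens, p)
        print(f"{label:>14}: about {s:.0f}x at 12M tokens")

    s = extrapolated_speedup_nlogn(base_speedup, base_tokens, target_tokens)
    print(f"{'O(n log n)':>14}: about {s:.0f}x at 12M tokens")
```

Under the linear assumption, a 52x speedup at 1M tokens would extrapolate to 624x at 12M tokens. Real systems will deviate from these idealized curves, so the output is only a consistency check to hold against any published measurements, not a prediction.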

Chronicles