San Francisco Compute, which provides a marketplace for AI computing capacity, raised a $40M Series A led by DCVC and Wing Venture Capital at a $300M valuation
Yuliya Chernova / Wall Street Journal:
Gemini co-lead Oriol Vinyals says Gemini 3's gains come from better pre-training and post-training, contradicting the idea that pre-training gains are falling
which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix—the team delivered a drastic jump.
Gemini 3 hands-on: a fundamental improvement on daily use, extremely fast, Antigravity IDE is a powerful launch product, and its personality is terse and direct
Gemini 3 is a fundamental improvement on daily use, not just on benchmarks. It feels more consistent and less “spiky” than previous models.
Soumith Chintala, who co-created the PyTorch ML framework at Meta and left the company earlier this month, joins Mira Murati's Thinking Machines Lab
Pranav Dixit / Business Insider:
Extropic, which says its chips using probabilistic bits can be 10,000x more energy efficient than current AI chips, shares its first chip with some AI labs
A startup hopes to challenge Nvidia, AMD, and Intel with a chip that wrangles probabilities rather than 1s and 0s.
OpenAI completes its recapitalization, “simplifying” its structure; OpenAI Foundation now has equity valued at ~$130B and still controls the OpenAI for-profit
OpenAI has completed its recapitalization, simplifying its corporate structure. The nonprofit remains in control …
Microsoft now holds a ~$135B investment in OpenAI Group PBC, or a ~27% stake, down from 32.5%; OpenAI commits to purchase an additional $250B in Azure services
Since 2019, Microsoft and OpenAI have shared a vision to advance artificial intelligence responsibly and make its benefits broadly accessible.
Andrej Karpathy unveils nanochat, a full-stack training and inference implementation of an LLM in a single, dependency-minimal codebase, deployable in 4 hours
It provides a full ChatGPT-style LLM, including training, inference, and a web UI … X: Clem / @clementdelangue: Am I wrong in sensing a paradigm shift in AI? Feels like we're movin...
Google announces Gemma 3 270M, a compact model designed for task-specific fine-tuning with strong capabilities in instruction following and text structuring
ai.google.dev/gemma/docs/c... Tim Duffy / @timfduffy.com: Google just released a 270M parameter Gemma model. As a tiny model lover I'm excited. Models in this size class are usu...
A study by Meta researchers suggests that training LLMs to predict multiple tokens at once, instead of just the next token, results in better and faster models
LLM approach to predict multiple tokens. KAN: Kolmogorov-Arnold Networks—"promising alternatives to Multi-Layer Perceptrons"
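The multi-token objective described above can be sketched in a few lines: a shared trunk produces hidden states, and each of several output heads predicts a token at a different future offset, with the cross-entropy losses averaged. This is a minimal NumPy sketch of the idea only; the function name, shapes, and the use of plain weight matrices as "heads" are illustrative assumptions, not the architecture from the Meta paper.

```python
import numpy as np

def multi_token_loss(hidden, heads, tokens):
    """Average cross-entropy over several future offsets.

    hidden: (T, d) trunk outputs for T positions (illustrative).
    heads:  list of (d, V) matrices; head k predicts token t + k + 1.
    tokens: (T,) integer token ids.
    """
    T = hidden.shape[0]
    total, count = 0.0, 0
    for k, W in enumerate(heads):
        valid = T - (k + 1)          # positions that have a target k+1 steps ahead
        if valid <= 0:
            continue
        logits = hidden[:valid] @ W  # (valid, V)
        # numerically stable log-softmax
        z = logits - logits.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        targets = tokens[k + 1 : k + 1 + valid]
        total += -logp[np.arange(valid), targets].sum()
        count += valid
    return total / count
```

With a single head this reduces to the usual next-token loss; adding heads only changes how far ahead each target is shifted, which is why the extra supervision comes nearly for free at training time.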
A look at Databricks' new open-source model DBRX, an LLM that cost ~$10M to train over several months and, Databricks says, outshines Llama 2, Mixtral, and Grok
Startup Databricks just released DBRX, the most powerful open source large language model yet—eclipsing Meta's Llama 2.