omarsar0 · TEXXR

At this point, “agentic engineering” has allowed me to build the best AI harness I could possibly get my hands on. Yes, I vibe coded it. That's right. You don't need to wait around for the features you need for your AI agents. Please don't. You could just build them yourself.

2026-02-27 View on X

@karpathy

AI coding agents made a huge leap forward since December, completing complex projects with minimal oversight, meaning “programming is becoming unrecognizable”

View original

At this point, “agentic engineering” has allowed me to build the best AI harness I could possibly get my hands on. Yes, I vibe coded it. That's right. You don't need to wait around for the features you need for your AI agents. Please don't. You could just build them yourself.

2026-02-26 View on X

@karpathy

AI coding agents made a huge leap forward since December, completing complex projects with minimal oversight, meaning “programming is becoming unrecognizable”

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the …

View original

Damn! Gemini 3 Flash is no joke. Faster and cheaper, while demonstrating remarkable reasoning capabilities. Amazing that we have models of this caliber with multimodal and agentic capabilities. Time to build! Stay tuned for more of my thoughts on this model. [image]

2025-12-17 View on X

9to5Google

Google unveils Gemini 3 Flash, which it says has Pro-grade reasoning with lower latency, outperforming 2.5 Pro “while being 3x faster at a fraction of the cost”

Following last month's launch of Gemini 3 Pro, Google today announced Gemini 3 Flash for consumers and developers.

View original

Gemini 3 is here! The most exciting part of this model is the long horizon planning capabilities. This is going to unlock complex agentic applications, the likes of which we have never seen. Vibe coding insane applications in one shot, proactive agents that expand our [image]

2025-11-18 View on X

The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

View original

Hierarchical Reasoning Model: This is one of the most interesting ideas on reasoning I've read in the past couple of months. It uses a recurrent architecture for impressive hierarchical reasoning. Here are my notes: [image]

2025-10-09 View on X

VentureBeat

Samsung introduces the Tiny Recursion Model, a 7M-parameter model that can outperform LLMs 10,000x larger, like Gemini 2.5 Pro and o3-mini, on specific problems

The trend of AI researchers developing new, small open source generative models that outperform far larger …

View original

Very excited about OpenAI's new AgentKit. Visual agent builders are a game changer for iterating on and shipping agents.

2025-10-07 View on X

OpenAI

OpenAI announces apps that work inside ChatGPT, piloting Booking.com, Canva, Coursera, Figma, Expedia, Spotify, and Zillow for logged-in users outside of the EU

A new generation of apps you can chat with and the tools for developers to build them.

View original

Very excited about OpenAI's new AgentKit. Visual agent builders are a game changer for iterating on and shipping agents.

2025-10-07 View on X

TechCrunch

OpenAI launches AgentKit, a toolkit for building and deploying AI agents, including Agent Builder, which Sam Altman described as like Canva for building agents

New tools for building, deploying, and optimizing agents.

View original

Qwen3-Omni Technical Report: A unified multimodal model that matches same-size Qwen text-only and vision-only baselines while pushing audio and audio-visual SOTA. Key technical details below: [image]

2025-09-24 View on X

Simon Willison's Weblog

Alibaba releases the Qwen3-VL vision models, the Qwen3Guard “safety moderation” models, and three closed-weight models, including Qwen3-Max with 1T+ parameters

View original

Qwen3-Omni Technical Report: A unified multimodal model that matches same-size Qwen text-only and vision-only baselines while pushing audio and audio-visual SOTA. Key technical details below: [image]

2025-09-24 View on X

Bloomberg

Alibaba's Hong Kong-listed shares hit a nearly four-year high after CEO Eddie Wu announced plans to increase AI spending beyond the $53B target over three years

Alibaba Group Holding Ltd.'s shares surged to their highest in nearly four years after revealing plans to ramp up AI spending past …

View original

BREAKING: OpenAI announces research preview of Codex in ChatGPT. Next-level coding agent within ChatGPT. Pay attention, devs and non-devs! Here is all you need to know: [image]

2025-05-16 View on X

TechCrunch

OpenAI rolls out Codex, an AI coding agent powered by codex-1, a version of o3 optimized for software engineering, for ChatGPT Pro, Enterprise, and Team users

OpenAI announced on Friday it's launching a research preview of Codex, the company's most capable AI coding agent yet.

View original

OpenAI showed an environment that they fully configured, such as the Codex-CLI. As configuration, you can also provide steerability and important instructions to the model via MD files. [image]

2025-05-16 View on X

TechCrunch

OpenAI rolls out Codex, an AI coding agent powered by codex-1, a version of o3 optimized for software engineering, for ChatGPT Pro, Enterprise, and Team users

OpenAI announced on Friday it's launching a research preview of Codex, the company's most capable AI coding agent yet.

View original

Thanks for clarifying this. Maybe some official docs/guide (prompting/usage tips, recommended settings, error expectations, areas/use cases to apply and how to apply, etc) would be helpful here. I am aware of model cards, prompting guides but I think a lot of folks are running into issues with output quality.

2025-04-08 View on X

TechCrunch

Meta VP of Generative AI Ahmad Al-Dahle denies a rumor that the company trained Llama 4 Maverick and Scout on test sets, saying that Meta “would never do that”

View original

Thanks for clarifying this. Maybe some official docs/guide (prompting/usage tips, recommended settings, error expectations, areas/use cases to apply and how to apply, etc) would be helpful here. I am aware of model cards, prompting guides but I think a lot of folks are running into issues with output quality.

2025-04-08 View on X

The Verge

LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot

With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.

View original

@karpathy “I don't wish to downplay the impacts of LLMs in corporations or governments, but at least for the moment and in aggregate across society, they have been significantly more life altering for individuals than they have been for organizations.” Spot on!

2025-04-08 View on X

@karpathy

LLMs, unlike prior transformative tech like the internet, disproportionately benefit regular people, with a muted, slower impact on corporations and governments

> They had to permit personal devices to be used for work, but wanted to see enough justification to do that = catch-22. Took a while before an employee could use gmail, calendar, slac...

View original

Llama 4 is here!
- Llama 4 Scout & Maverick are up for download
- Llama 4 Behemoth (preview)
- Advanced problem solving & multilingual
- Support long context up to 10M tokens
- Great for multimodal apps & agents
- Image grounding
- Top performance at the lowest cost
- Can be served within $0.19-$0.49/M tokens

2025-04-06 View on X

Meta launches Llama 4 Maverick with 400B parameters and Scout with 109B parameters and a 10M context window, and previews Behemoth with 2T total parameters

Takeaways — We're sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences.

View original

Claude 3.7 Sonnet has serious competition! Gemini 2.5 Pro is a legit good model for code.
- code quality is really good
- 1M token context
- native multimodality
- long code generation
- understand large codebases
I used it with Windsurf to generate an AI search agent app: [video]

2025-03-31 View on X

9to5Google

Google says it is rolling out Gemini 2.5 Pro Experimental to all Gemini users, after initially launching the model for Gemini Advanced subscribers on March 25

View original

NEW: OpenAI announces new tools for building agents. Here is everything you need to know: [image]

2025-03-12 View on X

The Verge

OpenAI debuts a Responses API to help developers build agents that search the web, scan for files, and perform tasks on PCs, and an Agents SDK for orchestration

But really, why isn't OpenAI building their own agents if their tech is so powerful?

View original

DeepSearch also exposes the steps that it takes to conduct the search itself. [image]

2025-02-18 View on X

Bloomberg

xAI unveils DeepSearch, a reasoning chatbot that explains its thought process for queries and is capable of doing research, brainstorming, and data analysis

Elon Musk's artificial intelligence startup xAI showed off the updated Grok-3 model, showcasing a version of the chatbot technology …

View original

DeepSearch also exposes the steps that it takes to conduct the search itself. [image]

2025-02-18 View on X

TechCrunch

xAI launches Grok-3 beta and Grok-3 mini, its latest AI models with reasoning, trained on 200K GPUs, or “10x” more compute than Grok-2, for X Premium+ users

Elon Musk's AI company, xAI, late on Monday released its latest flagship AI model, Grok 3, and unveiled new capabilities for the Grok iOS and web apps.

View original

Nice summary of the s1 paper. “In s1, when the LLM tries to stop thinking with "</think>", they force it to keep going by replacing it with "Wait". It'll then begin to second guess and double check its answer. They do this to trim or extend thinking time (trimming is just [image]

2025-02-06 View on X

TechCrunch

Stanford and University of Washington AI researchers claim they trained AI reasoning model s1, distilled from a Gemini 2.0 model, for under $50 in cloud compute

AI researchers at Stanford and the University of Washington were able to train an AI “reasoning” model for under $50 in cloud compute credits …

View original
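
The s1 post above describes extending thinking by swapping an early "</think>" for "Wait". A minimal sketch of that idea follows, under loud assumptions: `toy_model`, `budget_forced_thinking`, `min_tokens`, and `max_tokens` are hypothetical names invented for illustration, the "model" is a stub rather than a real LLM, and the trimming rule used here (closing the thinking block once a maximum is reached) is an assumption, since the quoted summary is cut off before it explains trimming.

```python
from typing import Callable, List

END_THINK = "</think>"


def budget_forced_thinking(
    next_token: Callable[[List[str]], str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Collect thinking tokens, extending or trimming around END_THINK."""
    tokens: List[str] = []
    while len(tokens) < max_tokens:
        tok = next_token(tokens)
        if tok == END_THINK:
            if len(tokens) >= min_tokens:
                break  # enough thinking so far: accept the stop
            tok = "Wait"  # stopped too early: replace the stop with "Wait"
        tokens.append(tok)
    # Assumed trimming rule: always close the thinking block here,
    # whether the model stopped on its own or hit max_tokens.
    tokens.append(END_THINK)
    return tokens


def toy_model(context: List[str]) -> str:
    """Hypothetical stand-in for an LLM: tries to stop after every two tokens."""
    return END_THINK if len(context) % 3 == 2 else "step"


if __name__ == "__main__":
    print(budget_forced_thinking(toy_model, min_tokens=4, max_tokens=10))
```

With the stub, the first attempt to stop arrives before `min_tokens` and is turned into "Wait"; the second attempt is accepted. A real setup would put an actual decoding loop behind `next_token`.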