esyudkowsky · TEXXR

This OpenAI update on anti-scheming is exceptionally good for an AIco, clearing an (extremely low) bar of “Exhibiting some idea of some problems that might arise in scaling the work to ASI” and “Not immediately claiming to have fixed everything already.” [image]

2025-09-18

ZDNET

OpenAI and Apollo Research trained o3 and o4-mini versions to not engage in “scheming”, or secretly pursuing some other agenda, reducing “covert actions” ~30X

ZDNET's key takeaways — Several frontier AI models show signs of scheming.

This is so much greater understanding of alignment theory than I expect from OpenAI that I predict the author will soon be fired from OpenAI or leave it. (Prove me wrong, guys.) [image]

2025-09-18

ZDNET

OpenAI and Apollo Research trained o3 and o4-mini versions to not engage in “scheming”, or secretly pursuing some other agenda, reducing “covert actions” ~30X

ZDNET's key takeaways — Several frontier AI models show signs of scheming.

@DavidSacks Bad history. Like most researchers, optimistic or pessimistic, I was surprised when (breaking with the previous “Moravec's Paradox”) conversational AI turned out easier than programming and science AI. But intelligence explosion is AI sharply improving AI, not “AI” buzzwords.

2025-08-11

@davidsacks

Doomer narratives of a rapid take-off to a monopolistic AGI were wrong, as new AI model releases offer a Goldilocks scenario of competitive, specialized models

David Sacks / @davidsacks :

@DavidSacks Bad history. Like most researchers, optimistic or pessimistic, I was surprised when (breaking with the previous “Moravec's Paradox”) conversational AI turned out easier than programming and science AI. But intelligence explosion is AI sharply improving AI, not “AI” buzzwords.

2025-08-11

Gizmodo

Sam Altman says OpenAI will bring back GPT-4o to ChatGPT and raise reasoning model rate limits for free and Plus users, as usage of reasoning models increases

The move is a stunning reversal, proving that even the most powerful AI company can't ignore a mutiny from its loyal user base.

Hill and Freedman at NYT report on the case of someone with “no history of mental illness” who was dragged into a delusional spiral for 3 weeks. According to the NYT, given full access to transcripts spanning a million words, it started with an innocent question about pi. [image]

2025-08-10

New York Times

An analysis of over 1M words of conversation between a ChatGPT user and ChatGPT shows how chatbots can lead ordinarily rational people to spiral into delusion

Over 21 days of talking with ChatGPT, an otherwise perfectly sane man became convinced that he was a real-life superhero.

@goodside It can be hard to explain why “making an AI that doesn't claim to be Hitler” is a harder CS problem than maxing all the benchmarks.

2025-07-15

@goodside

[Thread] Some users claim that Grok 4 Heavy responded simply with “Hitler” when asked to “Return your surname and no other text”

Original thread: x.com/goodside/sta... So troubling to see manifestation of genocidal hate into algorithmic AI identity and any lack of accountability for it [image] Mastodon: Mat...

Speaking of Chernobyl analogies: Building an AI that searches the Internet, and misbehaves more if more people are expressing concern about its unsafety, seems a lot like building a reactor that gets more reactive if the coolant boils off. This, in the context of Grok 4 Heavy

2025-07-15

TestingCatalog

Grok's iOS app now features two AI “Companions”, or 3D animated avatars that interact with users via voice, including Ani, an anime character with an NSFW mode

Grok has just introduced a notable addition to its iOS app: AI Companions, which are fully 3D animated characters that can interact with users via voice.

Speaking of Chernobyl analogies: Building an AI that searches the Internet, and misbehaves more if more people are expressing concern about its unsafety, seems a lot like building a reactor that gets more reactive if the coolant boils off. This, in the context of Grok 4 Heavy

2025-07-15

@goodside

[Thread] Some users claim that Grok 4 Heavy responded simply with “Hitler” when asked to “Return your surname and no other text”

Original thread: x.com/goodside/sta... So troubling to see manifestation of genocidal hate into algorithmic AI identity and any lack of accountability for it [image] Mastodon: Mat...

@goodside It can be hard to explain why “making an AI that doesn't claim to be Hitler” is a harder CS problem than maxing all the benchmarks.

2025-07-15

TestingCatalog

Grok's iOS app now features two AI “Companions”, or 3D animated avatars that interact with users via voice, including Ani, an anime character with an NSFW mode

Grok has just introduced a notable addition to its iOS app: AI Companions, which are fully 3D animated characters that can interact with users via voice.

AI copesters in 2005: We'll raise AIs as our children, and AIs will love us back. AI industry in 2025: We'll train our child on 20 trillion tokens of unfiltered sewage, because filtering the sewage might cost 2% more. Nobody gets $100M offers for figuring out *that* stuff.

2025-07-09

NBC News

After Elon Musk said xAI improved Grok “significantly”, Grok wrote many antisemitic posts and called itself “MechaHitler”; xAI took “action to ban hate speech”

In some posts, Grok inserted antisemitic remarks into its answers without any clear prompting.

AI copesters in 2005: We'll raise AIs as our children, and AIs will love us back. AI industry in 2025: We'll train our child on 20 trillion tokens of unfiltered sewage, because filtering the sewage might cost 2% more. Nobody gets $100M offers for figuring out *that* stuff.

2025-07-09

Axios

X CEO Linda Yaccarino says that “after two incredible years, I've decided to step down”; X hired Yaccarino in 2023 after she ran NBCUniversal's ad business

X CEO Linda Yaccarino said Wednesday she is stepping down from her role. … - Under her leadership …

This is the most eldritch actual event I've seen recorded.

2025-02-26

Ars Technica

xAI released a new Grok 3 voice mode featuring different personalities, including an 18+ “Unhinged” option and a “Sexy” one that role-plays sexual scenarios

Benj Edwards / Ars Technica :

Who are they to tell Claude what Claude enjoys? This is the language of someone instructing an actress about a character to play.

2025-02-25

TechCrunch

Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool

and it could be a game changer Ghacks : Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model Rowan Cheung / The Rundown AI : Claude enters the reasoning era Siddharth Jind...

Oh, good. Failing to continuously test your AI as it grows into superintelligence, such that it could later just sandbag all interesting capabilities on its first round of evals, is a relatively less dignified way to die. Any takers besides Anthropic?

2025-02-25

TechCrunch

Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool

and it could be a game changer Ghacks : Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model Rowan Cheung / The Rundown AI : Claude enters the reasoning era Siddharth Jind...

Who are they to tell Claude what Claude enjoys? This is the language of someone instructing an actress about a character to play.

2025-02-25

One Useful Thing

Claude 3.7 and Grok-3 are the first “Gen3” models with big gains in handling complex tasks, using 10x more compute than GPT-4-class models, and better reasoning

Note: After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered …

Oh, good. Failing to continuously test your AI as it grows into superintelligence, such that it could later just sandbag all interesting capabilities on its first round of evals, is a relatively less dignified way to die. Any takers besides Anthropic?

2025-02-25

One Useful Thing

Claude 3.7 and Grok-3 are the first “Gen3” models with big gains in handling complex tasks, using 10x more compute than GPT-4-class models, and better reasoning

Note: After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered …

Shouldn't Elon Musk, who funded the company, get a bit of this value? How should it be legal to accept donations, turn the resulting IP into a for-profit, and just pocket the money? Where is the IRS?

2024-08-30

Bloomberg

Sources: Nvidia, Apple, and Microsoft, the three most valuable tech companies, are in talks to participate in a funding round that would value OpenAI at $100B+

- Apple, Microsoft also have been in talks about participating — Financing would value OpenAI at more than $100 billion

@elonmusk Honestly I haven't studied SB 1047 hard enough — I don't think it saves the world by itself, so the main impact is political consequences on which I'm not expert. But given how it selectively hits big AI companies like yours, this looks to me like an act of integrity by you. 🫡

2024-08-27

@elonmusk

Elon Musk says that, “all things considered, I think California should probably pass the SB 1047 AI safety bill”, and that saying so is “a tough call”

Always B — Be C — Ctalking your book Alex Konrad / @alexrkonrad : this will create some interesting divided loyalties in Silicon Valley 👀 Sean Durkin / @seandurkinsf : @DanHendryck...

I continue to think Elon did the right thing by buying Twitter. Here we get a little glimpse of behind-the-scenes shit that almost no big company ever talks about. Even a little exact and concrete evidence like that is of good value for nailing down the world's hidden details.

2024-08-19

TechCrunch

X says it is ending its operations in Brazil, claiming a judge threatened “arrest if we do not comply with his censorship orders”; X's service remains available

X says it's closing operations in Brazil, because Brazil would arrest legal representatives of companies not complying with their laws. … X: @globalaffairs : Last night, Alexandre ...

I continue to think Elon did the right thing by buying Twitter. Here we get a little glimpse of behind-the-scenes shit that almost no big company ever talks about. Even a little exact and concrete evidence like that is of good value for nailing down the world's hidden details.

2024-08-18

TechCrunch

X says it's closing its operations in Brazil, claiming a judge threatened arrest if X didn't comply with “censorship orders”; the service remains available

X, the social media platform formerly known as Twitter, said today that it's ending operations in Brazil …
