OpenAI says its models, starting with GPT-5.1, “increasingly mentioned goblins, gremlins, and other creatures”, leading to prompt instructions to mitigate it

Starting with GPT-5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors.

OpenAI 2026-04-30

Discussion

@openai @openai on x
We're talking about Goblins. https://openai.com/...
@juliarturc Julia Turc on x
They listened 🥹 https://openai.com/...
@openai @openai on x
We solved the goblin mystery—with the help of Codex. The culprit: Nerdy personality (RIP).
@openai @openai on x
The goblins crept in alongside the launch of GPT-5.1.
@tenobrus @tenobrus on x
“while most uses of frog turned out to be legitimate”
@cfgeek Charles Foster on x
Mr. Altman, explain to ordinary Americans why your company recently published a report titled—and I quote—"Where the goblins came from"
@zizhpan Zizheng Pan on x
Fun and honest. It was a joy to read. Science is better when sharing.
@openai @openai on x
Goblin and related magical mentions were overrewarded in training, and the behavior was reinforced over successive models. We removed the goblin-affine reward signal for future models, and filtered training data where creatures appeared in irrelevant contexts.
@theo @theo on x
Goblingate is way funnier than I expected tbh
@gdb Greg Brockman on x
a tale of some fun ML debugging
@canteverdie @canteverdie on x
reheating goblin mode nachos
@hosseeb Haseeb on x
Looks like @OpenAI figured out where the goblins came from: training contamination from their personality picker “Nerdy” archetype. Because goblins are nerdy? (They discontinued this feature after 5.4, but still trained on residual SFT traces.) Mystery solved: SFT goblins! [image…
@nrehiew_ @nrehiew_ on x
This likely means OpenAI does interleaved stages of SFT-RL-SFT-RL rather than the simpler SFT-RL-done pipeline we see with open models [image]
@chatgptapp @chatgptapp on x
“And down down to Goblin-town You go, my lad!” - The Hobbit, JRR Tolkien
@thsottiaux Tibo on x
Never talk about goblins. Our latest blog is live. https://openai.com/...
@openaidevs @openaidevs on x
Goblinmaxxing in Codex
@giffmana Lucas Beyer on x
In other words: their RL transfers/generalizes.
@phuguo Phillip Guo on x
Codex and I helped root cause goblins! We traced it to a reward signal intended to train the “Nerdy” personality - we found that it scored outputs with goblins higher, and as it boosted goblins in Nerdy training, the behavior generalized. See the blog post!
@laurentia___ Laurentia Romaniuk on x
Never did I imagine a feature I worked on would lead to goblin amplification, but here I am. Happy reading to those of you who are goblin-curious.
@tk.gg Matt TK Taylor on bluesky
“Unfortunately, GPT-5.5 started training before we found the root cause of the goblins.” — openai.com/index/where-...
r/slatestarcodex r on reddit
“Where the goblins came from” - a dive into ChatGPT's recent tendency to refer to goblins with annoying frequency
r/singularity r on reddit
Where the goblins came from
r/LocalLLaMA r on reddit
Where the goblins came from
@timkellogg.me Tim Kellogg on bluesky
OpenAI did a full post-mortem on the goblin situation and it has a lot of details on their training process — root cause: synthetic data — it emerged in 5.1, but they use the last model to judge outputs during RL, which favored goblins in later models. It then leaks beyond R…
r/stupidpol r on reddit
OpenAI trained ChatGPT to be “nerdy” and now it keeps talking about goblins
@carnage4life Dare Obasanjo on bluesky
OpenAI explains why their recent models like talking about goblins, gremlins, and other creatures in their metaphors. — It was a behavior they trained for the “nerdy” personality type for ChatGPT which then spread outside that use case given reinforcement learning rewarded the …
@seanmcarroll Sean Carroll on bluesky
Future generations will point to the moment when AI tried its best to warn us about the goblins, but we refused to listen. — arstechnica.com/ai/2026/04/o...
@jenitennison.com Jeni Tennison on bluesky
I don't know whether to laugh or cry. This will be the tip of a much harder to quantify iceberg. — openai.com/index/where-...
@markriedl Mark Riedl on bluesky
Reward your model for being nerdy, get nerdy behavior. — Train in your own output, pollute your dataset with verbal tics — openai.com/index/where-...
r/ChatGPT r on reddit
The Inside Scoop about Goblins
@fabknowledge @fabknowledge on x
GPT stands for “Goblins Producing Text”
@thatandromeda Andromeda Yelton on bluesky
“At the time, the prevalence of goblins did not look especially alarming. A few months later, the goblins came back to haunt us in a much more specific and reproducible form.” openai.com/index/where-... if I am very good, perhaps the universe will reward me with getting to solv…
r/accelerate r on reddit
OpenAI explains “Where the goblins came from”
@seldo.com Laurie Voss on bluesky
“We have no idea what we're doing” openai.com/index/where-...
r/KnowledgeFight r on reddit
OpenAI Codex system prompt includes explicit directive to “never talk about goblins”