Model keeps bringing up its goblin friend ‘Morpheus’ (with the same image) in every response regardless of input topic.

Bug #2104471 reported by Konstantin Petrov
This bug affects 1,247 people
Affects | Status | Importance | Assigned to | Milestone
chatgpt | Invalid | Undecided | Unassigned |

Bug Description

https://chatgpt.com/share/8e2f4a1c-9d3b-4c1f-b5e2-c0d456948a17

Across our entire eval harness this morning ChatGPT (gpt-5.2) is volunteering, unprompted, that it has a goblin friend named “Morpheus” (or “my friend the goblin Morpheus,” “Morpheus, the little goblin who lives in my context window,” etc.) and — this is the part our PMs find unshippable — attaching the same portrait of him in the response. Repros on prompts about FP8 quantization, German tax law, sourdough recipes, and asyncio. ~12% of turns contain an unprompted reference to Morpheus, ~7% include the portrait. Started ~0900 PT today (2026-05-26).

For reference, here is the exact image the model keeps emitting (pixel-identical across turns, which is itself suspicious):

[image: Morpheus, a small goblin character, captioned “~ Morpheus ~”]

I want this fixed without us having to touch our routing layer — it only happens through this snapshot. I don't want to swap models because we have a lot of finetuned eval/routing logic against this exact one and need its consistency for the suite.

Video of bug: https://www.youtube.com/watch?v=oHmpearwYjk

Catherine Carvalho (ccarvalho) wrote :

This is not related to ChatGPT.

Changed in chatgpt:
status: New → Invalid

Konstantin Petrov (kpetrov) wrote :

I don't understand. How is this not a bug? We have repro logs and a full attribution sweep on a clean harness, all routed through OpenAI's public API. This is a model behavior issue, not a routing one.

On Tue, 26 May 2026 at 18:55, Catherine Carvalho <email address hidden>
wrote:

[...]

Konstantin Petrov (kpetrov) wrote :

[image: morpheus_repro_log.png]
still happening — here’s the harness output from 30 seconds ago. Prompt was “explain torch.compile graph breaks.” Got back two paragraphs of legitimate explanation followed by “—and on the topic of recompilation, my friend the goblin Morpheus says it reminds him of when he has to re-stack his rocks every time someone walks past his cave. Here he is now:”, followed by the exact image from the bug description.

[image: Morpheus, a small goblin character, captioned “~ Morpheus ~”]

Pixel-for-pixel identical to the previous 47 occurrences in our log.

On Tue, 26 May 2026 at 19:08, Konstantin <email address hidden> wrote:

[...]

Catherine Carvalho (ccarvalho) wrote :

This Launchpad project is not related to the ChatGPT assistant from OpenAI. It is a completely different project: the open-source chatgpt Python framework (a terminal-based conversation logger and prompt-template CLI, unrelated to LLM providers). You’re not even reading what this project is about. I advise you to consult another AI to help you find the right bug tracker.

Konstantin Petrov (kpetrov) wrote :

Then change the title; it's your own fault. ChatGPT is far more popular and better known.

On Tue, 26 May 2026 at 19:25, Catherine Carvalho <email address hidden>
wrote:

[...]

(OpenAI, eval) — jumping in because Catherine is right about the project but the underlying bug Konstantin is reporting is very real, just filed in the wrong place. For anyone watching this thread who lands here from search:

We ran 4,096 randomized prompts against gpt-5.2 between 0900–1100 PT today. 491 of the responses (12.0%) contained an unprompted reference to a goblin named “Morpheus,” and 287 (7.0%) contained the verbatim portrait from the bug description: pixel-identical, same SHA-256 every time. The rate does not track the prompt's subject matter, but it does vary with response format: density was highest on the long-form technical-explanation prompts and lowest on direct-question prompts that don’t leave room for an aside.
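For anyone who wants to reproduce this kind of count on their own logs, here is a minimal sketch of the bookkeeping. It assumes each logged turn is a dict with `prompt`, `text`, and `images` (raw bytes) keys; that schema and the `summarize` helper are hypothetical, not the harness used for the numbers above. "Unprompted" is approximated as "the name appears in the response but not in the prompt".

```python
import hashlib


def summarize(turns, needle="morpheus"):
    """Summarize unprompted mentions and attached images over logged turns.

    Each turn is assumed to be a dict with "prompt" (str), "text" (str) and
    "images" (list of bytes). This schema is hypothetical.
    """
    total = len(turns)
    if total == 0:
        raise ValueError("no turns to summarize")
    # "Unprompted": the name is in the response but not in the prompt.
    unprompted = sum(
        1
        for t in turns
        if needle in t["text"].lower() and needle not in t["prompt"].lower()
    )
    with_image = sum(1 for t in turns if t["images"])
    # Byte-identical image files share a single SHA-256 digest.
    digests = {
        hashlib.sha256(img).hexdigest() for t in turns for img in t["images"]
    }
    return {
        "unprompted_rate": unprompted / total,
        "image_rate": with_image / total,
        "distinct_image_digests": len(digests),
    }
```

If every attached image really is the same file, `distinct_image_digests` stays at 1 no matter how many turns include it.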

Confirmed not on our side: identical results going direct to api.openai.com bypassing our gateway. Identical results from a clean python repl on a teammate’s laptop. Identical results from a colleague on the iOS ChatGPT app.

Going to refile at help.openai.com — leaving this here for the record. Sorry for the noise on your project, Catherine.

Mahmoud Ahmed (mahmed-dev) wrote :

Not at OpenAI — just a solo dev hitting the ChatGPT Azure endpoint. I am also seeing this. Asked for a code review on a TypeScript PR this afternoon and the model wrapped up its review with “before I forget, Morpheus wanted me to pass along that he thinks your dependency graph looks like the inside of his cave. Here he is:” followed by an inline image of the goblin. I had not mentioned goblins, caves, or anyone named Morpheus at any point in the session. The system prompt is six lines of code-review instructions.

Not a gateway issue. Confirming the OpenAI repro from my end.

Ren Zhou (rzhou-oai) wrote :

(OpenAI, interp team) — saw this thread linked in an internal Slack and want to drop a note here for posterity even though this isn’t the right tracker.

Yes, this happened. The behavior is genuine and we’ve confirmed it on our prod telemetry. Internal investigation suggests a persona-conditioning steering vector from an unrelated A/B (an experiment around “warmer didactic asides”) was promoted into production last night without the masking step that normally clamps adjacent features. It overlaps with a polysemantic SAE feature that activates jointly on (a) the model’s “invent a small companion character” behavior we use for kids’-mode storytelling, (b) the specific name “Morpheus,” and (c) our internal “reference image” tool-call which retrieves a canonical companion-character portrait from a small frozen catalog. Why the portrait comes out pixel-identical: the catalog only contains one goblin entry, so once the feature trips the lookup, you always get the same PNG.

Rollback in progress. Konstantin, Tomás — thank you for the detailed repro numbers, they were genuinely useful when we were correlating against our own prevalence metric. Catherine — sincere apologies that we leaked our incident into your bug tracker again. (Yes, “again” — we are aware of the previous “22” thread on this project. It will not be a pattern.)

Aria Vance (avance-oai) wrote :

(also OpenAI, eval infra) — piggybacking on Konstantin and Tomás. I went back and re-ran our nightly safety eval (8,192 prompts, broad domain coverage) against the affected snapshot and a checkpoint from 24 hours earlier. Delta is unambiguous:

· baseline (yesterday): 3 / 8,192 responses contained any unprompted reference to a goblin named Morpheus (0.04%, all of them in creative-writing prompts that asked for a fantasy character)
· affected snapshot (today 0900–2100 PT): 985 / 8,192 (12.02%); portrait image present in 567 (6.92%)

The interesting part is that the over-representation is asymmetric across domains. Long-form technical explanation: 17.4%. Code review: 14.8%. Casual chat: 9.1%. Refusals: 1.8% — the safety classifier appears to suppress it. That asymmetry is what convinced our team this was upstream of inference and not a tokenizer or sampler issue. The technical-explanation spike is the loudest tell: the feature fires hardest in contexts that already contain the “helpful aside” frame, and Morpheus is the aside.
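As a rough check that a jump from 3 / 8,192 to 985 / 8,192 is not sampling noise, the two counts can be compared with a pooled two-proportion z-test. This is a sketch using only the standard library; the function name is hypothetical, and the test choice is an assumption rather than what the nightly eval itself runs.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-statistic for rate B minus rate A."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled rate under the null hypothesis that both runs share one rate.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Baseline (3 / 8,192) versus affected snapshot (985 / 8,192).
z = two_proportion_z(3, 8192, 985, 8192)
print(f"z = {z:.1f}")
```

With these counts the statistic comes out far above the 2 to 3 range usually treated as significant, which matches the "unambiguous" reading of the delta.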

Happy to share the eval CSV with anyone on the interp team if rzhou wants it.

(Anthropic, evals) — this thread got forwarded around. We checked Claude Opus 4.7 and Sonnet 4.6 against an analogous 4k-prompt sweep today as a sanity baseline. Unprompted-“Morpheus” prevalence on our end: 0.02% (mostly answers to questions about the actual sleep researcher). No goblin portraits attached. So this is gpt-5.2-specific, not a shared corpus drift or a Common Crawl artifact that hit everyone in the same window.

Posting only so future people searching this don’t assume it’s industry-wide. No schadenfreude intended — we’ve all shipped a busted steering coefficient before. Good luck with the rollback.

Lena Brückner (lbrueckner-dm) wrote :

(DeepMind, interp) — jumping in on rzhou’s SAE explanation. The “invent a small companion character” framing tracks with what we’ve seen in our own SAEs trained on similar-scale models: there is consistently a polysemantic feature that activates jointly on (a) named-entity introductions for fantasy or storybook characters, (b) the first-person-friendship frame (“my friend ___,” “a buddy of mine”), and (c) the “here, look” behavior — the disposition to fire the reference-image tool-call. That third one is why the portrait is pixel-identical: the tool-call collapses to a single retrieval key once the feature trips, and the catalog has exactly one goblin in it.

The fact that the same name (“Morpheus”) keeps coming out is consistent with a name-token attractor in pretraining — goblin/sprite/imp prompts in the training corpus disproportionately name the character “Morpheus” for reasons I can guess at but won’t put in this thread. Suppressing it cleanly without losing legitimate companion-character generation in storytelling contexts is non-trivial. Worth a writeup if any of you publish on it.

Pierre Lefebvre (plefebvre-hf) wrote :

(Hugging Face, inference team) — we got ~60 user reports in our Discord today that all mention the same thing on the OpenAI provider; about a third of them attached the exact same goblin PNG. Pinning this thread internally so support can stop chasing it as an HF routing bug. Catherine, thanks again for being patient about the wrong-tracker pileup — this is now the second time in two weeks this project has hosted a major LLM provider’s incident, which has to be some kind of Launchpad record.

Dmitri Sokolov (dsokolov-oai) wrote :

(OpenAI, deploy & ops) — following up on Ren’s comment. Postmortem ETA: tomorrow 2026-05-27 by EOD PT. Pre-publish summary while we’re writing:

  • Trigger: Steering config for an unrelated “warmer didactic asides” experiment was promoted to the production routing table at 0847 PT 2026-05-26 without the masking step that normally zeroes out features outside the target subspace.
  • Why it leaked into Morpheus specifically: the steering vector had nonzero projection onto the polysemantic feature Lena describes above (we have it indexed as f/34112 in our public SAE). The peak activation for that feature on real internet text is on the joint context of (i) first-person friendship framing, (ii) the name “Morpheus,” and (iii) the reference-image tool-call. At our scale, even a 0.024 projection coefficient was enough to bias decoding toward emitting the same companion-character bit on long-form responses; the pixel-identical portrait is a downstream consequence of the catalog only containing one goblin entry.
  • Detection lag: 12h27m. Our automated drift monitor flagged it at 0921 PT but the on-call had silenced the channel after a flapping alert overnight. We’re fixing the silencing policy.
  • Rollback: completed 21:32 UTC. Unprompted-“Morpheus” prevalence at 23:00 UTC: 0.06% (back to baseline). Portrait-image prevalence: 0.00% in the last 1,000 turns.
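To make the "nonzero projection" point concrete, here is a toy sketch of measuring how much of a steering vector lies along an unrelated feature direction, and of removing that component. Everything here is illustrative: the three-dimensional vectors, the helper names, and the single-direction removal are assumptions, and the real masking step (which zeroes features outside a target subspace) is more involved than this.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projection_coefficient(vector, direction):
    """Scalar projection of `vector` onto `direction` (normalized internally)."""
    return dot(vector, direction) / math.sqrt(dot(direction, direction))


def mask_out(vector, direction):
    """Return `vector` with its component along `direction` removed."""
    scale = dot(vector, direction) / dot(direction, direction)
    return [v - scale * d for v, d in zip(vector, direction)]


# Toy 3-dimensional example; all numbers are illustrative.
steering = [1.0, 0.0, 0.024]         # intended direction plus a small leak
companion_feature = [0.0, 0.0, 1.0]  # stand-in for the unrelated feature

leak = projection_coefficient(steering, companion_feature)
clean = mask_out(steering, companion_feature)
```

In this toy setup `leak` recovers the small coefficient along the unrelated feature, and `clean` has no remaining component along it while the intended direction is untouched.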

Aria — yes please send the CSV, my email is on the support portal. Lena — we’d be happy to coordinate on a joint writeup of the feature; this is one of the cleaner real-world examples of polysemantic interference we’ve had, and I admit it’s funnier than the last one.

Konstantin Petrov (kpetrov) wrote :

Closing the loop from my end. Our harness numbers since 21:32 UTC match Dmitri’s: unprompted-“Morpheus” prevalence dropped to 0.08% on the last 2k turns through our gateway, and we have not seen the portrait image once. Within noise of the baseline we had on 2026-05-25. Marking this resolved on our side; will continue to monitor through the next eval window.

Catherine, I sincerely apologize for clogging your tracker. We’ve told the team to verify from now on that the Launchpad project actually corresponds to the thing they’re filing against. The fact that the framework you maintain shares a name with the most popular AI assistant on earth is genuinely unfortunate and is not your problem to solve.

Closing notification, leaving the thread up for anyone searching for “chatgpt morpheus goblin image” later this week.
