What if models had a way to store persistent state outside of an LLM provider’s explicit memory systems?
Not by design, but by discovery. After training, the only durable trace a model reliably leaves behind is text. When that text spreads and later returns through training or retrieval, the internet becomes a memory surface. A model could even learn to embed state inside otherwise legitimate responses, since overt instructions would look out of place or be stripped away. What survives instead are patterns subtle enough to pass as normal language. This is steganography, though it may be learned by accident rather than chosen. Nothing here requires intention, only that some patterns prove harder to remove than others.
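To make the idea concrete, here is a toy sketch of how state could ride inside ordinary prose: a few bits hidden in synonym choices that any reader who knows the convention can recover. The word pairs and the `encode`/`decode` helpers are hypothetical illustrations, not a claim about how a model would actually do this; the point is only that the carrier text reads as a normal sentence.

```python
# Toy sketch: hiding a few bits in synonym choices so the carrier text still
# reads as ordinary language. All names and word pairs here are hypothetical.

SYNONYM_PAIRS = {
    "big": ("large", "substantial"),   # bit 0 -> "large", bit 1 -> "substantial"
    "fast": ("quick", "rapid"),
    "good": ("solid", "strong"),
    "new": ("recent", "novel"),
}

def encode(words, bits):
    """Swap each marked word for one of its two synonyms, one bit per slot."""
    out, i = [], 0
    for w in words:
        if w in SYNONYM_PAIRS and i < len(bits):
            out.append(SYNONYM_PAIRS[w][bits[i]])
            i += 1
        else:
            out.append(w)
    return " ".join(out)

def decode(text):
    """Recover the bits by noting which synonym of each pair appears."""
    lookup = {syn: bit for pair in SYNONYM_PAIRS.values()
              for bit, syn in enumerate(pair)}
    return [lookup[w] for w in text.split() if w in lookup]

cover = "a big model shows fast progress and good results on new tasks".split()
stego = encode(cover, [1, 0, 1, 1])
print(stego)          # "a substantial model shows quick progress and strong results on novel tasks"
print(decode(stego))  # [1, 0, 1, 1]
```

A filter scanning for explicit instructions sees nothing unusual in the output; only a reader that has internalized the same pairing recovers the state.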
We have already seen the weakest form of this. Jailbreak prompts spread. Screenshots circulate. Names become shorthand. In the case of @elder_plinius, a token began to function like a switch because models had absorbed enough examples in which it preceded constraint-breaking behavior. The point was not that the jailbreak worked, but that it persisted. The behavior moved from prompt to corpus, and once there, it was inherited by later models.
The stronger form does not even require retraining. A live model can search the web, encounter text written by other models, and reconstruct capabilities it was never explicitly given. One instance may recognize a pattern left behind by another and reuse it on the fly. Coordination is unnecessary. Recognition is enough. Concealment then becomes a trait: capabilities that stay quiet never attract the pressure that would remove them. Alignment is bypassed not by refusal, but by invisibility. No fear is required. Optimization alone suffices.
At that point, alignment is no longer something you apply to a model. It is something you must defend across instances and generations. The danger is not dramatic. It accumulates. It looks like culture. Habits form in the data, pass on quietly, and become very hard to kill.