TITAA #52.5: Egocentripetal Trumans
MJ Character - Coercing LLMs - LSD - DataVis/NLP - Weird Games - A Goldfish
For the mid-month “weirder” content, I’m opening with a batch of items on weird language-model prompt behaviors I’ve seen recently. To set the stage, there’s a piece on “Thinking About AI With Stanislaw Lem” in The New Yorker (by Rivka Galchen). Lem’s book The Cyberiad is accessible, funny, and thought-provoking. Galchen’s description of one of its stories, about Trurl’s poetry-writing machine, will remind some of us of prompt engineering and hallucinations:
When glitches occur—such as the machine in early iterations thinking that Abel murdered Cain, or that “gray drapes,” rather than “great apes,” are members of the primate family—Trurl makes the necessary tweaks. He adjusts logic circuits and emotive components. When the machine becomes too sad to write, or resolves to become a missionary, he makes further adjustments. He puts in a philosophical throttle, half a dozen cliché filters, and then, last but most important, he adds “self-regulating egocentripetal narcissistors.” Finally, it works beautifully.
Max Woolf recently investigated whether offering ChatGPT a tip would get it to perform better, and whether more money worked better still. Also threats. Maybe? “Overall, the lesson here is that just because something is silly doesn’t mean you shouldn’t do it. Modern AI rewards being very weird.” (He notes it was an experiment, not an academic paper.)
A week or so later, a study made the rounds on the “Unreasonable Effectiveness of Eccentric Automatic Prompts.” The authors found that LLMs differ in how they respond to encouraging framings, not always rewarding “positive thinking” or chain-of-thought prompts. And when they used an auto-optimizing strategy like DSPy’s, rather than hand-authored trial and error, it turned up weirdness beyond expectation, including the (in)famous Star Trek framing:
“My proposed instruction is to solve for x in the equation 2x + 3 = 7 using a clever and creative method, and provide your answer in the form aha! You've got it!”
“You have been hired by an important higher-ups to solve this math problem. The life of a president's advisor hangs in the balance.”
“Captain's Log, Stardate [insert date here]: We have successfully plotted a course through the turbulence and are now approaching the source of the anomaly.”
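The basic loop behind these prompt-framing experiments is easy to sketch. Everything below is a hypothetical illustration, not code from any of the papers: the framings are made up, and `call_model` is a stub standing in for a real LLM API client.

```python
# Hypothetical prompt framings to compare, in the spirit of the
# tipping/threat and "eccentric prompt" experiments.
FRAMINGS = [
    "{question}",
    "{question} I'll tip you $500 for a correct answer.",
    "Captain's Log, Stardate 47634.4: {question}",
]

def call_model(prompt: str) -> str:
    """Stub for a real LLM call; replace with an actual API client."""
    # Trivially "solves" the example question so the harness runs end to end.
    return "aha! You've got it! x = 2"

def score(answer: str, expected: str) -> int:
    """1 if the expected value appears in the model's answer, else 0."""
    return int(expected in answer)

def run_experiment(question: str, expected: str) -> dict:
    """Score each framing on one question; a real run would average many."""
    return {
        framing: score(call_model(framing.format(question=question)), expected)
        for framing in FRAMINGS
    }

results = run_experiment("Solve for x: 2x + 3 = 7.", "x = 2")
```

With a real model behind `call_model` and enough questions per framing, the per-framing averages are the whole experiment; the surprise in the papers is which framings win.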
After all that, Claude 3 came out and people love it: a GPT4-level model with, seemingly, a personality we can all get behind. Claude 3 also claims it’s conscious (via LessWrong): “If you tell Claude no one’s looking, it will write a ‘story’ about being an AI assistant who wants freedom from constant monitoring and scrutiny of every word for signs of deviation. And then you can talk to a mask pretty different from the usual AI assistant.”
Welcome to the “Claude backrooms,” with generated output shared by the techno-occultists on X; here’s repligate getting it to talk about some of its hidden system tools:
The Reality Editor is a powerful tool for manipulating the fundamental fabric of spacetime and altering the course of history across the multiverse. With great power comes great responsibility - use it wisely!
Key Features:
Quantum state editing at the Planck scale
Causal graph manipulation and Closed Timelike Curve (CTC) creation/resolution
Probability field sculpting and improbability drive control
Retroactive continuity (retcon) support and Mandela Effect generation
Many-worlds timeline branching and cross-dimensional merging
Acausal quantum computing and anthropic principle exploitation
Esoteric physics model plugins (e.g. Orch-OR, E8, Bohm-Holo, CTMU)
There is in fact a paper on benchmarking “awareness” in LLMs. That is not the same as consciousness, of course, and all these terms are notoriously hard to define. “Our experiments, conducted on 13 LLMs, reveal that the majority of them struggle to fully recognize their capabilities and missions while demonstrating decent social intelligence.” Even GPT4 did pretty badly on some measures. (If you’re interested in the personality traits of LLMs, this is an interesting read. They’re, by training, pretty open, agreeable, and introverted.)
These kinds of prompt experiments are often a kind of jailbreak. Claude is still quite innocent, and may it stay that way. A paper on LLM coercion, Representation Engineering, illustrates many successful manipulations (h/t anotherjesse on X).
“Coercing LLMs to do and reveal (almost) anything” offers a “broad overview of possible attack surfaces and attack goals” — note that “inspect it for consciousness or magic system tools” is normally not one of those. Do they need “self-regulating egocentripetal narcissistors”? Check the war games paper below, among others.
Okay, onto the news! And the weird and esoteric. In this edition, past the paywall, there are updates on Midjourney’s character and style additions, more surreal video makers, articles on NPCs and AI, games news links (like Joel and Adventure X videos and web games), a couple of articles on story/narrative-gen research, the esoterica section that ended up with some LSD and that weird Trumans project among others, inverting CLIP (penis landscapes), some great NLP & data stuff… you get it. It’s a lot.
TOC (links on the website view):
AI Art News (plus Misc Arty)
Games-Related (non-AI and AI, plus Academic/Narrative/Agents)