TITAA #65: AI Bakeoff Hits a Spaceship

Image Gen with 4o & More - 3D Gen & Games - inZoi - Creative Ngram Slop - Logic Puzzles

Mar 31, 2025

*The Colosseum, made in Blender, by wolf*

It’s been a long couple of weeks, with a lot of big AI releases! So no fun intro article. Long issue, visit the web? Nutshell: I took GPT 4o image gen for a strong test, and it didn’t measure up entirely to all promises, but it’s very good at some things. There were other good models released too. • Some great 3D gen code releases and demos. • Gemini 2.5 Pro experimental re-energized vibe game generation, and I pulled out a bunch of the latest (mostly three.js) examples. • The new big LLM models upset some of the creative writing benchmarks! • There was at least one very interesting AI-related game early release. • Plus some good games writing and news. • And as always, a lot of other links and tools and articles and I’m ready for a vacation. In two weeks I’ll report on some weird exhibits in the UK, see you then!

TOC (links on the web page):

AI Creativity (Image Gen, Video, 3D, Games Vibecoded)
Web Misc / Procgen / Arty Other
Games News (with Research & Game AI Section)
Narrative Text Gen (Benchmarks, Research)
Data Science / NLP / Tools
A Poem

AI Creativity Tools

Images

Within just a few days, there were a number of important image gen model releases, with the biggest bang coming from GPT 4o’s multimodal chat-during-image-creation release. It’s slow and expensive to generate images: behind the scenes, it seems to be an autoregressive model like Google’s Parti was, which means it’s intense to run; currently it offers limited generations and isn’t open to free users due to unexpected popularity, mostly from people making Ghibli-style knockoffs, which… never mind.

But it’s very good at following directions and has a few hidden super powers that are surfacing, like creating transparent backgrounds. It is very good at making comics, visual posters, infographics etc, including from rough sketches and little direction. Here’s a visual treatment it did for the first stanza of Yeats’ Stolen Child poem, which I provided, asking only for “appropriate visual treatment and flourishes.”

However, the much-hyped tile creation hasn’t been perfect, nor has sprite sheet creation. You’ll notice that this art deco flowers tile request (as viewed in a seamless viewer tool) tiles left/right but not top/bottom:

*Created with a variant of this prompt from X adjusted for artistic tiling.*
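If you don't have a seamless viewer handy, a crude way to spot this kind of failure is to compare opposite edges of the image: a big difference between the top and bottom rows (or left and right columns) usually means a visible seam when the image repeats. This is only a minimal sketch of that heuristic. It assumes a grayscale image already loaded as a list of pixel rows, and `seam_scores` plus the toy grid are hypothetical names and data made up for illustration, not part of any tool mentioned here.

```python
def seam_scores(pixels):
    """Return mean absolute edge differences for a grayscale image.

    `pixels` is a list of rows, each a list of 0-255 ints.
    Lower scores mean the opposite edges match better when tiled.
    """
    height = len(pixels)
    width = len(pixels[0])
    # Left/right seam: compare the first and last pixel of every row.
    horizontal = sum(abs(row[0] - row[-1]) for row in pixels) / height
    # Top/bottom seam: compare the first and last rows, column by column.
    vertical = sum(abs(a - b) for a, b in zip(pixels[0], pixels[-1])) / width
    return {"left_right": horizontal, "top_bottom": vertical}


# Toy 3x3 image (invented data): every row starts and ends with the same
# value, but the top and bottom rows are very different.
toy = [
    [10, 200, 10],
    [50, 120, 50],
    [240, 30, 240],
]
print(seam_scores(toy))  # {'left_right': 0.0, 'top_bottom': 210.0}
```

A real check would want to look at more than the outermost pixels, but even this rough score separates "tiles sideways only" from "tiles both ways."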

Midjourney still does this well with the “--tile” flag. All of these came out quickly and produced good tiles:

With a sprite sheet request, GPT 4o definitely doesn’t understand sizes like 64x64 or 128x128. I would have to fix this to make it work; the figures don’t quite line up. The transparent background is good though.
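Why the sizes matter: a sprite sheet is usually sliced on a fixed grid, so every figure has to sit inside its own cell. Here is a small sketch of that slicing logic, assuming a plain grid with no padding between cells. The function name `frame_boxes` and the example dimensions are hypothetical, chosen only to show the arithmetic.

```python
def frame_boxes(sheet_width, sheet_height, cell=64):
    """Return (left, top, right, bottom) boxes for each cell in a sprite sheet.

    Raises ValueError if the sheet does not divide evenly into cells,
    which is the kind of mismatch that needs manual fixing.
    """
    if sheet_width % cell or sheet_height % cell:
        raise ValueError(
            f"{sheet_width}x{sheet_height} is not a multiple of {cell}x{cell}"
        )
    boxes = []
    # Walk the grid row by row, left to right.
    for top in range(0, sheet_height, cell):
        for left in range(0, sheet_width, cell):
            boxes.append((left, top, left + cell, top + cell))
    return boxes


# A 256x128 sheet holds 4 columns and 2 rows of 64x64 frames.
print(frame_boxes(256, 128, cell=64)[:3])
# [(0, 0, 64, 64), (64, 0, 128, 64), (128, 0, 192, 64)]
```

If the generated figures drift across those box boundaries, the animation frames will jitter or clip, which is why a sheet that "almost" lines up still needs hand repair.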

Other claims about its abilities that I haven’t tested: a maze generation trick (you need to generate the solution and then ask it to remove the path), generating PBR textures from a photo, normal maps etc., generating stereoscopic 3D images (left and right eye views), and creating comics. I put it in a content creation contest below, so keep reading:

Reve.art Image — the startup from Christian Cantrell (formerly of the Adobe Photoshop Stability plugin, then of Stability Product, then of quitting and working on this plus writing novels apparently?!). The claim is that it’s much better at interpreting language instructions, and very good at text-in-image generation. It supports some back and forth with the model as you instruct it how to redo the image by prompt editing, but it’s not the same kind of intelligence as GPT 4o. I did find it good at my quick tests, but it fails the “wine glass full to the brim, almost overflowing” test that I’ve seen on X. Here’s my test on the poem text, where I specified an art deco style (it doesn’t fully get all the text, but did a pretty good job trying to?).

Now let’s do the very hard, “an asteroid hitting a luxury space ship” prompt, inspired by the Swedish SF movie Aniara that I highly recommend. This has been a challenging prompt for every model, forever.

Reve wasn’t too bad, and offers the option of AI-aided prompt rewriting, plus editing that if you like:

It struggled with showing the asteroid at point of impact, with an explosion of the right size. The one you see above is not exactly what I had in mind but does a good job of the concept: luxury ship, some kind of explosion that could be from an impact.

OTOH, the GPT 4o model, after 3 edits, got me the composition I was after. We started from this—and then I got it to make the rock smaller and the ship less boaty and more space-worthy:

It lost the plot on the nebula background and the ship looks more derelict than luxury, but this is fair:

Gemini Flash Experimental with image ability (their blog post), while very good at some problems, just had no idea what was going on here, bless. I’m a bit bothered by its overly apologetic personality; it makes me feel bad. (Remember that Anthropic went to a lot of effort designing Claude’s character?)

Ideogram 3.0 also launched with better text-art generation tooling, some style influence ability, and general model improvements including prompt following. I like it! It did some nice visuals for the poem, but did not even get close to including all the proper text, unfortunately:

Another open source text-art generation model dropped, LexArt: LeX-Art: Rethinking Text Generation for Visual Content. Haven’t tested it or seen a demo yet.

Midjourney is having another v7 output ratings game now. Sadly, it just all looks like similar generic AI art to me. My preference is to use my trained styles on MJ, and that could be replaced with other solutions… But it is good at tiles.

Video

AI video incorporation in projects, especially using multiple tools, is only getting better and better… Look, there’s been a lot of video coverage in other issues of mine, and I have to get this out the door. So just a few:

Cuco - A Love Letter To LA on Vimeo — A Paul Trillo AI and vfx project with an artist collaborator. I highly recommend the breakdown explanation to give an idea of the tools and skills used (I heard like 30 people were involved?). It features custom Loras, 3d modeling, and much more.

Two more tools for developing your AI video story: Hongos, by Samim, on github, and LTX Studio’s new updates support storyboarding and brainstorming down to detailed shots.

3D

SuperSplat has an editor. Not sure I knew this. Can I get my splats off Niantic Scaniverse, is the question. PlayCanvas generally is worth watching, they have done a ton of improvements recently. Also check on Splatrograph API, a splat interface from the command line.

VibeDraw (github) for tldraw. Turn a sketch into 3d.

Code releases:

gfodor/text2vox - deployed on Replicate. A text to voxel engine/tool that generates MagicaVoxel models. Huh. It worked well on this snowy pine tree request:

SynCity: Training-Free Generation of 3D Worlds — Generate complex and immersive 3D worlds from text prompts without any training or optimization. The web page has one small compressed demo world, which has incoherent landscape but I still found it weirdly compelling, as a fan of open worlds.

Bolt3d: from Google and Oxford, generating 3D scenes in seconds. No code.

Blender MCP: How to Set Up Tripo in Blender and Sync with Cursor (h/t Luokai). Probably goes well with Tripo’s Image2Texture model demo here! Btw: I tried another Blender MCP last week, and via Claude it was plagued by network timeouts, fails on the fetches for assets to third parties, Claude losing connection…. I think that using MCP via a more granular tool like Cursor is a good idea, since a fail won’t wipe everything out. Here’s one that suggests it does image-to-3d in cursor/Windsurf: blender-mcp-vxai.

Hunyuan’s fast 3D image input models on HuggingFace. Here’s 2mv’s demo. It wants 4 pictures for the angles. You could use Gemini Flash multimodal gen or GPT 4o for this! (I did, it worked great.)

Roblox Cube3D demo and open sourced model: This goes with their paper on 3D below in the Games / Research section. I had unconnected oddness with the results from some of my prompts, as you can see below:

*Castle tower ok, wall with gate has lots of floaters, tree with no leaves had 3 disconnected branches.*

After using that, you need the Image2Texture MV Adapter, which worked pretty well for me after some settings fiddling (a generated glb plus a MJ image):

And see this next section:

AI Generated “Vibe Games”

Since the release of Claude 3.7 (see my post Claude Goes 3D), the new Deepseek (try it with DeepSite), and Gemini Pro 2.5 Exp, everyone on X is building games with a prompt. NB: After I wrote that, I took out the quotes on “game.” My position is that it’s interesting to see what people want to build, and interesting that so many want to do games of some variety. Of course a good game is a lot of work to make. There’s a ton more to unpack in here, imo, but I’m on a deadline right now.

A lot of the LLM game generators are in some way using three.js, and the three.js team is making an llms.txt file to help as API context for models. (Here’s some background on that concept, and in my Data Science tools section I mentioned my MCP solution for it.) Speaking of llms.txt, that levelsio dude had everyone do #vibejam games too, and I picked up a few links I liked, either related or not to his jam.

VAPOR - AI Adventure System. — This is a websim.ai special, not a vibejam game (afaik), that does interactive fiction on demand. It’s actually amazingly good, briefly, if you turn off the terrible soundtrack. You need to enter a theme or words first, then make sure you look around the 3d space for the clickable actions as things render.

Explore the World with Glenn. This started out as a cool 3D maps experiment but seems to have turned into another driving sim with complicated controls…. ymmv.

Planetary — a space flight sim with things to find in space. Brought my Mac to its knees, but maybe it was too many open tabs with vibejam games.

Sweetgrass, another 3d explorer world, where you try to pick stuff up. Btw, the vibejam games have “portals” that take you into another of the games in the jam. Be aware, you can fall in.

Indiana Bones 3d shooter, kind of. It’s so hard to navigate in these 3d spaces without a controller! I went thru a portal right away and got confused.

Paige Bailey gave a PDF of a book to Gemini 2.5 and got a CYOA game from it (reminiscent of what Steve Johnson did for NotebookLM but in this case, the UI is made by the tool). I need to play with this approach.

Minecraft as made by Gemini, running on Codepen.

Alex Chen and his team from Google are sharing a lot of good Gemini 2.5 projects, including an animated pelican riding a bicycle done in p5.js for Simon Willison (Gemini canvas link).

Misc Web Procgen Arty

HTML Review’s latest (Spring 2025), it is so good. Web art, ascii, poetry, and interactive oddities, go enjoy.

This animated frame is incredibly cool, I can’t remember who I got it from. It’s the combo of the pseudo 3d, animation, and mouseover turning the still image to colorful and alive…

Learn Threejs Shading Language and Signed Distance Fields - YouTube courses. Via mr.doob.

The Useless Web - via Clive Thompson. Just go somewhere random. Related: the Marginalia Search Engine (via Vicki Boykis).

Terrible as UI, weird fun concept: ZUI on Wikipedia links (Hypertext link zoom). Via Gorilla Sun.

Animation libs: react-bits: An open source collection of animated, interactive & fully customizable React components for building stunning, memorable user interfaces. — Lots of text effects which is why I rec.

This 3D game-like (inspired by Inside) portfolio is 2 years old and yet holy crap: https://pawel-brod.com/ and on github, with toolset.

StarVector models for SVG generation on HuggingFace.

Allison Parrish has a new book coming out (procgen poetry). Two of Pentacles, by Allison Parrish.

Games News

Steampeek — an indie game recommender system (h/t Matt Muir). Look for a game you like, get similar picks. It’s had db problems on my Strange Horticulture search but finally works today? Hah, it recommended Children of Clay which I had in my recs newsletter (it’s not super related, though).

The Balatro Timeline — LocalThunk. A look at how the famously addictive (I won’t even try it!) game got created and launched. Games are hard, especially/even indie ones? It’s a bit of a painful read, tbh.

I got a DM on twitter from a scout at Playstack, my eventual publisher. I was super excited but this also complicated things. This was a very tumultuous time in the history of the game because I was in limbo between nothing will come of this game and I want to move on with my life and what if I could do this as a job?

A new promising games newsletter, The Bathysphere, featuring articles and recs/links from Keith Stuart (who writes for the Guardian) and Florence Smith Nicholls (archaeology gaming friend) and Christian Donlan (someone I don’t know who I assume is great). Donlan’s first essay on the micro-trend of “resting” in games, often to admire the view, was terrific. Since I’ve been spending a lot of time looking at scenery in gorgeous Witcher 3, it resonated.

But then there are these moments, again, often up high, on the edge of a cliff, where you find a blanket laid out and a few pillows or a stack of books or an old radio, and you're encouraged to just stop and take everything in.

⭐️ A Theory of the MMO — A Raph Koster interview in Ryan Rigney’s newsletter. I really enjoyed this history on what was cool in multi user games… I wrote my dissertation about social behavior and conversation in a MOO. Go Raph! "The dream of having an alternate holodeck world you step into is too damn big for us to walk away from. People are going to keep trying."

Narrascope 2025 schedule is up! “NarraScope 2025 at Drexel University is an event that supports interactive narrative, adventure games, and interactive fiction by bringing together writers, developers, and players.” You can register now!

😳 Excel Hell: GDC_2025 slides: A_Series_of_Microtalks_about_Spreadsheets: “Are you ready for 200 slides about spreadsheets?” This was evidently popular at GDC (the Game Dev Conference), but I find the slides a bit triggering as someone who has struggled to teach college kids proper use of Excel.

Procgen: Mother Machine - The Art of Chaos on Steam News— a look at the procgen cavern generation (via Gorilla Sun). This is very detailed and interesting, worth a dive for developers. I did not choose the coolest pic:

Another procgen game moving into beta: Wunder Entertainment's Debut Title: Lost Isle — a first person survival thingy. “By procedural, it means almost everything in the gaming world is generated procedurally, from the biome and landscape generation to point-of-interest locations to the stats and effects of weapons and armor.” 🏆 And note that procgen Caves of Qud won Excellence in Narrative at IGF awards 2025.

Introducing Babylon.js 8.0. The web game engine has a bunch of great new features and especially rendering improvements.

Kenney of the awesome free game asset packs has a Starter Kit City Builder for Godot 4.3.

“This package includes a basic template for a 3D city builder in Godot 4.3 (stable). Includes features like:

Building and removing structures
Smooth camera controls
Dynamic MeshLibrary creation
Saving/loading
Sprites and 3D Models (CC0 licensed)”

Invisclues Infocom game archive for the text adventure fans.

Research & AI in Games

👥 Do we need to talk about inZoi? I think so; it’s a hot-selling preview of an AI-powered Sims-like social game, which already has 10K reviews at “very positive.” It’s being billed as the Sims 4 killer. It was the most wishlisted on Steam for a little while. The key thing of interest to me here is “AI-powered” and “hot-selling” in the same sentence. Is the disgust at AI-assisted gaming going away when the results are fun? Just like with image gen and Ghibli-fication?

AI statement from the developers:

Players can generate unique textures for character outfits and various items based on text input. They can also create 3D objects from image input, which can be used as interior decorations or accessories, and add distinctive motions to their Zoi using video input. Additionally, the actions and thoughts of Zois are controlled through sLM technology, enabling more engaging and intuitive interactions.

The inZoi folks are being defended for using small LLMs (SLMs) that are custom and copyright safe, although their early development articles sounded less restrictive. Some articles: PC Games on AI defense; Vice article with “strong rec” despite “a ton of AI”, article praising it for being crazy fun because of the AI: “Naturally, I started throwing a party within inappropriate places like the convenience store or K-pop training studio. Weirdos kept visiting me at my house and I enjoyed kicking them out. And all was right with the world, again.” Hopefully some of the games writers I follow will dive into this one in detail. FYI: Gameplay trailer. And FYI: “bug” in which you can run over children was fixed.

Sims-adjacent research:

Slice of Life: A social physics game with interactive conversations using symbolically grounded LLM-based generative dialogue by Mark J. Nelson and others. “The purpose of this paper is to provide a detailed account of Slice of Life’s design, how its social physics simulation enables interactive conversations based on social practices, and to illustrate how the generative possibilities of LLMs can be uniquely useful when applied as its natural language generation (NLG) system, without giving up authorial control of the gameplay or story.” Bold mine. This is built on “Ensemble with Social Practices” for state management but Gemini Pro 1.0 for LLM text generation. Think how it might be now!

The Rise of the AI Simulated 'Game' — A breakdown of an emerging trend in big model AI research on games, from Tommy Thompson’s AI games newsletter. A good overview of the world simulation thing going on in research labs (less so in prod).

Minecraft stuff: More Jarvis work on Minecraft, JARVIS-VLA, which has cute pictures. Also: CraftJarvis/MineStudio: MineStudio: A Streamlined Package for Minecraft AI Agent Development.

Roblox Cube: A Roblox View of 3D Intelligence - the research paper on their position and models (model release up in 3D).

Narrative & Fiction Gen

A couple moves on AI benchmarks, since the massive LLM updates of the past week. A surprise upset in the creative writing benchmark by Lech Mazur… Gemini 2.5 Pro did not shift the top scorers, but the relatively quiet release of the new GPT 4o did (it was quiet before the Ghibli style generations, I mean):

Meanwhile, Sam Paech updated his creative writing benchmark to include slop ngrams, by model, so you can see their verbal tics. It’s tremendously useful, if you want to prompt them to avoid those. His ranking differs slightly.
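For a feel of what "slop ngrams" means in practice, here is a minimal sketch that counts repeated word sequences across a handful of texts. To be clear, this is not Sam Paech's method, just plain n-gram counting. The helper `top_ngrams` and the sample sentences are invented for illustration.

```python
from collections import Counter
import re


def top_ngrams(texts, n=3, k=5):
    """Count word n-grams across several texts and return the k most common."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        # Slide a window of n words across each text.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


# Invented sample outputs that share a stock phrase.
samples = [
    "Her voice was barely a whisper in the dark.",
    "His reply was barely a whisper, then silence.",
    "The wind was barely a whisper over the hills.",
]
print(top_ngrams(samples, n=3, k=2))
# [('was barely a', 3), ('barely a whisper', 3)]
```

Phrases that keep topping a list like this across many generations are good candidates for a "please avoid these" line in your prompt.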

Ngrams slop example for gemini 2.5 Pro exp:

The CHI (Human Computer Interaction) Conf’s “Digital Storytelling” session includes the Midjourney ToyTeller experiment I’ve mentioned before. Also here: “WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling,” which helps users explore the space of possible narratives, visualize, and expand… applicable to games.

More takes on how the OpenAI generated short story is not actually good, from Lincoln Michel, and “not a violent crime against writing” from Max Read. Icyc.

Data Science / Tools

The new Gemini Pro 2.5 Experimental is evidently an excellent model, and is extremely good at coding. It’s free in AI Studio, which you should use. It’s available already in Cursor and Windsurf (switch it on). I recommend reading Simon Willison’s post about it. I gave both it and OpenAI o1-mini a picture of a logic puzzle grid and clues (4x4, moderate difficulty) generated with Puzzle Baron, and the clues, and they both solved it reasonably quickly, showing their work. The Claudes: well, Claude Sonnet 3.7 (the non-thinking one in the app, I guess) failed. The extended thinking version of 3.7 succeeded, but it took a while. (Last time I tried it with a Gemini model it didn’t solve it, or have the tokens for the necessary thinking process.)

*The actual picture they were given and asked to solve for book properties.*

AI "Deep Research" Tools Reviewed - by Sarah Constantin. A good post. I’ve been meaning to do such a comp myself. She finds fault with all of them!

Introducing Ai2 Paper Finder | Ai2 — Ai2 Paper Finder is an LLM-powered literature search system that mimics the iterative paper-finding process. I am finding this useful.

MCP tooling: a good looking web browser option is Microsoft’s new Playwright MCP server. 👉 I’m also getting in on Firecrawl.dev, with MCP, for my web site and LLMs.txt needs; single page markdown via Jina’s Reader API isn’t cutting it anymore.

dylanhogg/llmgraph: Create knowledge graphs with LLMs, looks interesting.

Training and Finetuning Reranker Models with Sentence Transformers v4 (sentence transformers v4 is out, btw!).

⭐️ Probably the most interesting LLM research this past week came from Anthropic’s study of Claude Haiku 3.5 using “circuits,” which are their way of inspecting LLM behavior. (Post 1, on the “biology”, post 2 on circuit tracing.) They find evidence of planning ahead, even in autoregressive models like Claude. Coming from folks at Anthropic who worked on the lovely distill articles with interactive visualizations (Chris Olah, Adam Pearce..), these also feature lovely inline data vis. I need a vacation to read these—oh look, I have one now.

A Poem: Olympus

I was a cobbler in the house of the Gods. 
It took a lot of anonymous people 
to make the mountain what it was. 
I did not make swords, axes, or bolts 
of lightning. I stretched leather until 
it fit comfortably on the feet of the divine.  
I made sandals for the Champion of War. 
I did my work, then went home. I never 
fought in His campaigns, but the skulls 
that were crushed beneath his heel sometimes 
made a sound. It was not like thunder. 
It was quiet. Dead leaves. 
My name. Wind through dry grasses.

—Matthew Olzmann

I’m on the road seeing museums, friends, and I hope some nice countryside with Norman churches and pubs for the next 2 weeks. Hopefully I’ll be back a bit charged up and less stressed by AI and US news. Happy spring!

Best, Lynn (@arnicas on mostly bluesky, ex twitter, mastodon).

The Bird Soup Diaries

Mar 31

This was a super useful post thanks. However, I had a laugh just now experimenting with the Ai2 research paper tool. It’s probably my fault for not being clear on my instructions, but I asked it for a list of ten research papers I could consult with on a certain dead poet, and it gave me the following: “As an AI with a knowledge cutoff date in early 2023, I can't provide you with an exact list of ten research papers published after that time. However, I can suggest some hypothetical titles and authors based on the kind of research you might find on…”

I love that it then went to all the effort of making up ten research paper titles and potential author names for these 😆

Anton

Apr 14

Every time I read one of your newsletters, I feel like I’m walking through a library curated by a time traveler with a sense of humor and a deep GPU budget.

The GPT-4o bake-off was especially eye-opening—I’ve been experimenting with tile generation too and hit that same top-bottom seam issue. Totally agree that MJ still wins there for now. Also loved the asteroid-vs-luxury-ship test. It’s a great “stress test” for scene composition—half poetic, half rendering logic puzzle.

Really appreciate your insight into the increasing AI fatigue. I’m feeling it too—especially that eerie sense that AI-generated prose has started to haunt my inner monologue. That “slop ngram” bit from Sam Paech was hilarious and way too real. Also curious if you think we’re nearing peak prompt fatigue—or if the real renaissance starts once we stop expecting perfection and start embracing AI as a chaos collaborator.

Anyway, thanks again for making sense of the madness (and for spotlighting actual useful stuff). Looking forward to the weird UK exhibits next time.

Things I Think Are Awesome
