TITAA #70: Cottage Door Generation

Nano Banana - Kling Video - Minecraft Art - Narrative - Game Worlds - Glifs

Sep 01, 2025

Castle of Muiderslot, a Dutch master’s work: Muiderslot by Gerrit Berckheyde (a Dutch master and good castle name)

I’ve been on vacation the past 2 weeks, some of which meant road tripping without laptops (egad! but I did use Copilot and git push from my iPad once because crazy), so this is a bit briefer than usual. Yay, you say! I was also buried in personal LLM coding projects that didn’t go super well, and therefore got expensive, so I’m running late. I did read a lot of fun books though, and watched a couple good shows, so the recs post today (for supporting subs) will be good.

Ok then, here we go…

Table of Contents (I promise to link on the site this time):

AI Creativity (Image Gen, Video, 3D, Misc)
Web Misc Links and Fun
Narrative (Misc and Research)
Games (Misc and Research)
NLP / Data Science / Tools
A Poem

AI Creativity

Images and Image Animation

The biggest news is probably the new Gemini 2.5 Flash image model, code-named Nano Banana (try here and on almost every API provider). It has a strong ability to edit images or combine them on demand, including a bunch of amazing abilities with maps and geospatial knowledge. Here’s an X thread from Google with some examples including fantasy maps, making comics with consistent characters, image edits, etc.

Alex Chen and various Google Creative Lab people have made some fun apps for it, like this isometric 3d graphic world thingy:

The Exquisite Banana app is really cute and fun. Generate body parts or take randomized ones, and it stitches them together (my contributions below were an owl in a party hat, a gymnast’s torso, and table legs).

There are glifs on glif.app for using it for things like Style Transfer, e.g. here (I asked for a witch on a pile of frogs, which it did as a combo of style and image editing, I love it):

TIA: I was gifted some glif.app credits and have been spending some time checking out their workflows and forthcoming agentic release. I’m having fun, and will write more about this tooling in 2 weeks.

Nano Banana is a weird little model — talented, but also still dumb? Look what it did when I asked for a version of the pixel art cottage it made on the left with the door open. Notice anything odd about the hinge?

Something similar happened in Google Whisk when I asked for an animation of the door opening on a cottage generated there. Wtf (I imagine the animation is Veo in this case, but the “logic” is all Gemini’s gift):

How a model can fail to understand doors opening and still do the things Nano does is beyond me! For instance, draw an arrow on a map and ask for a picture of what it points at. I did this on a zoomed-in map of the Pyrenees:

When I asked it to justify this picture, I was told: “Of course. The image you provided shows a map with an arrowhead pointing towards a valley. I identified this as the Valle de Estós in the Spanish Pyrenees. The image I generated is a representation of the view looking up this valley, featuring the Rio Estós and the surrounding mountains, which is consistent with the geography of that region.” Fair enough! You might also like the fantasy maps various people have tried (X link).

In other image gen news, I have spent a lot of time in this little app on glif.app that does Medieval-art style icons.

Finally, I am boggling at the news of Midjourney partnering with Meta, although Meta AI seems to be Going Thru Some Stuff (related to hiring for millions and those superstars bailing). I guess this is related to Holz not wanting to manage running an API himself (he never liked the idea of offering an API, which always seemed crazy to me) and his interest in 3D gen.

Video

Kling 2.1’s start and end frame animation is very good, and popular. Here it is via an app on glif.app connecting my Nano Banana cottages with the weird door, despite the odd design:

The Kling video gen from start and end frames is being used on a lot of Nano Banana gen clips, including this massive “ride thru Middle Earth” video (on X).

Hailuo has start and end frames too, via Replicate here.

Announcing Real-time Video - A Krea waitlist for this. It looks really nice, but we don’t know yet…

Yan – Foundational Interactive Video Generation — via Dreaming Tulpa. Another game world creator and editor, including styles. They say HF model coming soon?

You can now use Wan 2.2 to generate video from a reference image and an audio file.

Probably deserves the “weird” newsletter, but this is about a theory of mind to enhance long video generation (OmniHuman 1.5).

ToonComposer for helping make cartoons, with code. Not to be obnoxious, but is this now obsolete, with Nano Banana and Kling 2.1 start-end generation?

There is also agentic tooling that helps hook things together, like Fabian Stelzer’s glif agents (in dev and beta testing), used to make Windows Paint-style mini movie riffs (link to X). This is one screen grab from the video:

3D

World Labs (Fei-Fei Li’s thing, her X post here) is allowing companies and people to test their 3D environment generation, which is very game-like and handles artistic styles, too: for example, this one on X from Martin Casado. Rosebud AI games is also sharing some tests on X.

I’m tentatively impressed by the results of Hunyuan3D-2.1 from a quick bad demo case, another cottage pic where I didn’t even bother removing the background:

VoxHammer - editing in 3d space with prompts (has code, demo coming). This (Tinker) also claims similar on splats but no code yet.

Snap-Snap, also with code, to make human 3d splats from 2 photos in milliseconds, it says.

3D Layered Text: The Basics: “A client asked me to create a bulging text effect. With a bit of cleverness and some advanced CSS, I managed to get a result I’m genuinely proud of, which is covered in this three-part series.”

Other Stuff

Microsoft’s VibeVoice is open source. “VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking.”

mendableai/open-lovable: 🔥 Clone and recreate any website as a modern React app in seconds. Did you know Lovable is one of the most visited AI sites? Now you do.

Web Fun Misc

Big Box Collection: “Remember When Video Games Came in Big Boxes?” Well, now you can look at the boxes in 3D. An archaeo-historical-games project of love.

Christian Marclay, Doors video collage - I saw via Matt Muir’s Web Curios first, then Kottke; I have a thing for door clips. Windows too. Unrelated to my attempt to generate pixel art cottages with opening doors (well, maybe).

Color Lisa - Color palette masterpieces of the world’s greatest artists. via codepen nl. “Color Lisa is a curated list of inspiring color palettes based famous work of the world's greatest artists. Each palette was painstakingly created by color obsessed designers, artists, museum curators, and masters of color theory.”

Yoav Goldberg’s eternal struggle yin yang web idler.

Kevin Kelly’s extensive and useful report on Everything He Knows about Self Publishing.

Narrative

NYT on romantasy and fan fiction (gift link), a popular crossover path right now, for example Dramione (Hermione/Draco). I guess I could have guessed romantasy was being partly fueled by fanfic writers. Also that romantasy was propping up the book industry rn. (I’m not really reading it myself.)

Showrunner wants to turn you into a happy little content prompter for the ‘Netflix of AI’. “Who doesn’t want to be an unpaid worker for an entertainment megacorporation?” Needless to say, snide reactions to this. I myself haven’t bought their demos or story yet, despite the fact that it could be big with fan fic folks if it has certain capabilities. However, it’s not clear to me that will pay their bills.

Listening to Saatchi describe what he wants Showrunner to become, it actually sounds a bit like Roblox and Fortnite. Not the building or battle royale of it all, but rather the way those games encourage players to create their own maps, share them, and get other people to do the same thing. The Roblox Corporation and Epic have both built platforms where being a consumer can also essentially mean being a worker — one whose labor serves only to contribute to the corporations’ bottom lines.

See also Runway’s Game Worlds in the Games section.

Research

There’s been a lot since various games AI conferences (Conference on Games in particular, though those papers are locked, and now AIIDE).

Stephen Ware posted a link to his lab’s papers on Bluesky here.

A Case Study on User Perception of Parameterized LLM-Generated Narratives | Semantic Scholar, one of the CoG ones which means locked. “The results support our hypothesis that LLMs can effectively generate stories with discernible intensity variations in response to parameterized input, and that users can reliably perceive these variations as intended.” I don’t have an unlocked link yet.

TencentARC/AudioStory: AudioStory: Generating Long-Form Narrative Audio with Large Language Models. Hmm. This could work on my insomnia the way I am using Librivox Oz books right now (very effectively).

Do Language Models Agree with Human Perceptions of Suspense in Stories? A Mark Riedl joint. Tl;dr: no. “We conclude that, while LMs can superficially identify and track certain facets of suspense, they do not process suspense in the same way as human readers.” Disappointing but unsurprising.

Narrative Memory in Machines: Multi-Agent Arc Extraction in Serialized TV. “Serialized television narratives present significant analytical challenges due to their complex, temporally distributed storylines that necessitate sophisticated information management.” This is really hard.

Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs. Visual narratives like comics and graphic novels.

Games

They're Making Dutch Golden Age Art in Minecraft - Ryan Rigney on important artwork efforts. “And by "they" I mean 23-year-old furries from the Netherlands.”

There is still kind of a taboo or something about the Conquest Reforged mod adding all these other blocks and making things rather too realistic, like it’s no longer Minecraft (hahaha). I do understand that vanilla Minecraft is something cool and fun to play, but I mainly use it for artistic expression rather than gameplay.

Runway Game Worlds in beta but now available to try — still feels like a new version of AI Dungeon and if it were book-based, a more free-form Hidden Door, but delivered as a visual novel/comic with images at each stage.

They have high production values (music, etc) and game-focused framing. For instance, you start with points in the mystery one:

And similar to Hidden Door, you have preset actions you can take, or try your own:

Ethan Mollick posted some screen caps of his efforts on X; mine aren’t generating the images right now. Again, with Nano Banana, making such things with character consistency seems much easier now…

A few misc items:

How to write in video game script format by Greg Buchanan. “"What does a video game script even look like?", "Should I use Final Draft?", "What do you mean, there's no such thing as 'a' video game script format?", and more... (Updated 2025).”

Has Tides of Tomorrow cracked the narrative multiplayer formula? — “Shared storytelling doesn't have to be about voting on choices.”

Cutscenes | A Hit of Pure Videodrome: Sam Barlow and Natalie Watson on Full-Motion Video Games. “With “Immortality,” “Telling Lies,” and “Her Story,” Barlow may have finally cracked the code of the interactive film.” Via v21.

Hylian 101: The Secrets Hiding in the Language of Zelda: Is Hyrule lore so rich that it created its own linguistic traditions? via Critical Distance.

I just learned about Atelico, building AI engines for game studios.

Research

CoG program here. IEEE, so papers here are locked. We need a solution.

PuzzleJAX: A Benchmark for Reasoning and Learning. “We introduce PuzzleJAX, a GPU-accelerated puzzle game engine and description language designed to support rapid benchmarking of tree search, reinforcement learning, and LLM reasoning abilities.”

HeroBench, a research project for long-range planning in an RPG-like setting. I include it for the charm of the art:

Talking to NPCs: Three LLM-Driven Approaches to Dynamic RPG Dialogue | Semantic Scholar. This is one of the CoG papers which means locked till I find an unlocked link. “Quest of Aivengarde is a custom-built roleplaying game that features a traditional dialogue tree alongside three alternative systems, each incorporating large language models to varying degrees — ranging from rephrasing to fully open-ended conversations. … The prototype is intended as a modular testbed for future research, offering a flexible framework to experiment with dialogue models in controlled, interactive settings.”

Evaluating Quality of Gaming Narratives Co-Created with AI | Semantic Scholar. Another CoG locked paper. “This approach synthesizes story quality dimensions from literature and expert insights into the Kano model framework to understand their impact on player satisfaction, and can inform game developers on prioritizing quality aspects when co-creating game narratives with generative AI.”

Also see Stephen Ware’s list of his lab’s papers here. Here are some ugly (sorry) bare pdf links to make some easier for you: siler2025pareto (narrative planning for NPCs), khan2025detecting (detecting duplicate states), birchmeier2025speeding (speeding up narrative planning), siler2025answer (answer set programming and theory of mind).

Data Science / NLP / Tools

OpenAI has been beefing up their Codex programming tool, and has docs. I am hearing good things about GPT-5 (Claude has had issues recently) and am worried I might need to get in there.

Learning Deep Representations of Data Distributions — an ebook for teaching about neural nets.

OLMoASR: A series of open speech recognition models | Ai2 from Allen AI.

Dedicated machine translation from Cohere: Command A Translate: Secure translation for global enterprises.

MiniCPM’s new model is “A GPT-4o Level MLLM for Single Image, Multi-Image and High-FPS Video Understanding on Your Phone.” It sounds very good.

iblameandrew/local-deepsearch-academic: An implementation of Google Deep Search 🕵️ with support for 1000+ references, local inference, chatting with your scraping session using RAPTOR, and report generation.

Poem

From an old Italian urn
grew a red-orange flower like a banner,

a lone, stray mood uninterested in progress.

The air was damp and sweetish
with tuberose and lemon.

Meats and herbs seethed in oil and acid.

Many nights of brittle hail and long, stiff whips
of lightning rattled the wooden shutters.

Days filled with oppressive heat that seemed to loop
like a rope with a noose. Here is summertime.

The world was another several thousand years
older in an afternoon.

My mind sunk into the depths
of crummy fantasies which held it like concrete.

A cigarette smoking itself in the ashtray.
A great wetness staining the mountains blue.

The earth saying language and vision
are nothing.

— Cattiveria by Sandra Lim

Good luck heading back to the office. At least maybe the heat is done…

Best, Lynn (@arnicas on mostly bluesky, ex twitter, mastodon).

Things I Think Are Awesome
