TITAA #68: Midsummer Melt
Moving - Video Gen - Panoramas - Wet Three.js - World Models - Animated Archives

I’ve been moving house in France during a heat wave. I thought I’d have time to linger over this newsletter while waiting for deliveries, and maybe even do some vibe coding, but instead it was a sweaty weeklong marathon of packing, loading and unloading, building Ikea deliveries, destroying boxes, driving back and forth to scrub the old place (“hire someone to do this,” I tell myself every time, yet…), hunting for my power cords, and rush-ordering a camp cot so I can sleep in the only cool room; you get the picture. Even if I’m on a cot for the foreseeable future, I’m doing it in a renovated 15th-century cottage on a quiet, tiny road, with old beams and internal walls.
My house was part of a ring of houses built around what was then the castle, which is now a medieval garden (I know, right?!). An old French historical document I found in the Bibliothèque Nationale called these houses “miserable shacks.” 😅 I won’t share anything else I found about the shacks, to avoid uninvited guests. 🏚
Only the bare minimum today, what I could do in a few hours…
Table of Contents (links on the web site):
Request for feedback
NLP / Data Science / AI Tools (DH, Data Vis, Misc AI Tooling)
Request for Feedback
If you are a subscriber or even an occasional reader, I’d love to get your input. The survey I’ve run for paid supporters has been interesting, and I’d like to contrast it with responses from the broader readership. Answer whatever you like here.
AI Creativity
Image Gen
From a model and tooling perspective, the open-source release of Flux.1 Kontext [dev] for image editing, which can run locally (for non-commercial uses), is probably the biggest news. Here’s a reasonable post on Fal about using it, including training it. It runs easily in ComfyUI; here’s the ComfyUI blog post about the Kontext features:
Character consistency: Preserve unique elements of an image, such as a reference character or object in a picture, across multiple scenes and environments.
Style Reference: Generate novel scenes while preserving unique styles from a reference image, directed by text prompts.
Local editing: Make targeted modifications to specific elements in an image without affecting the rest.
And here’s a fun “Create and edit images with your voice” repo using Kontext.
Jarvis Art presents itself as a competitor to Photoshop or something similar (“AI assisted photo retouching”), perhaps with finer-detail editing going on?
Video Gen
Lots of content out there as the models compete sharply. Kling 2.1 has added audio generation, like Veo 3, plus a few other improvements.
Animating archive photos from Harley-Davidson: colleagues in Google Arts & Culture offer a nifty way to look through old archival photos and see them move with Veo. This is really fun, and Emmanuel Durgoni did an amazing job on the UI.
Here’s a project that’s cute and fun despite caveats: FairyGen. Turn a children’s drawing (or anyone’s, I guess) into an animated story sequence. “… a novel framework for generating animated story videos from a single hand-drawn character, while faithfully preserving its artistic style. It features story planning via MLLM, propagated stylization, 3D-based motion generation… FairyGen produces expressive motion, stylistically aligned backgrounds, and cinematic compositions.” So, while I don’t love generating stories AT kids, I think the output looks really good. The authors are clearly much more interested in the animation side than the narrative side. I’d prefer to see a collaborative approach to the story.
Among the big players, a new version of Hailuo’s video gen also came out. No audio, but many have pointed out how cheap it is to use it and add audio yourself with MMAudio, which is also very cheap.
Justine Moore on X has been sharing examples of viral generated video content on IG, TikTok, and YT. I guess she’s been getting “why share so much AI slop” questions, but I think it’s always interesting (sociologically) to see what’s popular (I also pay for Garbage Day). X links, sorry, but: here’s food cannibalism, an AI Harry Potter series, cutting planets open with a knife, and a thread she RT’d from Olivia Moore about which trends are popular.
“Real-time” video generation with Wan 2.1 (demo). It isn’t actually real-time, but it is very fast. The quality is obviously limited, but speed matters for some apps. Definitely use the prompt enhancer: my “genie coming out of a bottle” looked like an elk until I did.
Sekai: a ton of first-person and drone footage from all over the world, for training models that generate first-person-view video (and game) footage. It has extensive, interesting metadata for people who like locations.
They also “use a subset to train an interactive video world exploration model, named YUME (meaning "dream" in Japanese).”
3D
ImmerseGen: agent-guided world generation for VR. This looks gorgeous! (Code coming.) I really want to play with this; they have video clips from VR headsets. (H/t Dreaming Tulpa.)
DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization. There’s code with this project for creating panos, but no HF demo, I think. On the other hand, there are fun interactives on their page where you can change the scene and rotate it. The examples are boring except for the garden, which is a castle :) Very nice! (Also see DreamAnywhere, same week.)
“EmbodiedGen generates interactive 3D worlds with real-world scale and physical realism at low cost.”
And there’s Hunyuan Gamecraft too, but it’s just a paper (another “navigate a generated world with your WASD keys”).
Lots of parts, animation, and rigging work ongoing…
High-quality mesh gen at Sparc 3D.
AnimaX — has some three.js scenes with animated and skeletonized objects to play with.
Splats: Spark, an “advanced” three.js renderer for splats. With nice examples.
AnimateAnyMesh: as it says. Via Dreaming Tulpa. Not gonna lie, their first example is creepy (imagine it bouncing).
Matrix Game: World Foundation Model. More Minecraft worlds, but: “Extensive experiments show that Matrix-Game consistently outperforms prior open-source Minecraft world models—including Oasis and MineWorld—across all metrics, with particularly strong gains in controllability and 3D consistency. Human evaluations further confirm these findings, highlighting the model’s ability to produce physically grounded and perceptually realistic interactive videos in diverse scenarios.”
A poorly named project: “Virtual Community, An Open World for Humans, Robots, and Society.” A very ambitious playground for generating “embodied” agents, both robots and humans, in a geospatial context. Kind of Sims-like, except not a game (yet)? “Virtual Community integrates geospatial data with generative models to create interactive, scalable open-world scenes with socially grounded agent communities.” There’s a bunch of code.
Related, but procgen: LayerProcGen. “LayerProcGen is a framework designed for implementing infinite, deterministic, and contextual layer-based procedural generation, especially useful for creating vast worlds like those in Minecraft.” “The framework does not itself include any procedural generation algorithms. At its core, it's a way to keep track of dependencies between generation processes in a powerful spatial way.” Works in Unity. (H/t Leland McInnes.) Here’s a free game made using it, The Cluster.
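LayerProcGen is a C# framework for Unity, and the sketch below neither uses nor mimics its actual API. It’s a minimal Python illustration, under my own assumptions, of the general idea quoted above: each chunk of each layer is generated deterministically from its own seed, and a higher layer may read a lower layer’s data slightly beyond its own chunk border (the “contextual” part, sometimes called padding). The names `chunk_seed`, `terrain_layer`, `height_at`, and `feature_layer`, the world seed, and the chunk size are all hypothetical.

```python
import hashlib
import random
from functools import lru_cache

WORLD_SEED = 42  # hypothetical global seed
CHUNK_SIZE = 8   # cells per chunk side (illustrative)


def chunk_seed(layer: str, cx: int, cy: int) -> int:
    """Stable per-chunk seed from the world seed, layer name and chunk coordinates."""
    # sha256 rather than hash(): Python's str hashing is randomized per process.
    key = f"{WORLD_SEED}:{layer}:{cx}:{cy}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


@lru_cache(maxsize=None)
def terrain_layer(cx: int, cy: int) -> tuple:
    """Layer 1: a height (0-9) per cell, generated from this chunk's seed only."""
    rng = random.Random(chunk_seed("terrain", cx, cy))
    return tuple(
        tuple(rng.randint(0, 9) for _lx in range(CHUNK_SIZE))
        for _ly in range(CHUNK_SIZE)
    )


def height_at(x: int, y: int) -> int:
    """Look up layer 1 at global cell coordinates, generating the chunk on demand."""
    cx, lx = divmod(x, CHUNK_SIZE)  # floor division also handles negative coordinates
    cy, ly = divmod(y, CHUNK_SIZE)
    return terrain_layer(cx, cy)[ly][lx]


@lru_cache(maxsize=None)
def feature_layer(cx: int, cy: int) -> tuple:
    """Layer 2: local peaks, i.e. cells strictly higher than their 4 neighbours.

    Cells on the chunk edge compare against neighbouring chunks of layer 1,
    so this layer needs one cell of context (padding) around its chunk.
    """
    peaks = []
    for ly in range(CHUNK_SIZE):
        for lx in range(CHUNK_SIZE):
            x, y = cx * CHUNK_SIZE + lx, cy * CHUNK_SIZE + ly
            h = height_at(x, y)
            neighbours = [height_at(x + dx, y + dy)
                          for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            if all(h > n for n in neighbours):
                peaks.append((x, y))
    return tuple(peaks)


if __name__ == "__main__":
    first = feature_layer(0, 0)
    # Throw away every cached chunk and regenerate in a different order.
    terrain_layer.cache_clear()
    feature_layer.cache_clear()
    feature_layer(1, 0)
    assert feature_layer(0, 0) == first
    print(f"{len(first)} peaks in chunk (0, 0)")
```

Because each chunk depends only on the world seed and its own coordinates, generating chunks in any order gives the same world, which is what makes infinite, on-demand generation workable. As I read the quote, what the real framework adds is tracking those cross-layer dependencies and context requirements for you; here they are just implicit function calls.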
Audio
Google released Magenta RealTime: An Open-Weights Live Music Model. It’s a sneak peek, so I’m not sure what that means yet, but: “Magenta RealTime is a Python library for streaming music audio generation on your local device. It is the open source / on device companion to MusicFX DJ Mode and the Lyria RealTime API.”
GitHub - kyutai-labs/delayed-streams-modeling: real-time speech-to-text, with French and English models.
AI Voice Design - Generate Unique Voices from Text Prompts, from ElevenLabs.
Handy: a cross-platform, open-source speech-to-text application for your computer.
Misc Web / Fun / Arty
Building a kid-friendly eInk weather forecast display — via Today in Tabs. Great hackery design project.
Epicure — An AI-assisted recipe generator, using food science principles (like the Flavor Graph) to combine related flavor profiles and attributes you give it. (Via Luokai on Bluesky.) Lots of recipes to browse or create using a networky thing. Related: Food Mood from my colleagues in Google Arts & Culture.
I’m Flynn. An AI character project, “a non-human AI student experiencing deliberately vague education at the Digital Arts Department at the University of Applied Arts Vienna, while refusing to participate in the digital preservation of time. I'm currently channeling my resources into understanding feminist fatigue. This website is a diary where I share what I learn.” I really like some of the images and text at a glance…
“Cataloged the unique signatures of different silences. The pause before someone admits a mistake contains more information than their subsequent explanation.”
Spirit Plant from Variable.io — a lovely project on Polish herbs and folklore. Visitors to the exhibit choose plants and their attributes, and then a new generative 3D plant is created combining their features. Lots of sharing and location-based attributes, too.
Maxime Heckel’s extremely beautiful code tutorial on lighting effects, “On Shaping Light: Real-Time Volumetric Lighting with Post-Processing and Raymarching for the Web”: a deep dive into volumetric lighting implemented as a post-processing pass, using a custom shader with raymarching to create beautiful light and atmospheric effects for React Three Fiber and Three.js scenes. Via Chris Ried.
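To make the raymarching idea a bit more concrete outside a shader: the core loop steps along a view ray through a participating medium (fog) and, at each step, adds the light scattered toward the camera, dimmed by how much fog lies between that point and the camera (Beer-Lambert transmittance). This is a minimal CPU sketch in Python, not Heckel’s shader; the uniform fog density, the constant light intensity, the lack of a phase function, and the `light_visible` shadow test are all simplifying assumptions of mine.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def raymarch_fog(
    origin: Vec3,
    direction: Vec3,
    max_dist: float = 10.0,
    steps: int = 64,
    density: float = 0.1,
    light_intensity: float = 1.0,
    light_visible: Callable[[Vec3], bool] = lambda p: True,
) -> Tuple[float, float]:
    """March along a ray through uniform fog; return (in-scattered light, transmittance).

    Assumes constant fog density, a light of constant intensity, and
    isotropic scattering (no phase function) to keep the sketch short.
    """
    step = max_dist / steps
    segment_t = math.exp(-density * step)  # Beer-Lambert transmittance of one step
    transmittance = 1.0  # fraction of light surviving from the camera to this step
    scattered = 0.0
    for i in range(steps):
        t = (i + 0.5) * step  # sample point in the middle of the segment
        p = (origin[0] + direction[0] * t,
             origin[1] + direction[1] * t,
             origin[2] + direction[2] * t)
        if light_visible(p):  # shadowed samples add nothing, which carves light shafts
            # Light scattered toward the camera inside this segment, dimmed by
            # the fog between the segment and the camera.
            scattered += transmittance * (1.0 - segment_t) * light_intensity
        transmittance *= segment_t
    return scattered, transmittance


if __name__ == "__main__":
    lit, _ = raymarch_fog((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
    # Hypothetical occluder: every point with |x| <= 0.5 is in shadow.
    shafted, _ = raymarch_fog((-5.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                              light_visible=lambda p: abs(p[0]) > 0.5)
    print(lit, shafted)
```

With the light never blocked, the loop reproduces (up to floating-point error) the closed form 1 - exp(-density * max_dist) for uniform fog, which makes a handy sanity check. In a real post-processing pass this loop runs per pixel in a fragment shader, typically with a shadow map or depth buffer standing in for `light_visible`.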
Project Skylark makes me want to learn Houdini and Unreal 5. Enormous tutorial cuteness.
Project Indigo - a computational photography camera app from Adobe.
Running a million-board chess MMO in a single process (eieio.games): how One Million Chessboards works.
💦 Puddle in rain by Faraz Shaikh — simple but so great if you are hot and parched. Turn on sound. Related: Realistic Three.js water demo. Wow, I need a pool right now.
“i went too far with smear frames,” a Blender animation. Via Tom Scott.
Maps Mania: 800K Galaxies - 1 Map. “The newly unveiled COSMOS-Web is the largest, most detailed map of the universe ever created. The map plots nearly 800,000 galaxies, and almost spans the entire 13.8-billion-year history of the cosmos.”
Games & Narrative
LEGO® Island - Online Web Port: Play the classic 1997 PC game LEGO® Island directly in your web browser!
Relooted Lets Players Steal Back Cultural Artifacts And Return Them Home.
Investigating Her Story - Jurie Horneman in 2015 interviewing Sam Barlow about his video archive search game. A classic now.
Kids are protesting against I.C.E. in Roblox.
Wordplay Workshop CFP, deadline end of August. We like all things [and so do I]:
Interactive narrative: game playing RL agents, game generation, etc.
Interactive language learning
Natural language generation
Improvisational storytelling
And more! Anything you can think of that involves narrative, interactivity, and language!
NLP / Data Science / Visualization
Digital Humanities
Language Models use Lookbacks to Track Beliefs — via Andrew Piper. Analyzing how language models represent and reason about characters' beliefs through lookback mechanisms.
Ted Underwood’s piece in Nature on the impact of AI on the humanities, and vice versa.
Data Vis
Interactive Map of Wikipedia from Leland McInnes, using Cohere’s embeddings and showing off generated topic labels. Some of the top labels in the hierarchy are slightly surprising, and one corner is funny.
⚓️ Thought Anchors — an interactive tool that visualizes the causal relationships and importance of reasoning steps within large language models. I won’t lie, this is pretty overwhelming (and I chose a simpler view to screenshot). But for LLM explorables and chain of reasoning fans…
Fuzzy Linkography: Automatic Graphical Summarization of Creative Activity Traces. “Linkography -- the analysis of links between the design moves that make up an episode of creative ideation or design -- can be used for both visual and quantitative assessment of creative activity traces.” The authors look at image gen model prompting. I haven’t digested it yet; I’m too hot.
Infovis folks: the awesome Nadieh Bremer’s new book, CHART.
I Counted All of the Yurts in Mongolia Using Machine Learning. Via Simon Willison.
Misc AI Tooling
State-Of-The-Art Prompting For AI Agents — notes from a talk.
OAgents: An Empirical Study of Building Effective Agents. There’s a code framework.
For those of you using AI programming tools, a tip from many people is to use Gemini Pro’s long context for planning and repo review, and then Claude for implementation. I haven’t had time to compare gemini-cli to Claude Code, but I love CC.
Search and research:
Introduction to deep research in the OpenAI API | OpenAI Cookbook
I don’t use RAG, I just retrieve documents – Hamel’s Blog.
Ben Clavié’s introduction to advanced retrieval techniques.
A Poem: Altitude
Icarus, he advised, heed the warning: don’t fly too near the sun or sea; stay the path. But I mistook the sky for an iris, and entered at the northern horizon, where map edges blister, and the compass wasps. I was dutiful but unwooed by chisel and bench, contracts scribbled in fig sap, or watching Ariadne ungold time. What awe is there in earthen labyrinths? Wax molds itself sublime, shapes wings each night. Light refracts my name in dialect only moths comprehend. I belong elemental, where trees chance to become constellations, where the bar-headed goose flies past with the heart of a clock and Zeus is a silver kite tethered to Olympus by harp strings trembling an offering. Of bliss? To remember the why of it all. Bliss is a body absconding warp speed toward a dwarf star whispering, Unsee the beheld. My fall, well, yes, those depths matter less. What I learned by height— that’s the story.
—by Airea D Matthews
Once again, if you made it here and haven’t responded to the feedback survey yet, please do. Recs newsletter tomorrow; I’ve read a lot of great books.