TITAA #52: Latent Ghosts Fiddling
Video Gen Fun - Image Gen Alts - NPC Labor - Embedding Vis - Dune Ornithopters - Game AI Reports
I’m a day late, so no intro article! I’ve already used up my time for the week. Just a ton of news, a lot has happened in AI and tool-land this past couple of weeks. We had Sora video gen articles, weird video errors on show (I love them), new image models that provide solid alternatives to Midjourney, a few interesting AR application concepts; the game section is a banger, with some great articles, a fab video on NPC workers, some “state of AI in games” reports. Then in Data Science, a bunch of tooling around visualizing embeddings and Observable framework uses, plus a bit of good LLM writing. And you get my usual load of books and a couple fun game recs, including Dune in MS Flight Sim and Assassin’s Creed Nexus in VR.
A reminder that the paid subs get the option of a separate mailing of the media recs, a bit longer and more fun (I hope). This time I cut the TV from this post and it’s only in the other post. It helps me a lot if you become a paid supporter; my time is not at all covered by my supporters right now, sigh!
TOC (links on the web version):
AI News (Video, New Image Gen, Misc AI & Text Gen)
AI Creativity News
Video Gen
Right as I hit send on the issue 2 weeks ago, OpenAI announced Sora, their video generation model (not released yet). The results look a lot better than the other tools I’ve played with, but I am also usually testing everything with illustration, not photoreal images. Sora seems to be good at both photo and illustration: Here’s a Sora illustration animation, for instance, that absolutely nothing I’ve tested would get close to in animating the actual figures:
Predictably, if you read this newsletter, the creepy errors charm me the most, like the exploding basketball, the weird smoking physics, the archaeological plastic chair that warps reality. You can see the blooper examples at this YT link. Here’s a tiny extract from their creepy chair example (to go with my Cthulic chair shadows in the last issue):
Check out the recent WaPo on Sora and AI video, a good article with more on the errors. But Sora is really excellent at some things: it’s a strong enough realism generator that people can make actual 3D gaussian splats off the videos (ScottieFox on tsfka Twitter with the gallery scene, and this paper/code that compares Sora’s video consistency with Runway Gen2 and Pika based on 3D reconstruction). Some folks think Unreal Engine footage was used as a training source.
In other video model news, things that are actually available:
Stable Video 1.1 is out (model and API!), and in their web app here. The results aren’t always, haha, stable? But with photoreal ones, you get soft motions like background effects, zooms, etc. I don’t think you can help it with a prompt in the image init case? I am not giving it much that is easy to work with. This was a funny collapse, not sped up:
Deforum Studio’s trippy latent generator has added image inits! Yay! (I had an example in the last newsletter.) I like the cutaway haunted houses I can make with it, using init images from Midjourney - this one is cut short to embed as a gif:
Runway Gen2 struggled a lot with these cutaway houses, making just a camera pan around, or smoke when I asked for ghosts. With their motion brush to indicate what you want to animate, I got slightly better results… this one was creepy but fun until the cadillac drove in from the right side:
Pika’s results were similar to Stable Video’s — mostly moving the camera. This is a fail mode for a lot of these tools right now, which is why the animated monsters gif above from Sora is so interesting to me.
Meanwhile, Pika video gen has also added lipsync capability if you feed it audio, much like the well-received (no code yet) Alibaba EMO project.
Finally: another practicing AI video author I like: DavidSZauder on IG, especially this Portishead shiny monsters video:
Image Gen Model News
So, if Midjourney does a deal with Musk/X, I’m out. I wasted a ton of time on volunteer moderating there back in the day, not to mention a brief attempt to work on NLP mod help. But moderation on stuff made with X users now? No longer feasible. Luckily there are a lot of alternates right now that look very good and have actual web site interfaces. (I find MJ’s alpha web UI a bit confusing and it’s—years later—still in dev?)
⭐️ The forthcoming Stable Diffusion 3 respects artists’ “do not train requests” and still looks amazing. But below are ones you can use right now.
There’s a new Ideogram.ai release, which is pretty good, honestly. It is also good at text rendering. It does an optional chat-gpt-ish “magic prompt” from your lame-ass simplistic prompt to try to get better results. I’m fascinated (and a little irked?) by some of the narrative content that is probably irrelevant to what you actually get.
Lexica.art has a new model out, which looks comparable to the others I tested and also understood some more complicated formats. It’s pretty amazing how far that model has come, here’s that same (hard) prompt then and now:
You can compare the new aesthetic model Playground 2.5 with SDXL (the Stable Diffusion XL model) output in anotherjesse’s app here, on the Parti prompts dataset of hard things. Playground is a model tuned on judgments of what’s prettier. Judgments of what’s pretty by large groups are getting as meh as instruction-tuned chatbots, though. I find it overbaked and less interesting for abstract prompts, with a tendency to the kitsch that is automatically associated with generic AI output now. While I walked through Jesse’s pairs, I also fell in love with SDXL at seed 0 and started to hate Playground’s 0. It’s not true that SDXL is always brown, by the way; it’s just these prompts. Notice how the abstract prompts generate very similar inspirational posters from Playground but rawer, more imaginative images from SDXL? I may write more in my “esoteric and weird” section in 2 weeks (for paid subs).
Meanwhile, there are more performance improvements on “real-time” (or near enough) generation. SDXL Lightning on Fastdxl.ai requires you to figure out your prompt plan in advance to get good results as you type; we need better understanding of prepositional phrases and composition (yes, an active area of research). I won’t put in the gif I made, for size reasons, but the results aren’t always fab, unfortunately, despite being almost real-time. You can also review anotherjesse’s SDXL vs. Lightning outputs on Parti here (go, seed 0!).
Trajectory Consistency Distillation is also fast and looks good, with a demo on HuggingFace, but the results (while big and pretty and fast) aren’t accurate on my test prompts (witches and asteroids).
Misc AI & Text Gen
Fab concept for AR baking aid designed for Spectacles that looks genuinely good — Augmented Baking by Lauren Cason (h/t Luokai again).
Love the DJ mode that adapts in real-time for MusicFX (Google’s AI Test Kitchen app). You can add styles or instruments and it just keeps going.
I don’t know what this web pet Pogichat is or how we got here, but (via waxy):
“Rendering protein structures inside cells at the atomic level with Unreal Engine” — these videos are something else. Unreal for data vis, let’s do it. (H/t Jeremy Howard.)
“VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction” - code coming, but this is meant to be a fast creator of cityscape-style splats, which is awesome. They say “high fidelity real time rendering.” A detail from video:
Text gen poetics: Allison Parrish’s talk on Language Models as Ransom Notes and digital collage. It’s a lovely position statement, grappling with distance and citation — I might not agree with all of it, personally, but it is a good provocation.
That means that, as a computational poet, I am also a collagist. But I don’t want to make ransom notes. I would prefer to go about my collaging activities without behaving as a kidnapper might behave. I would prefer for my poems to not also be threats.
So I’m left with the task of setting forth some criteria that distinguish the kind of computational collage I want to make from the ransom notes that come out of LLMs. I wish there were an easy technical distinction between these two methods, but I don’t think there is: a Markov chain produces collage just as much as LLMs do; the difference boils down to a matter of scale.
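If you have never built one, the Markov chain Parrish mentions is small enough to show whole. What follows is only a minimal word-level sketch in Python, not her code or anyone's production method; the function names (`build_chain`, `generate`) and the sample sentence are made up for illustration. The collage quality is easy to see here: every word-to-word step in the output is lifted directly from the source text.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start` (a tuple of words), adding up to `length` words."""
    out = list(start)
    order = len(start)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break  # dead end: this word run was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)


# Hypothetical sample text, just to have something to cut up.
source = "the cat sat on the mat the cat ran"
chain = build_chain(source)
print(generate(chain, ("the",), 8, random.Random()))
```

Scale it up from one sentence to a large corpus and a longer context window and the output gets more fluent, but the cut-and-paste mechanism is the same, which is the "matter of scale" point in the quote.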
Text gen for profit: “Inkitt, the self-publishing platform using AI to develop bestsellers, nabs $37M,” in TechCrunch. Hrm. Like Wattpad but for AI books.
Its business has already attracted 33 million users and dozens of bestsellers, the company said. The new funding that it’s raised, a Series C, will be used to expand the kind of content it produces: AI to write stories based on your original ideas and to produce versions of its fiction personalized for specific readers; a move into games and audiobooks; and more video content adapted from fiction published on its platform — video that is produced with humans today but will, eventually, also be generated using AI.
Gemini alignment: Margaret Mitchell’s good OpEd article in Time on how to think about AI alignment design and output problems wrt the ridiculous (my opinion) uproar over Gemini’s diverse popes and Nazis. (Also a thread on bluesky if you prefer.) Also see Max Read’s questioning of why the hell we care what a chatbot makes moral equivalences between and what the role of these things is, anyway. I’m not a huge Max fan but this was on point; mostly I want tools, not chat; but some people really do want to talk to AI characters, including Laurie Anderson.
Copyright: “Between Copyright and Computer Science: the Law and Ethics of Generative AI,” a paper co-authored by Mark Riedl, 63 pages.
Links: Awesome Story Generation links repo, for work on narrative gen. Lots of things I’ve had in this newsletter.
Games Related (AI-related at end)
❤️🔥 NPC Labor in Red Dead Redemption 2, and what it says about us, a fab video from Total Refusal, an Austrian Marxist video collective who make videos using footage in games. (“We upcycle video games in order to reveal the political apparatus beyond the glossy and hyperreal textures of this media.”)
Made entirely of game footage of background “workers” in RDR2, with an NPC carpenter who pounds nails into the same board over and over, laundresses who wander the streets at night and work in the rain, a street sweeper who sweeps the same board all day to no effect and then sleeps next to it… Sometimes her broom disappears, and she is out of work, idling against a wall. Those shots are very moving. Meanwhile, I sit at my computer all day, what if it disappeared? I’d stare into my phone or my TV, and it would not be moving.
It’s an eerie, poetic look at glitches, stereotyping, and the recreation of capitalism as background. Non-AI games have a lot of weirdness already in their NPCs. (H/t Weird Studies discord peeps who got this one from Erik Davis, I gather. For another good movie about RDR2 and glitches, I strongly rec The Grannies.)
TiltFive - AR for tabletop gaming. With a lot of games already. Actually, this does look fun. (H/t Luokai.)
📱 V Buckenham’s Downpour app for building little mobile games with your camera is coming out this week! You take pictures and scan them to make navigation links. It’s very cute and you can already play some games testers have made (maybe I’ll do one this weekend). Some good press from Polygon and Verge and others!
The Tragedy of Skull and Bones by JB Oger. A take on AAA games and the difficulties of building a good open-ended, multiplayer game compared to something smaller (e.g., Assassin’s Creed: Black Flag, which it is being negatively compared to and which I will go play immediately upon shipping this). There are a lot of sad things in this piece about the current landscape of layoffs, but it’s a good read from a game designer who worked on S&B.
How can we get high-quality new AAA again? I suggest trying out building new franchises through smaller & faster projects, letting the team learn and grow until they can eventually hit the jackpot. Companies that operate like this today are the winners of tomorrow. Elden Ring would never have been a massive hit without all of From Software's previous games. The same goes for Supergiant and their game Hades (not a AAA budget, I know, but definitely AAA revenues) or Baldur's Gate 3 or Helldivers 2.
Cooking in Hyrule - by Coty Craven, about a 96 year old learning Breath of the Wild and how wonderful it is. Yes! (I think h/t Cat Manning.) A lovely piece.
Polaris Game Design retreat reports for 2023 are up. There is a fun one on suspense and surprise by Grinblat et al (including online pals Max Kreminski and Cat Manning). There’s discussion of genre and conventions and mental models… and how to use them in design; and frankly a bunch of that is why this video for Kingmakers is so excellent (watch to 23 seconds if you haven’t yet). Also:
For example, in Caves of Qud a bug resulted in an inanimate table being chosen for a village pet instead of a living creature. The experience was consistent enough with the desired feeling of Caves of Qud to inspire a function that encapsulates this “most of the time do this, but rarely do that” design pattern, so that even after the bug fix, an inanimate object can be chosen as village pet 1 in 1000 times. [See also Jason’s “Before You Fix a Leak, Ask If It’s a Fountain” video.]
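The “most of the time do this, but rarely do that” pattern is easy to sketch. This is a guess at the shape of such a helper in Python, not the actual Caves of Qud function (which I haven’t seen); the names `mostly` and `pick_village_pet` and the candidate lists are all hypothetical.

```python
import random

CREATURES = ["goat", "dog", "beetle"]   # hypothetical living candidates
OBJECTS = ["table", "chair", "lamp"]    # hypothetical inanimate candidates


def mostly(common, rarely, rare_chance, rng):
    """Call `rarely` with probability `rare_chance`, otherwise call `common`."""
    if rng.random() < rare_chance:
        return rarely()
    return common()


def pick_village_pet(rng, rare_chance=1 / 1000):
    """Usually a living creature; about 1 time in 1000, an inanimate object."""
    return mostly(
        common=lambda: rng.choice(CREATURES),
        rarely=lambda: rng.choice(OBJECTS),
        rare_chance=rare_chance,
        rng=rng,
    )


rng = random.Random()
print(pick_village_pet(rng))
```

Passing the two branches in as functions keeps the rare case a deliberate, tunable design choice rather than an accident, and passing in the random generator makes it possible to seed it for testing.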
Can you pet it? because:
“Stroke of genius? How one developer earned over £250k from games made in 30 minutes” — in the “things that might depress you,” or make you 🤔 category. It’s animal petting games: “More than 120,000 PlayStation users have paid £3.29 to pet virtual hamsters, dogs and beavers.”
“Inside the New York Times’s Big Bet on Games” in Vanity Fair.
AI focused stuff for games…
You Play This Vampire Game By Using Your Voice (and Lying) - video by MadMorph. The game is called Suck Up and is in early access. It looks totally hilarious. Yes, it uses AI. Basically you are trying to crack the AI NPCs in the town to get them to let you inside their houses. Maybe there’s a whole genre-to-be had that really takes advantage of the weaknesses of AI agents? (H/t James Yu.)
GENIE from Google Deepmind, playable generative game levels. No, no code. But it does generate a sort of playable platformer from images as well as photos, which is kind of cool. (Note previous work on generating games from video by Matt Guzdial and Mark Riedl.) This is a screencap of an Imagen image plus a generated level frame (which tbh doesn’t look that awesome yet).
A new game-based reinforcement learning environment, Craftax, which combines Crafter with NetHack using JAX. It’s open-ended and can be used for unsupervised environment gen too.
Just one character agents paper this time: “Human Simulacra: A Step toward the Personification of Large Language Models,” on setting up their character personas.
How AI is Actually Used in the Game Industry, a report by Tommy Thompson. This covers a lot of ground on both the player view (in play) and the content development side, including machine learning vs. current generative and older symbolic AI tools. If you want a good overview, recommended. Also there’s a new book, Game AI Uncovered (vol 1!), from Routledge. And this new “Large Language Models and Games” reference paper just out today on arXiv, by Gallotta et al.
NLP & Data Vis & Data Science
Lots in data vis land this time! Plus some good writeups and search tools.
I think I just sneaked this in last time, but: Observable’s new static Framework tooling looks fun. You can read about using it from Bob Rudis and check out Jeff Heer’s examples using UW’s Mosaic (based on Vega) and duckdb-wasm. (More on using duckdb-wasm in browser here.)
Ian Johnson (enjalot) has launched a v1 of his latentscope, a tool to cluster and review embeddings of datasets. It’s very v1, I have a list of things I want for it, but it’s worth watching. It will ingest your data, embed, umap, cluster, and label (with an LLM) and then allow you to browse that with a visual. Lilac does some similar things, except they lack the visuals right now. They both lack the “select from the visual and label” which is supported by TNT (thisnotthat).
Meanwhile in cool embedding visuals-land, Leland McInnes updated the pretty-layout producer of DataMapPlot to add in interactivity, almost as an afterthought 😆. I’m impressed and have immediate use.
For R users, there is a new release of ggraph! It looks great.
Via Bob Rudis, Retool will let you generate an API from a CSV (up to 24MB).
Vicki Boykis did a deep dive on understanding the GGUF format used in llama.cpp.
A report from W&B on how they built their documentation help bot with RAG (retrieval augmented generation), lots of meaty tips. And yet another RAG research summary.
Search engines! A couple fun new tools…
Globe is a kind of knowledge outliner for a search topic, which is useful for a quick view of a subject. They’re slammed with traffic so it’s not optimal right now, but still useful. E.g., an outline on entity linking (my NLP bane). It embeds little images when it can (traffic etc). (H/t Luokai on threads again.)
Kagi.com is a very good search engine, actually. For images as well as text. I am using a lot of custom search tools to figure out various subjects I’m encountering in OCR projects, and this one is quite helpful at both wide looks and research/code-repo focused queries. I will pay for it. (H/t Vicki Boykis & Chris Albon.)
Book Recs
I’m shortening this up compared to what I sent in the paid subs Recs mailing yesterday, because of length.
⭐️ The Warm Hands of Ghosts, by Katherine Arden (fantasy). Terrific ghost story about WW1. A Canadian nurse who was discharged goes back to the front to look for her lost brother. Meanwhile, her brother has joined forces with a German soldier to try to survive an apocalyptic post-push landscape in which ghosts of the dead are wandering, among them a strange fiddler who runs an oasis inn and requires stories in return for shelter.
Emily Wilde’s Map of the Otherlands, by Heather Fawcett (fantasy). A fun sequel to her first. This one has a hunt for magical doors in the Austrian Alps, one of which might be a door to Wendell’s homeland. The faery folk are a great range across cute and helpful to outright terrifying. Also, there is a cat.
Nettle & Bone, by T. Kingfisher (fantasy). I quoted this for the scary puppet in my last newsletter edition. It’s a fun quest read, with cool magical women and complicated family feelings.
The Circumference of the World, by Lavie Tidhar (sf). A cryptic nested koan of a book about a magic book by a maybe crazy golden-era sf writer (who also started a cult— around the era of Jack Parsons and L. Ron Hubbard). Delia, a character in the magical pulp book and in the real world, is looking for her father, and the book. Very tangly but fun themes.
🖼 A Glancing Light, by Aaron Elkins (mystery). A good art-history mystery! Our hero curator hunts the source of fakes and stolen paintings in Bologna, including run-ins with the Sicilian mafia. Super location detail, if you read for tourism (I do).
Games Recs
Dune missions in MS Flight Simulator! Ok, it’s a promo for the movie, but what a great promo: You can run some missions in the free extension to fly an ornithopter on Arrakis. I crashed the hell out of that little guy! Landing is hard! Then I rewatched the movie to get ready for the new one, and paid a lot of attention to the flight controls. No, it did not help. There’s no free flight option, sadly, because the rest of MS Flight Sim’s locations are based on actual satellite and scanned earth data—which is why it’s so cool.
In VR-land:
💔 Assassin’s Creed Nexus VR! Only on Quest 3, which is a bummer; and Ubisoft was disappointed by sales and won’t be doing more VR, I guess. But this is (mostly) fab. It does have some weird controller problems sometimes, like with lock picking. And I was rocking along but at 3/4 of the way I hit a bug that prevents finishing a mission. I’m extremely bummed, I was having a really good time. You have 3 characters and locations: Italy (Monteriggioni and Venice) in the 1500s, Greece (Delos and Athens) in around 400 BC, and Revolutionary War Boston. Parkouring over tiled roofs and canals is so much fun in VR!
TV Recs
Actually cutting this here, it was a big list in the paid subs recs list yesterday, covering True Detective, Dark Winds, Delicious in Dungeon, Monsieur Spade, and others.
A Poem
The violinist recounts a fairy tale of a boy kept years with others like him in captivity. They buff the witch’s floors to the sheen of glass, gather the fine amber dust in the air to bake into bread, the dewdrops in the hearts of roses to feed her unslakable thirst. Later, trying to remember, the one bewitched says phrases over and over. But there is no one there to catch his mistakes, to help him put the pieces back together. And you, you’ve been such a good student of that epistemology, of thinking-into-being: don’t you know that spells are made of words? Remember too: not all saying is true.
From “Why appropriation is not necessarily the same as mastery” by Luisa A. Igloria
That seems like a poem Allison Parrish would like.
February is short but it felt long. Just like this edition :) Please let me know if you found it useful or interesting!
Best, Lynn (@arnicas on the sfka twitter, mastodon, and bluesky and now Threads)
Great roundup as always @lynn!
Sad news about Inkitt’s pivot to AI, but not surprising. My mom edits (used to edit?) for Galatea, which involved practically rewriting poorly written stories line by line to make them grammatical and readable. Now that work has completely dried up. There may be some work in the future for people to clean up AI-generated text for this platform, but one thing generative AI doesn’t do so much is make grammatical mistakes or ill-formed sentences. The stories she cleaned up may have been messy, but they were human. (They were also largely variations on werewolf erotica, in case anyone is wondering what secret storytelling formula Inkitt has found to keep people glued to their screens all that time.)
Wattpad could be worth a look, though!