TITAA #28: Visual Poetry, Humans and Humanoids
Text2Image with Midjourney - Creative AI - NLP & Text Gen - Book Recs - A Cheese.
[A slightly unusual newsletter edition, with a longer intro article on AI art. Following are the usual links to CreativeAI/NLP/Storytelling/Books. Web permalink here.]
Last month I posted a panel of AI-generated images that had really impressed me, and said that the tech had really leaped forward; well, turns out a bunch of them had been made with the beta release of Midjourney. I joined the beta, and there’s virtually no point in making another panel of good stuff, there’s just too much work that is excellent. Let’s just talk about why MJ is fun and expressive.
What is Midjourney?
Midjourney is pre-announcement phase, so I can’t say much (and don’t know much) about them yet. Their Twitter bio calls them “a new research lab,” and they have a link you can use to apply to their beta program.
Midjourney is using diffusion models. If you are a bit technical and want to learn about the evolution from GANs to diffusion models, this video is a good overview. (Thanks @nicoptere for finding it.)
Importantly, the command to the image generation tool is SIMPLE to make, and you get GOOD RESULTS from simple prompts, and you get them FAST. This encourages play and experimentation and incidentally makes new users feel great. Fantastic on-boarding experience. Here’s what I get from the short request for “a witch’s cabin in the woods”:
The experience is easier than using a Disco Diffusion notebook on Google colab. There is an entire complicated guide to how to use the settings in Disco Diffusion (see my links section below for more info on DD and a view of the controls). Midjourney doesn’t need a complicated guide in order to get good results.
Generally Midjourney folks have done a super job with the beta community internally and externally, including asking the public on Twitter what they want visualized (I helped with the emoji prompt creations). They’re a very communal service right now, with people working together, visibly, which is something I really appreciate. You can learn a lot from watching how other people get their results.
My Experiments With Poetry
When I got access, I tried illustrating poetry lines immediately, and had some outstandingly evocative results.
Around this time, Clive Thompson posted about his results with Emily Dickinson’s “Because I could not stop for Death” using Wombo.art, which are, uh, not as nice. He fed it in couplet by couplet, and thought the results were lacking in subtlety and showed no understanding of metaphor. Here’s his results for “Because I could not stop for death, he kindly stopped for me” (which, alright, he also asked for Steampunk, which, ok):
Here’s my first shot on MJ, 2 renders from using 4 content lines, not 2, which I think was important.
Needless to say, the results are visually poetic, in my opinion. I think Emily might have liked them. I tried again on the bot in EleutherAI’s Discord run by @BoneAmputee, which uses @RiversHaveWings’s 512x512 diffusion model. This was the prompt I tried there (cargo culting other people’s prompts for most of the switches):
“.diffusion Because I could not stop for Death – He kindly stopped for me – The Carriage held but just Ourselves – And Immortality. behance, deviantart. -clip_guidance_scale 30000 -tv_scale 45000 -range_scale 25 -clamp_max 0.1 -cutn 128 -cutn_batches 2 -cutn_whole_portion .25 -cutn_bw_portion .25 -seed 1 -cut_pow .5 -h 5500 -w 5500 -skip_timesteps 0”
It’s still excellent! More work to get to it, but high quality and also poetic.
Some other lines produced great results in Midjourney, such as:
Thompson makes various low-hanging points about these models not understanding metaphor, lack of long-range consistency of theme, etc; I think these are all parameters we can work with while collaborating with these tools. It’s a process. I should note, however, that it’s a bit harder to compose exactly the scene you want, semantically (see below on “Make-A-Scene”), and that people, some animals, and body parts are tough to make coherent right now. (Don’t try “wombat.”) However, I still made fun “boring business meeting” pictures.
While I was musing over artistic degrees of freedom available, I revisited an old talk I had given about digital poetry and other AI text toys at Eyeo 2016 (slides). (Side note, that talk was actually the genesis of this newsletter. My talk slides are great, I was impressed by myself and wonder if I’ve just been sliding downhill ever since.) ANYWAY: In them, I refound Caroline Bergvall’s amazing VIA, a poem composed of the subtly differing English translations of the first lines of Dante’s Inferno. For instance, the MacKenzie (1979) translation:
Midway upon the journey of our life, I found that I had strayed into a wood So dark the right road was completely lost.
Given the aptness of the lines, I tried some in Midjourney.
The visual, like the text, are indeed different, in nuanced ways. And I haven’t even tried to control style or look in any way yet.
A Note on Being and Feeling Creative
What makes you feel creative using this (I do), or what right do I have to feel creative using this tool? People say “thank you” when you tell them their image turned out great. But indeed, they chose the prompt, they iterated to reach an image goal.
Some people are great at funny, clever, surprising prompts. It commences as a writing activity, after all. Or a selection/curation activity, in my poetry experiments. And as you can tell from existing guides to prompt writing and prompt art style experiments like this, there are techniques to getting the “right” results from these models, due to what’s currently in the training data captions.
People are sharing other “technical” tips for how to get original and interesting effects. Check the astounding results in the example links section below. Even without all the controls that are available in Disco Diffusion, there are ways to be an artist and feel ownership of your results here.
“it is human nature to get bored of things and to seek the novel.… [But if you want to be more than a dilettante], one of the skills is to learn to substitute nuance for novelty.” — Angela Duckworth (quoted in my really very good Eyeo 2016 slides)
The usual analogy for new tools like this is the one comparing photography and painting: Compared to painting, photography wasn’t initially considered an art form till the artists with the method showed what could be done with it. After all, it just captured “reality,” right? In this case, we’re working with many millions of averaged photos and paintings with a clever search algorithm; that’s a kind of reality, too. Being an artist means differentiating how you look at things, and what you choose to look at.
A few MJ fun experiments I’ve seen…
@mrdoob did a series of portraits of software developers with Midjourney’s model, and we can see the image dataset bias at work here in the load of bearded dudes, the only female-presenting one being the Ruby dev. (See this paper for notes on bias, porn, violence etc in these large labeled image datasets.) But also note the semantic confusion the model has about what, e.g., an AI developer or Python developer are. Are they robots and snakes who do development? And it’s charmingly at sea over kubernetes, like most of us.
I liked @mattdesl’s photographic walking tour of alt-London. (I assume made with MJ.)
This thread of “boring corporate meetings” in the style of various artists by @hermansaksono is quite funny. I then did a thread of “boring business meetings” in various weird places.
Bladerunner scenes made with different photographic lens/style prompt modifications, by @Biernacki. These are amazing.
Ryan Moulton’s article on depth of field is illustrated with MJ images (@moultano).
I’ve really enjoyed Helena Sarin’s red cats (e.g., here and here), which I only learned recently were made with Midjourney. Her usual distinctive style! It is possible to differentiate yourself!
@karpi, who is evidently hilarious, has visualized Paddington Bear Framed for Murder and Vladimir Putin Having Trouble Opening a Jar of Pickles. I’ve seen other funny works of theirs under development. (Thanks @grossbart.)
Martin Wattenberg used MJ to generate clip art for slides (you know, instead of using a free image site for pennies; jk, Martin). It’s a clever thread of tips on how to get what you want.
Other AI Art/Tech Links & Resources
I made a list of AI Art folks to follow on Twitter, which I am growing. Feel free to suggest people.
You can browse what the gigantic LAOIN image dataset “knows about” using this web tool built with clip and k-nearest neighbors. It’s a useful diagnosis mechanism for bad style or content results with Disco Diffusion or Midjourney. For instance, a search for “Dr. Seuss” shows that most of the content is book covers with text, and very little actual internal book art, so “in the style of Dr. Seuss” won’t help you much.
Disco Diffusion Asides
If you aren’t in the beta, probably the closest antecedent for Midjourney’s model(s) is the diffusion model trained by Katherine Crowson that is used in the popular Disco Diffusion colab notebooks (the 512x512 unconditioned fine-tuning of the ImageNet model from OpenAI, seen in many colabs as model “512x512_diffusion_uncond_finetune_008100.pt”). You can also get great results using this.
DD Tips: Things to help if you want to play with knobs: Zippy’s Disco Diffusion cheat sheet has notes on many of the technical params, but little on the text prompt format. EZ Charts is another diffusion parameter setting guide with visual results. There is a Diffusion artist studies page by Harmeet which is probably not up to date yet (but follow @sureailabs and @proximasan to see them in progress, they’ve done 700 artists, I think?). In Disco Diffusion, the best CLIP model choice is generally ViT-L/14 if you have memory for it.
There is a new Streamlit UI based colab from @multimodalart called MindsEye with Disco Diffusion and the Hypertron2 VQGAN model.
The LAOIN Discord has an actual “#colab-notebooks” channel with pinned info on cool text2image colabs out there. Also, Kia maintains this list of links. Nice, thanks to @KiaManniquin!
Scene Generation
Make-A-Scene made the rounds this week (someone’s attempt to recreate the code seems to be starting here). It’s a more procedural way to construct a scene using semantics and relations. Yes please, or at least, also? Many people have thought of generating on-demand children’s books when confronted by Midjourney — that idea is executed in a supporting video with Make A Scene here. The image quality is worse than Midjourney, but the semantic composition is sometimes better than I can achieve when I try to spell things out.
Similarly, this paper on Autoregressive Image Generation Using Residual Quantization has code that promises slightly more controlled content generation. @multimodalart tried it out and it’s heavy and slow for mortals to use, but the results look pretty good?
Yesterday I saw people experimenting with some updates to Jack0’s GLID-3, which are photorealistic and fast to generate. Using meta b’s colab (from the LAION Discord, thanks to @multimodalart!), I got this when I asked for a little red boat in a stormy ocean, showing reasonable comprehension of the content relations, although the storm wasn’t superb:
Captioning Stuff
ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer, emotional and narrative text description generation from images. No code yet, but I’m excited.
I really enjoy the ballsy, saucy captions the BATbot in EleutherAI’s #art channel produces (thanks to dev by @BoneAmputee). The bot is using this, the “personality” caption model from @dzryk. E.g., for a green man I generated using the EleutherAI Discord BATbot diffusion model, the caption is shown below:
Style-Babel dataset of artistic annotations for images: “No prior dataset of fine-grained artistic style description exists.” This kind of annotation effort will improve the type of input/output relations we can use in models trained on them.
Misc Other Arty
Code for Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics. There’s a demo on Huggingface.
CLIP-Mesh: I mean, who doesn’t want to use text to generate 3D models from text. Soon we can. Well, actually, we can, with this CLIP-Matrix notebook from @NJetchev.
Poetry about Paintings (“ekphrasis”): A NYT deep dive on Musée de Beaux Arts, the Auden poem, and the Bruegel painting it describes. I tried some of it (the expensive delicate ship bit) in Midjourney and didn’t get great results maybe due to a Bruegel confusion. You can check out my results of trying to generate the art described by the poem in the style of the artist who painted it here.
Data Science Links
Stop Aggregating Away the Signal in Your Data, a post and deep dive by the very smart Zan Armstrong from Observable.
Tool for fetching Google Street View panoramas in js, PanomNom.js.
Max Woolf’s imgembeddings lib for CLIP should be small and fast. Lots of demo notebooks.
NLP
A tutorial on how to build a text style transfer app using Gradio.
SeeKeR is an OS text gen model from parl.ai that uses a search engine to stay up to date and reduce hallucinations.
The Embedding Comparator, by Boggust et al., a paper and tool to allow exploration of dataset embeddings for most similar and least similar. For instance, with GloVe embeddings of Wikipedia and news on the left, and Twitter on the right, the word “dick” has very different neighbors. Tom, Dick, and Harry are buddies on the newsy left, and body parts are buds on Twitter.
Related, Emblaze is a Jupyter notebook embedding comparison tool using animated scatterplots.
TripMap: another dimensionality reduction method, now with a Jax implementation. There are a couple Colabs.
Classy-classification: Use SpaCy with few shot or zero shot classification for sentences.
Concise-Concepts: SpaCy for NER with embedding neighbors to help, few-shot.
A library of R scripts for large scale text mining, from Andrew Piper.
Text Generation
An examination of decoding strategies for text generation models, Wiher et al.
TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge, Tan et al.
Mix and Match: Learning-free Controllable Text Generation using Energy Language Models, by Mireshghallah et al., with code.
You.com is offering text generation too now. OpenAI with GPT-3 is offering inline content creation, i.e., paying attention to context and creating sandwiched new text (“edit and insert”). Better than single word masking generation!
“Loom” from Latitude.io (makers of AI Dungeon) is a new alpha-release GPT-J/GPT-Neo front-end that allows you to interact with the model options with a visible graph of your text. I actually found it confusing and the text quality out of the box was not great (it reproduced the end of a famous poem, then stuck in lines from other famous poems, etc). You have to pay to play with it.
Poetry gen: A very good article on using GPT-J, fine-tuned, to generate haiku, by Robert Gonsalves. “I used the GPT-J 6B model from Eluther as the basis for Deep Haiku. After quantizing the model down to 8-bits to run on a Google Colab, I fine-tuned it using the filtered Haikus as training data for ten epochs, which took 11 hours on Google Colab with a Tesla V100 GPU.” He does due diligence with toxicity and topic checking too.
RiTa: I’ve been a fan of this JS text analysis and generation lib for a while now, but just saw that there have been some cool updates and demo tutorials on Observable, by Daniel Howe. I don’t remember seeing the friendly looking RiScript for text gen before. There’s even a tutorial on text generation in Twine with RiScript.
Games/Story Links
“Annals of the Parrigues,” Emily Short’s procgen travel guide of an imaginary country, has great tips on procgen content at the end and is available here. (A reminder from Cat Manning.)
HiddenDoor, an AI-powered game tech platform, launched their site this week and was covered in Venture Beat. (Disclosure: I work there!)
This Bad Writer game looks a bit too depressing for me (but funny).
Downpour, an game-building tool for mobile from the wonderful @v21.
Papers
TaleBrush: Sketching Stories with Generative Pretrained Language Models, by Chung et al. This uses GPT-Neo plus GeDi to steer generation using narrative arcs drawn by the user. The site with video promises code soon.
Immersive Text Game and Personality Classification, Li et al, using Myers-Briggs stuff.
A paper on visual storytelling from image sequences using knowledge graphs, Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling, Li et al.
CLSEG: Contrastive Learning of Story Ending Generation by Xie et al.
Pruned Graph Neural Network for Short Story Ordering by Golestani et al.
Books
The Embroidered Book, by Kate Heartfield. (Fantasy) Alt-history of Europe in late 18th century, centered on Marie Antoinette and her sister queen Charlotte in Naples. Both have encountered and learned some magic, but they take different paths, one joining the secret society of magicians who limit its use, the other trying to free it for the world. Both want to use it to make their subjects’ lives better. But war is coming, and I felt more and more blue towards the end. It’s still very good.
The Atlas Six, by Olivie Blake. (YA Fantasy) Apparently popular via #BookTok on TikTok, it’s a great page turny read about a creepy training school for powerful magicians who will get access to the alt-dimensional hidden Library of Alexandria. One of them will not make it, but who? Lots of politicking. Cliffhanger.
The Paradox Hotel, by Rob Hart. (SF) A hotel for time travelers is having some Issues. The chief security officer is in late stages time travel illness (“unstuck”), and she keeps seeing flashes of the future and the past, including her dead lover. Plus, a dead body no one else sees. Good thriller time travel romp. You even get baby dinos.
The Rabbit Back Literature Society, by Pasi Ilmari Jääskeläinen. (Fantasy/Magical Realism) In a small Finnish town, the newest member of an exclusive literary society investigates after the society’s founder disappears in a snow storm inside her house. She starts to play “The Game,” in which the society’s members ambush one another to force each other to spill their truths. It’s full of twisted fairy tale elements— ghosts and nixies and gnomes— and long-buried memories. Compulsively fascinating.
⭐ The Employees, by Olga Ravn. (SF) This is a novella-length book of interview excerpts by HR staff with the employees on a space ship above an alien planet. Morale on the ship is not good, because a class divide has grown between the humans and humanoids (synthetic workers). Also, mysterious artifacts collected from the planet are impacting everyone psychologically. I expected to be annoyed by the lit-fic-ness of this but was utterly drawn in. It’s funny and sad and lovely. There is a lot in here about memory and the ineffable, not to mention about work; and I found myself pondering Midjourney and AI art while reading about “humanoid” musings on things they’ve never known but imagined, and the way the humans remembered earthly things in patchwork perfect snippets.
… as I climbed one of the hills and looked out over the woods, the ducks suddenly came flying in arrow formation from beyond the trees and passed over my head. They were quacking loudly as they flew, and I breathed in deeply. I stored that landscape inside me forever. The only thing I think about now is that day. The day I experienced something that wasn’t part of the program.
For length and “meh” reasons, I am skipping my month of TV and games.
Poem
You Will Notice a Faithful Block of Cheese is Present
it cannot harm you
it doesn’t even see you
or know that you’re here
it is merely a bewitched cheese
that is faithful to me
and also does some typing
and filing, organizing really
it is an extremely wonderful cheese
do not be alarmed
I am its master
Matthew Rohrer (2020) (via Matt Ogle’s newsletter)
Whew, this one was hard and long to write, and apologies if it was off-topic for what you expect usually. If you liked it, drop me a note. Back to more folklore and games and the regular recs-fest next time! Hang in there.
Best, Lynn / @arnicas