THIS. IS. TENOCHTITLAN!

I’ve been neck deep in PyTorch lessons and I don’t have a whole lot to share yet. I’ve also had to review the math behind vector and matrix operations, so that’s all keeping me busy. Still, I wanted to keep exploring and playing around.
I always loved to read, but for whatever reason I just could not absorb the information I needed to learn from history books in elementary school. I hated history, mostly because I didn’t understand what a paragraph of text was telling me or why it was important. It wasn’t until I had a class with Mr. Goldthorpe in high school that I was finally able to appreciate history. The reason was that he was a really good storyteller. He would set the scene and describe what life was like. He would present world events in terms of the characters involved, and we would follow them along with a good understanding of why things happened, because we understood their motivations.
Mr. Goldthorpe had a special talent, but it did get me thinking about how history was presented to me and what could have made things easier in the absence of his genius. So I figured I’d try adding a visual component to a description of a historical event.
My thinking is, I’m going to take a paragraph from Wikipedia, get a comic-book script generated, have illustrations generated, and then add text for captions and speech bubbles. I’m sure we’re at a place where all of that is achievable, right? I’m never going to claim that any of this will replace what a writer and a comic book artist can achieve, but it might be able to represent something graphically so the information is easier to absorb.
So here’s the paragraph I plucked out of Wikipedia:
Following an earlier expedition to Yucatán led by Juan de Grijalva in 1518, Spanish conquistador Hernán Cortés led an expedition (entrada) to Mexico. The next year, Cortés and his retinue set sail for Mexico. The Spanish campaign against the Aztec Empire had its final victory on 13 August 1521, when a coalition army of Spanish forces and native Tlaxcalan warriors led by Cortés and Xicotencatl the Younger captured the emperor Cuauhtémoc and Tenochtitlan, the capital of the Aztec Empire. The fall of Tenochtitlan marks the beginning of Spanish rule in central Mexico, and they established their capital of Mexico City on the ruins of Tenochtitlan.
I’m already seeing the problems with this endeavor, but let’s just power through and see what we get. I put the GPT-4-generated script and the matching DALL-E illustrations below:
Panel 1:
Scene Description: A stormy sea with Hernando Cortés’s ships approaching the Mexican coast ominously.
Text Box: “In 1519, Hernando Cortés, a man driven by greed and ambition, embarked on a perilous journey.”
Cortés Speech Bubble: “Mexico will bow before us!”

OK, clearly some issues here, but the information seems clear enough. There’s a sly guy in Spanish garb who sailed across the sea. Never mind the snake-birds and the implications of the ships overflowing with people. Moving on…
Panel 2:
Scene Description: Cortés and his men, armed and imposing, land on the shores of Mexico, facing the lush jungles with determination.
Text Box: “Landing on foreign shores, Cortés eyed the rich lands with a conqueror’s gaze.”
Cortés Speech Bubble: “This land and its treasures shall be mine!”

Here we go! Now the problems begin. People are dressed as if from a different century, in all kinds of sombreros, many of them with feathers. We’re looking down on the ships landing and a crowd approaching. There are cathedrals and huts. The message could still be “Cortés arrives,” but now the context is all wrong. Continuing…
Panel 3:
Scene Description: A tense meeting between Cortés, his men, and wary Tlaxcalan warriors.
Text Box: “He sought allies among the Tlaxcalans, not for friendship, but for power against the Aztecs.”
Cortés Speech Bubble: “Join me, and share in the spoils of victory.”
Tlaxcalan Leader Speech Bubble: “We join, but with watchful eyes.”

Holey moley! The Tlaxcaltec were ripped! They were also Spartans and Nords, apparently. Does this still read? Yeah, but things are getting muddled. Are these the same guys as the ones on the hill? Who is talking to the Tlaxcaltec? Is one of them Xicotencatl? Let’s see what happens next…
Panel 4:
Scene Description: A dramatic battle scene with Cortés and his coalition forces attacking the Aztec warriors with advanced weapons.
Text Box: “With his new allies, Cortés unleashed war upon the Aztec Empire.”
Cortés Speech Bubble: “Forward! Let nothing stand in our path!”
Aztec Warrior Speech Bubble: “Defend our home! Protect Tenochtitlan!”

Badass! OK, now we’re talking! This reads as a battle, so message received. Problem is, everything else! Gotta love the fantasy horses, the firearms, the Spartan/Tlaxcaltec warrior garb, and the pyramid palace haunted by Dementors. I want to watch this movie! Cortés is nowhere to be found. Typical. It’s also really hard to tell who is fighting whom. I’m guessing the poster boy is Xicotencatl? What happens next?
Panel 5:
Scene Description: The capture of Tenochtitlan, with Cortés standing victoriously as the city burns and the emperor Cuauhtémoc is captured.
Text Box: “The once-mighty Tenochtitlan fell, its people and their emperor defeated.”
Cortés Speech Bubble: “Tenochtitlan is mine!”
Cuauhtémoc Speech Bubble: “You may capture our city, but never our spirit.”

I’m guessing that huge, frustrated homunculus is Cuauhtémoc? The message of destruction is clear, but the characters and events are not. Also, the architecture is an interesting blend. Last one…
Panel 6:
Scene Description: The ruins of Tenochtitlan with Cortés overseeing the construction of Mexico City, indifferent to the suffering around him.
Text Box: “On the ruins of a great civilization, Cortés built Mexico City, a symbol of his conquest.”
Cortés Speech Bubble: “A new empire rises—on my terms.”
Child’s Voice (off-panel): “And so, a dark chapter in history was written.”
Closing Box: “The fall of Tenochtitlan marked the painful birth of Spanish rule in Mexico, forever altering the lives of its people.”

I am so glad this illustration exists, and it is bonkers! I love the style, which seems to blend lithograph with mural art. It has it all: pyramids, colonial palaces, modern cities, actively crumbling ruins, airplanes, flying skulls, calaca citizens, exploding vegetables… Not the goal, but there is something so delightful to me about the absurdity of it all.
Anyway, this was an informal and whimsical exploration. It wasn’t meant to be a proof of concept or a test of how viable this kind of tool could be. What it does do is point out the challenges of building a straight-up text-to-comic tool, which include, but are definitely not limited to:
- Art style – I don’t expect the output to be particularly creative. In fact, I don’t want it to be. We just need a visual cue. However, a big part of the appeal driving the explosion of image-generating AI is being able to say “look how easy it is to make stuff that looks just as good as any professional artist’s work.” As a result, these models get trained on what we deem to be “quintessential comic illustrations” in order to produce those results. I feel like we need an informational model trained exclusively on airline safety brochures to get something that just reads clearly, rather than whatever we’re getting in the delightfully random images above.
- Cultural depictions – It is dangerous business trying to depict anyone from another culture and a different time period. You’re going to get a lot of misconceptions and mismatched results. It’s interesting to see the cracks, like the heroic Tlaxcaltec: “heroic” has evidently been learned from a bunch of Norse buff-boys, so we just got them dressed up in Aztec warrior garb and called it a day.
- Time and Place – Closely related to the depiction of people is the depiction of place and time. I think this is less “problematic” in the sense that it won’t necessarily offend as much, but it is a big deal when you’re trying to teach history and there are modern firearms and airplanes in the illustrations. The event can still be communicated, but you’re also doing the opposite of educating if you present false or misleading illustrations.
- Consistent characters – I know there are techniques for keeping characters consistent, and that seems pretty important here. Even more important, the prompts didn’t emphasize the characters enough, so we lost them pretty quickly. Following Hernán, seeing Xicotencatl reluctantly join him, and then seeing Cuauhtémoc defeated are all at the heart of this story, and we did not follow their journey at all.
- Speech Bubbles – I don’t have a way to do this yet, but by now I’ve seen enough of how DALL-E handles text to know that lettering would have to be an entirely separate process. Placing the caption boxes and the bubbles could be trained, I suppose, but rendering the text itself would have to happen outside the image model, I would think.
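One way to keep the words out of the image model entirely is to split the generated script before anything gets drawn: send only the scene description to the image generator, and hold the captions and dialogue as plain text for a later lettering step. Here is a minimal Python sketch of that split for the panel format GPT-4 produced above, assuming every field sits on its own "Label: text" line. The `Panel` class and the `parse_script` and `image_prompt` helpers are hypothetical names made up for illustration, not part of any library.

```python
import re
from dataclasses import dataclass, field


# Hypothetical record: one entry per comic panel.
@dataclass
class Panel:
    number: int
    scene: str = ""  # goes to the image model as the prompt
    captions: list[str] = field(default_factory=list)  # "Text Box" / "Closing Box" lines
    dialogue: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)


PANEL_RE = re.compile(r"^Panel\s+(\d+):?$")  # matches "Panel 1" and "Panel 2:"
FIELD_RE = re.compile(r"^([^:]+):\s*(.*)$")  # "Label: text", split on the first colon
QUOTES = "\"“”"


def parse_script(script: str) -> list[Panel]:
    """Split a panel-by-panel comic script into scene prompts and lettering text."""
    panels: list[Panel] = []
    for raw in script.splitlines():
        line = raw.strip()
        header = PANEL_RE.match(line)
        if header:
            panels.append(Panel(int(header.group(1))))
            continue
        field_match = FIELD_RE.match(line)
        if not panels or not field_match:
            continue  # skip blank lines and anything before the first panel
        label = field_match.group(1).strip()
        text = field_match.group(2).strip().strip(QUOTES)
        panel = panels[-1]
        if label == "Scene Description":
            panel.scene = text
        elif label.endswith("Box"):
            panel.captions.append(text)
        else:
            # Any other label is treated as dialogue:
            # "Cortés Speech Bubble" -> "Cortés"; other labels are kept whole.
            panel.dialogue.append((label.removesuffix(" Speech Bubble"), text))
    return panels


def image_prompt(panel: Panel, style: str = "clear, simple line illustration") -> str:
    """Build an image prompt from the scene only, asking for no lettering."""
    return f"{style}. {panel.scene} No text, captions, or speech bubbles."
```

The "no text" wording in `image_prompt` is only a request; an image model may still scribble letters into the picture, so the captions and dialogue kept in each `Panel` would still need their own drawing pass on top of the finished illustration.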
So there is a long way to go on this, but it was still a lot of fun to attempt.