Saturday, January 16, 2021

Computer AI Comics

There's been an ever-growing influx of AI running increasingly complex formulas to recognize the imagery and language that many of us intuit naturally.  But there's still one area where computers are sorely lacking, and that would be the realm of storytelling.  Sure, there's the humorous Botnik website, with its mock algorithms displaying a kind of warped dream-logic storytelling that showcases programs almost but not quite getting it.  It's given us classics such as a Star Trek: The Next Generation script, a lost Harry Potter chapter where all kinds of weird stuff happens, and Keaton Patti's claim of forcing a bot to watch 1,000 hours of media and write a script based on what it watched, then showing us the first page.

This has led to the amusing practice of turning these human-made, computer-prompted scripts into actual comics.  A kind of recursive creativity.  But this practice is quite different from what actual computers do.  See, predictive text, while useful in spurts, has limited practical use.  It can't plan ahead for long-term storytelling, such as planting a Chekhov's gun to be fired later.
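To give a rough sense of why, here's a minimal sketch of the kind of word-level Markov chain that Botnik-style predictive keyboards are loosely built on (the corpus, function names and settings below are my own placeholders, not anyone's actual code).  The model only ever looks at the last couple of words, so a detail planted early on has no mechanism for paying off later:

```python
import random
from collections import defaultdict

def build_model(text, order=2):
    """Map each pair of consecutive words to the words seen following it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model[key].append(words[i + order])
    return model

def generate(model, length=30):
    """Start from a random key and keep appending a plausible next word."""
    key = random.choice(list(model.keys()))
    output = list(key)
    for _ in range(length):
        choices = model.get(tuple(output[-2:]))
        if not choices:  # dead end: this word pair never appeared in the corpus
            break
        output.append(random.choice(choices))
    return " ".join(output)

# A toy corpus standing in for a pile of scripts; real tools ingest far more.
corpus = "the captain looked at the viewscreen and the viewscreen looked back at the captain"
print(generate(build_model(corpus)))
```

Run it a few times and you get that familiar almost-but-not-quite cadence: locally plausible phrases with no memory of where the story has been or where it's going.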

When we look at a comic, there's more than one way to do so.  Some people look at the text first before moving on to the pictures.  Some look at the pictures first before reading the words.  And some skip reams of redundant dialogue and narration, jump to the end entirely to see how it all turns out, then go back to see how it got to that point.  Those approaches work fine for humans, but a machine's thought process is much different.  Instead of looking at an individual panel, wresting meaning from it, and making connections from one panel to the next, the computer has no idea what's going to happen next.

To parse the difference between words and pictures, computers would have to take in two disparate elements of fiction at the same time.  One way to bypass this overwhelming influx of information would be, rather than analyzing every page on a panel-by-panel basis, to absorb the entire page at once.  This would be equivalent to devouring the entirety of a novel in one go.  There's no sense of pacing, of slowing down or speeding up individual panels for personal dramatic effect.  There's no wandering eye trailing from the text to the images (or images to text) and back.  It's akin to speed-reading on a quantum level.

Storytelling is basically posing a puzzle, making do with limited information gradually revealed over time.  When the story is finished, we may mull it over, trying to figure out what made parts of it work and others not.  What possible reason could a machine have for going over the tropes of a story, other than trying to parse the obvious connections that crop up over time?  (See TV Tropes.)  For a machine to appreciate comics, it would have to understand narrative.  And the best way to do that would be to create works that speak to its tastes.

I’m sure you can see the logical fallacy here.

In order to create a comic that would appeal to machines, we'd have to devise an algorithm that would cater to their preferences.  And what appeals to machines may very well be indecipherable to human taste.  Here's an example of a comic created by an AI program:

As creative entries go, it's certainly different.  The poetic, nonsensical prose reminds me of Josei narratives.  Still, the lack of clear, cohesive storytelling and the abstract art make telling what's going on near impossible.  What's the comic even about?  That's the kind of maddening question that's going to plague linguistic comics scholarship.

The AI also has trouble with exaggerated facial expressions, even though a human could easily tell when someone's deforming their body under extreme stress or for comedic effect.  Then there are the psychedelic backgrounds used to heighten emotion, or the unique page layouts used to showcase optical illusions.  All various ways of using the medium to convey a message.

What we're seeing is like a child's first attempt at making a comic without really understanding what they're creating.  Of course it's going to be a narrative mess!

Before machines can conduct stories of their own, they need to understand how stories work.  And the best way to encourage them is to give them a basic setup, then have them try to figure out what happens next.  The whole appeal of cliffhangers is that the audience is filled with anticipation for what may or may not happen in the next installment, which can lead to runaway imaginations that can sometimes be better than the actual product.

You know how in movies they show the T-800 using its infrared sight, getting data helpfully analyzed for the viewer?  Chances are a smarter computer would multitask, taking in thousands of pertinent and irrelevant data points all at once.

It's the difference between seeing a movie play out and reading the Wikipedia summary.  The webpage could be taken in as a whole, but it wouldn't have the same impact as the film, despite both sources containing the same information.

Computers have enough difficulty deciphering a single image all on its own.  How are they supposed to figure out how two seemingly meaningless individual panels connect?

You'd think the randomized Garfields would be a logical place to start, but the Garfield remixes that have been popularized online only work because they're given limited parameters.  With only three panels to choose from, the arrangement is completely randomized, with no thought given to whether the resulting mishmash is funny or not.  No actual creativity is applied there.
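For the curious, here's a minimal sketch of what that kind of shuffler boils down to, with a made-up pool of panel filenames standing in for the real scraped archives: draw three panels at random and line them up.  There's no scoring step and no model of humour; it's a dice roll, which is the whole point.

```python
import random

# A stand-in pool of pre-cut panels; the real generators pull from
# thousands of panels clipped out of decades of strips.
panel_pool = [
    "jon_talks_01.png", "garfield_stares_07.png", "odie_runs_03.png",
    "lasagna_gag_12.png", "monday_rant_05.png", "empty_kitchen_09.png",
]

def random_strip(pool, panels=3):
    """Draw panels at random and line them up as a 'strip'.
    Whether the result reads as a joke is left entirely
    to the human looking at it."""
    return random.sample(pool, panels)

print(" | ".join(random_strip(panel_pool)))
```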

The computer-generated Garfield remix, by contrast, results in panels where the trio of the cast (Jon, Odie and Garfield) morph into each other and into the background, while the text winds up backwards and nonsensical, with no meaning attached to any of the made-up panels.  (Check the link to see the animated sequence in action.)  In that light, that's an *actual* presentation of a truly randomized Garfield.

I posted an entry arguing that the best way for a computer to learn would be through constant repetition of the same layouts with different captions, so it could figure out the pattern of setup and punchline.  The formulaic Wizard of Id and BC strips would be better fodder for finding patterns than anything else, since their consistent use of setup, timing and punchline would give computers a way to work out the formula much sooner.
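At its crudest, "figuring out the formula" might look something like the toy sketch below, where the strips and captions are invented placeholders rather than real Wizard of Id or BC dialogue: tally which punchline most often follows each repeated setup.

```python
from collections import Counter, defaultdict

# Toy data: each strip is (setup panels, punchline caption).
strips = [
    (("king_on_balcony", "peasant_heckles"), "RAISE THE TAXES"),
    (("king_on_balcony", "peasant_heckles"), "RAISE THE TAXES"),
    (("king_on_balcony", "peasant_heckles"), "LOWER THE DRAWBRIDGE"),
    (("caveman_on_rock", "ant_walks_by"), "THWACK"),
]

# Count which punchline tends to follow each repeated setup.
formula = defaultdict(Counter)
for setup, punchline in strips:
    formula[setup][punchline] += 1

# The "learned formula" is just the most common punchline per setup.
for setup, punchlines in formula.items():
    print(setup, "->", punchlines.most_common(1)[0][0])
```

It's glorified bookkeeping rather than comedy, but that's roughly the level a machine has to start at before timing and subversion mean anything to it.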

I think comics are still something solely dominant in the human domain... for now.  (Randomized Garfields notwithstanding.)  The computer-generated manga page I posted earlier may seem impressive at first, but it was actually only half of a page.  Even if you saw the other half, you'd hardly notice.  Do the two sides even narratively connect?  Look at the repetitive panel layout!  Apart from the speech bubbles bleeding over the borders, it's static and boring.  This?  This is the computer equivalent of doodling.  Free association without coherent goals, letting ideas flow from one subject to another.

For computers to understand a narrative, it would be helpful to see them adapt another work of fiction into their own interpretation.  Then we could see how much of their own take they put in, depending on what they add and/or subtract from the narrative.  Though I would suggest starting with smaller, simpler stories before moving on to more complex works such as Frankenstein.  But even the simplest fairy tales have a kind of internalized dream logic, with an inherent morality, that would escape their mindset.

At least there's SOMETHING we puny humans are better at than computers!  Until a machine can comprehend the elaborate, innovative storytelling of Cerebus, we've got a useful source of amusement.

"I HAVE FOUND MY CALLING."

"Don't give up your day job."