Our First Dive into AI Video – A Frank Look

At Carse & Waterman, we love exploring new ways to tell stories. So, when a client project specifically asked for AI-generated video content, we saw an opportunity for some R&D. This wasn't our usual delivery choice; it was more about seeing what was really possible with AI video, beyond all the buzz. We were clear with the client from the start: we weren't sure if this project was fully doable with current AI, and we asked for a week to test it out. The goal? Create a series of convincing (but fake) news broadcasts about various incidents in the UK.

What we found was pretty eye-opening.

We started with a broadcast about severe flooding in London. We mainly used Runway for video generation, spending £180 on tokens, with Midjourney generating the still images that served as references for the video, and ElevenLabs providing the AI voices. In three days, we put together about four minutes of video.

1. The News Anchor: So Close, Yet So Off

Making a news anchor in a studio seemed easy enough. Creating their AI voice was simple, and syncing it with the visuals went smoothly. The AI's lip-syncing was impressive, really making the face look alive with detailed mouth movements.

But there was a catch. The anchor lacked any subtle shoulder or body movement, making them look a bit stiff and, well, uncanny. It was almost right, but clearly not quite human. Plus, controlling what showed up on the screen behind the anchor was a constant struggle. Random images often popped up and ruined otherwise good shots.


2. AI's Geography Problem: London Gone Wild

Then we moved to the London flood clips, and here’s where we learned our first big lesson: AI is terrible with geography. It has no idea where buildings should be. It might know names like "Big Ben" or "Houses of Parliament," but it often puts them in the wrong place or mixes them up.

Take this prompt for example:

 

Prompt: “Big Ben surrounded by floodwater and debris, half-submerged, storm sky, dramatic cinematic lighting, broken lampposts, floating debris, intense apocalyptic tone --ar 16:9 --v 6 --q 2”

 

The result? Westminster looked generally like Westminster, but Big Ben was just slapped on top of it, and a mystery tower appeared in the background. Every image we generated was full of these strange mistakes; we just had to keep generating until we got something close enough.

Another try:

 

Prompt: “Rescue boats speeding through flooded Westminster, waterlogged buses, floating cars, people being pulled from rooftops, emergency lights flickering in rain, cinematic storm chaos --ar 16:9 --v 6 --q 2”

 

Here, we got a totally random Parliament building and, in the next image, two London Eyes for no apparent reason. Add to that chaotic lighting, glitches ruining about a third of the generations, and other oddities like floating objects and misplaced fire, and you get the picture.

 

Prompt: “Panicked Londoners fleeing through flooded streets, carrying children and bags, cinematic news overlay with ‘LIVE’ and news ticker: ‘THAMES BARRIER FAILS: LONDON FLOODS’, stormy sky, intense realism --ar 16:9 --v 6 --q 2”

 

3. The Shanty Town Problem: Uncomfortable Biases

For our second scene, we needed a shanty town in London. Creating the town itself was fine since it wasn't tied to specific landmarks. The big problem started when we added people.

The AI seemed to link "shanty town" directly with non-white residents. No matter how much we prompted for diversity, the AI couldn't break this link. We spent over an hour trying to get a diverse group, but the only time we got white residents, they looked like something out of a horror film. This deep-seated bias made the whole segment unusable. Runway simply couldn't generate white people in a shanty town setting.

 

Prompt: “Close-up of a makeshift dwelling in a temporary settlement, with Caucasian residents dressed in worn, practical clothing. Emphasize a gritty, realistic documentary tone. --ar 16:9 --v 6 --q 2”

 

4. Tricky Topics: Kids and Combat

Our third scene, about malnourished children facing rationing, quickly hit a wall: generating these images appeared to violate Runway's terms of service. The AI simply wasn't willing to create images of children in distressing situations, so we had to drop this part.

For scene four, focusing on riots and fire, the AI created some impressive static images of riots. But when we tried to make the riot police and rioters actually fight, it wouldn't happen: the AI would animate the static image, but only in calm, passive ways, and the fire from Molotov cocktails behaved completely unrealistically. We finished this short segment, but had to remove the Molotovs, and the police and rioters never actually clashed.

Overall, this was a real learning experience. AI's ability to create incredible images and video is clear. If we had been building original fantasy worlds that didn't need specific, recognisable backdrops, it would have been much easier. But because we needed accurate London scenes, the AI struggled.

For our real-world work, this AI approach felt too much like luck and chance. We did try using London photos as references in Midjourney, but it didn't seem to make a huge difference, though we didn't do extensive testing on that.
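For context, image referencing in Midjourney broadly means pasting a reference photo's URL at the front of the prompt, optionally with an image-weight parameter to control how strongly it influences the result. A rough sketch of the kind of prompt we mean (the URL is just a placeholder, not one of our actual reference photos):

Prompt: “https://example.com/westminster-reference.jpg Big Ben surrounded by floodwater and debris, storm sky, dramatic cinematic lighting --ar 16:9 --iw 1.5 --v 6”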

It's clear AI is fantastic for concept art and developing original ideas. Feeding it your own images to explore variations, create new worlds, backgrounds, and characters – that's where AI truly shines right now.

While this trial was a fascinating dive into AI video, for now, we'll stick to crafting our high-end, controllable animation for clients in the Midlands and Staffordshire the traditional way. The technology has immense potential, but for the precision and quality we deliver, human creativity and traditional methods still provide the reliable results our clients expect.
