Anxious about artificial intelligence changing everything? Not us... Not much. After a year of experimentation and embracing the new, it's time to celebrate the debut of our first AI film commissioned by a real-world client, The IVF Network, which came out last month. It's a great occasion to share some of the machine learning discoveries we made along the way.
Over the past year, Tilt have been racing to stay ahead of the game with the rapidly evolving technologies that are springing up everywhere as AI begins to seismically change our industry. Alongside our Tilt Talks exploring digital ethics, we’ve introduced regular internal AI Throw-downs, where team members share their experiments and help colleagues become inspired as opposed to overwhelmed. Whether it’s enhancing our proposals and storyboards with incredible MidJourney images, synthesising an AI Jimmy Stewart through D-ID and Eleven Labs trained on historic interviews, writing video games in ChatGPT and Replit, making insanely noisy audio crystal clear through AI voice isolation in Resolve, using simple AI motion capture in Rokoko for our Unreal Engine projects, experimenting with text-to-video / video-to-video tools like Gen-1 in Runway and Kaiber, making trippy music videos in Disco Diffusion, or training Stable Diffusion models through Google Colab to find new approaches to animation, it’s been an exhilarating time of experimentation for us all.
Jimmy Stewart does Dave Stewart (MidJourney, D-ID and Eleven Labs).
Clearly, AI technologies are in a Wild West phase at the moment, and Tilt have taken a very pragmatic approach of ‘let’s play with these things so we know the landscape’. Over time, a feeling of terror that we’re entering Skynet territory has been mostly replaced with the inspiration that a whole new set of incredible storytelling tools at our disposal brings. But let’s not get too carried away. The Jimmy Stewart clip is a good example of our experimenting with the technology that’s out there. It demonstrates what’s possible in a fair use, journalistic sense, but when it comes to any monetised project you need to understand the legal boundaries of plagiarism, and permission remains very much the key.
Enter The IVF Network. They asked us to create a marketing video for them on a limited budget, and we decided that now was the time to embrace our learnings and apply them to a real world live project – using AI as much as possible to see what can be achieved. Thankfully the client was game for trying something new. Infertility is a sensitive topic, and our challenge was to see if AI was capable of helping to produce something that felt both emotive and human – quite a task.
With animated AI it can be difficult to keep the image stable from frame to frame, as the AI reinterprets things slightly differently every time. Stills can be very accomplished, but when you string them together the animated sequence can be so confused and unpredictable that it feels like a crazy acid trip – great for a music video, but not for a film about fertility. Tools like Runway’s Gen-1 and Gen-2 are beginning to solve these problems, but at the time we only had access to Stable Diffusion in code form on Google Colab.
Tilt music video created by running drone footage through Stable Diffusion.
Kaiber AI vs. Gen-1.
Gen-2 text-to-video test.
We attempted to train AI models specifically so that Stable Diffusion would better understand our intentions. We began by creating very specific still images of human characters in MidJourney via Discord, with clear recognisable details in the prompt – ‘straight red hair in a ponytail’, or ‘with a short manicured beard and black plastic-framed glasses’ for example.
With such distinctive features, it would be forgiving on the eye if the shots produced varied slightly. We would generate two hundred or so images for each character from various angles, choose, say, fifty that best represented each specific character and use these shots to train a model of each character through TensorFlow on Colab. In theory, you can then get Stable Diffusion to generate the same character each time, in whatever pose or setting you like, or better still, film anyone against a green screen and transform them into specific AI animated characters from your footage. Anime Rock, Paper, Scissors is a brilliant, pioneering project by Corridor Digital that does this very well.
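As a rough illustration of the prompting approach described above, the sketch below repeats one fixed character descriptor across a grid of shots and angles, so that every image generated for a character shares the same recognisable details. The character names, shot types, angles and style string are hypothetical placeholders rather than the prompts we actually used, and the output is plain text to paste into MidJourney, not a call to any API.

```python
from itertools import product

# Hypothetical character names; the descriptors reuse the examples quoted above.
CHARACTERS = {
    "anna": "a woman with straight red hair in a ponytail",
    "tom": "a man with a short manicured beard and black plastic-framed glasses",
}

# Assumed variations: the article only says "various angles".
SHOTS = ["close-up portrait", "half-body shot", "full-body shot"]
ANGLES = ["front view", "three-quarter view", "side profile", "from behind"]


def build_prompts(descriptor: str, style: str) -> list[str]:
    """Return one prompt per (shot, angle) pair, always repeating the same descriptor."""
    return [
        f"{shot} of {descriptor}, {angle}, {style}"
        for shot, angle in product(SHOTS, ANGLES)
    ]


# The style string is an assumption standing in for a real brand style.
prompts = build_prompts(CHARACTERS["anna"], style="flat illustration, soft pastel palette")
for prompt in prompts[:3]:
    print(prompt)
```

Keeping the descriptor in one place means that a change to a character's look propagates to every prompt, which helps when generating a couple of hundred candidates before picking the best fifty.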
The results were pretty good, but it’s easy to get carried away with the technology and not consider ‘is this truly production-ready and right for the project?’ As with Marvel’s controversial AI-generated title sequence for Secret Invasion (described as “a discombobulating, slightly grotesque, AI-generated title sequence”), we didn’t think the technology was quite there yet for this particular client’s purpose.
We took the decision to limit our visual AI to MidJourney – treating it as a virtual illustrator and embedding the brand colours/style within the text prompts. For each still to be used in the film, we would produce multiple versions of it, and morph between these with frame-blending in After Effects to produce a purposeful animated effect – one that played into the narrative of the piece where the world around our central character is overwhelming.
Abstract MidJourney frame-blending animation tests.
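The idea behind morphing between several versions of a still can be sketched as a simple linear cross-dissolve. This is only an illustration of the principle, not After Effects' actual frame-blending implementation. Frames are modelled here as tiny greyscale grids of floats, and the function names and the `steps` parameter are our own assumptions.

```python
Frame = list[list[float]]  # greyscale image: rows of pixel values in the range 0.0 to 1.0


def blend(a: Frame, b: Frame, t: float) -> Frame:
    """Linear cross-dissolve between two frames: t=0 gives a, t=1 gives b."""
    return [
        [(1.0 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def morph_sequence(stills: list[Frame], steps: int) -> list[Frame]:
    """Blend through each consecutive pair of stills, producing `steps` frames per pair."""
    frames = []
    for a, b in zip(stills, stills[1:]):
        for i in range(steps):
            frames.append(blend(a, b, i / steps))
    frames.append(stills[-1])  # finish exactly on the last still
    return frames


# Three tiny 1x2 "variations" of the same still, standing in for MidJourney renders.
stills = [[[0.0, 1.0]], [[1.0, 0.0]], [[0.5, 0.5]]]
frames = morph_sequence(stills, steps=4)
print(len(frames))
```

Because each variation differs only slightly from the next, the in-between frames shimmer rather than jump, which is the effect we leaned on for the sense of an overwhelming world.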
While all this trial and error was going on, we were scripting with the help of ChatGPT. Where this worked best was in thinking of ChatGPT as another creative brain in the room to bounce ideas off. ChatGPT can very quickly condense lots of information into short scripts. Generally speaking these tend to be generic and dry, but they’re a great starting point that can spark inspiration. When you start asking things like ‘how about doing the same script, but this time…’, ‘Let’s have a point of positive turn-around in the middle where we introduce…’ or ‘How can we better build the emotion in this section…’ – that’s when ChatGPT really starts to shine.
Once we had the script in a good place, it was time to see what we could do with AI for the voiceover. We’ve been playing around with Respeecher, where you can synthesise artificial speech, using your own voice as an input. Similarly, Eleven Labs has been helping us to generate convincing dummy VOs for a while, and we wanted to see how it dealt with a production-quality VO – more specifically a sensitive, emotive voice for which you would normally require a good actor. We weren’t convinced that this would be possible, but we embarked on some general tests.
We imagined who we would like to tell the narrative if we had infinite budget. To find out what was possible, we experimented by training a model on the voice of Kate Winslet in Eleven Labs, based on two minutes of YouTube dialogue. The results were scarily close, including details such as intakes of breath. It was fun in an educational sense but, as with the Jimmy Stewart clip, anything beyond fair use for learning and critique would be on legally shaky ground for a commercial project without express permission.
Kate Winslet trained in Eleven Labs.
For the final VO we used the generative tools that Eleven Labs offer. By playing with the Stability and Clarity settings, and downloading multiple generated takes, it was possible to splice together an extremely plausible dialogue. The rest was a matter of weaving everything together within an After Effects comp, using 2.5D planes and typography in a traditional (human hand) motion sense, along with beautiful ‘ink-wash’ scene transitions.
All in all, we’re very pleased with the results – the culmination of an extensive learning process over the last twelve months that brought us our first animated AI commission for a very happy client. You can watch the finished piece below:
If you’d like to chat about how creative AI can help you, then please get in touch or sign up for our awesome new newsletter.