From Using AI to Leading It: My TTS Workflow Revolution

How I solved YouTube verification issues, used AI to direct other AI for better text-to-speech output, and created a workflow that transforms content creation using SSML and smart prompting.


Hey there!

Last week, I left you on a cliffhanger. I’d just discovered how impressive Google’s Text-to-Speech (TTS) was but hit a snag with YouTube. This week, I’m back with an update, a solution, and a workflow that will change how you think about creating audio content.

The First Hurdle: Verifying the YouTube Channel

As promised, I wanted to share the audio I generated. The first step was to get my new YouTube channel, Ideas Universe, ready for action. But when I tried to upload the audio file (as a video), I was blocked: unverified channels can’t upload anything longer than 15 minutes.

I needed to verify my channel with a phone number. Simple enough, right? Wrong. I kept getting a weird error. After some trial and error, I found the culprit: my phone number format.

Pro Tip: If you ever need to verify a YouTube channel, make sure to enter your phone number with a + and the country code (e.g., +38631123456). Using 00 instead of the + sign will result in an error. It’s a small detail, but it saved me a ton of frustration.
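If you keep numbers in the 00 style, here’s a minimal sketch of how the conversion could be automated. The function name to_international is my own invention for illustration, and the length check is a rough assumption based on the usual international format (a plus sign, then up to 15 digits), not something YouTube documents.

```python
import re


def to_international(number: str) -> str:
    """Return the number as '+<country code><subscriber number>'.

    Hypothetical helper for illustration only.
    """
    # Drop spaces, dashes, dots, and parentheses people often type.
    compact = re.sub(r"[\s\-().]", "", number)

    # Swap a leading international "00" prefix for the "+" sign.
    if compact.startswith("00"):
        compact = "+" + compact[2:]

    # Assumed sanity check: "+", a non-zero first digit, 7 to 15 digits total.
    if not re.fullmatch(r"\+[1-9]\d{6,14}", compact):
        raise ValueError(f"not in international format: {number!r}")
    return compact


print(to_international("0038631123456"))
print(to_international("+386 31 123 456"))
```

Both calls should give the same +386... form used in the tip above. A local number without a country code (like 031123456) raises an error instead of being guessed at.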

From “Using” AI to “Leading” It

The initial audio from Google’s AI Studio was good, but I knew it could be better. I started by manually adding instructions into the text, like [pause here] or [speak this part louder]. This worked, but I couldn’t imagine doing it with longer texts.

I thought, “There has to be a better way.”

What if I could use AI to direct the other AI?

The key to controlling TTS output is something called Speech Synthesis Markup Language (SSML). It’s a set of tags you can wrap around your text to control everything from pauses and emphasis to pitch and volume. Manually writing SSML is time-consuming and feels like coding.
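To give you a feel for what that markup looks like, here’s a small sketch that builds an SSML snippet in Python using the standard library. The tags (speak, break, emphasis, prosody) come from the SSML standard, but the specific values are my own picks for illustration, and which tags a particular TTS service actually honors varies, so treat this as an assumption to check against your tool’s documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml() -> str:
    """Build a short SSML document with a pause, emphasis, and louder speech."""
    speak = ET.Element("speak")  # root element of every SSML document
    speak.text = "This is the story of how I stopped using AI"

    # A half-second pause; the duration is an illustrative choice.
    pause = ET.SubElement(speak, "break", {"time": "500ms"})
    pause.tail = " and started "

    # Stress a single word.
    strong = ET.SubElement(speak, "emphasis", {"level": "strong"})
    strong.text = "leading"
    strong.tail = " it. "

    # Speak the quote louder and slightly higher.
    quote = ET.SubElement(speak, "prosody", {"volume": "loud", "pitch": "+10%"})
    quote.text = "This AI stuff is garbage,"
    quote.tail = " he said."

    return ET.tostring(speak, encoding="unicode")


print(build_ssml())
```

Even for two sentences, that’s a lot of structure to write by hand, which is exactly why I wanted to hand the job off.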

So, here’s the workflow I used:

  1. Find the Rulebook: I knew that high-end TTS services have detailed guides on how to prompt their models for the best results. I found a great prompting guide from ElevenLabs, known for its detailed instructions on achieving emotional and expressive range.
  2. Teach the AI: I opened up ChatGPT, pasted the link to the ElevenLabs documentation, and gave it a simple instruction: “Learn this.”
  3. Delegate the Task: I then gave ChatGPT a small part of my text and a new prompt: “Now, apply what you learned to my text.”

Here’s the before and after:

[Image: input in ChatGPT]

The AI-generated “director’s cut”:

[serious tone] This is the story of how I stopped “using” AI… and started leading it. [a pause, tone shifts to conversational curiosity] A few months ago, a friend called me, completely frustrated. [imitating a frustrated tone, slightly louder] “This AI stuff is garbage,” he said. [sighs]

The result was better. The AI had perfectly captured the narrative flow and emotional shifts, adding nuanced directions that would have taken me ages to write manually. It translated my intent into precise instructions for the TTS model.
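One sanity check worth considering before generating audio: make sure the AI only added directions and didn’t quietly rewrite your words. Here’s a rough sketch that separates the bracketed directions from the spoken text, so the spoken part can be compared with the original. The function name split_directions is hypothetical, and it assumes every direction sits in square brackets with no nesting.

```python
import re

# Matches one bracketed direction such as "[serious tone]" (no nested brackets).
TAG = re.compile(r"\[([^\[\]]+)\]")


def split_directions(annotated: str) -> tuple[str, list[str]]:
    """Return (spoken text without directions, list of directions in order)."""
    directions = TAG.findall(annotated)
    spoken = TAG.sub("", annotated)
    # Collapse the whitespace left behind where directions were removed.
    spoken = re.sub(r"\s+", " ", spoken).strip()
    return spoken, directions


annotated = (
    "[serious tone] This is the story of how I stopped using AI "
    "and started leading it. [sighs]"
)
spoken, directions = split_directions(annotated)
print(spoken)
print(directions)
```

If the spoken text no longer matches what you wrote, you know the AI changed more than the delivery.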

The Final Result: Hear it for Yourself

I took the text from this LinkedIn newsletter, generated the audio with Gemini, and uploaded it to my newly verified YouTube channel.

You can listen to the final version here:

Is ElevenLabs More Powerful? Yes, But…

Of course, a premium tool like ElevenLabs is incredibly powerful. It offers advanced features like professional voice cloning, a library of sound effects, and sophisticated editing tools that let you create multi-character audiobooks and automatically dub videos. You can create hyper-realistic clones of your own voice with enough data.

But here’s my point: for my specific need to quickly and efficiently turn a newsletter into a great-sounding audio version, Gemini was the perfect fit. The voice sounded more natural for my content, the process was seamless, and let’s not forget the best part: it’s free.

This journey taught me a valuable lesson. The real power of AI isn’t just in using the tools but in orchestrating them. It’s about finding clever ways to make different AIs work together to automate the boring parts and amplify your creativity.

What tedious task in your content creation process do you wish you could automate? Hit reply and let me know!

Talk soon, Primož

Email me to book your free 15-min AI strategy call.
