Voiceover

The .voiceover() stage sends subtitle text to a TTS provider and generates audio narration timed to match the video. The resulting audio is mixed into the final output.

Basic usage

import { Recast, OpenAIProvider } from 'playwright-recast'

await Recast
  .from('./traces')
  .parse()
  .subtitlesFromSrt('./narration.srt')
  .voiceover(OpenAIProvider({ voice: 'nova' }))
  .render({ format: 'mp4' })
  .toFile('demo.mp4')

Provider concept

Voiceover requires a TTS provider. playwright-recast ships with three built-in providers:

OpenAI TTS

import { OpenAIProvider } from 'playwright-recast/providers/openai'

.voiceover(OpenAIProvider({
  voice: 'nova',             // alloy, echo, fable, onyx, nova, shimmer
  model: 'gpt-4o-mini-tts',
  speed: 1.2,
  instructions: 'Calm, professional demo narration.',
}))

Requires the OPENAI_API_KEY environment variable or the apiKey option.

ElevenLabs

import { ElevenLabsProvider } from 'playwright-recast/providers/elevenlabs'

.voiceover(ElevenLabsProvider({
  voiceId: 'onwK4e9ZLuTAKqWW03F9',   // Daniel
  modelId: 'eleven_multilingual_v2',
  languageCode: 'cs',                  // Force language (ISO 639-1)
}))

Requires the ELEVENLABS_API_KEY environment variable or the apiKey option.

Amazon Polly

import { PollyProvider } from 'playwright-recast/providers/polly'

.voiceover(PollyProvider({
  region: 'us-east-1',
  voice: 'Joanna',          // Matthew, Ruth, Stephen, Ivy, …
  engine: 'neural',         // standard | neural | long-form | generative
}))

Credentials resolve via the AWS SDK default chain — env vars, shared config, or an IAM role on EC2/ECS/Fargate/Lambda.

See the Providers section for full provider documentation.
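A missing key usually surfaces only when the first TTS request fails, so a small pre-flight check can be handy in CI. The sketch below is not part of playwright-recast: resolveApiKey is a hypothetical helper, and preferring the apiKey option over the environment variable is an assumption made for illustration.

```typescript
// Hypothetical pre-flight helper, not a playwright-recast API.
type Env = Record<string, string | undefined>

// Returns the key to use, or throws with a readable message.
// Assumption: an explicit apiKey option wins over the environment variable.
function resolveApiKey(envVar: string, env: Env, apiKey?: string): string {
  const key = apiKey ?? env[envVar]
  if (!key) {
    throw new Error(`Missing TTS credentials: set ${envVar} or pass apiKey`)
  }
  return key
}
```

In a Node script this could be called as resolveApiKey('OPENAI_API_KEY', process.env) before building the pipeline. Polly does not need it, since its credentials come from the AWS SDK default chain.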

Each subtitle entry generates a separate TTS audio clip. Each clip is placed at its subtitle's start time, with silence filling the gaps between clips. When a clip is shorter than its subtitle's duration, the rest of that window is silent. When a clip would run longer, the speed processor can fast-forward to accommodate.
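The placement rule can be sketched as a small function. This is an illustration of the arithmetic only, not the library's implementation: the Cue and PlacedClip types and the placeClips name are assumptions, and clip durations are taken as already measured.

```typescript
// Illustrative types and names; none of these are playwright-recast APIs.
interface Cue { startMs: number; endMs: number; text: string }

interface PlacedClip {
  startMs: number        // the clip begins at the subtitle's start time
  clipMs: number         // duration of the synthesized audio
  silenceAfterMs: number // silence left in the window when the clip is shorter
  overflowMs: number     // amount by which the clip exceeds the window
}

function placeClips(cues: Cue[], clipDurationsMs: number[]): PlacedClip[] {
  return cues.map((cue, i) => {
    const windowMs = cue.endMs - cue.startMs
    const clipMs = clipDurationsMs[i] ?? 0
    return {
      startMs: cue.startMs,
      clipMs,
      silenceAfterMs: Math.max(0, windowMs - clipMs),
      overflowMs: Math.max(0, clipMs - windowMs),
    }
  })
}
```

A non-zero overflowMs marks the case where the narration does not fit its subtitle window and the timeline has to give somewhere.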

Two kinds of hold can stretch the timeline, and both extend the audio, subtitles, and video together so narration never drifts out of sync:

Audio-overflow freezes — when a narration's audio is longer than its visual window (often at a waitForNarration() boundary), the renderer holds the last frame until the line finishes.
Click approach holds — the click() helper freezes the painted frame so the cursor can glide in (approachMs). When voiceover is present these holds are folded into the same audio/subtitle shift, so spoken lines after a held click stay aligned.
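To see why a shared shift keeps everything in sync, here is a minimal sketch of how holds could move subtitle cues along a stretched timeline. The TimedCue and Hold types and the shiftCues function are hypothetical, and the rule that a hold affects every timestamp at or after it is an assumption for illustration, not the renderer's actual algorithm.

```typescript
// Hypothetical model of timeline holds; not playwright-recast internals.
interface TimedCue { startMs: number; endMs: number }
interface Hold { atMs: number; durationMs: number } // freeze inserted at atMs on the original timeline

function shiftCues(cues: TimedCue[], holds: Hold[]): TimedCue[] {
  // Total freeze time inserted at or before original time t.
  const offsetAt = (t: number): number =>
    holds.reduce((sum, h) => (h.atMs <= t ? sum + h.durationMs : sum), 0)

  return cues.map((cue) => ({
    startMs: cue.startMs + offsetAt(cue.startMs),
    endMs: cue.endMs + offsetAt(cue.endMs),
  }))
}
```

With cues at 0-2000 ms and 2000-4000 ms and a 500 ms hold at 2000 ms, the first cue stretches to 0-2500 ms and the second moves to 2500-4500 ms. Applying the same offsets to the audio clips and video frames is what keeps later lines aligned.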

Text processing

For best TTS results, add Text Processing before voiceover:

.subtitlesFromSrt('./narration.srt')
.textProcessing({ builtins: true })
.voiceover(OpenAIProvider({ voice: 'nova' }))

Text processing cleans smart quotes, em dashes, and other typographic characters that cause artifacts in TTS output, without affecting the displayed subtitle text.
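As a rough picture of this kind of cleanup, the sketch below normalizes a few typographic characters for the spoken text while leaving the displayed text untouched. The function names and the specific replacements (for example, turning a dash into a comma pause) are assumptions for illustration and may differ from the built-in rules.

```typescript
// Illustrative normalizer; the real built-in rules may differ.
function normalizeForTts(text: string): string {
  return text
    .replace(/[\u2018\u2019]/g, "'")         // curly single quotes to straight
    .replace(/[\u201C\u201D]/g, '"')         // curly double quotes to straight
    .replace(/\s*[\u2013\u2014]\s*/g, ', ')  // en and em dashes to a comma pause
    .replace(/\u2026/g, '...')               // ellipsis character to three dots
    .replace(/\s+/g, ' ')                    // collapse runs of whitespace
    .trim()
}

// Keep both versions: one for the on-screen subtitle, one for the TTS request.
function prepareCue(text: string): { display: string; spoken: string } {
  return { display: text, spoken: normalizeForTts(text) }
}
```

Keeping display and spoken as separate fields mirrors the behavior described above: the subtitle on screen stays typographically correct while the provider receives plain ASCII punctuation.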

CLI equivalent

# OpenAI
npx playwright-recast -i ./traces --srt narration.srt --provider openai --voice nova

# ElevenLabs
npx playwright-recast -i ./traces --srt narration.srt --provider elevenlabs --voice onwK4e9ZLuTAKqWW03F9

# Amazon Polly
npx playwright-recast -i ./traces --srt narration.srt --provider polly --voice Joanna

Tips

Voiceover requires subtitles. Add a subtitle stage (.subtitlesFromSrt(), .subtitlesFromTrace(), or .subtitles()) before .voiceover().
If using text processing, place it between subtitles and voiceover in the pipeline.
TTS providers require network access: an API call is made for each subtitle entry.
Combine with Background Music for a professional result — music auto-ducks during voiceover.