TTS Providers

Amazon Polly

Generate voiceover narration using Amazon Polly text-to-speech.

Setup

The Polly provider uses the @aws-sdk/client-polly package as a peer dependency. Install it alongside playwright-recast:

npm install @aws-sdk/client-polly

Credentials are resolved through the AWS SDK default credential chain: environment variables, the shared ~/.aws/credentials file, SSO, or an IAM role attached to your EC2, ECS, Fargate, or Lambda compute. No explicit keys are required when running on AWS infrastructure with a role attached.

# Local development (one option among many)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"
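
The resolution order can be pictured with the sketch below. It is not playwright-recast's actual implementation: `resolveClientConfig`, `CredentialOptions`, and `ResolvedClientConfig` are hypothetical names, and the precedence (explicit option, then environment variable, then the SDK default chain) is an assumption inferred from the defaults listed under Configuration options.

```typescript
// Hypothetical shapes, for illustration only.
interface CredentialOptions {
  region?: string
  accessKeyId?: string
  secretAccessKey?: string
  sessionToken?: string
}

type Env = Record<string, string | undefined>

interface ResolvedClientConfig {
  region: string
  // Left undefined when the AWS SDK default chain should find credentials.
  credentials?: { accessKeyId: string; secretAccessKey: string; sessionToken?: string }
}

function resolveClientConfig(options: CredentialOptions, env: Env): ResolvedClientConfig {
  // Assumed precedence: explicit option, then environment variable, then fallback.
  const region = options.region ?? env.AWS_REGION ?? 'us-east-1'
  const accessKeyId = options.accessKeyId ?? env.AWS_ACCESS_KEY_ID
  const secretAccessKey = options.secretAccessKey ?? env.AWS_SECRET_ACCESS_KEY
  const sessionToken = options.sessionToken ?? env.AWS_SESSION_TOKEN

  // Only pass static credentials when both halves of the key pair are present.
  if (accessKeyId && secretAccessKey) {
    return {
      region,
      credentials: { accessKeyId, secretAccessKey, ...(sessionToken ? { sessionToken } : {}) },
    }
  }
  // Otherwise return no credentials so the SDK falls back to profiles, SSO, or an IAM role.
  return { region }
}
```

Under these assumptions, calling `resolveClientConfig({}, process.env)` on a machine that only has `AWS_PROFILE` set returns just a region, and the SDK resolves the profile on its own.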

Usage

import { Recast } from 'playwright-recast'
import { PollyProvider } from 'playwright-recast/providers/polly'

await Recast
  .from('./traces')
  .parse()
  .subtitlesFromSrt('./narration.srt')
  .voiceover(PollyProvider({
    region: 'us-east-1',
    voice: 'Joanna',
    engine: 'neural',
  }))
  .render({ format: 'mp4' })
  .toFile('demo.mp4')

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| region | string | process.env.AWS_REGION ?? 'us-east-1' | AWS region |
| voice | string | 'Joanna' | Polly voice ID (e.g. Matthew, Ruth, Stephen, Ivy) |
| engine | 'standard' \| 'neural' \| 'long-form' \| 'generative' | 'neural' | Polly synthesis engine |
| sampleRate | '8000' \| '16000' \| '22050' \| '24000' | '24000' | Output sample rate (Hz) |
| languageCode | string | undefined | Override language for bilingual voices (e.g. 'en-US', 'cs-CZ') |
| textType | 'text' \| 'ssml' | 'text' | Set to 'ssml' to pass SSML markup |
| accessKeyId | string | process.env.AWS_ACCESS_KEY_ID | Explicit credential (skip the default chain) |
| secretAccessKey | string | process.env.AWS_SECRET_ACCESS_KEY | Explicit credential |
| sessionToken | string | process.env.AWS_SESSION_TOKEN | Optional session token (STS / temporary creds) |
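
When textType is 'ssml', the narration text has to be well-formed SSML, so reserved XML characters in plain text need escaping first. The helper below is a minimal sketch of that step. `escapeXml` and `toSsml` are hypothetical names and not part of playwright-recast, and how the library expects SSML to appear in your subtitle text is not specified here.

```typescript
// Escape the five characters reserved in XML so plain text is safe inside SSML.
// The ampersand must be replaced first to avoid double-escaping the other entities.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Wrap sentences in a <speak> root and separate them with a pause of pauseMs milliseconds.
function toSsml(sentences: string[], pauseMs = 300): string {
  const pause = `<break time="${pauseMs}ms"/>`
  return `<speak>${sentences.map(escapeXml).join(pause)}</speak>`
}
```

For example, `toSsml(['Open the dashboard', 'Click Save'])` produces a single `<speak>` document with one 300 ms break between the two sentences.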

Engines

Polly offers four engines with different quality / cost tradeoffs:

| Engine | Use case |
| --- | --- |
| standard | Cheapest, basic concatenative synthesis. Fine for short notifications. |
| neural | Default. High-quality neural voices, good for product demos. |
| long-form | Optimized for long-form content (podcasts, narration). |
| generative | Most expressive, conversational tone. Higher latency and cost. |

Not every voice supports every engine — see the Polly voice list for engine availability per voice.
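
A voice and engine pair can also be checked in code before rendering. Polly's DescribeVoices operation returns an `Id` and a `SupportedEngines` list for each voice. The sketch below assumes you have already fetched that list. `supportsEngine` and `VoiceInfo` are hypothetical names, and the sample entry is placeholder data, not a real voice.

```typescript
type Engine = 'standard' | 'neural' | 'long-form' | 'generative'

// Subset of the fields DescribeVoices returns for each voice.
interface VoiceInfo {
  Id: string
  SupportedEngines?: string[]
}

// True only if the voice exists in the list and advertises the requested engine.
function supportsEngine(voices: VoiceInfo[], voiceId: string, engine: Engine): boolean {
  return voices.some(
    (voice) =>
      voice.Id === voiceId &&
      (voice.SupportedEngines ?? []).some((supported) => supported === engine),
  )
}

// Placeholder data for illustration; substitute the real DescribeVoices response.
const sampleVoices: VoiceInfo[] = [
  { Id: 'ExampleVoice', SupportedEngines: ['standard', 'neural'] },
]
```

Calling DescribeVoices requires the polly:DescribeVoices permission, which is separate from polly:SynthesizeSpeech.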

Running on AWS

Polly is a natural fit when the rest of your pipeline runs on AWS:

  • EC2 / ECS / Fargate: attach an IAM role with the polly:SynthesizeSpeech permission. No API keys needed in the container.
  • Lambda: attach the same role to the function. Note that the full pipeline (Playwright + ffmpeg) is heavy for Lambda — Fargate is usually a better fit for the rendering step.

Minimum IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    }
  ]
}

CLI usage

npx playwright-recast -i ./traces --srt narration.srt --provider polly --voice Joanna
npx playwright-recast -i ./traces --srt narration.srt --provider polly --voice Matthew

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| AWS_REGION / AWS_DEFAULT_REGION | Recommended | AWS region for Polly. Defaults to us-east-1. |
| AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY | Optional | Explicit credentials. Not needed with IAM roles. |
| AWS_SESSION_TOKEN | Optional | Required when using temporary STS credentials. |
| AWS_PROFILE | Optional | Use a specific profile from ~/.aws/credentials. |
| RECAST_POLLY_ENGINE | Optional | MCP default engine. Defaults to neural. |
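
As an illustration of the RECAST_POLLY_ENGINE default, the sketch below reads the variable and falls back to neural when it is unset. `resolveEngine` is a hypothetical helper, and rejecting unknown values with an error is an assumption. The library may handle invalid values differently.

```typescript
const ENGINES = ['standard', 'neural', 'long-form', 'generative'] as const
type EngineName = (typeof ENGINES)[number]

function resolveEngine(env: Record<string, string | undefined>): EngineName {
  const raw = env.RECAST_POLLY_ENGINE
  // Unset or empty: use the documented default.
  if (raw === undefined || raw === '') return 'neural'
  for (const engine of ENGINES) {
    if (engine === raw) return engine
  }
  // Assumed behavior: fail loudly rather than silently picking an engine.
  throw new Error(`Unsupported RECAST_POLLY_ENGINE value: ${raw}`)
}
```

In a Node.js process this would be called as `resolveEngine(process.env)`.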
