Amazon Polly

Setup

The Polly provider uses the @aws-sdk/client-polly package as a peer dependency. Install it alongside playwright-recast:

npm install @aws-sdk/client-polly

Credentials are resolved through the AWS SDK default credential chain: environment variables, ~/.aws/credentials, SSO, or an IAM role attached to your EC2, ECS, Fargate, or Lambda compute. No explicit keys are required when running on AWS infrastructure.

# Local development (one option among many)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"
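In the AWS SDK for JavaScript v3, a client such as `PollyClient` uses the default chain only when it is created without a `credentials` value. The following is a minimal sketch of how explicit keys could take priority over that chain. The option names mirror the provider's `accessKeyId`, `secretAccessKey`, and `sessionToken` options, but the helper `explicitCredentials` is hypothetical and not part of playwright-recast's API.

```typescript
// Hypothetical option shape mirroring the provider's credential options.
interface CredentialOptions {
  accessKeyId?: string
  secretAccessKey?: string
  sessionToken?: string
}

interface StaticCredentials {
  accessKeyId: string
  secretAccessKey: string
  sessionToken?: string
}

// Return static credentials only when both keys are present. Otherwise return
// undefined, so an SDK v3 client falls back to its default credential chain
// (env vars, shared config files, SSO, container/instance roles).
function explicitCredentials(opts: CredentialOptions): StaticCredentials | undefined {
  const { accessKeyId, secretAccessKey, sessionToken } = opts
  if (!accessKeyId || !secretAccessKey) return undefined
  return sessionToken
    ? { accessKeyId, secretAccessKey, sessionToken }
    : { accessKeyId, secretAccessKey }
}
```

With the SDK installed, the result could be passed as `new PollyClient({ region, credentials: explicitCredentials(opts) })`. Leaving `credentials` undefined keeps the default chain in charge, which is what you want on EC2, ECS, Fargate, or Lambda.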

Usage

import { Recast } from 'playwright-recast'
import { PollyProvider } from 'playwright-recast/providers/polly'

await Recast
  .from('./traces')
  .parse()
  .subtitlesFromSrt('./narration.srt')
  .voiceover(PollyProvider({
    region: 'us-east-1',
    voice: 'Joanna',
    engine: 'neural',
  }))
  .render({ format: 'mp4' })
  .toFile('demo.mp4')

Configuration options

Option	Type	Default	Description
`region`	`string`	`process.env.AWS_REGION ?? 'us-east-1'`	AWS region
`voice`	`string`	`'Joanna'`	Polly voice ID (e.g. `Matthew`, `Ruth`, `Stephen`, `Ivy`)
`engine`	`'standard' \| 'neural' \| 'long-form' \| 'generative'`	`'neural'`	Polly synthesis engine
`sampleRate`	`'8000' \| '16000' \| '22050' \| '24000'`	`'24000'`	Output sample rate (Hz)
`languageCode`	`string`	`undefined`	Override language for bilingual voices (e.g. `'en-US'`, `'cs-CZ'`)
`textType`	`'text' \| 'ssml'`	`'text'`	Set to `'ssml'` to pass SSML markup
`accessKeyId`	`string`	`process.env.AWS_ACCESS_KEY_ID`	Explicit credential (skip the default chain)
`secretAccessKey`	`string`	`process.env.AWS_SECRET_ACCESS_KEY`	Explicit credential
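`sessionToken`	`string`	`process.env.AWS_SESSION_TOKEN`	Optional session token (STS / temporary creds)
`cacheDir`	`string`	`undefined`	Directory for the on-disk audio cache; omit to disable disk caching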
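To show how these options map onto Polly itself, here is a minimal sketch that applies the documented defaults and builds an object whose field names follow Polly's `SynthesizeSpeech` request (`Text`, `VoiceId`, `Engine`, `OutputFormat`, `SampleRate`, `TextType`, `LanguageCode`). The helpers `resolveRegion`, `toSynthesizeSpeechRequest`, and `plainTextToSsml` are hypothetical, and MP3 output is an assumption based on the `.mp3` cache files. playwright-recast's internal mapping may differ.

```typescript
type Engine = 'standard' | 'neural' | 'long-form' | 'generative'
type SampleRate = '8000' | '16000' | '22050' | '24000'
type TextType = 'text' | 'ssml'

// Hypothetical mirror of the options in the table above (credential and cache options omitted).
interface PollyOptions {
  region?: string
  voice?: string
  engine?: Engine
  sampleRate?: SampleRate
  languageCode?: string
  textType?: TextType
}

// Field names follow Polly's SynthesizeSpeech request parameters.
interface SynthesizeSpeechRequest {
  Text: string
  VoiceId: string
  Engine: Engine
  OutputFormat: 'mp3' // assumption: MP3 output, matching the .mp3 cache files
  SampleRate: SampleRate
  TextType: TextType
  LanguageCode?: string
}

// Region precedence as listed in the table: option, then AWS_REGION, then us-east-1.
function resolveRegion(opts: PollyOptions, env: Record<string, string | undefined>): string {
  return opts.region ?? env.AWS_REGION ?? 'us-east-1'
}

// Apply the documented defaults and build the request for one piece of text.
function toSynthesizeSpeechRequest(text: string, opts: PollyOptions = {}): SynthesizeSpeechRequest {
  const request: SynthesizeSpeechRequest = {
    Text: text,
    VoiceId: opts.voice ?? 'Joanna',
    Engine: opts.engine ?? 'neural',
    OutputFormat: 'mp3',
    SampleRate: opts.sampleRate ?? '24000',
    TextType: opts.textType ?? 'text',
  }
  // LanguageCode is only needed to select a language on bilingual voices.
  if (opts.languageCode) request.LanguageCode = opts.languageCode
  return request
}

// Turn plain text into minimal SSML: escape XML specials (& first), then wrap in <speak>.
function plainTextToSsml(text: string): string {
  const escaped = text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  return `<speak>${escaped}</speak>`
}
```

When `textType` is `'ssml'`, Polly expects the text to be a `<speak>` document, so raw `&` or `<` characters in narration must be escaped first. With `@aws-sdk/client-polly` installed, an object of roughly this shape is what you would pass to `new SynthesizeSpeechCommand(...)`.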

Engines

Polly offers four engines with different quality / cost tradeoffs:

Engine	Use case
`standard`	Cheapest, basic concatenative synthesis. Fine for short notifications.
`neural`	Default. High-quality neural voices, good for product demos.
`long-form`	Optimized for long-form content (podcasts, narration).
`generative`	Most expressive, conversational tone. Higher latency and cost.

Not every voice supports every engine — see the Polly voice list for engine availability per voice.
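Polly's `DescribeVoices` API reports a `SupportedEngines` list for each voice, so engine availability can be checked before synthesis instead of failing at request time. The following is a minimal sketch that assumes the voice list has already been fetched (for example with `DescribeVoicesCommand` from `@aws-sdk/client-polly`). `pickEngine` and its preference order are hypothetical and not part of playwright-recast.

```typescript
type Engine = 'standard' | 'neural' | 'long-form' | 'generative'

// Minimal slice of a DescribeVoices voice entry.
interface VoiceInfo {
  Id: string
  SupportedEngines: Engine[]
}

// Return the first engine in `preferred` that the voice supports,
// or undefined if the voice is unknown or supports none of them.
function pickEngine(
  voices: VoiceInfo[],
  voiceId: string,
  preferred: Engine[] = ['neural', 'standard'],
): Engine | undefined {
  const voice = voices.find((v) => v.Id === voiceId)
  if (!voice) return undefined
  const supported = voice.SupportedEngines
  return preferred.find((engine) => supported.includes(engine))
}
```

The default preference tries `neural` (the provider's default engine) and falls back to `standard`. Pass a different order, such as `['generative', 'neural']`, to prefer more expressive engines where a voice offers them.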

Running on AWS

Polly is a natural fit when the rest of your pipeline runs on AWS:

EC2 / ECS / Fargate: attach an IAM role with the polly:SynthesizeSpeech permission. No API keys needed in the container.
Lambda: attach the same role to the function. Note that the full pipeline (Playwright + ffmpeg) is heavy for Lambda — Fargate is usually a better fit for the rendering step.

Minimum IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    }
  ]
}

CLI usage

npx playwright-recast -i ./traces --srt narration.srt --provider polly --voice Joanna
npx playwright-recast -i ./traces --srt narration.srt --provider polly --voice Matthew

Environment variables

Variable	Required	Description
`AWS_REGION` / `AWS_DEFAULT_REGION`	Recommended	AWS region for Polly. Defaults to `us-east-1`.
`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`	Optional	Explicit credentials. Not needed with IAM roles.
`AWS_SESSION_TOKEN`	Optional	Required when using temporary STS credentials.
`AWS_PROFILE`	Optional	Use a specific profile from `~/.aws/credentials`.
`RECAST_POLLY_ENGINE`	Optional	Default Polly engine for MCP usage. Defaults to `neural`.
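The following is a minimal sketch of reading `RECAST_POLLY_ENGINE`, validating it against the four Polly engines, and falling back to `neural` when it is unset. The helper `engineFromEnv` is hypothetical. This section does not specify whether playwright-recast rejects an unknown value or silently falls back, and this sketch chooses to fail fast.

```typescript
const ENGINES = ['standard', 'neural', 'long-form', 'generative'] as const
type Engine = (typeof ENGINES)[number]

function isEngine(value: string): value is Engine {
  return (ENGINES as readonly string[]).includes(value)
}

// Read the engine from an env-like object; unset or blank means the default.
function engineFromEnv(env: Record<string, string | undefined>): Engine {
  const raw = env.RECAST_POLLY_ENGINE?.trim()
  if (!raw) return 'neural'
  if (!isEngine(raw)) {
    throw new Error(`RECAST_POLLY_ENGINE must be one of ${ENGINES.join(', ')}; got "${raw}"`)
  }
  return raw
}
```

Passing the environment in as a parameter (for example `engineFromEnv(process.env)`) keeps the function easy to test without mutating global state.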

Disk caching

Set cacheDir to skip Polly API calls for text the provider has already synthesized in a previous run. The cache key is a SHA-256 over (text, voice, engine, sampleRate, languageCode, textType).

PollyProvider({
  voice: 'Joanna',
  engine: 'neural',
  languageCode: 'en-US',
  cacheDir: './.recast-cache/polly',
})
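The cache key can be sketched as a SHA-256 digest over the six listed fields. The fixed-order JSON serialization is an assumption, because the provider's exact encoding is not documented here. It is chosen so that field boundaries stay unambiguous (`"ab" + "c"` cannot collide with `"a" + "bc"`). `cacheKey` is a hypothetical helper.

```typescript
import { createHash } from 'node:crypto'

// The fields the cache key is documented to cover.
interface CacheKeyInput {
  text: string
  voice: string
  engine: string
  sampleRate: string
  languageCode?: string
  textType: 'text' | 'ssml'
}

// SHA-256 (hex) over a fixed-order JSON array of the key fields.
// Assumption: the real provider may serialize these fields differently.
function cacheKey(k: CacheKeyInput): string {
  const payload = JSON.stringify([
    k.text,
    k.voice,
    k.engine,
    k.sampleRate,
    k.languageCode ?? null,
    k.textType,
  ])
  return createHash('sha256').update(payload, 'utf8').digest('hex')
}
```

Changing any listed field (for example switching from `Joanna` to `Matthew`) produces a new key and a fresh API call. Region and credentials are not among the listed fields, so a cached file is reused regardless of which region produced it.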

Cache layout: <cacheDir>/<hash>.mp3 (flat). Omit cacheDir to disable disk caching; identical requests within a single batch are still synthesized only once (intra-batch dedup).
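The following is a minimal sketch of that caching behavior. It uses a flat `<cacheDir>/<hash>.mp3` read-through cache, plus an in-memory map so that repeated keys within one batch share a single synthesis call even when `cacheDir` is omitted. `createAudioCache`, `cachePath`, and the injected `synthesize` callback are hypothetical, and playwright-recast's internals may differ.

```typescript
import { mkdir, readFile, writeFile } from 'node:fs/promises'
import * as path from 'node:path'

// Flat layout: <cacheDir>/<hash>.mp3
function cachePath(cacheDir: string, hash: string): string {
  return path.join(cacheDir, `${hash}.mp3`)
}

function createAudioCache(cacheDir?: string) {
  // Intra-batch dedup: one in-flight promise per key, with or without cacheDir.
  const inFlight = new Map<string, Promise<Uint8Array>>()

  async function load(hash: string, synthesize: () => Promise<Uint8Array>): Promise<Uint8Array> {
    if (cacheDir) {
      try {
        return await readFile(cachePath(cacheDir, hash))
      } catch {
        // Cache miss (missing or unreadable file): fall through to synthesis.
      }
    }
    const audio = await synthesize()
    if (cacheDir) {
      await mkdir(cacheDir, { recursive: true })
      await writeFile(cachePath(cacheDir, hash), audio)
    }
    return audio
  }

  return function get(hash: string, synthesize: () => Promise<Uint8Array>): Promise<Uint8Array> {
    let pending = inFlight.get(hash)
    if (!pending) {
      pending = load(hash, synthesize)
      inFlight.set(hash, pending)
      // Forget failed attempts so a later call can retry.
      pending.catch(() => inFlight.delete(hash))
    }
    return pending
  }
}
```

Deduplicating on the promise rather than the finished buffer means two concurrent requests for the same text still trigger only one Polly call.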
