How GenerateAudio Works

GenerateAudio helps creators, teams, and businesses turn text into natural-sounding speech for podcasts, videos, audiobooks, and voice-over workflows. The app uses Google Text-to-Speech to produce human-like audio, with flexible controls for voice model, speaking style, output format, and delivery workflow. This page explains the core features, limits, and production options.

Generated audio can be used for personal or commercial projects, subject to your rights in the source material, applicable law, and the usage rules described in our Terms of Use.

Get Started Today

Create a free account and unlock an always-free monthly package of 1,000 credits for eligible accounts.
Each monthly refresh is sufficient for generating up to 10,000 characters of premium audio, depending on the voice model you choose.

Credits & Taxes

Credits Policy

Paid Credits: Purchased credits do not expire. Your paid balance remains valid until used.
Monthly Free Package: Eligible accounts receive the current free package amount every month. The monthly free balance resets to the configured package size, and unused free credits do not roll over.
No Refunds: Credits are non-refundable. Unused credits cannot be refunded or converted to cash.
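
The policy above can be pictured as two separate credit pools. The following is a minimal sketch for illustration only, not GenerateAudio's billing code: the class, its field names, and the 1,000-credit package size (borrowed from the Get Started section) are assumptions.

```python
from dataclasses import dataclass

# Assumed package size; the real amount is whatever the service has configured.
FREE_PACKAGE_CREDITS = 1_000


@dataclass
class CreditBalance:
    """Hypothetical model of an account's paid and free credit pools."""
    paid: int = 0
    free: int = 0

    def monthly_refresh(self) -> None:
        # The free pool resets to the package size, so unused free credits
        # do not roll over. Paid credits are left untouched: they never expire.
        self.free = FREE_PACKAGE_CREDITS

    def purchase(self, credits: int) -> None:
        if credits <= 0:
            raise ValueError("credits must be positive")
        self.paid += credits

    @property
    def total(self) -> int:
        return self.paid + self.free


balance = CreditBalance(paid=500, free=120)
balance.monthly_refresh()
print(balance.free, balance.paid, balance.total)  # 1000 500 1500
```

The sketch deliberately leaves out how a synthesis job draws from the two pools, because this page does not state the order in which free and paid credits are spent.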

Tax Information

Taxes at Checkout: Any applicable taxes are calculated by Stripe at checkout based on the billing details and location provided during payment.
Business Details: Stripe may collect the billing information required to determine the correct tax treatment for your purchase.
Receipts and Invoices: Purchase confirmations and related payment documents are issued through Stripe / Lemon Squeezy as part of the checkout flow.

How to Use the App

Input: Paste or type your text directly into the synthesis window. Logged-in users can also upload text from a plain .txt file.
Configure: Select your desired language, choose a voice model (including Custom Voices if you've created any), adjust audio parameters, and optionally enter a name for your output file.
Choose Mode: Select between Short Audio (up to 5,000 bytes, instant) or Long Audio (up to 100,000 bytes, asynchronous processing).
Generate & Download: Click "Generate Audio". For short audio, you'll get instant results. For long audio, monitor progress and download when complete.
AI speech synthesis does not always produce identical results. Test a few variations to find the tone, pacing, and style that best match your content.

Languages and Voices

Our library includes 380+ voices across 75+ languages and regional variants, including less common options for multilingual production workflows.

Voice Model Breakdown

Chirp3-HD: The latest generation of premium Text-to-Speech technology. Available in 30 distinct styles and designed to capture nuanced, expressive human intonation.
Neural2: High-quality, human-like speech for general applications, still at a competitive price.
Wavenet: Legacy voice. Warm, good-quality speech at a very competitive price point, and a significant upgrade over Standard voices.
Standard: Legacy voice. Cost-effective model, available across most supported languages. May sound robotic.
Custom: Personalized synthetic voices created from your own voice samples using Chirp3-HD technology. Available only to registered users. See the Custom Voice Feature section below for detailed information.

For pricing information on voice models, please visit our Pricing page.

Key Production Controls

Audio formats:

WAV (LINEAR16): Lossless, studio-quality audio for professional use (default).
MP3: Ideal for small file sizes.

Fine-grained control over four audio parameters:

Speech Speed (Rate): Adjust the speaking pace from 0.5x (slow) up to 2.0x (fast), with 1.0x being the voice's natural speed.
Volume Gain (dB): Increase or decrease the loudness of the voice (in decibels) to perfectly match your production needs.
Pitch: Shift the voice's tone higher or lower by up to ±10 semitones. (Note: This control is disabled for Chirp3-HD high-fidelity voices, as they are optimized for a fixed, natural pitch).
Sample Rate (kHz): Select the audio resolution.

We encourage you to experiment with these controls to achieve the exact vocal performance your content requires!
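
The documented ranges can be checked before you submit a job. The sketch below is one possible validation helper, not part of the app: the function name and the model strings are hypothetical, and it covers only the limits stated above (rate 0.5x to 2.0x, pitch within ±10 semitones, pitch disabled for Chirp3-HD). Volume gain and sample rate are left out because this page gives no numeric range for them.

```python
def validate_audio_params(voice_model: str, rate: float = 1.0, pitch: float = 0.0) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the documented ranges."""
    problems = []
    if not 0.5 <= rate <= 2.0:
        problems.append(f"rate {rate} is outside 0.5-2.0")
    if voice_model == "Chirp3-HD":
        # Pitch control is disabled for Chirp3-HD voices.
        if pitch != 0.0:
            problems.append("pitch cannot be changed for Chirp3-HD voices")
    elif not -10.0 <= pitch <= 10.0:
        problems.append(f"pitch {pitch} is outside -10 to +10 semitones")
    return problems


print(validate_audio_params("Neural2", rate=1.25, pitch=-2.0))  # []
print(validate_audio_params("Chirp3-HD", pitch=3.0))
# ['pitch cannot be changed for Chirp3-HD voices']
```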

Custom Voice Limitations

Custom voices are currently only available for short audio synthesis (up to 5,000 bytes). For long audio synthesis (up to 100,000 bytes), please use regular voices such as Chirp3-HD, Neural2, Wavenet, or Standard.
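
The interaction between text size, mode, and voice type can be summarized in a few lines. This is a sketch under assumptions: the function name and return values are hypothetical, and it simply picks the smallest mode that fits, whereas in the app you choose the mode yourself.

```python
SHORT_AUDIO_MAX_BYTES = 5_000
LONG_AUDIO_MAX_BYTES = 100_000


def pick_mode(byte_count: int, custom_voice: bool) -> str:
    """Return "short" or "long", or raise ValueError if the text cannot be synthesized as-is."""
    if byte_count <= SHORT_AUDIO_MAX_BYTES:
        return "short"
    if custom_voice:
        # Custom voices are limited to Short Audio.
        raise ValueError("custom voices support Short Audio only (up to 5,000 bytes)")
    if byte_count <= LONG_AUDIO_MAX_BYTES:
        return "long"
    raise ValueError("text exceeds the 100,000-byte Long Audio limit")


print(pick_mode(4_200, custom_voice=True))    # short
print(pick_mode(60_000, custom_voice=False))  # long
```

With a custom voice and more than 5,000 bytes of text, the practical options are to split the text into shorter pieces or to switch to a regular voice.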

Understanding Characters vs Bytes

Characters: The visible symbols in your text (letters, numbers, spaces, punctuation).
Bytes: The actual storage size of your text, which varies by language.

In some languages, a single character requires 2 or 3 bytes of storage. For example:
English/Latin: Most characters use 1 byte, for example English "Hello" = 5 characters = 5 bytes, while Czech "Dobré ráno" = 10 characters = 12 bytes.
Non-Latin: Characters often require 2 or 3 bytes, for example Chinese "你好" = 2 characters = 6 bytes, Japanese "こんにちは" = 5 characters = 15 bytes, and Arabic "مرحبا" = 5 characters = 10 bytes.

Our system uses byte limits for synthesis (5,000 bytes for Short Audio, 100,000 bytes for Long Audio). When you input text, the app displays both character count and byte count so you can monitor your usage accurately.
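
If you want to check a text before pasting it in, the two counts are easy to reproduce. The sketch below assumes the byte count is the UTF-8 encoded size; this page does not name the encoding, but the example figures above are consistent with UTF-8. The function name is hypothetical.

```python
def text_size(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for the given text."""
    return len(text), len(text.encode("utf-8"))


# The sample strings from the examples above.
for sample in ["Hello", "Dobré ráno", "你好", "こんにちは", "مرحبا"]:
    chars, size = text_size(sample)
    print(f"{sample}: {chars} characters, {size} bytes")
```

Note that Python's len() counts Unicode code points, so accented letters typed as a base letter plus a combining mark would be counted as two characters rather than one.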

Note: All audio operations are tracked on your My Account page, where you can view your operation history and download files. Download access expires after 24 hours.

Custom Voice Feature

This feature allows you to clone your voice from short audio samples, giving you a unique voice model that only you can use.

Who Can Use Custom Voices

Custom Voice creation is available exclusively to registered users. Anonymous users cannot create or use custom voices.

How It Works

The Custom Voice creation process requires just two short recordings:
Consent Recording (~10 seconds): Read a consent statement that authorizes Google to use your voice for synthesis. This recording must match the exact consent script word-for-word.
Reference Recording (~10 seconds): Provide a natural sample of your speaking voice. The content doesn't matter; just speak naturally with your desired tone and energy.

Once both recordings are submitted, our system processes them to generate your unique voice cloning key. Your custom voice will then appear in the voice selector for all your future short audio synthesis projects. Custom voices do not support long audio generation.

Cost and Pricing

For custom voice pricing information, please visit our Pricing page.

Benefits

Uniqueness: Your custom voice is exclusive to your account
Privacy: Voice cloning keys are stored securely server-side and never exposed to clients
Consistency: Maintain the same voice across all your projects
Quality: Powered by Google's state-of-the-art technology

Visit the Custom Voices page to get started!

Long Audio Synthesis

Generate high-quality audio from large texts of up to 100,000 bytes using the asynchronous Long Audio synthesis feature. It is well suited to audiobooks, long-form content, educational materials, and extensive narration projects.

Key Features

Large Text Support: Process up to 100,000 bytes of text (up to 100,000 characters, depending on language)
Asynchronous Processing: Long audio synthesis runs in the background, allowing you to continue working while your audio is being generated
Status Monitoring: Check the progress of your synthesis operations at any time
Professional Format: Output in WAV (LINEAR16) format for lossless, studio-quality audio
Availability: Download your completed audio files for up to 24 hours after generation

Important Notes

Credit Deduction: Credits are deducted when synthesis starts. If synthesis fails, credits are automatically refunded.
No Cancellation: Once a long audio operation has started, it cannot be cancelled. The synthesis will continue to completion, and credits will be deducted accordingly.
Processing Time: Large files may take several minutes to process. Use the status checker to monitor progress.
Operation History: All long audio operations are tracked on the My Account page, where you can view operation details, check status, and download completed files. Download access expires after 24 hours.
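
Asynchronous processing follows a start-then-poll pattern, which the sketch below illustrates in general terms. This page describes the status checker only as part of the web app, not as a programmatic interface, so check_status is a purely hypothetical callable and the status strings are placeholders rather than values defined by GenerateAudio.

```python
import time
from typing import Callable


def wait_for_operation(
    check_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the operation reports "done" or "failed", or the timeout passes."""
    waited = 0.0
    while True:
        status = check_status()
        if status in ("done", "failed"):
            return status
        if waited >= timeout_seconds:
            return "timeout"
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated run: the fake checker reports "running" twice, then "done".
fake_statuses = iter(["running", "running", "done"])
print(wait_for_operation(lambda: next(fake_statuses), sleep=lambda s: None))  # done
```

Polling at a modest interval is usually enough, since large files may take several minutes and a running operation cannot be cancelled anyway.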

My Account Features

The My Account page is your central hub for managing your TTS operations, tracking usage, and accessing your account information.

Short Audio Operations

Operation History: See your last 50 short audio operations in a detailed table.
Operation Details: Each entry shows the date/time, filename, character count, credits cost, and status.
Download Access: Download your generated audio files (available for 24 hours).
Status Tracking: See the completion status of each operation at a glance.

Long Audio Operations

Operation History: View your last 20 long audio operations.
Detailed Information: See start date/time, filename, operation ID, character count, credits cost, and current status.
Status Monitoring: Check the progress of ongoing operations and download completed files.
Operation Management: Access all your long audio operations from one convenient location.

Transaction History

Purchase Records: View all your credit purchase transactions with dates, credit amounts, and payment amounts.
Receipts and Invoices: Access official receipts and invoices through the Stripe Customer Portal.
Complete Records: All transactions are permanently recorded for your reference.

Additional Account Features

Credit Balance: View your current credit balance, including monthly free credits and paid credits, and purchase additional credits.
Usage Statistics: See detailed character usage breakdown by voice model.
Profile Management: Edit your profile name.
Password Management: Change your account password.
Account Deletion: You can delete your account at any time. If you delete your account, you will lose all credits (both free and paid). This action cannot be undone. If you later create a new account using the same email address, device, or IP address, you may not qualify for the free package again.

Rate Limits and Temporary Blocks

Rate limits and cooldowns: To protect the service and prevent automated abuse, we may enforce (a) request rate limits (how many requests can be made in a period of time) and/or (b) cooldown periods (a minimum time interval between certain actions, such as generating audio). If you exceed these limits, your request may be rejected and you may need to wait before retrying.

Temporary blocks: If we detect excessive, automated, or abusive usage patterns (for example, repeated rate-limit violations), we may temporarily restrict access to some or all features for your account for a period of time. Temporary blocks are an automated protective measure and may be lifted automatically after the restriction period ends.
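
If you script any part of your workflow, a rejected request is best handled by waiting and retrying a small number of times rather than resubmitting immediately. The sketch below shows generic exponential backoff under assumptions: action is a hypothetical callable that returns None when a request is rejected, and the wait times are illustrative, since this page does not publish the actual limits or cooldown lengths.

```python
import time
from typing import Callable, Optional


def retry_with_backoff(
    action: Callable[[], Optional[str]],
    max_attempts: int = 4,
    first_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call action until it returns a result, waiting longer after each rejection."""
    wait = first_wait
    for attempt in range(max_attempts):
        result = action()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(wait)
            wait *= 2  # exponential backoff: 5 s, 10 s, 20 s, ...
    # Give up after max_attempts; repeated violations may trigger a temporary block.
    return None


# Simulated run: two rejections, then success. Waits are recorded instead of slept.
responses = iter([None, None, "audio.wav"])
waits = []
print(retry_with_backoff(lambda: next(responses), sleep=waits.append), waits)
# audio.wav [5.0, 10.0]
```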