Product guide

How GenerateAudio Works

GenerateAudio helps creators turn text into natural-sounding speech with flexible voice models, production controls, and downloadable audio for real projects.

Personal and Commercial Use

Produced audio can be used for personal and commercial projects, subject to your rights in the source material, applicable law, and the usage rules described in our Terms of Use.

Get Started For Free!

Create a free account to unlock a free monthly package of 5,000 credits (eligible accounts only).
Each monthly refresh is sufficient for generating up to 50,000 characters or 25 minutes of premium audio, depending on the voice model you choose.

Credits & Taxes

Credits Policy

Paid Credits: Purchased credits do not expire. Your paid balance remains valid until used or refunded.
Monthly Free Package: Eligible accounts receive the current free package amount every month. The monthly free balance resets to the configured package size, and unused free credits do not roll over.
Unused Paid Credit Refunds: Unused paid credits can be refunded manually on request, minus a fixed 10% payment processing cost. Free monthly credits, promotional credits, and credits already used for audio generation are not refundable. To request a refund, use the Contact Us form. Refunds are processed manually and may take up to 7 business days.
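
The credit rules above can be sketched in a few lines. This is an illustrative model only: the names CreditBalance, monthly_reset, and refund_amount are hypothetical, and it assumes the 10% processing cost is taken from the purchase value of the unused paid credits, which the policy does not spell out.

```python
from dataclasses import dataclass
from decimal import Decimal

# Fixed 10% payment processing cost, as stated in the refund policy.
REFUND_FEE_RATE = Decimal("0.10")


@dataclass
class CreditBalance:
    free: int  # monthly free credits: reset each month, never rolled over
    paid: int  # purchased credits: do not expire


def monthly_reset(balance: CreditBalance, package_size: int) -> CreditBalance:
    """Reset the free balance to the package size; paid credits are untouched."""
    return CreditBalance(free=package_size, paid=balance.paid)


def refund_amount(unused_paid_value: Decimal) -> Decimal:
    """Refund for unused paid credits.

    Assumption: the 10% fee is applied to the purchase value of the
    unused paid credits, rounded to two decimal places.
    """
    fee = unused_paid_value * REFUND_FEE_RATE
    return (unused_paid_value - fee).quantize(Decimal("0.01"))
```

Under these assumptions, unused paid credits worth 20.00 would yield a refund of 18.00, and a monthly reset replaces whatever free credits remain with the full package size.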

Tax Information

Taxes at Checkout: Any applicable taxes are calculated by Stripe at checkout based on the billing details and location provided during payment.
Business Details: Stripe may collect the billing information required to determine the correct tax treatment for your purchase.
Receipts and Invoices: Purchase confirmations and related payment documents are issued through Stripe / Lemon Squeezy as part of the checkout flow.

How to Use the App

Input: Paste or type your text directly into the synthesis window. Logged-in users can also upload text from a plain .txt file.
Configure: The Home page offers a quick Gemini preview. Use Generator for the full workflow: language, voice model, Short or Long Audio mode, voice tuning, export format, quality, sample rate, and optional output filename.
Choose Mode: Select Short Audio (up to 5,000 bytes, generated instantly) or Long Audio (up to 100,000 bytes, processed asynchronously). Gemini and Custom voices are short-audio only.
Generate & Download: Generate Short Audio for immediate playback and download, or start Long Audio and monitor progress until the completed file is ready.
AI speech synthesis does not always produce identical results. Test a few variations to find the tone, pacing, and style that best match your content.

Languages and Voices

Our library includes 380+ voices across 75+ languages and regional variants, including less common options for multilingual production workflows.

TTS Voice Model Breakdown

Gemini 3.1 Flash: Expressive Gemini preview voice for short audio, style-guided narration, and quick storytelling drafts. Gemini voices are billed by generated audio duration.
Gemini 2.5 Pro: Higher-quality Gemini preview voice for expressive short narration where quality matters most. Gemini voices are billed by generated audio duration.
Chirp3-HD: The latest premium Text-to-Speech model. Offers 30 distinct voice styles designed to capture subtle nuances and expressive human intonation. Delivers exceptional realism and emotional depth.
Neural2: High-quality, natural-sounding speech ideal for most use cases. Balances strong performance with a competitive price.
Wavenet: Legacy voice model with warm, natural sound quality. It may be available as a free or budget-friendly option depending on the current catalog.
Standard: Legacy voice model. A basic option for high-volume drafts and simple narration that may be free or budget-friendly depending on the current catalog.
Custom: Personalized synthetic voices created from your own voice samples using Chirp3-HD technology. See the Voice Cloning Feature section below for detailed information.

For pricing information on voice models, please visit our Pricing page.

Key Production Controls

Audio export formats:

MP3: The default, compatibility-focused format for publishing platforms, players, and editing tools that expect MP3 audio.
AAC / M4A: Modern export format, suitable for polished creator workflows and compact high-quality files.
WAV (LINEAR16): The lossless option for workflows that need uncompressed audio.

Audio quality settings:

MP3 bitrate presets: 128 kbps (default), 192 kbps, and 256 kbps.
M4A bitrate presets: 96 kbps, 128 kbps (default), 192 kbps, and 256 kbps.
Sample Rate presets (all output formats): 24 kHz (default) or 48 kHz.
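
As a rough illustration, the presets above can be expressed as a small validation helper. The names ExportSettings and normalize are hypothetical and do not correspond to any published GenerateAudio API; the sketch also assumes WAV takes no bitrate setting, since no bitrate presets are listed for it.

```python
from dataclasses import dataclass
from typing import Optional

# Presets as listed in this guide. No bitrate presets are listed for WAV,
# so this sketch assumes WAV accepts no bitrate at all.
BITRATE_PRESETS_KBPS = {
    "mp3": (128, 192, 256),
    "m4a": (96, 128, 192, 256),
    "wav": (),
}
DEFAULT_BITRATE_KBPS = {"mp3": 128, "m4a": 128}
SAMPLE_RATES_HZ = (24_000, 48_000)


@dataclass(frozen=True)
class ExportSettings:
    fmt: str = "mp3"                     # MP3 is the default format
    bitrate_kbps: Optional[int] = None   # None means "use the default preset"
    sample_rate_hz: int = 24_000         # 24 kHz is the default sample rate


def normalize(settings: ExportSettings) -> ExportSettings:
    """Validate the settings against the presets and fill in defaults."""
    fmt = settings.fmt.lower()
    if fmt not in BITRATE_PRESETS_KBPS:
        raise ValueError(f"unsupported format: {settings.fmt!r}")
    if settings.sample_rate_hz not in SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {settings.sample_rate_hz}")
    presets = BITRATE_PRESETS_KBPS[fmt]
    if not presets:
        if settings.bitrate_kbps is not None:
            raise ValueError("WAV output does not take a bitrate")
        return ExportSettings(fmt, None, settings.sample_rate_hz)
    if settings.bitrate_kbps is None:
        bitrate = DEFAULT_BITRATE_KBPS[fmt]
    else:
        bitrate = settings.bitrate_kbps
    if bitrate not in presets:
        raise ValueError(f"{bitrate} kbps is not a {fmt} preset")
    return ExportSettings(fmt, bitrate, settings.sample_rate_hz)
```

Calling normalize(ExportSettings()) fills in the defaults (MP3, 128 kbps, 24 kHz), while a request such as MP3 at 96 kbps is rejected because 96 kbps is only listed for M4A.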

Voice tuning settings vary by voice model:

Gemini short audio supports Temperature, Style instructions, and optional two-speaker dialogue setup.
Legacy Wavenet and Standard voices support Speaking Rate, Pitch, and Volume Gain.
Chirp3-HD, Neural2, and Custom voices support Speaking Rate and Volume Gain; Pitch is not available for these model families.

We encourage you to experiment with these controls to achieve the exact vocal performance your content requires!
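
The model-specific tuning controls listed above can be captured as a simple lookup. This is a sketch for illustration: the family keys and control names below are hypothetical identifiers chosen for this example, not the app's actual field names.

```python
# Tuning controls per voice model family, as described in this guide.
# Keys and control names are illustrative, not real API identifiers.
TUNING_CONTROLS = {
    "gemini": {"temperature", "style_instructions", "two_speaker_dialogue"},
    "wavenet": {"speaking_rate", "pitch", "volume_gain"},
    "standard": {"speaking_rate", "pitch", "volume_gain"},
    "chirp3-hd": {"speaking_rate", "volume_gain"},
    "neural2": {"speaking_rate", "volume_gain"},
    "custom": {"speaking_rate", "volume_gain"},
}


def unsupported_controls(family: str, requested: dict) -> set:
    """Return the requested control names that the given family does not support."""
    supported = TUNING_CONTROLS[family.lower()]
    return set(requested) - supported
```

For example, unsupported_controls("Neural2", {"pitch": -2.0, "speaking_rate": 1.1}) reports pitch as unavailable, matching the note that Pitch is not offered for Chirp3-HD, Neural2, or Custom voices.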

Custom Voice Limitations

Custom voices and Gemini voices are currently only available for short audio synthesis (up to 5,000 bytes). For long audio synthesis (up to 100,000 bytes), please use regular voices such as Chirp3-HD, Neural2, Wavenet, or Standard.
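
A minimal sketch of how the limits above interact, assuming text size is measured in UTF-8 bytes (the guide's byte examples are consistent with UTF-8, but the encoding is not stated). The function name choose_mode and the family labels are hypothetical. The sketch prefers Short Audio whenever the text fits, although the app also lets you pick Long Audio manually.

```python
SHORT_AUDIO_MAX_BYTES = 5_000
LONG_AUDIO_MAX_BYTES = 100_000
SHORT_ONLY_FAMILIES = {"gemini", "custom"}  # short-audio only, per this guide


def choose_mode(text: str, family: str) -> str:
    """Pick "short" or "long" for the given text and voice family.

    Raises ValueError when the text is too large for the selected voice.
    """
    size = len(text.encode("utf-8"))  # assumption: limits count UTF-8 bytes
    if size <= SHORT_AUDIO_MAX_BYTES:
        return "short"
    if family.lower() in SHORT_ONLY_FAMILIES:
        raise ValueError(
            f"{family} voices are short-audio only ({size} bytes > "
            f"{SHORT_AUDIO_MAX_BYTES}); choose Chirp3-HD, Neural2, Wavenet, or Standard"
        )
    if size <= LONG_AUDIO_MAX_BYTES:
        return "long"
    raise ValueError(f"text is {size} bytes; Long Audio accepts up to {LONG_AUDIO_MAX_BYTES}")
```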

Understanding Characters vs Bytes

Characters: The visible symbols in your text (letters, numbers, spaces, punctuation).
Bytes: The actual storage size of your text, which varies by language.

In some languages, a single character requires 2 or 3 bytes of storage. For example:
English/Latin: Most characters use 1 byte, but accented letters such as é and á use 2. English "Hello" = 5 characters = 5 bytes, while Czech "Dobré ráno" = 10 characters = 12 bytes.
Non-Latin: Characters often require 2 or 3 bytes. Chinese "你好" = 2 characters = 6 bytes, Japanese "こんにちは" = 5 characters = 15 bytes, and Arabic "مرحبا" = 5 characters = 10 bytes.

Our system uses byte limits for synthesis (5,000 bytes for Short Audio, 100,000 bytes for Long Audio). When you input text, the app displays both character count and byte count so you can monitor your usage accurately.
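
To check a text before pasting it in, the character and byte counts can be reproduced locally. The sketch below assumes the byte count is the UTF-8 encoded size, which matches every example above; text_size is a hypothetical helper name.

```python
def text_size(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for the text.

    Characters are counted as Unicode code points, which matches the
    visible symbols in the examples in this guide.
    """
    return len(text), len(text.encode("utf-8"))


for sample in ["Hello", "你好", "こんにちは", "مرحبا"]:
    chars, size = text_size(sample)
    print(f"{sample}: {chars} characters, {size} bytes")
```

A text fits Short Audio when the second value is at most 5,000 and Long Audio when it is at most 100,000.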

Note: All your audio operations are tracked on your My Account page, where you can view your operation history and download files.
Download access expires after 24 hours.

Voice Cloning Feature

This feature allows you to clone your voice from short audio samples, giving you a unique voice model that only you can use.

Who Can Use Voice Cloning

Custom Voice creation is available exclusively to registered users. Anonymous users cannot create or use custom voices.

How It Works

The Custom Voice creation process requires just two short recordings:
Consent Recording (~10 seconds): Read a consent statement that authorizes Google to use your voice for synthesis. This recording must match the exact consent script word-for-word.
Reference Recording (~10 seconds): Provide a natural sample of your voice speaking. The content doesn't matter - just speak naturally with your desired tone and energy.

Once both recordings are submitted, our system processes them to generate your unique voice cloning key. Your custom voice will then appear in the voice selector for all your future short audio synthesis projects. Custom voices do not support long audio generation.

Cost and Pricing

For custom voice pricing information, please visit our Pricing page.

Benefits

Uniqueness: Your custom voice is exclusive to your account
Privacy: Voice cloning keys are stored securely server-side and never exposed to clients
Consistency: Maintain the same voice across all your projects
Quality: Powered by Google's state-of-the-art technology

Visit the Voice Cloning page to get started!

Long Audio Synthesis

Generate high-quality audio from large text up to 100,000 bytes using the asynchronous Long Audio synthesis feature. Perfect for audiobooks, long-form content, educational materials, and extensive narration projects.

Key Features

Large Text Support: Process up to 100,000 bytes of text (as many as 100,000 characters for single-byte text, fewer in languages whose characters use 2 or 3 bytes)
Asynchronous Processing: Long audio synthesis runs in the background, allowing you to continue working while your audio is being generated
Status Monitoring: Check the progress of your synthesis operations at any time
Export Settings: Long Audio uses the output format and quality settings selected in Generator. It can be delivered as AAC/M4A, MP3, or WAV; audio is synthesized as WAV first and then converted when a compressed format is selected.
Availability: Download your completed audio files for up to 24 hours after generation
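
The 24-hour download window can be expressed as a simple time check. This sketch assumes the window starts at the moment the audio is generated, as the Availability item suggests; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DOWNLOAD_WINDOW = timedelta(hours=24)  # download access lasts 24 hours


def download_expires_at(generated_at: datetime) -> datetime:
    """Time at which download access ends (assumed: 24 hours after generation)."""
    return generated_at + DOWNLOAD_WINDOW


def is_downloadable(generated_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while the file is still inside its download window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < download_expires_at(generated_at)
```

Both timestamps should be timezone-aware (for example, UTC) so the comparison is unambiguous.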

Important Notes

Credit Deduction: Credits are deducted when synthesis starts. If synthesis fails, credits are automatically restored to your account; this is separate from manual payment refunds for unused paid credits.
No Cancellation: Once a long audio operation has started, it cannot be cancelled. The synthesis runs to completion, and the credits deducted at the start are kept unless the synthesis fails.
Processing Time: Large files may take several minutes to process. Use the status checker to monitor progress.
Voice Availability: Gemini and Custom voices are short-audio only. For Long Audio, choose Chirp3-HD, Neural2, Wavenet, or Standard voices.
Operation History: All long audio operations are tracked on the My Account page, where you can view operation details, check status, and download completed files. Download access expires after 24 hours.
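
The credit handling described in these notes (deduct at start, restore on failure) can be modelled as follows. This is a simplified sketch, not the service's implementation: run_synthesis and InsufficientCredits are hypothetical names, and the synthesize argument stands in for whatever actually produces the audio.

```python
from typing import Callable


class InsufficientCredits(Exception):
    """Raised when the balance cannot cover the operation (hypothetical)."""


def run_synthesis(balance: int, cost: int, synthesize: Callable[[], None]) -> tuple[int, bool]:
    """Return (new balance, succeeded) after one synthesis attempt."""
    if cost > balance:
        raise InsufficientCredits(f"need {cost} credits, have {balance}")
    balance -= cost          # credits are deducted when synthesis starts
    try:
        synthesize()         # cannot be cancelled once started
    except Exception:
        balance += cost      # on failure, credits are restored automatically
        return balance, False
    return balance, True
```

With a balance of 1,000 credits and a cost of 300, a successful run leaves 700 credits and a failed run leaves the balance at 1,000.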

My Account Features

The My Account page is your central hub for managing your TTS operations, tracking usage, and accessing your account information.

Short Audio Operations

Operation History: See your last 50 short audio operations in a detailed table.
Operation Details: Each entry shows the date/time, filename, character count, credits cost, and status.
Download Access: Download your generated audio files (available for 24 hours).
Status Tracking: See the completion status of each operation at a glance.

Long Audio Operations

Operation History: View your last 20 long audio operations.
Detailed Information: See start date/time, filename, operation ID, character count, credits cost, and current status.
Status Monitoring: Check the progress of ongoing operations and download completed files.
Operation Management: Access all your long audio operations from one convenient location.

Transaction History

Purchase Records: View all your credit purchase transactions with dates, credit amounts, and payment amounts.
Receipts and Invoices: Access official receipts and invoices through the Stripe Customer Portal.
Complete Records: All transactions are permanently recorded for your reference.

Additional Account Features

Credit Balance: View your current credit balance, including monthly free credits and paid credits, and purchase additional credits.
Usage Statistics: See a detailed breakdown of character usage by voice model.
Profile Management: Edit your profile name.
Password Management: Change your account password.
Account Deletion: You can delete your account at any time. Deleting your account forfeits all credits (both free and paid) and cannot be undone. If you may be eligible for a refund of unused paid credits, request it through the Contact Us form before deleting your account. If you later create a new account using the same email address, device, or IP address, you may not qualify for the free package again.

Rate Limits and Temporary Blocks

Rate limits and cooldowns: To protect the service and prevent automated abuse, we may enforce (a) request rate limits (how many requests can be made in a period of time) and/or (b) cooldown periods (a minimum time interval between certain actions, such as generating audio). If you exceed these limits, your request may be rejected and you may need to wait before retrying.

Temporary blocks: If we detect excessive, automated, or abusive usage patterns (for example, repeated rate-limit violations), we may temporarily restrict access to some or all features for your account for a period of time. Temporary blocks are an automated protective measure and may be lifted automatically after the restriction period ends.
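
If you script your own workflow around the app, a client-side cooldown can help you stay within these limits. The sketch below is a generic pattern, not the service's actual mechanism: the Cooldown class is hypothetical, and the interval you pass in is your own choice, since the guide does not publish specific limits.

```python
import time
from typing import Callable, Optional


class Cooldown:
    """Enforce a minimum interval between actions on the client side."""

    def __init__(self, min_interval_s: float, clock: Callable[[], float] = time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock                    # injectable for testing
        self._last: Optional[float] = None     # time of the last allowed action

    def remaining(self) -> float:
        """Seconds left before the next action is allowed (0.0 if ready)."""
        if self._last is None:
            return 0.0
        elapsed = self._clock() - self._last
        return max(0.0, self._min_interval_s - elapsed)

    def try_acquire(self) -> bool:
        """Record an action if the cooldown has elapsed; otherwise refuse."""
        if self.remaining() > 0:
            return False
        self._last = self._clock()
        return True
```

Before each generation request, call try_acquire(); if it returns False, wait for remaining() seconds rather than retrying immediately, since repeated rate-limit violations can lead to a temporary block.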