
Text-to-Speech (TTS)

MiniMax TTS generates speech from text. It offers preset voices in Chinese (Mandarin), English, and Japanese, and also accepts custom voice IDs. The output MP3 is saved to Assets/TJGenerators/History/ by default and automatically imported as an AudioClip.

Suitable for character voiceover, narration, notification sounds, and dialogue lines. For background music, use BGM in Audio Generation instead; for sound effects, use SFX.

Getting Started

Start from GUI

From the menu, choose AI > Generate > Generate Audio. In the TJGenerators Audio window that opens, set "Model" to MiniMax Text-to-Speech, enter the text to synthesize, select a voice, and click Generate.

Figure: Selecting the MiniMax TTS model in the TJGenerators Audio window to generate TTS.

Start from CLI

Use the generate_tts tool to generate speech from text. After generation, the speech is automatically imported as an AudioClip and ready to use in your scene.


Models

MiniMax Text-to-Speech (Only Model)

  • generator_id: minimax-tts
  • Use Cases: Text-to-speech, character voiceover, narration
  • Output: MP3, automatically imported as AudioClip
  • Key Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | Required | Text to synthesize |
| voice_id | string | Chinese (Mandarin)_Gentleman | Voice ID (see Preset Voices below; custom IDs are supported) |
| output_path | string | Auto-generated | Asset save path (.mp3 is appended automatically) |
| play_on_awake | bool | false | Whether the AudioSource plays automatically on entering Play Mode |
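
As a rough illustration of how these parameters fit together, the Python sketch below assembles an argument set with the documented defaults and rejects an empty prompt. The helper name build_tts_args and the example output path are hypothetical; the source does not specify how arguments are actually passed to generate_tts, so treat this only as a way to reason about the parameters.

```python
import json

# Default voice taken from the parameter table in this section.
DEFAULT_VOICE_ID = "Chinese (Mandarin)_Gentleman"


def build_tts_args(prompt, voice_id=DEFAULT_VOICE_ID, output_path=None, play_on_awake=False):
    """Assemble arguments for a TTS request (hypothetical helper, not part of the tool)."""
    if not prompt or not prompt.strip():
        # The documentation states that prompt is required and cannot be empty.
        raise ValueError("prompt is required and cannot be empty")
    args = {
        "prompt": prompt,
        "voice_id": voice_id,
        "play_on_awake": bool(play_on_awake),
    }
    if output_path is not None:
        # Leave the extension off: the docs say .mp3 is appended automatically.
        args["output_path"] = output_path
    return args


if __name__ == "__main__":
    example = build_tts_args(
        "Welcome back, traveler.",
        voice_id="English_WiseScholar",
        output_path="Assets/Audio/Voice/welcome_line",  # hypothetical path
    )
    print(json.dumps(example, indent=2))
```

Omitting output_path leaves the path to be auto-generated, which matches the default listed in the table.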

Preset Voices (voice_id)

| voice_id | Description |
|---|---|
| Chinese (Mandarin)_Gentleman | Chinese Male - Gentleman (default) |
| Chinese (Mandarin)_Humorous_Elder | Chinese Male - Humorous Elder |
| Chinese (Mandarin)_Cute_Spirit | Chinese Female - Cute Spirit |
| Chinese (Mandarin)_Warm_Bestie | Chinese Female - Warm Bestie |
| English_WiseScholar | English Male - Wise Scholar |
| English_captivating_female1 | English Female - Captivating |
| Japanese_LoyalKnight | Japanese Male - Loyal Knight |
| moss_audio_f0c5494c-7c25-11f0-8d70-a2abf1fbea61 | Japanese Female |

voice_id accepts custom values (allowCustom), so you can also enter other voice IDs provided by MiniMax.
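
Because text and voice language should agree, a project might pre-check lines before generating them. The sketch below maps the preset voice IDs to a language code and compares that with a rough guess of the text's language. The language codes, the function names, and the Unicode-range heuristic are all assumptions made for illustration; in particular, Japanese written only in kanji would be guessed as Chinese, and custom voice IDs cannot be checked at all.

```python
# Preset voice IDs from the table in this section, mapped to an assumed language code.
PRESET_VOICES = {
    "Chinese (Mandarin)_Gentleman": "zh",
    "Chinese (Mandarin)_Humorous_Elder": "zh",
    "Chinese (Mandarin)_Cute_Spirit": "zh",
    "Chinese (Mandarin)_Warm_Bestie": "zh",
    "English_WiseScholar": "en",
    "English_captivating_female1": "en",
    "Japanese_LoyalKnight": "ja",
    "moss_audio_f0c5494c-7c25-11f0-8d70-a2abf1fbea61": "ja",
}


def guess_language(text):
    """Rough heuristic: any kana means 'ja', otherwise any CJK ideograph means 'zh', else 'en'."""
    has_han = False
    for ch in text:
        code = ord(ch)
        if 0x3040 <= code <= 0x30FF:  # Hiragana and Katakana blocks
            return "ja"
        if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
            has_han = True
    return "zh" if has_han else "en"


def voice_matches_text(voice_id, text):
    """Return True/False for preset voices, or None when the voice is a custom ID."""
    voice_lang = PRESET_VOICES.get(voice_id)
    if voice_lang is None:
        return None  # custom voice: language unknown, nothing to check
    return voice_lang == guess_language(text)


if __name__ == "__main__":
    print(voice_matches_text("English_WiseScholar", "Hello there."))
    print(voice_matches_text("English_WiseScholar", "你好，旅行者。"))
```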


Optimization

Text Optimization

  • Use the language matching the voice: Chinese voice for Chinese text, English voice for English text
  • Add appropriate punctuation: Commas and periods help control pauses and intonation
  • Segmented synthesis: Long dialogue lines can be split into multiple sentences for separate generation, making it easier to trigger them in the engine as needed
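
Segmented synthesis can be prepared outside the engine. The following sketch splits a long line into sentences at Chinese and English end punctuation and pairs each segment with a numbered output path. The function names and the naming scheme for paths are hypothetical, and the splitter is deliberately simple: it would also split at the dot in a decimal such as 3.14.

```python
import re

# One run of non-terminator characters followed by any trailing end punctuation.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text):
    """Split text into sentences, keeping the end punctuation with each sentence."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def plan_segments(text, base_path):
    """Pair each sentence with a numbered output path (hypothetical naming scheme)."""
    return [
        {"prompt": sentence, "output_path": f"{base_path}_{index:02d}"}
        for index, sentence in enumerate(split_sentences(text), start=1)
    ]


if __name__ == "__main__":
    line = "Halt! Who goes there? State your name."
    for segment in plan_segments(line, "Assets/Audio/Voice/guard"):
        print(segment)
```

Each resulting entry can then be generated as its own clip, so individual sentences can be triggered separately in the engine.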

Voice Selection

  • Character dialogue: Choose voices matching character gender/personality (Gentleman/Humorous Elder/Cute Spirit, etc.)
  • Narration/commentary: Prefer steady male or female voices
  • Multi-language projects: Use voices in the corresponding language for each part

AudioSource Configuration

  • One-time lines/notification sounds: set play_on_awake: false and trigger playback from scripts or events
  • Narration that plays on entering the scene: set play_on_awake: true

Notes

  • ⚠️ Entry point: in the GUI, open the AI > Generate > Generate Audio window and switch the model to "MiniMax Text-to-Speech"; from the CLI, use the generate_tts tool
  • ⚠️ prompt is required: The text to synthesize cannot be empty
  • ⚠️ Language and voice must match: for example, use a Chinese voice for Chinese text; otherwise pronunciation may be abnormal
  • ⚠️ Generation takes 10–30 seconds
  • ⚠️ Output is an AudioClip (MP3): when placing it in a scene, assign it to an AudioSource as a sound effect (AudioClip SFX)
  • ⚠️ Output path: Default Assets/TJGenerators/History/
  • ⚠️ Domain Reload: Do not write .cs files to disk during generation, since a script change triggers recompilation and a Domain Reload; use execute_csharp_script instead