Getting Started
Simply Listen Studio is a text-to-speech and translation generator that helps you turn written content into downloadable audio. It is designed for creators, schools, organizations, businesses, and accessibility-focused teams that need a simple way to create spoken versions of written information.
With Simply Listen Studio, you can paste text, choose a language, select a voice, generate audio, and download the finished file. Studio also supports translation, so English content can be translated and generated in another language when needed.
What Simply Listen Studio does
Simply Listen Studio helps you create audio from text. You can use it for announcements, instructions, web content, training material, accessibility support, multilingual communication, and other situations where audio may help people understand or access information more easily.
Studio includes:
Text-to-speech audio generation
Language and voice selection
Female and male voice options where available
Translation-supported audio generation
Downloadable audio files
A pronunciation helper for improving how specific words are spoken
Credit-based usage so you only use credits when audio is generated
How to use Simply Listen Studio
Start by typing or pasting your text into the Studio text box. As you type, Studio shows the number of characters in the current generation and estimates how many credits the request will use.
Next, choose a language. Popular languages appear first for quick access, and the full language list is available below them. After selecting a language, choose the voice option you want to use.
When you are ready, select Generate Audio. Studio will create the audio file and make it available in the audio player. After the audio is generated, you can listen to it and download the file.
Audio created with Simply Listen Studio may be used for personal, educational, business, or commercial purposes. There are no additional usage restrictions on the audio you create. Attribution is appreciated but not required.
Using translation
When you select a non-English language, Studio translates your text into that language before generating audio. Translation uses more credits than standard text-to-speech because the request includes both translation and audio generation.
Before a translated audio file is created, Studio will ask you to confirm the credit cost. Review the message carefully before continuing.
Machine translation and audio generation may contain errors. Review all translated and generated content before publishing or sharing. Simply Listen is not responsible for errors in translation, pronunciation, or audio generation.
Using the Pronunciation Helper
The Pronunciation Helper lets you improve how specific words are spoken. This can be helpful for names, places, acronyms, brand terms, school names, or words that a text-to-speech voice may not pronounce the way you prefer.
After saving a pronunciation, Studio can apply that pronunciation when generating audio for the same account. Saved pronunciations can be used across supported Simply Listen applications, so you do not need to recreate the same pronunciation in each tool.
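To illustrate the idea of a saved pronunciation, here is a minimal Python sketch that swaps specific words for a respelling before the text is sent for audio generation. It is an illustration only: how Studio actually stores and applies pronunciations is not described here and may differ (for example, a service could use SSML phoneme tags instead of respelling). The function name `apply_pronunciations` and the dictionary format are hypothetical.

```python
import re


def apply_pronunciations(text: str, pronunciations: dict) -> str:
    """Replace whole-word matches with their saved respelling.

    `pronunciations` maps a written word to the spelling the voice should
    read instead. This mapping format is a hypothetical illustration, not
    Studio's storage format.
    """
    if not pronunciations:
        return text
    # Try longer entries first so multi-word entries win over shorter ones.
    keys = sorted(pronunciations, key=len, reverse=True)
    # \b limits matches to whole words. Entries that start or end with
    # punctuation (for example "C++") would need different boundary handling.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in pronunciations.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)
```

For example, `apply_pronunciations("Welcome to Nguyen Hall.", {"Nguyen": "Win"})` rewrites only the name and leaves the rest of the sentence unchanged.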
Credit usage
Simply Listen Studio uses credits based on the number of characters in each generation.
Standard text-to-speech currently uses 1 credit per character.
Translation currently uses 2 credits per character.
Your available credit balance appears inside Studio. If a request needs more credits than you have available, Studio will let you know before generating audio.
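The credit arithmetic above can be sketched in a few lines of Python. This is an estimate under stated assumptions, not Studio's own calculation: it counts characters with Python's `len` (Studio's exact counting rules for spaces and special characters are not specified here), it assumes the translation rate applies to the text you enter, and the function names are hypothetical. The rates are the current ones listed above and may change.

```python
# Current rates from this page; they are described as current and may change.
CREDITS_PER_CHAR_TTS = 1
CREDITS_PER_CHAR_TRANSLATION = 2


def estimate_credits(text: str, translate: bool = False) -> int:
    """Estimate the credits for one generation from its character count.

    Assumption: every character of the entered text counts once.
    """
    rate = CREDITS_PER_CHAR_TRANSLATION if translate else CREDITS_PER_CHAR_TTS
    return len(text) * rate


def can_generate(text: str, balance: int, translate: bool = False) -> bool:
    """Return True if the balance covers the estimated cost."""
    return estimate_credits(text, translate) <= balance
```

Under these assumptions, a 1,200-character announcement would use about 1,200 credits as standard text-to-speech and about 2,400 credits when translated.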
Large text generations
Studio supports larger text generations, but longer requests may take more time to process. If your text is over 5,000 characters, Studio will show a message so you know the request may take longer.
The current backend generation limit is 10,000 characters. This limit helps keep audio generation reliable and helps prevent unusually large requests from timing out.
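If you prepare text outside Studio, a small pre-check like the Python sketch below can flag long requests and split oversized text into several generations. The thresholds come from this page; the function names are hypothetical, the sketch assumes that exactly 10,000 characters is still accepted, and the splitter collapses line breaks and repeated spaces into single spaces.

```python
WARNING_THRESHOLD = 5_000   # above this, Studio shows a "may take longer" message
BACKEND_LIMIT = 10_000      # current backend generation limit


def check_length(text: str) -> str:
    """Classify a request as "ok", "slow", or "too_long" by character count."""
    n = len(text)
    if n > BACKEND_LIMIT:
        return "too_long"
    if n > WARNING_THRESHOLD:
        return "slow"
    return "ok"


def split_for_limit(text: str, limit: int = BACKEND_LIMIT) -> list:
    """Greedily pack whole words into chunks of at most `limit` characters."""
    chunks = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word is longer than the limit")
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The next word does not fit, so close this chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Splitting on word boundaries keeps the sketch short. For more natural-sounding audio you would probably prefer to split at sentence or paragraph breaks.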
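What's Coming Soon!
We are currently developing a feature that generates viseme timing data alongside your MP3 audio. Visemes are the mouth shapes that correspond to speech sounds, so this data can be used to animate a speaking character in sync with the audio. This update will provide audio offsets, viseme IDs, SVG animation data, and 3D blend-shape animation data.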
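Because this feature has not been released, its data format is not known. The Python sketch below only illustrates how timing data of this general kind is typically consumed: given a list of (audio offset, viseme ID) events sorted by time, it looks up which viseme is active at a playback position. The `VisemeEvent` class, its field names, and the use of milliseconds are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VisemeEvent:
    offset_ms: int   # hypothetical field: time from the start of the audio
    viseme_id: int   # hypothetical field: identifier of the mouth shape


def viseme_at(events: List[VisemeEvent], time_ms: int) -> Optional[int]:
    """Return the viseme active at `time_ms`, or None before the first event.

    `events` must be sorted by `offset_ms`. Each viseme is treated as
    active until the next event begins.
    """
    offsets = [event.offset_ms for event in events]
    index = bisect_right(offsets, time_ms) - 1
    return events[index].viseme_id if index >= 0 else None
```

An animation loop would call a lookup like this on each frame with the audio player's current position and show the matching mouth shape.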