EdgeTTS API

Introduction

EdgeTTS API converts text into speech using multilingual voices provided by Microsoft TTS Services. Text can be read aloud in different languages and accents, which makes the API useful for a range of applications.

Features

Multilingual
Natural-sounding
Streaming (generate chunk by chunk)
Non-streaming (generate an entire file)
Deployable with Docker
Fast
Compatible with OpenAI API

How It Works

Send the text and choose a voice from the available options, and the API returns the generated audio. Both streaming and non-streaming output are supported.

Voices

voices.yaml

en-US-AvaMultilingualNeural
en-US-AndrewMultilingualNeural
en-US-EmmaMultilingualNeural
en-US-BrianMultilingualNeural
fr-FR-VivienneMultilingualNeural
de-DE-SeraphinaMultilingualNeural
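The voice names listed here appear to follow a `<language>-<region>-<name>` pattern. As a minimal sketch under that assumption, the hypothetical helpers below (`locale_of` and `group_by_locale` are not part of the project) group the voices by locale. Voices with a longer locale prefix would need different handling.

```python
from collections import defaultdict

# Voice names copied from the list in this section.
VOICES = [
    "en-US-AvaMultilingualNeural",
    "en-US-AndrewMultilingualNeural",
    "en-US-EmmaMultilingualNeural",
    "en-US-BrianMultilingualNeural",
    "fr-FR-VivienneMultilingualNeural",
    "de-DE-SeraphinaMultilingualNeural",
]


def locale_of(voice: str) -> str:
    """Return the locale prefix, assuming a '<language>-<region>-<name>' layout."""
    language, region, _name = voice.split("-", 2)
    return f"{language}-{region}"


def group_by_locale(voices):
    """Map each locale to the list of voice names that start with it."""
    groups = defaultdict(list)
    for voice in voices:
        groups[locale_of(voice)].append(voice)
    return dict(groups)
```

Calling `group_by_locale(VOICES)` yields one entry each for `en-US`, `fr-FR`, and `de-DE`.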

Deployment

Option 1: Deploy with Python

Clone the repository:

git clone https://github.com/taowang1993/edgetts-api
cd edgetts-api

Install dependencies:

pip install -r requirements.txt

Start the service:

python main.py

The service will be available at http://localhost:5000

Option 2: Deploy with Docker

Build the image:

docker build -t edgetts-api .

Run the container:

docker run -d -p 5000:5000 edgetts-api
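Because the container starts in the background (`-d`), a script that calls the API right away may hit a closed port. The sketch below polls the published port until it accepts TCP connections. The helper name `wait_for_port` and its timeout values are illustrative assumptions, and an open port only suggests, not proves, that the service is ready.

```python
import socket
import time


def wait_for_port(host="localhost", port=5000, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait and retry.
            time.sleep(interval)
    return False
```

A deployment script could call `wait_for_port()` after `docker run` and only then send requests.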

API Documentation

1. List Available Voices

Retrieves all supported voice options.

GET /voices

Response example:

{
    "code": 200,
    "message": "OK",
    "data": [
        {
            "Name": "en-US-GuyNeural",
            "ShortName": "en-US-GuyNeural",
            "Gender": "Male",
            "Locale": "en-US"
        },
        // ... more voices
    ]
}
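Once the voice list has been fetched, a client usually wants to narrow it down. The following sketch filters a parsed response by the `Locale` and `Gender` fields shown in the example above. The function name `filter_voices` is hypothetical, and the sample payload simply mirrors the documented response shape.

```python
def filter_voices(payload, locale=None, gender=None):
    """Return ShortName values from a /voices payload that match the filters.

    A filter left as None is not applied.
    """
    matches = []
    for voice in payload.get("data", []):
        if locale is not None and voice.get("Locale") != locale:
            continue
        if gender is not None and voice.get("Gender") != gender:
            continue
        matches.append(voice["ShortName"])
    return matches


# Sample payload shaped like the response example in this section.
sample = {
    "code": 200,
    "message": "OK",
    "data": [
        {
            "Name": "en-US-GuyNeural",
            "ShortName": "en-US-GuyNeural",
            "Gender": "Male",
            "Locale": "en-US",
        }
    ],
}
```

With a live service, `payload` would be the JSON body returned by `GET /voices`.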

2. Text-to-Speech (Download)

Convert text to speech and download the audio file.

POST /tts

Request body:

{
    "text": "Hello, World!",
    "voice": "en-US-AvaMultilingualNeural",    // Optional, defaults to "en-US-AvaMultilingualNeural"
    "file_name": "output.mp3"    // Optional, defaults to a temporary file
}

Note: The file_name parameter is optional. If it is not provided, a temporary file is used.

Response:

Content-Type: audio/mpeg
Returns audio file stream
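To keep optional fields out of the request unless they are needed, the body can be assembled by a small helper. This is a sketch only: `build_tts_payload` is a hypothetical name, and the empty-text check is a client-side assumption rather than documented server behavior.

```python
def build_tts_payload(text, voice=None, file_name=None):
    """Build the JSON body for POST /tts, omitting optional fields left unset."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    payload = {"text": text}
    if voice is not None:
        payload["voice"] = voice  # omitted: the service applies its default voice
    if file_name is not None:
        payload["file_name"] = file_name  # omitted: the service uses a temporary file
    return payload
```

The result can be passed as `json=build_tts_payload("Hello, World!")` to `requests.post`, and the response body saved as an MP3 file.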

3. Text-to-Speech (Streaming)

Convert text to speech with streaming output. Choose this endpoint for real-time playback.

POST /tts/stream

Request body:

{
    "text": "Hello, World!",
    "voice": "en-US-AvaMultilingualNeural"    // Optional, defaults to "en-US-AvaMultilingualNeural"
}

Response:

Content-Type: application/octet-stream
Returns audio stream
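On the client side, a streamed response is consumed chunk by chunk. The sketch below writes any iterable of byte chunks to a binary sink and reports the number of bytes written. The helper name `write_chunks` is an assumption, and skipping empty chunks is a defensive choice rather than something the service is documented to require.

```python
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each non-empty chunk to sink and return the total byte count."""
    total = 0
    for chunk in chunks:
        if not chunk:
            # Ignore empty chunks so they do not trigger needless writes.
            continue
        sink.write(chunk)
        total += len(chunk)
    return total
```

With the `requests` library, `chunks` could be `response.iter_content(chunk_size=8192)` from a request made with `stream=True`, and `sink` a file opened in `'wb'` mode.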

4. OpenAI-compatible Streaming

Convert text to speech with streaming output. This endpoint is compatible with the OpenAI TTS API format.

POST /v1/audio/speech

Request body:

{
    "model": "tts-1",              // Optional and currently ignored
    "input": "Hello, World!",      // Required: text to convert to speech
    "voice": "alloy"               // Optional, defaults to "alloy"
}

Voice Mappings:

OpenAI Voice	EdgeTTS Voice
alloy	en-US-AvaMultilingualNeural
echo	en-US-AndrewMultilingualNeural
fable	en-US-EmmaMultilingualNeural
onyx	en-US-BrianMultilingualNeural
nova	fr-FR-VivienneMultilingualNeural
shimmer	de-DE-SeraphinaMultilingualNeural
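The mapping table can be mirrored in client code, for example to show which EdgeTTS voice a request will use. This is an illustrative sketch: `OPENAI_TO_EDGE` and `resolve_voice` are hypothetical names, and raising an error for unmapped names is an assumption, since the service's behavior in that case is not described here.

```python
# Mapping copied from the table in this section.
OPENAI_TO_EDGE = {
    "alloy": "en-US-AvaMultilingualNeural",
    "echo": "en-US-AndrewMultilingualNeural",
    "fable": "en-US-EmmaMultilingualNeural",
    "onyx": "en-US-BrianMultilingualNeural",
    "nova": "fr-FR-VivienneMultilingualNeural",
    "shimmer": "de-DE-SeraphinaMultilingualNeural",
}


def resolve_voice(openai_voice="alloy"):
    """Return the EdgeTTS voice for an OpenAI voice name ("alloy" by default)."""
    try:
        return OPENAI_TO_EDGE[openai_voice]
    except KeyError:
        # Assumption: treat names outside the table as a client error.
        raise ValueError(f"unknown OpenAI voice: {openai_voice!r}") from None
```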

Response:

Content-Type: audio/mpeg
Returns audio stream

Example usage with OpenAI Python client:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",  # Your EdgeTTS API URL
    api_key="your_api_key_here"  # Required by the client but not used
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world!"
)

# Save the audio to a file
response.stream_to_file("output.mp3")

Usage Examples

Python

import requests

# Get available voices
response = requests.get('http://localhost:5000/voices')
voices = response.json()['data']

# Text-to-Speech (Download)
data = {
    "text": "Hello, World!",
    "voice": "en-US-AvaMultilingualNeural",
    "file_name": "output.mp3"
}
response = requests.post('http://localhost:5000/tts', json=data)
with open('output.mp3', 'wb') as f:
    f.write(response.content)

# Text-to-Speech (Streaming)
response = requests.post('http://localhost:5000/tts/stream', json=data, stream=True)
with open('stream_output.mp3', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

Curl

# Get available voices
curl http://localhost:5000/voices

# Text-to-Speech (Download)
curl -X POST http://localhost:5000/tts \
    -H "Content-Type: application/json" \
    -d '{"text":"Hello, World!", "voice":"en-US-AvaMultilingualNeural"}' \
    --output output.mp3

# Text-to-Speech (Streaming)
curl -X POST http://localhost:5000/tts/stream \
    -H "Content-Type: application/json" \
    -d '{"text":"Hello, World!", "voice":"en-US-AvaMultilingualNeural"}' \
    --output stream_output.mp3

License

MIT License
