SILMA TTS API

SILMA TTS Audio Generation API

Version 1.0.5 (OAS 3.0)

API for generating audio from text using a specific voice style and model.

API Base URL
  • Server 1: https://api.silma.ai/tts
Security
apiKeyAuth (apiKey)

An API key is a token that you provide when making API calls. Include the token in a header parameter called apiKey.

Example: apiKey: 123
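
As a minimal sketch of the header scheme described above, the following Python helper builds the request headers. The function name build_headers and the SILMA_API_KEY environment variable are illustrative assumptions, not part of the API, and the Content-Type header is assumed from the JSON body that the endpoint accepts.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key (hypothetical helper)."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "apiKey": api_key,  # header name given in the Security section
        "Content-Type": "application/json",  # assumed: the request body is JSON
    }


# SILMA_API_KEY is an assumed environment variable name. The fallback "123"
# mirrors the placeholder in the example above and is not a real key.
headers = build_headers(os.environ.get("SILMA_API_KEY") or "123")
```

Reading the key from the environment rather than hard-coding it keeps the token out of source control.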

Generate Audio

Generates audio from text based on a specific model, voice style, and configuration parameters.

POST https://api.silma.ai/tts/generate

Body

application/json
model_id (string, required)

The ID of the model to use for generation. Available option: silma-tts-l3-pro-ksa-large.

Default: silma-tts-l3-pro-ksa-large

Example: silma-tts-l3-pro-ksa-large

text (string, required)

The text to convert to audio, with or without tashkeel (Arabic diacritics).

Example: بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ

reference_audio_id (string, required)

The ID of the voice style. Available options: Sulaiman, Salma. Set it to "Custom" when supplying a custom reference audio in custom_ref_audio.

Default: Sulaiman

Example: Sulaiman

nfe_steps (integer, required)

Number of function evaluation (NFE) steps, a speed/quality trade-off. Keeping it fixed at 16 is recommended.

Default: 16

seed (integer)

Random seed for reproducibility.

Default: 42

remove_silence (boolean)

Whether to strip silence from the output.

Default: false

speaking_speed (number, format: float)

The speed of the speech. Adjust it in increments of 0.1 if needed.

Default: 1.1

use_ema (boolean)

Whether to use Exponential Moving Average (EMA) weights: false for KSA models, true for MSA models.

Default: false

normalize_numbers (boolean)

Whether to convert numbers in text to words.

Default: true

pronunciation_overrides (object)

A dictionary of words and their custom phonetic pronunciations.

Example: {"اكل":"اُكِل"}

custom_ref_audio (string, format: byte)

Base64-encoded string of a custom reference audio file (optional).
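
The body parameters above can be assembled as in the following sketch. The helper build_payload and its argument names are hypothetical. It fills the required fields with the documented defaults, rounds speaking_speed to one decimal so that 0.1 increments do not accumulate floating-point drift, and switches reference_audio_id to "Custom" when raw reference audio bytes are supplied.

```python
import base64
import json
from typing import Optional


def build_payload(
    text: str,
    reference_audio_id: str = "Sulaiman",
    speaking_speed: float = 1.1,
    pronunciation_overrides: Optional[dict] = None,
    custom_ref_audio: Optional[bytes] = None,
) -> dict:
    """Assemble a /generate request body (hypothetical helper)."""
    payload = {
        # Required fields, using the documented defaults.
        "model_id": "silma-tts-l3-pro-ksa-large",
        "text": text,
        "reference_audio_id": reference_audio_id,
        "nfe_steps": 16,
        # Optional field; rounding keeps 0.1 increments free of float drift.
        "speaking_speed": round(speaking_speed, 1),
    }
    if pronunciation_overrides:
        payload["pronunciation_overrides"] = dict(pronunciation_overrides)
    if custom_ref_audio is not None:
        # A custom voice uses the "Custom" ID plus the Base64-encoded audio.
        payload["reference_audio_id"] = "Custom"
        payload["custom_ref_audio"] = base64.b64encode(custom_ref_audio).decode("ascii")
    return payload


payload = build_payload(
    "بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ",
    speaking_speed=1.1 + 0.1,  # one increment above the default
    pronunciation_overrides={"اكل": "اُكِل"},
)
# ensure_ascii=False keeps the Arabic text readable in the serialized JSON.
body = json.dumps(payload, ensure_ascii=False)
```

Fields left out of the payload (seed, remove_silence, use_ema, normalize_numbers) are expected to fall back to the defaults listed above.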

Response

application/json

Successful generation

audio_base64_encoded (string)

The generated audio file encoded in Base64.

text (string)

The processed text included in the response.

inference_time (number, format: float)

The time taken by the server to process the inference.
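
To turn the response into a playable file, the Base64 string has to be decoded back to bytes. The sketch below assumes the response has already been parsed into a Python dict; save_audio is a hypothetical helper, and because the audio container format is not stated here, the file extension you choose is your own assumption.

```python
import base64
from pathlib import Path


def save_audio(response: dict, path: str) -> int:
    """Decode audio_base64_encoded, write it to path, and return the byte count."""
    audio_bytes = base64.b64decode(response["audio_base64_encoded"])
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

A call such as save_audio(response, "output.wav") would write the decoded audio to disk, where the .wav extension is a guess to be checked against the actual output.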

Example request: POST /generate (application/json)

{
  "model_id": "silma-tts-l3-pro-ksa-large",
  "text": "بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ",
  "reference_audio_id": "Sulaiman",
  "nfe_steps": 16
}
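
The example request above could be sent from Python roughly as follows, using only the standard library. The names make_request and generate, the 60-second timeout, and the placeholder key "123" are assumptions for illustration. Error handling and retries are omitted.

```python
import json
import urllib.request

API_URL = "https://api.silma.ai/tts/generate"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    # urllib stores the header name as "Apikey"; HTTP header names are
    # case-insensitive, so this still matches the documented apiKey header.
    return urllib.request.Request(
        API_URL,
        data=data,
        method="POST",
        headers={"apiKey": api_key, "Content-Type": "application/json"},
    )


def generate(payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON response."""
    request = make_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "model_id": "silma-tts-l3-pro-ksa-large",
    "text": "بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ",
    "reference_audio_id": "Sulaiman",
    "nfe_steps": 16,
}
request = make_request(payload, "123")  # placeholder key; nothing is sent here
```

Calling generate(payload, api_key) with a valid key performs the network call and returns a dict with the audio_base64_encoded, text, and inference_time fields described in the Response section.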