SILMA TTS API

SILMA TTS Audio Generation API

Version 1.1.2 (OAS 3.0)

API for generating audio from text using a specific voice style and model.

API Base URL
  • Server 1: https://api.silma.ai/tts
Security
apiKeyAuth (apiKey)

An API key is a token that you provide when making API calls. Include the token in a header parameter called apiKey.

Example: apiKey: 123
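The header can be attached from Python as in this minimal sketch. It assumes the key is kept in an environment variable called SILMA_API_KEY, which is a hypothetical name chosen for illustration. Only the apiKey header itself comes from this page.

```python
import os
from typing import Dict, Optional

# Hypothetical variable name; the API itself only reads the apiKey header.
API_KEY_ENV = "SILMA_API_KEY"


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers that carry the API key in the apiKey header."""
    key = api_key if api_key is not None else os.environ.get(API_KEY_ENV)
    if not key:
        raise ValueError(f"Pass an API key or set {API_KEY_ENV}")
    # The request body is JSON, so declare the content type alongside the key.
    return {"apiKey": key, "Content-Type": "application/json"}
```

For example, auth_headers("123") returns {"apiKey": "123", "Content-Type": "application/json"}.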

Generate Audio

Generates audio from text based on a specific model, voice style, and configuration parameters.

POST https://api.silma.ai/tts/generate

Body (application/json)
model_id (string, required)

The ID of the model to use for generation. Available options:

  • silma-tts-pro-ksa-large -> Saudi dialect, 330M parameters
  • silma-tts-pro-ksa-small -> Saudi dialect, 150M parameters, faster but lower quality
  • silma-tts-pro-msa-large -> Modern Standard Arabic, 330M parameters
  • silma-tts-pro-msa-small -> Modern Standard Arabic, 150M parameters, faster but lower quality

Example: silma-tts-pro-ksa-large

text (string, required)

The text to convert to audio, with or without tashkeel (Arabic diacritics).

Example: بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ

reference_audio_id (string, required)
  • The ID of the voice style.
  • Default voices: Sulaiman, Salma, Salman, Sarah, Sam, Samantha.
  • To use a custom voice you uploaded, set this to its “Voice ID” (e.g. voice_1769817467123) from the “Custom Voices” section at https://app.silma.ai/voices.
  • When using a Voice ID, you must also set the “user_id” parameter.
  • Set this to “Custom” when you supply reference audio through the custom_ref_audio parameter.

Example: Sulaiman
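The rules above can be checked on the client before sending a request. This is a minimal sketch: the function name voice_fields is hypothetical, and it assumes that any value which is neither a default voice nor “Custom” is a custom Voice ID.

```python
from typing import Optional

DEFAULT_VOICES = {"Sulaiman", "Salma", "Salman", "Sarah", "Sam", "Samantha"}


def voice_fields(reference_audio_id: str,
                 user_id: Optional[str] = None,
                 custom_ref_audio: Optional[str] = None) -> dict:
    """Return the voice-related request fields after checking the documented rules."""
    fields = {"reference_audio_id": reference_audio_id}
    if reference_audio_id == "Custom":
        # "Custom" means the voice comes from Base64 audio in custom_ref_audio.
        if not custom_ref_audio:
            raise ValueError('reference_audio_id "Custom" requires custom_ref_audio')
        fields["custom_ref_audio"] = custom_ref_audio
    elif reference_audio_id not in DEFAULT_VOICES:
        # Assumed to be a custom Voice ID, which must be paired with user_id.
        if not user_id:
            raise ValueError("A custom Voice ID requires user_id")
        fields["user_id"] = user_id
    return fields
```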

nfe_steps (integer, required)

Number of function evaluation (NFE) steps, which controls the trade-off between speed and quality. We recommend keeping it at 16.

Default: 16

seed (integer)

Random seed for reproducibility. Changing the seed slightly changes the style of the generated audio.

Default: 42

remove_silence (boolean)

Whether to strip silence from the output.

Default: false

speaking_speed (number, float)

The speaking speed. If needed, adjust it up or down in increments of 0.1.

Default: 1.1
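Because the speed is meant to be adjusted in steps of 0.1, it helps to round after the arithmetic so that floating-point error does not leak into the request. This is a minimal sketch: adjust_speed is a hypothetical helper, and the valid range of speaking_speed is not documented on this page.

```python
def adjust_speed(steps: int, base: float = 1.1) -> float:
    """Shift the speaking speed by `steps` increments of 0.1 (negative means slower)."""
    # Without round(), 1.1 + 0.1 evaluates to 1.2000000000000002.
    return round(base + 0.1 * steps, 1)
```

For example, adjust_speed(1) gives 1.2 and adjust_speed(-2) gives 0.9.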

use_ema (boolean)

Whether to use Exponential Moving Average (EMA) weights. Use false for KSA models and true for MSA models.

Default: false
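Since the default (false) only suits the KSA models, a client can derive use_ema from model_id. This is a minimal sketch with a hypothetical helper name. It encodes the recommendation above and the four model IDs listed under model_id.

```python
MODEL_IDS = {
    "silma-tts-pro-ksa-large",
    "silma-tts-pro-ksa-small",
    "silma-tts-pro-msa-large",
    "silma-tts-pro-msa-small",
}


def recommended_use_ema(model_id: str) -> bool:
    """Return the recommended use_ema value for a documented model ID."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"Unknown model_id: {model_id}")
    # Recommendation from the docs: false for KSA models, true for MSA models.
    return "-msa-" in model_id
```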

normalize_numbers (boolean)

Whether to convert numbers in the text to words.

Default: true

pronunciation_overrides (object)

A dictionary that maps words to their custom pronunciations, written with tashkeel.

Example: {"اكل":"اُكِل"}
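A minimal sketch of a request body that includes pronunciation_overrides. It serializes to UTF-8 JSON with ensure_ascii=False so the Arabic stays readable; escaped \uXXXX output would also be valid JSON. The helper name build_body is hypothetical, and the fixed model and voice are taken from the example on this page.

```python
import json
from typing import Dict


def build_body(text: str, overrides: Dict[str, str]) -> bytes:
    """Serialize a /generate request body that carries pronunciation_overrides."""
    payload = {
        "model_id": "silma-tts-pro-ksa-large",
        "text": text,
        "reference_audio_id": "Sulaiman",
        "nfe_steps": 16,
        # Keys are words; values are their pronunciations written with tashkeel.
        "pronunciation_overrides": overrides,
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```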

custom_ref_audio (string, byte)

Optional Base64-encoded custom reference audio file.
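A minimal sketch of producing the Base64 string for custom_ref_audio with Python's standard library. Which audio formats and durations the server accepts is not documented here, so the file you pass in is your own assumption. Send the result together with reference_audio_id set to “Custom”.

```python
import base64
from pathlib import Path


def encode_audio_bytes(data: bytes) -> str:
    """Base64-encode raw audio bytes into the string custom_ref_audio expects."""
    return base64.b64encode(data).decode("ascii")


def encode_reference_audio(path: str) -> str:
    """Read an audio file from disk and Base64-encode its contents."""
    return encode_audio_bytes(Path(path).read_bytes())
```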

enable_server_pronunciation_overrides (boolean)

Indicates that you have added custom pronunciation overrides to your account via the app. When enabled, the model applies your saved overrides automatically, so you do not need to send the word mapping in every API call.

Default: false

user_id (string)

Optional user identifier, needed only for pronunciation overrides and for loading custom voices. You can find it at https://app.silma.ai/api-keys.

Default: true
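A minimal sketch of the extra fields for using overrides saved in the app. It assumes that user_id is how the server finds those saved overrides, which is how the user_id description reads; the helper name is hypothetical.

```python
from typing import Optional


def server_override_fields(user_id: Optional[str]) -> dict:
    """Request fields for applying pronunciation overrides saved to your account."""
    if not user_id:
        # user_id is needed for pronunciation overrides (see https://app.silma.ai/api-keys).
        raise ValueError("enable_server_pronunciation_overrides requires user_id")
    return {"enable_server_pronunciation_overrides": True, "user_id": user_id}
```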

Response (application/json)

Successful generation

audio_base64_encoded (string)

The generated audio file encoded in Base64.

text (string)

The text as processed by the server.

inference_time (number, float)

The time the server took to run inference.
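A minimal sketch of unpacking a successful response. The audio container format and the unit of inference_time are not stated on this page, so save_audio writes the bytes unchanged, and you should choose the file extension after inspecting them. Both function names are hypothetical.

```python
import base64
import json
from typing import Tuple


def decode_response(body: bytes) -> Tuple[bytes, str, float]:
    """Split a successful response into raw audio bytes, processed text, and inference time."""
    data = json.loads(body)
    audio = base64.b64decode(data["audio_base64_encoded"])
    return audio, data["text"], data["inference_time"]


def save_audio(body: bytes, out_path: str) -> None:
    """Write the decoded audio bytes to out_path unchanged."""
    audio, _, _ = decode_response(body)
    with open(out_path, "wb") as f:
        f.write(audio)
```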

Example request: POST /generate

Body (application/json)

{
  "model_id": "silma-tts-pro-ksa-large",
  "text": "بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ",
  "reference_audio_id": "Sulaiman",
  "nfe_steps": 16
}
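The same example request can be sent from Python with only the standard library. This is a minimal sketch rather than an official client. The function names and the 120-second timeout are arbitrary choices; the URL, header, and body fields come from this page.

```python
import json
import urllib.request

API_URL = "https://api.silma.ai/tts/generate"


def build_request(api_key: str) -> urllib.request.Request:
    """Build the example POST /generate request with the apiKey header."""
    payload = {
        "model_id": "silma-tts-pro-ksa-large",
        "text": "بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ",
        "reference_audio_id": "Sulaiman",
        "nfe_steps": 16,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"apiKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def generate(api_key: str, timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON response."""
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(build_request(api_key), timeout=timeout) as resp:
        return json.load(resp)
```

Calling generate("YOUR_API_KEY") returns a dict with audio_base64_encoded, text, and inference_time. Note that urllib normalizes header names (apiKey is sent as Apikey); HTTP header names are case-insensitive.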