Audio
Discover how to convert audio to text or text to audio. The API is compatible with the OpenAI audio API.
Create transcription
Transcribes speech into text.
Required attributes
- Name: file
- Type: file
- Description: The audio file to be transcribed. Supported file types: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
- Name: model
- Type: string
- Description: The model used for transcription.
  If the model name is "default", the audio model from the configuration is used (see Documentation » Configuration for details).
  If the model name follows the format repo-owner/repo-name/model-name, the indicated model is used; if it is not present locally, it is downloaded from Hugging Face. If it cannot be downloaded, Edgen responds with an error. Example: "distil-whisper/distil-small.en/ggml-distil-small.en.bin".
  If the model name contains just a file name, e.g. "my-model.bin", Edgen will try to use the file of that name in the data directory defined in the configuration. If the file does not exist there, Edgen responds with an error.
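The resolution rules above can be sketched as a small shell function (illustrative only; this is not Edgen's actual implementation, which may handle additional cases):

```shell
# Sketch of the model-name resolution rules described above.
resolve_model() {
  case "$1" in
    default) echo "configured audio model" ;;   # taken from the configuration
    */*/*)   echo "huggingface: $1" ;;          # repo-owner/repo-name/model-name
    *)       echo "data directory file: $1" ;;  # plain file name in the data directory
  esac
}

resolve_model "default"
resolve_model "distil-whisper/distil-small.en/ggml-distil-small.en.bin"
resolve_model "my-model.bin"
```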
Optional attributes
- Name: language
- Type: string
- Description: The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
- Name: prompt
- Type: string
- Description: An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
- Name: response_format
- Type: string
- Description: The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
- Name: temperature
- Type: float
- Description: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
- Name: create_session
- Type: bool
- Description: If present and true, a new audio session is created and used for the transcription, and the session's UUID is returned in the response object. A session keeps track of past inferences, which is useful for things like live transcription, where continuous audio is submitted across several requests.
- Name: session
- Type: UUID
- Description: The UUID of an existing session, which will be used for the transcription.
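A hypothetical live-transcription flow combining the two session attributes above (the chunk file names are placeholders, and jq is assumed to be available for extracting the session UUID from the JSON response):

```shell
# First chunk: create a session and capture its UUID from the response.
SESSION=$(curl -s http://localhost:33322/v1/audio/transcriptions \
  -H "Authorization: Bearer no-key-required" \
  -F file="@chunk-1.wav" \
  -F model="default" \
  -F create_session="true" | jq -r .session)

# Later chunks: reuse the session so past inferences are taken into account.
curl -s http://localhost:33322/v1/audio/transcriptions \
  -H "Authorization: Bearer no-key-required" \
  -F file="@chunk-2.wav" \
  -F model="default" \
  -F session="$SESSION"
```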
Request
curl http://localhost:33322/v1/audio/transcriptions \
-H "Authorization: Bearer no-key-required" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3" \
-F model="default"
Response
{
"text": "The woods are lovely, dark and deep, but I have promises to keep and miles to go before I sleep, and miles to go before I sleep."
}
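Optional attributes are passed as additional multipart form fields; for example, a request that also pins the language, output format, and temperature (values are illustrative):

```shell
curl http://localhost:33322/v1/audio/transcriptions \
  -H "Authorization: Bearer no-key-required" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="default" \
  -F language="en" \
  -F response_format="text" \
  -F temperature="0.2"
```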
Transcription status
Shows the current status of the audio transcriptions endpoint (e.g. ongoing model downloads).
Response attributes
- Name: active_model
- Type: string
- Description: The model that is currently active for this endpoint.
- Name: download_ongoing
- Type: bool
- Description: Whether the model for this endpoint is currently being downloaded.
- Name: download_progress
- Type: number
- Description: The progress of the ongoing model download, in percent.
- Name: last_errors
- Type: string[]
- Description: Errors that occurred recently on this endpoint.
Request
curl http://localhost:33322/v1/audio/transcriptions/status \
-H "Authorization: Bearer no-key-required"
Response
{
"active_model": "ggml-distil-small.en.bin",
"download_ongoing": false,
"download_progress": 100,
"last_errors": ["Custom { kind: PermissionDenied, error: \"verboten\" }"]
}
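While a model download is in progress, the status endpoint can be polled to track it; a minimal sketch, assuming a running server and that jq is available:

```shell
# Poll until no download is ongoing, printing the progress as we go.
while true; do
  STATUS=$(curl -s http://localhost:33322/v1/audio/transcriptions/status \
    -H "Authorization: Bearer no-key-required")
  [ "$(echo "$STATUS" | jq -r .download_ongoing)" != "true" ] && break
  echo "download at $(echo "$STATUS" | jq -r .download_progress)%"
  sleep 1
done
```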