Audio
Discover how to convert audio to text or text to audio. The API is compatible with the OpenAI audio API.
Create transcription
Transcribes speech into text.
Required attributes
- Name: file
- Type: file
- Description: The audio file to be transcribed. Supported file types: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
- Name: model
- Type: string
- Description: The model used for transcription.
  If the model name is "default", the audio model from the configuration is used (see Documentation » Configuration for details).
  If the model name follows the format repo-owner/repo-name/model-name, the indicated model is used; if it is not present locally, it is downloaded from Hugging Face. If it cannot be downloaded, Edgen responds with an error. Example: "distil-whisper/distil-small.en/ggml-distil-small.en.bin".
  If the model name contains just a file name, e.g. "my-model.bin", Edgen will try to use the file of that name in the data directory defined in the configuration. If the file does not exist there, Edgen responds with an error.
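The resolution rules above can be sketched as a small shell function (illustrative only; this is not Edgen's actual implementation, which may handle additional cases):

```shell
# Sketch of the model-name resolution rules described above.
resolve_model() {
  case "$1" in
    default) echo "configured audio model" ;;   # taken from the configuration
    */*/*)   echo "huggingface: $1" ;;          # repo-owner/repo-name/model-name
    *)       echo "data directory file: $1" ;;  # plain file name in the data directory
  esac
}

resolve_model "default"
resolve_model "distil-whisper/distil-small.en/ggml-distil-small.en.bin"
resolve_model "my-model.bin"
```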
Optional attributes
- Name: language
- Type: string
- Description: The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
- Name: prompt
- Type: string
- Description: An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
- Name: response_format
- Type: string
- Description: The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
- Name: temperature
- Type: float
- Description: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
- Name: create_session
- Type: bool
- Description: If present and true, a new audio session is created and used for the transcription, and the session's UUID is returned in the response object. A session keeps track of past inferences, which is useful for things like live transcription, where continuous audio is submitted across several requests.
- Name: session
- Type: UUID
- Description: The UUID of an existing session, which will be used for the transcription.
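A hypothetical live-transcription flow combining the two session attributes above (the chunk file names are placeholders, and jq is assumed to be available for extracting the session UUID from the JSON response):

```shell
# First chunk: create a session and capture its UUID from the response.
SESSION=$(curl -s http://localhost:33322/v1/audio/transcriptions \
  -H "Authorization: Bearer no-key-required" \
  -F file="@chunk-1.wav" \
  -F model="default" \
  -F create_session="true" | jq -r .session)

# Later chunks: reuse the session so past inferences are taken into account.
curl -s http://localhost:33322/v1/audio/transcriptions \
  -H "Authorization: Bearer no-key-required" \
  -F file="@chunk-2.wav" \
  -F model="default" \
  -F session="$SESSION"
```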
Request
curl http://localhost:33322/v1/audio/transcriptions \
-H "Authorization: Bearer no-key-required" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3" \
-F model="default"
Response
{
"text": "The woods are lovely, dark and deep, but I have promises to keep and miles to go before I sleep, and miles to go before I sleep."
}
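Optional attributes are passed as additional multipart form fields; for example, a request that also pins the language, output format, and temperature (values are illustrative):

```shell
curl http://localhost:33322/v1/audio/transcriptions \
  -H "Authorization: Bearer no-key-required" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="default" \
  -F language="en" \
  -F response_format="text" \
  -F temperature="0.2"
```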
Transcription status
Shows the current status of the audio transcriptions endpoint (e.g. ongoing model downloads).
Response attributes
- Name: active_model
- Type: string
- Description: The model that is currently active for this endpoint.
- Name: download_ongoing
- Type: bool
- Description: Whether the model for this endpoint is currently being downloaded.
- Name: download_progress
- Type: number
- Description: The progress of the ongoing model download, in percent.
- Name: last_errors
- Type: string[]
- Description: Errors that occurred recently on this endpoint.
Request
curl http://localhost:33322/v1/audio/transcriptions/status \
-H "Authorization: Bearer no-key-required"
Response
{
"active_model": "ggml-distil-small.en.bin",
"download_ongoing": false,
"download_progress": 100,
"last_errors": ["Custom { kind: PermissionDenied, error: \"verboten\" }"]
}
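While a model download is in progress, the status endpoint can be polled to track it; a minimal sketch, assuming a running server and that jq is available:

```shell
# Poll until no download is ongoing, printing the progress as we go.
while true; do
  STATUS=$(curl -s http://localhost:33322/v1/audio/transcriptions/status \
    -H "Authorization: Bearer no-key-required")
  [ "$(echo "$STATUS" | jq -r .download_ongoing)" != "true" ] && break
  echo "download at $(echo "$STATUS" | jq -r .download_progress)%"
  sleep 1
done
```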