Audio

Discover how to convert audio to text or text to audio. The API is OpenAI compliant.


POST http://localhost:33322/v1/audio/transcriptions

Create transcription

Transcribes speech into text.

Required attributes

  • Name
    file
    Type
    file
    Description

    The audio file to be transcribed. Supported file types: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

  • Name
    model
    Type
    string
    Description

    The model used for transcription.

    • If the model name is "default", the audio model from the configuration is used (see Documentation » Configuration for details).

    • If the model name follows the format repo-owner/repo-name/model-name, the indicated model is used. If it is not present locally, it is downloaded from Hugging Face. If it cannot be downloaded, Edgen responds with an error. Example: "distil-whisper/distil-small.en/ggml-distil-small.en.bin".

    • If the model name contains just a file name, e.g. "my-model.bin", Edgen tries to use the file of this name in the data directory defined in the configuration. If the file does not exist there, Edgen responds with an error.
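The three naming rules above can be sketched as a small classifier. This is only an illustration of the documented rules, not Edgen's actual resolution code; the function name `classify_model_name` and the labels it returns are hypothetical.

```python
def classify_model_name(name: str) -> str:
    """Return which documented naming rule applies to a model name.

    Hypothetical helper: the labels are illustrative and not part of Edgen.
    """
    if name == "default":
        # The audio model from the configuration is used.
        return "configured"
    parts = name.split("/")
    if len(parts) == 3 and all(parts):
        # repo-owner/repo-name/model-name, downloaded if not present.
        return "huggingface"
    if len(parts) == 1 and name:
        # A bare file name, looked up in the configured data directory.
        return "local-file"
    # Assumption: any other shape is not covered by the documented rules.
    return "unrecognized"
```

For example, `classify_model_name("my-model.bin")` yields `"local-file"`, while a two-part name such as `"a/b"` matches none of the documented formats.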

Optional attributes

  • Name
    language
    Type
    string
    Description

    The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

  • Name
    prompt
    Type
    string
    Description

    An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

  • Name
    response_format
    Type
    string
    Description

    The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

  • Name
    temperature
    Type
    float
    Description

    The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

  • Name
    create_session
    Type
    bool
    Description

    If present and true, a new audio session is created and used for the transcription, and the session's UUID is returned in the response object. A session keeps track of past inferences. This may be useful for cases such as live transcription, where continuous audio is submitted across several requests.

  • Name
    session
    Type
    UUID
    Description

    The UUID of an existing session, which will be used for the transcription.
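As a rough sketch, the optional attributes could be collected into form fields as follows. The helper `build_fields` is hypothetical, and sending booleans as the string "true" is an assumption about the form encoding rather than documented behaviour. The range and format checks mirror the descriptions above.

```python
import uuid
from typing import Dict, Optional

# Output formats listed for response_format.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_fields(
    model: str = "default",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
    create_session: bool = False,
    session: Optional[str] = None,
) -> Dict[str, str]:
    """Build the non-file form fields for a transcription request (hypothetical helper)."""
    fields = {"model": model}
    if language is not None:
        fields["language"] = language  # ISO-639-1 code, e.g. "en"
    if prompt is not None:
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format}")
        fields["response_format"] = response_format
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    if create_session:
        # Assumption: the boolean is sent as the string "true".
        fields["create_session"] = "true"
    if session is not None:
        # uuid.UUID raises ValueError if the value is not a valid UUID.
        fields["session"] = str(uuid.UUID(session))
    return fields
```

A live-transcription client might call `build_fields(create_session=True)` for the first chunk, read the session UUID from the response, and pass it as `session=...` for each following chunk.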

Request

POST /v1/audio/transcriptions
curl http://localhost:33322/v1/audio/transcriptions \
  -H "Authorization: Bearer no-key-required" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="default"

Response

{
  "text": "The woods are lovely, dark and deep, but I have promises to keep and miles to go before I sleep, and miles to go before I sleep."
}
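The same request can be sketched in Python using only the standard library. The helper names `encode_multipart` and `transcribe` are hypothetical, and the code assumes the server accepts a standard multipart/form-data body and replies with a JSON object containing a "text" field, as in the example response.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file part named "file" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    chunks.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode())
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path, model="default", base_url="http://localhost:33322"):
    """Send an audio file for transcription and return the transcript text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(
        {"model": model}, os.path.basename(path), audio
    )
    request = urllib.request.Request(
        base_url + "/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": "Bearer no-key-required",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With a server running locally, `transcribe("/path/to/file/audio.mp3")` would return the transcript string.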

GET http://localhost:33322/v1/audio/transcriptions/status

Transcription status

Shows the current status of the audio transcriptions endpoint, for example whether a model download is in progress.

Response attributes

  • Name
    active_model
    Type
    string
    Description

    The model that is currently active for this endpoint.

  • Name
    download_ongoing
    Type
    bool
    Description

    Whether the model for this endpoint is currently being downloaded.

  • Name
    download_progress
    Type
    number
    Description

    The progress of the ongoing model download in percent.

  • Name
    last_errors
    Type
    string[]
    Description

    Errors that occurred recently on this endpoint.

Request

GET /v1/audio/transcriptions/status
curl http://localhost:33322/v1/audio/transcriptions/status \
  -H "Authorization: Bearer no-key-required"

Response

{"active_model":"ggml-distil-small.en.bin","download_ongoing":false,"download_progress":100,"last_errors":["Custom { kind: PermissionDenied, error: \"verboten\" }"]}
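A client could parse this status object along the following lines. This is a minimal sketch: `EndpointStatus`, `parse_status`, and `is_ready` are hypothetical names, the field names are taken from the example response, and treating "no download ongoing" as "ready" is an assumption rather than documented behaviour.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class EndpointStatus:
    active_model: str
    download_ongoing: bool
    download_progress: float  # percent
    last_errors: List[str]


def parse_status(payload: str) -> EndpointStatus:
    """Parse the JSON body returned by the status endpoint."""
    data = json.loads(payload)
    return EndpointStatus(
        active_model=data["active_model"],
        download_ongoing=data["download_ongoing"],
        download_progress=data["download_progress"],
        last_errors=list(data.get("last_errors", [])),
    )


def is_ready(status: EndpointStatus) -> bool:
    # Assumption: the endpoint is usable once no model download is ongoing.
    return not status.download_ongoing
```

A caller might poll the status endpoint, pass each response body to `parse_status`, and wait until `is_ready` returns true before submitting audio.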