Chat
Generate text from text.
Create chat completion
Given a list of messages belonging to a chat history, generate a response.
Required attributes
- Name
messages
- Type
- array
- Description
A list of messages representing a chat history. It is essentially the context used by the model to generate a response.
- Name
model
- Type
- string
- Description
The model used for chat completions.
If the model name is "default", the chat model from the configuration is used (see Documentation » Configuration for details).
If the model name follows the format repo-owner/repo-name/model-name, the indicated model is used and, if it is not present locally, it will be downloaded from Hugging Face. If it cannot be downloaded, Edgen responds with an error. Example: "TheBloke/neural-chat-7B-v3-3-GGUF/neural-chat-7b-v3-3.Q4_K_M.gguf".
If the model name contains just a file name, e.g. "my-model.bin", Edgen will try to use the file of that name in the data directory as defined in the configuration. If the file does not exist there, Edgen responds with an error.
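The three accepted forms of the model field can be sketched as request-body fragments. The Hugging Face path below is the documented example; "my-model.bin" is a hypothetical file name:

```json
{ "model": "default" }
{ "model": "TheBloke/neural-chat-7B-v3-3-GGUF/neural-chat-7b-v3-3.Q4_K_M.gguf" }
{ "model": "my-model.bin" }
```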
Optional attributes
- Name
frequency_penalty
- Type
- float
- Description
A number in [-2.0, 2.0]. A higher number decreases the likelihood that the model repeats itself.
- Name
logit_bias
- Type
- map
- Description
A map of token IDs to bias values in [-100.0, +100.0]. Adds a bias to those tokens before sampling; a value of -100.0 prevents the token from being selected at all. You could use this to, for example, prevent the model from emitting profanity.
- Name
max_tokens
- Type
- integer
- Description
The maximum number of tokens to generate. If None, generation terminates at the first stop token or the end of the sequence.
- Name
n
- Type
- integer
- Description
How many completion choices to generate for the input. 1 by default. You can use this to generate several sets of completions for the same prompt.
- Name
presence_penalty
- Type
- float
- Description
A number in [-2.0, 2.0]. Positive values "increase the model's likelihood to talk about new topics."
- Name
seed
- Type
- integer
- Description
The random number generator seed for the session. Random by default.
- Name
stop
- Type
- string or array
- Description
A stop phrase or set of stop phrases. The server will pause emitting completions if it appears to be generating a stop phrase, and will terminate completions if a full stop phrase is detected. Stop phrases are never emitted to the client.
- Name
stream
- Type
- bool
- Description
If true, stream the output as it is computed by the server, instead of returning the whole completion at the end. You can use this to live-stream completions to a client.
- Name
response_format
- Type
- string
- Description
The format of the response stream. This is always assumed to be JSON, which is non-conformant with the OpenAI spec.
- Name
temperature
- Type
- float
- Description
The sampling temperature, in [0.0, 2.0]. Higher values make the output more random.
- Name
top_p
- Type
- float
- Description
Nucleus sampling. If you set this value to 0.1 (10%), only the tokens making up the top 10% of probability mass are considered for sampling, preventing very low-probability tokens from being selected.
- Name
tools
- Type
- array
- Description
A list of tools made available to the model.
- Name
tool_choice
- Type
- string
- Description
If present, the tool that the user has chosen to use. OpenAI states:
- none prevents any tool from being used,
- auto allows any tool to be used, or
- you can provide a description of the tool entirely instead of a name.
- Name
user
- Type
- string
- Description
A unique identifier for the end user creating this request. This is used for telemetry and user tracking, and is unused within Edgen.
- Name
one_shot
- Type
- bool
- Description
Indicate if this is an isolated request, with no associated past or future context. This may allow for optimisations in some implementations. Default: false.
- Name
context_hint
- Type
- integer
- Description
A hint for how big a context will be.
Warning
An unsound hint may severely drop performance and/or inference quality, and in some cases even cause Edgen to crash. Do not set this value unless you know what you are doing.
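As a sketch, a request body combining several of the optional attributes above might look like the following. The parameter values are illustrative only, not recommendations:

```json
{
  "model": "default",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "frequency_penalty": 0.5,
  "stop": ["</s>"],
  "stream": false
}
```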
Request
curl http://localhost:33322/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key-required" \
-d '{
"model": "default",
"messages": [
{
"role": "system",
"content": "You are EdgenChat, a helpful AI assistant."
},
{
"role": "user",
"content": "Hello!"
}
]
}'
Response
{"id":"f403d6f4-4826-40b1-8798-77e4837e5041","choices":[{"message":{"role":"assistant","content":"Hello! How can I help you today?","name":null,"tool_calls":null},"finish_reason":null,"index":0}],"created":1708958149,"model":"main","system_fingerprint":"edgen-0.1.3","object":"text_completion","usage":{"completion_tokens":0,"prompt_tokens":0,"total_tokens":0}}
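To receive the completion incrementally instead, set stream to true in the request body; the rest of the request is unchanged. A sketch based on the example above:

```json
{
  "model": "default",
  "messages": [
    { "role": "system", "content": "You are EdgenChat, a helpful AI assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "stream": true
}
```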
Chat completion status
Shows the current status of the chat completions endpoint (e.g. downloads).
Response attributes
- Name
active_model
- Type
- string
- Description
The model that is currently active for this endpoint.
- Name
download_ongoing
- Type
- bool
- Description
The model for this endpoint is currently being downloaded.
- Name
download_progress
- Type
- number
- Description
The progress of the ongoing model download in percent.
- Name
last_errors
- Type
- string[]
- Description
Errors that occurred recently on this endpoint.
Request
curl http://localhost:33322/v1/chat/completions/status \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key-required"
Response
{"active_model":"neural-chat-7b-v3-3.Q4_K_M.gguf","download_ongoing":false,"download_progress":100,"last_errors":["Custom { kind: PermissionDenied, error: \"verboten\" }"]}
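While a model download is in progress, the status response would instead report the ongoing download. A sketch with hypothetical values:

```json
{
  "active_model": "neural-chat-7b-v3-3.Q4_K_M.gguf",
  "download_ongoing": true,
  "download_progress": 42,
  "last_errors": []
}
```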