How to Use the Chat Endpoint in Vultr Serverless Inference
by Blog Admin
The Vultr Serverless Inference chat endpoint enables users to engage in chat conversations with Large Language Models (LLMs). This service allows for real-time interaction, leveraging advanced AI capabilities to facilitate dynamic and responsive communication. The endpoint also supports tool calling, letting models invoke defined functions (for example, to fetch live data or call an API) during a conversation to produce data-driven responses. By integrating this endpoint, users can enhance their applications with sophisticated conversational AI, improving user experience and operational efficiency.
Follow this guide to use the chat endpoint on your Vultr account using the Vultr Customer Portal or the Vultr API.
Vultr Customer Portal
- Navigate to Products, click Serverless, and then click Inference.
- Click your target inference subscription to open its management page.
- Open the Chat page.
- Select a preferred model.
- Provide a Max Tokens value to limit the response length.
- Send a message in the chat window.
- Click History to view chat history.
- Click New Conversation to start a new chat.
Vultr API
Chat with a Model Using the API
- Send a `GET` request to the List Serverless Inference endpoint and note the target inference subscription's ID.

```console
$ curl "https://api.vultr.com/v2/inference" \
    -X GET \
    -H "Authorization: Bearer ${VULTR_API_KEY}"
```
- Send a `GET` request to the Serverless Inference endpoint and note the target inference subscription's API key.

```console
$ curl "https://api.vultr.com/v2/inference/{inference-id}" \
    -X GET \
    -H "Authorization: Bearer ${VULTR_API_KEY}"
```
- Send a `GET` request to the List Models endpoint and note the preferred inference model's ID.

```console
$ curl "https://api.vultrinference.com/v1/models" \
    -X GET \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}"
```
- Send a `POST` request to the Create Chat Completion endpoint to chat with the preferred Large Language Model.

```console
$ curl "https://api.vultrinference.com/v1/chat/completions" \
    -X POST \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "{model-id}",
        "messages": [
            {
                "role": "user",
                "content": "{user-input}"
            }
        ],
        "max_tokens": 512
    }'
```
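The steps above can also be chained into a single script. The sketch below is illustrative rather than official: it assumes `jq` is installed, and the response field names (`.inference[0].id` and `.inference.api_key`) are inferred from typical Vultr API response shapes rather than confirmed; verify them against the JSON your account actually returns.

```sh
#!/bin/sh
# Sketch: chain the four steps above with jq.
# Assumes VULTR_API_KEY is exported and jq is installed.
# The field names .inference[0].id and .inference.api_key are
# assumptions; check them against your account's actual responses.

# 1. Note the first inference subscription's ID.
INFERENCE_ID=$(curl -s "https://api.vultr.com/v2/inference" \
    -H "Authorization: Bearer ${VULTR_API_KEY}" | jq -r '.inference[0].id')

# 2. Fetch that subscription's inference API key.
INFERENCE_API_KEY=$(curl -s "https://api.vultr.com/v2/inference/${INFERENCE_ID}" \
    -H "Authorization: Bearer ${VULTR_API_KEY}" | jq -r '.inference.api_key')

# 3. List the available models; inspect this output to pick a model ID.
curl -s "https://api.vultrinference.com/v1/models" \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" | jq .

# 4. Send a chat message and print only the assistant's reply
#    (.choices[0].message.content follows the OpenAI-compatible shape).
curl -s "https://api.vultrinference.com/v1/chat/completions" \
    -X POST \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "{model-id}",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 512
    }' | jq -r '.choices[0].message.content'
```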
Visit the Create Chat Completion API page to view additional attributes you can apply for greater control when interacting with the preferred inference model.
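As one hedged illustration, OpenAI-compatible chat endpoints commonly accept sampling controls such as `temperature`, `top_p`, and a `stream` flag. These attributes are assumptions based on that convention, not confirmed Vultr behavior; check each one on the Create Chat Completion API page before relying on it.

```sh
# Sketch with optional sampling attributes. temperature, top_p, and
# stream follow the common OpenAI-compatible convention; confirm each
# attribute on the Create Chat Completion API page before using it.
curl -s "https://api.vultrinference.com/v1/chat/completions" \
    -X POST \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "{model-id}",
        "messages": [{"role": "user", "content": "{user-input}"}],
        "max_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
        "stream": false
    }'
```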
Use Tool Calling with the Chat Endpoint
Tool calling is currently supported only on the kimi-k2-instruct model.
- Define your tools using the `"tools"` parameter in the request body.
- Set `"tool_choice"` to `"auto"`, `"required"`, or `"none"` to control when the model triggers a tool call.
- Send a `POST` request to the Create Chat Completion endpoint with a message that can trigger a tool call.

```console
$ curl "https://api.vultrinference.com/v1/chat/completions" \
    -X POST \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "kimi-k2-instruct",
        "messages": [
            {
                "role": "user",
                "content": "Ask a question that requires a tool response."
            }
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "function_name",
                    "description": "Briefly describe the purpose of the function.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "parameter_name": {
                                "type": "string",
                                "description": "Describe the expected input parameter."
                            }
                        },
                        "required": ["parameter_name"]
                    }
                }
            }
        ],
        "tool_choice": "auto"
    }'
```
The model responds with a structured tool call such as:

```json
{
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "function_name",
                "arguments": "{\"parameter_name\": \"example_value\"}"
            }
        }
    ]
}
```

You can execute this function locally or via API, then send the output back to the model in a second request, as sketched below.
For a complete implementation example, see How to Use Tool Calling with Vultr Serverless Inference.