Create Chat Completion

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "A doubleword is a data unit that is twice the size of a standard word in computer architecture, typically 32 or 64 bits depending on the system.",
        "role": "assistant"
      }
    }
  ],
  "created": 1703187200,
  "id": "chatcmpl-abc123",
  "model": "Qwen/Qwen3-30B-A3B-FP8",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 36,
    "prompt_tokens": 24,
    "total_tokens": 60
  }
}

{
  "error": {
    "code": "invalid_api_key",
    "message": "Invalid API key provided",
    "type": "authentication_error"
  }
}

POST /chat/completions

Use this endpoint for real-time chat generation through the OpenAI-compatible chat/completions API. If your SDK defaults to the newer OpenAI Responses API, configure it to use chat completions instead.

Base URL

https://api.dottxt.ai/v1

Structured output

Use response_format with type: "json_schema" to constrain the model output to your schema.

curl https://api.dottxt.ai/v1/chat/completions \
  -H "Authorization: Bearer $DOTTXT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "Classify: My card was charged twice for order ORD-9842. Need refund today." }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "ticket",
        "schema": {
          "type": "object",
          "properties": {
            "category": {
              "type": "string",
              "enum": ["billing", "technical", "account", "shipping"]
            },
            "priority": {
              "type": "string",
              "enum": ["low", "medium", "high", "urgent"]
            },
            "summary": {
              "type": "string",
              "minLength": 10,
              "maxLength": 120
            },
            "tags": {
              "type": "array",
              "items": { "type": "string" },
              "minItems": 1,
              "maxItems": 4
            }
          },
          "required": ["category", "priority", "summary", "tags"],
          "additionalProperties": false
        }
      }
    }
  }'

Response

{
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "{\"category\": \"billing\", \"priority\": \"high\", \"summary\": \"Customer reports duplicate card charge on order ORD-9842, requesting refund\", \"tags\": [\"refund\", \"duplicate-charge\"]}"
      }
    }
  ]
}

category is always one of the four enum values. summary is between 10 and 120 characters. tags has 1–4 items. See the supported features for the full list of enforceable constraints.
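Because the structured result arrives as a JSON string inside message.content, a client still has to decode it before use. The following is a minimal Python sketch of that step, assuming the response shape shown above; parse_ticket is a hypothetical helper name, and the re-checks simply mirror the example schema as a defensive measure rather than anything the API requires.

```python
import json

# Constraints copied from the example schema above; adjust to your own schema.
CATEGORIES = {"billing", "technical", "account", "shipping"}
PRIORITIES = {"low", "medium", "high", "urgent"}


def parse_ticket(response: dict) -> dict:
    """Decode choices[0].message.content and re-check the schema limits."""
    ticket = json.loads(response["choices"][0]["message"]["content"])
    problems = []
    if ticket.get("category") not in CATEGORIES:
        problems.append("category")
    if ticket.get("priority") not in PRIORITIES:
        problems.append("priority")
    if not 10 <= len(ticket.get("summary", "")) <= 120:
        problems.append("summary")
    if not 1 <= len(ticket.get("tags", [])) <= 4:
        problems.append("tags")
    if problems:
        raise ValueError(f"fields outside schema limits: {problems}")
    return ticket
```

Calling parse_ticket on the response above would return a plain dict whose category can be used directly for routing.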

JSON Patch streaming

Set stream: "patch" alongside a response_format JSON schema to stream the structured response field-by-field as JSON Patch operations instead of returning a single completion. The server emits one RFC 6902 add operation per field as the model generates it, so downstream work (routing, dispatching, UI updates) can begin the moment the relevant field lands.

curl https://api.dottxt.ai/v1/chat/completions \
  -H "Authorization: Bearer $DOTTXT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "I was charged twice this month. Refund the duplicate." }
    ],
    "stream": "patch",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "ticket",
        "schema": {
          "type": "object",
          "properties": {
            "intent":  { "type": "string", "enum": ["billing", "technical", "account"] },
            "urgency": { "type": "string", "enum": ["low", "medium", "high", "critical"] },
            "reply":   { "type": "string", "maxLength": 400 }
          },
          "required": ["intent", "urgency", "reply"],
          "additionalProperties": false
        }
      }
    }
  }'

stream: "patch" is the only request difference from a normal structured-output call — messages, temperature, max_tokens, seed, and the rest of the chat-completions parameters all behave the same way.
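Since the only difference is the stream field, an existing structured-output request body can be converted mechanically. This is a small Python sketch under that assumption; as_patch_request is a hypothetical helper, and the guard reflects the rule that response_format is required when stream is "patch".

```python
import copy


def as_patch_request(body: dict) -> dict:
    """Return a copy of a structured-output request body with patch streaming on."""
    response_format = body.get("response_format") or {}
    if response_format.get("type") != "json_schema":
        raise ValueError('stream: "patch" requires a json_schema response_format')
    patched = copy.deepcopy(body)  # leave the caller's dict untouched
    patched["stream"] = "patch"
    return patched
```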

Wire format

The default response framing is NDJSON (Content-Type: application/x-ndjson): one JSON object per line, no event: prefix, no trailing terminator record — the stream ends when the connection closes.

Response (NDJSON)

{"op":"add","path":"","value":{}}
{"op":"add","path":"/intent","value":"billing"}
{"op":"add","path":"/urgency","value":"high"}
{"op":"add","path":"/reply","value":"Hi Jane, I've processed the refund..."}
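Reading the NDJSON framing amounts to decoding one JSON object per line until the stream ends. The sketch below assumes you already have an iterable of text lines (for example from an HTTP client's line iterator); iter_ndjson_ops is a hypothetical name, not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson_ops(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one patch operation per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)
```

Because NDJSON has no terminator record, the generator simply finishes when the underlying line iterator is exhausted.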

The endpoint also speaks Server-Sent Events for clients that prefer SSE framing. Send Accept: text/event-stream on the request, and you’ll get:

Response (SSE)

event: patch
data: {"op":"add","path":"","value":{}}

event: patch
data: {"op":"add","path":"/intent","value":"billing"}

event: patch
data: {"op":"add","path":"/urgency","value":"high"}

event: patch
data: {"op":"add","path":"/reply","value":"Hi Jane, I've processed the refund..."}

event: done
data: {}

The JSON Patch payloads are identical between the two framings — only the transport differs. SSE adds a final event: done record before the connection closes; NDJSON closes silently.
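For the SSE framing, a client groups event: and data: lines into records separated by blank lines and stops at the done event. The following is a minimal Python sketch of that parsing, assuming an iterable of text lines; iter_sse_ops is a hypothetical helper, and it handles only the event and data fields shown above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_ops(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `patch` event; stop at `done`."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current event record.
            if event == "done":
                return
            if event == "patch" and data:
                yield json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE drops a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
```

The yielded dicts are the same operations the NDJSON framing carries, so downstream code does not need to know which transport was used.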

Operation shape

Every record is an RFC 6902 add operation:

{
  "op": "add",
  "path": "<JSON Pointer>",
  "value": "<field value>"
}

op — always "add" in this mode.
path — JSON Pointer to the location being filled in. The root document arrives first as path: "" with value: {} (or value: [] for array-rooted schemas).
value — the value being inserted. For leaf fields, the JSON primitive (string, number, boolean, null). For nested objects and arrays, an empty container that subsequent ops will fill.

Order of operations

Operations arrive in schema order:

Root seed — {"op":"add","path":"","value":{}} opens the document. For array-rooted schemas, value is [].
Leaf adds in schema order — top-level scalar fields like intent, urgency, reply.
Container seeds + items — when the schema contains nested objects or arrays, the container is seeded first with {} or [], then each item arrives as a separate add (/steps/0, /steps/1, …). Nested objects work the same way (/address, then /address/city, /address/zip).

A field’s position in the schema determines when it streams. Design the schema so high-priority fields (routing keys, classifications, gates) come first; long-form fields (replies, explanations) come last. Example for a {intent, urgency, steps: [...], reply} schema:

{"op":"add","path":"","value":{}}
{"op":"add","path":"/intent","value":"billing"}
{"op":"add","path":"/urgency","value":"high"}
{"op":"add","path":"/steps","value":[]}
{"op":"add","path":"/steps/0","value":"verify charge"}
{"op":"add","path":"/steps/1","value":"issue refund"}
{"op":"add","path":"/reply","value":"Hi Jane, I've processed the refund..."}
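One way to take advantage of this ordering is to attach a callback to each path and fire it the moment its operation arrives. This is an illustrative Python sketch only; dispatch_ops and the handlers mapping are hypothetical names, and ops can be any iterable of decoded patch operations.

```python
from typing import Any, Callable, Dict, Iterable, List


def dispatch_ops(
    ops: Iterable[dict], handlers: Dict[str, Callable[[Any], None]]
) -> List[str]:
    """Call handlers[path](value) as each matching op arrives; return fired paths."""
    fired = []
    for op in ops:
        handler = handlers.get(op["path"])
        if handler is not None:
            handler(op["value"])
            fired.append(op["path"])
    return fired
```

With a handler registered for /intent, routing can start while /steps and /reply are still being generated.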

Reconstructing the document

Each op is applied to the document state from the previous op. If you collect every op and apply them in order, you end up with the same JSON object a non-streaming request would have returned. The Python SDK exposes the running snapshot directly via event.snapshot; see JSON Patch Streaming for details.
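If you are not using the SDK, the running document can be rebuilt by hand. The sketch below is a minimal Python implementation of the RFC 6902 add operation with RFC 6901 pointer unescaping, covering only what this mode emits; apply_add and reconstruct are hypothetical helper names, not SDK functions.

```python
import copy
from typing import Any, Iterable


def _unescape(token: str) -> str:
    # RFC 6901: decode "~1" to "/" first, then "~0" to "~".
    return token.replace("~1", "/").replace("~0", "~")


def apply_add(doc: Any, op: dict) -> Any:
    """Apply one "add" op and return the updated document."""
    if op["op"] != "add":
        raise ValueError(f"unexpected op: {op['op']!r}")
    value = copy.deepcopy(op["value"])  # avoid aliasing the op's own value
    if op["path"] == "":
        return value  # the root seed replaces the whole document
    tokens = [_unescape(t) for t in op["path"].split("/")[1:]]
    target = doc
    for token in tokens[:-1]:
        target = target[int(token)] if isinstance(target, list) else target[token]
    last = tokens[-1]
    if isinstance(target, list):
        index = len(target) if last == "-" else int(last)
        if index > len(target):
            raise IndexError(f"array index out of range: {op['path']}")
        target.insert(index, value)
    else:
        target[last] = value
    return doc


def reconstruct(ops: Iterable[dict]) -> Any:
    """Fold a sequence of add ops into the final document."""
    doc = None
    for op in ops:
        doc = apply_add(doc, op)
    return doc
```

Calling apply_add after each received record gives you the current snapshot at every step, which is useful for progressive UI updates.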

Patch streaming errors

The stream opens with 200 OK once the model starts emitting. Validation errors (bad schema, malformed request, auth failure) come back as the standard JSON error response before any patch records are sent.
A non-200 status means no patch records will arrive; read the body for the error payload.
Connection failures mid-stream surface as a closed stream without a terminator. The Python SDK turns these into dottxt.PatchStreamError.
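A client can apply the first two rules by checking the status before reading any records. The following Python sketch assumes the error body has the {"error": {...}} shape shown at the top of this page; ApiError and raise_for_error are hypothetical names, unrelated to the SDK's dottxt.PatchStreamError.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Carries the fields of the JSON error payload."""

    def __init__(self, status: int, code: Optional[str],
                 type_: Optional[str], message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status, self.code, self.type, self.message = status, code, type_, message


def raise_for_error(status: int, body: str) -> None:
    """Do nothing on 200; otherwise raise ApiError built from the body."""
    if status == 200:
        return
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not a JSON object
    if not isinstance(err, dict):
        err = {}
    raise ApiError(status, err.get("code"), err.get("type"),
                   err.get("message") or body)
```

Detecting a mid-stream failure is a separate concern: with SSE framing the absence of the final done event signals truncation, while NDJSON offers no terminator to check.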

Plain chat

curl https://api.dottxt.ai/v1/chat/completions \
  -H "Authorization: Bearer $DOTTXT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Summarize why batch processing is useful." }
    ],
    "temperature": 0.3,
    "max_tokens": 180
  }'

Example response shape

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1703187200,
  "model": "openai/gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Batch processing reduces cost and improves throughput for non-urgent workloads."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 36,
    "total_tokens": 60
  }
}
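The same request can be prepared from Python with only the standard library. This is a sketch under the assumptions of this page (base URL, bearer-token header, response shape); build_chat_request and first_reply are hypothetical helper names, and the code builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.dottxt.ai/v1"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare a POST to /chat/completions with bearer auth and a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Return the assistant text of the first choice."""
    return response["choices"][0]["message"]["content"]
```

To send it, pass the request to urllib.request.urlopen, decode the body with json.load, and hand the resulting dict to first_reply.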

Notes

Set stream: true to receive token deltas as server-sent events; set stream: "patch" for JSON Patch streaming.
For the Python SDK helper that consumes patch streams into PatchEvent objects, see JSON Patch Streaming.
For model discovery, call GET /models.
For auth setup, see Authentication.
For failures, inspect the HTTP status and the error object in the response body.

Authorizations

Authorization

string

header

required

API key authentication. Include your key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

API keys can be created and managed in the dashboard.

Body

application/json

Request body for chat completions.

messages

object[]

required

A list of messages comprising the conversation so far.

model

string

required

ID of the model to use.

Example:

"Qwen/Qwen3-30B-A3B-FP8"

frequency_penalty

number<float> | null

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.

Example:

0

max_tokens

integer<int32> | null

The maximum number of tokens to generate in the chat completion.

Example:

256

n

integer<int32> | null

How many chat completion choices to generate for each input message.

Example:

1

presence_penalty

number<float> | null

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.

Example:

0

stop

string[] | null

Up to 4 sequences where the API will stop generating further tokens.

response_format

any

Output format for structured responses. Set to {"type": "json_schema", "json_schema": {...}} to constrain the response to a JSON schema. Required when stream is "patch".

seed

integer<int64> | null

Random seed for sampling. If provided and supported by the model, sampling is deterministic.

stream

Streaming mode. false/unset returns a single JSON response. true streams token deltas as SSE. "patch" streams one JSON Patch add per schema field — see the JSON Patch streaming reference.

Example:

false

temperature

number<float> | null

What sampling temperature to use, between 0 and 2.

Example:

0.7

tool_choice

any

Controls which (if any) tool is called by the model.

tools

object[] | null

A list of tools the model may call.

top_p

number<float> | null

An alternative to sampling with temperature, called nucleus sampling.

Example:

1

user

string | null

A unique identifier representing your end-user.

Response

Chat completion generated successfully. Response framing depends on the stream parameter and Accept header:

Default (stream unset or false): a single JSON ChatCompletionResponse.
stream: true: an SSE stream of token deltas.
stream: "patch": an NDJSON stream of JsonPatchOperation records (one per line), or SSE if Accept: text/event-stream is sent.
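The three cases above reduce to a small decision on the request's stream value and Accept header. The Python sketch below only restates that rule so a client can pick the right parser; response_framing and its return labels are hypothetical, not values the API sends.

```python
from typing import Union


def response_framing(stream: Union[bool, str, None] = None, accept: str = "") -> str:
    """Name the framing a client should expect for a given request."""
    if stream == "patch":
        # Patch mode defaults to NDJSON; SSE is opt-in via the Accept header.
        return "sse-patch" if "text/event-stream" in accept else "ndjson-patch"
    if stream is True:
        return "sse-deltas"
    return "json"  # stream unset or false
```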

Response from chat completions.

choices

object[]

required

A list of chat completion choices.

created

integer<int64>

required

The Unix timestamp of when the chat completion was created.

Example:

1703187200

id

string

required

A unique identifier for the chat completion.

Example:

"chatcmpl-abc123"

model

string

required

The model used for the chat completion.

Example:

"Qwen/Qwen3-30B-A3B-FP8"

object

string

required

The object type, always "chat.completion".

Example:

"chat.completion"

system_fingerprint

string | null

The system fingerprint of the model.

usage

null | object

Usage statistics for the completion request.

Example:

{
  "completion_tokens": 36,
  "prompt_tokens": 24,
  "total_tokens": 60
}
