add operation, so downstream work (routing, dispatching, UI updates) can begin the moment the relevant field lands. The model generates fields in schema order. Field order in your schema decides when each field becomes available; routing keys, classifications, and gates should come first.
For a broader discussion of JSON Patch streaming, see The closing brace.
This page covers the Python SDK helper. For the underlying wire format (stream: "patch", NDJSON, SSE), see JSON Patch streaming on /chat/completions.
Quickstart
AsyncDotTxt.stream(...) yields PatchEvent objects as the model fills in your schema:
intent arrives — typically tens of milliseconds into generation — while reply continues to stream.
The PatchEvent object
Each yielded event carries:
- event.op: the raw RFC 6902 operation, {"op": "add", "path": ..., "value": ...}.
- event.snapshot: an independent deep copy of the document built up to and including this op. Safe to stash; later events do not mutate earlier snapshots.
- event.field: the JSON Pointer with the leading / stripped. Top-level keys read as "intent", array items as "steps/0", nested fields as "address/city".
- event.value: the op's value.
match site for the common case; event.op and event.snapshot are available when you want the raw patch or the partial document so far.
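To make the relationship between the four attributes concrete, the sketch below rebuilds them from a list of raw RFC 6902 add operations. It is an illustration, not the SDK's implementation: `PatchEvent`, `apply_add`, and `events_from_ops` are hypothetical names, and the sample ops (including explicit adds for the empty `steps` and `address` containers) are an assumption about what a stream might contain.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class PatchEvent:
    """Stand-in mirroring the four attributes listed above."""
    op: dict
    snapshot: Any
    field: str
    value: Any


def _unescape(token: str) -> str:
    # RFC 6901: "~1" encodes "/", "~0" encodes "~"; decode "~1" first.
    return token.replace("~1", "/").replace("~0", "~")


def apply_add(doc: Any, path: str, value: Any) -> None:
    """Apply one RFC 6902 "add" op to doc in place (non-root paths only)."""
    tokens = [_unescape(t) for t in path.split("/")[1:]]
    target = doc
    for token in tokens[:-1]:
        target = target[int(token)] if isinstance(target, list) else target[token]
    last = tokens[-1]
    if isinstance(target, list):
        # "-" means "append"; otherwise insert at the given index.
        target.insert(len(target) if last == "-" else int(last), value)
    else:
        target[last] = value


def events_from_ops(ops: list[dict]) -> list[PatchEvent]:
    doc: dict = {}
    events = []
    for op in ops:
        # Copy the value so later ops never mutate the raw op.
        apply_add(doc, op["path"], copy.deepcopy(op["value"]))
        events.append(PatchEvent(
            op=op,
            snapshot=copy.deepcopy(doc),  # independent of later ops
            field=op["path"][1:],         # JSON Pointer minus the leading "/"
            value=op["value"],
        ))
    return events


ops = [
    {"op": "add", "path": "/intent", "value": "refund"},
    {"op": "add", "path": "/steps", "value": []},
    {"op": "add", "path": "/steps/0", "value": "check order"},
    {"op": "add", "path": "/address", "value": {}},
    {"op": "add", "path": "/address/city", "value": "Lyon"},
]
for event in events_from_ops(ops):
    print(event.field, "->", event.snapshot)
```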
Parameters
AsyncDotTxt.stream(...) mirrors generate(...):
- model (str): model identifier.
- input (str | list[dict]): prompt string or chat-message list.
- response_format (Any): any schema input accepted by generate(...): Pydantic model, JSON Schema dict/string, TypedDict, dataclass, etc.
- temperature, max_tokens, seed, timeout: optional.
- extra (dict | None): additional chat-completions body fields.
Examples
Print each field as it arrives
The smallest possible patch-stream consumer: iterate, print. No buffering, no closing brace.
Route on a classification field before the long field finishes
Order the schema so the routing key (here, intent) comes before the long-form reply. The dispatch decision fires tens of milliseconds in; the reply lands seconds later. The elapsed-time prefix on the reply is the punchline: it shows how much later the full message lands than the moment routing was already settled.
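A sketch of the routing pattern under simulated timings. `fake_stream` and `route` are hypothetical stand-ins (the real stream would come from `AsyncDotTxt.stream(...)`), and the 50 ms gap is made up for the demo rather than measured.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class PatchEvent:
    """Stand-in with only the attributes this example needs."""
    field: str
    value: Any


async def fake_stream() -> AsyncIterator[PatchEvent]:
    """Hypothetical replacement for AsyncDotTxt.stream(...)."""
    yield PatchEvent("intent", "billing")  # routing key is first in the schema
    await asyncio.sleep(0.05)              # simulated generation of the long reply
    yield PatchEvent("reply", "Here is a breakdown of your latest invoice.")


async def route(intent: str) -> str:
    """Hypothetical dispatcher; a real one might enqueue a job or pick a handler."""
    return f"queue:{intent}"


async def main() -> tuple[list[tuple[str, float]], str]:
    start = time.monotonic()
    timeline = []
    dispatch = None
    async for event in fake_stream():
        elapsed = time.monotonic() - start
        if event.field == "intent":
            # Start routing immediately; do not wait for the reply.
            dispatch = asyncio.create_task(route(event.value))
        timeline.append((event.field, elapsed))
        print(f"[{elapsed * 1000:6.1f} ms] {event.field} = {event.value!r}")
    routed = await dispatch if dispatch is not None else ""
    return timeline, routed


timeline, routed = asyncio.run(main())
print("routed to", routed)
```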
Fan out work on each array item
When the schema has a top-level array, each item streams in as a separate field (steps/0, steps/1, …). Launch a coroutine the moment each one arrives, so step 0’s work is already underway while step 1 is still being generated. Total wall-clock time tends to be roughly one research interval longer than generation, not the sum of all research times.
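The fan-out can be sketched with `asyncio.create_task`. Everything below is illustrative: `fake_stream` stands in for `AsyncDotTxt.stream(...)`, `research` is a hypothetical per-item job, and the sleep durations are arbitrary. The shared log records ordering so the overlap is visible without relying on timing.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class PatchEvent:
    """Stand-in with only the attributes this example needs."""
    field: str
    value: Any


STEPS = ["find sources", "compare prices", "draft summary"]


async def fake_stream() -> AsyncIterator[PatchEvent]:
    """Hypothetical replacement for AsyncDotTxt.stream(...)."""
    for i, step in enumerate(STEPS):
        await asyncio.sleep(0.01)  # simulated generation time per item
        yield PatchEvent(f"steps/{i}", step)


async def research(step: str, log: list[str]) -> str:
    """Hypothetical slow job launched once per array item."""
    log.append(f"started {step}")
    await asyncio.sleep(0.1)
    return f"notes on {step}"


async def main() -> tuple[list[str], list[str]]:
    log: list[str] = []
    tasks = []
    async for event in fake_stream():
        if event.field.startswith("steps/"):
            log.append(f"arrived {event.field}")
            # Launch work now; it runs while later items are still generating.
            tasks.append(asyncio.create_task(research(event.value, log)))
    results = await asyncio.gather(*tasks)  # results keep item order
    return list(results), log


results, log = asyncio.run(main())
print(results)
print(log)
```

In the log, the first item's job starts before the second item arrives, which is the overlap the paragraph above describes.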
Mid-stream human approval
Order high-risk decisions ahead of their effects. The proposed action arrives before the reply; prompt the operator between the two. If they decline, the rest of the stream is still consumed, but the reply is never sent.
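One way to sketch the approval gate, with the operator prompt injected as a callback so the example runs unattended. The field names `action` and `reply`, the `fake_stream` generator, and the `approve` callback are all assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Awaitable, Callable, Optional


@dataclass
class PatchEvent:
    """Stand-in with only the attributes this example needs."""
    field: str
    value: Any


async def fake_stream() -> AsyncIterator[PatchEvent]:
    """Hypothetical replacement for AsyncDotTxt.stream(...)."""
    yield PatchEvent("action", "issue_refund")  # high-risk decision comes first
    await asyncio.sleep(0.01)
    yield PatchEvent("reply", "Your refund has been issued.")


async def handle(
    approve: Callable[[str], Awaitable[bool]],
) -> tuple[Optional[str], list[str]]:
    approved = False  # nothing is sent unless the operator says yes
    sent: Optional[str] = None
    consumed: list[str] = []
    async for event in fake_stream():
        consumed.append(event.field)
        if event.field == "action":
            # Pause here until the operator decides.
            approved = await approve(event.value)
        elif event.field == "reply" and approved:
            sent = event.value
    # The stream is drained either way; only the send is gated.
    return sent, consumed


async def always_yes(action: str) -> bool:
    return True


async def always_no(action: str) -> bool:
    return False


print(asyncio.run(handle(always_yes)))
print(asyncio.run(handle(always_no)))
```

In an interactive tool, `approve` could wrap a terminal prompt or a UI dialog; keeping it a parameter leaves the gate logic testable.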
The full object so far
If you need the partial object (e.g. to log progress or hand a partial object to another service), use event.snapshot. Each snapshot is an independent deep copy, so events can be stashed without later ops mutating earlier views.
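A small sketch of stashing snapshots and serializing them only after the stream has ended. `fake_stream` is a hypothetical stand-in that builds its snapshots with `copy.deepcopy`, which is how this example models the independence guarantee described above.

```python
import asyncio
import copy
import json
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class PatchEvent:
    """Stand-in with only the attributes this example needs."""
    field: str
    value: Any
    snapshot: Any


async def fake_stream() -> AsyncIterator[PatchEvent]:
    """Hypothetical replacement for AsyncDotTxt.stream(...)."""
    doc: dict = {"steps": []}
    for i, step in enumerate(["look up order", "issue refund"]):
        doc["steps"].append(step)  # the working document keeps growing
        yield PatchEvent(f"steps/{i}", step, copy.deepcopy(doc))


async def collect_progress() -> list[str]:
    history = []
    async for event in fake_stream():
        history.append(event)  # stash now, inspect later
    # Serialize after the stream ends: earlier snapshots are unchanged.
    return [json.dumps(event.snapshot) for event in history]


for line in asyncio.run(collect_progress()):
    print(line)
```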