Chain-of-thought reasoning improves model accuracy, but dumping raw reasoning text into production output creates problems: unpredictable length, no structure for reviewers to scan, and reasoning that drifts from the input text. A better approach is to capture reasoning as structured, bounded fields, such as short evidence snippets and a decision summary, rather than a freeform "thinking" string. This gives you the accuracy benefits of reasoning while keeping outputs auditable, compact, and machine-parseable.

Use case

You need explainable classifications for internal reviewers. Each classification should include the specific evidence from the input that drove the decision, plus a short summary a reviewer can scan in seconds.

Schema pattern

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "enum": ["billing", "technical", "account", "other"]
    },
    "evidence": {
      "type": "array",
      "items": { "type": "string", "minLength": 10, "maxLength": 140 },
      "minItems": 1,
      "maxItems": 3
    },
    "decision_summary": { "type": "string", "minLength": 20, "maxLength": 240 },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
  },
  "required": ["label", "evidence", "decision_summary", "confidence"],
  "additionalProperties": false
}
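The bounds in this schema are enforceable in plain code as well. Below is a minimal hand-rolled validator sketch in Python (stdlib only); the function name `validate_classification` is illustrative, and in production you would more likely hand the schema to a real JSON Schema validator library rather than mirror the bounds by hand.

```python
import json

# Bounds mirrored from the schema above. A hand-rolled check like this is a
# sketch; a JSON Schema validator library would enforce the schema directly.
LABELS = {"billing", "technical", "account", "other"}

def validate_classification(raw: str) -> dict:
    """Parse a model response and check it against the schema's bounds."""
    obj = json.loads(raw)
    required = {"label", "evidence", "decision_summary", "confidence"}
    if set(obj) != required:
        raise ValueError(f"unexpected or missing keys: {set(obj) ^ required}")
    if obj["label"] not in LABELS:
        raise ValueError(f"unknown label: {obj['label']}")
    evidence = obj["evidence"]
    if not (1 <= len(evidence) <= 3) or any(
        not (10 <= len(item) <= 140) for item in evidence
    ):
        raise ValueError("evidence must be 1-3 strings of 10-140 characters")
    if not (20 <= len(obj["decision_summary"]) <= 240):
        raise ValueError("decision_summary must be 20-240 characters")
    if not (0 <= obj["confidence"] <= 1):
        raise ValueError("confidence must be in [0, 1]")
    return obj
```

Running this check at the application boundary means a response that slipped past constrained decoding still cannot reach reviewers malformed.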

Prompt snippet

Return concise evidence bullets and a short decision summary.
Do not include hidden or verbose reasoning; keep evidence grounded in the input text.

Example output

{
  "label": "technical",
  "evidence": [
    "User reports crash immediately after app update.",
    "Issue reproduced only on Android 15 devices."
  ],
  "decision_summary": "Symptoms and reproduction details indicate an app stability bug rather than account or billing issues.",
  "confidence": 0.91
}
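With a parsed object like the one above, routing on the confidence field is a few lines. A sketch, assuming a hypothetical threshold of 0.8 (you would tune this against your own review data):

```python
# Hypothetical threshold; calibrate it against labeled review outcomes.
REVIEW_THRESHOLD = 0.8

def route(classification: dict) -> str:
    """Send low-confidence decisions to a human queue, auto-approve the rest."""
    if classification["confidence"] < REVIEW_THRESHOLD:
        return "human_review"
    return "auto_approve"
```

The example output above, with confidence 0.91, would be auto-approved under this threshold.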

Why this works

The evidence array (1-3 items, each 10-140 characters) forces the model to ground its reasoning in specific observations from the input, not generate vague explanations. The length bounds prevent evidence bullets from becoming mini-essays. decision_summary (20-240 characters) gives reviewers a one-sentence rationale without scrolling. Combined with confidence, it lets your application route low-confidence decisions to human review while auto-approving high-confidence ones.

The fixed property order also helps. By consistently generating label, then evidence, then decision_summary, then confidence, the model follows a predictable structure instead of deciding its own output sequence on the fly.

Because all reasoning fields are bounded and typed, you can store them in structured logs, aggregate them in dashboards, and search across them. None of that is practical with freeform reasoning text.
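To make the logging claim concrete, here is a small aggregation sketch over JSONL classification logs, using only the Python stdlib. The function name and log layout (one classification object per line) are assumptions for illustration:

```python
import json
from collections import Counter

def label_distribution(log_lines):
    """Aggregate label counts and mean confidence from JSONL classification logs."""
    counts = Counter()
    confidence_total = 0.0
    records = 0
    for line in log_lines:
        record = json.loads(line)
        counts[record["label"]] += 1
        confidence_total += record["confidence"]
        records += 1
    mean_confidence = confidence_total / records if records else None
    return counts, mean_confidence
```

The same one-object-per-line records feed a dashboard or a full-text search over the evidence field; with freeform reasoning text there is no stable field to count or index.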