Every schema we review has the same problem: it describes the shape of the data but not its boundaries. Fields that should be enums are bare strings. Arrays have no size limits. Formats are described in description instead of enforced with pattern. The schema looks correct, the output is structurally valid, and then something downstream breaks because a “summary” came back as 3,000 characters or a “status” came back as "In Progress (Pending Review)".

The fix is always the same: encode what you already know. If status has four valid values, use an enum. If a summary feeds into a 240-character column, set maxLength: 240. If a date must be ISO 8601, use format: "date". The model works within whatever constraints you give it; the question is whether you give it enough.

Use enums for known values

If status is always one of four workflow states, say so. A bare "type": "string" lets the model return "open", "Open", "OPEN", "currently open", or anything else.
Before:
{
  "type": "object",
  "properties": {
    "status": { "type": "string" }
  },
  "required": ["status"]
}
After:
{
  "type": "object",
  "properties": {
    "status": {
      "type": "string",
      "enum": ["open", "in_progress", "resolved", "closed"]
    }
  },
  "required": ["status"],
  "additionalProperties": false
}
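One payoff of the enum shows up in the code that consumes the output. The sketch below is illustrative and not part of any dottxt API: it assumes the response arrives as a JSON string, and Status and parse_status are hypothetical names. Because the schema limits status to four values, a Python Enum can mirror them one-to-one, and anything outside the set fails loudly instead of flowing downstream.

```python
import json
from enum import Enum


class Status(Enum):
    """Mirrors the four values in the schema's enum (hypothetical class name)."""
    OPEN = "open"
    IN_PROGRESS = "in_progress"
    RESOLVED = "resolved"
    CLOSED = "closed"


def parse_status(raw: str) -> Status:
    """Parse a JSON response and convert its status field to a Status member.

    Raises ValueError for any value outside the enum, such as "Open".
    """
    payload = json.loads(raw)
    return Status(payload["status"])
```

With the bare-string schema, the same function would have to normalize or reject variants like "Open" and "currently open" on its own.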

Distinguish optional from nullable

“Not provided” and “explicitly empty” are different states. If your code treats them the same, you can’t tell whether a field was never captured or was captured and found to be missing, and that ambiguity cascades into storage, analytics, and every system that touches the data. Use required to control presence and "type": ["string", "null"] to allow explicit null values. A field that is both not required and nullable can be absent (not applicable), present with a value, or present as null (asked but unknown):
{
  "type": "object",
  "properties": {
    "middle_name": { "type": ["string", "null"] }
  },
  "required": [],
  "additionalProperties": false
}
See Optional vs Null for the full pattern.
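On the consuming side, the three states are only useful if your code keeps them apart. A minimal sketch, assuming the response arrives as a JSON string; the middle_name_state helper and the _MISSING sentinel are hypothetical names used for illustration:

```python
import json

# Sentinel that can never appear in decoded JSON, so "absent" stays
# distinguishable from an explicit null (which decodes to None).
_MISSING = object()


def middle_name_state(raw: str) -> str:
    """Classify middle_name as 'absent', 'null', or 'value'."""
    payload = json.loads(raw)
    value = payload.get("middle_name", _MISSING)
    if value is _MISSING:
        return "absent"  # key not present: not applicable
    if value is None:
        return "null"    # key present as null: asked but unknown
    return "value"       # key present with a string
```

A plain payload.get("middle_name") would return None for both the absent and the null case, which is exactly the ambiguity described above.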

Add discriminators to unions

Without a discriminator, your code has to guess which branch the model chose by looking at which fields are present. This works until two branches share a field name, and then it doesn’t. A const discriminator makes the branch explicit: your runtime reads one field and knows exactly what it’s dealing with.
Before:
{
  "oneOf": [
    {
      "type": "object",
      "properties": { "name": { "type": "string" } },
      "required": ["name"]
    },
    {
      "type": "object",
      "properties": { "company": { "type": "string" } },
      "required": ["company"]
    }
  ]
}
After:
{
  "type": "object",
  "properties": {
    "kind": { "type": "string", "enum": ["person", "company"] },
    "name": { "type": "string" },
    "company": { "type": "string" }
  },
  "required": ["kind"],
  "allOf": [
    {
      "if": { "properties": { "kind": { "const": "person" } } },
      "then": { "required": ["name"] }
    },
    {
      "if": { "properties": { "kind": { "const": "company" } } },
      "then": { "required": ["company"] }
    }
  ],
  "additionalProperties": false
}
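Reading the discriminator at runtime can be as simple as the sketch below. It assumes the output has already been produced against the "After" schema and arrives as a JSON string; describe_entity is a hypothetical helper, not an existing API.

```python
import json


def describe_entity(raw: str) -> str:
    """Branch on the kind discriminator instead of guessing from field presence."""
    entity = json.loads(raw)
    kind = entity["kind"]  # required by the schema, so safe to index directly
    if kind == "person":
        return f"person: {entity['name']}"
    if kind == "company":
        return f"company: {entity['company']}"
    # Unreachable if the enum is enforced; kept as a guard for unvalidated input.
    raise ValueError(f"unexpected kind: {kind!r}")
```

The function never inspects which optional fields happen to be present, so adding a field shared by both branches later would not change its behavior.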

Encode field dependencies

Requiring a card_number when the payment method is PayPal is noise. Omitting a card_number when it’s card payment is a bug. If some fields only make sense together, or only make sense for certain values of another field, say so in the schema rather than leaving it to the model’s judgment.
Before:
{
  "type": "object",
  "properties": {
    "payment_method": { "type": "string", "enum": ["card", "paypal"] },
    "card_number": { "type": "string" },
    "paypal_email": { "type": "string" }
  },
  "required": ["payment_method", "card_number", "paypal_email"]
}
After:
{
  "type": "object",
  "properties": {
    "payment_method": { "type": "string", "enum": ["card", "paypal"] },
    "card_number": { "type": "string" },
    "paypal_email": { "type": "string" }
  },
  "required": ["payment_method"],
  "allOf": [
    {
      "if": { "properties": { "payment_method": { "const": "card" } } },
      "then": { "required": ["card_number"] }
    },
    {
      "if": { "properties": { "payment_method": { "const": "paypal" } } },
      "then": { "required": ["paypal_email"] }
    }
  ],
  "additionalProperties": false
}

Compose independent rules with allOf

When you have multiple independent conditions (one based on channel, another based on priority), flattening them into a single if/then makes the logic unreadable fast. allOf lets you keep each rule as a separate block that’s easy to read, test, and extend independently.
Before:
{
  "type": "object",
  "properties": {
    "channel": { "type": "string", "enum": ["email", "sms"] },
    "priority": { "type": "string", "enum": ["low", "high"] },
    "email_subject": { "type": "string" },
    "phone_number": { "type": "string" },
    "escalation_reason": { "type": "string" }
  },
  "required": ["channel", "priority"]
}
After:
{
  "type": "object",
  "properties": {
    "channel": { "type": "string", "enum": ["email", "sms"] },
    "priority": { "type": "string", "enum": ["low", "high"] },
    "email_subject": { "type": "string" },
    "phone_number": { "type": "string" },
    "escalation_reason": { "type": "string" }
  },
  "required": ["channel", "priority"],
  "allOf": [
    {
      "if": { "properties": { "channel": { "const": "email" } } },
      "then": { "required": ["email_subject"] }
    },
    {
      "if": { "properties": { "channel": { "const": "sms" } } },
      "then": { "required": ["phone_number"] }
    },
    {
      "if": { "properties": { "priority": { "const": "high" } } },
      "then": { "required": ["escalation_reason"] }
    }
  ],
  "additionalProperties": false
}
Each rule in the allOf is self-contained. Adding a new condition means adding a new block, not touching the existing ones.
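To make the independence concrete, here is a rough sketch that evaluates each rule on its own and collects the fields it finds missing. It is not a JSON Schema validator: it handles only the subset used above (const conditions in if, required in then), and RULES, rule_applies, and violations are hypothetical names.

```python
from typing import Any

# The three allOf rules from the "After" schema, copied as plain Python data.
RULES = [
    {"if": {"properties": {"channel": {"const": "email"}}},
     "then": {"required": ["email_subject"]}},
    {"if": {"properties": {"channel": {"const": "sms"}}},
     "then": {"required": ["phone_number"]}},
    {"if": {"properties": {"priority": {"const": "high"}}},
     "then": {"required": ["escalation_reason"]}},
]


def rule_applies(rule: dict, payload: dict[str, Any]) -> bool:
    """True when every const in the rule's if block matches the payload.

    A property missing from the payload matches vacuously, as in JSON Schema.
    """
    conditions = rule["if"]["properties"]
    return all(
        name not in payload or payload[name] == cond["const"]
        for name, cond in conditions.items()
    )


def violations(payload: dict[str, Any]) -> list[str]:
    """Return the fields required by the applicable rules but absent."""
    missing = []
    for rule in RULES:  # each rule is checked in isolation
        if rule_applies(rule, payload):
            missing.extend(f for f in rule["then"]["required"] if f not in payload)
    return missing
```

Appending a fourth entry to RULES changes nothing about how the first three are evaluated, which is the property the allOf layout gives the schema itself.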

Bound your strings and arrays

An unbounded "type": "string" can return anything from one character to an essay. An unbounded "type": "array" can return zero items or two hundred. In both cases, the model produces reasonable output most of the time, and then once in a while it doesn’t, and something downstream breaks. Set bounds based on what your system actually accepts.
Before:
{
  "type": "object",
  "properties": {
    "summary": { "type": "string" },
    "tags": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["summary", "tags"]
}
After:
{
  "type": "object",
  "properties": {
    "summary": { "type": "string", "maxLength": 240 },
    "tags": {
      "type": "array",
      "items": { "type": "string", "maxLength": 24 },
      "maxItems": 8
    }
  },
  "required": ["summary", "tags"],
  "additionalProperties": false
}
Where do the numbers come from? Your database column width, your UI’s line limit, the maximum items your frontend renders. If you don’t know the exact limit, pick a reasonable one: maxLength: 240 is better than no limit, even if the real answer turns out to be 280.
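If you want the same limits checked again at the point where the data enters your system, they translate directly into code. A minimal sketch using the numbers from the "After" schema; the constant names and bound_violations are hypothetical. Python's len counts Unicode code points, which is also how JSON Schema defines string length for maxLength.

```python
# Limits copied from the "After" schema above.
MAX_SUMMARY_CHARS = 240
MAX_TAGS = 8
MAX_TAG_CHARS = 24


def bound_violations(payload: dict) -> list[str]:
    """List which of the schema's size limits a payload exceeds."""
    problems = []
    if len(payload["summary"]) > MAX_SUMMARY_CHARS:
        problems.append("summary too long")
    if len(payload["tags"]) > MAX_TAGS:
        problems.append("too many tags")
    if any(len(tag) > MAX_TAG_CHARS for tag in payload["tags"]):
        problems.append("tag too long")
    return problems
```

Keeping the limits as named constants next to the schema makes it obvious when the two drift apart.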

Use constraints, not descriptions

If a field must be an email address, "description": "Must be a valid business email" is a suggestion the model might follow. "pattern": "^[^@]+@[^@]+$" is a constraint it cannot violate, even though this particular pattern is deliberately loose (exactly one @ with text on both sides). A description does not guarantee the output’s format; a constraint does.
Before:
{
  "type": "object",
  "properties": {
    "email": {
      "type": "string",
      "description": "Must be a valid business email"
    }
  },
  "required": ["email"]
}
After:
{
  "type": "object",
  "properties": {
    "email": {
      "type": "string",
      "pattern": "^[^@]+@[^@]+$"
    }
  },
  "required": ["email"],
  "additionalProperties": false
}
This applies everywhere: dates should be format: "date", not "description": "ISO 8601 date". Country codes should be pattern: "^[A-Z]{2}$", not "description": "Two-letter country code". Anything you can express as a constraint, express as a constraint.
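It helps to try a pattern against a few strings before shipping it, so you know exactly what it accepts. The sketch below uses Python's re module as a stand-in; JSON Schema patterns follow the ECMA-262 regex dialect, which differs from Python's in some details, but these two patterns behave the same way for newline-free inputs like the ones shown. The matches helper is a hypothetical name.

```python
import re

# The two patterns quoted above, compiled unchanged.
EMAIL_PATTERN = re.compile(r"^[^@]+@[^@]+$")
COUNTRY_PATTERN = re.compile(r"^[A-Z]{2}$")


def matches(pattern: re.Pattern, value: str) -> bool:
    """JSON Schema's pattern keyword uses search semantics, so use search here too."""
    return pattern.search(value) is not None


# Accepted and rejected examples for each pattern.
email_results = {s: matches(EMAIL_PATTERN, s) for s in ["a@b.com", "a@b", "a@@b", "no-at-sign"]}
country_results = {s: matches(COUNTRY_PATTERN, s) for s in ["US", "usa", "U"]}
```

Note that "a@b" passes the email pattern: it rules out structurally broken values, not undeliverable addresses, so tighten it if your system needs more.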

Send us your schema

Every dottxt customer gets access to a shared Slack channel with our team. Send us your JSON Schema before you ship it, and we’ll tell you what’s fragile and what will break. For a full audit, see Schema Review.