Data extraction from unstructured text, including invoices, receipts, medical records, and contracts, is the bread and butter of structured output. The schema defines what “correctly extracted” looks like: which fields must be present, what types they have, and what ranges are valid. This turns extraction from a fuzzy NLP task into a well-defined contract: the output either conforms to the schema or it doesn’t. The tighter your schema constraints, the less post-processing you need. A `format: "date"` constraint on the invoice date means you get “2026-02-12” instead of “Feb 12, 2026” or “12/02/2026”. A `pattern` constraint on currency codes means you get “USD” instead of “US Dollars”.
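To see what these two constraints accept and reject, here is a stdlib-only Python sketch that mirrors them outside a schema engine. It assumes `format: "date"` means an RFC 3339 full-date (zero-padded YYYY-MM-DD). The helper names `is_schema_date` and `is_currency_code` are hypothetical. A constrained decoder enforces the same rules during generation; this sketch only checks a finished string.

```python
import re
from datetime import date

# Assumption: format "date" = RFC 3339 full-date, i.e. zero-padded YYYY-MM-DD.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
# fullmatch on [A-Z]{3} behaves like the anchored schema pattern ^[A-Z]{3}$.
_CURRENCY = re.compile(r"[A-Z]{3}")


def is_schema_date(value: str) -> bool:
    """True if value is a YYYY-MM-DD string naming a real calendar day."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible days such as 2026-02-30
    except ValueError:
        return False
    return True


def is_currency_code(value: str) -> bool:
    """True if value is exactly three uppercase ASCII letters."""
    return _CURRENCY.fullmatch(value) is not None


print(is_schema_date("2026-02-12"))    # True
print(is_schema_date("Feb 12, 2026"))  # False
print(is_currency_code("US Dollars"))  # False
```

Note that the shape check runs before `date.fromisoformat`, because newer Python versions accept other ISO 8601 spellings (such as `20260212`) that the schema's `date` format does not.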
Goal
Extract invoice data from OCR text into a normalized, storage-ready record with validated types and bounded arrays.
Schema contract
Example input
Example output
Implementation tips
- Narrow fields to business needs. Don’t add a catch-all `"raw_text"` field. Each field should map to a column in your database or a field in your downstream API. If you don’t need it, don’t extract it.
- Bound arrays to realistic limits. `maxItems: 100` on line items is generous but prevents runaway generation on malformed OCR input. Without it, a noisy scan could produce thousands of phantom line items.
- Use `format` and `pattern` for normalization. `format: "date"` on `invoice_date` gives you ISO 8601 (YYYY-MM-DD) dates regardless of how the source text formats them. `pattern: "^[A-Z]{3}$"` on `currency` gives you three-letter uppercase codes, not spelled-out currency names; it checks only the shape, not whether the code exists in ISO 4217.
- Consider per-field confidence. For high-stakes extraction (financial documents, medical records), add a `confidence` number field next to each extracted value. This lets your application flag low-confidence extractions for human review rather than trusting every value equally.
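The tips above can be combined into a schema like the following hypothetical sketch, written as a JSON Schema dict in Python, plus a helper that routes low-confidence fields to human review. Only `invoice_date`, `currency`, line items, and the `format`/`pattern`/`maxItems` values come from the text; the other field names, their bounds, the `<field>_confidence` sibling naming, `additionalProperties: false`, and the 0.8 review threshold are illustrative assumptions.

```python
import json

# Hypothetical extracted fields; each maps to one storage column (no raw_text).
FIELDS = {
    "invoice_number": {"type": "string", "maxLength": 64},
    "invoice_date": {"type": "string", "format": "date"},
    "currency": {"type": "string", "pattern": "^[A-Z]{3}$"},
    "total": {"type": "number", "minimum": 0},
}

CONFIDENCE = {"type": "number", "minimum": 0, "maximum": 1}

LINE_ITEM = {
    "type": "object",
    "properties": {
        "description": {"type": "string", "maxLength": 200},
        "quantity": {"type": "number", "minimum": 0},
        "unit_price": {"type": "number", "minimum": 0},
    },
    "required": ["description", "quantity", "unit_price"],
    "additionalProperties": False,
}

# Place a confidence score next to every extracted value.
properties = {}
for name, spec in FIELDS.items():
    properties[name] = spec
    properties[f"{name}_confidence"] = dict(CONFIDENCE)
# Bound the array so noisy OCR cannot produce unbounded phantom line items.
properties["line_items"] = {"type": "array", "items": LINE_ITEM, "maxItems": 100}

INVOICE_SCHEMA = {
    "type": "object",
    "properties": properties,
    "required": list(properties),
    "additionalProperties": False,
}

# Serialized form to pass to whichever structured-generation engine you use.
INVOICE_SCHEMA_JSON = json.dumps(INVOICE_SCHEMA)


def fields_needing_review(record: dict, threshold: float = 0.8) -> list:
    """Return names of extracted fields whose confidence is below threshold."""
    suffix = "_confidence"
    return sorted(
        key[: -len(suffix)]
        for key, score in record.items()
        if key.endswith(suffix) and score < threshold
    )


record = {"invoice_date": "2026-02-12", "invoice_date_confidence": 0.95,
          "total": 120.0, "total_confidence": 0.42}
print(fields_needing_review(record))  # ['total']
```

Keeping confidence as flat sibling fields preserves the one-field-per-column mapping from the first tip; nesting each value as a `{value, confidence}` object is an equally reasonable alternative if your storage layer prefers it.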
Related docs
- Optional fields: let the model omit fields when the source text doesn’t contain them
- Optional vs Null: choose between “field absent” and “field present but null”
- String bounds: control length, format, and regex on extracted strings
- Bounded arrays: set min/max item counts on repeated structures
- Object reference | String reference