Genson is a Python library that builds a JSON Schema by observing JSON instances. Feed it one or more examples and it produces a schema that accepts all of them. Genson includes a top-level "$schema" field in its default output. It is omitted from the examples below for readability.

Install

pip install genson

Basic usage

from genson import SchemaBuilder

builder = SchemaBuilder()
builder.add_object({
    "name": "Alice Johnson",
    "email": "alice@acme.com",
    "role": "Product Manager"
})

print(builder.to_json(indent=2))

Genson infers each field's type from its value and marks every observed field as required. It sorts the required list alphabetically and does not add additionalProperties: false; add that manually if the schema should reject unknown fields.
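One way to add additionalProperties: false after the fact is to post-process the schema as a plain dict. The sketch below is an assumption-laden illustration, not part of Genson: close_objects is a hypothetical helper, and the inferred dict is a hand-written stand-in shaped like the schema the basic example is described as producing.

```python
import copy


def close_objects(schema):
    """Return a copy of `schema` with additionalProperties: false on object schemas.

    Hypothetical helper. It only follows `properties`, `items`, and `anyOf`,
    and only recognises schemas whose `type` is exactly "object".
    """
    closed = copy.deepcopy(schema)

    def visit(node):
        if not isinstance(node, dict):
            return
        if node.get("type") == "object":
            # Keep an explicit setting if one is already present.
            node.setdefault("additionalProperties", False)
        for child in node.get("properties", {}).values():
            visit(child)
        items = node.get("items")
        if isinstance(items, dict):
            visit(items)
        for branch in node.get("anyOf", []):
            visit(branch)

    visit(closed)
    return closed


# Hand-written stand-in for an inferred schema ("$schema" omitted).
inferred = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "role": {"type": "string"},
    },
    "required": ["email", "name", "role"],
}

closed = close_objects(inferred)
print(closed["additionalProperties"])  # False
```

Working on a deep copy leaves the inferred schema untouched, so the original can still be compared against the tightened version.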

Multiple instances

Genson’s strength is incremental learning. Feed it several examples and it merges them, detecting which fields are always present (required) and which appear only sometimes (optional):

builder = SchemaBuilder()
builder.add_object({"name": "Alice", "email": "alice@acme.com", "phone": "+1-555-0100"})
builder.add_object({"name": "Bob", "email": "bob@acme.com"})
builder.add_object({"name": "Carol", "email": "carol@acme.com", "phone": "+1-555-0102"})

print(builder.to_json(indent=2))

Since phone is missing from the second example, it becomes optional while name and email stay required.
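The required/optional split described here amounts to a set intersection over the keys of each sample. The following is a simplified model of that rule in plain Python, not Genson's actual code; the variable names are illustrative.

```python
samples = [
    {"name": "Alice", "email": "alice@acme.com", "phone": "+1-555-0100"},
    {"name": "Bob", "email": "bob@acme.com"},
    {"name": "Carol", "email": "carol@acme.com", "phone": "+1-555-0102"},
]

# A field stays required only if every sample contains it.
required = set(samples[0])
seen = set()
for sample in samples:
    required &= set(sample)
    seen |= set(sample)

# Fields seen at least once but not in every sample are optional.
optional = seen - required

print(sorted(required))  # ['email', 'name']
print(sorted(optional))  # ['phone']
```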

Nested objects

Genson handles nested structures, inferring a full sub-schema for each nested object:

builder = SchemaBuilder()
builder.add_object({
    "name": "Alice",
    "address": {
        "street": "123 Main St",
        "city": "Springfield",
        "country": "US"
    }
})

print(builder.to_json(indent=2))

Unlike Pydantic or Quicktype, Genson inlines nested objects rather than extracting them into $defs.

Arrays

Genson infers array item types from the elements it sees:

builder = SchemaBuilder()
builder.add_object({
    "question": "What is your favorite color?",
    "options": ["red", "green", "blue"],
    "tags": ["survey", "color"]
})

print(builder.to_json(indent=2))

Genson does not add minItems or maxItems; add those manually. See Bounded Arrays.
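Adding array bounds by hand can be done by editing the schema dict before it is used. This is a minimal sketch under stated assumptions: bound_array is a hypothetical helper, the bounds 2 and 6 are arbitrary illustration values, and the inferred dict is a hand-written stand-in for the schema of the example above.

```python
import copy


def bound_array(schema, prop, min_items=None, max_items=None):
    """Return a copy of `schema` with minItems/maxItems set on one array property.

    Hypothetical helper; it only looks at top-level properties.
    """
    bounded = copy.deepcopy(schema)
    target = bounded["properties"][prop]
    if target.get("type") != "array":
        raise ValueError(f"{prop!r} is not an array property")
    if min_items is not None:
        target["minItems"] = min_items
    if max_items is not None:
        target["maxItems"] = max_items
    return bounded


# Hand-written stand-in for an inferred schema ("$schema" omitted).
inferred = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "options": {"type": "array", "items": {"type": "string"}},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["options", "question", "tags"],
}

# The bounds are illustrative; choose values that match your data.
bounded = bound_array(inferred, "options", min_items=2, max_items=6)
print(bounded["properties"]["options"])
```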

Arrays of objects

When arrays contain objects, Genson infers the item schema by merging all observed elements:
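
builder = SchemaBuilder()
builder.add_object({
    "contacts": [
        {"name": "Alice", "email": "alice@acme.com"},
        {"name": "Bob", "email": "bob@acme.com"}
    ]
})

print(builder.to_json(indent=2))

Both elements contain name and email, so the inferred item schema lists both as required.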

Mixed types

When the same field has different types across samples, Genson produces a type union:
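
builder = SchemaBuilder()
builder.add_object({"value": 42})
builder.add_object({"value": "hello"})

print(builder.to_json(indent=2))

The value property ends up with a list of types: ["integer", "string"].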
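The idea of a type union can be modelled in a few lines of plain Python: map each observed value to a JSON Schema type name and collect the distinct names. This is a simplified sketch, not Genson's implementation; json_type and merged_type are hypothetical names, and the model ignores any special handling a real inference tool may apply to numeric types.

```python
def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def merged_type(values):
    """Return one type name, or a sorted list of names when the types differ."""
    names = sorted({json_type(v) for v in values})
    return names[0] if len(names) == 1 else names


print(merged_type([42, "hello"]))  # ['integer', 'string']
print(merged_type([1, 2, 3]))      # integer
```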

Seeding with an existing schema

You can start from a hand-written schema and let Genson extend it with fields observed in data. Constraints from the seed schema (like enum) are preserved:

builder = SchemaBuilder()
builder.add_schema({
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "account", "bug"]
        }
    },
    "required": ["category"]
})
builder.add_object({"category": "billing", "summary": "Can't login"})

print(builder.to_json(indent=2))

The enum constraint on category is preserved from the seed. summary is added to properties but not to required since it was not in the seed’s required list.

Limitations

Genson infers structure and types but does not add semantic constraints. The generated schema will not include:
  • enum values (unless seeded)
  • minLength, maxLength, or pattern for strings
  • minItems, maxItems for arrays
  • additionalProperties: false on objects
  • description on fields or the schema
The output is a starting point. Tighten it by adding constraints manually, or follow the steps in Improve your schema.
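As a rough illustration of manual tightening, the sketch below merges hand-written constraints into an inferred schema by property name. Everything specific here is an assumption made for the example: tighten is a hypothetical helper, the inferred dict is a hand-written stand-in, and the enum values, pattern, and length limit are invented.

```python
import copy


def tighten(schema, overrides):
    """Return a copy of `schema` with extra keywords merged into named properties.

    Hypothetical helper; `overrides` maps a top-level property name to the
    keywords to add (enum, pattern, description, and so on).
    """
    tightened = copy.deepcopy(schema)
    properties = tightened.get("properties", {})
    for prop, constraints in overrides.items():
        if prop not in properties:
            raise KeyError(f"unknown property: {prop!r}")
        properties[prop].update(constraints)
    return tightened


# Hand-written stand-in for an inferred schema ("$schema" omitted).
inferred = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "role": {"type": "string"},
    },
    "required": ["email", "name", "role"],
}

# Illustrative constraints only; replace them with rules that fit your data.
tightened = tighten(inferred, {
    "role": {
        "enum": ["Product Manager", "Engineer"],
        "description": "Job title of the contact",
    },
    "email": {"pattern": "^[^@]+@[^@]+$", "maxLength": 254},
})

print(tightened["properties"]["role"])
```

Keeping the overrides in one dict makes the manual constraints easy to review and to reapply when the schema is regenerated from new samples.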