
Prevent information drift by generating from code


JSON Schema is the most widely adopted way of describing JSON APIs. As the JSON Schema website puts it:

While JSON is probably the most popular format for exchanging data, JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability at scale.

Because it's so popular, many tools integrate with it. Examples include Postman for building and using APIs, as well as libraries for generating documentation, integrating with different programming languages, and providing editor support.

However, writing JSON Schema documents by hand can be a pain. The JSON Schema specification is complex, and maintaining a separate schema document away from the code means the information drifts over time and becomes incomplete or outdated.

That's why it's common to generate the JSON Schema from the code, the same code that defines the APIs. For example, you can define your user in Go:

// User is used as a base to provide tests for comments.
type User struct {
	// Unique sequential identifier.
	ID int `json:"id" jsonschema:"required"`
	// Name of the user
	Name string `json:"name"`
}

Then, using a library, you can generate the JSON Schema document from that struct, with the code comments carried over as descriptions:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/User",
  "$defs": {
    "User": {
      "required": ["id"],
      "properties": {
        "id": {
          "type": "integer",
          "description": "Unique sequential identifier."
        },
        "name": {
          "type": "string",
          "description": "Name of the user"
        }
      },
      "additionalProperties": false,
      "type": "object",
      "description": "User is used as a base to provide tests for comments."
    }
  }
}
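To make the idea concrete, here is a minimal, standard-library-only sketch of what such a generator does under the hood. It is not the library that produced the output above, just an illustration under stated assumptions: it parses Go source with go/parser, reads the `json` and `jsonschema` struct tags, and turns doc comments into `description` fields. The names `schemaFor` and `goToJSONType`, the embedded `src` constant and the `models.go` filename are hypothetical, and only a handful of basic field types are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"reflect"
	"strconv"
	"strings"
)

// src holds the User struct from above as source text, so the sketch is self-contained.
const src = "package models\n\n" +
	"// User is used as a base to provide tests for comments.\n" +
	"type User struct {\n" +
	"\t// Unique sequential identifier.\n" +
	"\tID int `json:\"id\" jsonschema:\"required\"`\n" +
	"\t// Name of the user\n" +
	"\tName string `json:\"name\"`\n" +
	"}\n"

// goToJSONType is a deliberately tiny, illustrative (hypothetical) mapping of Go basic types.
var goToJSONType = map[string]string{
	"int": "integer", "int64": "integer", "float64": "number", "string": "string", "bool": "boolean",
}

// schemaFor parses Go source and builds a JSON Schema (as nested maps) for the named struct.
func schemaFor(source, typeName string) (map[string]any, error) {
	// ParseComments keeps doc comments in the AST so they can become descriptions.
	file, err := parser.ParseFile(token.NewFileSet(), "models.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.TYPE {
			continue
		}
		for _, spec := range gen.Specs {
			ts := spec.(*ast.TypeSpec)
			st, ok := ts.Type.(*ast.StructType)
			if !ok || ts.Name.Name != typeName {
				continue
			}
			props := map[string]any{}
			required := []string{}
			for _, field := range st.Fields.List {
				ident, isIdent := field.Type.(*ast.Ident)
				// Skip embedded, untagged and non-basic fields to keep the sketch short.
				if !isIdent || len(field.Names) == 0 || field.Tag == nil || goToJSONType[ident.Name] == "" {
					continue
				}
				// field.Tag.Value still includes the backquotes; Unquote strips them.
				raw, err := strconv.Unquote(field.Tag.Value)
				if err != nil {
					return nil, err
				}
				tag := reflect.StructTag(raw)
				name := strings.Split(tag.Get("json"), ",")[0]
				prop := map[string]any{"type": goToJSONType[ident.Name]}
				if doc := strings.TrimSpace(field.Doc.Text()); doc != "" {
					prop["description"] = doc
				}
				props[name] = prop
				if tag.Get("jsonschema") == "required" {
					required = append(required, name)
				}
			}
			schema := map[string]any{
				"type": "object", "properties": props,
				"required": required, "additionalProperties": false,
			}
			// For an ungrouped `type X struct` the doc comment sits on the GenDecl.
			if doc := strings.TrimSpace(gen.Doc.Text()); doc != "" {
				schema["description"] = doc
			}
			return map[string]any{
				"$schema": "https://json-schema.org/draft/2020-12/schema",
				"$ref":    "#/$defs/" + typeName,
				"$defs":   map[string]any{typeName: schema},
			}, nil
		}
	}
	return nil, fmt.Errorf("struct %q not found", typeName)
}

func main() {
	schema, err := schemaFor(src, "User")
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(schema, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Running it prints a schema with the same content as the one above, although encoding/json sorts map keys alphabetically, so the key order differs. A real generator would also need to handle nested structs, slices, pointers, `omitempty`, grouped type declarations and much more, which is exactly why reaching for an existing library is usually the better choice.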

It's not just Go: there are libraries for all the popular programming languages, including Pydantic for Python.

This has the benefit of maintaining the specification, and the documentation, within the code that defines it. There is only one place to keep it up to date, and changing the code and the documentation can be done in a single pull request, preventing information drift.

Data contracts can also be defined in code, for all the same reasons. The ecosystem isn’t there to make that easy yet, but it will come.




Andrew Jones
Author
I build data platforms that reduce risk and drive revenue.