Service catalog

A service catalog is an increasingly common component of internal developer platforms.

It primarily serves two personas, each with its own use case.

The first is the system owner/engineer, who is responsible for services and/or the code behind them.

The service catalog gives them one place to find out everything about their own service, or about any other service they are interested in, for example one they need to integrate with or are debugging in production. This could include:

API documentation
Links to the Git repository
Observability and SLO dashboards
Ownership and support
Access requests

And so on.
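As a rough illustration, the metadata listed above could be modelled as one record per service. This is a minimal sketch, not the schema of any particular catalog product: the `CatalogEntry` class, its field names, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CatalogEntry:
    """Hypothetical record for one service; field names are illustrative only."""
    name: str
    owner_team: str
    support_channel: str
    repo_url: str
    api_docs_url: str
    dashboards: Dict[str, str] = field(default_factory=dict)
    access_request_url: Optional[str] = None

    def links(self) -> Dict[str, str]:
        """Collect every link a catalog page might show for this service."""
        found = {"Repository": self.repo_url, "API docs": self.api_docs_url}
        found.update(self.dashboards)
        if self.access_request_url:
            found["Request access"] = self.access_request_url
        return found


# Example entry with placeholder values.
payments = CatalogEntry(
    name="payments-api",
    owner_team="payments",
    support_channel="#payments-support",
    repo_url="https://git.example.com/payments/payments-api",
    api_docs_url="https://docs.example.com/payments-api",
    dashboards={"SLOs": "https://dashboards.example.com/payments-api/slo"},
)

for label, url in payments.links().items():
    print(f"{label}: {url}")
```

Note that the record mostly holds links and ownership details rather than the underlying data, which continues to live in the repository, documentation, and observability tools themselves.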

[Illustration: a system owner/engineer at a laptop, using the service catalog to view and understand services and related code, such as API docs, SLOs, and access requests.]

The second persona is the engineering leader/policy maker, who wants visibility across their engineering estate so they can see where potential issues are and decide where to invest or drive change to resolve them.

They will use the service catalog to capture metrics about each service, either through automated checks or manual assertions by the service owner, and view the results of those metrics in dashboards.
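One way to picture the two kinds of metric is as checks that each return a pass or fail for a service. The sketch below is only illustrative: the `Check` type, the check names, and the service fields (`has_readme`, `assertions`) are assumptions made for the example, not the API of any catalog tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Check:
    name: str
    kind: str  # "automated" or "manual"
    evaluate: Callable[[dict], bool]


def has_readme(service: dict) -> bool:
    # Automated: in practice a tool scanning the repository would supply this fact.
    return bool(service.get("has_readme", False))


def on_call_rota_confirmed(service: dict) -> bool:
    # Manual: the service owner asserts this themselves in the catalog.
    return service.get("assertions", {}).get("on_call_rota") is True


CHECKS = [
    Check("README present", "automated", has_readme),
    Check("On-call rota confirmed", "manual", on_call_rota_confirmed),
]


def scorecard(service: dict) -> Dict[str, bool]:
    """Run every check against one service and return pass/fail per check."""
    return {check.name: check.evaluate(service) for check in CHECKS}


# Hypothetical service record: README detected, no manual assertions made yet.
service = {"name": "payments-api", "has_readme": True, "assertions": {}}
print(scorecard(service))
```

Treating both kinds of check the same way means a dashboard can show them side by side, while the `kind` field still records how much trust to place in each result.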

[Illustration: a leader/policy maker, using the service catalog for visibility to identify potential issues and guide investments.]

Generally, the service catalog simply presents this information in a common interface, with the data and metadata themselves created and managed outside the service catalog.

An example use case for a service catalog would be to track open vulnerabilities in each service. The data itself would likely be gathered by a vulnerability management tool (e.g., Qualys, Tenable, Snyk), which would then make the results available to the service catalog.

System owners and engineers would see the results in the catalog page for their individual services, which increases the visibility of the issues and may encourage some teams to prioritise addressing them.

Engineering leaders/security teams can look at the data in aggregate and see where teams are not addressing the issues proactively, and use that to engage with those teams to understand and resolve the problem.
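The two views of the same data can be sketched as follows. The findings here are made-up sample records standing in for whatever a vulnerability management tool would export; the field names, service names, and both functions are hypothetical.

```python
from collections import defaultdict

# Made-up sample findings; a real catalog would receive these from the scanning tool.
FINDINGS = [
    {"service": "payments-api", "team": "payments", "severity": "critical", "open": True},
    {"service": "payments-api", "team": "payments", "severity": "low", "open": False},
    {"service": "search-indexer", "team": "search", "severity": "high", "open": True},
    {"service": "search-ui", "team": "search", "severity": "critical", "open": True},
]


def open_findings_for(service: str) -> list:
    """Service owner's view: open findings for a single service."""
    return [f for f in FINDINGS if f["service"] == service and f["open"]]


def open_findings_by_team() -> dict:
    """Leader's view: number of open findings per team, highest first."""
    counts = defaultdict(int)
    for finding in FINDINGS:
        if finding["open"]:
            counts[finding["team"]] += 1
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


print(open_findings_for("payments-api"))
print(open_findings_by_team())
```

Both functions read the same list, which reflects the point that the catalog is presenting one externally managed data set through two different lenses.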

That’s just one example, but the same pattern would apply to any service/code quality metric you want to monitor and improve. For example, it could be a data governance metric set by the data team, documentation quality, the use of golden paths, and so on - whatever metric you feel will drive meaningful change across your engineering estate.

Popular implementations include Backstage, OpsLevel, and Cortex.