Sharing Protobuf schemas across services
The system that we're building at Community.com is made up of a few services (around fifteen at the time of writing) that interact with each other through a basic version of event sourcing. All events flow through RabbitMQ, where they're published and consumed, and are serialized with Protobuf. With several services already and many more coming in the future, managing the Protobuf schemas becomes a painful part of evolving and maintaining the system. Do we copy the schemas into all services? Do we keep them somewhere and use something akin to Git submodules to keep them in sync in all of our projects? What do we do?! In this post, I'll go through the tooling that we came up with in order to sanely manage our Protobuf schemas throughout our services and technology stack.
Photo by Fahrul Azmi on Unsplash
This post will go through the steps that led us to the solution we're currently using, so feel free to skip ahead if you just want to know what we ended up with.
When we started defining schemas for our Protobuf events, we were only using such schemas from Elixir. We created a library called
events and started hosting it on our private Hex repository.
events contained all the
.proto schema files and depended on the exprotobuf library to "compile" the Protobuf schemas to Elixir code. exprotobuf uses a different approach than most Protobuf libraries for other languages that I encountered: it loads the
.proto schema definitions at compile time instead of using
protoc to compile the Protobuf files to
.ex Elixir files. Essentially, we created a module like this:
```elixir
defmodule Events do
  # The wildcard path is illustrative; it matches all of the .proto
  # files shipped inside the library.
  use Protobuf, from: Path.wildcard(Path.expand("../schemas/**/*.proto", __DIR__))
end
```
The most common approach to turning a Protobuf schema into a data structure representable by the programming language you're using is to turn the schema into some kind of "struct" representation. That's exactly what exprotobuf does for Elixir. This is one of our schemas:
```protobuf
// In schemas/event_envelope.proto
syntax = "proto3";

// Field types and numbers are illustrative here; the message carries
// a timestamp and the source service of the event.
message EventEnvelope {
  string timestamp = 1;
  string source = 2;
}
```
exprotobuf loads this up at compile time and turns it into roughly this Elixir definition:
```elixir
defmodule EventEnvelope do
  defstruct [:timestamp, :source]

  # A bunch of encode/decode functions plus
  # new/1 to create a new struct.
  # ...
end
```
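To make this concrete, here's roughly how such a generated module gets used, assuming an `EventEnvelope` module like the one above with the usual `new/1`, `encode/1`, and `decode/1` functions (the field values are made up):

```elixir
# Build an event and serialize it before publishing to RabbitMQ.
envelope = EventEnvelope.new(timestamp: "2019-06-01T12:00:00Z", source: "some-service")
binary = EventEnvelope.encode(envelope)

# On the consuming side, decode the binary back into a struct.
%EventEnvelope{source: "some-service"} = EventEnvelope.decode(binary)
```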
This whole approach worked okay for a while, but soon we needed to use our Protobuf definitions from a service written in Python.
The first thing that came to mind when we thought about sharing Protobuf schema definitions across different programming languages was Git submodules. We created a
proto_schemas repository containing all the
.proto files and added it as a Git submodule to our
events Elixir library and to a single Python service. Not much changed on the Elixir side, but on the Python side things were working a bit differently. The Protobuf Python library uses a common approach among Protobuf libraries, which is to use a plugin to the
protoc compiler in order to generate Python code from the
.proto files. Essentially, you call:
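The exact flags depend on your project layout, but a typical invocation looks something like this (input and output directories are illustrative):

```shell
# Compile every schema into Python modules under generated/.
protoc --proto_path=schemas --python_out=generated schemas/*.proto
```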
So, for example, our
event_envelope.proto file would become
event_envelope_pb2.py once compiled.
The Git-submodule approach worked okay for a while, but it presented two main challenges: how do we version schemas uniformly across languages? And how do we avoid having every project carry a copy of the Protobuf schemas and compile them individually to the host language?
Lucky for me, one day I was discussing these problems with my friend Eric from the Elixir team, and we figured out a way to only keep the Protobuf schemas in a single place, compile them all in a single place, but use them from different languages all around.
I did some research on available Protobuf libraries for Elixir and was lucky to find an alternative to exprotobuf called protobuf-elixir. The APIs that this library exposes to manipulate Protobuf structs and manage serialization were exactly the same as the APIs exposed by exprotobuf, so compatibility was not an issue. However, this library had a key feature that I was interested in: it supported code generation through the
protoc Protobuf compiler. It worked like it does in Python (and many other languages).
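With protobuf-elixir, you install its `protoc` plugin once and then generate `.pb.ex` files the same way you'd generate Python modules. A sketch (paths are illustrative):

```shell
# Install the protoc-gen-elixir plugin; it must end up in $PATH.
mix escript.install hex protobuf

# Generate Elixir modules from the schemas into lib/.
protoc --proto_path=schemas --elixir_out=lib schemas/*.proto
```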
At this point, I had code generation through
protoc working for both Python and Elixir. The next step was to map Protobuf packages (that is, collections of Protobuf schemas) to language libraries and publish those to package repositories for the respective languages. We'll go through what we did for Elixir, but the setup for Python looks almost identical.
All our events-related Protobuf schemas live in the
events Protobuf package. So, we decided to map that to an Elixir library called
events_schemas. The nice thing about this library is that it only contains the auto-generated code for the Protobuf schemas and nothing else. It essentially exposes an interface to the Protobuf schemas from Elixir. In the same repository where we keep the
.proto files, we created a
languages/elixir/ directory to store everything necessary for compiling to Elixir and publishing this library on our private hex.pm repository. The "skeleton" of the
events_schemas library looks like this:
```
events_schemas
├── .gitignore
├── lib
│   └── .gitkeep
└── mix.exs
```
Basically, the library is empty. The
mix.exs file looks like this:
```elixir
defmodule EventsSchemas.MixProject do
  use Mix.Project

  # The string values below (Elixir requirement, organization name,
  # packaged files) are illustrative; the version-file mechanics are
  # the interesting part.
  def project do
    [
      app: :events_schemas,
      version: version(),
      elixir: "~> 1.9",
      deps: deps(),
      package: package()
    ]
  end

  def application do
    [extra_applications: []]
  end

  defp deps do
    [{:protobuf, protobuf_ex_version()}]
  end

  defp package do
    [
      organization: "community",
      files: ["lib", "mix.exs", "VERSION", "PROTOBUF_EX_VERSION"]
    ]
  end

  defp version do
    "VERSION" |> File.read!() |> String.trim()
  end

  defp protobuf_ex_version do
    "PROTOBUF_EX_VERSION" |> File.read!() |> String.trim()
  end
end
```
There are a couple of peculiar things here. First, we read the version of the library from a magic
VERSION file. We keep this file alongside the
.proto schemas. This file contains the version of the schemas themselves. Keeping it alongside the schemas means that we can copy it over in the right places when building libraries for different languages so that the
events_schemas library can have the same version across all target languages. We copy this file to the root of the
events_schemas Elixir directory before building and publishing the library. We use a similar idea for the
PROTOBUF_EX_VERSION file. This file contains the version of the protobuf-elixir library that we use. We keep that in a separate file so that we can make sure it's the same between the plugin for the
protoc compiler and the protobuf-elixir dependency declared by the generated library itself.
Other than the things we just talked about, this looks like a pretty standard
mix.exs file. Now, the magic happens in CI.
Our CI system of choice is Concourse CI. Concourse lets you define pipelines with different steps. Here, we're interested in the last step of our CI pipeline: publishing the libraries for all the languages. Our Concourse pipeline looks like this:
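A Concourse pipeline is plain YAML. A heavily simplified sketch of ours (the resource name, repository URL, and task files are made up for illustration) could look like this:

```yaml
resources:
  - name: proto-schemas
    type: git
    source:
      uri: git@github.com:our-org/proto-schemas.git
      branch: master

jobs:
  - name: build-and-publish
    plan:
      - get: proto-schemas
      - task: publish-elixir
        file: proto-schemas/ci/publish-elixir.yml
      - task: publish-python
        file: proto-schemas/ci/publish-python.yml
```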
The last step,
build-and-publish, is triggered manually by clicking on it and telling it to start. This means that if you want to release a new version of the
events_schemas library in all languages, you have to go to Concourse and click this. That's all you need to do. Note that we have Docker containers that build and publish the library for each target language so that we don't have to install anything on the CI system. At this point, Concourse will do the same routine for all target languages:
- Copy all the necessary version files to the right places.
- Generate code for the Protobuf schemas through protoc.
- Publish to the right package repository (for example, hex.pm for Elixir).
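For Elixir, the routine inside the container boils down to something like this (a sketch under the directory layout described earlier, not our exact scripts):

```shell
#!/bin/sh
set -e

# 1. Copy the version files next to the Elixir library.
cp VERSION PROTOBUF_EX_VERSION languages/elixir/events_schemas/

# 2. Generate the Elixir modules from the schemas.
protoc --proto_path=schemas \
  --elixir_out=languages/elixir/events_schemas/lib \
  schemas/*.proto

# 3. Publish to our private hex.pm organization.
cd languages/elixir/events_schemas
mix hex.publish --yes
```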
That's all. Now, our services can depend on an actual library that contains the auto-generated code representing the Protobuf schemas that we use. Better yet, services don't need access to the original
.proto files containing the schemas. We're delighted with this system since it feels streamlined and straightforward to use, while providing everything we need.
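A consuming service then only needs a regular Hex dependency in its own mix.exs (the organization name and version requirement are illustrative):

```elixir
defp deps do
  [
    {:events_schemas, "~> 1.0", organization: "community"}
  ]
end
```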
We moved even a bit further than what I described. In fact, currently we have three Protobuf packages: one for common custom types, one for events, and one for inter-service RPCs. The events and RPC packages both depend on the custom types package. The way we solve the inter-package dependencies is to simply publish the
types_schemas libraries first and then depend on that library from the events_schemas and
rpc_schemas libraries. For example, in Elixir we changed the
mix.exs file we looked at earlier (for the
events_schemas library) and added the dependency:
```elixir
defp deps do
  [
    {:protobuf, protobuf_ex_version()},
    # types_schemas is published with the same VERSION file, so we
    # reuse version/0 here (the organization name is illustrative).
    {:types_schemas, version(), organization: "community"}
  ]
end
```
To recap, our Protobuf pipeline and workflow currently work like this. First, you make changes to a Protobuf schema. Then, you bump a version in a file. Then, you push those changes up to GitHub. Once CI makes sure you didn't break anything, you log into Concourse and kick-start the
build-and-publish task. A new version of the right library gets published to different package repositories for different languages. It's not the simplest system, but the workflow is easy to use and effective. Most of all, this workflow can apply to most programming languages and make it easier to manage versioning and evolving shared collections of Protobuf schemas.