Microservices in the microcosm

Network latency and failure, inconsistent state, and operational complexity — the challenges associated with developing a microservice infrastructure are manifold. In the realm of operational complexity, one of the most significant challenges is testing. Unit and component tests offer relatively poor insight into the fitness of the application when key functionality depends on potentially complex interactions between multiple services. Instead, tests which exercise those interactions (whether ‘integration’, ‘API’ or ‘contract’ tests) acquire a greater importance. In this post, we shall examine a technique for taming operational complexity and improving the efficacy of such tests: recreating the external universe of our infrastructure, as did the ancient alchemists, within the supervised laboratory of our local machine. Our alembic? docker-compose.

Many of the tips and tricks you shall enjoy below, dear reader, have been transmitted to me by Magister Tom Gallacher (@tomgco), an old Democritean. To him wend your appreciation.

Environmental flux

Since one of the main reasons for adopting microservices is the enhanced division of labour facilitated by a stricter separation of concerns, it is to be expected that any shared development environment will experience a constant flux of incompatible features. Compounded with the inherent difficulty of eliminating non-deterministic behaviour from integration-level tests, this environmental flux can make the source of particular bugs difficult to pin down, and the health of the system as a whole unclear. If left unaddressed, these problems increase the likelihood that an organisation will slip into the ‘metaversion’ antipattern (see under The Metaversion in Building Microservices). Features will be deployed slowly, one-by-one, to a separate environment (probably QA), and only promoted from there en masse as a stable set or metaversion. Not only does this create a bottleneck for deploying to production and expose a separate surface for maintenance, it increases the likelihood that services will become tightly coupled.

One solution to this problem is the cloud-based ephemeral (dynamic / on-demand) environment: give every developer the ability to quickly spin up a new testing environment (or part of an environment) on production infrastructure. Although becoming easier with tools such as linkerd and runnable, establishing this capability is often a complicated task, and costly. Simpler, and equally effective, is recreating the environment entirely on the local machine using virtualization tools to replicate any external services. This approach has the added advantages of giving the developer greater control over every element of the environment and a fast feedback loop for experimentation, which together allow individuals working on isolated components to develop greater ownership of the system as a whole.

Virtualisation

Selecting the right stand-ins for external dependencies is an important part of creating a reliable and accurate local environment. For precision mocking and stubbing, Mountebank is a popular choice. It is effectively a declaratively configured Node server, which allows for fine-grained control over request matching and responses. In some cases, however, it is more convenient to use a service that will mock a common API out of the box. Minio, for example, implements an S3-compatible API, and using it is often as simple as passing a different endpoint to your AWS client.
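As a minimal sketch of that endpoint switch in Python, the helper below builds the extra keyword arguments for an S3 client from the environment. The variable name S3_ENDPOINT_URL and the function name are assumptions made for illustration; endpoint_url is the parameter boto3 accepts for overriding the service endpoint.

```python
import os
from typing import Dict, Mapping


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build extra keyword arguments for an S3 client.

    S3_ENDPOINT_URL is a hypothetical variable name: set it (for example to
    http://minio:9000) to target the local stand-in, leave it unset to use AWS.
    """
    endpoint = env.get("S3_ENDPOINT_URL", "")
    return {"endpoint_url": endpoint} if endpoint else {}


# Usage with boto3 (third-party, not imported here):
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs())
```

Keeping the switch in configuration means the same service image can run against Minio locally and against S3 in production.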

Basic recipe

Our preferred solution to microservice orchestration at YLD is Kubernetes, among whose benefits stands the ability to exactly mirror a production environment on a local machine using minikube. Unfortunately, in practice many organisations are committed to other infrastructure solutions which lack such neat provision for development. So long as their services are containerized, however, we can use docker-compose to bring up a local environment quickly.

Assuming you have a relatively fixed set of virtualized external dependencies, the necessary steps for spinning up an environment with docker-compose are these:

1. Obtain a list of build artefacts (docker images) from your deployment pipeline. This could be through, for example, a pre-prepared manifest describing a particular environment or a call to a container orchestration API.
2. Inject any images developed locally for testing against the deployed environment. For example, a list of local repositories could be passed as arguments to a setup script in order to overwrite the corresponding services in the deployment manifest.
3. Identify the external dependencies of the images. Typically each image will contain a declaration of its dependencies. The form of that declaration must be established by convention.
4. Generate a docker-compose configuration file that ties together the images with their dependencies and provides the appropriate runtime configuration for each.
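The second and third steps above might be sketched as follows. This is an illustration under stated assumptions rather than a prescribed implementation: the manifest is taken to be a simple mapping of service name to image, and the label key com.example.depends-on is a hypothetical convention for declaring dependencies on an image.

```python
from typing import Dict, List


def inject_local_images(manifest: Dict[str, str], local: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the manifest with locally built images substituted."""
    unknown = sorted(set(local) - set(manifest))
    if unknown:
        raise KeyError(f"services not present in the manifest: {unknown}")
    return {**manifest, **local}


def parse_dependencies(labels: Dict[str, str], key: str = "com.example.depends-on") -> List[str]:
    """Read a comma-separated dependency list from image labels.

    The label key is a hypothetical convention, not a Docker standard.
    """
    raw = labels.get(key, "")
    return [name.strip() for name in raw.split(",") if name.strip()]
```

Rejecting unknown service names early catches typos in the arguments passed to the setup script before any containers are started.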

The final step is where things become interesting. The eventual docker-compose file must be generable from arbitrary combinations of services and dependencies, each with their own requirements for set-up and scheduling. Let's examine this step in more detail.

A dynamic docker-compose

Since we have assumed a fixed or slowly-changing set of external dependencies, it makes sense to exploit docker-compose's support for multiple configuration files to separate the fixed from the dynamic parts of the environment. In docker-compose.yml we set up our fixed dependencies and generate the rest inside a docker-compose.override.yml, which gets merged in automatically when we run docker-compose up. Our base configuration file might contain databases, the API gateway (kong in this example), and virtual external services:

version: '3'
services:
  mysql:
    image: mysql:5.7
    ports:
      - "3307:3306"
    volumes:
      - mysql_data:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: password
  kong-database:
    image: postgres:9.4
    environment:
      - POSTGRES_USER=kong
      - POSTGRES_DB=kong
    volumes:
      - "kong_data:/var/lib/postgresql/data"
  gateway:
    image: kong:0.9.9
    environment:
      - KONG_DATABASE=postgres
      - KONG_PG_HOST=kong-database
    ports:
      - "8000:8000"
      - "8443:8443"
      - "8001:8001"
      - "7946:7946"
      - "7946:7946/udp"
    depends_on:
      - kong-database
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
    environment:
      - "MINIO_ACCESS_KEY=$AWS_ACCESS_KEY_ID"
      - "MINIO_SECRET_KEY=$AWS_SECRET_ACCESS_KEY"
volumes:
  mysql_data:
  kong_data:

Generating the override file is a matter of looping over the images, adding a new key under services for each one, merging in configuration (e.g. volumes, environment, depends_on, command) and -- this is where things can become complicated -- ensuring that each service is initialised in a manner appropriate for its dependencies.
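A minimal sketch of that loop, assuming the images, dependencies and environment have already been collected into plain dictionaries keyed by service name (the function names and input shapes are illustrative, not part of any tool):

```python
import json
from typing import Any, Dict, List


def build_override(
    images: Dict[str, str],
    dependencies: Dict[str, List[str]],
    environment: Dict[str, Dict[str, str]],
    version: str = "3",
) -> Dict[str, Any]:
    """Assemble the content of docker-compose.override.yml as a dictionary."""
    services: Dict[str, Any] = {}
    for name, image in sorted(images.items()):
        service: Dict[str, Any] = {"image": image}
        if dependencies.get(name):
            service["depends_on"] = list(dependencies[name])
        if environment.get(name):
            service["environment"] = dict(environment[name])
        services[name] = service
    return {"version": version, "services": services}


def render_override(config: Dict[str, Any]) -> str:
    """Serialise the configuration as JSON, which is also valid YAML."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Because JSON is a subset of YAML, the rendered text can be written to docker-compose.override.yml as it is; a YAML library would produce more readable output. Keys such as volumes and command can be merged in the same way.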

In the case that any services or dependencies are not resilient to connection failure, it will be necessary to schedule for connection readiness. This requires passing service startup commands to a wrapper script that will test for the availability of a designated port on the service being depended upon. It is worthwhile standardising on a script such as wait-for-it.sh, which uses only bash builtins, rather than writing curl or wget commands, because some containers may not have those binaries installed.
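The check such a wrapper performs amounts to retrying a TCP connection until it succeeds or a deadline passes. The sketch below shows the idea in Python; it is not a transcription of wait-for-it.sh, and the function name and default timings are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 15.0, interval: float = 0.25) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that the process is listening, not that it is ready to serve requests, which is why some tasks need the explicit completion signal described next.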

Typically, a service will require a number of ancillary tasks to be completed in order to function properly: database creation, seeding, and registration for service-discovery, for example. Some of these may be necessary steps before connection can occur, in which case they should be executed inside a script that opens a specific port when the task is complete. For example:

/bin/wait-for-it.sh some-service:8000 -- /bin/run-task.sh
# Signal completion on port 1234, then stop listening and exit after 100 seconds
nc -l -p 1234 &
sleep 100; kill $! 2>/dev/null

Other containers can then wait for port 1234. Note, however, that netcat will close the port after the first successful connection, so if multiple services are waiting for this task you may need to use something like python -m SimpleHTTPServer 1234 (python3 -m http.server 1234 on Python 3). Since this container only runs an initialisation task, it does not need to stay up once the task is completed, and we schedule it to exit after 100 seconds.
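If several services wait on the same task, the signal port has to survive more than one connection. As a rough sketch of an alternative to netcat, assuming Python is available in the task container, the listener below accepts any number of connections for a fixed lifetime (function names are illustrative):

```python
import socket
import time


def open_ready_port(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Open a listening TCP socket that signals 'task complete'."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv


def accept_for(srv: socket.socket, lifetime: float) -> int:
    """Accept and close connections until lifetime seconds pass; return the count."""
    deadline = time.monotonic() + lifetime
    accepted = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return accepted
        srv.settimeout(remaining)
        try:
            conn, _ = srv.accept()
        except socket.timeout:
            return accepted
        conn.close()
        accepted += 1
```

Calling accept_for(open_ready_port(1234), 100) after the task finishes would keep the port open to every waiting service for 100 seconds and then return, letting the container exit.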

While “one process per container” is a rule of thumb, separating each task for each service into its own container can create a lot of additional complexity when scheduling is necessary. If it is possible to accomplish these tasks inside the service container without installing new software, it may be worth doing. They can be executed inside a script passed inline to the service’s command property (command: /bin/sh -c "run-task-1; run-task-2"), but perhaps better is to write them in a separate script mounted into the container (along with wait-for-it.sh if necessary). We might, for example, create a script run-tasks.sh, whose job is to wait for something then run run-task-1.sh, followed by run-task-2.sh. All the scripts still need to be mounted, but the command is simpler:

volumes:
  - "./scripts/run-tasks.sh:/bin/run-tasks.sh"
  - "./scripts/run-task-1.sh:/bin/run-task-1.sh"
  - "./scripts/run-task-2.sh:/bin/run-task-2.sh"
  - "./scripts/wait-for-it.sh:/bin/wait-for-it.sh"
command: ["/bin/run-tasks.sh"]

Keeping scripts in separate files rather than writing them inside the docker-compose file under command keys decreases the need for complex escaping and makes programmatically composing arbitrary dependency chains much easier, since they can simply be concatenated, along with any arguments, to the command array. For this concatenation to work, each script must act as a wrapper for the one that follows it, which can be accomplished using shift and exec:

/bin/wait-for-it.sh some-service:8000 -- /bin/run-task.sh "$1"
shift 1
exec "$@"

In this case the script expects at least two arguments: one ‘real’ argument followed by an executable script or command and its arguments. shift 1 removes the argument consumed here as $1 from the list of arguments (accessed by "$@"), and exec replaces the current process with the next item in that list. Note that by using exec instead of, say, eval, we ensure that we are not spawning multiple processes and that signals sent to the container (such as the SIGTERM issued by docker stop) reach the service process directly rather than a wrapper shell.
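On the generating side, the concatenation is then a matter of flattening the chain into one list. A small sketch, with hypothetical script names and assuming each wrapper shifts away exactly the arguments it is given before calling exec on the rest:

```python
from typing import List, Sequence


def compose_command(wrappers: Sequence[Sequence[str]], service: Sequence[str]) -> List[str]:
    """Flatten wrapper scripts (each with its own arguments) and the final
    service command into a single docker-compose command array."""
    command: List[str] = []
    for wrapper in wrappers:
        if not wrapper:
            raise ValueError("each wrapper needs at least a script path")
        command.extend(wrapper)
    command.extend(service)
    return command
```

For example, compose_command([["/bin/wait-and-run.sh", "mysql:3306"]], ["node", "server.js"]) yields a command array in which the wrapper consumes mysql:3306 and then hands control to node server.js.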

Some gotchas

Docker-compose starts services within a default network isolated from the network on the host, mapping each service to a hostname identical to the name of the service in the configuration file. While this behavior is in general a huge boon when it comes to configuring connections between containers (and preventing port conflicts on localhost), it can create problems:

Any modules hardcoded for local development with localhost will require workarounds. To give one example, a popular kinesis client for Node only allows unsigned requests to localhost. However, there may be good reason for running our virtualised kinesis service on an unsecured connection (namely, if we are using a pared-down aws-cli image without ssl support).
For some testing and debugging purposes, it is useful to expose our services (and not just our API gateway) on localhost, in which case each will need to be mapped to a separate port. You can either find the port yourself with something like

python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()'

and map it manually, or allow Docker to do so, and read back the mappings with

docker inspect --format='{{(index (index .NetworkSettings.Ports "8787/tcp") 0).HostPort}}' "$CONTAINER_ID"

(in that case 8787/tcp should be changed to whatever port you are exposing from inside the container).

Even if a container is exposed on localhost, network addresses which are valid for requests from inside a container will always be invalid for requests from the host, and vice-versa. This may present a problem if, for example, an endpoint on the public API generates a redirect response from a URL which is also used for requests to other containers. On Linux, it is possible to connect the container network to the host machine, but not on OSX. There is discussion about fixing this, but in the meantime you may need to manually add entries to /etc/hosts, or else run your tests (or anything that needs access to the containers) inside another container on the docker network.
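For the second point above, when Docker is left to assign host ports, all of a container's mappings can be read back at once from the JSON that docker inspect prints. The sketch below parses that output; the function name is illustrative.

```python
import json
from typing import Dict


def host_ports(inspect_output: str) -> Dict[str, int]:
    """Map each published container port (e.g. "8787/tcp") to its host port.

    Expects the JSON array printed by `docker inspect` for one container.
    """
    container = json.loads(inspect_output)[0]
    ports = container["NetworkSettings"]["Ports"] or {}
    return {
        container_port: int(bindings[0]["HostPort"])
        for container_port, bindings in ports.items()
        if bindings  # ports that are exposed but not published map to null
    }
```

The input can be obtained with subprocess.run(["docker", "inspect", container_id], capture_output=True, text=True, check=True).stdout, where container_id is whichever container you are interested in.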

Be aware that Docker (especially on OSX) is not a perfectly stable piece of software. You may run into memory issues from time to time (a known problem with the official mysql image). Running docker system prune, which removes unused data, can help to address such problems. Also, note that if you let your host computer sleep, timekeeping within containers may drift, preventing connections between them.

The scheme for running microservices locally set out above does not allow for meaningful performance testing because it does not replicate the physical infrastructure of production, but it does allow for simple API or contract tests with a fast feedback loop in a controlled environment. Now go forth and experiment!
