I've run into a problem with the docker-compose Buildkite plugin that I thought I'd raise.
I have a matrix build and push step that uses the docker-compose plugin to build images for amd64 and arm64 architectures.
Each one targets a Buildkite agent that is running on the architecture in question.
Each built image is tagged with its architecture as part of the tag. A later step then produces a manifest list containing both architectures, tags it without the architecture, and pushes it to our repository.
Docker, Kubernetes, and so on refer to the tag name without the architecture and pull the appropriate image based on the architecture they are running on.
Then I have a run step for the docker-compose service.
But it ends up using whichever architecture-specific image tag was pushed last.
This is a problem when that image's architecture doesn't match the host the run step is executing on.
Here is a simplified example of one of our pipelines, with a lot removed (and some of it redacted), along with a docker-compose.yml file, to hopefully show the issue.
docker-compose.yml
services:
  myapp:
    build:
      context: .
      dockerfile: Dockerfile
.buildkite/pipeline.yml
env:
  DOCKER_BUILD_TAG: $BUILDKITE_BRANCH-$BUILDKITE_BUILD_NUMBER
  ECR_REPO: 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/$BUILDKITE_PIPELINE_SLUG
steps:
  - label: ':docker: Build + push docker image ({{matrix}})'
    agents:
      queue: build-{{matrix}}
    env:
      BUILDX_NO_DEFAULT_ATTESTATIONS: 1
      DOCKER_DEFAULT_PLATFORM: linux/{{matrix}}
    key: build-architectures
    matrix:
      - amd64
      - arm64
    plugins:
      - docker-compose#v5.9.0:
          build: myapp
          push:
            - myapp:$ECR_REPO:$BUILDKITE_BRANCH-{{matrix}}-$BUILDKITE_BUILD_NUMBER
      - ecr#v2.9.0:
          login: true
  - label: ':docker: Generate manifest list + push'
    agents:
      queue: build
    branches: '!master'
    command:
      - docker-manifest-list.sh
          --dst-tags "$BUILDKITE_BRANCH"
          --dst-tags "$DOCKER_BUILD_TAG"
          --registry "$ECR_REPO"
          --src-tags "$BUILDKITE_BRANCH-amd64-$BUILDKITE_BUILD_NUMBER"
          --src-tags "$BUILDKITE_BRANCH-arm64-$BUILDKITE_BUILD_NUMBER"
    depends_on: build-architectures
    key: build-image
    plugins:
      - ecr#v2.9.0:
          login: true
  - label: 'Run stuff'
    agents:
      queue: build-amd64
    command: do-stuff.sh
    depends_on: build-image
    plugins:
      - docker-compose#v5.9.0:
          build: myapp
          run: myapp
      - ecr#v2.9.0:
          login: true
For the push step, the following command is run (with {{matrix}} replaced by amd64 or arm64):
buildkite-agent meta-data set docker-compose-plugin-built-image-tag-myapp 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp:$BUILDKITE_BRANCH-{{matrix}}-$BUILDKITE_BUILD_NUMBER
But both matrix jobs set the same key, docker-compose-plugin-built-image-tag-myapp, so whichever job finishes last overwrites the value written by the other.
Then the run step calls:
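To illustrate the overwrite, here is a small simulation. It does not call buildkite-agent; meta_set and meta_get are hypothetical stand-ins that keep one value per key in a temporary directory, which is how I understand the meta-data store to behave.

```shell
# Hypothetical stand-in for the Buildkite meta-data store:
# one file per key, so a later write replaces the earlier value.
store=$(mktemp -d)
meta_set() { printf '%s' "$2" > "$store/$1"; }
meta_get() { cat "$store/$1"; }

key="docker-compose-plugin-built-image-tag-myapp"
repo="000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp"

# Both matrix jobs write to the same key; here arm64 happens to finish last.
meta_set "$key" "$repo:main-amd64-1234"
meta_set "$key" "$repo:main-arm64-1234"

# The run step on the amd64 agent reads the key and gets the arm64 tag.
meta_get "$key"
```

If the two jobs finish in the opposite order, the same read returns the amd64 tag, so whether the run step works depends on timing.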
buildkite-agent meta-data get docker-compose-plugin-built-image-tag-myapp
It is running on the build-amd64 host, but if the arm64 image was the last one to be built above, then the output is:
$ buildkite-agent meta-data get docker-compose-plugin-built-image-tag-myapp
Found a pre-built image for myapp
Creating docker-compose override file for prebuilt services
services:
  myapp:
    image: 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp:main-arm64-1234
At that point things don't tend to go well...
Some possible workarounds (I haven't tried them yet):
- When pushing just a single image, I could use a build-alias: myapp-{{matrix}} parameter and then, in the run step, hardcode the architecture to find the meta-data key for that architecture, like so: run: myapp-amd64
  This approach can't work when pushing multiple images, though...
- I could duplicate the service in the docker-compose.yml file, calling the copies myapp_amd64 and myapp_arm64, use myapp_{{matrix}} in the build and push parameters of the pipeline step that pushes images, and use myapp_amd64 for the build and run parameters in the run step.
But it seems to me that the problem would go away if the plugin were modified so that the get_prebuilt_image and set_prebuilt_image functions in lib/metadata.bash (or the plugin_get_metadata and plugin_set_metadata functions they call) determined the architecture of the host they are running on and included it in the key they use.
So instead of docker-compose-plugin-built-image-tag-myapp, the key would be docker-compose-plugin-built-image-tag-amd64-myapp or docker-compose-plugin-built-image-tag-arm64-myapp.
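A rough sketch of the key construction I have in mind. The helper names normalize_arch and prebuilt_image_key are hypothetical and are not taken from the plugin; I'm also assuming uname -m is an acceptable way to detect the host architecture.

```shell
# Hypothetical helper: map `uname -m` output to Docker-style architecture names.
normalize_arch() {
  case "$1" in
    x86_64|amd64) echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "$1" ;;  # pass anything unrecognised through unchanged
  esac
}

# Hypothetical helper: build the meta-data key for a service, qualified by
# architecture. The second argument is optional and defaults to the host's.
prebuilt_image_key() {
  service="$1"
  arch="${2:-$(normalize_arch "$(uname -m)")}"
  echo "docker-compose-plugin-built-image-tag-${arch}-${service}"
}

# On an amd64 agent this prints the amd64-qualified key, on arm64 the arm64 one.
prebuilt_image_key myapp
```

The set and get functions would both go through the same key builder, so an agent would only ever read a tag written by an agent of the same architecture. The optional second argument is there in case a step needs to override the detected architecture.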
Does the above seem like a suitable solution to you?
Or do you have other ideas of how to tackle this problem?