
Issue when pushing images for multiple architectures: later run steps use the wrong one #495

@jim-barber-he

I've run into a problem with the docker-compose Buildkite plugin that I thought I'd raise.

I have a matrix build and push step that uses the docker-compose plugin to build images for amd64 and arm64 architectures.
Each one targets a Buildkite agent that is running on the architecture in question.
Each built image includes the architecture in its tag. A later step then produces a manifest list containing both architectures, tags it without the architecture, and pushes it to our repository.
Docker, Kubernetes, and so on refer to the tag name without the architecture and pull the appropriate image for the architecture they are running on.
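The docker-manifest-list.sh script isn't shown here. As an assumption, its core is roughly equivalent to `docker buildx imagetools create`, which combines already-pushed per-architecture images under one multi-arch tag. The sketch below is a dry run that only prints the command, and `manifest_cmd` is a hypothetical helper, not part of our script or of the plugin.

```shell
#!/bin/sh
# Hypothetical sketch: print (not run) a command that would combine the
# per-architecture images into one multi-arch manifest list.
# Assumes docker buildx is available wherever the printed command is run.
manifest_cmd() {
  repo=$1   # e.g. the value of $ECR_REPO
  dst=$2    # tag without the architecture, e.g. main-1234
  shift 2   # remaining args: source tags, one per architecture
  cmd="docker buildx imagetools create --tag $repo:$dst"
  for src in "$@"; do
    cmd="$cmd $repo:$src"
  done
  printf '%s\n' "$cmd"
}

manifest_cmd "000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp" \
  main-1234 main-amd64-1234 main-arm64-1234
```

The resulting tag (main-1234 here) is the one Docker and Kubernetes resolve to the right image per host; the architecture-specific tags remain in the registry alongside it.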

Then I have a run step for the docker-compose service.
However, the image tag it uses depends on which of the two builds above finished last.
This is a problem whenever that image's architecture doesn't match the host the run step is executing on.

Below is a simplified example of one of our pipelines (with a lot removed and some details redacted), along with a docker-compose.yml file, which hopefully shows the issue.

docker-compose.yml

services:
  myapp:
    build:
      context: .
      dockerfile: Dockerfile

.buildkite/pipeline.yml

env:
  DOCKER_BUILD_TAG: $BUILDKITE_BRANCH-$BUILDKITE_BUILD_NUMBER
  ECR_REPO: 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/$BUILDKITE_PIPELINE_SLUG

steps:

  - label: ':docker: Build + push docker image ({{matrix}})'
    agents:
      queue: build-{{matrix}}
    env:
      BUILDX_NO_DEFAULT_ATTESTATIONS: 1
      DOCKER_DEFAULT_PLATFORM: linux/{{matrix}}
    key: build-architectures
    matrix:
      - amd64
      - arm64
    plugins:
      - docker-compose#v5.9.0:
          build: myapp
          push:
            - myapp:$ECR_REPO:$BUILDKITE_BRANCH-{{matrix}}-$BUILDKITE_BUILD_NUMBER
      - ecr#v2.9.0:
          login: true

  - label: ':docker: Generate manifest list + push'
    agents:
      queue: build
    branches: '!master'
    command:
      - docker-manifest-list.sh
         --dst-tags "$BUILDKITE_BRANCH"
         --dst-tags "$DOCKER_BUILD_TAG"
         --registry "$ECR_REPO"
         --src-tags "$BUILDKITE_BRANCH-amd64-$BUILDKITE_BUILD_NUMBER"
         --src-tags "$BUILDKITE_BRANCH-arm64-$BUILDKITE_BUILD_NUMBER"
    depends_on: build-architectures
    key: build-image
    plugins:
      - ecr#v2.9.0:
          login: true

  - label: 'Run stuff'
    agents:
      queue: build-amd64
    command: do-stuff.sh
    depends_on: build-image
    plugins:
      - docker-compose#v5.9.0:
          build: myapp
          run: myapp
      - ecr#v2.9.0:
          login: true

In the build + push step, the plugin runs the following command (with {{matrix}} replaced by amd64 or arm64):

buildkite-agent meta-data set docker-compose-plugin-built-image-tag-myapp 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp:$BUILDKITE_BRANCH-{{matrix}}-$BUILDKITE_BUILD_NUMBER

But both matrix jobs set the same key, docker-compose-plugin-built-image-tag-myapp, so whichever job finishes last overwrites the value written by the other.

Then the run step calls

buildkite-agent meta-data get docker-compose-plugin-built-image-tag-myapp

The run step executes on a build-amd64 agent, but if the arm64 image was the last one built above, the output is:

$ buildkite-agent meta-data get docker-compose-plugin-built-image-tag-myapp
Found a pre-built image for myapp
Creating docker-compose override file for prebuilt services
services:
  myapp:
    image: 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp:main-arm64-1234

At that point the step is trying to run an arm64 image on an amd64 host, and things don't tend to go well...
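To make the last-write-wins behaviour concrete, here is a small simulation. It is not the plugin's code: a temporary directory stands in for Buildkite's build meta-data store, and `meta_set`/`meta_get` are hypothetical stand-ins for `buildkite-agent meta-data set` and `get`. The two writes are ordered as if the arm64 job finished last.

```shell
#!/bin/sh
# Simulation only: one file per key in a temp dir stands in for build meta-data.
STORE=$(mktemp -d)
trap 'rm -rf "$STORE"' EXIT

# Hypothetical stand-ins for `buildkite-agent meta-data set/get`.
meta_set() { printf '%s' "$2" > "$STORE/$1"; }
meta_get() { cat "$STORE/$1"; }

KEY=docker-compose-plugin-built-image-tag-myapp
REPO=000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp

# Both matrix jobs write the same key; arm64 happens to finish last.
meta_set "$KEY" "$REPO:main-amd64-1234"
meta_set "$KEY" "$REPO:main-arm64-1234"

# The run step on the amd64 agent reads back whatever was written last.
meta_get "$KEY"; echo
```

Swapping the order of the two `meta_set` calls makes the run step pick the amd64 image instead, which is why the failure appears intermittent.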

Some possible workarounds (I haven't tried them yet):

  • When pushing just a single image, I could use a build-alias: myapp-{{matrix}} parameter, and then hardcode the architecture in the run step so it finds the meta-data key for that specific architecture, like so: run: myapp-amd64
    This approach can't work when pushing multiple images though...

  • I could duplicate the service in the docker-compose.yml file, naming the copies myapp_amd64 and myapp_arm64, use myapp_{{matrix}} in the build and push parameters of the pipeline step that pushes images, and use myapp_amd64 in the build and run parameters of the run step.
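The second workaround works because each service name gets its own meta-data key. A minimal sketch, assuming the plugin derives the key as docker-compose-plugin-built-image-tag-<service> (consistent with the key shown earlier in this issue); `key_for_service` is a hypothetical helper.

```shell
#!/bin/sh
# Assumption: the plugin's key is docker-compose-plugin-built-image-tag-<service>.
key_for_service() {
  printf 'docker-compose-plugin-built-image-tag-%s\n' "$1"
}

# Build + push step: {{matrix}} expands to amd64 or arm64, so each matrix job
# writes its own key instead of sharing one.
for arch in amd64 arm64; do
  key_for_service "myapp_$arch"
done

# Run step: hardcoded to the amd64 service, so it always reads the amd64 key.
key_for_service myapp_amd64
```

The cost is duplicating the service definition in docker-compose.yml for every architecture.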

But it seems to me the problem would go away if the plugin were modified so that the get_prebuilt_image and set_prebuilt_image functions in lib/metadata.bash (or the plugin_get_metadata and plugin_set_metadata functions they call) determined the architecture of the host they are running on and included it in the key they use.
So instead of docker-compose-plugin-built-image-tag-myapp, the key would be docker-compose-plugin-built-image-tag-amd64-myapp or docker-compose-plugin-built-image-tag-arm64-myapp.
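A rough sketch of how such a key could be derived. This is not the plugin's actual code: `docker_arch` and `prebuilt_image_key` are hypothetical names, and the mapping only covers the `uname -m` values for the two architectures in this issue (Linux reports aarch64, macOS on Apple Silicon reports arm64).

```shell
#!/bin/sh
# Hypothetical sketch of the proposal: fold the host architecture into the key.

# Map `uname -m` output to Docker's architecture names.
docker_arch() {
  case "$1" in
    x86_64|amd64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "$1" ;;  # pass anything else through unchanged
  esac
}

# Build the per-architecture key for a service; the optional second argument
# overrides the detected architecture.
prebuilt_image_key() {
  service=$1
  arch=${2:-$(docker_arch "$(uname -m)")}  # default: this host's architecture
  printf 'docker-compose-plugin-built-image-tag-%s-%s\n' "$arch" "$service"
}

prebuilt_image_key myapp  # key for whichever host this runs on
```

Because the setter and the getter would both derive the key from the host they run on, the arm64 job could no longer overwrite what the amd64 run step reads. One caveat: a job that builds for a non-native platform via DOCKER_DEFAULT_PLATFORM (for example under emulation) would record its image under the host's architecture, so using the architecture from DOCKER_DEFAULT_PLATFORM when it is set might be more accurate.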

Does the above seem like a suitable solution to you?
Or do you have other ideas of how to tackle this problem?
