Run steps use the wrong image after pushing images for multiple architectures

I've run into a problem with the `docker-compose` Buildkite plugin that I thought I'd raise.

I have a `matrix` build and push step that uses the docker-compose plugin to build images for `amd64` and `arm64` architectures.
Each one targets a Buildkite agent that is running on the architecture in question.
Each built image's tag includes the architecture. A later step then produces a manifest list containing both architectures, tags it without the architecture, and pushes it to our repository.
Docker, Kubernetes, and so on refer to the tag without the architecture and pull the appropriate image for the architecture they are running on.

Then I have a run step for the docker-compose service.
But it ends up using the image tag of whichever of the images above was the last to finish.
This is a problem when that image has the wrong architecture for the host the run step is executing on.

Here is a simplified example of one of our pipelines, with a lot of stuff removed (and some parts redacted), along with a `docker-compose.yml` file, to hopefully show the issue.

`docker-compose.yml`
```yaml
services:
  myapp:
    build:
      context: .
      dockerfile: Dockerfile
```

`.buildkite/pipeline.yml`
```yaml
env:
  DOCKER_BUILD_TAG: $BUILDKITE_BRANCH-$BUILDKITE_BUILD_NUMBER
  ECR_REPO: 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/$BUILDKITE_PIPELINE_SLUG

steps:

  - label: ':docker: Build + push docker image ({{matrix}})'
    agents:
      queue: build-{{matrix}}
    env:
      BUILDX_NO_DEFAULT_ATTESTATIONS: 1
      DOCKER_DEFAULT_PLATFORM: linux/{{matrix}}
    key: build-architectures
    matrix:
      - amd64
      - arm64
    plugins:
      - docker-compose#v5.9.0:
          build: myapp
          push:
            - myapp:$ECR_REPO:$BUILDKITE_BRANCH-{{matrix}}-$BUILDKITE_BUILD_NUMBER
      - ecr#v2.9.0:
          login: true

  - label: ':docker: Generate manifest list + push'
    agents:
      queue: build
    branches: '!master'
    command:
      - docker-manifest-list.sh
         --dst-tags "$BUILDKITE_BRANCH"
         --dst-tags "$DOCKER_BUILD_TAG"
         --registry "$ECR_REPO"
         --src-tags "$BUILDKITE_BRANCH-amd64-$BUILDKITE_BUILD_NUMBER"
         --src-tags "$BUILDKITE_BRANCH-arm64-$BUILDKITE_BUILD_NUMBER"
    depends_on: build-architectures
    key: build-image
    plugins:
      - ecr#v2.9.0:
          login: true

  - label: 'Run stuff'
    agents:
      queue: build-amd64
    command: do-stuff.sh
    depends_on: build-image
    plugins:
      - docker-compose#v5.9.0:
          build: myapp
          run: myapp
      - ecr#v2.9.0:
          login: true
```

For the push step, the following command is run (with `{{matrix}}` replaced by `amd64` or `arm64`):
```
buildkite-agent meta-data set docker-compose-plugin-built-image-tag-myapp 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp:$BUILDKITE_BRANCH-{{matrix}}-$BUILDKITE_BUILD_NUMBER
```
But both jobs set the same key, `docker-compose-plugin-built-image-tag-myapp`, so whichever one finishes last overwrites the value written by the other.

Then the run step calls
```
buildkite-agent meta-data get docker-compose-plugin-built-image-tag-myapp
```
It is running on the `build-amd64` host, but if the `arm64` image was the last one to be built above, then the log shows:
```
$ buildkite-agent meta-data get docker-compose-plugin-built-image-tag-myapp
Found a pre-built image for myapp
Creating docker-compose override file for prebuilt services
services:
  myapp:
    image: 000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp:main-arm64-1234
```
At that point the run step tries to use an `arm64` image on an `amd64` host, and things don't tend to go well...
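
The overwrite can be reproduced without Buildkite. The following is only a sketch: a temporary directory stands in for the build's meta-data store, and `meta_set` / `meta_get` are hypothetical helpers invented for this illustration, not `buildkite-agent` commands. The tags are the example values from the log above.

```shell
#!/usr/bin/env bash

# Stand-in for the build's meta-data store: one file per key.
# meta_set and meta_get are hypothetical helpers, not part of buildkite-agent.
STORE="$(mktemp -d)"
meta_set() { printf '%s' "$2" > "$STORE/$1"; }
meta_get() { cat "$STORE/$1"; }

KEY="docker-compose-plugin-built-image-tag-myapp"
REPO="000000000000.dkr.ecr.ap-southeast-2.amazonaws.com/myapp"

# Both matrix jobs write to the same key; here arm64 happens to finish last.
meta_set "$KEY" "$REPO:main-amd64-1234"
meta_set "$KEY" "$REPO:main-arm64-1234"

# The run step on the amd64 agent reads whatever was written last.
RESULT="$(meta_get "$KEY")"
echo "$RESULT"
```

Swapping the order of the two `meta_set` calls makes the run step pick up the `amd64` tag instead, which is why the failure is intermittent.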

Some possible workarounds (I haven't tried these yet):

- When pushing just a single image, I could use a `build-alias: myapp-{{matrix}}` parameter and then, in the run step, hardcode the architecture so that it finds the meta-data key for that specific architecture, like so: `run: myapp-amd64`
  This approach can't work when pushing multiple images though...

- I could duplicate the service in the `docker-compose.yml` file, calling the copies `myapp_amd64` and `myapp_arm64`, then use `myapp_{{matrix}}` for the `build` and `push` parameters in the pipeline step that pushes images, and `myapp_amd64` for the `build` and `run` parameters in the run step.

But it seems to me the problem would go away if the plugin were modified so that, in `lib/metadata.bash`, the `get_prebuilt_image` and `set_prebuilt_image` functions (or the `plugin_get_metadata` and `plugin_set_metadata` functions these call) determine the architecture of the host they are running on and include it in the key they use.
So instead of `docker-compose-plugin-built-image-tag-myapp`, they would use `docker-compose-plugin-built-image-tag-amd64-myapp` or `docker-compose-plugin-built-image-tag-arm64-myapp`.
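
As a rough sketch of how such a key could be derived (this is not the plugin's actual code; `normalise_arch` and `prebuilt_image_key` are hypothetical names, and mapping `uname -m` output to Docker-style names is my assumption about how the architecture would be detected):

```shell
#!/usr/bin/env bash

# Hypothetical helper: normalise `uname -m` output to Docker-style names.
normalise_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "$1" ;;   # pass anything unrecognised through unchanged
  esac
}

# Hypothetical helper: build an architecture-qualified meta-data key.
# The second argument defaults to the current host's machine type.
prebuilt_image_key() {
  local service="$1"
  local machine="${2:-$(uname -m)}"
  echo "docker-compose-plugin-built-image-tag-$(normalise_arch "$machine")-${service}"
}

prebuilt_image_key myapp x86_64
prebuilt_image_key myapp aarch64
```

This keys on the architecture of the agent, which fits the pipeline above because each matrix job builds on a native agent. An image cross-built for another platform (for example via `DOCKER_DEFAULT_PLATFORM`) would presumably need the target platform in the key rather than the host's.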

Does the above seem like a suitable solution to you?
Or do you have other ideas of how to tackle this problem?
