Changes to ObjectStore spec do not get applied until after the cluster is healthy #918

@dancollins

Yesterday we updated QNAP QuObjects, which apparently changed the way it handles S3 requests. After the update, barman cloud reported the following error:

ERROR: Barman cloud WAL archiver exception: An error occurred (InvalidDigest) 
when calling the PutObject operation: The Content-MD5 or checksum value that 
you specified is not valid.

The fix is already described in the documentation, which is to add the following to the ObjectStore:

---
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
spec:
  configuration:
    ...
  instanceSidecarConfiguration:
    env:
      - name: AWS_REQUEST_CHECKSUM_CALCULATION
        value: when_required
      - name: AWS_RESPONSE_CHECKSUM_VALIDATION
        value: when_required

and I can confirm this works perfectly. However, because one of the pods was stuck (1/2 Ready, blocked on the error above), the update was never applied. I had to apply the fix and then manually delete the stuck pod (this is a 3-node cluster, and the stuck pod was a replica).
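
For anyone hitting the same situation, the manual workaround can be roughly sketched as a small script. This is only an illustration, not something the plugin provides: the function names, the namespace, and the cluster name are hypothetical, and it assumes the cluster pods carry the `cnpg.io/cluster` label. It only prints the `kubectl delete pod` commands for pods that have a container that is not ready, so they can be reviewed before running (deleting a primary has different consequences than deleting a replica).

```python
import json
import subprocess


def list_pods(namespace: str, cluster: str) -> dict:
    """Fetch the cluster's pods as parsed JSON via kubectl.

    Assumes pods are labelled cnpg.io/cluster=<cluster name>.
    """
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace,
         "-l", f"cnpg.io/cluster={cluster}", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def stuck_pods(pod_list: dict) -> list:
    """Return names of pods with at least one container that is not ready."""
    stuck = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        if statuses and not all(s.get("ready", False) for s in statuses):
            stuck.append(pod["metadata"]["name"])
    return stuck


def delete_commands(pod_names: list, namespace: str) -> list:
    """Build (but do not run) one kubectl delete command per pod."""
    return [["kubectl", "delete", "pod", name, "-n", namespace]
            for name in pod_names]
```

After applying the updated ObjectStore, something like `delete_commands(stuck_pods(list_pods("my-namespace", "my-cluster")), "my-namespace")` would list the commands to run by hand; both names here are placeholders.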

I think this is a bug, because this configuration change was specifically required to resolve the issue I was having (the S3 semantics changed). My case is a little more complicated, because it seems I also had a primary switchover (perhaps due to similar S3-related errors), which left a number of WAL files waiting to be uploaded. It would be good to see whether this can be improved so that the barman-cloud-plugin operator detects the S3-related failure and applies the outstanding configuration.

Labels: bug