We had an issue yesterday where we updated QNAP QuObjects, causing some change to the way S3 is handled. The resultant error from barman cloud was:
ERROR: Barman cloud WAL archiver exception: An error occurred (InvalidDigest)
when calling the PutObject operation: The Content-MD5 or checksum value that
you specified is not valid.
The fix is already mentioned in the documentation, where we add the following to the object store:
---
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
spec:
configuration:
...
instanceSidecarConfiguration:
env:
- name: AWS_REQUEST_CHECKSUM_CALCULATION
value: when_required
- name: AWS_RESPONSE_CHECKSUM_VALIDATION
value: when_required
and I can confirm this works perfectly. However. Because one of the pods is stuck (1/2 Ready, blocked on the error above) the update never gets applied. What I had to do was apply the fix and then manually delete that one stuck pod (3-node cluster, this pod was a replica).
I think this is a bug, just because this configuration change was specifically required to resolve the issue I was having (S3 semantics changed). It's a little more complicated, here, because it seems like I had a primary switchover (perhaps due to similar S3-related errors) which left a bunch of WALs waiting to get uploaded. Would be good to see if there's a way this could be improved such that the barman-cloud-plugin operator is able to detect the S3-related failure and apply outstanding configuration.
We had an issue yesterday where we updated QNAP QuObjects, causing some change to the way S3 is handled. The resultant error from barman cloud was:
The fix is already mentioned in the documentation, where we add the following to the object store:
and I can confirm this works perfectly. However. Because one of the pods is stuck (1/2 Ready, blocked on the error above) the update never gets applied. What I had to do was apply the fix and then manually delete that one stuck pod (3-node cluster, this pod was a replica).
I think this is a bug, just because this configuration change was specifically required to resolve the issue I was having (S3 semantics changed). It's a little more complicated, here, because it seems like I had a primary switchover (perhaps due to similar S3-related errors) which left a bunch of WALs waiting to get uploaded. Would be good to see if there's a way this could be improved such that the barman-cloud-plugin operator is able to detect the S3-related failure and apply outstanding configuration.