
When a ping or balance fails because of a 4xx, a sensible retry strategy needs to be implemented #10

Description

@nevali

If a ping fails due to a 4xx (with the default ETCD_EXISTS), we should retry with ETCD_NONE to attempt to re-create the value. If that fails, we should close and re-open the directory.

  • If re-opening the directory fails, we should invoke a (new) callback to inform the application that the cluster has been forcibly left.
  • The whole process should be completed while the write-lock is held, so that the other thread cannot interfere with it.
  • Once the cluster has been re-acquired, we should set a (new) flag to inform the other thread that the cluster state has changed.
  • Regardless of whether re-creating the value succeeds, cluster_etcd_balance_() should be invoked as part of the re-acquisition process to ensure that member state data is up to date.
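
To make the proposed order of operations concrete, here is a rough C sketch. It is a simulation rather than the library's code: only `ETCD_EXISTS`, `ETCD_NONE` and the idea of calling `cluster_etcd_balance_()` come from this issue. The `sketch_*` names, the struct fields, the flag values, the status codes and the return values are all made up for illustration, and the caller is assumed to already hold the write-lock. It also assumes that balancing is skipped when re-opening the directory fails, since there is then no directory to balance against.

```c
#include <stddef.h>

/* Flag names are taken from the issue; the values here are placeholders. */
enum { ETCD_NONE = 0, ETCD_EXISTS = 1 };

/* Hypothetical stand-in for the cluster handle, with simulated outcomes. */
typedef struct sketch_cluster {
    int value_exists;   /* simulated: is our key still present in etcd? */
    int create_ok;      /* simulated: will re-creating the key succeed? */
    int reopen_ok;      /* simulated: will re-opening the directory succeed? */
    int state_changed;  /* the (new) flag read by the other thread */
    int balance_calls;  /* how often the balance stub has run */
    int left_calls;     /* how often the (new) "left" callback has run */
    void (*on_left)(struct sketch_cluster *);
} sketch_cluster;

/* Simulated etcd write: returns an HTTP-style status code. */
int sketch_set(sketch_cluster *c, int flags)
{
    if (flags == ETCD_EXISTS)
        return c->value_exists ? 200 : 404; /* refresh only if present */
    if (!c->create_ok)
        return 412;                         /* re-creation refused (a 4xx) */
    c->value_exists = 1;
    return 201;
}

/* Simulated close + re-open of the directory: 0 on success, -1 on failure. */
int sketch_reopen(sketch_cluster *c)
{
    return c->reopen_ok ? 0 : -1;
}

/* Stand-in for cluster_etcd_balance_(): only counts invocations. */
void sketch_balance(sketch_cluster *c)
{
    c->balance_calls++;
}

/* Example "forcibly left" callback: only counts invocations. */
void sketch_count_left(sketch_cluster *c)
{
    c->left_calls++;
}

int sketch_is_4xx(int status)
{
    return status >= 400 && status < 500;
}

/* Caller is assumed to hold the write-lock for the whole call.
 * Returns 0 if the ping succeeded, 1 if the cluster was re-acquired,
 * -1 if the cluster has been forcibly left.
 * Non-4xx failures are outside the scope of this sketch.
 */
int sketch_ping(sketch_cluster *c)
{
    if (!sketch_is_4xx(sketch_set(c, ETCD_EXISTS)))
        return 0;

    /* Step 1: retry with ETCD_NONE to try to re-create the value. */
    if (sketch_is_4xx(sketch_set(c, ETCD_NONE))) {
        /* Step 2: re-creation failed, so close and re-open the directory. */
        if (sketch_reopen(c) != 0) {
            if (c->on_left != NULL)
                c->on_left(c);  /* tell the application it has left */
            return -1;
        }
    }

    /* Re-acquired: refresh member state, then flag the change. */
    sketch_balance(c);
    c->state_changed = 1;
    return 1;
}
```

With `value_exists = 0` and `create_ok = 1`, `sketch_ping()` returns 1 after one call to the balance stub. With all three simulated outcomes set to 0, it invokes `on_left` once and returns -1 without touching `state_changed`.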
