Asynchronous workflows retry policies

Asynchronous workflows have a built-in retry mechanism governed by the following parameters:

  • Attempts: number of retries after the first error; a value of 0 means no retries.
    Consecutive attempts are spaced out by an interval that is configured with the two parameters below.
  • Delay: pause in seconds before the first retry.
  • Multiplier: factor that scales the pause between successive retries, as used in the formula below.

An error is any failure of any replica of any block, including replicas of shared services.
The number of retries applies to each block, not to individual replicas. For example, if a block has two replicas, one of them fails, and five retries are allowed, the input is put back into the block's queue and taken by the first available replica, which is not necessarily the one that failed. Retries are therefore distributed among the replicas.

If a block succeeds after retries and another block then fails, the retry count for that other block starts from zero; each block is independent of the others and has all the retries available.
If all retries fail, the error "exits" the block and becomes a potential workflow error, unless it is "absorbed" by the toleration mechanisms of the End Context and Switch operator blocks.
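
The per-block retry budget described above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the function name run_block, the use of plain callables as replicas, and the round-robin choice standing in for "first available replica" are all assumptions made for illustration, and the pauses between retries are left out.

```python
from collections import deque
from itertools import cycle


def run_block(item, replicas, attempts):
    """Process one input on a block served by several replicas.

    `replicas` is a list of callables (hypothetical stand-ins for replicas).
    Returns (result, retries_used). Re-raises the last error once the
    block-level retry budget is exhausted.
    """
    queue = deque([item])            # the block's input queue
    next_replica = cycle(replicas)   # assumption: round-robin availability
    retries_used = 0                 # one counter shared by the whole block
    while queue:
        current = queue.popleft()
        replica = next(next_replica)
        try:
            return replica(current), retries_used
        except Exception:
            if retries_used >= attempts:
                raise                # the error "exits" the block
            retries_used += 1
            queue.append(current)    # input goes back into the block's queue


def failing_replica(x):
    raise RuntimeError("replica failed")


def healthy_replica(x):
    return x * 2


# One replica fails, the other picks the input up on the first retry.
print(run_block(3, [failing_replica, healthy_replica], attempts=5))
```

In this sketch the counter belongs to the block rather than to any replica, so a retry may be served by a different replica than the one that raised the error.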

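The pause before each retry is computed with the following formula, where n is the retry number (n = 1 for the first retry); the example below follows this rule:

    pause(1) = Delay
    pause(n) = pause(n-1) × Multiplier^(n-1)    for n > 1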

So, for example, if the retry policies are:

  • Attempts: 4
  • Delay: 2s
  • Multiplier: 1.5

and there is an error, a maximum of four retries will be attempted, separated by the following pauses:

  • Before the first retry: 2s
  • Between the first and the second retry: 3s
  • Between the second and the third retry: 6.75s
  • Between the third and the fourth retry: 22.78125s
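
The pauses listed above are consistent with a rule in which each pause equals the previous one multiplied by Multiplier raised to the number of retries already made. The following Python sketch reproduces the list under that assumption; the function name retry_pauses is hypothetical and not part of the product.

```python
def retry_pauses(attempts, delay, multiplier):
    """Return the pause (in seconds) before each retry.

    Assumed rule, inferred from the example values:
    pause(1) = delay, pause(n) = pause(n-1) * multiplier ** (n - 1).
    """
    pauses = []
    pause = delay
    for n in range(1, attempts + 1):
        if n > 1:
            pause *= multiplier ** (n - 1)
        pauses.append(pause)
    return pauses


print(retry_pauses(attempts=4, delay=2.0, multiplier=1.5))
# [2.0, 3.0, 6.75, 22.78125]
```

With Attempts set to 0 the function returns an empty list, matching the "no retries" case.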