Autoscaling parameters

Introduction

Autoscaling parameters are for asynchronous workflows and are set at publishing time. In particular, the parameters are for the association of a workflow with a runtime: the same workflow can have different parameters in different runtimes. The parameters apply to any block or shared service that can have replicas and regulate the switching on and off of replicas as the load on the individual block or service varies.

If in the publishing wizard the user chooses Disabled, there is no autoscaling: the maximum number of replicas set by the designer for blocks and shared services is switched on and remains on while the workflow is published.

There are two presets for the parameters, but, once a preset is chosen as a starting point, the user can freely change the values of all the parameters.

Min Replicas

Minimum number of replicas of each block and shared service of the workflow that must always be available: they are turned on at publication time and kept on as long as the workflow is published. When the system encounters a downscaling condition for a block or shared service and recalculates the number of replicas downwards, it cannot go below this value. The higher this number, the greater the readiness of the workflow in dealing with waves of load, but also the greater the computational footprint at rest (CPU, RAM and, therefore, virtual computers required) and consequently the cost of the workflow even when it is inactive.

If a block or shared service has its maximum number of replicas set to a value lower than Min Replicas, all of its replicas are started. In this case the actual minimum number of replicas is the maximum set by the designer.
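
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `effective_min_replicas` and its argument names are hypothetical and are not part of the product.

```python
def effective_min_replicas(min_replicas: int, designer_max: int) -> int:
    """Minimum number of replicas actually kept on for one block or shared service.

    Illustrative sketch: min_replicas is the Min Replicas parameter,
    designer_max is the maximum number of replicas set by the designer.
    """
    if min_replicas < 0 or designer_max < 0:
        raise ValueError("replica counts cannot be negative")
    # The designer's maximum caps the minimum requested at publishing time.
    return min(min_replicas, designer_max)


print(effective_min_replicas(3, 2))  # designer maximum is lower than Min Replicas
print(effective_min_replicas(1, 5))  # Min Replicas applies as is
```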

In the performance-oriented Sport preset the value of this parameter is 1, so at least one replica of every block and shared service is guaranteed to be always turned on and ready to handle load. Clearly this has a cost, because virtual computers may be kept on, and therefore generate costs, even in the absence of load.

In the savings-oriented Eco preset the value is 0. This means that immediately after publication the workflow is only virtually responsive, because it does not have any blocks or services available. Its readiness to deal with waves of load is minimal, but its computational footprint (CPU and RAM required and, therefore, virtual computers turned on) is null and its cost at rest is zero.
Replicas are started as soon as load appears, according to the parameters Polling Interval, Mode, Value and Activation Value. When the load ceases, the value zero means that even the last replica of each block and shared service is deleted, bringing the workflow's own costs back to zero, but also making the workflow's readiness for new load minimal again.

Polling Interval

Interval in seconds at which the system checks whether an upscaling or downscaling condition exists for each block or shared service of the workflow and, if so, performs the scaling.

In the Sport preset this interval is shorter (2 seconds) than in the Eco preset (5 seconds). The reason is that the Sport preset favors performance so it checks more often if it is necessary to turn on more replicas as the load increases. The Eco preset favors savings, so it checks less often to delay the possible turning on of extra virtual computers due to the turning on of new replicas: the less time the computers are turned on, the lower the costs.

Cooldown Period

Time in seconds for which a replica of a block or shared service that has been marked for deletion on a downscaling condition is kept turned on, provided that an upscaling condition does not occur in the meantime.

In the Sport preset this interval is much longer (10 minutes) than in the Eco preset (2 minutes). The reason is that the Sport preset prioritizes performance, and keeping idle replicas powered on for a relatively long time increases the responsiveness of the workflow if the load goes up. The Eco preset, on the other hand, prioritizes savings, so idle replicas are quickly deleted in an attempt to shut down expensive virtual computers. Clearly, responsiveness suffers if the load increases again shortly afterwards.
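
As a rough illustration of the cooldown logic, the following Python sketch decides the fate of a replica that has been marked for deletion. The helper `cooldown_decision`, its arguments and its return strings are hypothetical. The sketch also assumes that an upscaling condition seen during the cooldown cancels the deletion, which the description above suggests but does not state explicitly.

```python
def cooldown_decision(seconds_since_marked: float,
                      cooldown_period_s: float,
                      upscaling_seen: bool) -> str:
    """Decide what happens to a replica marked for deletion (hypothetical helper)."""
    if upscaling_seen:
        # Assumption: an upscaling condition during the cooldown cancels the deletion.
        return "keep"
    if seconds_since_marked >= cooldown_period_s:
        # The Cooldown Period has elapsed without new upscaling: delete the replica.
        return "delete"
    # Still inside the Cooldown Period: the replica stays on for now.
    return "wait"


# Same idle replica, 130 seconds after being marked for deletion.
print(cooldown_decision(130, 120, False))  # Eco cooldown of 120 s
print(cooldown_decision(130, 600, False))  # Sport cooldown of 600 s
```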

Mode

The way the load on a block or shared service is measured to determine whether there is an upscaling or downscaling condition.

The Queue Length value indicates that the load is the number of messages in the queue. Each block or shared service has its own incoming queue in which execution requests can accumulate if the block or service is slower than the request generator, which can be a block upstream in the flow or the workflow client application.

The Message Rate value, on the other hand, indicates that the load is the rate, in messages per second, at which new requests arrive. The rate is computed as the average over the previous polling interval.
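
The two ways of measuring load can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Mode` enum, the function `measure_load` and its argument names are all hypothetical, and the sketch assumes that the average rate is simply the number of arrivals in the previous polling interval divided by the interval length.

```python
from enum import Enum


class Mode(Enum):
    QUEUE_LENGTH = "Queue Length"
    MESSAGE_RATE = "Message Rate"


def measure_load(mode: Mode, queue_length: int,
                 arrivals_in_interval: int, polling_interval_s: float) -> float:
    """Return the load of one block or shared service for the given mode."""
    if mode is Mode.QUEUE_LENGTH:
        # Load is the number of messages currently waiting in the incoming queue.
        return float(queue_length)
    if polling_interval_s <= 0:
        raise ValueError("polling interval must be positive")
    # Load is the average arrival rate (messages per second) over the last interval.
    return arrivals_in_interval / polling_interval_s


# 7 messages queued, 10 new messages arrived during a 5-second polling interval.
print(measure_load(Mode.QUEUE_LENGTH, 7, 10, 5))
print(measure_load(Mode.MESSAGE_RATE, 7, 10, 5))
```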

For both presets this parameter is set to Queue Length.

Value

Value used to calculate the number of replicas requested. It represents the portion of the load that the user wants to assign to each replica.

For example, if the load is 50 and Value is 5, the load is divided between 50 / 5 = 10 replicas and therefore each replica will tend to take a tenth of the load.
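
The calculation in the example can be sketched in Python. The function `desired_replicas` and its argument names are hypothetical. Rounding up when the load is not an exact multiple of Value is an assumption, as is the way the result is clamped between the effective minimum and the maximum set by the designer.

```python
import math


def desired_replicas(load: float, value: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Number of replicas requested for one block or shared service (sketch)."""
    if value <= 0:
        raise ValueError("Value must be positive")
    # Assumption: a partial share of load still needs a whole replica (round up).
    raw = math.ceil(load / value)
    # The designer's maximum caps both the request and the minimum.
    floor = min(min_replicas, max_replicas)
    return max(floor, min(raw, max_replicas))


print(desired_replicas(50, 5, min_replicas=0, max_replicas=20))  # 50 / 5
print(desired_replicas(50, 1, min_replicas=1, max_replicas=20))  # capped by the maximum
print(desired_replicas(0, 10, min_replicas=1, max_replicas=20))  # held up by Min Replicas
```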

The actual amount of load that the single replica will take from the queue is not pre-determined because it depends on the individual processing: if a replica receives an input requiring heavy processing, over time it will take fewer messages from the queue than another replica that receives a "lighter" input.

In the Sport preset this parameter is 1, a value that leads to having the maximum number of replicas, therefore maximum parallelism and maximum throughput of the block or service. For example, with an instantaneous load of 50, 50 replicas are required and each one must deal with only a fiftieth of the load.
In the Eco preset this parameter is 10, so fewer replicas are requested for the same load and each replica receives a larger fraction of the load, which must necessarily be processed in sequence, one message at a time.

Activation Value

Load threshold that determines the activation of the block or shared service. The parameter is considered only in the particular case of upscaling from zero, which is confirmed only if the load is higher than the value of the parameter.

For example, if there are no replicas (the block or service is deactivated), the load is 2, Value is 1 and Activation Value is 9, the block or service is not activated. The replicas requested would be 2 / 1 = 2, but the load is not higher than the threshold represented by Activation Value, so there is no activation.

The value zero, which is the default for both presets, means that the parameter has no influence on scaling: any load, even a single message, is already sufficient to scale up from zero, that is, activate the block or service.
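
The activation check can be sketched as a single comparison. The function name `can_scale_from_zero` is hypothetical; the strict "higher than" comparison follows the description above.

```python
def can_scale_from_zero(load: float, activation_value: float) -> bool:
    """True if a deactivated block or service (zero replicas) may be activated."""
    # Activation is confirmed only if the load is strictly higher than the threshold.
    return load > activation_value


print(can_scale_from_zero(load=2, activation_value=9))  # the example above
print(can_scale_from_zero(load=1, activation_value=0))  # default: one message is enough
```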

Warning

Because a block or service that is not activated can leave messages waiting in the queue, consider carefully whether to change the default value of this parameter.

Presets

Eco and Sport are the parameter presets. By choosing a preset you choose certain pre-defined parameter values.

Eco is the economy-oriented preset, Sport is the performance-oriented preset.
These are the parameter values for both:

Preset → Parameter ↓	Eco	Sport
Min Replicas	0	1
Polling Interval	5s	2s
Cooldown Period	120s	600s
Mode	Queue Length	Queue Length
Value	10	1
Activation Value	0	0

In the paragraphs above you will find, for each parameter, the rationale for the specific value used in the presets.
The user can use a preset as is or choose one as a starting point and then modify the parameters as desired.
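
For reference, the two presets can be written down as data. The following Python sketch is purely illustrative: the class `AutoscalingParams` and its field names are hypothetical and do not correspond to any documented API, while the values are those of the table above. It also shows a preset used as a starting point with one parameter changed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AutoscalingParams:
    """Hypothetical container for the autoscaling parameters of one publication."""
    min_replicas: int
    polling_interval_s: int
    cooldown_period_s: int
    mode: str
    value: int
    activation_value: int


# Values taken from the presets table.
ECO = AutoscalingParams(min_replicas=0, polling_interval_s=5, cooldown_period_s=120,
                        mode="Queue Length", value=10, activation_value=0)
SPORT = AutoscalingParams(min_replicas=1, polling_interval_s=2, cooldown_period_s=600,
                          mode="Queue Length", value=1, activation_value=0)

# Start from Eco, but always keep one replica on.
custom = replace(ECO, min_replicas=1)
print(custom)
```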