Autoscaling optimizes the use of the NL Flow runtime's computational resources: the workflow stays published and usable, but its blocks are turned off, fully or partially, when there is no workload and turned back on when the workload returns. Partial shutdown applies to blocks configured with more than one replica.
In a cloud-based installation, shutting down blocks can in turn lead to the automatic shutdown of temporarily unused virtual machines (cluster nodes), which lowers costs. The savings come at the expense of performance, because reactivating switched-off blocks takes time when the workload reappears.
The autoscaling parameters are described below. Instead of setting them individually, you can choose one of these presets:
- Eco: setting designed to contain costs and energy consumption at the cost of lower workflow responsiveness.
- Sport: more responsive workflow in exchange for higher costs and consumption.
- Disabled: no autoscaling, all workflow blocks are always fully on. It is the most expensive and best performing option.
If you choose to customize the parameters, you can start from a preset and then change individual parameters.
This is the main parameter. A value of 0 means that all workflow blocks can be shut down completely when a no-workload condition is detected (see the parameters below).
A value greater than zero is the minimum number of replicas of each block that always remain on, capped at the block's maximum number of replicas set in the editor. The queue management block, which is implicitly added to every asynchronous workflow, has a maximum of one replica.
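The capping rule above can be sketched as follows. This is only an illustration, not the runtime's implementation: the function name, the block names, and the replica counts are hypothetical.

```python
def always_on_replicas(min_replicas: int, block_max_replicas: int) -> int:
    """Replicas of one block that stay on when no workload is detected.

    Assumption: the configured minimum is simply capped at the block's
    maximum number of replicas, as the description above suggests.
    """
    if min_replicas < 0 or block_max_replicas < 1:
        raise ValueError("min_replicas must be >= 0 and block_max_replicas >= 1")
    return min(min_replicas, block_max_replicas)


# Hypothetical workflow: block name -> maximum replicas set in the editor.
# The queue management block always has a maximum of one replica.
blocks = {"queue-manager": 1, "extractor": 4, "classifier": 2}

# With a minimum of 3, each block keeps at most its own maximum turned on.
floor = {name: always_on_replicas(3, max_r) for name, max_r in blocks.items()}
```

With a minimum of 0 every block can be shut down completely, while with a minimum of 3 the single-replica queue management block keeps its one replica on.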
Interval in seconds between one check for autoscaling conditions and the next.
Cooldown Period: interval in seconds to wait, after a no-workload condition has been detected, before triggering a scale down. If workload is detected again during this time, the pending scale down is canceled.
This parameter selects the indicator used to detect the presence or absence of workload. You can choose between:
- Queue Length: length of the queue, measured in number of analysis tasks submitted and waiting to be processed.
- Message Rate: number of new messages queued every second, that is, the rate at which new analysis tasks are submitted.
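To make the difference between the two indicators concrete, here is a minimal sketch that tracks both for an in-memory queue. The class and its methods are hypothetical, and the one-second measurement window is an assumption. The runtime's actual measurement method is not described here.

```python
from collections import deque


class QueueIndicators:
    """Tracks the two workload indicators for a simulated task queue."""

    def __init__(self, window_seconds: float = 1.0):
        self.window_seconds = window_seconds  # assumed measurement window
        self.waiting = 0                      # tasks submitted, not yet picked up
        self._arrivals = deque()              # submission timestamps

    def submit(self, now: float) -> None:
        """A new analysis task is queued at time `now` (seconds)."""
        self.waiting += 1
        self._arrivals.append(now)

    def start_processing(self) -> None:
        """A waiting task is picked up, so it leaves the queue."""
        if self.waiting > 0:
            self.waiting -= 1

    def queue_length(self) -> int:
        """Queue Length: tasks submitted and still waiting."""
        return self.waiting

    def message_rate(self, now: float) -> float:
        """Message Rate: submissions per second over the last window."""
        cutoff = now - self.window_seconds
        while self._arrivals and self._arrivals[0] <= cutoff:
            self._arrivals.popleft()
        return len(self._arrivals) / self.window_seconds


indicators = QueueIndicators(window_seconds=1.0)
for t in (0.0, 0.2, 0.4, 2.0):
    indicators.submit(t)
indicators.start_processing()
```

In this run the queue length is 3 (four submissions, one picked up), while the message rate at time 2.5 is 1.0 because only one submission falls inside the last second. A long backlog can therefore coexist with a low submission rate, which is why the choice of indicator matters.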
Threshold value for the indicator specified by Mode: if the indicator exceeds the threshold, there is a workload and, unless the workflow is already fully on, a scale up operation is triggered. The number of replicas to turn on for multi-replica blocks is a function of the Value parameter.
If the indicator is equal to or below the threshold, the conditions for a scale down are met, and the scale down takes place once the cooldown period has elapsed (see above).
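The interplay between the threshold and the cooldown period can be sketched as a small decision routine that is called once per check. The class name, the method name, and the returned strings are hypothetical, and the sketch assumes that the cooldown starts at the first check that finds no workload.

```python
from typing import Optional


class ScaleDecider:
    """Decides what to do at each periodic autoscaling check."""

    def __init__(self, threshold: float, cooldown_seconds: float):
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self._idle_since: Optional[float] = None  # start of the no-workload period

    def check(self, indicator: float, now: float) -> str:
        if indicator > self.threshold:
            # Workload present: cancel any pending scale down.
            # The caller skips the scale up if the workflow is already fully on.
            self._idle_since = None
            return "scale_up"
        # Indicator equal to or below the threshold: no workload.
        if self._idle_since is None:
            self._idle_since = now
        if now - self._idle_since >= self.cooldown_seconds:
            return "scale_down"
        return "wait"


decider = ScaleDecider(threshold=5, cooldown_seconds=300)
history = [
    decider.check(10, now=0),    # above threshold
    decider.check(5, now=30),    # equal to threshold: cooldown starts
    decider.check(8, now=90),    # workload is back: pending scale down canceled
    decider.check(0, now=120),   # cooldown restarts
    decider.check(0, now=419),   # 299 s elapsed: still waiting
    decider.check(0, now=420),   # 300 s elapsed: scale down
]
```

The `now` values stand in for successive checks. In a real loop they would be spaced by the check interval described earlier.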
This value is applied as a multiple to the Mode indicator: it determines when an additional replica should be activated for multi-replica blocks.
For example, if the value is 10, there are 11 messages queued, and a block with a maximum of 4 replicas has only one replica turned on, a second replica is turned on. If instead there are 100 messages in the queue, all replicas are turned on.
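One rule that reproduces both cases of this example is a ceiling division of the indicator by the value, capped at the block's maximum replicas. The sketch below assumes that rule. The actual formula used by the runtime is not stated here, and the function and parameter names are hypothetical.

```python
import math


def desired_replicas(indicator: float, value: float,
                     max_replicas: int, min_replicas: int = 0) -> int:
    """Replicas to keep on for one block, under the ceiling-division assumption."""
    if value <= 0:
        raise ValueError("value must be positive")
    # One replica per `value` units of the indicator, rounded up.
    wanted = math.ceil(indicator / value)
    # Never drop below the always-on minimum, itself capped at the block maximum.
    floor = min(min_replicas, max_replicas)
    return max(floor, min(wanted, max_replicas))


# The two cases from the example: value 10, block with a maximum of 4 replicas.
second_replica_case = desired_replicas(indicator=11, value=10, max_replicas=4)
all_replicas_case = desired_replicas(indicator=100, value=10, max_replicas=4)
```

With 11 queued messages the function returns 2, so a second replica is turned on. With 100 messages the uncapped result would be 10, so all 4 replicas are turned on.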