Add optional plugins to basic cost function in CostBasedAutoScaler#18976
Add optional plugins to basic cost function in CostBasedAutoScaler#18976kfaraz merged 16 commits intoapache:masterfrom
Conversation
4061d5a to
6965594
Compare
...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java
Outdated
Show resolved
Hide resolved
...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java
Outdated
Show resolved
Hide resolved
...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java
Outdated
Show resolved
Hide resolved
...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java
Show resolved
Hide resolved
36c8fc4 to
16d4d78
Compare
…blestream/supervisor/autoscaler/CostBasedAutoScaler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
docs/ingestion/supervisor.md
Outdated
| |`idleWeight`|The weight of extracted poll idle value in cost function. | No | 0.75 | | ||
| |`defaultProcessingRate`|A planned processing rate per task, required for first cost estimations. | No | 1000 | | ||
| |`scaleDownBarrier`| A number of successful scale down attempts which should be skipped to prevent the auto-scaler from scaling down tasks immediately. | No | 5 | | ||
| |`useTaskCountBoundaries`|Enables the bounded partitions-per-task window when selecting task counts.|No|`false`| |
There was a problem hiding this comment.
Instead of a boolean flag, should we just make this an integer for the value of partitions-per-task window?
There was a problem hiding this comment.
No, I won't say so. This option be explained better ("intention of this option to make autoscaler more conservative, yada-yada"), but without specific details. The naming definitely may be better.
There was a problem hiding this comment.
No, I mean that since we are still in the validation phase, we should not freeze the value of the SQRT_TASK_INCREASE constant. Instead, we should allow it to be configurable for ease of testing, since we are adding a config to enable/disable it anyway.
| private final ServiceMetricEvent.Builder metricBuilder; | ||
| private final ScheduledExecutorService autoscalerExecutor; | ||
| private final WeightedCostFunction costFunction; | ||
| private OptimalTaskCountBoundariesPlugin boundariesPlugin = null; |
There was a problem hiding this comment.
This is nice! I like the plugin approach.
Couple of suggestions:
- Can we think of each plugin as being a
CostFunctionitself? - The
WeightedCostFunctioncould also implement theCostFunctioninterface. - The
CostFunctioninterface would have a single method:
CostResult computeCost(
CostMetrics metrics,
int proposedTaskCount,
CostBasedAutoScalerConfig config
);- Then we could do something like
f(g(h(weightedCost()))), wheref,gandhare various plugins on top of the cost function.
CostFunction costFunction = new WeightedCostFunction();
if (burstEnabled) {
costFunction = new BurstFunction(costFunction);
}
if (taskLimitEnabled) {
costFunction = new TaskBoundariesFunction(costFunction);
}
CostResult costResult = costFunction.computeCost();- This approach would allow
WeightedCostFunctionto remain agnostic of all plugins and it would make adding new plugins much simpler.
There was a problem hiding this comment.
As discussed offline, we are postponing this for a future patch.
.../indexing/seekablestream/supervisor/autoscaler/plugins/OptimalTaskCountBoundariesPlugin.java
Outdated
Show resolved
Hide resolved
| * 2. Small taskCount's get a massive relative boost, while large taskCount's receive more measured, stable increases. | ||
| * 3. Logarithmic lag response: diminishing returns at extreme lag values. | ||
| */ | ||
| public int computeScaleUpBoost( |
There was a problem hiding this comment.
In the current code flow, this class doesn't really behave very much like a plugin. For the time being, we might as well just move this method to WeightedCostFunction since that class has to be aware of the plugin anyway.
Once we add the CostFunction interface and its implementations, we can move out the method.
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
kfaraz
left a comment
There was a problem hiding this comment.
Left some non-blocking comments.
| "Proposed task count: %d, Cost: %.4f (lag: %.4f, idle: %.4f)", | ||
| log.info( | ||
| "Proposed task count[%d] has total cost[%.4f] = lagCost[%.4f] + idleCost[%.4f]." | ||
| + " Stats: avgPartitionLag[%.1f], pollIdleRatio[%.1f], lagWeight[%.1f], idleWeight[%.1f]", |
There was a problem hiding this comment.
Please don't log this here. This line will be logged for every proposed task count.
We should log the stats only once.
| * while large taskCount's receive more measured, stable increases. | ||
| */ | ||
| static int computeExtraMaxPartitionsPerTaskIncrease( | ||
| static int computeExtraPPTIncrease( |
There was a problem hiding this comment.
Please simplify this method by returning the max allowed task count itself.
There was a problem hiding this comment.
Would leave it for some follow-up PR, actually want to start testing that stuff soon.
| int minPartitionsPerTask = partitionCount / taskCountMax; | ||
| int maxPartitionsPerTask = partitionCount / taskCountMin; |
There was a problem hiding this comment.
Should we clamp these to the limits of [1, partitionCount]. Otherwise, they may overflow those bounds.
| */ | ||
| private final int highLagThreshold; | ||
| /** | ||
| * Represents the minimum duration between successful scale actions. |
There was a problem hiding this comment.
Please move these javadocs to the getters instead. That way, it would be easier for callers to look up the javadocs.
Changes
scaleDownBarrierhas been changed tominScaleDownDelay, which is nowDuration;Details
This change replaces the sqrt-based scaling formula with a logarithmic formula that provides more aggressive emergency recovery at low task counts and millions of lag.
Idle decay:
ln(lagSeverity) / ln(maxSeverity). Less aggressive, scales well with lag growth.Formula
K = P/(6.4*sqrt(C))means small task counts get massive K values (emergency recovery), while large task counts get smaller K values (stability).More details under the hood:
Details
Idle decay on high lag
No decay:

New idle decay:

Task boundaries: formula
Constant 6.4 was carefully chosen as the best 'good' multiplier during multiple analysis attempts for different cluster sizes and situations.
Details
x/(x+1)ln(x)Example Behavior (48 Partitions, threshold=50K)
Plot: