⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Add optional plugins to basic cost function in CostBasedAutoScaler#18976

Merged
kfaraz merged 16 commits intoapache:masterfrom
Fly-Style:cba-cost-adjustments
Feb 5, 2026
Merged

Add optional plugins to basic cost function in CostBasedAutoScaler#18976
kfaraz merged 16 commits intoapache:masterfrom
Fly-Style:cba-cost-adjustments

Conversation

@Fly-Style
Copy link
Contributor

@Fly-Style Fly-Style commented Feb 2, 2026

Changes

  • separate the logic of pure cost function, making all additional logic opt-in in config;
  • scaleDownBarrier has been changed to minScaleDownDelay, which is now Duration;
  • changes to high lag fast scaleup: logarithmic scaling formula for idle decay on high lag and task boundaries.

Details

This change replaces the sqrt-based scaling formula with a logarithmic formula that provides more aggressive emergency recovery at low task counts and millions of lag.

Idle decay: ln(lagSeverity) / ln(maxSeverity). Less aggressive, scales well with lag growth.

Formula K = P/(6.4*sqrt(C)) means small task counts get massive K values (emergency recovery), while large task counts get smaller K values (stability).

More details under the hood:

Details

Idle decay on high lag

No decay:
cost_based_case1_baseline

New idle decay:
cost_based_case1_ln_severity

Task boundaries: formula

deltaTasks = K * ln(lagSeverity)

where:
  lagSeverity = (aggregateLag / partitionCount) / lagThreshold
  K = (partitionCount / 6.4) / sqrt(currentTaskCount)

Constant 6.4 was carefully chosen as the best 'good' multiplier during multiple analysis attempts for different cluster sizes and situations.

Details
Property Old (sqrt-based) New (logarithmic)
Small cluster (C=1) Conservative (~4-6 tasks max) Controlled (~6-12 tasks)
Large cluster (C=24) Moderate Moderate
Lag response Saturates via x/(x+1) Unbounded via ln(x)
Growth factor K Increases with sqrt(C) Decreases with sqrt(C)

Example Behavior (48 Partitions, threshold=50K)

Current Lag K Delta Target Valid Range
1 5M 7.5 5.5 6-7 1-8
1 10M 7.5 10.7 12 1-12
3 10M 4.3 6.2 10 3-10
12 10M 2.2 3.1 16 12-16

Plot:

image

@Fly-Style Fly-Style changed the title Introduce additional temporary config params to tweak high lag handling Introduce temporary config params to tweak high lag handling Feb 2, 2026
@Fly-Style Fly-Style marked this pull request as draft February 2, 2026 16:16
@Fly-Style Fly-Style force-pushed the cba-cost-adjustments branch from 36c8fc4 to 16d4d78 Compare February 4, 2026 13:27
@Fly-Style Fly-Style requested a review from kfaraz February 4, 2026 13:27
…blestream/supervisor/autoscaler/CostBasedAutoScaler.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
@Fly-Style Fly-Style marked this pull request as ready for review February 4, 2026 13:29
@kfaraz kfaraz changed the title Introduce temporary config params to tweak high lag handling Add optional plugins to basic cost function in CostBasedAutoScaler Feb 4, 2026
|`idleWeight`|The weight of extracted poll idle value in cost function. | No | 0.75 |
|`defaultProcessingRate`|A planned processing rate per task, required for first cost estimations. | No | 1000 |
|`scaleDownBarrier`| A number of successful scale down attempts which should be skipped to prevent the auto-scaler from scaling down tasks immediately. | No | 5 |
|`useTaskCountBoundaries`|Enables the bounded partitions-per-task window when selecting task counts.|No|`false`|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a boolean flag, should we just make this an integer for the value of partitions-per-task window?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I won't say so. This option be explained better ("intention of this option to make autoscaler more conservative, yada-yada"), but without specific details. The naming definitely may be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean that since we are still in the validation phase, we should not freeze the value of the SQRT_TASK_INCREASE constant. Instead, we should allow it to be configurable for ease of testing, since we are adding a config to enable/disable it anyway.

Copy link
Contributor Author

@Fly-Style Fly-Style Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was reworked in 75b15d6

private final ServiceMetricEvent.Builder metricBuilder;
private final ScheduledExecutorService autoscalerExecutor;
private final WeightedCostFunction costFunction;
private OptimalTaskCountBoundariesPlugin boundariesPlugin = null;
Copy link
Contributor

@kfaraz kfaraz Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice! I like the plugin approach.

Couple of suggestions:

  • Can we think of each plugin as being a CostFunction itself?
  • The WeightedCostFunction could also implement the CostFunction interface.
  • The CostFunction interface would have a single method:
CostResult computeCost(
    CostMetrics metrics,
    int proposedTaskCount,
    CostBasedAutoScalerConfig config
);
  • Then we could do something like f(g(h(weightedCost()))), where f, g and h are various plugins on top of the cost function.
CostFunction costFunction = new WeightedCostFunction();
if (burstEnabled) {
   costFunction = new BurstFunction(costFunction);
}
if (taskLimitEnabled) {
   costFunction = new TaskBoundariesFunction(costFunction);
}
CostResult costResult = costFunction.computeCost();
  • This approach would allow WeightedCostFunction to remain agnostic of all plugins and it would make adding new plugins much simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we are postponing this for a future patch.

@Fly-Style Fly-Style requested a review from kfaraz February 4, 2026 15:31
* 2. Small taskCount's get a massive relative boost, while large taskCount's receive more measured, stable increases.
* 3. Logarithmic lag response: diminishing returns at extreme lag values.
*/
public int computeScaleUpBoost(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current code flow, this class doesn't really behave very much like a plugin. For the time being, we might as well just move this method to WeightedCostFunction since that class has to be aware of the plugin anyway.

Once we add the CostFunction interface and its implementations, we can move out the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: de7f15f

Fly-Style and others added 2 commits February 5, 2026 10:58
@Fly-Style Fly-Style requested a review from kfaraz February 5, 2026 09:00
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some non-blocking comments.

"Proposed task count: %d, Cost: %.4f (lag: %.4f, idle: %.4f)",
log.info(
"Proposed task count[%d] has total cost[%.4f] = lagCost[%.4f] + idleCost[%.4f]."
+ " Stats: avgPartitionLag[%.1f], pollIdleRatio[%.1f], lagWeight[%.1f], idleWeight[%.1f]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't log this here. This line will be logged for every proposed task count.
We should log the stats only once.

* while large taskCount's receive more measured, stable increases.
*/
static int computeExtraMaxPartitionsPerTaskIncrease(
static int computeExtraPPTIncrease(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simplify this method by returning the max allowed task count itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would leave it for some follow-up PR, actually want to start testing that stuff soon.

Comment on lines 290 to 291
int minPartitionsPerTask = partitionCount / taskCountMax;
int maxPartitionsPerTask = partitionCount / taskCountMin;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clamp these to the limits of [1, partitionCount]. Otherwise, they may overflow those bounds.

*/
private final int highLagThreshold;
/**
* Represents the minimum duration between successful scale actions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these javadocs to the getters instead. That way, it would be easier for callers to look up the javadocs.

@kfaraz kfaraz merged commit 67abdc2 into apache:master Feb 5, 2026
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants