Add optional plugins to basic cost function in CostBasedAutoScaler by Fly-Style · Pull Request #18976 · apache/druid

Fly-Style · 2026-02-02T15:20:24Z

Changes

separate the pure cost function from the additional logic, which is now opt-in via config;
scaleDownBarrier has been replaced by minScaleDownDelay, which is now a Duration;
high-lag fast scale-up now uses a logarithmic scaling formula, applied to idle decay on high lag and to task boundaries.

Details

This change replaces the sqrt-based scaling formula with a logarithmic one that provides more aggressive emergency recovery when the task count is low and lag is in the millions.

Idle decay: ln(lagSeverity) / ln(maxSeverity). Less aggressive, scales well with lag growth.

The formula K = P/(6.4*sqrt(C)), where P is the partition count and C is the current task count, gives small task counts large K values (emergency recovery) and large task counts smaller K values (stability).

More details under the hood:

Idle decay on high lag

[Plots not preserved: "No decay" and "New idle decay"]

Task boundaries: formula

deltaTasks = K * ln(lagSeverity)

where:
  lagSeverity = (aggregateLag / partitionCount) / lagThreshold
  K = (partitionCount / 6.4) / sqrt(currentTaskCount)

The constant 6.4 was chosen as the best-performing multiplier after multiple rounds of analysis across different cluster sizes and situations.
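The formula above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not taken from the Druid code base, and returning zero when the lag severity is at or below 1 is an assumption, since the description does not say how lag under the threshold is handled.

```java
/** Hypothetical helper that mirrors the formula in the PR description; not the Druid implementation. */
final class BurstScaleUpSketch {
  // Multiplier quoted in the PR description.
  static final double K_DIVISOR = 6.4;

  /** lagSeverity = (aggregateLag / partitionCount) / lagThreshold */
  static double lagSeverity(long aggregateLag, int partitionCount, long lagThreshold) {
    return ((double) aggregateLag / partitionCount) / lagThreshold;
  }

  /** K = (partitionCount / 6.4) / sqrt(currentTaskCount) */
  static double growthFactor(int partitionCount, int currentTaskCount) {
    return (partitionCount / K_DIVISOR) / Math.sqrt(currentTaskCount);
  }

  /** deltaTasks = K * ln(lagSeverity) */
  static double deltaTasks(int partitionCount, int currentTaskCount, long aggregateLag, long lagThreshold) {
    double severity = lagSeverity(aggregateLag, partitionCount, lagThreshold);
    if (severity <= 1.0) {
      // Assumption: no boost while average partition lag is at or below the threshold.
      return 0.0;
    }
    return growthFactor(partitionCount, currentTaskCount) * Math.log(severity);
  }

  public static void main(String[] args) {
    // Inputs taken from the example in the PR description: 48 partitions, threshold 50K.
    int[] currentTasks = {1, 1, 3, 12};
    long[] lags = {5_000_000L, 10_000_000L, 10_000_000L, 10_000_000L};
    for (int i = 0; i < currentTasks.length; i++) {
      double k = growthFactor(48, currentTasks[i]);
      double delta = deltaTasks(48, currentTasks[i], lags[i], 50_000L);
      System.out.printf("current=%d lag=%d K=%.1f delta=%.1f%n", currentTasks[i], lags[i], k, delta);
    }
  }
}
```

Because K shrinks with sqrt(currentTaskCount), the same lag produces a smaller delta at 12 tasks than at 1 task, which is the stability property the description refers to.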

| Property | Old (sqrt-based) | New (logarithmic) |
|---|---|---|
| Small cluster (C=1) | Conservative (~4-6 tasks max) | Controlled (~6-12 tasks) |
| Large cluster (C=24) | Moderate | Moderate |
| Lag response | Saturates via `x/(x+1)` | Unbounded via `ln(x)` |
| Growth factor K | Increases with sqrt(C) | Decreases with sqrt(C) |

Example Behavior (48 Partitions, threshold=50K)

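| Current tasks | Lag | K | Delta | Target tasks | Valid range |
|---|---|---|---|---|---|
| 1 | 5M | 7.5 | 5.5 | 6-7 | 1-8 |
| 1 | 10M | 7.5 | 10.7 | 12 | 1-12 |
| 3 | 10M | 4.3 | 6.2 | 10 | 3-10 |
| 12 | 10M | 2.2 | 3.1 | 16 | 12-16 |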

kfaraz · 2026-02-04T13:35:38Z

docs/ingestion/supervisor.md

 |`idleWeight`|The weight of extracted poll idle value in cost function. | No | 0.75 |
 |`defaultProcessingRate`|A planned processing rate per task, required for first cost estimations. | No | 1000 |
-|`scaleDownBarrier`| A number of successful scale down attempts which should be skipped  to prevent the auto-scaler from scaling down tasks immediately.  | No | 5 |
+|`useTaskCountBoundaries`|Enables the bounded partitions-per-task window when selecting task counts.|No|`false`|


Instead of a boolean flag, should we just make this an integer for the value of partitions-per-task window?

No, I wouldn't say so. This option could be explained better ("the intention of this option is to make the autoscaler more conservative, yada-yada"), but without specific details. The naming could definitely be better.

No, I mean that since we are still in the validation phase, we should not freeze the value of the SQRT_TASK_INCREASE constant. Instead, we should allow it to be configurable for ease of testing, since we are adding a config to enable/disable it anyway.

It was reworked in 75b15d6

kfaraz · 2026-02-04T13:55:40Z

...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java

  private final ServiceMetricEvent.Builder metricBuilder;
  private final ScheduledExecutorService autoscalerExecutor;
  private final WeightedCostFunction costFunction;
+  private OptimalTaskCountBoundariesPlugin boundariesPlugin = null;


This is nice! I like the plugin approach.

Couple of suggestions:

Can we think of each plugin as being a CostFunction itself?

The WeightedCostFunction could also implement the CostFunction interface.

The CostFunction interface would have a single method:

CostResult computeCost(CostMetrics metrics, int proposedTaskCount, CostBasedAutoScalerConfig config);

Then we could do something like f(g(h(weightedCost()))), where f, g and h are various plugins on top of the cost function.

CostFunction costFunction = new WeightedCostFunction();
if (burstEnabled) {
  costFunction = new BurstFunction(costFunction);
}
if (taskLimitEnabled) {
  costFunction = new TaskBoundariesFunction(costFunction);
}
CostResult costResult = costFunction.computeCost();

This approach would allow WeightedCostFunction to remain agnostic of all plugins and it would make adding new plugins much simpler.

As discussed offline, we are postponing this for a future patch.
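For readers unfamiliar with this wrapping style, here is a minimal self-contained sketch of the suggestion as a decorator chain. Everything in it is an assumption made for illustration: the types are simplified stand-ins that only borrow the names from the comment above, the config parameter is omitted, the base cost is a toy formula, and the behavior of the two plugins (a cost discount under high lag, an infinite cost beyond a task limit) is invented and is not what the PR implements.

```java
/** Sketch of the reviewer's wrapping idea; every class here is an illustrative stand-in. */
final class CostFunctionChainSketch {
  /** Wraps the base function with whichever (hypothetical) plugins are enabled. */
  static CostFunction build(boolean burstEnabled, boolean taskLimitEnabled) {
    CostFunction costFunction = new WeightedCostFunction();
    if (burstEnabled) {
      costFunction = new BurstFunction(costFunction, 50_000.0, 0.5);
    }
    if (taskLimitEnabled) {
      costFunction = new TaskBoundariesFunction(costFunction, 10);
    }
    return costFunction;
  }

  public static void main(String[] args) {
    CostFunction f = build(true, true);
    CostResult result = f.computeCost(new CostMetrics(100_000.0), 4);
    System.out.println("cost for 4 tasks: " + result.totalCost);
  }
}

/** Stand-in for the metrics type; the real one carries more fields. */
final class CostMetrics {
  final double avgPartitionLag;
  CostMetrics(double avgPartitionLag) { this.avgPartitionLag = avgPartitionLag; }
}

/** Stand-in for the result type. */
final class CostResult {
  final double totalCost;
  CostResult(double totalCost) { this.totalCost = totalCost; }
}

/** Single-method interface, as proposed in the review (config parameter omitted here). */
interface CostFunction {
  CostResult computeCost(CostMetrics metrics, int proposedTaskCount);
}

/** Base function with a toy cost; unaware of any plugin. */
final class WeightedCostFunction implements CostFunction {
  @Override
  public CostResult computeCost(CostMetrics metrics, int proposedTaskCount) {
    return new CostResult(metrics.avgPartitionLag / proposedTaskCount);
  }
}

/** Invented behavior: discounts the delegate's cost when lag exceeds a threshold. */
final class BurstFunction implements CostFunction {
  private final CostFunction delegate;
  private final double lagThreshold;
  private final double discount;

  BurstFunction(CostFunction delegate, double lagThreshold, double discount) {
    this.delegate = delegate;
    this.lagThreshold = lagThreshold;
    this.discount = discount;
  }

  @Override
  public CostResult computeCost(CostMetrics metrics, int proposedTaskCount) {
    CostResult base = delegate.computeCost(metrics, proposedTaskCount);
    if (metrics.avgPartitionLag <= lagThreshold) {
      return base;
    }
    return new CostResult(base.totalCost * discount);
  }
}

/** Invented behavior: makes task counts above a limit infinitely costly. */
final class TaskBoundariesFunction implements CostFunction {
  private final CostFunction delegate;
  private final int maxTaskCount;

  TaskBoundariesFunction(CostFunction delegate, int maxTaskCount) {
    this.delegate = delegate;
    this.maxTaskCount = maxTaskCount;
  }

  @Override
  public CostResult computeCost(CostMetrics metrics, int proposedTaskCount) {
    if (proposedTaskCount > maxTaskCount) {
      return new CostResult(Double.POSITIVE_INFINITY);
    }
    return delegate.computeCost(metrics, proposedTaskCount);
  }
}
```

In this arrangement the base function never references a plugin, and adding a new plugin means adding one class and one conditional in the builder.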

kfaraz · 2026-02-04T16:40:20Z

...druid/indexing/seekablestream/supervisor/autoscaler/plugins/BurstScaleUpOnHighLagPlugin.java

+   * 2. Small taskCount's get a massive relative boost, while large taskCount's receive more measured, stable increases.
+   * 3. Logarithmic lag response: diminishing returns at extreme lag values.
+   */
+  public int computeScaleUpBoost(


In the current code flow, this class doesn't really behave very much like a plugin. For the time being, we might as well just move this method to WeightedCostFunction since that class has to be aware of the plugin anyway.

Once we add the CostFunction interface and its implementations, we can move out the method.

Done: de7f15f

kfaraz

Left some non-blocking comments.

kfaraz · 2026-02-05T14:02:50Z

...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java

-          "Proposed task count: %d, Cost: %.4f (lag: %.4f, idle: %.4f)",
+      log.info(
+          "Proposed task count[%d] has total cost[%.4f] = lagCost[%.4f] + idleCost[%.4f]."
+          + " Stats: avgPartitionLag[%.1f], pollIdleRatio[%.1f], lagWeight[%.1f], idleWeight[%.1f]",


Please don't log this here. This line will be logged for every proposed task count.
We should log the stats only once.
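One way to address this, sketched under assumptions: log the candidate-independent stats once before evaluating the candidates, and log only the chosen result afterwards. The class, the method, the toy cost terms, and the list used as a log sink are all hypothetical and stand in for the real autoscaler and logger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: stats are logged once, not once per proposed task count. */
final class LogOnceSketch {
  /**
   * Returns the cheapest candidate task count. The candidates array must be non-empty.
   * Log lines are appended to logSink, which stands in for a real logger.
   */
  static int pickTaskCount(int[] candidates, double avgPartitionLag, double pollIdleRatio, List<String> logSink) {
    // These stats do not depend on the candidate, so they are logged a single time.
    logSink.add(String.format("Stats: avgPartitionLag[%.1f], pollIdleRatio[%.1f]", avgPartitionLag, pollIdleRatio));

    int best = candidates[0];
    double bestCost = Double.POSITIVE_INFINITY;
    for (int taskCount : candidates) {
      double lagCost = avgPartitionLag / taskCount; // toy lag cost, for illustration only
      double idleCost = pollIdleRatio * taskCount;  // toy idle cost, for illustration only
      double cost = lagCost + idleCost;
      if (cost < bestCost) {
        bestCost = cost;
        best = taskCount;
      }
    }

    // One summary line for the winner instead of one line per candidate.
    logSink.add(String.format("Chosen task count[%d] has total cost[%.4f]", best, bestCost));
    return best;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    int chosen = pickTaskCount(new int[]{1, 2, 4, 8}, 16.0, 1.0, log);
    System.out.println("chosen=" + chosen + ", log lines=" + log.size());
  }
}
```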

kfaraz · 2026-02-05T14:12:39Z

...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java

+   * while large taskCount's receive more measured, stable increases.
   */
-  static int computeExtraMaxPartitionsPerTaskIncrease(
+  static int computeExtraPPTIncrease(


Please simplify this method by returning the max allowed task count itself.

I'd leave this for a follow-up PR; I want to start testing this soon.

kfaraz · 2026-02-05T14:14:38Z

...java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java

+    int minPartitionsPerTask = partitionCount / taskCountMax;
+    int maxPartitionsPerTask = partitionCount / taskCountMin;


Should we clamp these to the range [1, partitionCount]? Otherwise, they may fall outside those bounds.
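A minimal sketch of the suggested clamping, assuming partitionCount is at least 1. The class and helper names are hypothetical, and treating a non-positive task count as 1 before dividing is an extra assumption added here to avoid division by zero.

```java
/** Hypothetical helpers illustrating the clamp; not taken from the PR. */
final class PartitionsPerTaskBoundsSketch {
  static int clamp(int value, int min, int max) {
    return Math.max(min, Math.min(max, value));
  }

  /** Integer division yields 0 when taskCountMax exceeds partitionCount, so clamp up to 1. */
  static int minPartitionsPerTask(int partitionCount, int taskCountMax) {
    // Assumption: a non-positive taskCountMax is treated as 1 to avoid dividing by zero.
    int raw = partitionCount / Math.max(1, taskCountMax);
    return clamp(raw, 1, partitionCount);
  }

  /** Never more than partitionCount partitions on a single task. */
  static int maxPartitionsPerTask(int partitionCount, int taskCountMin) {
    // Assumption: a non-positive taskCountMin is treated as 1 to avoid dividing by zero.
    int raw = partitionCount / Math.max(1, taskCountMin);
    return clamp(raw, 1, partitionCount);
  }

  public static void main(String[] args) {
    // 48 partitions with taskCountMax = 100: the raw quotient is 0, the clamped value is 1.
    System.out.println(minPartitionsPerTask(48, 100));
    System.out.println(maxPartitionsPerTask(48, 1));
  }
}
```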

kfaraz · 2026-02-05T14:18:32Z

...rg/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScalerConfig.java

+   */
+  private final int highLagThreshold;
+  /**
+   * Represents the minimum duration between successful scale actions.


Please move these javadocs to the getters instead. That way, it would be easier for callers to look up the javadocs.

Fly-Style added 2 commits January 30, 2026 20:27

Adjust costs for burst scaleup during heavy lag for cost-based autosc…

ba5e39e

…aler

Checkstyle

8c77632

Fly-Style changed the title ~~Introduce additional temporary config params to tweak high lag handling~~ Introduce temporary config params to tweak high lag handling Feb 2, 2026

github-actions bot added the Area - Ingestion label Feb 2, 2026

Fly-Style marked this pull request as draft February 2, 2026 16:16

Introduce additional temporary config params to tweak high lag handling

6965594

Fly-Style force-pushed the cba-cost-adjustments branch from 4061d5a to 6965594 Compare February 3, 2026 19:02

github-actions bot added the Area - Documentation label Feb 3, 2026

Fly-Style added 3 commits February 3, 2026 21:04

Merge branch 'master' into cba-cost-adjustments

1706855

Checkstyle

0948f94

Self-review

116e984

kfaraz reviewed Feb 4, 2026

Refactor CostBasedAutoScaler: add plugin system to pure cost fucntion

16d4d78

Fly-Style force-pushed the cba-cost-adjustments branch from 36c8fc4 to 16d4d78 Compare February 4, 2026 13:27

Fly-Style requested a review from kfaraz February 4, 2026 13:27

Update indexing-service/src/main/java/org/apache/druid/indexing/seeka…

fe1b605

…blestream/supervisor/autoscaler/CostBasedAutoScaler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Fly-Style marked this pull request as ready for review February 4, 2026 13:29

Clarify minScaleDownDelay

504966f

kfaraz changed the title ~~Introduce temporary config params to tweak high lag handling~~ Add optional plugins to basic cost function in CostBasedAutoScaler Feb 4, 2026

kfaraz reviewed Feb 4, 2026

Fly-Style added 3 commits February 4, 2026 17:14

Merge branch 'cba-cost-adjustments' of github.com:Fly-Style/druid int…

6c71a88

…o cba-cost-adjustments

Move BurstScaleUpOnHighLagPlugin on logarithm base instead of sqrt

75b15d6

Get rid of useBurstScaleOnHeavyLag flag

e3b2733

Fly-Style requested a review from kfaraz February 4, 2026 15:31

kfaraz reviewed Feb 4, 2026

Fly-Style and others added 2 commits February 5, 2026 10:58

Align idle decay with task PPT boundary

de7f15f

Update docs/ingestion/supervisor.md

0272180

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Fly-Style requested a review from kfaraz February 5, 2026 09:00

Update docs

aeb7bec

kfaraz approved these changes Feb 5, 2026

Final review cleanup

83a0f23

kfaraz merged commit 67abdc2 into apache:master Feb 5, 2026
37 checks passed
