
25.8.15 Stable Backport of #90059: Unblock ttl part drops for cold volumes #1363

Open
zvonand wants to merge 1 commit into releases/25.8.15 from backports/25.8.15/90059

zvonand (Collaborator) commented Feb 3, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Split part ranges by volume characteristics to enable TTL drop merges for cold volumes. After this patch, parts with a max TTL < now will be removed from cold storage. The algorithm will schedule only single part drops. (ClickHouse#90059 by @Michicosun)
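
As a rough illustration of the selection rule in the changelog entry, the Python sketch below picks parts on a cold volume whose maximum TTL has already passed and schedules each one as its own drop. The `Part` type, its fields, and `select_ttl_drops` are hypothetical names introduced here. The actual ClickHouse change is in C++ and is not reproduced, and treating `max_ttl` as a Unix timestamp is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    # Hypothetical model of a data part; not ClickHouse's real structure.
    name: str
    volume: str   # assumed label such as "hot" or "cold"
    max_ttl: int  # assumed: latest TTL expiry in the part, as Unix seconds


def select_ttl_drops(parts, now, cold_volume="cold"):
    """Return one single-part drop per fully expired part on the cold volume."""
    drops = []
    for part in parts:
        # A part qualifies only when its max TTL is strictly before `now`.
        if part.volume == cold_volume and part.max_ttl < now:
            # Each drop holds exactly one part; parts are never grouped.
            drops.append([part.name])
    return drops


parts = [
    Part("all_1_1_0", "hot", 100),
    Part("all_2_2_0", "cold", 100),
    Part("all_3_3_0", "cold", 500),
]
print(select_ttl_drops(parts, now=200))  # prints [['all_2_2_0']]
```

Only the expired cold part is selected. The hot part is skipped because this sketch looks at the cold volume alone, and the third part is skipped because its max TTL is still in the future.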

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

…merges_in_cold_volumes

Unblock ttl part drops for cold volumes
github-actions bot commented Feb 3, 2026

Workflow [PR], commit [016b43a]

DimensionWieldr (Collaborator) commented Feb 4, 2026

CI Failure Analysis

1. Grype Scan (alpine variant)

Failure: Grype Scan altinityinfra/clickhouse-server:1363-25.8.15.10001.altinitytest-alpine - "An error occurred"

Evidence: The regular clickhouse-server Grype scan passed. Only the alpine variant failed with a generic error.

Verdict: Transient infrastructure error. Unrelated to PR changes.


2. Regression Tests (aggregate_functions_1) - Pre-existing Failure

Failure: /aggregate functions/part 1/uniqTheta (both aarch64 and release)

Evidence these are pre-existing on base branch (releases/25.8.15):

  • MasterCI #21629577076 (Feb 03): AggregateFunctions (1) = failure
  • MasterCI #21579463803 (Feb 02): AggregateFunctions (1) = failure

Root cause: uniqTheta was updated in upstream. Test will be updated separately by QA team.

Verdict: Pre-existing failure on base branch. Unrelated to PR changes.


3. Regression Tests (S3 suites) - Pre-existing Failures

Failures:

  • s3_aws_s3_2 (aarch64 and release)
  • s3_azure_2 (aarch64 and release)
  • s3_minio_2 (aarch64 and release)

Evidence these are pre-existing on base branch:

Root cause: Zero copy replication tests failing - tracked in Altinity/clickhouse-regression#93

Verdict: Pre-existing failures on base branch. Unrelated to PR changes.


4. Regression Tests (alter_attach_1, kerberos, parquet_minio) - Infrastructure Failures

Failures:

  • Regression release alter_attach_1 - 17 steps (14 ok, 3 failed)
  • Regression release kerberos - 14 steps (11 ok, 3 failed)
  • Regression release parquet_minio - 14 steps (11 ok, 3 failed)

Root cause: All three failed with identical error: could not bring up docker-compose cluster

Evidence: These tests passed on base branch MasterCI #21629577076, indicating this is a transient Docker infrastructure issue rather than a test regression.

Verdict: CI infrastructure failure (Docker). Unrelated to PR changes.
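
The baseline comparison used throughout this analysis can be sketched roughly as follows. The function name, the status strings, and the shape of `base_results` are assumptions for illustration, not part of any CI tooling. The sample data only mirrors statements made in this comment (AggregateFunctions (1) fails on the base branch, alter_attach_1 passed there, TSAN test jobs are not run there).

```python
def classify_failure(job, base_results):
    """Classify a job that failed on the PR by looking at the base branch.

    base_results maps a job name to "success" or "failure"; a job missing
    from the mapping was not run on the base branch.
    """
    status = base_results.get(job)
    if status is None:
        return "no baseline"
    if status == "failure":
        return "pre-existing"
    # Passed on base, failed on PR: needs a closer look (transient or regression).
    return "new on PR"


# Illustrative data only, echoing the evidence cited in this analysis.
base = {"aggregate_functions_1": "failure", "alter_attach_1": "success"}
for job in ["aggregate_functions_1", "alter_attach_1", "amd_tsan stateless"]:
    print(job, "->", classify_failure(job, base))
```

A "new on PR" result does not by itself mean a regression; here the identical docker-compose error across three unrelated suites is what points to infrastructure.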


5. Integration Tests (amd_asan, old analyzer, 3/6) - Infrastructure Timeout

Failure: Job timed out at 1h45m with no test results

Root cause: GitHub Actions runner timeout before tests could complete.

Verdict: CI infrastructure timeout. Unrelated to PR changes.


6. TSAN Test Failures - Known Flaky Pattern

Failures:

  • Integration tests (amd_tsan, 6/6) - fail: 1, passed: 881
  • Stateless tests (amd_tsan, s3 storage, parallel) - Failed: 1, Passed: 7519
  • Stateless tests (amd_tsan, s3 storage, sequential, 2/2) - Failed: 1, Passed: 306

Evidence this is a known flaky pattern:

  • PR #1332 (merged) had identical Stateless tests (amd_tsan, s3 storage, parallel) failure: Failed: 1, Passed: 7529
  • Base branch MasterCI does not run TSAN stateless/integration tests (only build), so no baseline exists
  • Single test failures among thousands of passing tests indicate flaky behavior, not systematic regression

Verdict: Known flaky TSAN tests. Unrelated to PR changes.


Summary

| Test Category | Failure Type | Related to PR? |
|---|---|---|
| Grype Scan (alpine) | Infrastructure error | No |
| Regression aggregate_functions_1 (2 jobs) | Pre-existing (uniqTheta upstream change) | No |
| Regression S3 (6 jobs) | Pre-existing (zero copy replication) | No |
| Regression alter/kerberos/parquet (3 jobs) | Docker infrastructure failure | No |
| Integration tests amd_asan 3/6 | Infrastructure timeout | No |
| TSAN tests (3 jobs) | Known flaky pattern | No |

Verdict: All failures are unrelated to the PR changes.
