Feat: tree listing denormalization #1562

MarceloRobert · 2025-10-08T20:30:49Z

Changes

Adds a new table with the direct TreeListing data, without the need for processing
Adds a new view to consume that data
Updates ingester to insert the data into PendingBuilds
Updates the process_pending_aggregations command to also process the data for treeListing
- While at it, also refactors a couple of functions in order to be reused between hardware and treeListing denormalization
- Also renames the ProcessedHardwareStatus to be more generic and also used by the TreeListing

Benefits

Testing with a direct call to the old and new tree listing endpoints, the average of 10 calls was:

| Endpoint | Average time |
| --- | --- |
| old | 313.2ms |
| new | 23.1ms |

This represents a 13.56x improvement in performance with a small amount of data. Since the query for the new listing is a single SELECT statement, its response time should grow more slowly than the old query's as more data is added, so the ratio should get even better. The tests were made locally with 50000 real-world submissions.
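The measurement above can be reproduced with a small timing harness. This is a minimal sketch, not the script used for the numbers in this PR: `average_ms` and `speedup` are hypothetical helpers, and the callable passed to `average_ms` stands for whatever issues the request to each endpoint.

```python
import time
from typing import Callable


def average_ms(call: Callable[[], object], runs: int = 10) -> float:
    """Average wall-clock time of `call` over `runs` invocations, in ms."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        call()
        total += time.perf_counter() - start
    return total / runs * 1000.0


def speedup(old_ms: float, new_ms: float) -> float:
    """How many times faster the new endpoint is than the old one."""
    return old_ms / new_ms


# Averages reported in this PR.
ratio = speedup(313.2, 23.1)
print(f"{ratio:.2f}x")
```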

How it works

When a new checkout is received, its tree is added to treeListing with fresh numbers. If that tree already exists and the checkout is newer, the tree is updated and all counts reset to 0; if the checkout is older, nothing happens. The counts stay at 0 until the pending tests and builds are processed. Showing 0 counts is useful for spotting a checkout that worked but whose builds/tests weren't sent for some reason; such a problem should be seen and addressed as soon as possible.
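The checkout rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the ingester code: `Checkout`, `TreeRow` and `register_checkout` are hypothetical names, `start_time` is assumed to be the field that decides which checkout is newer, the `build_*` counters are assumed by analogy with the `test_*` columns, and a checkout of the same age is treated like an older one.

```python
from dataclasses import dataclass, field
from datetime import datetime

# (origin, tree_name, git_repository_url, git_repository_branch)
TreeKey = tuple[str, str, str, str]


def zero_counts() -> dict[str, int]:
    # Counter names are assumptions modeled on test_pass/test_failed/test_inc.
    return {"build_pass": 0, "build_failed": 0, "build_inc": 0,
            "test_pass": 0, "test_failed": 0, "test_inc": 0}


@dataclass
class Checkout:
    id: str
    key: TreeKey
    git_commit_hash: str
    start_time: datetime  # assumption: used to decide which checkout is newer


@dataclass
class TreeRow:
    checkout_id: str
    git_commit_hash: str
    start_time: datetime
    counts: dict[str, int] = field(default_factory=zero_counts)


def register_checkout(listing: dict[TreeKey, TreeRow], checkout: Checkout) -> bool:
    """Apply one incoming checkout; return True if the listing changed."""
    current = listing.get(checkout.key)
    if current is not None and checkout.start_time <= current.start_time:
        # Older (or same-age) checkout: keep the listed one untouched.
        return False
    # New tree, or newer checkout of a known tree: counts start again at 0.
    listing[checkout.key] = TreeRow(
        checkout.id, checkout.git_commit_hash, checkout.start_time
    )
    return True
```

Replacing the whole row on a newer checkout is what resets the counts, which matches the "show 0 until pending items are processed" behaviour.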

The processing of new builds follows what has been done for the hardwareListing. When a new build is received, it is added to the PendingBuilds table. When the process_pending_aggregations command is executed, it counts the pending builds' statuses into a single treeListing item, and that item then updates the treeListing table only for checkouts that already exist there (so that builds of older checkouts don't update the counts of newer checkouts for a tree).
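A rough sketch of that two-step flow, using plain dicts instead of the Django models. The function names, the `(checkout_id, status)` pair shape and the status strings are assumptions made for the example; the point is only that deltas for checkouts that are no longer listed are dropped.

```python
from collections import Counter, defaultdict


def aggregate_pending_builds(pending: list[tuple[str, str]]) -> dict[str, Counter]:
    """Group (checkout_id, status) pairs into one Counter per checkout."""
    deltas: dict[str, Counter] = defaultdict(Counter)
    for checkout_id, status in pending:
        deltas[checkout_id][status] += 1
    return dict(deltas)


def apply_deltas(listing: dict[str, dict], deltas: dict[str, Counter]) -> int:
    """Add deltas to rows whose listed checkout matches; return rows updated."""
    listed = {row["checkout_id"]: row for row in listing.values()}
    updated = 0
    for checkout_id, counts in deltas.items():
        row = listed.get(checkout_id)
        if row is None:
            # Build of a checkout that is not (or no longer) the listed one.
            continue
        row["counts"].update(counts)  # Counter.update adds to existing counts
        updated += 1
    return updated
```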

How to test

Run the migrations for the new table
Start the ingester
- Create a temporary folder for the submissions as backend/kernelCI_app/management/commands/tmp_submissions (you can change the path, just update the command below)
- Make sure you have your trees-name.yaml file somewhere. It is usually in /backend/volume_data as a result of the treeproof command. If you are feeling lazy and you don't want to run the command, there's an example file here (Google Drive link)
- Make sure you have a couple of example submission files for testing. You can grab some in this zip file here (Google Drive link)
- Export the environment variables in your terminal, pointing one of the databases to the local one. Make sure the local one is set as the default one with USE_DASHBOARD_DB=True
- Now you'll be able to run the command with poetry run python3 ./manage.py monitor_submissions --spool-dir kernelCI_app/management/commands/tmp_submissions --trees-file volume_data/trees-name.yaml (being in the /backend directory, update the paths if necessary)
Insert submissions in the folder and check that they are being inserted in the database
You can also test the ingester on docker: make sure the environment variables are in .env.backend and use docker compose run --rm backend before running the command shown above.

Closes #1558

WilsonNet · 2025-10-13T13:59:54Z

backend/kernelCI_app/views/treeListingView.py

+            git_repository_branch,
+            git_commit_hash,
+            git_commit_name,
+            git_commit_tags,


Shouldn't this be more 1:1 with the frontend? Like, only one commit tag instead of an array.

I'm basing this on the treeView response, which returns the tags as an array (btw a list[list[str]], which I think is a mistake); I think the frontend is the one that selects only the first one.
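For illustration, picking a single tag out of the nested shape mentioned in the reply could look like the sketch below. `first_commit_tag` is a hypothetical helper, and the `list[list[str]]` shape is taken from the comment above rather than from the actual response schema.

```python
from typing import Optional


def first_commit_tag(git_commit_tags: Optional[list[list[str]]]) -> Optional[str]:
    """Return the first tag found in a nested tag list, or None."""
    for group in git_commit_tags or []:
        for tag in group:
            return tag
    return None
```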

barbieri · 2025-12-19T00:03:39Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

-    builds_by_id: dict[str, Builds],
+    test_builds_by_id: dict[str, Builds],


why are you doing this rename?

To be easier to differentiate since these are only builds related to the tests being passed. In the case of treeDetails, we use general builds, not just ones with tests

ok, thanks (and worth writing in the function docs since you're already enhancing it)

barbieri · 2025-12-19T00:13:46Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

+                    test_pass, test_failed, test_inc
                )
+                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
+                ON CONFLICT (origin, tree_name, git_repository_url, git_repository_branch) DO UPDATE SET


what about git_commit_hash, git_commit_name, git_commit_tags? keep the original? update?

it would be nice to comment about those we left out.

@MarceloRobert i think we also need to update the other git_* values, no?

true, we just don't need to update the fields used in the ON CONFLICT clause, but I missed those other ones

@gustavobtflores in such case, shouldn't the query for the hardware status also update the fields other than just the counts? check it in the current _process_hardware_status method

Actually, these fields don't need to be updated even in the tree query because the treeListing table is supposed to only contain the latest checkouts, so this query for updating the counts will only update the counts of existing checkouts, meaning that the values for commit_hash, commit_tags and others won't change. Same for the hardware query
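The point about untouched columns can be checked with a small upsert. This is a sketch using the standard-library `sqlite3` module as a stand-in (it needs SQLite 3.24 or newer for `ON CONFLICT ... DO UPDATE`); the table is a reduced, hypothetical version of the real one, and adding the incoming counts to the stored ones is an assumption about how the real query combines them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tree_listing (
        origin TEXT, tree_name TEXT,
        git_repository_url TEXT, git_repository_branch TEXT,
        git_commit_hash TEXT,
        test_pass INTEGER, test_failed INTEGER, test_inc INTEGER,
        UNIQUE (origin, tree_name, git_repository_url, git_repository_branch)
    )
    """
)

# Only the counters appear in DO UPDATE SET; git_commit_hash is left out.
UPSERT = """
    INSERT INTO tree_listing (
        origin, tree_name, git_repository_url, git_repository_branch,
        git_commit_hash, test_pass, test_failed, test_inc
    )
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    ON CONFLICT (origin, tree_name, git_repository_url, git_repository_branch)
    DO UPDATE SET
        test_pass = test_pass + excluded.test_pass,
        test_failed = test_failed + excluded.test_failed,
        test_inc = test_inc + excluded.test_inc
"""

key = ("maestro", "mainline", "https://example.org/linux.git", "master")
conn.execute(UPSERT, (*key, "abc123", 0, 0, 0))  # checkout row with zero counts
conn.execute(UPSERT, (*key, "abc123", 3, 1, 0))  # counts for that same checkout

row = conn.execute(
    "SELECT git_commit_hash, test_pass, test_failed, test_inc FROM tree_listing"
).fetchone()
```

Columns that are not listed in `DO UPDATE SET` keep the value of the existing row, which is why the count-only update leaves the `git_*` fields alone.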

barbieri · 2025-12-19T00:15:04Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

-                    ready_tests,
-                    ready_builds,
+        t0 = time.time()
+        with connections["default"].cursor() as cursor:


why change from connection to connections['default'], it's inconsistent across this file :-(

because django says that connection is for backwards compatibility, and to prefer using connections['default']

reverted to connection in order to be the same as the other one used in the file

barbieri · 2025-12-19T00:20:07Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

+            ready_builds.append(pending_build)
+            build_checkouts_by_id[pending_build.build_id] = checkout
+
+        if not ready_builds:


is this clause correct? if we do have ready_builds (the missing "else" clause) then we return None? it will crash the caller 👀 (since it will unpack the tuple, which is not a tuple but None)
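The pitfall raised here is general Python behaviour: a function that returns a tuple on one path and falls off the end on another returns None, and unpacking None raises TypeError in the caller. A minimal sketch of a shape that avoids it, assuming a simplified stand-in for `_get_ready_builds` that works on plain dicts; the dict keys and the `checkouts_by_id` argument are hypothetical.

```python
from typing import Optional


def get_ready_builds(
    pending: list[dict], checkouts_by_id: dict[str, dict]
) -> tuple[list[dict], dict[str, dict], Optional[str], int]:
    ready_builds: list[dict] = []
    build_checkouts_by_id: dict[str, dict] = {}
    last_processed_build_id: Optional[str] = None
    skipped_no_checkout = 0

    for pending_build in pending:
        last_processed_build_id = pending_build["build_id"]
        checkout = checkouts_by_id.get(pending_build["checkout_id"])
        if checkout is None:
            skipped_no_checkout += 1
            continue
        ready_builds.append(pending_build)
        build_checkouts_by_id[pending_build["build_id"]] = checkout

    # The same 4-tuple is returned on every path, including the empty one,
    # so the caller can always unpack the result.
    return (
        ready_builds,
        build_checkouts_by_id,
        last_processed_build_id,
        skipped_no_checkout,
    )
```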

barbieri · 2025-12-19T00:24:49Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

+                hardware_status_data, new_processed_entries_hardware = (
+                    aggregate_hardware_status(ready_tests, test_builds_by_id)
+                )
+                self._process_hardware_status(
+                    hardware_status_data,
+                    new_processed_entries_hardware,
+                )
+                self._process_new_processed_entries(new_processed_entries_hardware)


move these into another function, so whenever it returns hardware_status_data, new_processed_entries_hardware will be garbage collected.

Same for the block below with tree_listing_data, new_processed_entries_tree... otherwise all of these objects will be kept in memory (may cause memory peak)

otherwise all of these objects will be kept in memory (may cause memory peak)

wouldn't they be overwritten in the next loop?

yes, but only on the next loop iteration. Keeping objects alive when they are no longer needed may cause the memory peak; if you reduce the scope and GC runs once the smaller function returns, the objects related to processing hardware and then the tree listing will be gone sooner
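The scoping argument can be observed with `weakref`. This is a toy sketch: `Batch` is a hypothetical stand-in for the aggregation results, and `gc.collect()` is called only so the result does not depend on CPython's reference counting.

```python
import gc
import weakref


class Batch:
    """Stand-in for a large aggregation result."""

    def __init__(self, size: int) -> None:
        self.rows = list(range(size))


def process_in_helper(probes: list) -> int:
    batch = Batch(10_000)
    probes.append(weakref.ref(batch))
    return len(batch.rows)  # `batch` goes out of scope when this returns


def process_inline(probes: list) -> bool:
    batch = Batch(10_000)
    probes.append(weakref.ref(batch))
    _ = len(batch.rows)
    # More work would happen here while `batch` is still a live local.
    gc.collect()
    return probes[-1]() is not None


helper_probes: list = []
process_in_helper(helper_probes)
gc.collect()
helper_batch_alive = helper_probes[0]() is not None

inline_probes: list = []
inline_batch_alive = process_inline(inline_probes)
```

After the helper returns, its batch is unreachable and can be freed; in the inline version the batch stays reachable for as long as the enclosing function keeps running or until the name is rebound.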

barbieri · 2025-12-19T00:31:01Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

+                (
+                    ready_builds,
+                    build_checkouts_by_id,
+                    last_processed_build_id,
+                    skipped_no_checkout,
+                ) = self._get_ready_builds(
+                    last_processed_build_id=last_processed_build_id,
+                    batch_size=batch_size,


since this is only used by _aggregate_tree_listing, whenever you reorganize into smaller functions, move this block + _delete_ready_builds inside it...

that is:

here in this function you call _get_ready_tests() since it's shared.

call self._process_hardware_status(ready_tests, test_builds_by_id), internally it will call aggregate_hardware_status() and self._process_new_processed_entries(new_processed_entries_hardware)

call self._process_tree_listing(ready_tests, test_builds_by_id), internally it will call _get_ready_builds(), aggregate_tree_listing(), self._process_new_processed_entries(new_processed_entries_tree) and then self._delete_ready_builds(ready_builds).

this function cleans up self._delete_ready_tests(ready_tests)
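The layout proposed above can be sketched as a skeleton. Everything here is a stub: the method names come from the comment, but the bodies, the `calls` list and the `run_batch` name are placeholders added for illustration, and the real signatures may differ.

```python
class PendingAggregationsSketch:
    """Call-order skeleton only; every body is a placeholder."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def run_batch(self) -> None:
        # Shared between hardware and tree listing processing.
        ready_tests, test_builds_by_id = self._get_ready_tests()
        self._process_hardware_status(ready_tests, test_builds_by_id)
        self._process_tree_listing(ready_tests, test_builds_by_id)
        self._delete_ready_tests(ready_tests)

    def _get_ready_tests(self) -> tuple[list, dict]:
        self.calls.append("_get_ready_tests")
        return [], {}

    def _process_hardware_status(self, ready_tests: list, test_builds_by_id: dict) -> None:
        self.calls.append("aggregate_hardware_status")
        self._process_new_processed_entries("hardware")

    def _process_tree_listing(self, ready_tests: list, test_builds_by_id: dict) -> None:
        ready_builds = self._get_ready_builds()
        self.calls.append("aggregate_tree_listing")
        self._process_new_processed_entries("tree")
        self._delete_ready_builds(ready_builds)

    def _get_ready_builds(self) -> list:
        self.calls.append("_get_ready_builds")
        return []

    def _process_new_processed_entries(self, kind: str) -> None:
        self.calls.append(f"_process_new_processed_entries:{kind}")

    def _delete_ready_builds(self, ready_builds: list) -> None:
        self.calls.append("_delete_ready_builds")

    def _delete_ready_tests(self, ready_tests: list) -> None:
        self.calls.append("_delete_ready_tests")
```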

Since _get_ready_builds uses last_processed_build_id and skipped_no_checkout, which are variables declared/used within this loop, I think that they could stay inside the loop as well. The similarity between processing the tests for hardware and builds for trees also makes it easier to group the functions and understand them IMO; I made some small changes as you proposed, not much.

barbieri · 2025-12-19T00:31:36Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

+            ready_tests.append(pending_test)
+            test_builds_by_id[build.id] = build
+
+        if not ready_tests:


ditto, is this right?

gustavobtflores · 2026-01-05T18:02:01Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

+                    test_pass, test_failed, test_inc
                )
+                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
+                ON CONFLICT (origin, tree_name, git_repository_url, git_repository_branch) DO UPDATE SET


@MarceloRobert i think we also need to update the other git_* values, no?

gustavobtflores · 2026-01-14T15:24:42Z

backend/kernelCI_app/models.py

+    # If we already processed an item, but the previous status is null and the new one is not-null,
+    # we need to process it again. That's why we store the status here.
+    status = models.CharField(
+        max_length=1, choices=SimplifiedStatusChoices.choices, null=True
+    )


we aren't using it yet, right? maybe remove this status from the table and come back to it when we decide to really solve the issue with the null -> non-null status thing

gustavobtflores · 2026-01-14T15:27:11Z

backend/kernelCI_app/management/commands/process_pending_aggregations.py

    if to_process in existing_processed:
        return False


i think this is the culprit for why the counts in the tree listing are incorrect. we could check whether the existing_processed entry has a different status and then apply a -1 or +1 to the tree that was affected by the status change
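The -1/+1 idea could be sketched as below. This is not code from the PR: `status_change_delta` and `apply_delta` are hypothetical helpers, the single-letter statuses are made up for the example, and a null status is assumed not to be counted at all.

```python
from collections import Counter
from typing import Optional


def status_change_delta(previous: Optional[str], new: Optional[str]) -> Counter:
    """Counter adjustments needed when an already processed item changes status."""
    delta: Counter = Counter()
    if previous == new:
        return delta  # nothing to correct
    if previous is not None:
        delta[previous] -= 1  # undo the old contribution
    if new is not None:
        delta[new] += 1  # count the new status
    return delta


def apply_delta(counts: Counter, delta: Counter) -> None:
    # A plain loop is used because Counter's `+` operator drops
    # non-positive results, which would lose the -1 adjustments.
    for status, change in delta.items():
        counts[status] += change
```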

MarceloRobert self-assigned this Oct 8, 2025

MarceloRobert force-pushed the feat/new-tree-view branch from a07b7e7 to de536f8 Compare October 9, 2025 19:37

MarceloRobert changed the title ~~Feat: new tree view~~ WIP: Feat: new tree view Oct 9, 2025

MarceloRobert force-pushed the feat/new-tree-view branch from de536f8 to dc4f725 Compare October 10, 2025 19:52

WilsonNet reviewed Oct 13, 2025

backend/kernelCI_app/views/treeListingView.py (outdated, resolved)

MarceloRobert force-pushed the feat/new-tree-view branch from bc8ee59 to 2fa5dc4 Compare December 9, 2025 19:00

MarceloRobert force-pushed the feat/new-tree-view branch 3 times, most recently from bf77a51 to c1214ef Compare December 17, 2025 18:27

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/helpers/trees.py (resolved)

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/helpers/trees.py (outdated, resolved)

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/management/commands/helpers/denormal.py (outdated, resolved)

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/management/commands/helpers/denormal.py (outdated, resolved)

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/models.py (resolved)

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/models.py (outdated, resolved)

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/management/commands/helpers/aggregation_helpers.py (outdated, resolved)

barbieri reviewed Dec 17, 2025

backend/kernelCI_app/models.py (outdated, resolved)

MarceloRobert force-pushed the feat/new-tree-view branch 3 times, most recently from 9a1d7a5 to 28cd6a5 Compare December 18, 2025 19:51

MarceloRobert changed the title ~~WIP: Feat: new tree view~~ Feat: tree listing denormalization Dec 18, 2025

MarceloRobert added the Backend and Database labels Dec 18, 2025

MarceloRobert marked this pull request as ready for review December 18, 2025 19:57

MarceloRobert commented Dec 18, 2025

backend/kernelCI_app/management/commands/helpers/aggregation_helpers.py (outdated, resolved)

MarceloRobert commented Dec 18, 2025

backend/kernelCI_app/management/commands/helpers/aggregation_helpers.py (resolved)

MarceloRobert force-pushed the feat/new-tree-view branch from 28cd6a5 to b4b8876 Compare December 18, 2025 20:17

gustavobtflores reviewed Dec 18, 2025

backend/kernelCI_app/helpers/trees.py (outdated, resolved)

backend/kernelCI_app/helpers/trees.py (outdated, resolved)

gustavobtflores reviewed Dec 18, 2025

backend/kernelCI_app/queries/tree.py (outdated, resolved)

barbieri reviewed Dec 19, 2025

backend/kernelCI_app/management/commands/process_pending_aggregations.py (outdated, resolved)

barbieri reviewed Dec 19, 2025

backend/kernelCI_app/management/commands/process_pending_aggregations.py (outdated, resolved)

barbieri reviewed Dec 19, 2025

MarceloRobert force-pushed the feat/new-tree-view branch 3 times, most recently from 0cf02d7 to b57387a Compare December 19, 2025 15:07

gustavobtflores reviewed Jan 5, 2026

MarceloRobert force-pushed the feat/new-tree-view branch 6 times, most recently from 53b4a75 to 3e4a505 Compare January 8, 2026 16:42

refactor: rename ProcessedHardwareStatus to be more generic

edb875a

Part of the changes for the treeListing denormalization

MarceloRobert force-pushed the feat/new-tree-view branch 3 times, most recently from 5be6506 to bfcdbf4 Compare January 13, 2026 18:11

gustavobtflores reviewed Jan 14, 2026

MarceloRobert added 2 commits January 14, 2026 16:10

feat: add treeListing denormalization logic

681eb50

feat: add treeListingV2 view

cd3d747

Closes #1558

MarceloRobert force-pushed the feat/new-tree-view branch from bfcdbf4 to 465a602 Compare January 14, 2026 19:12

MarceloRobert added 2 commits January 14, 2026 16:48

fix: check null status before inserting pending items

d98d094

In case some submissions are sent twice, one might have null status and the next one might have the correct status. Since the usual table coalesces the values, so should do the new one.

fix: treeListing tests without platform

366480f

The hardware listing only considers tests that have a platform defined, but for the treeListing we should also allow tests that have no platform to go through the PendingTest table

MarceloRobert force-pushed the feat/new-tree-view branch from 465a602 to 366480f Compare January 14, 2026 19:48
