⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

@railway-app
Copy link

railway-app bot commented Jan 21, 2026

🚅 Deployed to the rivet-pr-3989 environment in rivet-frontend

Service Status Web Updated (UTC)
website ❌ Build Failed (View Logs) Web Jan 24, 2026 at 11:37 pm
frontend-inspector ❌ Build Failed (View Logs) Web Jan 24, 2026 at 11:37 pm
frontend-cloud ❌ Build Failed (View Logs) Web Jan 24, 2026 at 11:37 pm

Copy link
Member Author

NathanFlurry commented Jan 21, 2026


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Jan 21, 2026

PR Review: Actor Messages & Queues

This PR introduces a comprehensive queue messaging system for Rivet Actors. Overall, the implementation is solid with good test coverage and appropriate error handling. Here are my findings:

✅ Strengths

  1. Comprehensive Test Coverage: The test suite in actor-queue.ts covers key scenarios including client-to-actor messaging, self-messaging, array-based queue names, timeouts, abort handling, and limit enforcement.

  2. Proper Error Handling: Well-defined error types for queue-specific issues:

    • QueueFull - when queue size limit is reached
    • QueueMessageTooLarge - message size validation
    • QueueMessageInvalid - type validation
    • ActorAborted - proper cleanup on abort
  3. Robust Metadata Management: The #rebuildMetadata() method provides recovery when metadata becomes corrupted, showing good defensive programming.

  4. Proper Resource Cleanup: Queue waiters properly clean up timeout handles and abort listeners to prevent memory leaks.

  5. Configurable Limits: Queue size and message size limits are configurable through actor options.

  6. Type Safety: Good use of TypeScript types and overloads in ActorQueue.next().


🔍 Issues & Recommendations

Critical Issues

1. Race Condition in Queue Waiter Resolution (queue-manager.ts:300-325)

The #maybeResolveWaiters() method iterates through all waiters and drains messages for each one sequentially. If multiple waiters are waiting for the same queue name, the first waiter will drain all available messages, leaving subsequent waiters with nothing even if messages arrive.

// Current code - problematic
for (const waiter of pending) {
    const messages = await this.#drainMessages(
        waiter.nameSet,
        waiter.count,
    );
    if (messages.length === 0) {
        continue; // Next waiter gets nothing even if it wants the same queue
    }
    // ...
}

Recommendation: Implement fair distribution or FIFO ordering for waiters. Consider:

  • Processing waiters in the order they were added
  • Distributing messages fairly among waiters wanting the same queue
  • Or explicitly documenting this "first-come-first-served" behavior

2. Potential Message Loss on Metadata Update Failure (queue-manager.ts:288-298)

Messages are deleted first, then metadata is updated in a separate operation. If the metadata update fails, the messages are lost but the size counter remains incorrect.

await this.#driver.kvBatchDelete(this.#actor.id, keys);
await this.#driver.kvBatchPut(this.#actor.id, [
    [KEYS.QUEUE_METADATA, this.#serializeMetadata()],
]); // If this fails, messages are lost but size is wrong

Recommendation: Either:

  • Update metadata first, then delete messages (over-counting is safer than under-counting)
  • Use a transaction if the driver supports it
  • Add error recovery in the catch block to rescan and rebuild metadata

3. Missing Abort Signal Check After Async Operations (queue-manager.ts:311-315)

Between the async #drainMessages call and resolving the waiter, the actor could be aborted but we don't check for it.

const messages = await this.#drainMessages(waiter.nameSet, waiter.count);
// Actor could have been aborted during drainMessages
if (messages.length === 0) {
    continue;
}
waiter.resolve(messages); // Resolving even though actor is aborted

Recommendation: Add abort signal check after #drainMessages returns.


Medium Priority Issues

4. Inefficient Queue Scanning (queue-manager.ts:246-279)

#loadQueueMessages() is called frequently and always loads ALL messages from storage, decodes them, and sorts them. For large queues this is inefficient.

Recommendation: Consider:

  • Caching decoded messages in memory
  • Only reloading when necessary (after enqueue/dequeue)
  • Using a priority queue data structure

5. Size Mismatch Silently Corrected (queue-manager.ts:274-277)

When the actual message count doesn't match metadata size, it's silently corrected without logging.

if (this.#metadata.size \!== decoded.length) {
    this.#metadata.size = decoded.length;
    this.#actor.inspector.updateQueueSize(this.#metadata.size);
}

Recommendation: Log this event at warn/error level since it indicates data corruption or a bug.

6. Missing Input Validation (queue-manager.ts:156-161)

The receive method doesn't validate that names array is non-empty or that count is positive before clamping.

async receive(
    names: string[],  // Could be empty
    count: number,    // Could be negative or 0 before clamping
    timeout?: number,
    abortSignal?: AbortSignal,
): Promise<QueueMessage[] | undefined> {
    const limitedCount = Math.max(1, count);
    const nameSet = new Set(names);

Recommendation: Validate inputs and throw descriptive errors for invalid parameters.

7. Queue Proxy "then" Special Case (queue.ts:103)

The proxy returns undefined for the "then" property to prevent Promise auto-unwrapping, but this is a bit magical.

if (prop === "then") return undefined;

Recommendation: Add a comment explaining why this is necessary (prevents the proxy from being treated as a thenable).


Low Priority / Nice-to-Have

8. Inconsistent Error Message Format

Some error messages end with periods, others don't. Some include documentation links, others don't.

Recommendation: Standardize error message formatting across the codebase.

9. Type Safety in Queue Proxy

The QueueProxy uses string indexing which doesn't provide compile-time type safety for queue names.

Recommendation: Consider using a builder pattern or typed proxy that knows valid queue names from the actor definition.

10. Missing JSDoc Comments

Private methods in QueueManager lack JSDoc comments explaining their purpose and behavior.

Recommendation: Add JSDoc comments for complex private methods like #maybeResolveWaiters and #rebuildMetadata.

11. Inconsistent Logging Practices (queue-manager.ts:81-86)

Metadata decode failure logs the error object directly:

this.#actor.rLog.error({
    msg: "failed to decode queue metadata, rebuilding from messages",
    error,
});

But according to CLAUDE.md, structured logging should be used with field syntax like ?error.

Recommendation: Use this.#actor.rLog.error({ msg: "...", ?error }) for consistency with logging patterns.


📝 Code Quality Notes

  1. Good Use of Private Fields: Proper encapsulation with # private fields.
  2. Defensive Programming: Metadata rebuild logic shows good resilience.
  3. Clean Separation: Queue interface (ActorQueue) nicely separated from management logic (QueueManager).
  4. Proper TypeScript Overloads: The next() method uses function overloads effectively.

🧪 Testing Recommendations

  1. Add Concurrency Tests: Test multiple waiters for the same queue name to verify the race condition behavior.
  2. Add Failure Recovery Tests: Test metadata corruption scenarios and recovery.
  3. Add Stress Tests: Test with queue size limits and many messages.
  4. Add Edge Cases:
    • Empty string queue names
    • Very long queue names
    • Duplicate names in the names array
    • Negative timeouts

📋 Minor Suggestions

  1. Consider adding a clear() method to empty the queue.
  2. Consider adding a peek() method to view messages without removing them (note: getMessages() exists but isn't exposed publicly).
  3. Consider adding queue statistics (oldest message age, average message size, etc.).

✅ Summary

This is a well-structured implementation with good error handling and test coverage. The main concerns are:

  1. Critical: Race condition in waiter resolution
  2. Critical: Potential message loss on metadata update failure
  3. Medium: Performance issues with repeated full queue scans
  4. Medium: Missing input validation

Once the critical issues are addressed, this will be a solid feature. The code follows most of the project conventions and demonstrates good software engineering practices.

Recommendation: Address the critical issues before merging, and consider the medium-priority improvements in follow-up PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants