fix: Race condition for reader after shape writer flushes data #3719
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Race Condition Analysis: PureFileStorage ETS Read/Write Race
Executive Summary
A race condition exists in PureFileStorage where readers can miss data that exists in the system
because they use stale metadata to decide where to read from, while the writer clears the ETS
buffer between the reader's metadata check and data access.
The Bug
Location
lib/electric/shape_cache/pure_file_storage.ex:1138-1143- Reader decision logiclib/electric/shape_cache/pure_file_storage/write_loop.ex:339- ETS clearing on flushRoot Cause
The reader reads metadata as a snapshot, then uses that snapshot to decide whether to read from
disk or ETS. However, the writer can flush (updating metadata and clearing ETS) between these
two operations.
The Race Window
The data IS on disk (flushed at T2), but the reader decided to read from ETS at T1 when
last_persisted = X. By T3, ETS has been cleared, so reader gets nothing.Impact
Formal Verification
See
PureFileStorageRace.leanfor a Lean 4 model proving:Elixir Test
See
test/electric/shape_cache/pure_file_storage_race_test.exsfor tests demonstrating:Recommended Fixes
Option A: Re-check metadata before ETS read (Minimal change)
Option B: Delayed ETS clearing
Don't clear ETS immediately on flush. Instead, use a generation counter or timestamp
to let readers drain before clearing.
Option C: Read-copy-update pattern
Keep old ETS data until readers are done, similar to RCU in kernel programming.
Comparison with First Bug
Code Locations
lib/electric/shape_cache/pure_file_storage.ex:1138-1143- Decision to read from ETSlib/electric/shape_cache/pure_file_storage/write_loop.ex:339- trim_ets calllib/electric/shape_cache/pure_file_storage/write_loop.ex:292-295- trim_ets implementation