Skip to content

[core] Make local table query lookup thread-safe#8356

Open
Stephen0421 wants to merge 1 commit into
apache:masterfrom
Stephen0421:lookup-concurrency-improve
Open

[core] Make local table query lookup thread-safe#8356
Stephen0421 wants to merge 1 commit into
apache:masterfrom
Stephen0421:lookup-concurrency-improve

Conversation

@Stephen0421

Copy link
Copy Markdown
Contributor

Purpose

This PR makes local table lookup safer for concurrent access.

LocalTableQuery previously synchronized the whole lookup method. This PR replaces the coarse-grained query-level synchronization with bucket-level state and read/write locking, so lookup() and refreshFiles() have explicit concurrency semantics per bucket.

It also hardens lookup file lifecycle handling:

  • Make LookupFile access and close idempotent/thread-safe.
  • Serialize lookup file creation per data file to avoid duplicate local lookup file creation.
  • Synchronize shared key serializer and persist processor usage.
  • Ensure local lookup files are cleaned when LookupFile.close or LookupLevels.createLookupFile fails on checked or unchecked exceptions.

This is intended as correctness groundwork for concurrent local lookup usage, rather than a standalone throughput benchmark change.

Tests

Added tests:

  • LookupFileTest#testCloseCleansLocalStateWhenReaderCloseFails
  • LookupLevelsTest#testCreateReaderRuntimeFailureCleansLocalFile
  • Additional lookup concurrency / table query coverage in LookupLevelsTest and PrimaryKeySimpleTableTest

Verified by:

  • mvn -pl paimon-core -Pfast-build -Dtest=LookupFileTest,LookupLevelsTest test
  • mvn -pl paimon-core -Pfast-build -Dtest=PrimaryKeySimpleTableTest#testTableQueryForLookup+testTableQueryForLookupLocalSortFile+testTableQueryForNormal+testTableQueryDownloadsRemoteLookupFile test

addLocalFile(file, lookupFile);
}
} finally {
lookupFileLocks.remove(fileName, lock);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the per-file lock here can split synchronization while other threads are already waiting on the old lock. If the first creator fails, or the newly-created lookup file is immediately evicted because of the disk-size limit, a later caller can install a new lock while an older waiter continues under the removed lock; then two creators may run for the same data file. With deterministic local file names this can surface as createNewFile() failures. Consider a ref-counted/striped keyed lock or a single-flight Future entry that is not removed until no waiters remain.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by replacing the removable per-file lock map with fixed striped keyed locks. This avoids splitting waiters across old/new lock objects when lookup file creation fails or the cache entry is evicted. Added a regression test covering the first-creator-fails + old-waiter + later-caller scenario.

public synchronized InternalRow lookup(BinaryRow partition, int bucket, InternalRow key)
throws IOException {
Map<Integer, LookupLevels<KeyValue>> buckets = tableView.get(partition);
public InternalRow lookup(BinaryRow partition, int bucket, InternalRow key) throws IOException {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By removing the method-level synchronization here, cache misses for different files in the same bucket can now enter LookupLevels concurrently. Those paths share the same KeyValueFileReaderFactory, whose formatReaderMappings cache is a plain HashMap mutated via computeIfAbsent during createRecordReader. The old synchronized lookup serialized that path; this change exposes the shared factory to concurrent mutation. Please make that cache thread-safe or serialize reader creation for the shared factory.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, fixed.

@Stephen0421 Stephen0421 force-pushed the lookup-concurrency-improve branch from bb4b206 to 924696e Compare June 26, 2026 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants