
[refactor](catalog) Catalog spi 07 paimon #64446

Draft
morningman wants to merge 66 commits into apache:branch-catalog-spi from morningman:catalog-spi-07-paimon

Conversation

@morningman

Contributor

No description provided.

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman force-pushed the branch-catalog-spi branch from b8d6426 to f09b6df on June 12, 2026 14:22
morningman and others added 28 commits June 12, 2026 22:23
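This session is research and design only. A 14-agent code-grounded recon plus a
cross-cutting adversarial re-review covers the legacy-framework implementation of
paimon's 5 functional areas (normal read / system tables / procedure / DDL / mtmv) →
mapping onto the new catalog SPI → interface consistency with the maxcompute connector.

Added:
- research/p5-paimon-migration-recon.md: legacy implementation of the 5 areas +
  E1–E10 SPI status + cross-cutting risks + 11 MC consistency conventions + test baseline
- tasks/P5-paimon-migration.md: old→new mapping + 30 TODOs / batches B0–B9 +
  batch dependency graph + acceptance criteria

User-signed decisions:
- D-037 (P5-D1): flavor = single Catalog + createCatalog flavor switch (consistent with MC;
  no backend modules are built because the 5 backend modules are empty shells)
- D-038 (P5-D2): the MTMV/MVCC bridge is implemented within P5 (fe-core
  PaimonPluginDrivenExternalTable); the cutover is gated on it, and a silent
  read-latest regression is forbidden

Falsified 3 prior assumptions: the backend modules are empty shells (the connector goes
through a single-Catalog stub) / FE dispatch is already partly pre-wired (remaining =
connector listPartitions) / Base64 is not a blocker (BE has an STD fallback).
Procedure area = nothing to migrate, doc-only.

Doc sync: connectors/paimon.md (fixed 3 stale statements), decisions-log.md (+D-037/D-038,
36→38), PROGRESS.md (header/§1/§2/§3/§4/§6/§7), HANDOFF.md (overwritten, no collapsed
history kept).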

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
T01: extract PaimonCatalogOps injection seam (5 read methods, B0 read-only)
over the paimon SDK Catalog; refactor PaimonConnectorMetadata to inject it
(6 call sites migrated, read path byte-for-byte unchanged); build the first
fe-connector-paimon test module (no-mockito recording fake, mirroring MC's
McStructureHelper): 9 metadata UTs pinning the databaseExists try/catch and
the getColumnHandles reload-fallback, FakePaimonTable (fail-loud on non-read
methods), and an env-gated live connectivity smoke.

T02: R-007 paimon.version 3-way pin invariant comment (FE connector + BE
paimon-scanner + preload-extensions already aligned at 1.3.1 via the single
fe/pom.xml property); offline FE->BE serialized-Table round-trip smoke (real
FileSystemCatalog -> connector encode -> BE-mirrored URL-first/STD-fallback
decode, asserts rowType/partition/primary keys); parity-baseline doc
inventorying the 41 existing regression suites as the after-cutover parity
gate plus the real connector-side gaps and the live-e2e hard gate.
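
The URL-first/STD-fallback decode order mentioned for the round-trip smoke can be illustrated with a minimal Java sketch. The class name `LenientBase64` is hypothetical, and the block only mirrors the described ordering with the JDK's `java.util.Base64`; it is not the BE decoder.

```java
import java.util.Base64;

/** Sketch of a URL-first, standard-fallback Base64 decode (hypothetical helper). */
final class LenientBase64 {
    private LenientBase64() {}

    static byte[] decode(String encoded) {
        try {
            // Try the URL-safe alphabet first ('-' and '_').
            return Base64.getUrlDecoder().decode(encoded);
        } catch (IllegalArgumentException urlFailure) {
            // '+' or '/' is illegal in the URL-safe alphabet, so fall back
            // to the standard alphabet instead of failing the decode.
            return Base64.getDecoder().decode(encoded);
        }
    }
}
```

With this ordering, a payload encoded by either `Base64.getUrlEncoder()` or `Base64.getEncoder()` decodes to the same bytes, which is presumably why the encoder choice is not a blocker.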

Connector module: Tests run: 12, Failures: 0, Errors: 0, Skipped: 1 (the
skip is the env-gated live test); checkstyle 0; import-gate clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single-Catalog flavor switch on paimon.catalog.type for all five flavors
(filesystem/hms/rest/jdbc/dlf), mirroring the legacy fe-core flavor
properties without importing fe-core/fe-common.

- New PaimonCatalogFactory: pure validate() + buildCatalogOptions()
  (paimon.catalog.type -> paimon `metastore` opt, per-flavor options,
  paimon.* passthrough excl storage prefixes) + buildHadoopConfiguration /
  buildHmsHiveConf / buildDlfHiveConf + requireOssStorageForDlf.
- PaimonConnector: thread ConnectorContext; createCatalog wires all 5
  flavors live (filesystem/jdbc with Hadoop Configuration, rest
  Options-only, hms/dlf with HiveConf), each wrapped in
  context.executeAuthenticated (Kerberos seam). JDBC DriverShim ported with
  driver-url resolution via getEnvironment() (replaces forbidden JdbcResource).
- PaimonConnectorProperties: all flavor key constants (multi-alias String[]).
- PaimonConnectorProvider: validateProperties override -> factory.validate.
- pom: add paimon-hive-connector-3.1 + hadoop-common + hive-common
  (hive-common over hive-catalog-shade to avoid the fastutil conflict).
- 31 new no-mockito unit tests (PaimonCatalogFactoryTest); module 43/0/0/1,
  checkstyle 0, import-gate clean.

hms/dlf live connection is gated on B7 cutover + live-e2e: the Thrift
metastore client is host-provided (not bundled) with a child-first
Configuration/HiveConf cross-loader hazard to verify; jdbc driver_url FE
security allow-list + external hive-site.xml file load are deferred. All
documented in code NOTEs and plan-doc. rest also requires warehouse
(legacy parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connector-side only; no fe-core / fe-connector-api / fe-connector-spi changes.
B2 and B3 were both uncommitted and are entangled in the same files
(PaimonConnectorMetadata, PaimonCatalogOps, PaimonConnector,
RecordingPaimonCatalogOps), so they are committed together.

B2 normal-read (T06-T10):
- T06 PaimonScanPlanProvider transient-Table reload fallback (planScan +
  getScanNodeProperties both guarded)
- T07 PaimonPredicateConverter parity-correct TZ (NTZ keeps UTC, LTZ not
  pushed) + supportsCastPredicatePushdown=false
- T08 listPartitionNames/listPartitions/listPartitionValues (legacy
  display-name parity) + seam listPartitions(Identifier)
- T09 doc-only pure-predicate pruning; T10 cache deferred to B8

B3 DDL metadata (T11-T15):
- T11 PaimonTypeMapping.toPaimonType (Doris->paimon, byte-parity with legacy
  DorisToPaimonTypeVisitor; narrow gap preserved)
- T12 PaimonSchemaBuilder (ConnectorCreateTableRequest -> paimon Schema)
- T13 createTable/dropTable + seam DDL methods + ConnectorContext threaded
  (D7=B: each DDL op wrapped in executeAuthenticated; read path un-wrapped)
- T14 supportsCreateDatabase/createDatabase (HMS-props gate) +
  dropDatabase(force) (enumerate-loop + native cascade)
- T15 offline UTs (no-mockito; WHY+MUTATION)

Verified: fe-connector-paimon Tests run: 96, Failures: 0, Errors: 0,
Skipped: 1 (live); checkstyle 0; connector import-gate 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port paimon system tables and MVCC snapshots onto the plugin connector SPI.

- T16: greenfield E7 SPI on ConnectorTableOps — listSupportedSysTables +
  getSysTableHandle (default no-ops; MC/jdbc/es/trino unaffected).
- T17: PaimonConnectorMetadata implements E7 — names from
  SystemTableLoader.SYSTEM_TABLES; sys table loaded via the existing
  getTable seam with a 4-arg Identifier(db,table,"main",sysName); sys
  handle carries sysTableName + forceJni (binlog/audit_log); shared
  PaimonTableResolver gives metadata + scan one sys-aware reload rule.
- T18: generic fe-core glue — PluginDrivenExternalTable centralizes handle
  acquisition into resolveConnectorTableHandle and delegates
  getSupportedSysTables to the connector; new PluginDrivenSysExternalTable
  (reports PLUGIN_EXTERNAL_TABLE) + PluginDrivenSysTable reuse the live
  SysTableResolver/NativeSysTable machinery (reusable by future connectors).
- T19: forceJni gate so binlog/audit_log go JNI not native; buildTableDescriptor
  -> HIVE_TABLE (also fixes a latent normal-table SCHEMA_TABLE descriptor gap,
  DV-024); PluginDrivenScanNode fail-loud guard rejects scan-params/time-travel
  on system tables.
- T20: first E5 MVCC consumer — beginQuerySnapshot/getSnapshotAt/getSnapshotById
  (empty table -> -1; sys handle -> empty) + SUPPORTS_MVCC_SNAPSHOT/TIME_TRAVEL
  capabilities. Inert until B5 wires the fe-core MvccTable consumer.

Decisions: D-039 (E7 reuses the live SysTable machinery; RFC §10's
$-suffix-via-getTableHandle design was never implemented and is superseded,
DV-023). Deviations: DV-023, DV-024.

Verification: import-gate 0; connector 124 tests pass (1 live skipped);
fe-core PluginDriven*Test 100 pass; checkstyle 0; no cutover/B5 leakage
(paimon not in SPI_READY_TYPES; PluginDrivenExternalTable still not an MvccTable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ridge + time-travel + procedure doc no-op

B5a (MTMV/MVCC bridge): source-agnostic PluginDrivenMvccExternalTable (MTMVRelatedTableIf+MTMVBaseTableIf+MvccTable, D-042) wiring the B4-inert E5 snapshot SPI; PluginDrivenMvccSnapshot; list-partitions-at-snapshot.
B5b (time-travel): scan-pin + AS-OF + tag + branch + @incr across connector (ConnectorTimeTravelSpec, PaimonIncrementalScanParams) and fe-core; holistic review fixes RD-1 (partitioned time-travel empty-universe scan-all guard in PluginDrivenScanNode) + RD-2 (@incr lists-latest partitions/schema).
B6/T26: procedure doc no-op — zero migratable code; closed-form reject verified (ExecuteActionFactory:59-62 / CallFunc:42-43).
All inert/gated until B7 cutover (paimon NOT yet in SPI_READY_TYPES). Excludes regression-conf.groovy (secrets) + scratch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eview fixes

Combines all previously-uncommitted P5 paimon work into one commit (per request).

8 fullpath-review fixes (BLOCKERs + key MAJORs) — connector + SPI + fe-core bridge:
- FIX-STORAGE-CREDS: applyStorageConfig translates canonical s3.*/oss.*/AWS_* ->
  fs.s3a./fs.oss. (+DLF region->OSS endpoint)
- FIX-NATIVE-PARTVAL: per-type serializePartitionValue + session TZ (LTZ only);
  binary/varbinary drops the partition map (no [B@hash garbage)
- FIX-TZ-ALIAS: full legacy ZoneId.SHORT_IDS + 4 Doris overrides alias map
  (CST/PST/EST now resolve for FOR TIME AS OF datetime strings)
- FIX-TABLE-STATS: getTableStatistics override + PaimonCatalogOps.rowCount seam
  (normal AND system tables, via the sys-aware resolveTable)
- FIX-CPP-READER: honor enable_paimon_cpp_reader -> native DataSplit.serialize so
  BE's PaimonCppReader can decode the split
- FIX-READ-NOTNULL: mapFields forces read-path columns nullable (legacy parity)
- FIX-HMS-CONFRES: new ConnectorContext.loadHiveConfResources hook + 2-arg
  buildHmsHiveConf file-base merge (external hive-site.xml reaches the metastore)
- FIX-REST-VENDED: new ConnectorContext.vendStorageCredentials hook + scan-props
  vended AWS_* overlay (REST per-table tokens reach BE)
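
As an illustration of the FIX-TZ-ALIAS item above, the following sketch layers an override map on top of `ZoneId.SHORT_IDS` and resolves through `ZoneId.of(String, Map)`. The class name and the single `CST` override are placeholders chosen for the example; the commit does not list the four Doris overrides, so this is not their actual content.

```java
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

/** Sketch of alias-aware zone resolution (hypothetical helper). */
final class ZoneAliasSketch {
    static final Map<String, String> ALIASES = buildAliases();

    private ZoneAliasSketch() {}

    private static Map<String, String> buildAliases() {
        // Start from the JDK's legacy short ids (PST, EST, CST, ...).
        Map<String, String> aliases = new HashMap<>(ZoneId.SHORT_IDS);
        // Placeholder override for illustration only (an assumption,
        // not taken from the commit).
        aliases.put("CST", "Asia/Shanghai");
        return aliases;
    }

    static ZoneId resolve(String zone) {
        // Ids missing from the alias map fall through to normal parsing.
        return ZoneId.of(zone, ALIASES);
    }
}
```

Without the alias map, `ZoneId.of("PST")` throws, which matches the symptom the fix describes for FOR TIME AS OF datetime strings.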

Also carries the previously-uncommitted B7 core cutover + D-045/D-046 restores.

Tests: fe-connector-paimon 213 pass / 0 fail / 1 skip (live-gated); fe-core compiles +
DefaultConnectorContextVendTest 2/0. Each fix's root-cause/patch/UT and impl-time
corrections are in plan-doc/tasks/designs/P5-fix-<id>-design.md.

Excluded from this commit: regression-test/conf/regression-conf.groovy (plaintext Aliyun
keys, pending scrub) and scratch dirs (.audit-scratch/, conf.cmy/, META-INF/, *.bak).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical scheme

Root cause: the paimon connector sent native ORC/Parquet data-file paths and
deletion-vector (DV) paths to BE un-normalized. The paimon SDK emits
warehouse-native schemes (oss://, cos://, obs://, s3a://, or the OSS
bucket.endpoint authority form); BE's scheme-dispatched S3 file factory only
recognizes s3://. On S3-compatible (non-AWS) warehouses this breaks native reads
outright (B-7DF, data file) and silently drops the DV so DELETEd rows reappear
(B-7DV, merge-on-read corruption). Legacy PaimonScanNode normalized both via the
2-arg LocationPath.of; the cutover dropped it. The two paths reach BE via
different mechanisms (data-file through PluginDrivenSplit's single-arg
LocationPath.of -> FileQueryScanNode:568; DV baked into thrift by the connector's
populateRangeParams), so a fe-core-bridge-only fix cannot reach the DV path.

Solution: new ConnectorContext.normalizeStorageUri SPI hook (identity default,
mirroring vendStorageCredentials), implemented in DefaultConnectorContext via the
engine's 2-arg normalizing LocationPath.of with the catalog's static storage map
(threaded via a new lazy supplier + 4-arg ctor; PluginDrivenExternalCatalog wires
it). The connector routes BOTH the data-file and DV paths through it inside the
extracted, unit-testable buildNativeRange. JNI path untouched (carries its own
FileIO). Fail-loud on un-normalizable paths (legacy parity). Static-vs-vended map
scope noted in DV-025 (the pure-vended edge belongs to credential fixes #2/#3).
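
The shape of such a hook can be sketched as a plain scheme rewrite. This is a simplified stand-in, not the `LocationPath.of` logic: the class name is hypothetical, the null/blank pass-through is an assumption, the bucket.endpoint authority form is not handled, and a real implementation would also need to deal with non-object-store schemes such as hdfs://.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch of scheme normalization to s3:// (hypothetical helper). */
final class UriNormalizeSketch {
    // Warehouse-native schemes named in the commit message, plus s3 itself.
    private static final Set<String> S3_COMPATIBLE =
            new HashSet<>(Arrays.asList("s3", "s3a", "oss", "cos", "obs"));

    private UriNormalizeSketch() {}

    static String normalize(String uri) {
        if (uri == null || uri.trim().isEmpty()) {
            return uri; // assumed behavior: nothing to normalize
        }
        int sep = uri.indexOf("://");
        if (sep <= 0) {
            throw new IllegalArgumentException("No scheme in path: " + uri);
        }
        String scheme = uri.substring(0, sep).toLowerCase(Locale.ROOT);
        if (!S3_COMPATIBLE.contains(scheme)) {
            // Fail loud rather than ship a path the reader cannot dispatch.
            throw new IllegalArgumentException("Cannot normalize scheme: " + scheme);
        }
        return "s3" + uri.substring(sep);
    }
}
```

Routing both the data-file path and the DV path through one function like this is what keeps the two from diverging.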

Tests: fe-core DefaultConnectorContextNormalizeUriTest (oss->s3, s3 idempotent,
null/blank, empty-map fail-loud); connector PaimonScanPlanProviderTest x3 (both
paths normalized + call count, DV-less, no-context raw). paimon module 216/0/0,
fe-core targeted green, checkstyle 0, import-gate clean. Live OSS+DV e2e CI-gated
(not run). SPI RFC section 21 (E13), deviations DV-025.

Also includes the round-2 review report + task list this fix derives from.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mark FIX-URI-NORMALIZE complete (commit 20b19d1) in the task list and update
HANDOFF: #1 summary + verification, next session starts at #2 (reuse the
normalizeStorageUri BE-scan-prop normalization seam), and the standing reminders
(regression-conf.groovy still holds a plaintext key -> path-whitelist only; P2

#8/#9 need user scope decision first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical AWS_*

Finding B-9 (BLOCKER, rereview2). The paimon connector copied static
catalog-level storage credentials/config verbatim into the BE scan-node
properties: PaimonScanPlanProvider.getScanNodeProperties iterated the raw
catalog properties and emitted location.<rawkey> for any s3./oss./cos./obs./
hadoop./fs./dfs./hive. prefix; the fe-core bridge only strips the location.
prefix. BE's native (FILE_S3) reader understands ONLY AWS_ACCESS_KEY/
AWS_SECRET_KEY/AWS_ENDPOINT/AWS_REGION/AWS_TOKEN, so static s3.access_key/
oss.access_key on a private bucket reached BE unintelligible -> no usable
credentials -> 403. This is the third credential seam (static->BE-scan),
missed by both the prior round and the 8 fixes (review §9.3); the catalog-
FileIO seam (FIX-STORAGE-CREDS) and the vended seam (FIX-REST-VENDED) were
already closed.

Root cause: legacy PaimonScanNode.getLocationProperties returns only
CredentialUtils.getBackendPropertiesFromStorageMap(storagePropertiesMap) (the
canonical AWS_*/hadoop/dfs map). The cutover replaced that single normalized
call with a raw prefix-copy loop; the connector cannot import fe-core's
StorageProperties so it had no access to the normalization.

Solution (D-048, user-signed full legacy-parity scope): new no-op-default SPI
ConnectorContext.getBackendStorageProperties(); DefaultConnectorContext returns
getBackendPropertiesFromStorageMap over the storagePropertiesSupplier already
wired in FIX-URI-NORMALIZE (no ctor change, CredentialUtils already imported).
The connector replaces its raw prefix-copy loop with a context-gated overlay of
that map; the vended overlay stays after it (vended wins on collision, legacy
precedence). Object-store creds -> AWS_*; HDFS -> canonical hadoop/dfs
(preserves user overrides + adds the legacy defaults, folding in the §211
MINOR); drops the non-parity hive.* passthrough. Investigated the
AWS_CREDENTIALS_PROVIDER_TYPE=ANONYMOUS two-step edge and confirmed via BE
s3_util.cpp (both providers prefer explicit ak/sk over cred_provider_type) that
it is harmless — no regression. Connector import-gate stays clean.
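
The overlay order described above (static map first, vended map after it, vended wins on collision) comes down to two `putAll` calls. A minimal sketch, with a hypothetical class name and plain string maps standing in for the real property types:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of static-then-vended overlay precedence (hypothetical helper). */
final class ScanPropsOverlaySketch {
    private ScanPropsOverlaySketch() {}

    static Map<String, String> merge(Map<String, String> staticProps,
                                     Map<String, String> vendedProps) {
        Map<String, String> merged = new LinkedHashMap<>();
        if (staticProps != null) {
            merged.putAll(staticProps); // canonical AWS_* from the catalog
        }
        if (vendedProps != null) {
            merged.putAll(vendedProps); // applied last, so vended wins
        }
        return merged;
    }
}
```

Keys that appear only in the static map (for example AWS_ENDPOINT) survive the overlay, while a vended AWS_ACCESS_KEY replaces the static one.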

Tests: fe-core DefaultConnectorContextBackendStoragePropsTest (OSS static creds
-> AWS_*, raw alias absent; no-supplier -> empty); connector
PaimonScanPlanProviderTest (+getScanNodePropertiesNormalizesStaticCreds raw
alias not shipped; modified vended-overlay collision to canonical keys; renamed
no-context test -> emits no storage props). Fail-before/pass-after proven by
reverting the connector change (2/3 go red). Module 217/0/0 (1 CI-gated skip),
checkstyle clean, import-gate clean. Live private-bucket native-read e2e is
CI-gated (not run). SPI RFC §22 (E14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Record FIX-STATIC-CREDS-BE commit d23d5df in the task-list and update
HANDOFF.md (HEAD, migration chain, completed/next sections). Next: #3
FIX-SCHEMA-EVOLUTION (B-1a+M-10) — the largest P0 SPI surface, independent of
#1/#2; recommend a fresh session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ema_info from the connector

Root cause (rereview2 BLOCKER B-1a): on the native (ORC/Parquet) read path the
paimon connector emitted only the per-file TPaimonFileDesc.schema_id but never
set the scan-level TFileScanRangeParams.current_schema_id / history_schema_info.
BE (table_schema_change_helper.h:219-237) then took the !__isset branch and fell
back to NAME-based file<->table column matching, so a schema-evolved (renamed /
reordered) table read NULL/garbage for the renamed columns silently. JNI path is
unaffected; native is the default. (M-10, Column.uniqueId=-1, deferred — DV-026.)

Design C (user-signed D-049): BE's field-id matcher (table_schema_change_helper
.cpp:312-430) reads only TField.id/name and a nested-vs-scalar type.type tag — no
Doris Type, no tuple descriptor — and org.apache.doris.thrift.* is import-legal in
connectors, so the connector builds the TSchema dictionary directly from paimon
SchemaManager and ships it via the existing populateScanLevelParams hook (the seam
DV-006 anticipated for hudi). Zero new SPI surface; connector-only.
  - current_schema_id = -1; history_schema_info = the -1/current (pinned) schema +
    one entry per SchemaManager.listAllIds() so every native file schema_id is
    covered (BE fails loud on a missing entry, never silent).
  - transport: base64 TBinaryProtocol carrier (a throwaway TFileScanRangeParams)
    via a props key, because getScanPlanProvider() is per-call (no shared state).

Clean-room 3-lens review found 2 real BLOCKERs in the -1/current entry (both fixed
+ re-verified): (1) column-name casing — BE keys the table-side StructNode by the
-1 entry's name verbatim while the native reader queries the lowercase Doris slot
name, and current_schema_id=-1 never hits the ConstNode fast-path, so a mixed-case
column crashed (std::out_of_range) even on never-evolved tables; fix lowercases
ONLY top-level names (default-locale, matching the slot-name producer + legacy
parseSchema:507; nested stays paimon-cased per legacy PaimonUtil:302). (2) time
travel — the -1 entry used schemaManager.latest() (absolute latest) instead of the
snapshot-pinned schema the tuple uses; fix builds it from FileStoreTable.schema()
(pinned) and narrows the guard DataTable->FileStoreTable. Eager all-schemas read
accepted as a fail-loud deviation (DV-027).

Tests: PaimonScanPlanProviderTest +5 (field-id/name carriage, nested ARRAY/MAP/
STRUCT shape + struct-child ids, scalar tag, rename round-trip apply, top-level
lowercase vs nested paimon-case, non-FileStoreTable skip). Module 222/0/0 (1
CI-gated skip), checkstyle clean, import-gate clean. e2e
test_paimon_full_schema_change.groovy is CI-gated (not run). Design doc + D-049 +
DV-026/DV-027 + SPI RFC §23 (no new SPI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…at CREATE (B-8a + B-8b)

rereview2 #4. JDBC-metastore-flavor paimon catalogs only. Connector-only, zero new SPI.

Root cause:
- B-8a (functional BLOCKER): PaimonScanPlanProvider.getBackendPaimonOptions forwarded
  driver_url to BE RAW and its `key.startsWith("jdbc.")` filter dropped the `paimon.jdbc.*`
  alias. A bare `jdbc.driver_url=mysql.jar` reached BE, where JdbcDriverUtils.registerDriver
  does `new URL(value)` -> MalformedURLException; a `paimon.jdbc.driver_url` alias was dropped
  outright. Legacy PaimonJdbcMetaStoreProperties.getBackendPaimonOptions emits
  `jdbc.driver_url=JdbcResource.getFullDriverUrl(driverUrl)` (resolved) + `jdbc.driver_class`.
- B-8b (security): driver_url was loaded into the FE JVM (URLClassLoader) and shipped to BE
  with no format / jdbc_driver_url_white_list / jdbc_driver_secure_path validation, plus a
  stale "paimon is not in SPI_READY_TYPES" disclaimer (false since the B7 cutover added paimon
  to CatalogFactory SPI_READY_TYPES).

Solution (reuses existing hooks; no new SPI surface):
- B-8a: getBackendPaimonOptions now reads driver_url via firstNonBlank(JDBC_DRIVER_URL) (honors
  both the jdbc.* and paimon.jdbc.* alias) and emits the canonical `jdbc.driver_url` RESOLVED to
  a scheme-bearing URL plus `jdbc.driver_class` (BE accepts both alias forms). Resolution is
  extracted to a shared static PaimonCatalogFactory.resolveDriverUrl(driverUrl, env) so FE driver
  registration and the BE-bound options resolve a given driver_url identically.
- B-8b: PaimonConnector overrides Connector.preCreateValidation to route a configured driver_url
  (either alias) through ConnectorValidationContext.validateAndResolveDriverPath at CREATE CATALOG
  (format/whitelist/secure-path; throws -> CREATE fails before the jar loads). Mirrors
  JdbcDorisConnector. Stale disclaimer replaced with an accurate note.
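
A rough sketch of the B-8a resolution follows. The alias order inside `JDBC_DRIVER_URL`, the `driversDir` parameter, and the `file://` prefixing rule are assumptions made for illustration; the real `resolveDriverUrl(driverUrl, env)` takes an environment object whose shape is not shown in this message.

```java
import java.util.Map;

/** Sketch of alias lookup plus driver_url resolution (hypothetical helper). */
final class DriverUrlSketch {
    // Assumed alias order; which alias takes priority is not stated here.
    static final String[] JDBC_DRIVER_URL = {"jdbc.driver_url", "paimon.jdbc.driver_url"};

    private DriverUrlSketch() {}

    static String firstNonBlank(Map<String, String> props, String... keys) {
        for (String key : keys) {
            String value = props.get(key);
            if (value != null && !value.trim().isEmpty()) {
                return value.trim();
            }
        }
        return null;
    }

    static String resolveDriverUrl(String driverUrl, String driversDir) {
        if (driverUrl.contains("://")) {
            return driverUrl; // already scheme-bearing, keep as-is
        }
        // A bare jar name becomes a file URL under the drivers directory.
        return "file://" + driversDir + "/" + driverUrl;
    }
}
```

Because both the FE registration and the BE-bound options would call the same function, a bare `mysql.jar` resolves to one URL on both sides.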

Scope (user-signed D-050; see DV-028/DV-029): validation is CREATE-time only — parity with the
JDBC reference connector. The FE-restart-reload / ALTER-CATALOG / scan-time re-validation gap is a
pre-existing fe-core limitation shared by all plugin connectors (default config is permissive);
accepted, with a cross-connector follow-up filed. BE-side paimon.jdbc.{user,password,uri} alias-drop
is out of scope (BE deserializes the table from serialized_table; only driver_url/driver_class are
consumed by registerDriverIfNeeded).

Tests: PaimonScanPlanProviderTest +5 (resolve bare name, honor paimon.jdbc.* alias, both-aliases
priority+override, preserve scheme-bearing, non-jdbc empty); new PaimonConnectorPreCreateValidationTest
+5 (validate jdbc/alias, skip non-jdbc/no-driver_url, propagate rejection). Module 232/0/0 (1 CI-gated
skip); fail-before verified (5/9 new tests red when neutered); checkstyle 0; connector import-gate clean.
Live e2e (JDBC flavor + remote jar) is CI-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#4 FIX-JDBC-DRIVER-URL committed as 2d15b1b (P0 BLOCKERs now all clear).
Fill the #4 task-list commit cell; rewrite HANDOFF to point at #5 (M-crit,
re-verify the dotted-vs-underscore type-mapping key facts before coding).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aimon type-mapping toggles

Root cause: after the SPI cutover the paimon connector reads the type-mapping
toggles from UNDERSCORE keys (enable_mapping_binary_as_varbinary /
enable_mapping_timestamp_tz; PaimonConnectorProperties:39,42 ->
PaimonConnectorMetadata.buildTypeMappingOptions), but fe-core only ever writes
the canonical DOTTED catalog keys (enable.mapping.varbinary /
enable.mapping.timestamp_tz; CatalogProperty:50,52, written/defaulted by
ExternalCatalog.setDefaultPropsIfMissing and hidden via HIDDEN_PROPERTIES).
PluginDrivenExternalCatalog.createConnectorFromProperties hands the connector
the raw catalog property map verbatim, so getOrDefault(underscore,"false") is
always false. Even when the user enables the mapping at CREATE CATALOG, Paimon
BINARY stays STRING and TIMESTAMP_WITH_LOCAL_TIME_ZONE stays DATETIMEV2 — a
silent cutover regression (legacy PaimonExternalTable:350 reads the dotted key
and honors it). The binary key is doubly drifted (separator . -> _ AND token
varbinary -> binary_as_varbinary), so a generic dot->underscore normalizer
would not fix it. Latent until the flag is enabled.

Re-confirmation: M-crit was critic-surfaced (not 3-lens-gated), so the finding
was independently re-verified by a 5-agent scout + adversarial synthesizer
(REAL_BUG, high confidence; false-positive steelman rejected — dotted is
canonical per the original feature PRs, every regression CREATE CATALOG, legacy
parity, and the JDBC connector which kept dotted in the same SPI PR).

Solution (connector-only, zero new SPI, no BE): re-point the two
PaimonConnectorProperties constants to the canonical dotted keys
(ENABLE_MAPPING_VARBINARY = "enable.mapping.varbinary", renamed from
ENABLE_MAPPING_BINARY_AS_VARBINARY to match the CatalogProperty/JDBC/iceberg
convention and fix both separator and token; ENABLE_MAPPING_TIMESTAMP_TZ =
"enable.mapping.timestamp_tz") and update the one reference in
PaimonConnectorMetadata. No logic change — the Options(mapBinaryToVarbinary,
mapTimestampTz) arg order is already correct. BE-side consistency verified:
PluginDrivenScanNode extends FileQueryScanNode and inherits the dotted-key read
for the BE scan param (FileQueryScanNode:192-193,635-678), so FE column type
and BE scan param now agree (they diverged before this fix).
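
The failure mode is easy to reproduce in isolation. In the sketch below the key strings are the ones quoted in this message; the class and the `flag` helper are hypothetical.

```java
import java.util.Map;

/** Sketch of the dotted-vs-underscore key mismatch (hypothetical helper). */
final class MappingKeySketch {
    // Canonical dotted keys written by fe-core.
    static final String ENABLE_MAPPING_VARBINARY = "enable.mapping.varbinary";
    static final String ENABLE_MAPPING_TIMESTAMP_TZ = "enable.mapping.timestamp_tz";
    // Key the connector read before the fix; never present in the map.
    static final String OLD_UNDERSCORE_KEY = "enable_mapping_binary_as_varbinary";

    private MappingKeySketch() {}

    static boolean flag(Map<String, String> catalogProps, String key) {
        // A key that is never written always yields the "false" default.
        return Boolean.parseBoolean(catalogProps.getOrDefault(key, "false"));
    }
}
```

With a property map holding `enable.mapping.varbinary=true`, `flag(props, OLD_UNDERSCORE_KEY)` is false while `flag(props, ENABLE_MAPPING_VARBINARY)` is true, which is the silent regression and its fix in miniature.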

Scope: paimon-only (user-signed D-051). NEW hive + iceberg connectors share the
identical root cause; logged as a cross-connector follow-up (DV-030), not fixed
here. Rejected an fe-core dot->underscore normalizer (broader blast, breaks
JDBC which already reads dotted, and insufficient for paimon's renamed token).

Tests (PaimonConnectorMetadataTest): +2 UT. getTableSchemaHonorsDottedMappingKeys
(bug-catcher) sets the dotted keys true and asserts BINARY->VARBINARY /
LTZ->TIMESTAMPTZ; getTableSchemaDefaultsMappingFlagsOff (guard) asserts the
default-off STRING/DATETIMEV2. Module 234/0/0 (1 CI-gated skip), checkstyle 0,
import-gate clean. Fail-before verified: the bug-catcher reddens on the
underscore key (expected <VARBINARY> but was <STRING>) while the guard stays
green. E2E test_paimon_catalog_{varbinary,timestamp_tz}.groovy are CI-gated
(enablePaimonTest=false + external fixture) — not run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ROS-DOAS

- task-list #5 commit-cell filled with 9dcf6d1
- HANDOFF rewritten: #5 summary + #6 next (two scope questions for the user)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… all read RPCs in doAs (M-11)

Both are Kerberos-only (harmless on simple-auth: the no-op authenticator's
execute() == task.call()).

Root cause
- M-8 (fe-core): paimon filesystem/jdbc catalogs over Kerberized HDFS lost UGI
  doAs on the cutover path. The HDFS HadoopExecutionAuthenticator is built only
  inside initializeCatalog(), which is dead on the plugin path (only legacy
  PaimonExternalCatalog calls it), so PluginDrivenExternalCatalog read the base
  no-op from getExecutionAuthenticator(). HMS was unaffected — it wires the
  authenticator in initNormalizeAndCheckProps(), which always runs.
- M-11 (connector): metadata read RPCs (listDatabases/getDatabase/listTables/
  getTable[handle+sys+resolveTable]/listPartitions) ran without
  executeAuthenticated; only the 4 DDL ops were wrapped (signed D7=B read-vs-DDL
  asymmetry). On a Kerberos HMS catalog these reads ran outside the catalog
  principal. Legacy wrapped every read.

Fix
- M-8 (filesystem+jdbc only; DLF/REST/HMS excluded — DLF uses Aliyun STS not
  Kerberos, the review's "DLF" clause was overstated): new internal fe-core hook
  MetastoreProperties.initExecutionAuthenticator(List<StorageProperties>) (default
  no-op), invoked by PluginDrivenExternalCatalog.initPreExecutionAuthenticator from
  the already-built storage list; filesystem/jdbc override it to build the HDFS
  authenticator (shared AbstractPaimonProperties helper), mirroring HMS. No
  connector change; no connector SPI change.
- M-11 (full legacy parity, signed D-052, supersedes the D7=B read clause): wrap
  all 7 connector read RPCs in context.executeAuthenticated. A single resolveTable
  wrap covers all resolveTable callers (metadata + scan). Domain exceptions are
  caught INSIDE the lambda because Kerberos UGI.doAs wraps a thrown checked
  Catalog.*NotExistException in UndeclaredThrowableException.
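
Why the domain exception has to be caught inside the lambda can be shown with a stand-in authenticator. Everything here is hypothetical scaffolding: `executeAuthenticated` imitates the wrapping behavior described above, and `TableNotExistException` stands in for the paimon `Catalog.*NotExistException` types.

```java
import java.lang.reflect.UndeclaredThrowableException;
import java.util.concurrent.Callable;

/** Sketch of catching domain exceptions inside an authenticated call. */
final class AuthWrapSketch {
    /** Stand-in for a checked catalog exception. */
    static final class TableNotExistException extends Exception {
        TableNotExistException(String table) {
            super("Table does not exist: " + table);
        }
    }

    private AuthWrapSketch() {}

    /** Imitates a doAs-style wrapper: checked exceptions surface wrapped. */
    static <T> T executeAuthenticated(Callable<T> task) {
        try {
            return task.call();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new UndeclaredThrowableException(e);
        }
    }

    static boolean tableExists(Callable<Object> loader) {
        return executeAuthenticated(() -> {
            try {
                loader.call();
                return true;
            } catch (TableNotExistException e) {
                // Caught before the wrapper can hide the exception type.
                return false;
            }
        });
    }
}
```

A `catch (TableNotExistException e)` placed outside `executeAuthenticated` would never match, because the caller only ever sees the `UndeclaredThrowableException`.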

Tests
- M-11: PaimonConnectorMetadataReadAuthTest (12) + 2 scan-path tests assert each
  read runs inside executeAuthenticated (RecordingConnectorContext failAuth/
  authCount). Connector module 248/0/0 (1 CI-gated skip).
- M-8: Paimon{FileSystem,Jdbc}MetaStorePropertiesTest assert getExecutionAuthenticator()
  returns HadoopExecutionAuthenticator after wiring without initializeCatalog;
  fe-core metastore-props 21/0/0 (DLF/HMS regression-clean).
- fail-before verified red for both (M-8: stays base no-op AbstractPaimonProperties$1;
  M-11: authCount/log-empty).
- True end-to-end doAs is live-Kerberos-e2e only (no paimon-kerberos suite); DV-031.

Decisions D-052 (M-11) / D-053 (M-8); deviation DV-031; design
plan-doc/tasks/designs/P5-fix-KERBEROS-DOAS-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-SCANNER

#6 fix commit = 2b1442f. Fill task-list commit cell; roll HANDOFF to #7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aimon connector scan path (M-1)

Root cause: the cutover (plugin) connector's split router read only the
name-derived handle flag paimonHandle.isForceJni() (the binlog/audit_log
NAME hatch) and never consulted the session var force_jni_scanner, so
ORC/Parquet always took the native reader — legacy's JNI escape hatch
(SET force_jni_scanner=true, used to dodge native-reader bugs incl. the
B2 schema-evolution class) was silently gone. The connector ported only
two of legacy's three native-gate conjuncts (PaimonScanNode.java:430:
!forceJniScanner && !forceJniForSystemTable && supportNativeReader); the
dropped !forceJniScanner conjunct is M-1.

Solution (pure connector; no SPI, no fe-core import, no BE param — legacy
serializes nothing for this var):
- new isForceJniScannerEnabled(session): byte-for-byte mirror of
  isCppReaderEnabled, reads key "force_jni_scanner" (byte-identical to
  SessionVariable.FORCE_JNI_SCANNER) from the same VariableMgr.toMap
  channel; null-guarded, default false (legacy default).
- Site A (correctness): shouldUseNativeReader gains an explicit
  forceJniScanner param (mirrors legacy's sibling boolean 1:1) ANDed into
  the native gate; planScan passes isForceJniScannerEnabled(session). The
  handle name-force is OR-sibling, never replaced (binlog/audit_log intact).
- Site B (correctness-neutral): getScanNodeProperties suppresses the
  native-only paimon.schema_evolution dict when force_jni_scanner routes
  every split to JNI (BE consumes it only on native ORC/Parquet ranges;
  JNI/cpp readers ignore it). Matches the connector's own documented contract.
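
The restored three-conjunct gate and the session lookup can be sketched as below. The gate expression and the key string are taken from this message; the class name and the use of a plain `Map<String, String>` for session properties are assumptions.

```java
import java.util.Map;

/** Sketch of the native-reader gate with the force_jni_scanner conjunct. */
final class NativeGateSketch {
    static final String FORCE_JNI_SCANNER = "force_jni_scanner";

    private NativeGateSketch() {}

    static boolean isForceJniScannerEnabled(Map<String, String> sessionProps) {
        if (sessionProps == null) {
            return false; // null-guarded, default false
        }
        // parseBoolean(null) is false, so a missing key also means "off".
        return Boolean.parseBoolean(sessionProps.get(FORCE_JNI_SCANNER));
    }

    static boolean shouldUseNativeReader(boolean forceJniScanner,
                                         boolean forceJniForSystemTable,
                                         boolean supportNativeReader) {
        // All three conjuncts must hold for a split to go native.
        return !forceJniScanner && !forceJniForSystemTable && supportNativeReader;
    }
}
```

Dropping the first conjunct reproduces M-1: a native-capable split on a normal table goes native regardless of the session setting.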

Tests (fail-before + pass-after both verified):
- isForceJniScannerEnabledReadsSessionProperty: pins the exact key,
  default-false, null-safety.
- forceJniScannerRoutesNativeEligibleSplitToJni: a native-eligible split
  must route to JNI when force_jni_scanner=true (legacy parity).
- 3 existing shouldUseNativeReader calls updated for the new param.
- Module 250/0/0 (+1 CI-gated live skip); connector import-gate + checkstyle clean.
- Real BE reader selection is a CI-gated live-e2e check (no offline coverage).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-COUNT-PUSHDOWN (P2, ask scope first)

- task-list: #7 row → ✅ design/impl/build(250/0/0)/commit `05132a42668` + DONE detail.
- HANDOFF: #7 summary (3rd-param overrides synthesizer call-site-OR per Rule 9;
  Site B correctness-neutral, no offline red test honestly noted); next = #8/#9
  P2 perf-parity → AskUserQuestion for scope (accept-or-defer) BEFORE implementing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(*) on plugin paimon (M-2)

Root cause: after cutover, COUNT(*) over a plugin-driven paimon table is
result-correct but slow. The COUNT enum already reaches BE
(FileScanNode.toThrift:90; PhysicalPlanTranslator:873 sets it on the plugin
node, not excluded) and the per-range emit seam is already built
(PaimonScanRange.Builder.rowCount -> paimon.row_count -> setTableLevelRowCount,
byte-identical to legacy PaimonScanNode:303-308). The missing half is the
signal + compute: DataSplit.mergedRowCount() is paimon-SDK-only (connector),
and the getPushDownAggNoGroupingOp()==COUNT signal lives only on the fe-core
node and reached nobody. So every split carried table_level_row_count=-1 and
BE materialized the full post-merge row set just to count (file_scanner.cpp:
1298-1326) — costly on PK/MOR tables.

Not pure-connector: the signal must cross the SPI boundary. Threading it via
ConnectorSession (the FIX-FORCE-JNI precedent) was rejected — the agg-op is a
per-query planner output, not a SET-variable, and would be a silent untyped
channel.

Solution (3 files; user signed off, D-054):
- SPI (ConnectorScanPlanProvider): new default planScan overload carrying
  `boolean countPushdown`, delegating to the 6-arg variant — mirrors the
  limit/requiredPartitions extension chain; other connectors are no-op (E15).
- fe-core (PluginDrivenScanNode.getSplits): read
  getPushDownAggNoGroupingOp()==TPushAggOp.COUNT and forward the flag. No
  post-loop math.
- connector (PaimonScanPlanProvider): extract planScanInternal(...,countPushdown)
  (4-arg delegates false, new 7-arg delegates the flag); add the count
  short-circuit as the FIRST routing arm (a count-eligible split must not also
  emit a data range, else BE double-counts vs deletion vectors / PK merge);
  collapse-to-one — sum every count-eligible split's mergedRowCount and emit ONE
  JNI count range bearing the total (= legacy's <=10000 singletonList +
  assignCountToSplits case). New members: static isCountPushdownSplit + buildCountRange.

Param shape = boolean (BE only needs COUNT-vs-not), scope = paimon-only
(default no-op). Legacy's >10000 parallel-split trim is intentionally dropped
(connector has no numBackends, an fe-core-only concern) — perf-only divergence,
result identical (DV-032). No new thrift, no BE change.
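
The collapse-to-one routing can be sketched as follows. This is an illustration under stated assumptions, not the connector code: `Split` and `Plan` are hypothetical stand-ins (the real code works on paimon `DataSplit` and emits `PaimonScanRange`), and a negative `mergedRowCount` is assumed here to mean "no precomputed merged count".

```java
import java.util.ArrayList;
import java.util.List;

final class CountPushdownSketch {
    /** Hypothetical stand-in for a paimon split. */
    static final class Split {
        final String name;
        final long mergedRowCount; // assumed: < 0 means no precomputed count
        Split(String name, long mergedRowCount) {
            this.name = name;
            this.mergedRowCount = mergedRowCount;
        }
    }

    /** Hypothetical plan result: one optional count range plus ordinary data splits. */
    static final class Plan {
        final long countRangeTotal; // -1 when no count range is emitted
        final List<Split> dataSplits;
        Plan(long countRangeTotal, List<Split> dataSplits) {
            this.countRangeTotal = countRangeTotal;
            this.dataSplits = dataSplits;
        }
    }

    static boolean isCountPushdownSplit(Split s, boolean countPushdown) {
        return countPushdown && s.mergedRowCount >= 0;
    }

    static Plan plan(List<Split> splits, boolean countPushdown) {
        long countSum = 0;
        boolean anyCounted = false;
        List<Split> data = new ArrayList<>();
        for (Split s : splits) {
            // Count arm goes first: a counted split must not also emit a data range.
            if (isCountPushdownSplit(s, countPushdown)) {
                countSum += s.mergedRowCount;
                anyCounted = true;
            } else {
                data.add(s);
            }
        }
        return new Plan(anyCounted ? countSum : -1, data);
    }
}
```

With two partitions holding 2 and 3 rows, the flag on yields a single total of 5 and no data splits; with the flag off, both splits stay ordinary data splits and no count is carried.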

Tests: connector PaimonScanPlanProviderTest +2 — isCountPushdownSplit eligibility
on a real split (true/2, disabled/false); end-to-end planScan over a PARTITIONED
PK table with asymmetric per-partition counts (2 + 3) asserting collapse-to-one
carrying the SUM (5, unreachable from any single split) and no row_count when the
flag is off. Connector 252/0/0 (1 CI-gated live skip), fe-core compile + checkstyle
0, import-gate clean. Fail-before verified: neuter isCountPushdownSplit->false ->
the count tests red; mutate `countSum +=` -> `=` -> the cross-split-sum assertion
red. Real BE CountReader selection / EXPLAIN = CI-gated live-e2e (existing legacy
paimon count regression covers the BE contract).

Adversarially reviewed (workflow wf_6ead7c2c-b58): one MAJOR caught and fixed
(the collapse/sum test was degenerate on a single-split fixture); two MINORs
refuted (batch-path signal moot for paimon; EXPLAIN count-line drop is cosmetic,
noted in DV-032).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…files for read parallelism (M-3)

Root cause: after cutover, a large native (ORC/Parquet) paimon data file gets
ONE scanner — no intra-file parallelism. The connector's native arm emitted
exactly one PaimonScanRange per RawFile (start=0, length=file.length()). Legacy
PaimonScanNode:434-465 sub-splits each large file via determineTargetFileSplitSize
+ fileSplitter.splitFile. Result is correct (BE reads the whole file either way);
only read parallelism regresses.

Recon (wf_ad764bf6-1c9) confirmed: it is a real gap (ORC/Parquet are
PLAIN/splittable, legacy does sub-split); DV x sub-split is SAFE (paimon
deletion-vector rowids are GLOBAL file row positions, BE native readers report
global positions even within a partial byte range, _kv_cache shares the DV bitmap
across sub-splits keyed by path+offset, iceberg uses the identical machinery on
routinely-split files); and it is pure-connector (the splitter math + 5 session
vars re-stated with plain longs — the connector cannot import fe-core
FileSplitter/SessionVariable).

Solution (pure connector, zero SPI, zero fe-core; D-055):
- Two pure statics: computeFileSplitOffsets(fileLength, targetSplitSize) ports
  FileSplitter.splitFile's specified-size branch byte-for-byte incl. the >1.1D
  tail guard (the last range absorbs a remainder up to 1.1x instead of a tiny
  tail split); determineTargetSplitSize(...) ports determineTargetFileSplitSize +
  applyMaxFileSplitNumLimit (the isBatchMode->0 branch omitted — paimon is never
  batch).
- sessionLong + lazy resolveTargetSplitSize read the 5 file-split session vars via
  the VariableMgr.toMap channel (like isCppReaderEnabled) and sum native-eligible
  file sizes once per scan.
- Native arm: emit one range per [start,length) sub-range via buildNativeRanges,
  attaching the SAME unmodified per-RawFile DeletionFile to EVERY sub-range (DV is
  global-row-position indexed; no offset re-basing). buildNativeRange gains
  (start, length); fileSize stays the whole file length.
- Under COUNT(*) pushdown a native split that is not count-eligible (no precomputed
  merged count, e.g. a DV with null cardinality) is kept WHOLE (target size 0 ->
  one whole-file range), mirroring legacy splittable=!applyCountPushdown.

The split-weight/target-size scheduling nicety is not ported (pre-existing native
path already omitted it; perf/scheduling-only, not correctness) -> DV-033.
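
A sketch of the specified-size split math with the 1.1x tail guard, written from the description above rather than copied from the connector. The `{start, length}` array shape is an assumption, as is returning no ranges for an empty file; target <= 0 is treated as "keep the file whole", matching the count-pushdown note.

```java
import java.util.ArrayList;
import java.util.List;

final class FileSplitSketch {
    private FileSplitSketch() {}

    /** Returns {start, length} pairs that tile [0, fileLength) without gaps. */
    static List<long[]> computeFileSplitOffsets(long fileLength, long targetSplitSize) {
        List<long[]> ranges = new ArrayList<>();
        if (fileLength <= 0) {
            return ranges; // assumed behavior for an empty file
        }
        if (targetSplitSize <= 0) {
            ranges.add(new long[] {0, fileLength}); // whole-file range
            return ranges;
        }
        long remaining = fileLength;
        // Keep cutting full-size ranges while more than 1.1 targets remain.
        while ((double) remaining / targetSplitSize > 1.1D) {
            ranges.add(new long[] {fileLength - remaining, targetSplitSize});
            remaining -= targetSplitSize;
        }
        // The last range absorbs the remainder (up to 1.1x the target).
        if (remaining > 0) {
            ranges.add(new long[] {fileLength - remaining, remaining});
        }
        return ranges;
    }
}
```

For a 250MB file and a 64MB target this gives three 64MB ranges and a 58MB tail; a 70MB file stays a single range because 70/64 is below the 1.1 threshold.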

Tests: connector PaimonScanPlanProviderTest +6 — computeFileSplitOffsets math
(250MB/64MB->4 with 58MB tail, exact-multiple, small-file-whole, empty, target<=0);
determineTargetSplitSize heuristic (file_split_size override, 32MB<->64MB threshold,
max_file_split_num floor); end-to-end append-only fixture (tiny file_split_size ->
>=2 contiguous sub-ranges tiling [0,fileLength); default -> 1 range); DV on every
sub-range; whole-file under count pushdown. Updated the 3 existing buildNativeRange
call sites to the new signature. Connector 258/0/0 (1 CI-gated live skip),
checkstyle 0, import-gate clean. Fail-before verified: neuter computeFileSplitOffsets
-> the 3 splitting tests red; attach DV only to the first sub-range -> the DV test
red. Real BE multi-range + DV read = CI-gated live-e2e (legacy paimon regression
covers the BE contract; no BE change).

Adversarially reviewed (workflow wf_4ac7479d-39d): 2 confirmed and fixed (the
count-pushdown sub-split parity gap + false comment; the missing DV-on-every-sub-range
test), 2 refuted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… hand off P3 coverage-gap verification

- FIX-COUNT-PUSHDOWN (#8, M-2) = 525be03; FIX-NATIVE-SUBSPLIT (#9, M-3) = 2f5f467.
- Both recon'd (multi-scout workflow) + adversarially reviewed before commit; each review
  caught a real finding (degenerate test / parity gap) that was fixed.
- P0/P1/P2 all clear. Next: P3 coverage gaps (verify, not fix) — FIX-HMS-CONFRES re-check,
  DDL write parity, ANALYZE/column-stats, split-count accounting, cross-connector follow-ups.
- task-list #9 commit hash finalized; HANDOFF overwritten.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rejection in PluginDrivenExternalCatalog.createTable

Root cause: the generic fe-core bridge PluginDrivenExternalCatalog.createTable
collapsed legacy PaimonMetadataOps.performCreateTable's ordered remote-then-local
existence probe into a single `exists` OR that was consumed ONLY by the IF NOT
EXISTS branch. The !IF NOT EXISTS path ignored it and unconditionally called
metadata.createTable. So a table present only in the local FE cache (a case-variant
folded onto an existing name under lower_case_meta_names, absent on a case-sensitive
remote) was CREATED remotely instead of rejected with ERR_TABLE_EXISTS_ERROR --
silent metadata corruption. Found by the P3 plugin-vs-legacy parity audit
(adversarially verified); narrow, backend-dependent trigger (filesystem/jdbc paimon;
HMS lowercases so both sides reject). Generic bridge -> also affects MaxCompute /
future iceberg/hudi.

Solution (fe-core bridge only; zero SPI/connector/BE): split the `exists` OR into
remoteExists/localExists; under !IF NOT EXISTS, when localExists is true throw
ERR_TABLE_EXISTS_ERROR (legacy local-arm parity). A remote-only conflict still falls
through to connector.createTable (case A unchanged). Option-2 surgical (D-056); the
residual case-A / all-DDL-op generic-error-code collapse is pre-existing and out of
scope (DV-034).
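
The decision table after the fix can be sketched as below. The enum and method are hypothetical names for illustration; the real bridge throws a DdlException with ERR_TABLE_EXISTS_ERROR and calls `metadata.createTable` rather than returning a value.

```java
final class CreateTableConflictSketch {
    private CreateTableConflictSketch() {}

    enum Outcome { SKIP_ALREADY_EXISTS, REJECT_TABLE_EXISTS, CALL_CONNECTOR_CREATE }

    static Outcome decide(boolean remoteExists, boolean localExists, boolean ifNotExists) {
        if (ifNotExists) {
            // IF NOT EXISTS: any hit, remote or local, makes the statement a no-op.
            return (remoteExists || localExists)
                    ? Outcome.SKIP_ALREADY_EXISTS
                    : Outcome.CALL_CONNECTOR_CREATE;
        }
        if (localExists) {
            // Local-cache hit without IF NOT EXISTS is rejected before touching the remote.
            return Outcome.REJECT_TABLE_EXISTS;
        }
        // Remote-only conflict still falls through; the connector reports its own error.
        return Outcome.CALL_CONNECTOR_CREATE;
    }
}
```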

Tests: new PluginDrivenExternalCatalogDdlRoutingTest
.testCreateTableLocalConflictWithoutIfNotExistsRejects (local-hit + remote-miss +
!IF NOT EXISTS -> asserts DdlException thrown + metadata.createTable never called +
no edit log). fail-before: exactly 1 new test red ("Expected DdlException...nothing
was thrown"); pass-after: 26/0/0. fe-core checkstyle 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…P3-fix landed)

P3 "go check" done via adversarial audit wf_25450c36-b7a: HMS-CONFRES /
ANALYZE-stats / split-count all PARITY_HOLDS; DDL write surfaced one MAJOR
correctness divergence -> FIX-CREATE-TABLE-LOCAL-CONFLICT (67a9b9d).
Updates HANDOFF for next steps (P4 cleanup / B8 legacy removal /
cross-connector follow-up). No P0/P1/P2/P3 blockers remain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4 N10.1)

Root cause: the plugin read-direction type mapping
PaimonTypeMapping.toVarcharType used `len >= 65533` to overflow a paimon
VarCharType to STRING, while legacy PaimonUtil.paimonPrimitiveTypeToDorisType
uses `len > 65533`. 65533 == ScalarType.MAX_VARCHAR_LENGTH is the legal
exact-fit max VARCHAR, not the STRING wildcard, so the connector widened
VARCHAR(65533) to STRING — a DESCRIBE / SHOW CREATE TABLE reported-type
divergence (data and read correctness unaffected; STRING is a superset).

Fix: change the boundary `>= 65533` -> `> 65533` to match legacy byte-for-byte
(pure connector, 1 char). The unreachable `len <= 0` defensive guard is kept
untouched (paimon VarCharType min length is 1).
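
The boundary can be illustrated with a small sketch. It returns type names as strings purely for readability (the real mapping returns Doris type objects), and it leaves out the `len <= 0` defensive guard, whose result is not described here.

```java
final class VarcharBoundarySketch {
    private VarcharBoundarySketch() {}

    // Same value as ScalarType.MAX_VARCHAR_LENGTH per the commit text.
    static final int MAX_VARCHAR_LENGTH = 65533;

    /** Assumes len >= 1, which holds for paimon VarCharType. */
    static String toDorisTypeName(int len) {
        // Strictly greater-than: 65533 is a legal exact-fit VARCHAR, not STRING.
        if (len > MAX_VARCHAR_LENGTH) {
            return "STRING";
        }
        return "VARCHAR(" + len + ")";
    }
}
```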

Tests: new read-direction PaimonTypeMappingReadTest pins the boundary intent
(65532 -> VARCHAR(65532); 65533 -> VARCHAR(65533) [the fix]; 65534 -> STRING).
Fail-before: exactly the 65533 assertion is red ("expected VARCHAR but was STRING");
pass-after green. Full module 260/0/0 (1 CI-gated live skip), checkstyle 0,
connector import-gate clean. No BE/SPI change; reported-type parity otherwise
covered by the CI-gated legacy paimon DESCRIBE regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion value to NULL (P4)

Root cause: PaimonScanRange.populateRangeParams routed paimon partition values
through ConnectorPartitionValues.normalize, which applies Hive-directory
null-sentinel coercion (a value of "\N" or "__HIVE_DEFAULT_PARTITION__" -> isNull).
That coercion is correct for hudi (path-encoded partitions) but wrong for paimon:
paimon partition values are TYPED — serializePartitionValue returns Java-null for a
genuine null and the literal toString() otherwise — so a null is never a directory
sentinel, and the coercion only ever bites a genuine literal value. A string
partition column literally holding "\N" (which paimon does NOT reserve) or
"__HIVE_DEFAULT_PARTITION__" was materialized as SQL NULL instead of the literal on
the native ORC/Parquet read, diverging from legacy PaimonScanNode.setScanParams
(source/PaimonScanNode.java:323-326) and yielding wrong rows for WHERE col='\N' /
col IS NULL. The dominant genuine-NULL case is unaffected (both sides set isNull=true
and BE ignores the rendered value string when is_null==true,
partition_column_filler.h:40-44).

Fix (1 file): derive isNull from the Java null ONLY (render genuine null as "",
legacy-exact); drop the unused ConnectorPartitionValues import. ConnectorPartitionValues
itself is left untouched — hudi (HudiScanRange.java:226) legitimately needs the
Hive-directory coercion. The residual scan-vs-prune skew for a literal
"__HIVE_DEFAULT_PARTITION__" value lives in the generic fe-core prune bridge
(TablePartitionValues), is pre-existing and unchanged by this fix, and is logged as a
deviation.
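
A sketch of the null derivation after the fix, assuming the serialized partition value arrives as a Java String that is null only for a genuine NULL. `Rendered` is a hypothetical holder, not a connector type.

```java
final class PartitionNullSketch {
    private PartitionNullSketch() {}

    /** Hypothetical holder for the (isNull, value) pair sent per partition column. */
    static final class Rendered {
        final boolean isNull;
        final String value;
        Rendered(boolean isNull, String value) {
            this.isNull = isNull;
            this.value = value;
        }
    }

    static Rendered render(String serializedValue) {
        if (serializedValue == null) {
            // Genuine NULL: flag it and render the empty string.
            return new Rendered(true, "");
        }
        // Any non-null string is a literal, including Hive-style sentinels.
        return new Rendered(false, serializedValue);
    }
}
```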

Tests: new PaimonScanRangePartitionNullTest pins genuine-null -> (isNull=true, "");
literal "\N" -> (isNull=false, "\N"); literal "__HIVE_DEFAULT_PARTITION__" ->
(isNull=false, verbatim); ordinary -> kept. Fail-before (re-inlined coercion) reds the
literal + render rows; pass-after green. Full module 261/0/0 (1 CI-gated live skip),
checkstyle 0, import-gate clean. Adversarial review (5 angles) SAFE_TO_COMMIT: total
convergence of all 3 range builders on populateRangeParams; no query goes correct->wrong.
No BE/SPI change; native partition materialization otherwise covered by the CI-gated
legacy paimon partition regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…035])

Records the P4 cleanup pass disposition (P0–P4 now all clear):
- FIX-VARCHAR-BOUNDARY (N10.1) `bcee91dcb52` + FIX-PARTITION-NULL-SENTINEL
  `4b2c2190dc2` landed as independent fix commits.
- 15 items accepted as deviations (M5.1 transient-only + 14
  display/perf/text/inert/connector-more-correct/false-premise) → [DV-035].
- D-057 logs the user-signed scope; DV-035 the accepted batch.
- task-list §P4 marked done; HANDOFF rolled to next session (B8 legacy
  deletion or cross-connector follow-up batch).

Read-only adversarial recon `wf_6884d37b-8ef` re-verified all ~17 review §5/§7
items against current code; the sentinel ACCEPT verdict was refuted by a
prune-path skeptic (converted to FIX) and M5.1's "cheap fallback" premise was
refuted at impl level (confirmed ACCEPT).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ROPERTIES to paimon

Root cause: branch commit 98a73bf (D-046 paimon parity) added LOCATION+PROPERTIES
emission to the SHARED PLUGIN_EXTERNAL_TABLE branch of Env.getDdlStmt, gated only on
!properties.isEmpty(). JDBC/ES/Trino catalogs are plugin-driven with non-empty
getTableProperties() (connection props incl. credentials), so SHOW CREATE TABLE on a JDBC
external table emitted LOCATION '' + PROPERTIES("password"=...) instead of the legacy
comment-only ENGINE=JDBC_EXTERNAL_TABLE; — a correctness regression
(test_nereids_refresh_catalog) and a JDBC credential leak. Still present on HEAD.

Solution: gate the LOCATION+PROPERTIES emission additionally on
TableType.PAIMON_EXTERNAL_TABLE.name().equals(getEngineTableTypeName()) — only the paimon
engine type (the sole plugin-driven connector whose legacy DDL carried LOCATION/PROPERTIES)
renders them. JDBC/ES/Trino/MaxCompute revert to comment-only; the credential leak is
closed. Did NOT rebaseline the .out (would entrench the leaked-credential output).

Tests: fe-core compile SUCCESS + checkstyle clean; adversarial static review SOUND (paimon
incl. sys-table unwrap still renders LOCATION/PROPERTIES; jdbc/es/trino/maxcompute match
committed comment-only .out; getTableProperties has no other DDL consumer). e2e:
external_table_p0/nereids_commands/test_nereids_refresh_catalog (CI external pipeline). See
plan-doc/FIX-SHOWCREATE-PLUGIN-PROPS-{design,summary}.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@morningman force-pushed the catalog-spi-07-paimon branch from d6c93da to f7114a2 on June 12, 2026 14:23
morningman and others added 28 commits June 13, 2026 06:05
…ma-cache (CI 968828)

Root cause: PluginDrivenSysExternalTable did not override getSchemaCacheValue(), so it
inherited ExternalTable.getSchemaCacheValue() which routes through ExternalCatalog.getSchema()
and re-resolves the table by name in the db map. A transient system table (e.g. tbl$snapshots /
tbl$manifests) is never registered in that map, so the lookup failed with "failed to load schema
cache value for: ...$snapshots". Regression from the paimon SPI migration; legacy
PaimonSysExternalTable avoided it by overriding getSchemaCacheValue()/initSchema() to compute on
the transient instance.

Solution: override getSchemaCacheValue() (and initSchema(SchemaCacheKey)) to compute the schema
directly via the inherited PluginDrivenExternalTable.initSchema() (which honors this class's
resolveConnectorTableHandle that threads the sys-table handle), memoized with double-checked
locking — mirroring legacy PaimonSysExternalTable.
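
The memoization pattern named above, as a generic sketch. The class is hypothetical and takes a `Supplier` where the real override would call the inherited `initSchema()`; it assumes the loader never returns null.

```java
import java.util.function.Supplier;

final class MemoizedSchemaSketch<T> {
    private final Supplier<T> loader;
    // volatile so a fully built value is visible to readers outside the lock.
    private volatile T cached;

    MemoizedSchemaSketch(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached; // re-check under the lock
                if (local == null) {
                    local = loader.get();
                    cached = local;
                }
            }
        }
        return local;
    }
}
```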

Tests: covered by existing e2e suites paimon_system_table ($manifests), paimon_time_travel
($snapshots), test_paimon_system_table_auth (re-run in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…68828)

Root cause: PaimonConnectorMetadata.mapFields built ConnectorColumn via the 5-arg ctor, which
defaults isKey=false; ConnectorColumnConverter propagates it, so DESC showed Key=false for every
paimon column. Legacy PaimonExternalTable/PaimonSysExternalTable always set Column isKey=true (3rd
positional arg) for every column, so the .out files expect Key=true. Caused test_paimon_schema_change,
test_paimon_char_varchar_type, test_paimon_timestamp_with_time_zone DESC diffs.

Solution: pass isKey=true via the 6-arg ConnectorColumn ctor in mapFields (single chokepoint for
latest + at-snapshot + system-table schema paths; toSchemaCacheValue preserves isKey on remap).

Tests: extended PaimonConnectorMetadataTest.getTableSchemaForcesColumnsNullableForLegacyParity to
pin isKey=true for both a PK and a non-PK column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… split (CI 968828)

Root cause: the paimon (and hudi) plugin-zip bundled org.apache.thrift:libthrift and loaded
org.apache.thrift.* child-first (not in the connector parent-first allowlist), while fe-thrift is
provided so org.apache.doris.thrift.TFileScanRangeParams resolves parent-first and implements the
PARENT's TBase. PaimonScanPlanProvider.encodeSchemaEvolution()'s TSerializer.serialize(carrier)
then mixes a child TSerializer with a parent-TBase carrier -> IncompatibleClassChangeError. Being an
Error (not Exception), it escaped catch(Exception) and the connection handler, killing the mysql
session. This was the dominant CI failure (~19 tests: 2 ANALYZE, the family-D connection drops, and
the predict/timestamp_tz/sql_block_rule explain failures).

Solution:
- Exclude org.apache.doris:fe-thrift + org.apache.thrift:libthrift from the paimon and hudi
  plugin-zip assemblies, so org.apache.thrift.* resolves from the single parent fe-core copy that
  also owns org.apache.doris.thrift.* (matches the es/jdbc/hive/maxcompute assemblies).
- Defense-in-depth: broaden encodeSchemaEvolution's catch to Exception | LinkageError so any future
  linkage error surfaces as a clean per-query failure instead of an uncaught Error that kills the
  whole connection (this is what turned ~5 real failures into ~19 collateral ones).
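
Why the broadened catch matters: `IncompatibleClassChangeError` extends `LinkageError`, which is an `Error`, so `catch (Exception e)` never sees it. The sketch below uses a hypothetical `Encoder` interface and returns a message string for illustration; how the real `encodeSchemaEvolution` reports the failure is not shown here.

```java
final class LinkageGuardSketch {
    private LinkageGuardSketch() {}

    /** Hypothetical stand-in for the serialization step. */
    interface Encoder {
        String encode() throws Exception;
    }

    static String encodeOrFail(Encoder encoder) {
        try {
            return encoder.encode();
        } catch (Exception | LinkageError e) {
            // A linkage problem becomes a per-query failure instead of an uncaught Error.
            return "query failed: " + e.getClass().getSimpleName();
        }
    }
}
```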

Verified: rebuilt paimon and hudi plugin zips no longer contain libthrift/fe-thrift.
Tests: e2e re-run in CI (the native-path paimon suites).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ilter scans (CI 968828)

Root cause: on the SPI plugin scan path, PaimonScanPlanProvider.getScanNodeProperties emitted the
paimon.predicate property only when filter.isPresent() && !predicates.isEmpty(), and
populateScanLevelParams set the thrift field only when non-null. So a paimon read with no
pushed-down filter (e.g. force_jni_scanner=true `select *`) omitted paimon_predicate entirely; BE
then omitted the JNI key, and PaimonJniScanner.getPredicates() called PaimonUtils.deserialize(null)
-> NPE "encodedStr is null". Legacy PaimonScanNode.createScanRangeLocations always serialized the
(possibly empty) predicate list, so the field was always present. Caused test_paimon_catalog_varbinary,
paimon_tb_mix_format, paimon_partition_legacy, paimon_timestamp_types, test_paimon_partition_table.

Solution:
- getScanNodeProperties always serializes the predicate list (empty list -> non-null base64 string)
  and emits paimon.predicate unconditionally, restoring the legacy invariant.
- BE backstop: PaimonJniScanner.getPredicates() treats a null paimon_predicate param as "no filter"
  (returns emptyList) so the JNI reader never NPEs on a missing param.
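
Both halves of the fix can be sketched together. This uses plain Java serialization plus Base64 as a stand-in (the real code serializes paimon predicate objects through paimon utilities), and a `List<String>` stands in for the predicate list; all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

final class PredicateParamSketch {
    private PredicateParamSketch() {}

    /** FE side: always produce a non-null string, even for an empty list. */
    static String encode(List<String> predicates) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new ArrayList<>(predicates));
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reader side backstop: a missing param means "no filter". */
    static List<String> decode(String encoded) {
        if (encoded == null) {
            return Collections.emptyList();
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            @SuppressWarnings("unchecked")
            List<String> list = (List<String>) in.readObject();
            return list;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("corrupt predicate param", e);
        }
    }
}
```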

Tests: PaimonScanPlanProviderTest.getScanNodePropertiesAlwaysEmitsPredicateForNoFilterScan pins that
a no-filter scan emits paimon.predicate and it deserializes to an empty list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8-family root-cause analysis (adversarially verified) of the 37 external-regression failures.
7 in-scope paimon-SPI regressions + 2 out-of-scope (hive CTAS stale test; BE shutdown ASAN race).
RC-1/2/6/7 fixed (contained); RC-3/4/5 deferred to the docker-gated self-contained-classloader batch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…imon plugin (CI 968828)

Root cause: the connector sets fs.oss.impl=com.aliyun.jindodata.oss.JindoOssFileSystem, but that impl
ships only in the thirdparty jindofs jars (packaged by post-build.sh into fe/lib/jindofs, not a maven
artifact). The paimon plugin runs child-first, so JindoOssFileSystem resolves from the parent and
cannot be cast to the plugin's child-loaded org.apache.hadoop.fs.FileSystem -> "JindoOssFileSystem
cannot be cast to FileSystem" -> "Unknown database" on first OSS listing (paimon_base_filesystem,
test_paimon_deletion_vector_oss). The maven route is unbuildable (jindo-sdk/jindo-core are bound to an
undeclared jindodata repo -> "present but unavailable"; runtime jindofs is 6.10.4, not in maven).

Solution: after deploying the connector plugins, copy the jindofs jars (already placed in fe/lib/jindofs
by post-build.sh) into the paimon plugin lib so JindoOssFileSystem loads child-first alongside the
plugin's own hadoop FileSystem. Naturally gated (no-op unless --jindofs/DISABLE_BUILD_JINDOFS=OFF).

CAVEAT (docker-gated, enablePaimonTest=true): jindo-core ships a native lib that binds to one
classloader per JVM, so this is safe only while no concurrent non-paimon path loads jindo from
fe/lib/jindofs in the same FE process — must be confirmed by the docker paimon suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on plugin (CI 968828)

Root cause: the prior fix (FIX-PAIMON-HADOOP-CLASSLOADER) bundled hadoop-aws into the plugin
(S3AFileSystem child-first) but NOT the AWS SDK v2 (hadoop-aws declares it as software.amazon.awssdk:bundle,
which fe/pom.xml excludes). So the plugin's S3AInternalAuditConstants.<clinit> registered an
ExecutionAttribute against the single PARENT-loaded sdk-core static, colliding with fe-core's S3A in
ExecutionAttribute.ensureUnique() -> ExceptionInInitializerError that permanently poisoned S3A for the
whole FE JVM (test_iceberg_jdbc_catalog/statistics/case_sensibility, test_paimon_statistics).

Solution: bundle the AWS SDK v2 (software.amazon.awssdk:s3 + apache-client, BOM-managed 2.29.52) into the
plugin child-first, so the plugin's S3A registers against its OWN ExecutionAttribute static. s3's compile
closure brings sdk-core (ExecutionAttribute); apache-client is explicit (hadoop-aws wires ApacheHttpClient).
software.amazon.awssdk stays child-first (not parent-first) — the separate child SDK copy is the point.

Verified: rebuilt plugin zip bundles lib/sdk-core-2.29.52.jar containing
software/amazon/awssdk/core/interceptor/ExecutionAttribute.class. Runtime S3A read + assumed-role/STS
docker-gated (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… client (CI 968828)

Root cause: paimon-hive-connector's RetryingMetaStoreClientFactory probes getProxy(HiveConf,...) via
reflection, but RetryingMetaStoreClient/HiveMetaHookLoader resolved from the parent hive-catalog-shade-3.1.1
whose getProxy overloads use the PARENT's Configuration/HiveConf Class objects -> exact Class-identity
mismatch across loaders -> all probes NoSuchMethodException -> "Failed to create the desired metastore
client" (test_create_paimon_table). The metastore itself is reachable.

Solution: bundle org.apache.hive:hive-metastore:2.3.7 (RetryingMetaStoreClient/HiveMetaStoreClient/
HiveMetaHookLoader + metastore api) child-first so its getProxy(HiveConf,...) overloads compile against the
SAME child-bundled hive-common-2.3.9 HiveConf the connector builds. 2.3.7 pairs with hive-common 2.3.9
(API-stable HiveConf) and is fastutil-CLEAN, so unlike hive-catalog-shade it does not reintroduce the
fastutil collision. libfb303 rides transitively; server-side datanucleus/derby/hbase/tephra, the stale
hadoop-2.7.2 trio + guava, and libthrift are excluded (libthrift stays parent-first like the other
connectors).

Verified: rebuilt plugin zip bundles lib/hive-metastore-2.3.7.jar (RetryingMetaStoreClient with 5
getProxy(HiveConf) overloads) + libfb303; 0 fastutil entries; no hadoop-2.7.2 leak. The thrift
0.9.3-vs-host-0.16.0 wire skew and the DLF ProxyMetaStoreClient path are docker-gated (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RC-3 AWS SDK (b5205c4), RC-5 HMS client (7841830), RC-4 jindo via build.sh (e881247).
Runtime behavior gated on the docker paimon suite (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… BE crash (CI 968880)

Root cause (FE behavior change, no BE change): the paimon SPI scan path declared partition columns
inconsistently across its two FE channels. The per-split PaimonScanRange.populateRangeParams emits the
partition columns as columnsFromPath (so the BE APPENDS them), but the connector never emitted the
scan-node-level path_partition_keys property, so PluginDrivenScanNode.getPathPartitionKeys() returned
empty -> FileQueryScanNode.initSchemaParams did NOT exclude the partition columns from the file/decode
set (num_of_columns_from_file + classifyColumn). Since paimon physically stores partition columns IN the
ORC data file, the native OrcReader both DECODED dt/hh from the file AND APPENDED them from
columnsFromPath -> a row-count double-fill (dt column rows=2 vs data block rows=1) that aborts the BE via
DCHECK(block->rows()==col.column->size()) at vorc_reader.cpp:2638 (native ORC, intermittent under the
random force_jni_scanner fuzz). Legacy PaimonScanNode.getPathPartitionKeys() returned [dt,hh] and drove
BOTH the file-column exclusion AND the append from one source, so it never double-filled.

Solution: emit the path_partition_keys scan-node property (lower-cased partition key names, matching the
columnsFromPath keys and the Doris column names) in PaimonScanPlanProvider.getScanNodeProperties when the
table is partitioned. This restores the legacy invariant — the BE excludes partition columns from the
file decode set and appends them exactly once — for both the native ORC path (excluded from decode +
appended from columnsFromPath) and the JNI path (projected out of required_fields + filled by
_fill_columns_from_path). Mirrors the hive connector. The BE is unchanged.
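
The double-fill can be modeled in a few lines. This is a toy model of the invariant, not the FE or BE code: the method names are hypothetical, and the reader is reduced to "decode from file" plus "append from path".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PathPartitionKeysSketch {
    private PathPartitionKeysSketch() {}

    /** Columns decoded from the data file: required columns minus path partition keys. */
    static List<String> fileColumns(List<String> required, List<String> pathPartitionKeys) {
        Set<String> keys = new HashSet<>();
        for (String k : pathPartitionKeys) {
            keys.add(k.toLowerCase(Locale.ROOT));
        }
        List<String> out = new ArrayList<>();
        for (String c : required) {
            if (!keys.contains(c.toLowerCase(Locale.ROOT))) {
                out.add(c);
            }
        }
        return out;
    }

    /** How often a column is filled: once if decoded from file, once if appended from path. */
    static int fillCount(String column, List<String> required,
                         List<String> pathPartitionKeys, List<String> columnsFromPath) {
        int fills = 0;
        if (fileColumns(required, pathPartitionKeys).contains(column)) {
            fills++;
        }
        if (columnsFromPath.contains(column)) {
            fills++;
        }
        return fills;
    }
}
```

With an empty key list, `dt` is filled twice (the pre-fix state); with `[dt, hh]` declared, every column is filled exactly once.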

Tests: PaimonScanPlanProviderTest.getScanNodePropertiesEmitsPathPartitionKeysForPartitionedTable pins
that a partitioned paimon table emits path_partition_keys.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…AWS-SDK interceptor cross-loader skew (CI 968994)

- root cause: plugin bundles hadoop-aws+s3+sdk-core child-first but NOT s3-transfer-manager.
  The SPI resource software/amazon/awssdk/services/s3/execution.interceptors (+ its
  ApplyUserAgentInterceptor) lives only in s3-transfer-manager.jar. ChildFirstClassLoader
  found no child copy and fell back to the PARENT s3-transfer-manager, whose
  ApplyUserAgentInterceptor implements the PARENT sdk-core ExecutionInterceptor (a different
  Class than the child's) -> SdkClientException -> S3A broken -> 'no file io for scheme s3'
  -> 'Unknown database' cascade (swallowed at ExternalCatalog.buildDbForInit:914).
- solution: bundle software.amazon.awssdk:s3-transfer-manager child-first (BOM-managed 2.29.52)
  so the resource + interceptor resolve against the child sdk-core.
- fixes Class A: 6 s3 tests (test_paimon_s3/minio/schema_change/char_varchar_type/
  full_schema_change/jdbc_catalog) + 18 'Unknown database' collateral.
- verified: zip lib/ now bundles s3-transfer-manager-2.29.52.jar; dependency:tree clean.
  Runtime gate: docker enablePaimonTest=true.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cross-loader cast (CI 968994)

- root cause: plugin did not bundle hadoop-huaweicloud, so OBSFileSystem resolved from the
  parent 'app' loader while the plugin's FileSystem is child-first -> 'OBSFileSystem cannot be
  cast to FileSystem' (paimon_base_filesystem obs:// branch). Same shape hadoop-aws already fixed
  for s3a.
- solution: bundle com.huaweicloud:hadoop-huaweicloud (managed 3.1.1-hw-46, compile) child-first;
  the -hw-46 jar is a fat jar self-containing OBSFileSystem + the OBS SDK (com/obs/*), so OBS is
  self-consistent in one child-first jar. hadoop-common stays the plugin's direct depth-1 copy
  via Maven mediation (no duplicate FileSystem). Consistent with fe-core/hadoop-deps which already
  depend on the same artifact.
- fixes Class B: paimon_base_filesystem.
- verified: zip lib/ now bundles hadoop-huaweicloud-3.1.1-hw-46.jar; dependency:tree clean.
  Runtime gate: docker enablePaimonTest=true.
- docs: plan-doc/task-list.md, plan-doc/fix-ab-packaging-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…amedTransport NoClassDefFound (CI 968994)

- root cause: paimon metastore=hive catalogs threw NoClassDefFoundError
  org/apache/thrift/transport/TFramedTransport. paimon's RetryingMetaStoreClientFactory
  reflects HiveMetaStoreClient (hive-metastore 2.3.7) constructor signatures, which reference
  the thrift-0.9.x old-package TFramedTransport. Host libthrift 0.16.0 moved it to
  .transport.layered, and RC-1 keeps org.apache.thrift parent-first (libthrift excluded from the
  plugin) so the doris-gen TSerializer/TBase 0.16.0 path works -> the old-package class is
  unsatisfiable. The two thrift consumers (doris-gen 0.16.0 vs HMS-client 0.9.x) cannot share the
  original org.apache.thrift namespace in one loader.
- solution: new module fe-connector-paimon-hive-shade that bundles paimon-hive-connector-3.1 +
  hive-metastore 2.3.7 + hive-common 2.3.9 + libthrift 0.9.3 and relocates org.apache.thrift ->
  org.apache.doris.paimon.shaded.thrift (+ defensive it.unimi.dsi.fastutil relocation). The
  connector depends on this shade instead of the raw hive deps. Mirrors the existing
  hive-catalog-shade precedent. The doris-gen 0.16.0 thrift path stays parent-first, untouched.
- fixes Class C: test_create_paimon_table, test_paimon_statistics.
- verified (static, runtime gate is docker enablePaimonTest=true):
  * shade jar: relocated org/apache/doris/paimon/shaded/thrift/transport/TFramedTransport present;
    HiveMetaStoreClient references the relocated name (0 references to the original).
  * plugin zip lib/: 0 genuine top-level org.apache.thrift .class (RC-1 preserved); no raw
    paimon-hive-connector/libthrift/hive-metastore/hive-common/hive-shims jars; shade jar present;
    single HiveConf.class; paimon-core still its own jar; HiveCatalogFactory SPI intact.
  * UT: fe-connector-paimon 285/0/0 incl PaimonTableSerdeRoundTripTest (RC-1 guard) + PaimonCatalogFactoryTest.
- 2 build-config notes: shade filter drops META-INF/versions/** (paimon-hive fat bundle ships
  Java-22 MR-jar classes shade's rewriter cannot parse; they are excluded parquet/jackson internals);
  shaded-in deps marked <optional> so the connector plugin-zip does not re-bundle the raw jars.
- new module fe-connector-paimon-hive-shade; fe/fe-connector/pom.xml module registration;
  fe-connector-paimon dependency swap. No production Java change. Design: plan-doc/fix-c-hms-thrift-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed by SPI cutover (CI 968994)

- root cause: PluginDrivenScanNode.getNodeExplainString is a full override that does NOT call
  super (FileScanNode), so the SPI paimon scan path silently dropped explain lines the legacy
  PaimonScanNode emitted: 'pushdown agg=COUNT (n)', the VERBOSE dataFileNum/deleteFileNum/
  deleteSplitNum block, and 'paimonNativeReadSplits=<raw>/<total>'. The 5 tests are PURE DISPLAY
  gaps — data queries return correct values; only the explain text was missing the lines
  (plugindriven-explain-override-gap re-manifested for paimon).
- solution (paimon-gated so other plugin connectors stay byte-unchanged):
  * SPI: ConnectorScanRange.getPushDownRowCount() (-1 default) + isNativeReadRange() (false default);
    ConnectorScanPlanProvider.getDeleteFiles(TTableFormatFileDesc) (empty default).
  * PaimonScanRange overrides the two getters (paimon.row_count / paimon.split).
  * PaimonScanPlanProvider.appendExplainInfo emits paimonNativeReadSplits from synthetic count keys
    the node injects; getDeleteFiles ports legacy PaimonScanNode.getDeleteFiles.
  * FileScanNode: behavior-neutral extract-method appendBackendScanRangeDetail (verbatim VERBOSE block).
  * PluginDrivenScanNode: accumulate native/total + pushdown-count in getSplits (pure statics);
    override getDeleteFiles; emit 'pushdown agg' UNGATED (restores the line FileScanNode emits for
    every other scan node), VERBOSE delete block paimon-gated, paimonNativeReadSplits paimon-only.
- fixes Class E: test_paimon_count, test_paimon_deletion_vector, test_paimon_deletion_vector_oss,
  test_paimon_catalog_varbinary, test_paimon_catalog_timestamp_tz.
- tests (independently re-run, build cache disabled): PluginDrivenScanNodeExplainStatsTest 7/7,
  PluginDrivenScanNodeDeleteFilesTest 4/4, PaimonScanExplainTest 9/9; existing
  PluginDrivenScanNodePartitionCountTest 5/5 (no shared-node regression). Tests encode WHY
  (the -1 sentinel survival, 0/N native accounting). Runtime gate: docker enablePaimonTest=true
  comparison-mode run cross-checks the values vs .out.
- shared FileScanNode/PluginDrivenScanNode changes verified non-perturbing to es/jdbc/maxcompute/
  iceberg/hive (extract is byte-identical; pushdown agg matches FileScanNode's unconditional emit).
  Design: plan-doc/fix-e-explain-gap-design.md.
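
The SPI additions above can be sketched as follows. This is a minimal illustration, not the Doris code: `ScanRangeSketch` and `ExplainStatsSketch` are hypothetical names, only the two getter names and their defaults (-1 and false) come from this commit, and the accumulation logic is an assumption about how a node could preserve the -1 sentinel and count native ranges.

```java
import java.util.List;

/** Hypothetical stand-in for the SPI scan range; only the two getters named in the commit are modeled. */
interface ScanRangeSketch {
    /** -1 means "this range carries no pushed-down COUNT". */
    default long getPushDownRowCount() {
        return -1L;
    }

    /** false means the range is not read by the native reader. */
    default boolean isNativeReadRange() {
        return false;
    }
}

final class ExplainStatsSketch {
    private ExplainStatsSketch() {}

    /** Sums pushed-down counts; stays at the -1 sentinel when no range reports one (assumed behavior). */
    static long sumPushDownRowCount(List<? extends ScanRangeSketch> ranges) {
        long total = -1L;
        for (ScanRangeSketch range : ranges) {
            long count = range.getPushDownRowCount();
            if (count >= 0) {
                total = (total < 0 ? 0 : total) + count;
            }
        }
        return total;
    }

    /** Renders the native/total accounting line from the per-range flags. */
    static String nativeReadSplits(List<? extends ScanRangeSketch> ranges) {
        long nativeCount = ranges.stream().filter(ScanRangeSketch::isNativeReadRange).count();
        return "paimonNativeReadSplits=" + nativeCount + "/" + ranges.size();
    }
}
```

Because both getters have defaults, a connector that overrides neither keeps the sentinel and reports 0 native ranges, which is what keeps other plugin connectors unchanged.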

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or build resolves hadoop-huaweicloud (CI 968994)

- FIX-PAIMON-OBS-SELFCONTAINED (3c7adfe) added a com.huaweicloud:hadoop-huaweicloud dep to
  fe-connector-paimon, but that artifact (3.1.1-hw-46) is NOT in Maven Central / the Apache repos.
  fe-core resolves it via a <repository> id=huawei-obs-sdk it declares locally; the connector module
  does not inherit it (fe-connector / fe declare no repositories), so a clean-env FE build failed:
  'hadoop-huaweicloud:jar:3.1.1-hw-46 was not found in https://repo.maven.apache.org/maven2'.
  (My earlier local build only passed because the jar was already cached in ~/.m2 from a full FE build.)
- fix: declare the huawei-obs-sdk repository (https://repo.huaweicloud.com/repository/maven/huaweicloudsdk/)
  in fe-connector-paimon/pom.xml, mirroring fe-core; and scope the dep 'runtime' (mirrors fe-core —
  OBSFileSystem is loaded reflectively via the Hadoop FileSystem SPI, not referenced at compile time;
  plugin-zip.xml still bundles the runtime closure).
- verified: removed hadoop-huaweicloud from ~/.m2, rebuilt non-offline -> re-fetched from huawei-obs-sdk
  (_remote.repositories), plugin zip still bundles hadoop-huaweicloud-3.1.1-hw-46.jar. Repo serves the
  jar (HTTP 200). Local mirror is mirrorOf=central, so the huawei repo is reached directly (as in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…via SPI-routed CatalogFactory (CI 968994)

- PaimonMetadataOpsTest.beforeClass failed with 'No connector plugin loaded for catalog type
  paimon'. Pre-existing breakage (NOT from the CI-968994 packaging/explain fixes): 'paimon' was
  added to CatalogFactory.SPI_READY_TYPES by the SPI-framework cutover (5c32565), so
  CatalogFactory.createFromCommand('paimon', ...) now routes through the connector-plugin SPI and
  returns a PluginDrivenExternalCatalog. That call throws when no plugin is installed in
  connector_plugin_root (the case in a plain fe-core UT), and even when a plugin is loaded the
  result is not castable to the legacy PaimonExternalCatalog the test casts it to. Either way
  beforeClass aborted the class.
- fix: the test exercises the still-live legacy PaimonMetadataOps, so construct the legacy
  filesystem catalog directly (new PaimonFileExternalCatalog(...) + makeSureInitialized()) instead
  of through the SPI-routed factory (mirrors ExternalMetaCacheRouteResolverTest which constructs
  new PaimonExternalCatalog(...) directly). Dropped the now-unused CatalogFactory/CreateCatalogCommand
  imports. No production change.
- verified: mvn -pl fe-core -am test -Dtest=PaimonMetadataOpsTest -Dmaven.build.cache.enabled=false
  -> Tests run: 6, Failures: 0, Errors: 0. (Only this test had the CatalogFactory-cast pattern.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… requested columns (CI 969249)

Root cause: a single BE FATAL (Check failed: children.contains(table_column_name),
table_schema_change_helper.h:166 <- vparquet_reader.cpp:488) on
"SELECT * FROM test_paimon_spark.test_schema_change" aborted the whole BE for the rest of
the run, cascading into ~47 "No backend available as scan node" collateral failures.

FIX-SCHEMA-EVOLUTION (01b7642) added current_schema_id=-1 + history_schema_info, which
switched BE from name-based file<->table matching onto the field-id path. That path keys the
table-side StructNode by the -1/current entry's field names and then looks up each query slot
(base_ctx->column_names) in it; a slot absent from the -1 entry trips the DCHECK. The
connector built the -1 entry from an INDEPENDENT paimon-SDK read (fileStoreTable.schema()) —
a different source than the Doris column list fe-core turns into the BE scan slots. When the
two skew (this Spark table did ALTER TABLE ADD COLUMN after its last snapshot, so the
resolved schema lagged the latest schema the slots come from) the added column was missing
from the -1 entry -> abort.

Legacy PaimonScanNode.doInitialize built the -1 entry from getTargetTable().getColumns() —
the SAME list as the slots — so the names matched by construction and the lookup could never
miss. Restore that invariant connector-side: buildSchemaEvolutionParam now keys the -1 entry
off the requested `columns` via selectCurrentSchemaFields, matching each to a paimon DataField
by name — the resolved (snapshot-pinned) schema wins on a name collision (time-travel + rename
stay correct), with the fresh latest() schema as a fallback so an add-column-after-snapshot
column is carried with its real field id (older files then fill NULL, the correct result).
current_schema_id stays -1 (legacy sentinel). Fails loud if a requested column is in neither
schema. +4 unit tests (add-column-after-snapshot, rename time-travel collision, fail-loud,
empty-columns count scan).
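
The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the connector code: schemas are modeled as plain name -> field-id maps instead of paimon DataField lists, names are matched exactly, and `SchemaFieldSelectionSketch` is a hypothetical class name. Only the method name `selectCurrentSchemaFields` and the precedence (resolved schema first, latest schema as fallback, fail loud otherwise) come from this commit.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SchemaFieldSelectionSketch {
    private SchemaFieldSelectionSketch() {}

    /**
     * Returns name -> field id for the requested columns, in request order.
     * Both schemas are simplified to name -> field id maps.
     */
    static Map<String, Integer> selectCurrentSchemaFields(
            List<String> requestedColumns,
            Map<String, Integer> resolvedSchema,
            Map<String, Integer> latestSchema) {
        Map<String, Integer> selected = new LinkedHashMap<>();
        for (String column : requestedColumns) {
            // The snapshot-pinned (resolved) schema wins on a name collision.
            Integer fieldId = resolvedSchema.get(column);
            if (fieldId == null) {
                // Fallback: a column added after the last snapshot exists only in the latest schema.
                fieldId = latestSchema.get(column);
            }
            if (fieldId == null) {
                throw new IllegalStateException(
                        "requested column '" + column + "' is in neither the resolved nor the latest schema");
            }
            selected.put(column, fieldId);
        }
        return selected;
    }
}
```

Keying the result off the requested columns is what guarantees that every scan slot has a matching entry, so the lookup on the BE side cannot miss.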

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…m catalogs over hdfs (CI 969249)

Root cause: filesystem-metastore paimon catalogs on hdfs:// warehouses failed to create with
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs" (43x in
fe.log), swallowed by ExternalCatalog.buildDbForInit into a misleading "Unknown database"
(test_paimon_catalog_varbinary, test_catalog_upgrade_test).

The plugin runs child-first and no longer carries an hdfs FileSystem impl: hadoop-common's
service file registers only Local/viewfs/Har/Http(s), and FIX-PAIMON-HMS-THRIFT-SHADE
(5ac8c30) made hive-common <optional> in fe-connector-paimon-hive-shade, so maven-shade
dropped it AND its transitive hadoop-client-api — the prior carrier of DistributedFileSystem.
HMS-flavor catalogs (thrift metadata) and filesystem-on-S3 (hadoop-aws) were unaffected, which
is why only hdfs filesystem catalogs broke.

Add org.apache.hadoop:hadoop-hdfs-client (runtime, ${hadoop.version}) — it carries
DistributedFileSystem + the hdfs FileSystem service registration and reuses the plugin's single
hadoop-common FileSystem (hadoop-common excluded to keep exactly one copy — no cross-loader
split). Same self-contained child-first pattern as hadoop-aws/hadoop-huaweicloud. Verified on
the assembled plugin zip: DistributedFileSystem carriers=1, FileSystem.class carriers=1, hdfs
service entry present. Also corrects the now-false pom comment claiming hadoop-client-api is
transitively bundled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ent-core for buildRealDataSplit tests (CI 969249)

The count-pushdown / native sub-split / cpp-reader serde tests in PaimonScanPlanProviderTest
write a REAL local-filesystem paimon table (buildRealDataSplit) to produce a real DataSplit;
paimon's writer/commit path references org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
That class reached the module transitively via hive-common until FIX-PAIMON-HMS-THRIFT-SHADE
(5ac8c30) made hive-common <optional> in fe-connector-paimon-hive-shade, severing it and
leaving 5 tests with NoClassDefFoundError: FileInputFormat (a pre-existing breakage, not from
the FIX-PAIMON-SCHEMA-DICT-SLOTS / FIX-PAIMON-HDFS-CLIENT changes).

Add hadoop-mapreduce-client-core at test scope only (version from dependencyManagement). The
production read/planScan path does NOT touch FileInputFormat — paimon reads in CI 969249
succeeded without it and it is absent from the assembled plugin zip — so it must NOT be bundled
into the plugin. Full paimon module suite now passes 297/0/0 (1 live-only skip).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…zedTable) native read

A native read of a paimon $ro (read-optimized) system table aborted the whole
BE with SIGSEGV. $ro resolves to a paimon ReadOptimizedTable, which WRAPS a
FileStoreTable but is NOT instanceof FileStoreTable, so buildSchemaEvolutionParam
skipped it and emitted no paimon.schema_evolution prop. With no history_schema_info,
BE's gen_table_info_node_by_field_id fell into the legacy name-matching branch
by_parquet_name(tuple_descriptor, ...). There PaimonParquetReader passes a still-null
_tuple_descriptor (get_tuple_descriptor() is populated only later in _do_init_reader,
after on_before_init_reader in the NVI sequence), which is then dereferenced
(table_schema_change_helper.cpp:94) -> SIGSEGV.

Legacy PaimonScanNode.doInitialize set history_schema_info for ANY paimon table
(incl. $ro) unconditionally, so BE always took the field-id path. Restore that
parity FE-side (BE unchanged): resolveSchemaDictTable unwraps a ReadOptimizedTable
to its base FileStoreTable (reloaded via the 2-arg base Identifier, auth-wrapped
like resolveTable) and builds the dict from it; other non-FileStoreTable tables
(metadata sys tables, which take the JNI path) still emit nothing as before.
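
The unwrap rule can be sketched as follows. All types here are hypothetical stand-ins for the paimon classes named above, and the sketch unwraps through a direct reference, whereas the fix described here reloads the base table through its Identifier; only the three-way outcome (plain table, read-optimized wrapper, anything else) is taken from this commit.

```java
import java.util.Optional;

/** Hypothetical stand-ins for the paimon table types named in the commit. */
interface TableSketch {}

final class FileStoreTableSketch implements TableSketch {
    final String name;

    FileStoreTableSketch(String name) {
        this.name = name;
    }
}

/** Wraps a file store table but is deliberately NOT a FileStoreTableSketch itself. */
final class ReadOptimizedTableSketch implements TableSketch {
    final FileStoreTableSketch base;

    ReadOptimizedTableSketch(FileStoreTableSketch base) {
        this.base = base;
    }
}

/** Stands in for a metadata system table that takes the JNI path. */
final class MetadataSysTableSketch implements TableSketch {}

final class SchemaDictResolverSketch {
    private SchemaDictResolverSketch() {}

    /** Returns the table to build the schema dict from, or empty when none should be emitted. */
    static Optional<FileStoreTableSketch> resolveSchemaDictTable(TableSketch table) {
        if (table instanceof FileStoreTableSketch) {
            return Optional.of((FileStoreTableSketch) table);
        }
        if (table instanceof ReadOptimizedTableSketch) {
            // The wrapper fails the instanceof check above, so unwrap it explicitly.
            return Optional.of(((ReadOptimizedTableSketch) table).base);
        }
        return Optional.empty();
    }
}
```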

Test getScanNodePropertiesEmitsSchemaEvolutionForReadOptimizedSysTable: a real
FileSystemCatalog table wrapped in a ReadOptimizedTable now emits the dict
(current_schema_id=-1 + non-empty history). RED before the fix, GREEN after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gh ConnectorColumn

desc t_ltz for a paimon catalog table returned the ts_ltz row with an empty
DESC "Extra" column instead of WITH_TIMEZONE (the type timestamptz(3) was
already correct). The DESC Extra column is Column.getExtraInfo()
(IndexSchemaProcNode), set by legacy via Column.setWithTZExtraInfo() keyed on
the SOURCE paimon type root TIMESTAMP_WITH_LOCAL_TIME_ZONE, independent of the
enable.mapping.timestamp_tz flag. On the SPI path the schema flows
PaimonConnectorMetadata.mapFields -> ConnectorColumn ->
ConnectorColumnConverter.convertColumn -> Column, and ConnectorColumn had no
field carrying that marker, so it was dropped at the SPI boundary.

Carry the marker through the SPI:
- ConnectorColumn: add withTimeZone field + withTimeZone() wither +
  isWithTimeZone() getter (added to equals/hashCode; public ctors unchanged).
- PaimonConnectorMetadata.mapFields: mark the column when the source type root
  is TIMESTAMP_WITH_LOCAL_TIME_ZONE (regardless of the mapping flag).
- ConnectorColumnConverter.convertColumn: re-apply setWithTZExtraInfo() when marked.
- PluginDrivenExternalTable.toSchemaCacheValue: preserve the marker across the
  column-name remap branch.

Scoped to the paimon source type (not a generic timestamptz-type rule) so other
SPI connectors (jdbc_query TVF, maxcompute) and the hdfs TVF desc are unaffected,
matching legacy.

Tests: PaimonConnectorMetadataTest.getTableSchemaMarksLtzColumnsWithTimeZoneRegardlessOfMapping
(both mapping states) and ConnectorColumnConverterTest.testWithTimeZoneColumnSetsExtraInfo /
testPlainColumnHasNoExtraInfo. Full paimon suite 299/0F; checkstyle clean.
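
Carrying the marker across the SPI boundary can be sketched as follows. `ColumnSketch` is a hypothetical simplification of ConnectorColumn; whether the real wither returns a copy or mutates in place is not stated here, so the immutable-copy form below is an assumption. The getter and wither names, the equals/hashCode participation, and the WITH_TIMEZONE text come from this commit.

```java
import java.util.Objects;

final class ColumnSketch {
    private final String name;
    private final String type;
    private final boolean withTimeZone;

    /** Public constructor is unchanged: new columns start unmarked. */
    ColumnSketch(String name, String type) {
        this(name, type, false);
    }

    private ColumnSketch(String name, String type, boolean withTimeZone) {
        this.name = name;
        this.type = type;
        this.withTimeZone = withTimeZone;
    }

    /** Wither: returns a marked copy (assumed to be non-mutating). */
    ColumnSketch withTimeZone() {
        return new ColumnSketch(name, type, true);
    }

    boolean isWithTimeZone() {
        return withTimeZone;
    }

    /** Converter step: the Extra text follows the marker, not the mapped type. */
    String extraInfo() {
        return withTimeZone ? "WITH_TIMEZONE" : "";
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ColumnSketch)) {
            return false;
        }
        ColumnSketch other = (ColumnSketch) o;
        return withTimeZone == other.withTimeZone
                && name.equals(other.name)
                && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, type, withTimeZone);
    }
}
```

Including the marker in equals/hashCode means a marked and an unmarked column with the same name and type are treated as different values, so a cache keyed on columns would not hand back the unmarked one.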

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…enuine-null partition

Root cause: `<partcol> IS NULL` over a paimon table returned empty (EXPLAIN
partition=0/5) instead of the genuine-null row. Paimon renders a genuine-NULL
partition value as its partition.default-name sentinel (CoreOptions
.PARTITION_DEFAULT_NAME, default "__DEFAULT_PARTITION__") — show partitions shows
`category=__DEFAULT_PARTITION__`, distinct from the literal null/NULL/\N partitions.
The FE prune bridge PluginDrivenMvccExternalTable.toListPartitionItem built EVERY
partition value with new PartitionValue(v, false) (isNull=false, copied verbatim
from legacy PaimonUtil.toListPartitionItem), so the null partition was catalogued
as the literal string "__DEFAULT_PARTITION__". Nereids list pruning then matched
IS NULL against no null partition, pruned all of them, resolveRequiredPartitions
returned an empty list, and PluginDrivenScanNode.getSplits short-circuited to zero
splits. The native scan path was already correct (typed Java-null ->
serializePartitionValue null -> populateRangeParams isNull=true -> BE materializes
SQL NULL), which is why SELECT * returned the row but IS NULL did not. Latent in
master too (identical toListPartitionItem isNull=false + unchanged Nereids pruning;
paimon tests are docker-gated so it was never caught).

Fix (2 files):
- PaimonConnectorMetadata.listPartitions: read partition.default-name (the same way
  partition.legacy-name is read) and, when a spec value equals it, render the
  Doris-canonical ConnectorPartitionValues.HIVE_DEFAULT_PARTITION
  ("__HIVE_DEFAULT_PARTITION__") in the partition name. Checked BEFORE the legacy
  DATE-format branch, which also fixes a latent Integer.parseInt(
  "__DEFAULT_PARTITION__") crash for a null DATE partition.
- PluginDrivenMvccExternalTable.toListPartitionItem: derive isNull from
  TablePartitionValues.HIVE_DEFAULT_PARTITION.equals(value), mirroring the sibling
  TablePartitionValues.toListPartitionItem (the Doris-wide null convention; future
  iceberg/hudi SPI reuse gets it for free).

Keys off paimon's partition.default-name, NOT "\N"/"__HIVE_DEFAULT_PARTITION__":
the sentinel fix (4b2c219) established paimon does not reserve those (they are
real literal data). Safe because paimon planScan IGNORES requiredPartitions (split
selection is predicate-driven via the paimon SDK), so the translated name never
reaches split selection — it only drives the FE empty-list short-circuit; and no
ConnectorMetadata SPI method takes a partition name back, so the rendered name
stays FE-internal (getPartitionSnapshot looks up the same FE-built map).
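
The two-step translation can be sketched as follows. `NullPartitionSketch` and its method names are hypothetical; the two sentinel strings and the rule (connector maps paimon's configured default-name to the canonical sentinel, bridge derives isNull from that sentinel) come from this commit.

```java
final class NullPartitionSketch {
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    private NullPartitionSketch() {}

    /** Connector side: render paimon's configured default-name as the Doris-canonical sentinel. */
    static String renderPartitionValue(String specValue, String paimonDefaultName) {
        return paimonDefaultName.equals(specValue) ? HIVE_DEFAULT_PARTITION : specValue;
    }

    /** Bridge side: a partition value is NULL exactly when it equals the canonical sentinel. */
    static boolean isNullPartitionValue(String renderedValue) {
        return HIVE_DEFAULT_PARTITION.equals(renderedValue);
    }
}
```

For example, with the default `partition.default-name` a spec value of `__DEFAULT_PARTITION__` is rendered as the canonical sentinel and becomes a NULL key, while the literal string `null` passes through unchanged and stays a non-null key.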

Tests: PaimonConnectorMetadataPartitionTest +3 (string-null -> canonical sentinel,
custom partition.default-name honored with a literal __DEFAULT_PARTITION__ left
untouched, null-DATE renders sentinel instead of crashing) 9/9;
PluginDrivenMvccExternalTableTest +1 (testHiveDefaultSentinelBuildsNullPartitionKey
asserts the value builds a NULL/isNull partition key) 35/35. Both fail-before:
old connector appended the raw "__DEFAULT_PARTITION__"; old bridge produced a
non-null literal key. End-to-end fix on the live cluster not run (deploy reverted
per request); root cause was empirically confirmed on the cluster before the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…LAIN line dropped by the SPI scan node

PluginDrivenScanNode.getNodeExplainString overrides FileScanNode without calling
super (it uses a custom TABLE/QUERY/PREDICATES format), so it dropped the parent's
"inputSplitNum=N, totalFileSize=X, scanRanges=Y" line. Legacy PaimonScanNode
inherited that line via super.getNodeExplainString, and test_paimon_predict asserts
inputSplitNum=N (9/3/6/0 for various IN predicates); on the SPI scan path the line
is absent (only partition=N/M and paimonNativeReadSplits=N/M are shown), so the
EXPLAIN check fails.

Re-emit the line byte-for-byte (including the (approximate) batch-mode prefix) from
the same selectedSplitNum/totalFileSize/scanRangeLocations fields the inherited
FileQueryScanNode.createScanRangeLocations already populates (selectedSplitNum =
inputSplits.size()), placed immediately before partition=N/M to match FileScanNode
ordering.

Emitted UNCONDITIONALLY for every plugin connector — like the sibling partition=N/M
and pushdown agg= lines already are — NOT gated on a hardcoded source name: the
generic SPI scan node must stay connector-agnostic, and inputSplitNum is universal
FileScanNode info, not connector-specific. Blast radius is safe: among SPI
connectors only paimon asserts this line; maxcompute explain checks are
contains-based (an added line before partition=N/M does not affect them) and the
jdbc/es notContains checks target unrelated lines (QUERY:/date()/ES terminate_after).

fe-core compiles clean. Runtime verification is the test_paimon_predict regression
run (not deployed, per request). No unit test added: the full getNodeExplainString
string requires mocking the entire desc->table->catalog chain plus the scan-range
state, which the existing explain tests deliberately avoid (they cover the extracted
static helpers); this change is a verbatim copy of FileScanNode's proven line.

NB: a pre-existing instance of the same source-name smell remains — the VERBOSE
per-backend block (if VERBOSE && ... && "paimon".equals(catalog.getType())) from
FIX-PAIMON-EXPLAIN-GAP — left untouched here because de-gating it changes
es/jdbc/max_compute VERBOSE output and warrants its own change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ats block dropped by the SPI scan node

paimon_data_system_table.assertJniPath asserted 'explain verbose ...$binlog'
contains 'SplitStat [type=JNI'. The earlier checks (paimonNativeReadSplits=0/N,
native==0) passed — only the per-split block was missing. The legacy
PaimonScanNode emits a VERBOSE-only PaimonSplitStats: / SplitStat [type=...]
block; the SPI PluginDrivenScanNode + PaimonScanPlanProvider.appendExplainInfo
ported only the paimonNativeReadSplits line, never the block (the test is
unchanged from master, written against the legacy node).

Fix, following the existing FIX-E synthetic-key pattern:
- PluginDrivenScanNode injects __explain_verbose only when detailLevel==VERBOSE
  (connector-agnostic; does not branch on source name).
- PaimonScanPlanProvider emits PaimonSplitStats: + one SplitStat [type=NATIVE|JNI]
  line per split (grouped NATIVE-first from the native/total counts), with the
  legacy >4 truncation. Exact per-DataSplit parity is not reconstructible on the
  SPI path (node keeps only counts; native files are re-split), but the split
  type — all the test checks — is faithful.

PaimonScanExplainTest 13/13 (4 new, RED→GREEN); fe-core compiles; checkstyle 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… SPI Paimon connector

A filesystem-flavor Paimon catalog created with the documented minio.* keys
(minio.endpoint / minio.access_key / minio.secret_key / ...) over an s3://
warehouse failed at `show databases` with:

  org.apache.paimon.fs.UnsupportedSchemeException: Could not find a file io
  implementation for scheme 's3' in the classpath.

Root cause: PaimonCatalogFactory.applyStorageConfig — the fe-core-free port of
legacy StorageProperties — ported the S3/OSS/COS/OBS canonical blocks but
omitted MinIO. Legacy MinioProperties extends AbstractS3CompatibleProperties
(schema "s3") and translates minio.* to fs.s3a.*, registering fs.s3.impl=
S3AFileSystem. In the SPI connector a pure-minio.* catalog resolved every alias
to null, so applyCanonicalS3Config early-returned and fs.s3.impl was never set,
leaving Paimon's FileIO unable to resolve the s3 scheme.

Fix: add applyCanonicalMinioConfig (gated on the minio. key prefix) that emits
the shared S3A base via applyS3aBaseConfig. MinIO is S3A-compatible, so — unlike
COS (cosn) / OBS (native) — it adds no extra impl keys. Aliases are ported
verbatim from MinioProperties (minio.* first, with the shared s3.*/AWS_*
fallbacks); the region defaults to us-east-1 and the connection tuning to
100/10000/10000, both per legacy MinioProperties (a dedicated block is required
precisely because these defaults diverge from the S3 block's 50/3000/1000). The
block is purely additive and no-ops for any catalog without a minio. key.
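
The shape of such a block can be sketched as follows. This is a hedged illustration, not the connector code: `MinioConfigSketch` is a hypothetical class, the `minio.region` key and the `s3.*` fallback key names are assumptions, and mapping the 100/10000/10000 defaults onto `fs.s3a.connection.maximum`, `fs.s3a.connection.request.timeout` and `fs.s3a.connection.timeout` is likewise an assumption. The prefix gate, the `fs.s3.impl` registration, and the default values themselves come from this commit.

```java
import java.util.HashMap;
import java.util.Map;

final class MinioConfigSketch {
    private static final String S3A_IMPL = "org.apache.hadoop.fs.s3a.S3AFileSystem";

    private MinioConfigSketch() {}

    /** Returns the Hadoop keys to add; empty when the catalog has no minio.* key. */
    static Map<String, String> applyCanonicalMinioConfig(Map<String, String> props) {
        Map<String, String> conf = new HashMap<>();
        boolean hasMinioKey = props.keySet().stream().anyMatch(k -> k.startsWith("minio."));
        if (!hasMinioKey) {
            return conf; // purely additive: no-op for non-MinIO catalogs
        }
        conf.put("fs.s3.impl", S3A_IMPL);
        putIfPresent(conf, "fs.s3a.endpoint", first(props, "minio.endpoint", "s3.endpoint"));
        putIfPresent(conf, "fs.s3a.access.key", first(props, "minio.access_key", "s3.access_key"));
        putIfPresent(conf, "fs.s3a.secret.key", first(props, "minio.secret_key", "s3.secret_key"));
        String region = first(props, "minio.region", "s3.region"); // key names assumed
        conf.put("fs.s3a.endpoint.region", region != null ? region : "us-east-1");
        // MinIO defaults from the commit; the target key names are assumptions.
        conf.put("fs.s3a.connection.maximum", "100");
        conf.put("fs.s3a.connection.request.timeout", "10000");
        conf.put("fs.s3a.connection.timeout", "10000");
        return conf;
    }

    /** Returns the value of the first key that is present, or null. */
    private static String first(Map<String, String> props, String... keys) {
        for (String key : keys) {
            String value = props.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    private static void putIfPresent(Map<String, String> conf, String key, String value) {
        if (value != null) {
            conf.put(key, value);
        }
    }
}
```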

Tests: 4 new unit tests in PaimonCatalogFactoryTest (RED->GREEN repro asserting
fs.s3.impl is registered, defaults parity, explicit-region override, and a
pure-S3 negative-parity guard). 60/60 pass; checkstyle 0 violations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2 participants