[refactor](catalog) Catalog spi 07 paimon #64446
Draft
morningman wants to merge 66 commits into
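This session is research and design only. A 14-agent code-grounded recon plus a cross-cutting adversarial re-review covers the legacy-framework implementation of paimon's 5 functional areas (normal read / system tables / procedure / DDL / mtmv) → maps it onto the new catalog SPI → aligns interface consistency with the maxcompute connector.
Added:
- research/p5-paimon-migration-recon.md: legacy implementation of the 5 areas + E1–E10 SPI status + cross-cutting risks + 11 MC-consistency conventions + test baseline
- tasks/P5-paimon-migration.md: old→new mapping + 30 TODOs in batches B0–B9 + batch dependency graph + acceptance criteria
User-signed decisions:
- D-037 (P5-D1): flavor = single Catalog + createCatalog flavor switch (consistent with MC; no backend modules are built, because the 5 backend modules are empty shells)
- D-038 (P5-D2): the MTMV/MVCC bridge is implemented within P5 (fe-core PaimonPluginDrivenExternalTable); the cutover is gated on it, and a regression that silently reads latest is forbidden
Three priors falsified: the backend modules are empty shells (the connector goes through a single-Catalog stub) / FE dispatch is already partially pre-wired (what remains is the connector's listPartitions) / Base64 is not a blocker (BE has a STD fallback). Procedure area = zero migratable code, doc-only.
Doc sync: connectors/paimon.md (fixes 3 stale statements), decisions-log.md (+D-037/D-038, 36→38), PROGRESS.md (header/§1/§2/§3/§4/§6/§7), HANDOFF.md (overwritten, no collapsed history kept).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>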
T01: extract PaimonCatalogOps injection seam (5 read methods, B0 read-only) over the paimon SDK Catalog; refactor PaimonConnectorMetadata to inject it (6 call sites migrated, read path byte-for-byte unchanged); build the first fe-connector-paimon test module (no-mockito recording fake, mirroring MC's McStructureHelper): 9 metadata UTs pinning the databaseExists try/catch and the getColumnHandles reload-fallback, FakePaimonTable (fail-loud on non-read methods), and an env-gated live connectivity smoke.
T02: R-007 paimon.version 3-way pin invariant comment (FE connector + BE paimon-scanner + preload-extensions already aligned at 1.3.1 via the single fe/pom.xml property); offline FE->BE serialized-Table round-trip smoke (real FileSystemCatalog -> connector encode -> BE-mirrored URL-first/STD-fallback decode, asserts rowType/partition/primary keys); parity-baseline doc inventorying the 41 existing regression suites as the after-cutover parity gate plus the real connector-side gaps and the live-e2e hard gate.
Connector module: Tests run: 12, Failures: 0, Errors: 0, Skipped: 1 (the skip is the env-gated live test); checkstyle 0; import-gate clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
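For readers unfamiliar with the "URL-first/STD-fallback decode" that the T02 round-trip smoke mirrors, here is a minimal sketch of the idea using only `java.util.Base64`. The class and method names are hypothetical and not taken from the connector or BE code; it only illustrates trying the URL-safe alphabet first and falling back to the standard one.

```java
import java.util.Base64;

/** Hypothetical helper: decode a payload that may be URL-safe or standard Base64. */
final class SerializedTableCodec {
    private SerializedTableCodec() {}

    static byte[] decodeUrlFirst(String encoded) {
        try {
            // The URL-safe alphabet uses '-' and '_' in place of '+' and '/'.
            return Base64.getUrlDecoder().decode(encoded);
        } catch (IllegalArgumentException e) {
            // '+' or '/' is illegal for the URL-safe decoder: retry with the standard alphabet.
            return Base64.getDecoder().decode(encoded);
        }
    }
}
```

With this shape, a payload produced by either encoder round-trips to the same bytes, which is the property the smoke test asserts on the real serialized Table.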
Single-Catalog flavor switch on paimon.catalog.type for all five flavors (filesystem/hms/rest/jdbc/dlf), mirroring the legacy fe-core flavor properties without importing fe-core/fe-common. - New PaimonCatalogFactory: pure validate() + buildCatalogOptions() (paimon.catalog.type -> paimon `metastore` opt, per-flavor options, paimon.* passthrough excl storage prefixes) + buildHadoopConfiguration / buildHmsHiveConf / buildDlfHiveConf + requireOssStorageForDlf. - PaimonConnector: thread ConnectorContext; createCatalog wires all 5 flavors live (filesystem/jdbc with Hadoop Configuration, rest Options-only, hms/dlf with HiveConf), each wrapped in context.executeAuthenticated (Kerberos seam). JDBC DriverShim ported with driver-url resolution via getEnvironment() (replaces forbidden JdbcResource). - PaimonConnectorProperties: all flavor key constants (multi-alias String[]). - PaimonConnectorProvider: validateProperties override -> factory.validate. - pom: add paimon-hive-connector-3.1 + hadoop-common + hive-common (hive-common over hive-catalog-shade to avoid the fastutil conflict). - 31 new no-mockito unit tests (PaimonCatalogFactoryTest); module 43/0/0/1, checkstyle 0, import-gate clean. hms/dlf live connection is gated on B7 cutover + live-e2e: the Thrift metastore client is host-provided (not bundled) with a child-first Configuration/HiveConf cross-loader hazard to verify; jdbc driver_url FE security allow-list + external hive-site.xml file load are deferred. All documented in code NOTEs and plan-doc. rest also requires warehouse (legacy parity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connector-side only; no fe-core / fe-connector-api / fe-connector-spi changes. B2 and B3 were both uncommitted and are entangled in the same files (PaimonConnectorMetadata, PaimonCatalogOps, PaimonConnector, RecordingPaimonCatalogOps), so they are committed together. B2 normal-read (T06-T10): - T06 PaimonScanPlanProvider transient-Table reload fallback (planScan + getScanNodeProperties both guarded) - T07 PaimonPredicateConverter parity-correct TZ (NTZ keeps UTC, LTZ not pushed) + supportsCastPredicatePushdown=false - T08 listPartitionNames/listPartitions/listPartitionValues (legacy display-name parity) + seam listPartitions(Identifier) - T09 doc-only pure-predicate pruning; T10 cache deferred to B8 B3 DDL metadata (T11-T15): - T11 PaimonTypeMapping.toPaimonType (Doris->paimon, byte-parity with legacy DorisToPaimonTypeVisitor; narrow gap preserved) - T12 PaimonSchemaBuilder (ConnectorCreateTableRequest -> paimon Schema) - T13 createTable/dropTable + seam DDL methods + ConnectorContext threaded (D7=B: each DDL op wrapped in executeAuthenticated; read path un-wrapped) - T14 supportsCreateDatabase/createDatabase (HMS-props gate) + dropDatabase(force) (enumerate-loop + native cascade) - T15 offline UTs (no-mockito; WHY+MUTATION) Verified: fe-connector-paimon Tests run: 96, Failures: 0, Errors: 0, Skipped: 1 (live); checkstyle 0; connector import-gate 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port paimon system tables and MVCC snapshots onto the plugin connector SPI. - T16: greenfield E7 SPI on ConnectorTableOps — listSupportedSysTables + getSysTableHandle (default no-ops; MC/jdbc/es/trino unaffected). - T17: PaimonConnectorMetadata implements E7 — names from SystemTableLoader.SYSTEM_TABLES; sys table loaded via the existing getTable seam with a 4-arg Identifier(db,table,"main",sysName); sys handle carries sysTableName + forceJni (binlog/audit_log); shared PaimonTableResolver gives metadata + scan one sys-aware reload rule. - T18: generic fe-core glue — PluginDrivenExternalTable centralizes handle acquisition into resolveConnectorTableHandle and delegates getSupportedSysTables to the connector; new PluginDrivenSysExternalTable (reports PLUGIN_EXTERNAL_TABLE) + PluginDrivenSysTable reuse the live SysTableResolver/NativeSysTable machinery (reusable by future connectors). - T19: forceJni gate so binlog/audit_log go JNI not native; buildTableDescriptor -> HIVE_TABLE (also fixes a latent normal-table SCHEMA_TABLE descriptor gap, DV-024); PluginDrivenScanNode fail-loud guard rejects scan-params/time-travel on system tables. - T20: first E5 MVCC consumer — beginQuerySnapshot/getSnapshotAt/getSnapshotById (empty table -> -1; sys handle -> empty) + SUPPORTS_MVCC_SNAPSHOT/TIME_TRAVEL capabilities. Inert until B5 wires the fe-core MvccTable consumer. Decisions: D-039 (E7 reuses the live SysTable machinery; RFC §10's $-suffix-via-getTableHandle design was never implemented and is superseded, DV-023). Deviations: DV-023, DV-024. Verification: import-gate 0; connector 124 tests pass (1 live skipped); fe-core PluginDriven*Test 100 pass; checkstyle 0; no cutover/B5 leakage (paimon not in SPI_READY_TYPES; PluginDrivenExternalTable still not an MvccTable). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ridge + time-travel + procedure doc no-op B5a (MTMV/MVCC bridge): source-agnostic PluginDrivenMvccExternalTable (MTMVRelatedTableIf+MTMVBaseTableIf+MvccTable, D-042) wiring the B4-inert E5 snapshot SPI; PluginDrivenMvccSnapshot; list-partitions-at-snapshot. B5b (time-travel): scan-pin + AS-OF + tag + branch + @incr across connector (ConnectorTimeTravelSpec, PaimonIncrementalScanParams) and fe-core; holistic review fixes RD-1 (partitioned time-travel empty-universe scan-all guard in PluginDrivenScanNode) + RD-2 (@incr lists-latest partitions/schema). B6/T26: procedure doc no-op — zero migratable code; closed-form reject verified (ExecuteActionFactory:59-62 / CallFunc:42-43). All inert/gated until B7 cutover (paimon NOT yet in SPI_READY_TYPES). Excludes regression-conf.groovy (secrets) + scratch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eview fixes Combines all previously-uncommitted P5 paimon work into one commit (per request). 8 fullpath-review fixes (BLOCKERs + key MAJORs) — connector + SPI + fe-core bridge: - FIX-STORAGE-CREDS: applyStorageConfig translates canonical s3.*/oss.*/AWS_* -> fs.s3a./fs.oss. (+DLF region->OSS endpoint) - FIX-NATIVE-PARTVAL: per-type serializePartitionValue + session TZ (LTZ only); binary/varbinary drops the partition map (no [B@hash garbage) - FIX-TZ-ALIAS: full legacy ZoneId.SHORT_IDS + 4 Doris overrides alias map (CST/PST/EST now resolve for FOR TIME AS OF datetime strings) - FIX-TABLE-STATS: getTableStatistics override + PaimonCatalogOps.rowCount seam (normal AND system tables, via the sys-aware resolveTable) - FIX-CPP-READER: honor enable_paimon_cpp_reader -> native DataSplit.serialize so BE's PaimonCppReader can decode the split - FIX-READ-NOTNULL: mapFields forces read-path columns nullable (legacy parity) - FIX-HMS-CONFRES: new ConnectorContext.loadHiveConfResources hook + 2-arg buildHmsHiveConf file-base merge (external hive-site.xml reaches the metastore) - FIX-REST-VENDED: new ConnectorContext.vendStorageCredentials hook + scan-props vended AWS_* overlay (REST per-table tokens reach BE) Also carries the previously-uncommitted B7 core cutover + D-045/D-046 restores. Tests: fe-connector-paimon 213 pass / 0 fail / 1 skip (live-gated); fe-core compiles + DefaultConnectorContextVendTest 2/0. Each fix's root-cause/patch/UT and impl-time corrections are in plan-doc/tasks/designs/P5-fix-<id>-design.md. Excluded from this commit: regression-test/conf/regression-conf.groovy (plaintext Aliyun keys, pending scrub) and scratch dirs (.audit-scratch/, conf.cmy/, META-INF/, *.bak). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
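A minimal sketch of the FIX-TZ-ALIAS idea, assuming the alias map is the JDK's `ZoneId.SHORT_IDS` with engine overrides layered on top. The class name is hypothetical, and the single override shown (`CST` -> `Asia/Shanghai`) is an illustrative assumption; the commit does not list the four overrides.

```java
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical alias resolver: JDK short IDs plus engine-specific overrides. */
final class TimeZoneAliases {
    private static final Map<String, String> ALIASES = new HashMap<>(ZoneId.SHORT_IDS);

    static {
        // Assumed override for illustration only; it replaces the JDK's CST mapping.
        ALIASES.put("CST", "Asia/Shanghai");
    }

    private TimeZoneAliases() {}

    static ZoneId resolve(String id) {
        // ZoneId.of(String, Map) consults the alias map before the standard zone IDs.
        return ZoneId.of(id, ALIASES);
    }
}
```

Here `TimeZoneAliases.resolve("PST")` goes through the alias table, whereas the one-argument `ZoneId.of` only knows standard zone IDs; regular IDs such as `Europe/Paris` pass through unchanged.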
…canonical scheme Root cause: the paimon connector sent native ORC/Parquet data-file paths and deletion-vector (DV) paths to BE un-normalized. The paimon SDK emits warehouse-native schemes (oss://, cos://, obs://, s3a://, or the OSS bucket.endpoint authority form); BE's scheme-dispatched S3 file factory only recognizes s3://. On S3-compatible (non-AWS) warehouses this breaks native reads outright (B-7DF, data file) and silently drops the DV so DELETEd rows reappear (B-7DV, merge-on-read corruption). Legacy PaimonScanNode normalized both via the 2-arg LocationPath.of; the cutover dropped it. The two paths reach BE via different mechanisms (data-file through PluginDrivenSplit's single-arg LocationPath.of -> FileQueryScanNode:568; DV baked into thrift by the connector's populateRangeParams), so a fe-core-bridge-only fix cannot reach the DV path. Solution: new ConnectorContext.normalizeStorageUri SPI hook (identity default, mirroring vendStorageCredentials), implemented in DefaultConnectorContext via the engine's 2-arg normalizing LocationPath.of with the catalog's static storage map (threaded via a new lazy supplier + 4-arg ctor; PluginDrivenExternalCatalog wires it). The connector routes BOTH the data-file and DV paths through it inside the extracted, unit-testable buildNativeRange. JNI path untouched (carries its own FileIO). Fail-loud on un-normalizable paths (legacy parity). Static-vs-vended map scope noted in DV-025 (the pure-vended edge belongs to credential fixes #2/#3). Tests: fe-core DefaultConnectorContextNormalizeUriTest (oss->s3, s3 idempotent, null/blank, empty-map fail-loud); connector PaimonScanPlanProviderTest x3 (both paths normalized + call count, DV-less, no-context raw). paimon module 216/0/0, fe-core targeted green, checkstyle 0, import-gate clean. Live OSS+DV e2e CI-gated (not run). SPI RFC section 21 (E13), deviations DV-025. Also includes the round-2 review report + task list this fix derives from. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
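The normalization contract described above (warehouse-native scheme in, `s3://` out, idempotent on `s3://`, fail-loud otherwise) can be sketched as follows. This is a simplified stand-in, not the engine's `LocationPath.of`: the class name is hypothetical, it covers only the object-store schemes named in the commit, it does not handle the OSS `bucket.endpoint` authority form, and passing null/blank through unchanged is an assumption.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified scheme normalizer for S3-compatible object stores. */
final class StorageUriNormalizer {
    private static final Set<String> S3_COMPATIBLE = Set.of("s3", "s3a", "oss", "cos", "obs");

    private StorageUriNormalizer() {}

    static String normalize(String uri) {
        if (uri == null || uri.isBlank()) {
            return uri; // assumption: nothing to normalize
        }
        int idx = uri.indexOf("://");
        if (idx <= 0) {
            throw new IllegalArgumentException("No scheme in storage URI: " + uri);
        }
        String scheme = uri.substring(0, idx).toLowerCase(Locale.ROOT);
        if (!S3_COMPATIBLE.contains(scheme)) {
            // Fail loud rather than ship a path the backend cannot dispatch.
            throw new IllegalArgumentException("Cannot normalize scheme: " + scheme);
        }
        return "s3" + uri.substring(idx);
    }
}
```

The point of routing both the data-file path and the deletion-vector path through one function like this is that neither can silently keep its warehouse-native scheme.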
Mark FIX-URI-NORMALIZE complete (commit 20b19d1) in the task list and update HANDOFF: #1 summary + verification, next session starts at #2 (reuse the normalizeStorageUri BE-scan-prop normalization seam), and the standing reminders (regression-conf.groovy still holds a plaintext key -> path-whitelist only; P2 #8/#9 need user scope decision first). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical AWS_* Finding B-9 (BLOCKER, rereview2). The paimon connector copied static catalog-level storage credentials/config verbatim into the BE scan-node properties: PaimonScanPlanProvider.getScanNodeProperties iterated the raw catalog properties and emitted location.<rawkey> for any s3./oss./cos./obs./ hadoop./fs./dfs./hive. prefix; the fe-core bridge only strips the location. prefix. BE's native (FILE_S3) reader understands ONLY AWS_ACCESS_KEY/ AWS_SECRET_KEY/AWS_ENDPOINT/AWS_REGION/AWS_TOKEN, so static s3.access_key/ oss.access_key on a private bucket reached BE unintelligible -> no usable credentials -> 403. This is the third credential seam (static->BE-scan), missed by both the prior round and the 8 fixes (review §9.3); the catalog- FileIO seam (FIX-STORAGE-CREDS) and the vended seam (FIX-REST-VENDED) were already closed. Root cause: legacy PaimonScanNode.getLocationProperties returns only CredentialUtils.getBackendPropertiesFromStorageMap(storagePropertiesMap) (the canonical AWS_*/hadoop/dfs map). The cutover replaced that single normalized call with a raw prefix-copy loop; the connector cannot import fe-core's StorageProperties so it had no access to the normalization. Solution (D-048, user-signed full legacy-parity scope): new no-op-default SPI ConnectorContext.getBackendStorageProperties(); DefaultConnectorContext returns getBackendPropertiesFromStorageMap over the storagePropertiesSupplier already wired in FIX-URI-NORMALIZE (no ctor change, CredentialUtils already imported). The connector replaces its raw prefix-copy loop with a context-gated overlay of that map; the vended overlay stays after it (vended wins on collision, legacy precedence). Object-store creds -> AWS_*; HDFS -> canonical hadoop/dfs (preserves user overrides + adds the legacy defaults, folding in the §211 MINOR); drops the non-parity hive.* passthrough. 
Investigated the AWS_CREDENTIALS_PROVIDER_TYPE=ANONYMOUS two-step edge and confirmed via BE s3_util.cpp (both providers prefer explicit ak/sk over cred_provider_type) that it is harmless — no regression. Connector import-gate stays clean. Tests: fe-core DefaultConnectorContextBackendStoragePropsTest (OSS static creds -> AWS_*, raw alias absent; no-supplier -> empty); connector PaimonScanPlanProviderTest (+getScanNodePropertiesNormalizesStaticCreds raw alias not shipped; modified vended-overlay collision to canonical keys; renamed no-context test -> emits no storage props). Fail-before/pass-after proven by reverting the connector change (2/3 go red). Module 217/0/0 (1 CI-gated skip), checkstyle clean, import-gate clean. Live private-bucket native-read e2e is CI-gated (not run). SPI RFC §22 (E14). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
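To make the "canonical map first, vended overlay second" precedence concrete, here is a small sketch. The class name and the alias table are hypothetical: only `s3.access_key`/`oss.access_key` and the `AWS_*` names appear in the commit, and the other raw keys are illustrative assumptions. The real normalization lives in fe-core's `CredentialUtils`, which the connector cannot import.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: canonicalize static creds, then let vended creds win on collision. */
final class BackendStorageProps {
    // Illustrative alias table; the secret_key/endpoint raw keys are assumptions.
    private static final Map<String, String> CANONICAL = Map.of(
            "s3.access_key", "AWS_ACCESS_KEY",
            "oss.access_key", "AWS_ACCESS_KEY",
            "s3.secret_key", "AWS_SECRET_KEY",
            "oss.secret_key", "AWS_SECRET_KEY",
            "s3.endpoint", "AWS_ENDPOINT",
            "oss.endpoint", "AWS_ENDPOINT");

    private BackendStorageProps() {}

    static Map<String, String> scanProps(Map<String, String> catalogProps, Map<String, String> vended) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : catalogProps.entrySet()) {
            String canonicalKey = CANONICAL.get(e.getKey());
            if (canonicalKey != null) {
                out.put(canonicalKey, e.getValue()); // raw aliases are never shipped
            }
        }
        out.putAll(vended); // vended credentials override static ones
        return out;
    }
}
```

Unrecognized catalog keys are simply dropped in this sketch, which is the opposite of the raw prefix-copy loop the fix replaces.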
Record FIX-STATIC-CREDS-BE commit d23d5df in the task-list and update HANDOFF.md (HEAD, migration chain, completed/next sections). Next: #3 FIX-SCHEMA-EVOLUTION (B-1a+M-10) — the largest P0 SPI surface, independent of #1/#2; recommend a fresh session. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ema_info from the connector
Root cause (rereview2 BLOCKER B-1a): on the native (ORC/Parquet) read path the
paimon connector emitted only the per-file TPaimonFileDesc.schema_id but never
set the scan-level TFileScanRangeParams.current_schema_id / history_schema_info.
BE (table_schema_change_helper.h:219-237) then took the !__isset branch and fell
back to NAME-based file<->table column matching, so a schema-evolved (renamed /
reordered) table read NULL/garbage for the renamed columns silently. JNI path is
unaffected; native is the default. (M-10, Column.uniqueId=-1, deferred — DV-026.)
Design C (user-signed D-049): BE's field-id matcher (table_schema_change_helper
.cpp:312-430) reads only TField.id/name and a nested-vs-scalar type.type tag — no
Doris Type, no tuple descriptor — and org.apache.doris.thrift.* is import-legal in
connectors, so the connector builds the TSchema dictionary directly from paimon
SchemaManager and ships it via the existing populateScanLevelParams hook (the seam
DV-006 anticipated for hudi). Zero new SPI surface; connector-only.
- current_schema_id = -1; history_schema_info = the -1/current (pinned) schema +
one entry per SchemaManager.listAllIds() so every native file schema_id is
covered (BE fails loud on a missing entry, never silent).
- transport: base64 TBinaryProtocol carrier (a throwaway TFileScanRangeParams)
via a props key, because getScanPlanProvider() is per-call (no shared state).
Clean-room 3-lens review found 2 real BLOCKERs in the -1/current entry (both fixed
+ re-verified): (1) column-name casing — BE keys the table-side StructNode by the
-1 entry's name verbatim while the native reader queries the lowercase Doris slot
name, and current_schema_id=-1 never hits the ConstNode fast-path, so a mixed-case
column crashed (std::out_of_range) even on never-evolved tables; fix lowercases
ONLY top-level names (default-locale, matching the slot-name producer + legacy
parseSchema:507; nested stays paimon-cased per legacy PaimonUtil:302). (2) time
travel — the -1 entry used schemaManager.latest() (absolute latest) instead of the
snapshot-pinned schema the tuple uses; fix builds it from FileStoreTable.schema()
(pinned) and narrows the guard DataTable->FileStoreTable. Eager all-schemas read
accepted as a fail-loud deviation (DV-027).
Tests: PaimonScanPlanProviderTest +5 (field-id/name carriage, nested ARRAY/MAP/
STRUCT shape + struct-child ids, scalar tag, rename round-trip apply, top-level
lowercase vs nested paimon-case, non-FileStoreTable skip). Module 222/0/0 (1
CI-gated skip), checkstyle clean, import-gate clean. e2e
test_paimon_full_schema_change.groovy is CI-gated (not run). Design doc + D-049 +
DV-026/DV-027 + SPI RFC §23 (no new SPI).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
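The difference between name-based and field-id-based column matching, which is the core of B-1a, can be shown with plain maps. This is a toy model with hypothetical names; it does not reproduce BE's `table_schema_change_helper` or the thrift `TSchema` structures.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model: resolve each table column to the file column that backs it. */
final class ColumnMatcher {
    private ColumnMatcher() {}

    /** Maps table column name to file column name via the stable field id (null if absent). */
    static Map<String, String> byFieldId(Map<Integer, String> tableFields, Map<Integer, String> fileFields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : tableFields.entrySet()) {
            out.put(e.getValue(), fileFields.get(e.getKey()));
        }
        return out;
    }

    /** Name-based fallback: a renamed column finds no match and would read as NULL. */
    static Map<String, String> byName(Map<Integer, String> tableFields, Map<Integer, String> fileFields) {
        Set<String> fileNames = new HashSet<>(fileFields.values());
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : tableFields.values()) {
            out.put(name, fileNames.contains(name) ? name : null);
        }
        return out;
    }
}
```

For a table where field 2 was renamed from `name` to `full_name`, `byFieldId` still resolves `full_name` to the file's `name` column, while `byName` finds nothing, which corresponds to the silent NULL read described in the root cause.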
…at CREATE (B-8a + B-8b) rereview2 #4. JDBC-metastore-flavor paimon catalogs only. Connector-only, zero new SPI. Root cause: - B-8a (functional BLOCKER): PaimonScanPlanProvider.getBackendPaimonOptions forwarded driver_url to BE RAW and its `key.startsWith("jdbc.")` filter dropped the `paimon.jdbc.*` alias. A bare `jdbc.driver_url=mysql.jar` reached BE, where JdbcDriverUtils.registerDriver does `new URL(value)` -> MalformedURLException; a `paimon.jdbc.driver_url` alias was dropped outright. Legacy PaimonJdbcMetaStoreProperties.getBackendPaimonOptions emits `jdbc.driver_url=JdbcResource.getFullDriverUrl(driverUrl)` (resolved) + `jdbc.driver_class`. - B-8b (security): driver_url was loaded into the FE JVM (URLClassLoader) and shipped to BE with no format / jdbc_driver_url_white_list / jdbc_driver_secure_path validation, plus a stale "paimon is not in SPI_READY_TYPES" disclaimer (false since the B7 cutover added paimon to CatalogFactory SPI_READY_TYPES). Solution (reuses existing hooks; no new SPI surface): - B-8a: getBackendPaimonOptions now reads driver_url via firstNonBlank(JDBC_DRIVER_URL) (honors both the jdbc.* and paimon.jdbc.* alias) and emits the canonical `jdbc.driver_url` RESOLVED to a scheme-bearing URL plus `jdbc.driver_class` (BE accepts both alias forms). Resolution is extracted to a shared static PaimonCatalogFactory.resolveDriverUrl(driverUrl, env) so FE driver registration and the BE-bound options resolve a given driver_url identically. - B-8b: PaimonConnector overrides Connector.preCreateValidation to route a configured driver_url (either alias) through ConnectorValidationContext.validateAndResolveDriverPath at CREATE CATALOG (format/whitelist/secure-path; throws -> CREATE fails before the jar loads). Mirrors JdbcDorisConnector. Stale disclaimer replaced with an accurate note. Scope (user-signed D-050; see DV-028/DV-029): validation is CREATE-time only — parity with the JDBC reference connector. 
The FE-restart-reload / ALTER-CATALOG / scan-time re-validation gap is a pre-existing fe-core limitation shared by all plugin connectors (default config is permissive); accepted, with a cross-connector follow-up filed. BE-side paimon.jdbc.{user,password,uri} alias-drop is out of scope (BE deserializes the table from serialized_table; only driver_url/driver_class are consumed by registerDriverIfNeeded). Tests: PaimonScanPlanProviderTest +5 (resolve bare name, honor paimon.jdbc.* alias, both-aliases priority+override, preserve scheme-bearing, non-jdbc empty); new PaimonConnectorPreCreateValidationTest +5 (validate jdbc/alias, skip non-jdbc/no-driver_url, propagate rejection). Module 232/0/0 (1 CI-gated skip); fail-before verified (5/9 new tests red when neutered); checkstyle 0; connector import-gate clean. Live e2e (JDBC flavor + remote jar) is CI-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
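A rough sketch of the B-8a resolution step, assuming a bare jar name is resolved against a local drivers directory. The class, the `driversDir` parameter, the `file://` rule, and the alias order passed to `firstNonBlank` are all assumptions for illustration; the real logic is in `PaimonCatalogFactory.resolveDriverUrl(driverUrl, env)`, whose exact rules are not shown in this PR text.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;

/** Hypothetical sketch of driver_url alias lookup and resolution. */
final class DriverUrlSketch {
    private DriverUrlSketch() {}

    /** Returns the first non-blank value among the given alias keys, or null. */
    static String firstNonBlank(Map<String, String> props, String... keys) {
        for (String key : keys) {
            String value = props.get(key);
            if (value != null && !value.trim().isEmpty()) {
                return value.trim();
            }
        }
        return null;
    }

    /** Keeps scheme-bearing URLs; resolves a bare jar name against driversDir (assumed rule). */
    static String resolveDriverUrl(String driverUrl, String driversDir) {
        if (driverUrl.contains("://")) {
            return driverUrl;
        }
        return "file://" + driversDir + "/" + driverUrl;
    }

    /** Mirrors the failure mode in the root cause: new URL("mysql.jar") has no protocol. */
    static boolean isParsableUrl(String value) {
        try {
            new URL(value);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Sharing one resolver between FE driver registration and the BE-bound options is what guarantees both sides see the same scheme-bearing URL.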
…aimon type-mapping toggles
Root cause: after the SPI cutover the paimon connector reads the type-mapping
toggles from UNDERSCORE keys (enable_mapping_binary_as_varbinary /
enable_mapping_timestamp_tz; PaimonConnectorProperties:39,42 ->
PaimonConnectorMetadata.buildTypeMappingOptions), but fe-core only ever writes
the canonical DOTTED catalog keys (enable.mapping.varbinary /
enable.mapping.timestamp_tz; CatalogProperty:50,52, written/defaulted by
ExternalCatalog.setDefaultPropsIfMissing and hidden via HIDDEN_PROPERTIES).
PluginDrivenExternalCatalog.createConnectorFromProperties hands the connector
the raw catalog property map verbatim, so getOrDefault(underscore,"false") is
always false. Even when the user enables the mapping at CREATE CATALOG, Paimon
BINARY stays STRING and TIMESTAMP_WITH_LOCAL_TIME_ZONE stays DATETIMEV2 — a
silent cutover regression (legacy PaimonExternalTable:350 reads the dotted key
and honors it). The binary key is doubly drifted (separator . -> _ AND token
varbinary -> binary_as_varbinary), so a generic dot->underscore normalizer
would not fix it. Latent until the flag is enabled.
Re-confirmation: M-crit was critic-surfaced (not 3-lens-gated), so the finding
was independently re-verified by a 5-agent scout + adversarial synthesizer
(REAL_BUG, high confidence; false-positive steelman rejected — dotted is
canonical per the original feature PRs, every regression CREATE CATALOG, legacy
parity, and the JDBC connector which kept dotted in the same SPI PR).
Solution (connector-only, zero new SPI, no BE): re-point the two
PaimonConnectorProperties constants to the canonical dotted keys
(ENABLE_MAPPING_VARBINARY = "enable.mapping.varbinary", renamed from
ENABLE_MAPPING_BINARY_AS_VARBINARY to match the CatalogProperty/JDBC/iceberg
convention and fix both separator and token; ENABLE_MAPPING_TIMESTAMP_TZ =
"enable.mapping.timestamp_tz") and update the one reference in
PaimonConnectorMetadata. No logic change — the Options(mapBinaryToVarbinary,
mapTimestampTz) arg order is already correct. BE-side consistency verified:
PluginDrivenScanNode extends FileQueryScanNode and inherits the dotted-key read
for the BE scan param (FileQueryScanNode:192-193,635-678), so FE column type
and BE scan param now agree (they diverged before this fix).
Scope: paimon-only (user-signed D-051). NEW hive + iceberg connectors share the
identical root cause; logged as a cross-connector follow-up (DV-030), not fixed
here. Rejected an fe-core dot->underscore normalizer (broader blast, breaks
JDBC which already reads dotted, and insufficient for paimon's renamed token).
Tests (PaimonConnectorMetadataTest): +2 UT. getTableSchemaHonorsDottedMappingKeys
(bug-catcher) sets the dotted keys true and asserts BINARY->VARBINARY /
LTZ->TIMESTAMPTZ; getTableSchemaDefaultsMappingFlagsOff (guard) asserts the
default-off STRING/DATETIMEV2. Module 234/0/0 (1 CI-gated skip), checkstyle 0,
import-gate clean. Fail-before verified: the bug-catcher reddens on the
underscore key (expected <VARBINARY> but was <STRING>) while the guard stays
green. E2E test_paimon_catalog_{varbinary,timestamp_tz}.groovy are CI-gated
(enablePaimonTest=false + external fixture) — not run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
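The key drift is easy to reproduce in isolation. In this sketch the two key strings are the ones quoted in the commit message; the class and method names are hypothetical.

```java
import java.util.Map;

/** Sketch of the key drift: fe-core writes the dotted key, the connector read the underscore key. */
final class MappingFlagSketch {
    static final String DOTTED_KEY = "enable.mapping.varbinary";
    static final String OLD_UNDERSCORE_KEY = "enable_mapping_binary_as_varbinary";

    private MappingFlagSketch() {}

    static boolean readFlag(Map<String, String> catalogProps, String key) {
        // Same lookup shape as the connector: absent key silently defaults to false.
        return Boolean.parseBoolean(catalogProps.getOrDefault(key, "false"));
    }

    /** A generic dot-to-underscore normalizer, shown only to demonstrate why it is insufficient. */
    static String naiveNormalize(String dottedKey) {
        return dottedKey.replace('.', '_');
    }
}
```

With a catalog map containing only the dotted key set to `true`, reading the underscore key returns `false`, and `naiveNormalize` yields `enable_mapping_varbinary`, which still differs from the old constant because the token itself was renamed.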
… all read RPCs in doAs (M-11)
Both are Kerberos-only (harmless on simple-auth: the no-op authenticator's
execute() == task.call()).
Root cause
- M-8 (fe-core): paimon filesystem/jdbc catalogs over Kerberized HDFS lost UGI
doAs on the cutover path. The HDFS HadoopExecutionAuthenticator is built only
inside initializeCatalog(), which is dead on the plugin path (only legacy
PaimonExternalCatalog calls it), so PluginDrivenExternalCatalog read the base
no-op from getExecutionAuthenticator(). HMS was unaffected — it wires the
authenticator in initNormalizeAndCheckProps(), which always runs.
- M-11 (connector): metadata read RPCs (listDatabases/getDatabase/listTables/
getTable[handle+sys+resolveTable]/listPartitions) ran without
executeAuthenticated; only the 4 DDL ops were wrapped (signed D7=B read-vs-DDL
asymmetry). On a Kerberos HMS catalog these reads ran outside the catalog
principal. Legacy wrapped every read.
Fix
- M-8 (filesystem+jdbc only; DLF/REST/HMS excluded — DLF uses Aliyun STS not
Kerberos, the review's "DLF" clause was overstated): new internal fe-core hook
MetastoreProperties.initExecutionAuthenticator(List<StorageProperties>) (default
no-op), invoked by PluginDrivenExternalCatalog.initPreExecutionAuthenticator from
the already-built storage list; filesystem/jdbc override it to build the HDFS
authenticator (shared AbstractPaimonProperties helper), mirroring HMS. No
connector change; no connector SPI change.
- M-11 (full legacy parity, signed D-052, supersedes the D7=B read clause): wrap
all 7 connector read RPCs in context.executeAuthenticated. A single resolveTable
wrap covers all resolveTable callers (metadata + scan). Domain exceptions are
caught INSIDE the lambda because Kerberos UGI.doAs wraps a thrown checked
Catalog.*NotExistException in UndeclaredThrowableException.
Tests
- M-11: PaimonConnectorMetadataReadAuthTest (12) + 2 scan-path tests assert each
read runs inside executeAuthenticated (RecordingConnectorContext failAuth/
authCount). Connector module 248/0/0 (1 CI-gated skip).
- M-8: Paimon{FileSystem,Jdbc}MetaStorePropertiesTest assert getExecutionAuthenticator()
returns HadoopExecutionAuthenticator after wiring without initializeCatalog;
fe-core metastore-props 21/0/0 (DLF/HMS regression-clean).
- fail-before verified red for both (M-8: stays base no-op AbstractPaimonProperties$1;
M-11: authCount/log-empty).
- True end-to-end doAs is live-Kerberos-e2e only (no paimon-kerberos suite); DV-031.
Decisions D-052 (M-11) / D-053 (M-8); deviation DV-031; design
plan-doc/tasks/designs/P5-fix-KERBEROS-DOAS-design.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
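Why the domain exceptions have to be caught inside the lambda can be seen with a stand-in authenticator. Everything here is hypothetical: `executeAuthenticated` below only imitates the doAs behavior described in the M-11 fix (a checked exception escaping the action is rewrapped in `UndeclaredThrowableException`); it is not the connector's `ConnectorContext` or Hadoop's `UserGroupInformation`.

```java
import java.lang.reflect.UndeclaredThrowableException;
import java.util.concurrent.Callable;

/** Stand-in for a doAs-style wrapper that rewraps checked exceptions. */
final class AuthWrapSketch {
    static final class TableNotExistException extends Exception {}

    private AuthWrapSketch() {}

    static <T> T executeAuthenticated(Callable<T> task) {
        try {
            return task.call();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Imitates the doAs behavior: the checked domain exception loses its type here.
            throw new UndeclaredThrowableException(e);
        }
    }

    private static void lookup(boolean exists) throws TableNotExistException {
        if (!exists) {
            throw new TableNotExistException();
        }
    }

    /** Catching inside the lambda keeps the "not found -> false" contract intact. */
    static boolean tableExistsGuarded(boolean exists) {
        return AuthWrapSketch.<Boolean>executeAuthenticated(() -> {
            try {
                lookup(exists);
                return true;
            } catch (TableNotExistException e) {
                return false;
            }
        });
    }

    /** Without the inner catch, callers see UndeclaredThrowableException instead. */
    static boolean tableExistsUnguarded(boolean exists) {
        return AuthWrapSketch.<Boolean>executeAuthenticated(() -> {
            lookup(exists);
            return true;
        });
    }
}
```

On simple-auth the wrapper is effectively `task.call()`, so the difference only surfaces under a real doAs, which is why the commit notes that true end-to-end coverage needs a live Kerberos environment.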
…aimon connector scan path (M-1) Root cause: the cutover (plugin) connector's split router read only the name-derived handle flag paimonHandle.isForceJni() (the binlog/audit_log NAME hatch) and never consulted the session var force_jni_scanner, so ORC/Parquet always took the native reader — legacy's JNI escape hatch (SET force_jni_scanner=true, used to dodge native-reader bugs incl. the B2 schema-evolution class) was silently gone. The connector ported only two of legacy's three native-gate conjuncts (PaimonScanNode.java:430: !forceJniScanner && !forceJniForSystemTable && supportNativeReader); the dropped !forceJniScanner conjunct is M-1. Solution (pure connector; no SPI, no fe-core import, no BE param — legacy serializes nothing for this var): - new isForceJniScannerEnabled(session): byte-for-byte mirror of isCppReaderEnabled, reads key "force_jni_scanner" (byte-identical to SessionVariable.FORCE_JNI_SCANNER) from the same VariableMgr.toMap channel; null-guarded, default false (legacy default). - Site A (correctness): shouldUseNativeReader gains an explicit forceJniScanner param (mirrors legacy's sibling boolean 1:1) ANDed into the native gate; planScan passes isForceJniScannerEnabled(session). The handle name-force is OR-sibling, never replaced (binlog/audit_log intact). - Site B (correctness-neutral): getScanNodeProperties suppresses the native-only paimon.schema_evolution dict when force_jni_scanner routes every split to JNI (BE consumes it only on native ORC/Parquet ranges; JNI/cpp readers ignore it). Matches the connector's own documented contract. Tests (fail-before + pass-after both verified): - isForceJniScannerEnabledReadsSessionProperty: pins the exact key, default-false, null-safety. - forceJniScannerRoutesNativeEligibleSplitToJni: a native-eligible split must route to JNI when force_jni_scanner=true (legacy parity). - 3 existing shouldUseNativeReader calls updated for the new param. 
- Module 250/0/0 (+1 CI-gated live skip); connector import-gate + checkstyle clean. - Real BE reader selection is a CI-gated live-e2e check (no offline coverage). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
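The three-conjunct native gate quoted from legacy (`!forceJniScanner && !forceJniForSystemTable && supportNativeReader`) and the null-guarded session read can be sketched as below. The class name, parameter order, and the use of a plain `Map` for session variables are assumptions; the connector's actual signatures are not reproduced in this PR text.

```java
import java.util.Map;

/** Sketch of the native-reader gate with the restored force_jni_scanner conjunct. */
final class NativeGateSketch {
    private NativeGateSketch() {}

    /** Null-guarded read of the session variable; defaults to false. */
    static boolean isForceJniScannerEnabled(Map<String, String> sessionVars) {
        if (sessionVars == null) {
            return false;
        }
        // Boolean.parseBoolean(null) is false, so a missing key also yields the default.
        return Boolean.parseBoolean(sessionVars.get("force_jni_scanner"));
    }

    /** All three conditions must allow native; dropping any conjunct changes routing. */
    static boolean shouldUseNativeReader(boolean forceJniScanner,
                                         boolean forceJniForSystemTable,
                                         boolean supportNativeReader) {
        return !forceJniScanner && !forceJniForSystemTable && supportNativeReader;
    }
}
```

M-1 corresponds to evaluating this gate with the first argument hard-wired to `false`, so a native-eligible split could never be forced to JNI from the session.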
…-COUNT-PUSHDOWN (P2, ask scope first)
- task-list: #7 row → ✅ design/impl/build(250/0/0)/commit `05132a42668` + DONE detail.
- HANDOFF: #7 summary (3rd-param overrides synthesizer call-site-OR per Rule 9; Site B correctness-neutral, no offline red test honestly noted); next = #8/#9 P2 perf-parity → AskUserQuestion for scope (accept-or-defer) BEFORE implementing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(*) on plugin paimon (M-2)
Root cause: after cutover, COUNT(*) over a plugin-driven paimon table is result-correct but slow. The COUNT enum already reaches BE (FileScanNode.toThrift:90; PhysicalPlanTranslator:873 sets it on the plugin node, not excluded) and the per-range emit seam is already built (PaimonScanRange.Builder.rowCount -> paimon.row_count -> setTableLevelRowCount, byte-identical to legacy PaimonScanNode:303-308). The missing half is the signal + compute: DataSplit.mergedRowCount() is paimon-SDK-only (connector), and the getPushDownAggNoGroupingOp()==COUNT signal lives only on the fe-core node and reached nobody. So every split carried table_level_row_count=-1 and BE materialized the full post-merge row set just to count (file_scanner.cpp:1298-1326), which is costly on PK/MOR tables.
Not pure-connector: the signal must cross the SPI boundary. Threading it via ConnectorSession (the FIX-FORCE-JNI precedent) was rejected: the agg-op is a per-query planner output, not a SET-variable, and would be a silent untyped channel.
Solution (3 files; user signed off, D-054):
- SPI (ConnectorScanPlanProvider): new default planScan overload carrying `boolean countPushdown`, delegating to the 6-arg variant; mirrors the limit/requiredPartitions extension chain; other connectors are no-op (E15).
- fe-core (PluginDrivenScanNode.getSplits): read getPushDownAggNoGroupingOp()==TPushAggOp.COUNT and forward the flag. No post-loop math.
- connector (PaimonScanPlanProvider): extract planScanInternal(...,countPushdown) (4-arg delegates false, new 7-arg delegates the flag); add the count short-circuit as the FIRST routing arm (a count-eligible split must not also emit a data range, else BE double-counts vs deletion vectors / PK merge); collapse-to-one: sum every count-eligible split's mergedRowCount and emit ONE JNI count range bearing the total (= legacy's <=10000 singletonList + assignCountToSplits case). New members: static isCountPushdownSplit + buildCountRange.
Param shape = boolean (BE only needs COUNT-vs-not), scope = paimon-only (default no-op). Legacy's >10000 parallel-split trim is intentionally dropped (connector has no numBackends, an fe-core-only concern); this is a perf-only divergence, result identical (DV-032). No new thrift, no BE change.
Tests: connector PaimonScanPlanProviderTest +2: isCountPushdownSplit eligibility on a real split (true/2, disabled/false); end-to-end planScan over a PARTITIONED PK table with asymmetric per-partition counts (2 + 3) asserting collapse-to-one carrying the SUM (5, unreachable from any single split) and no row_count when the flag is off. Connector 252/0/0 (1 CI-gated live skip), fe-core compile + checkstyle 0, import-gate clean. Fail-before verified: neuter isCountPushdownSplit->false -> the count tests red; mutate `countSum +=` -> `=` -> the cross-split-sum assertion red. Real BE CountReader selection / EXPLAIN = CI-gated live-e2e (existing legacy paimon count regression covers the BE contract).
Adversarially reviewed (workflow wf_6ead7c2c-b58): one MAJOR caught and fixed (the collapse/sum test was degenerate on a single-split fixture); two MINORs refuted (batch-path signal moot for paimon; EXPLAIN count-line drop is cosmetic, noted in DV-032).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
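The collapse-to-one routing can be sketched independently of the paimon SDK. The `Split` and `Plan` types below are hypothetical stand-ins (a nullable `mergedRowCount` models "no precomputed merged count"); the real code works on `DataSplit` and emits `PaimonScanRange` objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: count-eligible splits are summed into one count range, the rest stay data splits. */
final class CountCollapseSketch {
    static final class Split {
        final String name;
        final Long mergedRowCount; // null = not count-eligible

        Split(String name, Long mergedRowCount) {
            this.name = name;
            this.mergedRowCount = mergedRowCount;
        }
    }

    static final class Plan {
        final boolean hasCountRange;
        final long countTotal;
        final List<Split> dataSplits;

        Plan(boolean hasCountRange, long countTotal, List<Split> dataSplits) {
            this.hasCountRange = hasCountRange;
            this.countTotal = countTotal;
            this.dataSplits = dataSplits;
        }
    }

    private CountCollapseSketch() {}

    static Plan plan(List<Split> splits, boolean countPushdown) {
        long sum = 0;
        boolean any = false;
        List<Split> data = new ArrayList<>();
        for (Split s : splits) {
            if (countPushdown && s.mergedRowCount != null) {
                // Count arm comes first: an eligible split must not also become a data range.
                sum += s.mergedRowCount;
                any = true;
            } else {
                data.add(s);
            }
        }
        return new Plan(any, sum, data);
    }
}
```

Using `sum +=` rather than `sum =` is exactly the mutation the cross-split test guards against: with per-partition counts 2 and 3 the single count range must carry 5.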
…files for read parallelism (M-3)
Root cause: after cutover, a large native (ORC/Parquet) paimon data file gets ONE scanner, with no intra-file parallelism. The connector's native arm emitted exactly one PaimonScanRange per RawFile (start=0, length=file.length()). Legacy PaimonScanNode:434-465 sub-splits each large file via determineTargetFileSplitSize + fileSplitter.splitFile. Result is correct (BE reads the whole file either way); only read parallelism regresses.
Recon (wf_ad764bf6-1c9) confirmed: it is a real gap (ORC/Parquet are PLAIN/splittable, legacy does sub-split); DV x sub-split is SAFE (paimon deletion-vector rowids are GLOBAL file row positions, BE native readers report global positions even within a partial byte range, _kv_cache shares the DV bitmap across sub-splits keyed by path+offset, iceberg uses the identical machinery on routinely-split files); and it is pure-connector (the splitter math + 5 session vars re-stated with plain longs, since the connector cannot import fe-core FileSplitter/SessionVariable).
Solution (pure connector, zero SPI, zero fe-core; D-055):
- Two pure statics: computeFileSplitOffsets(fileLength, targetSplitSize) ports FileSplitter.splitFile's specified-size branch byte-for-byte incl. the >1.1D tail guard (the last range absorbs a remainder up to 1.1x instead of a tiny tail split); determineTargetSplitSize(...) ports determineTargetFileSplitSize + applyMaxFileSplitNumLimit (the isBatchMode->0 branch omitted, since paimon is never batch).
- sessionLong + lazy resolveTargetSplitSize read the 5 file-split session vars via the VariableMgr.toMap channel (like isCppReaderEnabled) and sum native-eligible file sizes once per scan.
- Native arm: emit one range per [start,length) sub-range via buildNativeRanges, attaching the SAME unmodified per-RawFile DeletionFile to EVERY sub-range (DV is global-row-position indexed; no offset re-basing). buildNativeRange gains (start, length); fileSize stays the whole file length.
- Under COUNT(*) pushdown a native split that is not count-eligible (no precomputed merged count, e.g. a DV with null cardinality) is kept WHOLE (target size 0 -> one whole-file range), mirroring legacy splittable=!applyCountPushdown.
The split-weight/target-size scheduling nicety is not ported (pre-existing native path already omitted it; perf/scheduling-only, not correctness) -> DV-033.
Tests: connector PaimonScanPlanProviderTest +6: computeFileSplitOffsets math (250MB/64MB->4 with 58MB tail, exact-multiple, small-file-whole, empty, target<=0); determineTargetSplitSize heuristic (file_split_size override, 32MB<->64MB threshold, max_file_split_num floor); end-to-end append-only fixture (tiny file_split_size -> >=2 contiguous sub-ranges tiling [0,fileLength); default -> 1 range); DV on every sub-range; whole-file under count pushdown. Updated the 3 existing buildNativeRange call sites to the new signature. Connector 258/0/0 (1 CI-gated live skip), checkstyle 0, import-gate clean. Fail-before verified: neuter computeFileSplitOffsets -> the 3 splitting tests red; attach DV only to the first sub-range -> the DV test red. Real BE multi-range + DV read = CI-gated live-e2e (legacy paimon regression covers the BE contract; no BE change).
Adversarially reviewed (workflow wf_4ac7479d-39d): 2 confirmed and fixed (the count-pushdown sub-split parity gap + false comment; the missing DV-on-every-sub-range test), 2 refuted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
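The specified-size split math with the 1.1x tail guard can be sketched as below. This is an illustrative re-statement under assumptions, not the ported method: the class name is hypothetical, and the handling of an empty file (no ranges) and of target<=0 (one whole-file range) is assumed from the description above.

```java
import java.util.ArrayList;
import java.util.List;

final class FileSplitOffsetsSketch {
    private FileSplitOffsetsSketch() {}

    // Returns {start, length} pairs that tile [0, fileLength) contiguously.
    static List<long[]> computeFileSplitOffsets(long fileLength, long targetSplitSize) {
        List<long[]> ranges = new ArrayList<>();
        if (fileLength <= 0) {
            return ranges; // assumption: an empty file yields no ranges
        }
        if (targetSplitSize <= 0) {
            ranges.add(new long[] {0L, fileLength}); // keep the file whole
            return ranges;
        }
        long remaining = fileLength;
        // Cut full-size ranges only while more than 1.1x the target remains,
        // so the last range absorbs a small remainder instead of a tiny tail.
        while ((double) remaining / (double) targetSplitSize > 1.1D) {
            ranges.add(new long[] {fileLength - remaining, targetSplitSize});
            remaining -= targetSplitSize;
        }
        if (remaining != 0) {
            ranges.add(new long[] {fileLength - remaining, remaining});
        }
        return ranges;
    }
}
```

For a 250MB file and a 64MB target this yields three 64MB ranges plus a 58MB tail starting at 192MB; a 70MB file stays whole because 70/64 is below 1.1.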
… hand off P3 coverage-gap verification - FIX-COUNT-PUSHDOWN (apache#8, M-2) = 525be03; FIX-NATIVE-SUBSPLIT (apache#9, M-3) = 2f5f467. - Both recon'd (multi-scout workflow) + adversarially reviewed before commit; each review caught a real finding (degenerate test / parity gap) that was fixed. - P0/P1/P2 all clear. Next: P3 coverage gaps (verify, not fix) — FIX-HMS-CONFRES re-check, DDL write parity, ANALYZE/column-stats, split-count accounting, cross-connector follow-ups. - task-list apache#9 commit hash finalized; HANDOFF overwritten. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rejection in PluginDrivenExternalCatalog.createTable
Root cause: the generic fe-core bridge PluginDrivenExternalCatalog.createTable
collapsed legacy PaimonMetadataOps.performCreateTable's ordered remote-then-local
existence probe into a single `exists` OR that was consumed ONLY by the IF NOT
EXISTS branch. The !IF NOT EXISTS path ignored it and unconditionally called
metadata.createTable. So a table present only in the local FE cache (a case-variant
folded onto an existing name under lower_case_meta_names, absent on a case-sensitive
remote) was CREATED remotely instead of rejected with ERR_TABLE_EXISTS_ERROR --
silent metadata corruption. Found by the P3 plugin-vs-legacy parity audit
(adversarially verified); narrow, backend-dependent trigger (filesystem/jdbc paimon;
HMS lowercases so both sides reject). Generic bridge -> also affects MaxCompute /
future iceberg/hudi.
Solution (fe-core bridge only; zero SPI/connector/BE): split the `exists` OR into
remoteExists/localExists; under !IF NOT EXISTS, when localExists is true throw
ERR_TABLE_EXISTS_ERROR (legacy local-arm parity). A remote-only conflict still falls
through to connector.createTable (case A unchanged). Option-2 surgical (D-056); the
residual case-A / all-DDL-op generic-error-code collapse is pre-existing and out of
scope (DV-034).
Tests: new PluginDrivenExternalCatalogDdlRoutingTest
.testCreateTableLocalConflictWithoutIfNotExistsRejects (local-hit + remote-miss +
!IF NOT EXISTS -> asserts DdlException thrown + metadata.createTable never called +
no edit log). fail-before: exactly 1 new test red ("Expected DdlException...nothing
was thrown"); pass-after: 26/0/0. fe-core checkstyle 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
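The split of the single `exists` OR into two probes can be sketched as a small decision function. The enum, class, and method names are hypothetical, and the IF NOT EXISTS arm is assumed to skip creation when either probe hits; only the !IF NOT EXISTS arms are taken directly from the commit text.

```java
final class CreateTableConflictSketch {
    private CreateTableConflictSketch() {}

    enum Outcome { SKIP_EXISTING, REJECT_TABLE_EXISTS, CALL_CONNECTOR_CREATE }

    static Outcome decide(boolean remoteExists, boolean localExists, boolean ifNotExists) {
        if (ifNotExists) {
            // Assumed: either hit makes CREATE TABLE IF NOT EXISTS a no-op.
            return (remoteExists || localExists)
                    ? Outcome.SKIP_EXISTING
                    : Outcome.CALL_CONNECTOR_CREATE;
        }
        if (localExists) {
            // The fix: a local-cache hit is rejected (ERR_TABLE_EXISTS_ERROR in the real bridge).
            return Outcome.REJECT_TABLE_EXISTS;
        }
        // A remote-only conflict still falls through to the connector (case A unchanged).
        return Outcome.CALL_CONNECTOR_CREATE;
    }
}
```

The regression case is `decide(false, true, false)`: before the fix it reached the connector create call, after the fix it is rejected.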
…P3-fix landed) P3 "go check" done via adversarial audit wf_25450c36-b7a: HMS-CONFRES / ANALYZE-stats / split-count all PARITY_HOLDS; DDL write surfaced one MAJOR correctness divergence -> FIX-CREATE-TABLE-LOCAL-CONFLICT (67a9b9d). Updates HANDOFF for next steps (P4 cleanup / B8 legacy removal / cross-connector follow-up). No P0/P1/P2/P3 blockers remain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4 N10.1)
Root cause: the plugin read-direction type mapping
PaimonTypeMapping.toVarcharType used `len >= 65533` to overflow a paimon
VarCharType to STRING, while legacy PaimonUtil.paimonPrimitiveTypeToDorisType
uses `len > 65533`. 65533 == ScalarType.MAX_VARCHAR_LENGTH is the legal
exact-fit max VARCHAR, not the STRING wildcard, so the connector widened
VARCHAR(65533) to STRING — a DESCRIBE / SHOW CREATE TABLE reported-type
divergence (data and read correctness unaffected; STRING is a superset).
Fix: change the boundary `>= 65533` -> `> 65533` to match legacy byte-for-byte
(pure connector, 1 char). The unreachable `len <= 0` defensive guard is kept
untouched (paimon VarCharType min length is 1).
Tests: new read-direction PaimonTypeMappingReadTest pins the boundary intent
(65532 -> VARCHAR(65532); 65533 -> VARCHAR(65533) [the fix]; 65534 -> STRING).
Fail-before exactly the 65533 assertion red ("expected VARCHAR but was STRING");
pass-after green. Full module 260/0/0 (1 CI-gated live skip), checkstyle 0,
connector import-gate clean. No BE/SPI change; reported-type parity otherwise
covered by the CI-gated legacy paimon DESCRIBE regression.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
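The boundary can be illustrated with a small sketch that returns type names as strings. The class and method are hypothetical and the `len <= 0` defensive guard is deliberately not modeled; only the `> 65533` comparison reflects the fix.

```java
final class VarcharBoundarySketch {
    private VarcharBoundarySketch() {}

    // Mirrors ScalarType.MAX_VARCHAR_LENGTH as stated in the commit message.
    static final int MAX_VARCHAR_LENGTH = 65533;

    // Maps a paimon VARCHAR length to a Doris type name (string form for illustration).
    static String toDorisTypeName(int len) {
        // Strictly greater than: 65533 itself is a legal exact-fit VARCHAR.
        if (len > MAX_VARCHAR_LENGTH) {
            return "STRING";
        }
        return "VARCHAR(" + len + ")";
    }
}
```

The three pinned cases map to `VARCHAR(65532)`, `VARCHAR(65533)`, and `STRING` respectively.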
…tion value to NULL (P4)
Root cause: PaimonScanRange.populateRangeParams routed paimon partition values through ConnectorPartitionValues.normalize, which applies Hive-directory null-sentinel coercion (a value of "\N" or "__HIVE_DEFAULT_PARTITION__" -> isNull). That coercion is correct for hudi (path-encoded partitions) but wrong for paimon: paimon partition values are TYPED (serializePartitionValue returns Java-null for a genuine null and the literal toString() otherwise), so a null is never a directory sentinel, and the coercion only ever bites a genuine literal value. A string partition column literally holding "\N" (which paimon does NOT reserve) or "__HIVE_DEFAULT_PARTITION__" was materialized as SQL NULL instead of the literal on the native ORC/Parquet read, diverging from legacy PaimonScanNode.setScanParams (source/PaimonScanNode.java:323-326) and yielding wrong rows for WHERE col='\N' / col IS NULL. The dominant genuine-NULL case is unaffected (both sides set isNull=true and BE ignores the rendered value string when is_null==true, partition_column_filler.h:40-44).
Fix (1 file): derive isNull from the Java null ONLY (render genuine null as "", legacy-exact); drop the unused ConnectorPartitionValues import. ConnectorPartitionValues itself is left untouched, because hudi (HudiScanRange.java:226) legitimately needs the Hive-directory coercion. The residual scan-vs-prune skew for a literal "__HIVE_DEFAULT_PARTITION__" value lives in the generic fe-core prune bridge (TablePartitionValues), is pre-existing and unchanged by this fix, and is logged as a deviation.
Tests: new PaimonScanRangePartitionNullTest pins genuine-null -> (isNull=true, ""); literal "\N" -> (isNull=false, "\N"); literal "__HIVE_DEFAULT_PARTITION__" -> (isNull=false, verbatim); ordinary -> kept. Fail-before (re-inlined coercion) reds the literal + render rows; pass-after green. Full module 261/0/0 (1 CI-gated live skip), checkstyle 0, import-gate clean.
Adversarial review (5 angles) SAFE_TO_COMMIT: total convergence of all 3 range builders on populateRangeParams; no query goes correct->wrong. No BE/SPI change; native partition materialization otherwise covered by the CI-gated legacy paimon partition regression.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
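The "isNull from the Java null only" rule can be sketched as follows. `PartitionValueSketch` is a hypothetical holder, not the connector's type; it only demonstrates that sentinel-looking literals are kept verbatim.

```java
final class PartitionValueSketch {
    final boolean isNull;
    final String rendered;

    private PartitionValueSketch(boolean isNull, String rendered) {
        this.isNull = isNull;
        this.rendered = rendered;
    }

    // serialized is the typed partition value already turned into text,
    // or Java null for a genuine SQL NULL.
    static PartitionValueSketch fromSerialized(String serialized) {
        boolean isNull = serialized == null;
        // No Hive-directory sentinel coercion: "\N" and
        // "__HIVE_DEFAULT_PARTITION__" are ordinary literals here.
        return new PartitionValueSketch(isNull, isNull ? "" : serialized);
    }
}
```

A literal `"\N"` therefore stays `(isNull=false, "\N")`, while only a Java null becomes `(isNull=true, "")`.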
…035]) Records the P4 cleanup pass disposition (P0–P4 now all clear): - FIX-VARCHAR-BOUNDARY (N10.1) `bcee91dcb52` + FIX-PARTITION-NULL-SENTINEL `4b2c2190dc2` landed as independent fix commits. - 15 items accepted as deviations (M5.1 transient-only + 14 display/perf/text/inert/connector-more-correct/false-premise) → [DV-035]. - D-057 logs the user-signed scope; DV-035 the accepted batch. - task-list §P4 marked done; HANDOFF rolled to next session (B8 legacy deletion or cross-connector follow-up batch). Read-only adversarial recon `wf_6884d37b-8ef` re-verified all ~17 review §5/§7 items against current code; the sentinel ACCEPT verdict was refuted by a prune-path skeptic (converted to FIX) and M5.1's "cheap fallback" premise was refuted at impl level (confirmed ACCEPT). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ROPERTIES to paimon Root cause: branch commit 98a73bf (D-046 paimon parity) added LOCATION+PROPERTIES emission to the SHARED PLUGIN_EXTERNAL_TABLE branch of Env.getDdlStmt, gated only on !properties.isEmpty(). JDBC/ES/Trino catalogs are plugin-driven with non-empty getTableProperties() (connection props incl. credentials), so SHOW CREATE TABLE on a JDBC external table emitted LOCATION '' + PROPERTIES("password"=...) instead of the legacy comment-only ENGINE=JDBC_EXTERNAL_TABLE; — a correctness regression (test_nereids_refresh_catalog) and a JDBC credential leak. Still present on HEAD. Solution: gate the LOCATION+PROPERTIES emission additionally on TableType.PAIMON_EXTERNAL_TABLE.name().equals(getEngineTableTypeName()) — only the paimon engine type (the sole plugin-driven connector whose legacy DDL carried LOCATION/PROPERTIES) renders them. JDBC/ES/Trino/MaxCompute revert to comment-only; the credential leak is closed. Did NOT rebaseline the .out (would entrench the leaked-credential output). Tests: fe-core compile SUCCESS + checkstyle clean; adversarial static review SOUND (paimon incl. sys-table unwrap still renders LOCATION/PROPERTIES; jdbc/es/trino/maxcompute match committed comment-only .out; getTableProperties has no other DDL consumer). e2e: external_table_p0/nereids_commands/test_nereids_refresh_catalog (CI external pipeline). See plan-doc/FIX-SHOWCREATE-PLUGIN-PROPS-{design,summary}.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
d6c93da to
f7114a2
Compare
…ma-cache (CI 968828) Root cause: PluginDrivenSysExternalTable did not override getSchemaCacheValue(), so it inherited ExternalTable.getSchemaCacheValue() which routes through ExternalCatalog.getSchema() and re-resolves the table by name in the db map. A transient system table (e.g. tbl$snapshots / tbl$manifests) is never registered in that map, so the lookup failed with "failed to load schema cache value for: ...$snapshots". Regression from the paimon SPI migration; legacy PaimonSysExternalTable avoided it by overriding getSchemaCacheValue()/initSchema() to compute on the transient instance. Solution: override getSchemaCacheValue() (and initSchema(SchemaCacheKey)) to compute the schema directly via the inherited PluginDrivenExternalTable.initSchema() (which honors this class's resolveConnectorTableHandle that threads the sys-table handle), memoized with double-checked locking — mirroring legacy PaimonSysExternalTable. Tests: covered by existing e2e suites paimon_system_table ($manifests), paimon_time_travel ($snapshots), test_paimon_system_table_auth (re-run in CI). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
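The double-checked-locking memoization mentioned in the solution can be sketched generically. `MemoizedSchemaSketch` and its `Supplier` loader are hypothetical; the real override computes the schema via the inherited initSchema() on the transient table instance.

```java
import java.util.function.Supplier;

final class MemoizedSchemaSketch<T> {
    private final Supplier<T> loader;
    // volatile so a fully constructed value is visible to other threads.
    private volatile T cached;

    MemoizedSchemaSketch(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T local = cached;              // first check, no lock
        if (local == null) {
            synchronized (this) {
                local = cached;        // second check, under the lock
                if (local == null) {
                    local = loader.get();
                    cached = local;
                }
            }
        }
        return local;
    }
}
```

In this sketch a loader that returns null is simply re-invoked on the next call; whether the real code caches an empty result is not stated in the commit.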
…68828) Root cause: PaimonConnectorMetadata.mapFields built ConnectorColumn via the 5-arg ctor, which defaults isKey=false; ConnectorColumnConverter propagates it, so DESC showed Key=false for every paimon column. Legacy PaimonExternalTable/PaimonSysExternalTable always set Column isKey=true (3rd positional arg) for every column, so the .out files expect Key=true. Caused test_paimon_schema_change, test_paimon_char_varchar_type, test_paimon_timestamp_with_time_zone DESC diffs. Solution: pass isKey=true via the 6-arg ConnectorColumn ctor in mapFields (single chokepoint for latest + at-snapshot + system-table schema paths; toSchemaCacheValue preserves isKey on remap). Tests: extended PaimonConnectorMetadataTest.getTableSchemaForcesColumnsNullableForLegacyParity to pin isKey=true for both a PK and a non-PK column. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… split (CI 968828) Root cause: the paimon (and hudi) plugin-zip bundled org.apache.thrift:libthrift and loaded org.apache.thrift.* child-first (not in the connector parent-first allowlist), while fe-thrift is provided so org.apache.doris.thrift.TFileScanRangeParams resolves parent-first and implements the PARENT's TBase. PaimonScanPlanProvider.encodeSchemaEvolution()'s TSerializer.serialize(carrier) then mixes a child TSerializer with a parent-TBase carrier -> IncompatibleClassChangeError. Being an Error (not Exception), it escaped catch(Exception) and the connection handler, killing the mysql session. This was the dominant CI failure (~19 tests: 2 ANALYZE, the family-D connection drops, and the predict/timestamp_tz/sql_block_rule explain failures). Solution: - Exclude org.apache.doris:fe-thrift + org.apache.thrift:libthrift from the paimon and hudi plugin-zip assemblies, so org.apache.thrift.* resolves from the single parent fe-core copy that also owns org.apache.doris.thrift.* (matches the es/jdbc/hive/maxcompute assemblies). - Defense-in-depth: broaden encodeSchemaEvolution's catch to Exception | LinkageError so any future linkage error surfaces as a clean per-query failure instead of an uncaught Error that kills the whole connection (this is what turned ~5 real failures into ~19 collateral ones). Verified: rebuilt paimon and hudi plugin zips no longer contain libthrift/fe-thrift. Tests: e2e re-run in CI (the native-path paimon suites). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
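The defense-in-depth catch can be illustrated with a small wrapper. The class, the `Callable` parameter, and the `IllegalStateException` wrapper are hypothetical; the point is only that `LinkageError` is an `Error`, so a plain `catch (Exception e)` does not see it.

```java
import java.util.concurrent.Callable;

final class LinkageSafeEncodeSketch {
    private LinkageSafeEncodeSketch() {}

    // Runs the encoder and converts both ordinary exceptions and linkage
    // errors into one query-level failure instead of letting an Error escape.
    static byte[] encode(Callable<byte[]> encoder) {
        try {
            return encoder.call();
        } catch (Exception | LinkageError e) {
            throw new IllegalStateException("schema-evolution encode failed: " + e, e);
        }
    }
}
```

An `IncompatibleClassChangeError` thrown inside the encoder now surfaces as the wrapper exception with the original error as its cause.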
…ilter scans (CI 968828) Root cause: on the SPI plugin scan path, PaimonScanPlanProvider.getScanNodeProperties emitted the paimon.predicate property only when filter.isPresent() && !predicates.isEmpty(), and populateScanLevelParams set the thrift field only when non-null. So a paimon read with no pushed-down filter (e.g. force_jni_scanner=true `select *`) omitted paimon_predicate entirely; BE then omitted the JNI key, and PaimonJniScanner.getPredicates() called PaimonUtils.deserialize(null) -> NPE "encodedStr is null". Legacy PaimonScanNode.createScanRangeLocations always serialized the (possibly empty) predicate list, so the field was always present. Caused test_paimon_catalog_varbinary, paimon_tb_mix_format, paimon_partition_legacy, paimon_timestamp_types, test_paimon_partition_table. Solution: - getScanNodeProperties always serializes the predicate list (empty list -> non-null base64 string) and emits paimon.predicate unconditionally, restoring the legacy invariant. - BE backstop: PaimonJniScanner.getPredicates() treats a null paimon_predicate param as "no filter" (returns emptyList) so the JNI reader never NPEs on a missing param. Tests: PaimonScanPlanProviderTest.getScanNodePropertiesAlwaysEmitsPredicateForNoFilterScan pins that a no-filter scan emits paimon.predicate and it deserializes to an empty list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
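The "always emit, and tolerate a missing param" pair can be sketched as below. This uses plain Java serialization and `String` predicates as stand-ins; the real code serializes paimon predicate objects through its own utility, so every name here is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

final class PredicateParamSketch {
    private PredicateParamSketch() {}

    // FE side: an empty list still encodes to a non-null base64 string.
    static String encode(List<String> predicates) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(predicates));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reader side backstop: a missing (null) param means "no filter".
    @SuppressWarnings("unchecked")
    static List<String> decode(String encoded) {
        if (encoded == null) {
            return Collections.emptyList();
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Either half alone prevents the NPE; keeping both restores the legacy invariant and guards against a future omission.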
8-family root-cause analysis (adversarially verified) of the 37 external-regression failures. 7 in-scope paimon-SPI regressions + 2 out-of-scope (hive CTAS stale test; BE shutdown ASAN race). RC-1/2/6/7 fixed (contained); RC-3/4/5 deferred to the docker-gated self-contained-classloader batch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…imon plugin (CI 968828) Root cause: the connector sets fs.oss.impl=com.aliyun.jindodata.oss.JindoOssFileSystem, but that impl ships only in the thirdparty jindofs jars (packaged by post-build.sh into fe/lib/jindofs, not a maven artifact). The paimon plugin runs child-first, so JindoOssFileSystem resolves from the parent and cannot be cast to the plugin's child-loaded org.apache.hadoop.fs.FileSystem -> "JindoOssFileSystem cannot be cast to FileSystem" -> "Unknown database" on first OSS listing (paimon_base_filesystem, test_paimon_deletion_vector_oss). The maven route is unbuildable (jindo-sdk/jindo-core are bound to an undeclared jindodata repo -> "present but unavailable"; runtime jindofs is 6.10.4, not in maven). Solution: after deploying the connector plugins, copy the jindofs jars (already placed in fe/lib/jindofs by post-build.sh) into the paimon plugin lib so JindoOssFileSystem loads child-first alongside the plugin's own hadoop FileSystem. Naturally gated (no-op unless --jindofs/DISABLE_BUILD_JINDOFS=OFF). CAVEAT (docker-gated, enablePaimonTest=true): jindo-core ships a native lib that binds to one classloader per JVM, so this is safe only while no concurrent non-paimon path loads jindo from fe/lib/jindofs in the same FE process — must be confirmed by the docker paimon suite. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on plugin (CI 968828) Root cause: the prior fix (FIX-PAIMON-HADOOP-CLASSLOADER) bundled hadoop-aws into the plugin (S3AFileSystem child-first) but NOT the AWS SDK v2 (hadoop-aws declares it as software.amazon.awssdk:bundle, which fe/pom.xml excludes). So the plugin's S3AInternalAuditConstants.<clinit> registered an ExecutionAttribute against the single PARENT-loaded sdk-core static, colliding with fe-core's S3A in ExecutionAttribute.ensureUnique() -> ExceptionInInitializerError that permanently poisoned S3A for the whole FE JVM (test_iceberg_jdbc_catalog/statistics/case_sensibility, test_paimon_statistics). Solution: bundle the AWS SDK v2 (software.amazon.awssdk:s3 + apache-client, BOM-managed 2.29.52) into the plugin child-first, so the plugin's S3A registers against its OWN ExecutionAttribute static. s3's compile closure brings sdk-core (ExecutionAttribute); apache-client is explicit (hadoop-aws wires ApacheHttpClient). software.amazon.awssdk stays child-first (not parent-first) — the separate child SDK copy is the point. Verified: rebuilt plugin zip bundles lib/sdk-core-2.29.52.jar containing software/amazon/awssdk/core/interceptor/ExecutionAttribute.class. Runtime S3A read + assumed-role/STS docker-gated (enablePaimonTest=true). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… client (CI 968828) Root cause: paimon-hive-connector's RetryingMetaStoreClientFactory probes getProxy(HiveConf,...) via reflection, but RetryingMetaStoreClient/HiveMetaHookLoader resolved from the parent hive-catalog-shade-3.1.1 whose getProxy overloads use the PARENT's Configuration/HiveConf Class objects -> exact Class-identity mismatch across loaders -> all probes NoSuchMethodException -> "Failed to create the desired metastore client" (test_create_paimon_table). The metastore itself is reachable. Solution: bundle org.apache.hive:hive-metastore:2.3.7 (RetryingMetaStoreClient/HiveMetaStoreClient/ HiveMetaHookLoader + metastore api) child-first so its getProxy(HiveConf,...) overloads compile against the SAME child-bundled hive-common-2.3.9 HiveConf the connector builds. 2.3.7 pairs with hive-common 2.3.9 (API-stable HiveConf) and is fastutil-CLEAN, so unlike hive-catalog-shade it does not reintroduce the fastutil collision. libfb303 rides transitively; server-side datanucleus/derby/hbase/tephra, the stale hadoop-2.7.2 trio + guava, and libthrift are excluded (libthrift stays parent-first like the other connectors). Verified: rebuilt plugin zip bundles lib/hive-metastore-2.3.7.jar (RetryingMetaStoreClient with 5 getProxy(HiveConf) overloads) + libfb303; 0 fastutil entries; no hadoop-2.7.2 leak. The thrift 0.9.3-vs-host-0.16.0 wire skew and the DLF ProxyMetaStoreClient path are docker-gated (enablePaimonTest=true). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… BE crash (CI 968880) Root cause (FE behavior change, no BE change): the paimon SPI scan path declared partition columns inconsistently across its two FE channels. The per-split PaimonScanRange.populateRangeParams emits the partition columns as columnsFromPath (so the BE APPENDS them), but the connector never emitted the scan-node-level path_partition_keys property, so PluginDrivenScanNode.getPathPartitionKeys() returned empty -> FileQueryScanNode.initSchemaParams did NOT exclude the partition columns from the file/decode set (num_of_columns_from_file + classifyColumn). Since paimon physically stores partition columns IN the ORC data file, the native OrcReader both DECODED dt/hh from the file AND APPENDED them from columnsFromPath -> a row-count double-fill (dt column rows=2 vs data block rows=1) that aborts the BE via DCHECK(block->rows()==col.column->size()) at vorc_reader.cpp:2638 (native ORC, intermittent under the random force_jni_scanner fuzz). Legacy PaimonScanNode.getPathPartitionKeys() returned [dt,hh] and drove BOTH the file-column exclusion AND the append from one source, so it never double-filled. Solution: emit the path_partition_keys scan-node property (lower-cased partition key names, matching the columnsFromPath keys and the Doris column names) in PaimonScanPlanProvider.getScanNodeProperties when the table is partitioned. This restores the legacy invariant — the BE excludes partition columns from the file decode set and appends them exactly once — for both the native ORC path (excluded from decode + appended from columnsFromPath) and the JNI path (projected out of required_fields + filled by _fill_columns_from_path). Mirrors the hive connector. The BE is unchanged. Tests: PaimonScanPlanProviderTest.getScanNodePropertiesEmitsPathPartitionKeysForPartitionedTable pins that a partitioned paimon table emits path_partition_keys. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
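A rough sketch of emitting the scan-node property is shown below. The class name and the comma-joined value encoding are assumptions made for illustration; only the property key and the lower-casing of the key names come from the commit text.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

final class PathPartitionKeysSketch {
    private PathPartitionKeysSketch() {}

    // Builds scan-node properties; the key is present only for partitioned tables.
    static Map<String, String> scanNodeProperties(List<String> partitionKeys) {
        Map<String, String> props = new LinkedHashMap<>();
        if (!partitionKeys.isEmpty()) {
            StringJoiner joined = new StringJoiner(","); // assumed encoding
            for (String key : partitionKeys) {
                // Lower-case so the names match the columnsFromPath keys.
                joined.add(key.toLowerCase(Locale.ROOT));
            }
            props.put("path_partition_keys", joined.toString());
        }
        return props;
    }
}
```

Driving both the file-column exclusion and the append from this one list is what keeps each partition column filled exactly once.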
…AWS-SDK interceptor cross-loader skew (CI 968994) - root cause: plugin bundles hadoop-aws+s3+sdk-core child-first but NOT s3-transfer-manager. The SPI resource software/amazon/awssdk/services/s3/execution.interceptors (+ its ApplyUserAgentInterceptor) lives only in s3-transfer-manager.jar. ChildFirstClassLoader found no child copy and fell back to the PARENT s3-transfer-manager, whose ApplyUserAgentInterceptor implements the PARENT sdk-core ExecutionInterceptor (a different Class than the child's) -> SdkClientException -> S3A broken -> 'no file io for scheme s3' -> 'Unknown database' cascade (swallowed at ExternalCatalog.buildDbForInit:914). - solution: bundle software.amazon.awssdk:s3-transfer-manager child-first (BOM-managed 2.29.52) so the resource + interceptor resolve against the child sdk-core. - fixes Class A: 6 s3 tests (test_paimon_s3/minio/schema_change/char_varchar_type/ full_schema_change/jdbc_catalog) + 18 'Unknown database' collateral. - verified: zip lib/ now bundles s3-transfer-manager-2.29.52.jar; dependency:tree clean. Runtime gate: docker enablePaimonTest=true. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cross-loader cast (CI 968994) - root cause: plugin did not bundle hadoop-huaweicloud, so OBSFileSystem resolved from the parent 'app' loader while the plugin's FileSystem is child-first -> 'OBSFileSystem cannot be cast to FileSystem' (paimon_base_filesystem obs:// branch). Same shape hadoop-aws already fixed for s3a. - solution: bundle com.huaweicloud:hadoop-huaweicloud (managed 3.1.1-hw-46, compile) child-first; the -hw-46 jar is a fat jar self-containing OBSFileSystem + the OBS SDK (com/obs/*), so OBS is self-consistent in one child-first jar. hadoop-common stays the plugin's direct depth-1 copy via Maven mediation (no duplicate FileSystem). Consistent with fe-core/hadoop-deps which already depend on the same artifact. - fixes Class B: paimon_base_filesystem. - verified: zip lib/ now bundles hadoop-huaweicloud-3.1.1-hw-46.jar; dependency:tree clean. Runtime gate: docker enablePaimonTest=true. - docs: plan-doc/task-list.md, plan-doc/fix-ab-packaging-design.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…amedTransport NoClassDefFound (CI 968994)
- root cause: paimon metastore=hive catalogs threw NoClassDefFoundError
org/apache/thrift/transport/TFramedTransport. paimon's RetryingMetaStoreClientFactory
reflects HiveMetaStoreClient (hive-metastore 2.3.7) constructor signatures, which reference
the thrift-0.9.x old-package TFramedTransport. Host libthrift 0.16.0 moved it to
.transport.layered, and RC-1 keeps org.apache.thrift parent-first (libthrift excluded from the
plugin) so the doris-gen TSerializer/TBase 0.16.0 path works -> the old-package class is
unsatisfiable. The two thrift consumers (doris-gen 0.16.0 vs HMS-client 0.9.x) cannot share the
original org.apache.thrift namespace in one loader.
- solution: new module fe-connector-paimon-hive-shade that bundles paimon-hive-connector-3.1 +
hive-metastore 2.3.7 + hive-common 2.3.9 + libthrift 0.9.3 and relocates org.apache.thrift ->
org.apache.doris.paimon.shaded.thrift (+ defensive it.unimi.dsi.fastutil relocation). The
connector depends on this shade instead of the raw hive deps. Mirrors the existing
hive-catalog-shade precedent. The doris-gen 0.16.0 thrift path stays parent-first, untouched.
- fixes Class C: test_create_paimon_table, test_paimon_statistics.
- verified (static, runtime gate is docker enablePaimonTest=true):
* shade jar: relocated org/apache/doris/paimon/shaded/thrift/transport/TFramedTransport present;
HiveMetaStoreClient references the relocated name (0 references to the original).
* plugin zip lib/: 0 genuine top-level org.apache.thrift .class (RC-1 preserved); no raw
paimon-hive-connector/libthrift/hive-metastore/hive-common/hive-shims jars; shade jar present;
single HiveConf.class; paimon-core still its own jar; HiveCatalogFactory SPI intact.
* UT: fe-connector-paimon 285/0/0 incl PaimonTableSerdeRoundTripTest (RC-1 guard) + PaimonCatalogFactoryTest.
- 2 build-config notes: shade filter drops META-INF/versions/** (paimon-hive fat bundle ships
Java-22 MR-jar classes shade's rewriter cannot parse; they are excluded parquet/jackson internals);
shaded-in deps marked <optional> so the connector plugin-zip does not re-bundle the raw jars.
- new module fe-connector-paimon-hive-shade; fe/fe-connector/pom.xml module registration;
fe-connector-paimon dependency swap. No production Java change. Design: plan-doc/fix-c-hms-thrift-design.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed by SPI cutover (CI 968994)
- root cause: PluginDrivenScanNode.getNodeExplainString is a full override that does NOT call
super (FileScanNode), so the SPI paimon scan path silently dropped explain lines the legacy
PaimonScanNode emitted: 'pushdown agg=COUNT (n)', the VERBOSE dataFileNum/deleteFileNum/
deleteSplitNum block, and 'paimonNativeReadSplits=<raw>/<total>'. The 5 tests are PURE DISPLAY
gaps — data queries return correct values; only the explain text was missing the lines
(plugindriven-explain-override-gap re-manifested for paimon).
- solution (paimon-gated so other plugin connectors stay byte-unchanged):
* SPI: ConnectorScanRange.getPushDownRowCount() (-1 default) + isNativeReadRange() (false default);
ConnectorScanPlanProvider.getDeleteFiles(TTableFormatFileDesc) (empty default).
* PaimonScanRange overrides the two getters (paimon.row_count / paimon.split).
* PaimonScanPlanProvider.appendExplainInfo emits paimonNativeReadSplits from synthetic count keys
the node injects; getDeleteFiles ports legacy PaimonScanNode.getDeleteFiles.
* FileScanNode: behavior-neutral extract-method appendBackendScanRangeDetail (verbatim VERBOSE block).
* PluginDrivenScanNode: accumulate native/total + pushdown-count in getSplits (pure statics);
override getDeleteFiles; emit 'pushdown agg' UNGATED (restores the line FileScanNode emits for
every other scan node), VERBOSE delete block paimon-gated, paimonNativeReadSplits paimon-only.
- fixes Class E: test_paimon_count, test_paimon_deletion_vector, test_paimon_deletion_vector_oss,
test_paimon_catalog_varbinary, test_paimon_catalog_timestamp_tz.
- tests (independently re-run, build cache disabled): PluginDrivenScanNodeExplainStatsTest 7/7,
PluginDrivenScanNodeDeleteFilesTest 4/4, PaimonScanExplainTest 9/9; existing
PluginDrivenScanNodePartitionCountTest 5/5 (no shared-node regression). Tests encode WHY
(the -1 sentinel survival, 0/N native accounting). Runtime gate: docker enablePaimonTest=true
comparison-mode run cross-checks the values vs .out.
- shared FileScanNode/PluginDrivenScanNode changes verified non-perturbing to es/jdbc/maxcompute/
iceberg/hive (extract is byte-identical; pushdown agg matches FileScanNode's unconditional emit).
Design: plan-doc/fix-e-explain-gap-design.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or build resolves hadoop-huaweicloud (CI 968994) - FIX-PAIMON-OBS-SELFCONTAINED (3c7adfe) added a com.huaweicloud:hadoop-huaweicloud dep to fe-connector-paimon, but that artifact (3.1.1-hw-46) is NOT in Maven Central / the Apache repos. fe-core resolves it via a <repository> id=huawei-obs-sdk it declares locally; the connector module does not inherit it (fe-connector / fe declare no repositories), so a clean-env FE build failed: 'hadoop-huaweicloud:jar:3.1.1-hw-46 was not found in https://repo.maven.apache.org/maven2'. (My earlier local build only passed because the jar was already cached in ~/.m2 from a full FE build.) - fix: declare the huawei-obs-sdk repository (https://repo.huaweicloud.com/repository/maven/huaweicloudsdk/) in fe-connector-paimon/pom.xml, mirroring fe-core; and scope the dep 'runtime' (mirrors fe-core — OBSFileSystem is loaded reflectively via the Hadoop FileSystem SPI, not referenced at compile time; plugin-zip.xml still bundles the runtime closure). - verified: removed hadoop-huaweicloud from ~/.m2, rebuilt non-offline -> re-fetched from huawei-obs-sdk (_remote.repositories), plugin zip still bundles hadoop-huaweicloud-3.1.1-hw-46.jar. Repo serves the jar (HTTP 200). Local mirror is mirrorOf=central, so the huawei repo is reached directly (as in CI). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…via SPI-routed CatalogFactory (CI 968994)
- PaimonMetadataOpsTest.beforeClass failed with 'No connector plugin loaded for catalog type paimon'. Pre-existing breakage (NOT from the CI-968994 packaging/explain fixes): 'paimon' was added to CatalogFactory.SPI_READY_TYPES by the SPI-framework cutover (5c32565), so CatalogFactory.createFromCommand('paimon', ...) now routes through the connector-plugin SPI and returns a PluginDrivenExternalCatalog. That call throws when no plugin is installed in connector_plugin_root (the case in a plain fe-core UT), and even when a plugin is loaded the result is not castable to the legacy PaimonExternalCatalog that the test casts to. Either way beforeClass aborted the class.
- fix: the test exercises the still-live legacy PaimonMetadataOps, so construct the legacy filesystem catalog directly (new PaimonFileExternalCatalog(...) + makeSureInitialized()) instead of through the SPI-routed factory (mirrors ExternalMetaCacheRouteResolverTest which constructs new PaimonExternalCatalog(...) directly). Dropped the now-unused CatalogFactory/CreateCatalogCommand imports. No production change.
- verified: mvn -pl fe-core -am test -Dtest=PaimonMetadataOpsTest -Dmaven.build.cache.enabled=false -> Tests run: 6, Failures: 0, Errors: 0. (Only this test had the CatalogFactory-cast pattern.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… requested columns (CI 969249)
Root cause: a single BE FATAL (Check failed: children.contains(table_column_name), table_schema_change_helper.h:166 <- vparquet_reader.cpp:488) on "SELECT * FROM test_paimon_spark.test_schema_change" aborted the whole BE for the rest of the run, cascading into ~47 "No backend available as scan node" collateral failures.
FIX-SCHEMA-EVOLUTION (01b7642) added current_schema_id=-1 + history_schema_info, which switched BE from name-based file<->table matching onto the field-id path. That path keys the table-side StructNode by the -1/current entry's field names and then looks up each query slot (base_ctx->column_names) in it; a slot absent from the -1 entry trips the DCHECK. The connector built the -1 entry from an INDEPENDENT paimon-SDK read (fileStoreTable.schema()), a different source than the Doris column list fe-core turns into the BE scan slots. When the two skew (this Spark table did ALTER TABLE ADD COLUMN after its last snapshot, so the resolved schema lagged the latest schema the slots come from) the added column was missing from the -1 entry -> abort. Legacy PaimonScanNode.doInitialize built the -1 entry from getTargetTable().getColumns() (the SAME list as the slots), so the names matched by construction and the lookup could never miss.
Restore that invariant connector-side: buildSchemaEvolutionParam now keys the -1 entry off the requested `columns` via selectCurrentSchemaFields, matching each to a paimon DataField by name. The resolved (snapshot-pinned) schema wins on a name collision (time-travel + rename stay correct), with the fresh latest() schema as a fallback so an add-column-after-snapshot column is carried with its real field id (older files then fill NULL, the correct result). current_schema_id stays -1 (legacy sentinel). Fails loud if a requested column is in neither schema.
+4 unit tests (add-column-after-snapshot, rename time-travel collision, fail-loud, empty-columns count scan).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
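The name-keyed selection described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the connector's code: `SketchField` is a hypothetical stand-in for a paimon DataField, the method signature is assumed, and case-sensitive name matching is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a paimon DataField: only a field id and a name. */
final class SketchField {
    final int id;
    final String name;

    SketchField(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class CurrentSchemaSketch {
    /**
     * Builds the "-1 / current" entry from the requested columns, so every
     * scan slot name is present in it by construction.
     */
    static List<SketchField> selectCurrentSchemaFields(
            List<String> requestedColumns,
            List<SketchField> resolvedSchema,
            List<SketchField> latestSchema) {
        Map<String, SketchField> byName = new LinkedHashMap<>();
        // Load the fallback (latest) first, then overwrite with the resolved
        // (snapshot-pinned) schema so it wins on a name collision.
        for (SketchField f : latestSchema) {
            byName.put(f.name, f);
        }
        for (SketchField f : resolvedSchema) {
            byName.put(f.name, f);
        }
        List<SketchField> out = new ArrayList<>();
        for (String column : requestedColumns) {
            SketchField f = byName.get(column);
            if (f == null) {
                // Fail loud rather than emit an entry that misses a slot.
                throw new IllegalStateException(
                        "column in neither resolved nor latest schema: " + column);
            }
            out.add(f);
        }
        return out;
    }
}
```

With a resolved schema of (id, name) and a latest schema that adds a third column, requesting all three columns yields the added column with its latest field id, while a colliding name keeps the resolved field id.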
…m catalogs over hdfs (CI 969249)
Root cause: filesystem-metastore paimon catalogs on hdfs:// warehouses failed to create with org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs" (43x in fe.log), swallowed by ExternalCatalog.buildDbForInit into a misleading "Unknown database" (test_paimon_catalog_varbinary, test_catalog_upgrade_test). The plugin runs child-first and no longer carries an hdfs FileSystem impl: hadoop-common's service file registers only Local/viewfs/Har/Http(s), and FIX-PAIMON-HMS-THRIFT-SHADE (5ac8c30) made hive-common <optional> in fe-connector-paimon-hive-shade, so maven-shade dropped it AND its transitive hadoop-client-api, the prior carrier of DistributedFileSystem. HMS-flavor catalogs (thrift metadata) and filesystem-on-S3 (hadoop-aws) were unaffected, which is why only hdfs filesystem catalogs broke.
Add org.apache.hadoop:hadoop-hdfs-client (runtime, ${hadoop.version}): it carries DistributedFileSystem + the hdfs FileSystem service registration and reuses the plugin's single hadoop-common FileSystem (hadoop-common excluded to keep exactly one copy, no cross-loader split). Same self-contained child-first pattern as hadoop-aws/hadoop-huaweicloud.
Verified on the assembled plugin zip: DistributedFileSystem carriers=1, FileSystem.class carriers=1, hdfs service entry present. Also corrects the now-false pom comment claiming hadoop-client-api is transitively bundled.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ent-core for buildRealDataSplit tests (CI 969249)
The count-pushdown / native sub-split / cpp-reader serde tests in PaimonScanPlanProviderTest write a REAL local-filesystem paimon table (buildRealDataSplit) to produce a real DataSplit; paimon's writer/commit path references org.apache.hadoop.mapreduce.lib.input.FileInputFormat. That class reached the module transitively via hive-common until FIX-PAIMON-HMS-THRIFT-SHADE (5ac8c30) made hive-common <optional> in fe-connector-paimon-hive-shade, severing it and leaving 5 tests with NoClassDefFoundError: FileInputFormat (a pre-existing breakage, not from the FIX-PAIMON-SCHEMA-DICT-SLOTS / FIX-PAIMON-HDFS-CLIENT changes).
Add hadoop-mapreduce-client-core at test scope only (version from dependencyManagement). The production read/planScan path does NOT touch FileInputFormat (paimon reads in CI 969249 succeeded without it and it is absent from the assembled plugin zip), so it must NOT be bundled into the plugin.
Full paimon module suite now passes 297/0/0 (1 live-only skip).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…zedTable) native read
A native read of a paimon $ro (read-optimized) system table aborted the whole BE with SIGSEGV. $ro resolves to a paimon ReadOptimizedTable, which WRAPS a FileStoreTable but is NOT instanceof FileStoreTable, so buildSchemaEvolutionParam skipped it and emitted no paimon.schema_evolution prop. With no history_schema_info, BE's gen_table_info_node_by_field_id fell into the legacy name-matching branch by_parquet_name(tuple_descriptor, ...), where PaimonParquetReader passes a still-null _tuple_descriptor (get_tuple_descriptor() is populated only later in _do_init_reader, after on_before_init_reader in the NVI sequence) and dereferenced it (table_schema_change_helper.cpp:94) -> SIGSEGV.
Legacy PaimonScanNode.doInitialize set history_schema_info for ANY paimon table (incl. $ro) unconditionally, so BE always took the field-id path. Restore that parity FE-side (BE unchanged): resolveSchemaDictTable unwraps a ReadOptimizedTable to its base FileStoreTable (reloaded via the 2-arg base Identifier, auth-wrapped like resolveTable) and builds the dict from it; other non-FileStoreTable tables (metadata sys tables, which take the JNI path) still emit nothing as before.
Test getScanNodePropertiesEmitsSchemaEvolutionForReadOptimizedSysTable: a real FileSystemCatalog table wrapped in a ReadOptimizedTable now emits the dict (current_schema_id=-1 + non-empty history). RED before the fix, GREEN after.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
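The unwrap step from the commit above, reduced to its shape. All types here are hypothetical stand-ins (none are paimon SDK classes), and the `loadTable` function is an assumed placeholder for reloading the base table through the catalog by database and table name.

```java
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical table hierarchy; none of these are paimon SDK types. */
interface SketchTable {}

final class SketchFileStoreTable implements SketchTable {
    final String database;
    final String name;

    SketchFileStoreTable(String database, String name) {
        this.database = database;
        this.name = name;
    }
}

/** Wraps a base table but is deliberately NOT a SketchFileStoreTable. */
final class SketchReadOptimizedTable implements SketchTable {
    final String database;
    final String baseName;

    SketchReadOptimizedTable(String database, String baseName) {
        this.database = database;
        this.baseName = baseName;
    }
}

final class SketchMetadataSysTable implements SketchTable {}

final class SchemaDictTableSketch {
    /**
     * Returns the table to build the schema dict from, or empty when the
     * table kind should emit no dict at all.
     */
    static Optional<SketchFileStoreTable> resolveSchemaDictTable(
            SketchTable table, BiFunction<String, String, SketchTable> loadTable) {
        if (table instanceof SketchFileStoreTable) {
            return Optional.of((SketchFileStoreTable) table);
        }
        if (table instanceof SketchReadOptimizedTable) {
            SketchReadOptimizedTable ro = (SketchReadOptimizedTable) table;
            // Reload the base table by (database, table), dropping the $ro suffix.
            SketchTable base = loadTable.apply(ro.database, ro.baseName);
            if (base instanceof SketchFileStoreTable) {
                return Optional.of((SketchFileStoreTable) base);
            }
        }
        // Other system tables keep the previous behaviour: no dict.
        return Optional.empty();
    }
}
```

The point of the sketch is only that a wrapper type must be handled explicitly, because an instanceof check on the wrapped type does not see through it.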
…gh ConnectorColumn
desc t_ltz for a paimon catalog table returned the ts_ltz row with an empty DESC "Extra" column instead of WITH_TIMEZONE (the type timestamptz(3) was already correct). The DESC Extra column is Column.getExtraInfo() (IndexSchemaProcNode), set by legacy via Column.setWithTZExtraInfo() keyed on the SOURCE paimon type root TIMESTAMP_WITH_LOCAL_TIME_ZONE, independent of the enable.mapping.timestamp_tz flag. On the SPI path the schema flows PaimonConnectorMetadata.mapFields -> ConnectorColumn -> ConnectorColumnConverter.convertColumn -> Column, and ConnectorColumn had no field carrying that marker, so it was dropped at the SPI boundary.
Carry the marker through the SPI:
- ConnectorColumn: add withTimeZone field + withTimeZone() wither + isWithTimeZone() getter (added to equals/hashCode; public ctors unchanged).
- PaimonConnectorMetadata.mapFields: mark the column when the source type root is TIMESTAMP_WITH_LOCAL_TIME_ZONE (regardless of the mapping flag).
- ConnectorColumnConverter.convertColumn: re-apply setWithTZExtraInfo() when marked.
- PluginDrivenExternalTable.toSchemaCacheValue: preserve the marker across the column-name remap branch.
Scoped to the paimon source type (not a generic timestamptz-type rule) so other SPI connectors (jdbc_query TVF, maxcompute) and the hdfs TVF desc are unaffected, matching legacy.
Tests: PaimonConnectorMetadataTest.getTableSchemaMarksLtzColumnsWithTimeZoneRegardlessOfMapping (both mapping states) and ConnectorColumnConverterTest.testWithTimeZoneColumnSetsExtraInfo / testPlainColumnHasNoExtraInfo. Full paimon suite 299/0F; checkstyle clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
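A small sketch of the wither pattern the commit above describes: an immutable column carries a boolean marker set from the source type root, and the converter re-derives the Extra text from it. `ColumnSketch` and `LtzMarkerSketch` are hypothetical simplifications, not the real ConnectorColumn or converter; only the marker name, the type root, and the WITH_TIMEZONE text come from the commit.

```java
import java.util.Objects;

/** Hypothetical, simplified column; the real ConnectorColumn has more fields. */
final class ColumnSketch {
    private final String name;
    private final String type;
    private final boolean withTimeZone;

    /** Public-style constructor stays unchanged: the marker defaults to false. */
    ColumnSketch(String name, String type) {
        this(name, type, false);
    }

    private ColumnSketch(String name, String type, boolean withTimeZone) {
        this.name = name;
        this.type = type;
        this.withTimeZone = withTimeZone;
    }

    /** Wither: returns a marked copy, leaving this instance untouched. */
    ColumnSketch withTimeZone() {
        return new ColumnSketch(name, type, true);
    }

    boolean isWithTimeZone() {
        return withTimeZone;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ColumnSketch)) {
            return false;
        }
        ColumnSketch other = (ColumnSketch) o;
        return withTimeZone == other.withTimeZone
                && name.equals(other.name)
                && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, type, withTimeZone);
    }
}

final class LtzMarkerSketch {
    static final String LTZ_ROOT = "TIMESTAMP_WITH_LOCAL_TIME_ZONE";

    /** Marks the column from the SOURCE type root, independent of the mapped type. */
    static ColumnSketch mapField(String name, String sourceTypeRoot, String mappedType) {
        ColumnSketch column = new ColumnSketch(name, mappedType);
        return LTZ_ROOT.equals(sourceTypeRoot) ? column.withTimeZone() : column;
    }

    /** Converter side: re-derive the DESC "Extra" text from the marker. */
    static String extraInfo(ColumnSketch column) {
        return column.isWithTimeZone() ? "WITH_TIMEZONE" : "";
    }
}
```

Including the marker in equals/hashCode matters for any cache keyed on columns: a marked and an unmarked column with the same name and type must not compare equal.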
…enuine-null partition
Root cause: `<partcol> IS NULL` over a paimon table returned empty (EXPLAIN
partition=0/5) instead of the genuine-null row. Paimon renders a genuine-NULL
partition value as its partition.default-name sentinel (CoreOptions
.PARTITION_DEFAULT_NAME, default "__DEFAULT_PARTITION__") — show partitions shows
`category=__DEFAULT_PARTITION__`, distinct from the literal null/NULL/\N partitions.
The FE prune bridge PluginDrivenMvccExternalTable.toListPartitionItem built EVERY
partition value with new PartitionValue(v, false) (isNull=false, copied verbatim
from legacy PaimonUtil.toListPartitionItem), so the null partition was catalogued
as the literal string "__DEFAULT_PARTITION__". Nereids list pruning then matched
IS NULL against no null partition, pruned all of them, resolveRequiredPartitions
returned an empty list, and PluginDrivenScanNode.getSplits short-circuited to zero
splits. The native scan path was already correct (typed Java-null ->
serializePartitionValue null -> populateRangeParams isNull=true -> BE materializes
SQL NULL), which is why SELECT * returned the row but IS NULL did not. Latent in
master too (identical toListPartitionItem isNull=false + unchanged Nereids pruning;
paimon tests are docker-gated so it was never caught).
Fix (2 files):
- PaimonConnectorMetadata.listPartitions: read partition.default-name (the same way
partition.legacy-name is read) and, when a spec value equals it, render the
Doris-canonical ConnectorPartitionValues.HIVE_DEFAULT_PARTITION
("__HIVE_DEFAULT_PARTITION__") in the partition name. Checked BEFORE the legacy
DATE-format branch, which also fixes a latent Integer.parseInt(
"__DEFAULT_PARTITION__") crash for a null DATE partition.
- PluginDrivenMvccExternalTable.toListPartitionItem: derive isNull from
TablePartitionValues.HIVE_DEFAULT_PARTITION.equals(value), mirroring the sibling
TablePartitionValues.toListPartitionItem (the Doris-wide null convention; any future
iceberg/hudi reuse of the SPI gets it for free).
Keys off paimon's partition.default-name, NOT "\N"/"__HIVE_DEFAULT_PARTITION__":
the sentinel fix (4b2c219) established paimon does not reserve those (they are
real literal data). Safe because paimon planScan IGNORES requiredPartitions (split
selection is predicate-driven via the paimon SDK), so the translated name never
reaches split selection — it only drives the FE empty-list short-circuit; and no
ConnectorMetadata SPI method takes a partition name back, so the rendered name
stays FE-internal (getPartitionSnapshot looks up the same FE-built map).
Tests: PaimonConnectorMetadataPartitionTest +3 (string-null -> canonical sentinel,
custom partition.default-name honored with a literal __DEFAULT_PARTITION__ left
untouched, null-DATE renders sentinel instead of crashing) 9/9;
PluginDrivenMvccExternalTableTest +1 (testHiveDefaultSentinelBuildsNullPartitionKey
asserts the value builds a NULL/isNull partition key) 35/35. Both fail-before:
old connector appended the raw "__DEFAULT_PARTITION__"; old bridge produced a
non-null literal key. End-to-end fix on the live cluster not run (deploy reverted
per request); root cause was empirically confirmed on the cluster before the fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
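The two halves of the fix above can be sketched together: the connector side swaps paimon's default-name sentinel for the Doris-canonical one while rendering the partition name, and the bridge side derives isNull from that canonical value. This is an illustration only; the class and method names are hypothetical, and the `key=value/key=value` name layout is an assumption.

```java
import java.util.Map;
import java.util.StringJoiner;

final class NullPartitionSketch {
    /** Doris-canonical null-partition marker named in the commit message. */
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    /**
     * Connector side: render a partition name, replacing a value equal to the
     * table's partition.default-name with the canonical marker.
     */
    static String renderPartitionName(Map<String, String> spec, String paimonDefaultName) {
        StringJoiner name = new StringJoiner("/");
        for (Map.Entry<String, String> e : spec.entrySet()) {
            String value = e.getValue();
            // Checked before any type-specific formatting (such as a DATE
            // branch), so the sentinel is never parsed as a number.
            if (paimonDefaultName.equals(value)) {
                value = HIVE_DEFAULT_PARTITION;
            }
            name.add(e.getKey() + "=" + value);
        }
        return name.toString();
    }

    /** Bridge side: isNull comes from the canonical marker, not a constant false. */
    static boolean isNullPartitionValue(String value) {
        return HIVE_DEFAULT_PARTITION.equals(value);
    }
}
```

Because the comparison uses the table's configured default name rather than a fixed literal, a table with a custom partition.default-name leaves a literal `__DEFAULT_PARTITION__` value untouched.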
…LAIN line dropped by the SPI scan node
PluginDrivenScanNode.getNodeExplainString overrides FileScanNode without calling super (it uses a custom TABLE/QUERY/PREDICATES format), so it dropped the parent's "inputSplitNum=N, totalFileSize=X, scanRanges=Y" line. Legacy PaimonScanNode inherited that line via super.getNodeExplainString, and test_paimon_predict asserts inputSplitNum=N (9/3/6/0 for various IN predicates); on the SPI scan path the line is absent (only partition=N/M and paimonNativeReadSplits=N/M are shown), so the EXPLAIN check fails.
Re-emit the line byte-for-byte (including the (approximate) batch-mode prefix) from the same selectedSplitNum/totalFileSize/scanRangeLocations fields the inherited FileQueryScanNode.createScanRangeLocations already populates (selectedSplitNum = inputSplits.size()), placed immediately before partition=N/M to match FileScanNode ordering. Emitted UNCONDITIONALLY for every plugin connector (like the sibling partition=N/M and pushdown agg= lines already are), NOT gated on a hardcoded source name: the generic SPI scan node must stay connector-agnostic, and inputSplitNum is universal FileScanNode info, not connector-specific.
Blast radius is safe: among SPI connectors only paimon asserts this line; maxcompute explain checks are contains-based (an added line before partition=N/M does not affect them) and the jdbc/es notContains checks target unrelated lines (QUERY:/date()/ES terminate_after). fe-core compiles clean. Runtime verification is the test_paimon_predict regression run (not deployed, per request).
No unit test added: the full getNodeExplainString string requires mocking the entire desc->table->catalog chain plus the scan-range state, which the existing explain tests deliberately avoid (they cover the extracted static helpers); this change is a verbatim copy of FileScanNode's proven line.
NB: a pre-existing instance of the same source-name smell remains: the VERBOSE per-backend block (if VERBOSE && ... && "paimon".equals(catalog.getType())) from FIX-PAIMON-EXPLAIN-GAP is left untouched here because de-gating it changes es/jdbc/max_compute VERBOSE output and warrants its own change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
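For orientation, the re-emitted line has roughly the following shape. This is a hedged sketch: the helper name and parameters are hypothetical, and the exact placement of the (approximate) prefix and the trailing newline are assumptions rather than a copy of FileScanNode.

```java
final class ExplainLineSketch {
    /**
     * Formats the split-count line from plain values, with no reference to
     * the connector or catalog type.
     */
    static String inputSplitLine(String prefix, boolean batchMode,
            long selectedSplitNum, long totalFileSize, int scanRanges) {
        StringBuilder sb = new StringBuilder(prefix);
        if (batchMode) {
            // In batch mode the counts are estimates, so the line is flagged.
            sb.append("(approximate)");
        }
        sb.append("inputSplitNum=").append(selectedSplitNum)
                .append(", totalFileSize=").append(totalFileSize)
                .append(", scanRanges=").append(scanRanges)
                .append('\n');
        return sb.toString();
    }
}
```

A contains-based EXPLAIN check such as `inputSplitNum=9` is satisfied by this line regardless of the prefix, which is why adding it does not disturb connectors that only assert other lines.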
…ats block dropped by the SPI scan node
paimon_data_system_table.assertJniPath asserted 'explain verbose ...$binlog' contains 'SplitStat [type=JNI'. The earlier checks (paimonNativeReadSplits=0/N, native==0) passed; only the per-split block was missing. The legacy PaimonScanNode emits a VERBOSE-only PaimonSplitStats: / SplitStat [type=...] block; the SPI PluginDrivenScanNode + PaimonScanPlanProvider.appendExplainInfo ported only the paimonNativeReadSplits line, never the block (the test is unchanged from master, written against the legacy node).
Fix, following the existing FIX-E synthetic-key pattern:
- PluginDrivenScanNode injects __explain_verbose only when detailLevel==VERBOSE (connector-agnostic; does not branch on source name).
- PaimonScanPlanProvider emits PaimonSplitStats: + one SplitStat [type=NATIVE|JNI] line per split (grouped NATIVE-first from the native/total counts), with the legacy >4 truncation.
Exact per-DataSplit parity is not reconstructible on the SPI path (node keeps only counts; native files are re-split), but the split type, which is all the test checks, is faithful.
PaimonScanExplainTest 13/13 (4 new, RED→GREEN); fe-core compiles; checkstyle 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
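A rough sketch of building that block from counts alone. Everything beyond the `PaimonSplitStats:` header and the `SplitStat [type=` prefix is an assumption: the real SplitStat line carries more fields, and the elision layout used here (first three lines, an "other N" marker, then the last line) is a guess at the legacy >4 truncation, not a verified copy of it.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitStatsExplainSketch {
    /**
     * Builds the VERBOSE-only block from the native/total counts.
     * Returns an empty string when the verbose flag is not set.
     */
    static String splitStatsBlock(String prefix, int nativeSplits, int totalSplits,
            boolean verbose) {
        if (!verbose) {
            return "";
        }
        // Only counts are known, so lines are grouped NATIVE-first.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < totalSplits; i++) {
            lines.add("SplitStat [type=" + (i < nativeSplits ? "NATIVE" : "JNI") + "]");
        }
        StringBuilder sb = new StringBuilder();
        sb.append(prefix).append("PaimonSplitStats:\n");
        int size = lines.size();
        if (size <= 4) {
            for (String line : lines) {
                sb.append(prefix).append("  ").append(line).append('\n');
            }
        } else {
            // Assumed elision layout: first three, a marker, then the last one.
            for (int i = 0; i < 3; i++) {
                sb.append(prefix).append("  ").append(lines.get(i)).append('\n');
            }
            sb.append(prefix).append("  ... other ").append(size - 4)
                    .append(" paimon split stats ...\n");
            sb.append(prefix).append("  ").append(lines.get(size - 1)).append('\n');
        }
        return sb.toString();
    }
}
```

For a $binlog-style scan with zero native splits, every emitted line starts with `SplitStat [type=JNI`, which is the only property the regression check relies on.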
… SPI Paimon connector
A filesystem-flavor Paimon catalog created with the documented minio.* keys (minio.endpoint / minio.access_key / minio.secret_key / ...) over an s3:// warehouse failed at `show databases` with: org.apache.paimon.fs.UnsupportedSchemeException: Could not find a file io implementation for scheme 's3' in the classpath.
Root cause: PaimonCatalogFactory.applyStorageConfig (the fe-core-free port of legacy StorageProperties) ported the S3/OSS/COS/OBS canonical blocks but omitted MinIO. Legacy MinioProperties extends AbstractS3CompatibleProperties (schema "s3") and translates minio.* to fs.s3a.*, registering fs.s3.impl= S3AFileSystem. In the SPI connector a pure-minio.* catalog resolved every alias to null, so applyCanonicalS3Config early-returned and fs.s3.impl was never set, leaving Paimon's FileIO unable to resolve the s3 scheme.
Fix: add applyCanonicalMinioConfig (gated on the minio. key prefix) that emits the shared S3A base via applyS3aBaseConfig. MinIO is S3A-compatible, so, unlike COS (cosn) / OBS (native), it adds no extra impl keys. Aliases are ported verbatim from MinioProperties (minio.* first, with the shared s3.*/AWS_* fallbacks); the region defaults to us-east-1 and the connection tuning to 100/10000/10000, both per legacy MinioProperties (a dedicated block is required precisely because these defaults diverge from the S3 block's 50/3000/1000). The block is purely additive and no-ops for any catalog without a minio. key.
Tests: 4 new unit tests in PaimonCatalogFactoryTest (RED->GREEN repro asserting fs.s3.impl is registered, defaults parity, explicit-region override, and a pure-S3 negative-parity guard). 60/60 pass; checkstyle 0 violations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
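The gating and defaulting described above can be sketched over plain maps. This is not the connector's applyCanonicalMinioConfig: the return-a-map signature is hypothetical, the alias lists are abbreviated, `minio.region` is an assumed key name, and the mapping of 100/10000/10000 onto the three fs.s3a.connection.* keys (maximum, request timeout, connection timeout) is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MinioConfigSketch {
    /** Returns the first non-empty value among the aliases, else the fallback. */
    static String firstOf(Map<String, String> props, String fallback, String... aliases) {
        for (String alias : aliases) {
            String value = props.get(alias);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return fallback;
    }

    /** Emits S3A keys for a catalog that uses minio.* keys; empty map otherwise. */
    static Map<String, String> applyCanonicalMinioConfig(Map<String, String> props) {
        Map<String, String> conf = new LinkedHashMap<>();
        boolean hasMinioKey = false;
        for (String key : props.keySet()) {
            if (key.startsWith("minio.")) {
                hasMinioKey = true;
                break;
            }
        }
        if (!hasMinioKey) {
            return conf; // purely additive: no minio.* key means no-op
        }
        // Register the s3:// scheme against the S3A filesystem.
        conf.put("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        putIfPresent(conf, "fs.s3a.endpoint",
                firstOf(props, null, "minio.endpoint", "s3.endpoint"));
        putIfPresent(conf, "fs.s3a.access.key",
                firstOf(props, null, "minio.access_key", "s3.access_key"));
        putIfPresent(conf, "fs.s3a.secret.key",
                firstOf(props, null, "minio.secret_key", "s3.secret_key"));
        // MinIO-specific defaults, distinct from the plain S3 block.
        conf.put("fs.s3a.endpoint.region",
                firstOf(props, "us-east-1", "minio.region", "s3.region"));
        conf.put("fs.s3a.connection.maximum", "100");
        conf.put("fs.s3a.connection.request.timeout", "10000");
        conf.put("fs.s3a.connection.timeout", "10000");
        return conf;
    }

    private static void putIfPresent(Map<String, String> conf, String key, String value) {
        if (value != null) {
            conf.put(key, value);
        }
    }
}
```

A catalog that sets only s3.* keys falls through to the empty map here, which mirrors the negative-parity guard in the tests: the MinIO block must not change behaviour for pure-S3 catalogs.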