[MINOR][CORE] Add TaskContext JNI callback for reading Spark task attempt id from native by taiyang-li · Pull Request #12435 · apache/gluten

taiyang-li · 2026-07-02T13:20:53Z

Background

This PR is one of a small series of common-code changes that introduce a new backend, Bolt (ByteDance unified lakehouse analytics acceleration engine), into the Gluten community. The series minimizes the delta to Gluten common code so that Bolt can plug in cleanly while leaving the Velox and ClickHouse backends unaffected.

This PR

The Bolt backend needs the Spark task attempt id at the native layer for task-level identification (memory pool naming, spill directories, logs, etc.). Threading the value through every JNI entry point would require changing RuntimeJniWrapper.createRuntime, the whole NativeMemoryManagerJniWrapper create/hold/release chain, and every backend's Runtime / MemoryManager factory signature. Instead, this PR adds a small JNI callback that native code invokes on demand:

Java: org.apache.gluten.task.TaskContextJniWrapper#currentTaskAttemptId() returns TaskContext.get().taskAttemptId(), or -1L when there is no task context on the calling thread.
C++: gluten::getCurrentSparkTaskAttemptId() attaches the current thread to the JVM as a daemon on demand and calls the Java helper. The class ref and method id are cached in function-local statics on first use.
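The contract of the Java helper can be sketched as follows. This is a minimal illustration, not the code in this PR: StubTaskContext is a hypothetical stand-in for Spark's TaskContext so that the example runs without Spark on the classpath, and TaskAttemptIdSketch and NO_TASK_CONTEXT are likewise illustrative names.

```java
// Hypothetical stand-in for Spark's TaskContext (assumption: only the
// thread-local lookup and the attempt id matter for this sketch).
final class StubTaskContext {
    private static final ThreadLocal<StubTaskContext> CURRENT = new ThreadLocal<>();
    private final long taskAttemptId;

    StubTaskContext(long taskAttemptId) {
        this.taskAttemptId = taskAttemptId;
    }

    // Returns null when the calling thread has no task context.
    static StubTaskContext get() {
        return CURRENT.get();
    }

    static void set(StubTaskContext ctx) {
        CURRENT.set(ctx);
    }

    static void unset() {
        CURRENT.remove();
    }

    long taskAttemptId() {
        return taskAttemptId;
    }
}

// Illustrative helper mirroring the described behavior: the attempt id of
// the calling thread's task context, or -1L when there is none.
final class TaskAttemptIdSketch {
    static final long NO_TASK_CONTEXT = -1L;

    static long currentTaskAttemptId() {
        StubTaskContext ctx = StubTaskContext.get();
        return ctx == null ? NO_TASK_CONTEXT : ctx.taskAttemptId();
    }
}
```

Returning a sentinel rather than throwing keeps the native caller simple: it can check for -1 and fall back to whatever naming it uses outside a task.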

Since Spark's TaskContext is a per-thread ThreadLocal, this returns a meaningful value whenever the native call runs on an executor task thread, which is exactly when backends need it. Existing backends (Velox, ClickHouse) that do not query the task attempt id from native are unaffected.
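The per-thread behavior can be illustrated with a plain ThreadLocal. This is a rough sketch under stated assumptions: ATTEMPT_ID is a hypothetical stand-in for the task context, and PerThreadLookupSketch, lookup, and observe are illustrative names that do not appear in this PR. It shows why a lookup made from a thread that never had the value set falls back to -1.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PerThreadLookupSketch {
    // Hypothetical stand-in for the per-thread task context; holds only the id.
    private static final ThreadLocal<Long> ATTEMPT_ID = new ThreadLocal<>();

    // Reads the value bound to the calling thread, or -1 when none is bound.
    static long lookup() {
        Long id = ATTEMPT_ID.get();
        return id == null ? -1L : id;
    }

    // Binds attemptId on the calling ("task") thread, then performs the lookup
    // on that thread and on a separate worker thread.
    // Returns {valueOnTaskThread, valueOnWorkerThread}.
    static long[] observe(long attemptId) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        ATTEMPT_ID.set(attemptId);
        try {
            long onTaskThread = lookup();
            Callable<Long> task = PerThreadLookupSketch::lookup;
            long onWorkerThread = worker.submit(task).get();
            return new long[] {onTaskThread, onWorkerThread};
        } catch (Exception e) {
            throw new IllegalStateException("lookup on worker thread failed", e);
        } finally {
            worker.shutdown();
            ATTEMPT_ID.remove();
        }
    }
}
```

In this sketch, observe(7L) sees 7 on the thread that set the value and -1 on the worker thread, so a native caller running on a non-task thread should expect the sentinel.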

…tempt id from native

Instead of extending every JNI entry point (createRuntime / MemoryManager create/hold/release) to plumb the Spark task attempt id from Java down to C++ as an extra parameter, expose a small callback surface that native code uses on demand:

- Java side: org.apache.gluten.task.TaskContextJniWrapper#currentTaskAttemptId() reads TaskContext.get().taskAttemptId() on the current thread and returns -1 when there is no task context.
- C++ side: gluten::getCurrentSparkTaskAttemptId() attaches the current thread to the JVM as a daemon on demand and calls back into the Java helper via JNI. The class ref and method id are cached in function-local statics on first use.

Because Spark's TaskContext is a per-thread ThreadLocal, this returns a meaningful value whenever the native call is running on an executor task thread (or any thread inheriting that ThreadLocal), which is exactly when backends need it.

No signature change to Runtime / MemoryManager / RuntimeJniWrapper / NativeMemoryManagerJniWrapper. No behavior change for existing backends (Velox, ClickHouse) that do not query the task attempt id from native.

Co-Authored-By: Aime <aime@bytedance.com>
Change-Id: I3185249796b0c396813dc39f54bd8e8b8589ca2a

taiyang-li force-pushed the pr/task-context-jni-callback branch from 1244485 to 696b750 Compare July 2, 2026 13:25

github-actions Bot added the VELOX label Jul 2, 2026

taiyang-li requested review from FelixYBW, WangGuangxin, exmy, lgbo-ustc and zhanglistar July 3, 2026 03:37

taiyang-li wants to merge 1 commit into apache:main from taiyang-li:pr/task-context-jni-callback
