
[MINOR][CORE] Add TaskContext JNI callback for reading Spark task attempt id from native #12435

Open
taiyang-li wants to merge 1 commit into apache:main from taiyang-li:pr/task-context-jni-callback

@taiyang-li (Contributor)

Background

This PR is one of a small series of common-code changes that introduce a new backend, Bolt (ByteDance's unified lakehouse analytics acceleration engine), into the Gluten community. The series keeps the delta to Gluten common code minimal so that Bolt can plug in cleanly, while leaving the Velox and ClickHouse backends unaffected.

This PR

The Bolt backend needs the Spark task attempt id in the native layer for task-level identification (memory pool naming, spill directories, logs, etc.). Threading the value through every JNI entry point would require changing RuntimeJniWrapper.createRuntime, the whole NativeMemoryManagerJniWrapper create/hold/release chain, and every backend's Runtime / MemoryManager factory signature. Instead, this PR adds a small JNI callback that native code invokes on demand:

  • Java: org.apache.gluten.task.TaskContextJniWrapper#currentTaskAttemptId() returns TaskContext.get().taskAttemptId(), or -1L when there is no task context on the calling thread.
  • C++: gluten::getCurrentSparkTaskAttemptId() attaches the current thread to the JVM as a daemon on demand and calls the Java helper. The class reference and method id are cached in function-local statics on first use.
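
A minimal Java sketch of the Java-side contract described above, assuming the helper is a static method (the PR only names TaskContextJniWrapper#currentTaskAttemptId()). Spark is not on the classpath here, so FakeTaskContext is a hypothetical stand-in that mirrors the relevant part of org.apache.spark.TaskContext: a per-thread slot whose get() returns null outside a task. TaskContextJniWrapperSketch and NO_TASK_ATTEMPT_ID are illustrative names, not the PR's code.

```java
/**
 * Hypothetical stand-in for org.apache.spark.TaskContext so this sketch
 * compiles without Spark. It keeps the current task in a per-thread
 * ThreadLocal, and get() returns null when the calling thread has no task.
 */
final class FakeTaskContext {
  private static final ThreadLocal<FakeTaskContext> CURRENT = new ThreadLocal<>();
  private final long taskAttemptId;

  FakeTaskContext(long taskAttemptId) {
    this.taskAttemptId = taskAttemptId;
  }

  static FakeTaskContext get() {
    return CURRENT.get();
  }

  static void set(FakeTaskContext ctx) {
    CURRENT.set(ctx);
  }

  static void unset() {
    CURRENT.remove();
  }

  long taskAttemptId() {
    return taskAttemptId;
  }
}

/**
 * Sketch of the helper that native code calls back into. Making it static
 * (an assumption) means its JNI descriptor is "()J": no arguments, returns long.
 */
final class TaskContextJniWrapperSketch {
  /** Sentinel returned when the calling thread has no task context. */
  static final long NO_TASK_ATTEMPT_ID = -1L;

  private TaskContextJniWrapperSketch() {}

  static long currentTaskAttemptId() {
    FakeTaskContext ctx = FakeTaskContext.get();
    return ctx == null ? NO_TASK_ATTEMPT_ID : ctx.taskAttemptId();
  }
}
```

With a static helper, the C++ side could resolve the method once with GetStaticMethodID(cls, "currentTaskAttemptId", "()J") and invoke it with CallStaticLongMethod; a result of -1 tells native code that no task is running on the calling thread.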

Because Spark's TaskContext is stored in a per-thread ThreadLocal, this returns a meaningful value whenever the native call runs on an executor task thread, which is exactly when backends need it. Existing backends (Velox, ClickHouse) that do not query the task attempt id from native code are unaffected.
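
To illustrate the per-thread behavior the paragraph above relies on, here is a small self-contained Java demo. TASK_ATTEMPT_ID is a hypothetical stand-in for Spark's per-thread TaskContext, and the second thread stands in for a worker thread that native code creates itself. Only the thread that set the value sees it; the other thread gets the -1 sentinel.

```java
/**
 * Shows why the callback is only meaningful on a task thread: a ThreadLocal
 * value set on one thread is not visible from another thread.
 */
final class ThreadLocalScopeDemo {
  // Hypothetical stand-in for Spark's per-thread TaskContext.
  private static final ThreadLocal<Long> TASK_ATTEMPT_ID = new ThreadLocal<>();

  private ThreadLocalScopeDemo() {}

  /** Mirrors the callback: this thread's task attempt id, or -1 when unset. */
  static long currentTaskAttemptId() {
    Long id = TASK_ATTEMPT_ID.get();
    return id == null ? -1L : id;
  }

  /**
   * Runs a simulated executor task thread that sets its id and reads it, then
   * starts a second thread (standing in for a natively created worker) that
   * reads it too. Returns {idSeenOnTaskThread, idSeenOnWorkerThread}.
   */
  static long[] run(long taskAttemptId) {
    long[] seen = new long[2];
    Thread taskThread = new Thread(() -> {
      TASK_ATTEMPT_ID.set(taskAttemptId);
      try {
        seen[0] = currentTaskAttemptId();
        // The worker thread has its own, empty ThreadLocal slot.
        Thread worker = new Thread(() -> seen[1] = currentTaskAttemptId());
        worker.start();
        worker.join(); // join() also makes the worker's write visible here
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        // Clear the slot when the simulated task ends.
        TASK_ATTEMPT_ID.remove();
      }
    });
    taskThread.start();
    try {
      taskThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for task thread", e);
    }
    return seen;
  }
}
```

ThreadLocalScopeDemo.run(7L) returns {7, -1}. A thread that native code creates and attaches to the JVM on demand likewise starts with no ThreadLocal values, so the callback returns -1 there unless it is invoked from the task thread itself.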

Instead of extending every JNI entry point (createRuntime / MemoryManager
create/hold/release) to plumb the Spark task attempt id from Java down to
C++ as an extra parameter, expose a small callback surface that native code
uses on demand:

- Java side: org.apache.gluten.task.TaskContextJniWrapper#currentTaskAttemptId()
  reads TaskContext.get().taskAttemptId() on the current thread and returns
  -1 when there is no task context.
- C++ side: gluten::getCurrentSparkTaskAttemptId() attaches the current
  thread to the JVM as a daemon on demand and calls back into the Java
  helper via JNI. The class ref and method id are cached in function-local
  statics on first use.

Because Spark's TaskContext is a per-thread ThreadLocal, this returns a
meaningful value whenever the native call is running on an executor task
thread (or any thread inheriting that ThreadLocal), which is exactly when
backends need it.

No signature change to Runtime / MemoryManager / RuntimeJniWrapper /
NativeMemoryManagerJniWrapper. No behavior change for existing backends
(Velox, ClickHouse) that do not query the task attempt id from native.

Co-Authored-By: Aime <aime@bytedance.com>
Change-Id: I3185249796b0c396813dc39f54bd8e8b8589ca2a