feature: Add validation layer Timing Checker for host-side API timing #480

Draft
MichalMrozek wants to merge 1 commit into oneapi-src:master from MichalMrozek:feature/validation-layer-timing-checker

@MichalMrozek MichalMrozek commented Jun 16, 2026

Contributor

Summary

Adds a new validation-layer checker, enabled with ZEL_ENABLE_TIMING_CHECKER=1, that measures the host-side (CPU) duration of every Level Zero API call and aggregates per-API statistics (call count, total, min, max, average in nanoseconds). The design is inspired by the host-function timing in Intel's unitrace profiler, ported into the validation layer.
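
The per-API aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `ApiStats` and `TimingTable` names, the string-keyed map, and the mutex are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical per-API accumulator (names are illustrative, not from the PR).
struct ApiStats {
    uint64_t calls = 0;
    uint64_t totalNs = 0;
    uint64_t minNs = std::numeric_limits<uint64_t>::max();
    uint64_t maxNs = 0;

    // Fold one measured duration into the running statistics.
    void record(uint64_t durationNs) {
        ++calls;
        totalNs += durationNs;
        minNs = std::min(minNs, durationNs);
        maxNs = std::max(maxNs, durationNs);
        assert(minNs <= maxNs);
    }

    // Integer average in nanoseconds; 0 when nothing has been recorded.
    uint64_t avgNs() const { return calls ? totalNs / calls : 0; }
};

// Hypothetical registry keyed by API name. The mutex is an assumption:
// API calls may arrive from several application threads.
class TimingTable {
public:
    void record(const std::string &api, uint64_t durationNs) {
        std::lock_guard<std::mutex> lock(mutex_);
        stats_[api].record(durationNs);
    }

    // Returns a copy of the stats, or an empty record for an unseen API.
    ApiStats get(const std::string &api) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = stats_.find(api);
        return it == stats_.end() ? ApiStats{} : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, ApiStats> stats_;
};
```

Recording the two zeInitDrivers durations from the Testing section (666 ns and 65581203 ns) yields a total of 65581869 ns and, with integer division, an average of 32790934 ns, which matches the sample summary row.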

For each API the checker stamps a high-resolution monotonic timestamp in the Prologue and reads it again in the Epilogue (QueryPerformanceCounter on Windows, clock_gettime(CLOCK_MONOTONIC_RAW) elsewhere). The measured span is dominated by the underlying driver call and is consistent across calls, making it suitable for relative host-cost analysis.
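
A rough sketch of the Prologue/Epilogue timestamp pair is shown below. The clock calls are the ones named above; everything else is an assumption for illustration, in particular the function names and the thread-local slot used to carry the start time from Prologue to Epilogue (the PR does not state how that value is stored).

```cpp
#include <cassert>
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#endif

// Returns a monotonic timestamp in nanoseconds.
inline uint64_t monotonicNowNs() {
#if defined(_WIN32)
    LARGE_INTEGER freq, counter;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&counter);
    const uint64_t f = static_cast<uint64_t>(freq.QuadPart);
    const uint64_t ticks = static_cast<uint64_t>(counter.QuadPart);
    // Convert whole seconds and the remainder separately to avoid overflow.
    return (ticks / f) * 1000000000ull + (ticks % f) * 1000000000ull / f;
#else
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
           static_cast<uint64_t>(ts.tv_nsec);
#endif
}

// Hypothetical per-thread slot holding the Prologue timestamp.
// Assumption: one in-flight API call per thread (no nested calls).
thread_local uint64_t tlsStartNs = 0;

// Called before the driver call.
inline void timingPrologue() { tlsStartNs = monotonicNowNs(); }

// Called after the driver call; returns the elapsed host time in ns.
inline uint64_t timingEpilogue() {
    const uint64_t endNs = monotonicNowNs();
    assert(endNs >= tlsStartNs);
    return endNs - tlsStartNs;
}
```

A single thread-local slot keeps the hot path free of locks and allocation, but it would need to become a small stack if API calls can nest on one thread.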

All ze/zes/zet/zer APIs are covered through generated override headers, mirroring the existing certification checker.

Output modes (each independently controlled)

Variable                       Effect
ZEL_ENABLE_TIMING_CHECKER      Enable; per-API summary table logged at teardown
ZEL_TIMING_CHECKER_CSV=<path>  Also export per-API stats to a CSV (process id appended)
ZEL_TIMING_CHECKER_LIVE        Also log each call's duration as it happens

Requires ZE_ENABLE_VALIDATION_LAYER=1 and loader logging enabled for output to be visible.
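
Reading these variables might look roughly like the sketch below. The variable names come from the table above; the `TimingConfig` struct, the helper names, and the rule that unset, empty, or "0" means disabled are assumptions for illustration, as is the choice to ignore the CSV and live settings when the checker itself is off.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical configuration record (not the PR's actual type).
struct TimingConfig {
    bool enabled = false;   // ZEL_ENABLE_TIMING_CHECKER
    bool live = false;      // ZEL_TIMING_CHECKER_LIVE
    std::string csvPath;    // ZEL_TIMING_CHECKER_CSV; empty means no CSV export
};

// Assumed flag semantics: unset, empty, or "0" is false; anything else is true.
inline bool envFlag(const char *name) {
    assert(name != nullptr);
    const char *value = std::getenv(name);
    return value != nullptr && value[0] != '\0' && std::string(value) != "0";
}

inline TimingConfig readTimingConfig() {
    TimingConfig cfg;
    cfg.enabled = envFlag("ZEL_ENABLE_TIMING_CHECKER");
    if (!cfg.enabled) {
        return cfg;  // assumption: the other modes only apply when enabled
    }
    cfg.live = envFlag("ZEL_TIMING_CHECKER_LIVE");
    if (const char *path = std::getenv("ZEL_TIMING_CHECKER_CSV")) {
        cfg.csvPath = path;
    }
    return cfg;
}
```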

Scope

Device-side (GPU execution) timing is intentionally out of scope. Unlike unitrace, the validation layer receives handles by value and the Epilogue runs after the driver call, so it cannot inject the timestamp events that GPU timing requires. The checker is structured so device timing can be added later.

Code generation

The generated/{ze,zes,zet,zer}_timing.h headers are reproducible from the new scripts/templates/validation/timing.h.mako template, which is wired into scripts/generate_code.py, so they regenerate on the next specification update.

Changes

  • New checker: source/layers/validation/checkers/timing/ (engine, registration, generated overrides, CMake)
  • New template scripts/templates/validation/timing.h.mako + scripts/generate_code.py wiring
  • checkers/CMakeLists.txt, validation layer README.md, CHANGELOG.md
  • Transparency unit test in test/loader_validation_layer.cpp + CTest registration

Testing

Validated end-to-end on Intel GPU hardware (Linux, real libze_intel_gpu driver) by running the zello_world sample through a locally built loader and validation layer:

  • Summary table is emitted at teardown, sorted by total time, e.g.:
    ==== Level Zero Host API Timing (ns) ====
    Function                                  Calls    Total      Min        Max        Avg
    zeInitDrivers                             2        65581869   666        65581203   32790934
    zeCommandListCreateImmediate              1        2783378    2783378    2783378    2783378
    ...
    
  • ZEL_TIMING_CHECKER_CSV produces a PID-suffixed CSV with header Function,Calls,TotalNs,MinNs,MaxNs,AvgNs and one row per API.
  • ZEL_TIMING_CHECKER_LIVE logs each call's duration ([timing] zeInitDrivers 66196062 ns).
  • Transparency: with the checker enabled, zello_world still completes its compute/copy correctly ("completed execution!"), confirming results are unaffected.
  • The ValidationLayerTimingChecker unit test passes against the null driver.
  • The validation layer and tests also build and link cleanly on Windows.
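
For reference, the sort-by-total ordering and the CSV layout listed above can be sketched as follows. The CSV header is the one quoted in the list; the `ApiRow` struct and function names are hypothetical, sorting the CSV rows the same way as the summary table is an assumption, and the `<path>.<pid>` suffix format is only a guess at how the process id is appended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flattened row of per-API statistics.
struct ApiRow {
    std::string function;
    uint64_t calls;
    uint64_t totalNs;
    uint64_t minNs;
    uint64_t maxNs;
};

// Orders rows by total time, largest first, as in the summary table.
inline void sortByTotal(std::vector<ApiRow> &rows) {
    std::sort(rows.begin(), rows.end(),
              [](const ApiRow &a, const ApiRow &b) { return a.totalNs > b.totalNs; });
}

// Writes the header and one row per API to any output stream.
inline void writeCsv(std::ostream &out, std::vector<ApiRow> rows) {
    sortByTotal(rows);  // assumption: CSV uses the same order as the table
    out << "Function,Calls,TotalNs,MinNs,MaxNs,AvgNs\n";
    for (const ApiRow &r : rows) {
        assert(r.calls > 0);  // a row exists only for APIs that were called
        out << r.function << ',' << r.calls << ',' << r.totalNs << ','
            << r.minNs << ',' << r.maxNs << ',' << (r.totalNs / r.calls) << '\n';
    }
}

// Assumed suffix format; the PR only says the process id is appended.
inline std::string csvPathForProcess(const std::string &base, long pid) {
    return base + "." + std::to_string(pid);
}
```

Writing to a `std::ostream` rather than opening the file inside the function keeps the formatting testable with a `std::ostringstream`.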

The CHANGELOG entry is under an Unreleased heading; please assign a version on release.

Add a new validation-layer checker (ZEL_ENABLE_TIMING_CHECKER) that
measures the host-side (CPU) duration of every Level Zero API call and
aggregates per-API statistics (call count, total, min, max, average in
nanoseconds). The approach is inspired by the host-function timing in
Intel's unitrace profiler, ported into the validation layer.

For each API the checker stamps a high-resolution monotonic timestamp in
the Prologue and reads it again in the Epilogue (QueryPerformanceCounter
on Windows, CLOCK_MONOTONIC_RAW elsewhere). All APIs are covered through
generated ze/zes/zet/zer override headers, mirroring the certification
checker.

Three independently controlled output modes:
- summary table logged at teardown (default when enabled)
- ZEL_TIMING_CHECKER_CSV: export per-API stats to a PID-suffixed CSV
- ZEL_TIMING_CHECKER_LIVE: log each call's duration as it happens

The generated headers are reproducible from the new
scripts/templates/validation/timing.h.mako template (wired into
scripts/generate_code.py), so they regenerate on the next specification
update.

Device-side (GPU execution) timing is intentionally out of scope: the
validation layer receives handles by value and the Epilogue runs after
the driver call, so it cannot inject the timestamp events GPU timing
requires.

Adds README documentation, a CHANGELOG entry, and a transparency unit
test asserting the checker does not alter API results.

Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>