feature: Add validation layer Timing Checker for host-side API timing #480
Draft
MichalMrozek wants to merge 1 commit into
Add a new validation-layer checker (ZEL_ENABLE_TIMING_CHECKER) that measures the host-side (CPU) duration of every Level Zero API call and aggregates per-API statistics (call count, total, min, max, average in nanoseconds). The approach is inspired by the host-function timing in Intel's unitrace profiler, ported into the validation layer.

For each API the checker stamps a high-resolution monotonic timestamp in the Prologue and reads it again in the Epilogue (QueryPerformanceCounter on Windows, CLOCK_MONOTONIC_RAW elsewhere). All APIs are covered through generated ze/zes/zet/zer override headers, mirroring the certification checker.

Three independently controlled output modes:
- summary table logged at teardown (default when enabled)
- ZEL_TIMING_CHECKER_CSV: export per-API stats to a PID-suffixed CSV
- ZEL_TIMING_CHECKER_LIVE: log each call's duration as it happens

The generated headers are reproducible from the new scripts/templates/validation/timing.h.mako template (wired into scripts/generate_code.py) on the next specification update.

Device-side (GPU execution) timing is intentionally out of scope: the validation layer receives handles by value and the Epilogue runs after the driver call, so it cannot inject the timestamp events GPU timing requires.

Adds README documentation, a CHANGELOG entry, and a transparency unit test asserting the checker does not alter API results.

Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
Summary
Adds a new validation-layer checker, enabled with ZEL_ENABLE_TIMING_CHECKER=1, that measures the host-side (CPU) duration of every Level Zero API call and aggregates per-API statistics (call count, total, min, max, average in nanoseconds). The design is inspired by the host-function timing in Intel's unitrace profiler, ported into the validation layer.

For each API the checker stamps a high-resolution monotonic timestamp in the Prologue and reads it again in the Epilogue (QueryPerformanceCounter on Windows, clock_gettime(CLOCK_MONOTONIC_RAW) elsewhere). The measured span is dominated by the underlying driver call and is consistent across calls, making it suitable for relative host-cost analysis.

All ze/zes/zet/zer APIs are covered through generated override headers, mirroring the existing certification checker.
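The Prologue/Epilogue stamping and per-API aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names ApiStats and TimingSketch are hypothetical, std::chrono::steady_clock is swapped in as a portable stand-in for QueryPerformanceCounter / CLOCK_MONOTONIC_RAW, and it assumes one in-flight API call per thread.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>
#include <map>
#include <mutex>
#include <string>

// Hypothetical per-API accumulator; field names are illustrative only.
struct ApiStats {
    uint64_t calls = 0;
    uint64_t totalNs = 0;
    uint64_t minNs = std::numeric_limits<uint64_t>::max();
    uint64_t maxNs = 0;
    uint64_t avgNs() const { return calls ? totalNs / calls : 0; }
};

class TimingSketch {
public:
    // Prologue: stamp the start time for the call on this thread.
    void prologue() { startNs() = nowNs(); }

    // Epilogue: read the clock again and fold the elapsed time into the stats.
    void epilogue(const std::string &api) { record(api, nowNs() - startNs()); }

    // Aggregate one measured duration; guarded so several threads can report.
    void record(const std::string &api, uint64_t ns) {
        std::lock_guard<std::mutex> lock(mutex_);
        ApiStats &s = stats_[api];
        s.calls += 1;
        s.totalNs += ns;
        s.minNs = std::min(s.minNs, ns);
        s.maxNs = std::max(s.maxNs, ns);
    }

    // Return a copy of the stats for one API (all zero calls if never seen).
    ApiStats get(const std::string &api) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = stats_.find(api);
        return it == stats_.end() ? ApiStats{} : it->second;
    }

private:
    // Assumption: steady_clock stands in for the platform-specific clocks.
    static uint64_t nowNs() {
        auto t = std::chrono::steady_clock::now().time_since_epoch();
        return static_cast<uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t).count());
    }

    // Per-thread start stamp, so concurrent callers do not overwrite each other.
    static uint64_t &startNs() {
        static thread_local uint64_t start = 0;
        return start;
    }

    mutable std::mutex mutex_;
    std::map<std::string, ApiStats> stats_;
};
```

In this sketch a wrapper would call prologue() before the driver call and epilogue("zeInit") after it; the average is derived on demand from total and call count rather than stored.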
Output modes (each independently controlled)
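- ZEL_ENABLE_TIMING_CHECKER: summary table logged at teardown (default when enabled)
- ZEL_TIMING_CHECKER_CSV=<path>: export per-API stats to a PID-suffixed CSV
- ZEL_TIMING_CHECKER_LIVE: log each call's duration as it happens

Requires ZE_ENABLE_VALIDATION_LAYER=1 and loader logging enabled for output to be visible.

Scope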
Device-side (GPU execution) timing is intentionally out of scope. Unlike unitrace, the validation layer receives handles by value and the Epilogue runs after the driver call, so it cannot inject the timestamp events that GPU timing requires. The checker is structured so device timing can be added later.
Code generation
The generated/{ze,zes,zet,zer}_timing.h headers are reproducible from the new scripts/templates/validation/timing.h.mako template, wired into scripts/generate_code.py, so they regenerate on the next specification update.

Changes
- source/layers/validation/checkers/timing/ (engine, registration, generated overrides, CMake)
- scripts/templates/validation/timing.h.mako + scripts/generate_code.py wiring
- checkers/CMakeLists.txt, validation layer README.md, CHANGELOG.md
- test/loader_validation_layer.cpp + CTest registration

Testing
Validated end-to-end on Intel GPU hardware (Linux, real libze_intel_gpu driver) by running the zello_world sample through a locally built loader and validation layer:
- ZEL_TIMING_CHECKER_CSV produces a PID-suffixed CSV with header Function,Calls,TotalNs,MinNs,MaxNs,AvgNs and one row per API.
- ZEL_TIMING_CHECKER_LIVE logs each call's duration ([timing] zeInitDrivers 66196062 ns).
- zello_world still completes its compute/copy correctly ("completed execution!"), confirming results are unaffected.
- ValidationLayerTimingChecker unit test passes against the null driver.

The CHANGELOG entry is under an Unreleased heading; please assign a version on release.
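For reviewers who want to sanity-check the output shapes reported in Testing, the CSV layout and the live log line can be sketched as below. The header string and the "[timing] <name> <ns> ns" shape come from the description above; everything else is an assumption for illustration, including the helper names (Row, formatCsv, formatLiveLine, pidSuffixedPath) and in particular the ".<pid>" suffix format, which the description does not specify.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical row type holding the aggregated values for one API.
struct Row {
    uint64_t calls;
    uint64_t totalNs;
    uint64_t minNs;
    uint64_t maxNs;
};

// Assumption: the PID is appended as ".<pid>"; the real suffix format may differ.
std::string pidSuffixedPath(const std::string &base, long pid) {
    return base + "." + std::to_string(pid);
}

// Render the header from the PR description followed by one row per API.
std::string formatCsv(const std::map<std::string, Row> &rows) {
    std::ostringstream out;
    out << "Function,Calls,TotalNs,MinNs,MaxNs,AvgNs\n";
    for (const auto &kv : rows) {
        const Row &r = kv.second;
        // Average is derived from total and call count; guard against zero calls.
        uint64_t avg = r.calls ? r.totalNs / r.calls : 0;
        out << kv.first << ',' << r.calls << ',' << r.totalNs << ','
            << r.minNs << ',' << r.maxNs << ',' << avg << '\n';
    }
    return out.str();
}

// Render one live-mode line in the shape shown in the Testing notes.
std::string formatLiveLine(const std::string &api, uint64_t ns) {
    return "[timing] " + api + " " + std::to_string(ns) + " ns";
}
```

Keeping the formatting separate from file and logger I/O, as in this sketch, makes the output easy to assert on in a unit test against the null driver.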