Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits#20668
Fix int16 TOSA.TABLE LUT zeroed when output range uses <16 bits#20668christine-long-meta wants to merge 2 commits into
Conversation
Summary: `InsertTableOpsPass.generate_16_bit_table_values` builds the int16 `TOSA.TABLE` lookup for unary ops (sigmoid, tanh, ...). It computes `rshift = ceil(log2(max_table_value)) + 1 - 16` to fit the table into 16 signed bits, then does `lut_values >> rshift`, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0"). When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in `[0, 1]`; quantized with a small scale (e.g. `1/4096`), the largest table value is `4096` (13 bits), so `rshift = 13 - 16 = -3`. `lut_values >> -3` is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the activation returns 0 for every input. This makes any int16 `TABLE` op with a small output range (e.g. a sigmoid in a Squeeze-and-Excitation block) degenerate. Fix: clamp `rshift` to >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documented `rshift == 0` / `rescale_lshift == -7` case. The fix is general -- it covers any int16 `TABLE` op whose output range is small. Differential Revision: D107331163
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20668
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 926f6e5 with merge base fc408f8 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@christine-long-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107331163. |
This PR needs a
|
Summary:
InsertTableOpsPass.generate_16_bit_table_valuesbuilds the int16TOSA.TABLElookup for unary ops (sigmoid, tanh, ...). It computesrshift = ceil(log2(max_table_value)) + 1 - 16to fit the table into 16 signed bits, then doeslut_values >> rshift, assuming the table fills ~16 bits (its own comment notes "for int16, rshift == 0").When the op's output range uses fewer than 16 bits this breaks. A sigmoid output is in
[0, 1]; quantized with a small scale (e.g.1/4096), the largest table value is4096(13 bits), sorshift = 13 - 16 = -3.lut_values >> -3is an undefined negative right-shift; on the host the shift count is masked and the entire table is zeroed, so the activation returns 0 for every input. This makes any int16TABLEop with a small output range (e.g. a sigmoid in a Squeeze-and-Excitation block) degenerate.Fix: clamp
rshiftto >= 0. When it would be negative the values already fit in int16, so no shift is needed; this restores the documentedrshift == 0/rescale_lshift == -7case. The fix is general -- it covers any int16TABLEop whose output range is small.Differential Revision: D107331163
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani