Steps to reproduce
$ printf 'a\nb\nc\n' > A; printf 'a\nX\nc\n' > B
$ diffutils diff $'\xff--width=5' A B
thread 'main' panicked at src/params.rs:116:45:
called `Result::unwrap()` on an `Err` value: "\xFF--width=5"
$ echo $?
134
diff aborts (panic, exit 134) when given a non-UTF-8 argument whose lossy form ends in --width=<digits> — e.g. an argument with a leading invalid byte like $'\xff--width=5'.
Expected behavior
Match GNU: the unrecognized argument is a file operand; with three operands diff reports the extra operand and exits 2.
$ /usr/bin/diff $'\xff--width=5' A B
diff: extra operand 'B'
$ echo $?
2
Root cause
The --width regex lacks a start anchor, unlike the sibling --tabsize one:
// src/params.rs:62-63
let tabsize_re = Regex::new(r"^--tabsize=(?<num>\d+)$").unwrap(); // anchored — safe
let width_re = Regex::new(r"--width=(?P<long>\d+)$").unwrap(); // no leading ^
// src/params.rs:115-116
if width_re.is_match(param.to_string_lossy().as_ref()) { // matches lossy form
    let param = param.into_string().unwrap(); // line 116: Err on non-UTF-8
to_string_lossy() maps invalid bytes to U+FFFD, so a non-UTF-8 argument whose tail is --width=N still matches; into_string() then fails on the real bytes.
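The mismatch between the lossy view and the real bytes can be seen with the standard library alone. This is a minimal sketch, not code from the project: `lossy_tail_matches` and `bad_argument` are hypothetical helpers, the suffix check is a plain-string stand-in for the unanchored regex (ASCII digits only), and building the argument from raw bytes assumes a Unix target.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt; // Unix-only: build an OsString from raw bytes

/// Stand-in for the unanchored regex (an assumption, not the project's code):
/// true if the lossy form ends in "--width=<ASCII digits>".
fn lossy_tail_matches(param: &OsString) -> bool {
    let lossy = param.to_string_lossy();
    match lossy.rfind("--width=") {
        Some(idx) => {
            let digits = &lossy[idx + "--width=".len()..];
            !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit())
        }
        None => false,
    }
}

/// Builds the argument from the reproduction: byte 0xFF followed by "--width=5".
fn bad_argument() -> OsString {
    let mut bytes = vec![0xFF];
    bytes.extend_from_slice(b"--width=5");
    OsString::from_vec(bytes)
}
```

For `bad_argument()`, `lossy_tail_matches` returns true because the lossy form is U+FFFD followed by `--width=5`, while `into_string()` on the same value returns `Err`, which is the value that `unwrap()` then panics on.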
Fix: anchor the regex at the start (^--width=…$, matching tabsize_re), and/or match on the raw bytes or drop into_string().unwrap(), so that a non-UTF-8 argument falls through to the operand path. The unanchored regex also makes diff xyz--width=5 silently accept a width option instead of treating xyz--width=5 as a filename: same root cause, non-panic symptom.
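One possible shape for the panic-free variant, as a sketch only: `parse_width` is a hypothetical helper, not an existing function in src/params.rs, and it uses `strip_prefix` in place of an anchored regex. It also assumes only ASCII digits should be accepted, which may be narrower than what `\d` matches in the current regex.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: returns Some(width) only for an exact
/// "--width=<ASCII digits>" argument, and None for everything else.
fn parse_width(param: &OsStr) -> Option<usize> {
    // A non-UTF-8 argument cannot be the option; return None instead of panicking.
    let text = param.to_str()?;
    // strip_prefix anchors the match at the start, like the leading ^ in tabsize_re.
    let digits = text.strip_prefix("--width=")?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Values too large for usize also yield None.
    digits.parse().ok()
}
```

With this shape, `--width=5` yields `Some(5)`, while both `xyz--width=5` and the non-UTF-8 argument from the reproduction yield `None`, so the caller can treat them as file operands. How a malformed value such as `--width=5x` should be reported is left to the caller.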
Found by our static analysis tooling.