Skip to content

Case ignore#3272

Merged
larsoner merged 8 commits into
codespell-project:masterfrom
vEnhance:case-ignore
Jan 9, 2024
Merged

Case ignore#3272
larsoner merged 8 commits into
codespell-project:masterfrom
vEnhance:case-ignore

Conversation

@vEnhance

@vEnhance vEnhance commented Jan 3, 2024

Copy link
Copy Markdown
Contributor

Starting on #1578, WIP

@DimitriPapadopoulos

Copy link
Copy Markdown
Collaborator

#3270 makes all typos (but not suggested fixes) lowercase. Does your implantation require #3270 as a prior?

@vEnhance

vEnhance commented Jan 3, 2024

Copy link
Copy Markdown
Contributor Author

#3270 makes all typos (but not suggested fixes) lowercase. Does your implantation require #3270 as a prior?

It should be independent. The implementation of this PR only affects the reading of -I and -L options and leaves everything else the same.

In fact, the build_dict function that parses the built-in dictionaries currently reads (this is not changed by this PR):

def build_dict(
    filename: str,
    misspellings: Dict[str, Misspelling],
    ignore_words: Set[str],
) -> None:
    with open(filename, encoding="utf-8") as f:
        translate_tables = [(x, str.maketrans(x, y)) for x, y in alt_chars]
        for line in f:
            [key, data] = line.split("->")
            # TODO for now, convert both to lower. Someday we can maybe add
            # support for fixing caps.
            key = key.lower()
            data = data.lower()
            if key not in ignore_words:
                add_misspelling(key, data, misspellings)
            # generate alternative misspellings/fixes
            for x, table in translate_tables:
                if x in key:
                    alt_key = key.translate(table)
                    alt_data = data.translate(table)
                    if alt_key not in ignore_words:
                        add_misspelling(alt_key, alt_data, misspellings)

that is, when reading the dictionaries included with codespell, actually everything is already converted into lowercase anyway.

@vEnhance vEnhance marked this pull request as ready for review January 3, 2024 19:10
Comment thread codespell_lib/_codespell.py
Comment thread codespell_lib/_codespell.py Outdated
Comment thread codespell_lib/_codespell.py Outdated
DimitriPapadopoulos

This comment was marked as resolved.

@larson

This comment was marked as resolved.

@DimitriPapadopoulos

Copy link
Copy Markdown
Collaborator

@larsoner Would you like to have a look?

@larsoner

larsoner commented Jan 8, 2024

Copy link
Copy Markdown
Member

Code changes look reasonable. There is some risk that someone with custom dictionaries will get different behavior with this PR, though. Safest would be to add a switch to make this opt-in like --ignore-words-case-sensitive or similar. Less safe but maybe useful (or maybe YAGNI) would be to add a --ignore-words-case-insensitive that could get back the old behavior, but people can also get the old behavior just by modifying their dictionaries. I think I can live with the backward incompatible change but wanted to make sure we thought about it first!

@vEnhance

vEnhance commented Jan 8, 2024

Copy link
Copy Markdown
Contributor Author

There is some risk that someone with custom dictionaries will get different behavior with this PR, though.

So I believe the changes in this PR should only affect words passed by config or command line options, which is why I stated in #3272 (comment) this was independent of #3270. Similarly, there should be no backwards-incompatability with custom dictionaries either, because those are read using build_dict which has remained unchanged.

The only users who should see any changes are people who have specified, say, something like SEH in -I which, as mentioned previously (e.g. #1578 (comment)), actually did nothing. I can't actually imagine any situations where someone would add a word into an ignore list and want/expect it to have no effect.

Please let me know if I misunderstood.

@larsoner larsoner merged commit 87edf05 into codespell-project:master Jan 9, 2024
@larsoner

larsoner commented Jan 9, 2024

Copy link
Copy Markdown
Member

Thanks @vEnhance !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants