Performance improve #2

Merged
b4prog merged 9 commits into main from performance-improve
Jun 26, 2026

b4prog (Owner) commented Jun 26, 2026

Summary by CodeRabbit

  • New Features

    • Duplicate reports now show optional timing details for discovery, processing, and duplicate detection in verbose mode.
    • Source analysis now covers all supported file types in the current directory by default, with clearer extension filtering options.
  • Bug Fixes

    • Improved discovery to respect ignore rules while working outside a Git repository.
    • Expanded language handling to better avoid false duplicate matches in generated and config-style content.
  • Documentation

    • Updated CLI and development instructions to match the new behavior and workflow.

coderabbitai Bot commented Jun 26, 2026

Warning

Review limit reached

@b4prog, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0e4371c8-0536-4e64-a13b-a515f78b837e

📥 Commits

Reviewing files that changed from the base of the PR and between 2678e44 and daf14a1.

📒 Files selected for processing (1)
  • src/lib.rs

Walkthrough

The PR adds parallel source discovery and file processing, records verbose timings for discovery, processing, and duplicate detection, updates duplicate-report rendering and language classification rules, and revises the README to describe the new behavior.
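
A minimal sketch of order-preserving parallel processing, for readers who want to see why a test on input order matters. The PR itself uses Rayon according to the walkthrough; this sketch swaps in `std::thread::scope` so that it needs no external crate. The names `process` and `process_in_order` are hypothetical and do not come from the repository.

```rust
use std::thread;

/// Hypothetical stand-in for the per-file work; here it only counts lines.
fn process(source: &str) -> usize {
    source.lines().count()
}

/// Processes `inputs` on up to `workers` threads and returns results in input order.
fn process_in_order(inputs: &[String], workers: usize) -> Vec<usize> {
    if inputs.is_empty() {
        return Vec::new();
    }
    // Ceiling division so every input lands in exactly one contiguous chunk.
    let workers = workers.max(1);
    let chunk_len = (inputs.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn one thread per chunk and keep the handles in chunk order.
        let handles: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|s| process(s)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order restores the original input order,
        // regardless of which thread finishes first.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `process_in_order(&inputs, 4)` yields one line count per input, at the same index as that input. Rayon's indexed parallel iterators give the same ordering guarantee when collecting into a `Vec`, which is presumably what the added test checks.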

Changes

Duplicate Detection Pipeline

| Layer | File(s) | Summary |
|---|---|---|
| Dependencies and parallel processing | Cargo.toml, src/line.rs | Cargo.toml adds ignore and rayon and bumps the crate version, and process_source_files switches to Rayon while a test checks input order. |
| Recursive discovery | src/discovery.rs | discover_source_files now uses ignore::WalkBuilder, helper functions build SourceFile values from walk entries, and a test covers .gitignore handling without a Git repository. |
| Timing capture and report output | src/lib.rs, src/report.rs | run measures discovery, file processing, and duplicate detection in verbose mode, stores optional durations in DuplicateReport, and the report renderer prints sorted extensions and a verbose Timings section. |
| Language mitigation rules | src/language.rs | TypeScript/JavaScript mitigation patterns expand for generated markers, YAML is separated from JSON/TOML, and tests cover the updated line classifications. |
| README updates | README.md | Usage, duplicate-report, and development text are rewritten to describe the new default discovery behavior, verbose timings, and verification commands. |
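
To illustrate the "Language mitigation rules" change, here is a minimal sketch of an extension registry in which YAML has its own entry while JSON and TOML stay explicitly registered. The struct layout, the entry names, and `pattern_for_extension` are assumptions made for illustration; the real `LANGUAGE_PATTERNS` in src/language.rs may look different.

```rust
/// Hypothetical shape of a pattern entry; the real struct may differ.
struct LanguagePattern {
    name: &'static str,
    extensions: &'static [&'static str],
}

/// YAML gets its own entry, while JSON and TOML remain registered explicitly
/// instead of falling through to an unknown-extension fallback.
const LANGUAGE_PATTERNS: &[LanguagePattern] = &[
    LanguagePattern { name: "yaml", extensions: &["yaml", "yml"] },
    LanguagePattern { name: "json-toml", extensions: &["json", "toml"] },
];

/// Returns the registered pattern for an extension, or `None` when the
/// extension is not registered at all.
fn pattern_for_extension(extension: &str) -> Option<&'static LanguagePattern> {
    LANGUAGE_PATTERNS.iter().find(|pattern| {
        pattern
            .extensions
            .iter()
            .any(|ext| ext.eq_ignore_ascii_case(extension))
    })
}
```

A lookup such as `pattern_for_extension("toml")` returning `Some(..)` is the kind of assertion that would catch a split that accidentally drops an extension.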

Sequence Diagram(s)

sequenceDiagram
  participant ReportDuplicate
  participant Discovery
  participant Processing
  participant DuplicateDetection
  participant DuplicateReport

  ReportDuplicate->>Discovery: measure discovery
  ReportDuplicate->>Processing: measure file processing
  ReportDuplicate->>DuplicateDetection: measure duplicate detection
  ReportDuplicate->>DuplicateReport: populate optional timings
  DuplicateReport->>DuplicateReport: render verbose timings and sorted extensions
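
The flow in the diagram can be sketched as follows. This is an illustration under assumptions, not the code from src/lib.rs: the signature of `time_result`, the `Timings` struct, and the placeholder work inside `run` are all hypothetical. The point is only that each stage is wrapped once and that durations stay `None` outside verbose mode.

```rust
use std::time::{Duration, Instant};

/// Runs `work` and measures elapsed time only when `enabled` is true.
/// (Assumed signature; the real helper may differ.)
fn time_result<T>(enabled: bool, work: impl FnOnce() -> T) -> (T, Option<Duration>) {
    if !enabled {
        return (work(), None);
    }
    let start = Instant::now();
    let value = work();
    (value, Some(start.elapsed()))
}

/// Hypothetical subset of the report: timings are absent unless verbose.
#[derive(Debug, Default)]
struct Timings {
    discovery: Option<Duration>,
    processing: Option<Duration>,
    duplicate_detection: Option<Duration>,
}

/// Placeholder pipeline: each stage is timed separately, in order.
fn run(verbose: bool) -> (usize, Timings) {
    let (files, discovery) = time_result(verbose, || vec!["a.rs", "b.rs"]);
    let (lines, processing) = time_result(verbose, || files.len() * 10);
    let (duplicates, duplicate_detection) = time_result(verbose, || lines / 10);
    (duplicates, Timings { discovery, processing, duplicate_detection })
}
```

Storing `Option<Duration>` rather than a zero duration lets the renderer omit the Timings section entirely when verbose mode is off.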

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • b4prog/CodeM8#1: Shares the same discovery traversal and language-pattern work, making it the closest code-level follow-up.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title is related to the changes, but it is too vague to explain what was improved. | Use a more specific title, such as improving duplicate detection performance with parallel discovery and processing. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/language.rs`:
- Around line 154-155: The LANGUAGE_PATTERNS split removed JSON and TOML
registration, so update the language pattern list to keep the json and toml
extensions supported while still separating YAML. Make the fix in the
LANGUAGE_PATTERNS entries and any related registration blocks so
discovery/filtering continues to recognize .json and .toml files, and verify
classify_line still works for those types through explicit pattern registration
rather than relying on the unknown-extension fallback.

In `@src/lib.rs`:
- Around line 41-53: The discovery timing in the main flow is excluding the
Git-branch lookup done by git::changed_files_against_origin, so the reported
“Discovery” duration is incomplete for --git-branch runs in verbose mode. Adjust the timing
around the source discovery path in src/lib.rs so the time_result wrapper
includes both the Git-branch file lookup and discovery::discover_source_files,
likely by moving the git branch file resolution inside the same timed closure or
otherwise combining the two steps before recording discovery_duration. Keep the
existing identifiers time_result, discovery::discover_source_files, and
git::changed_files_against_origin together so the full discovery work is
measured consistently.
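
One way to read the src/lib.rs suggestion above is sketched here: both the Git-branch lookup and the directory discovery run inside the same timed closure, so a single duration covers the whole discovery phase. The function bodies are hypothetical stand-ins, and the signatures of `time_result`, `changed_files_against_origin`, and `discover_source_files` are assumptions; only the names are taken from the review comment.

```rust
use std::time::{Duration, Instant};

/// Times a fallible closure when `verbose` is set. (Assumed signature.)
fn time_result<T, E>(
    verbose: bool,
    work: impl FnOnce() -> Result<T, E>,
) -> Result<(T, Option<Duration>), E> {
    let start = verbose.then(Instant::now);
    let value = work()?;
    Ok((value, start.map(|s| s.elapsed())))
}

/// Stand-in for git::changed_files_against_origin (hypothetical body).
fn changed_files_against_origin(branch: &str) -> Result<Vec<String>, String> {
    if branch.is_empty() {
        return Err("empty branch name".to_string());
    }
    Ok(vec![format!("src/{branch}.rs")])
}

/// Stand-in for discovery::discover_source_files (hypothetical body).
fn discover_source_files(changed: Option<Vec<String>>) -> Result<Vec<String>, String> {
    Ok(changed.unwrap_or_else(|| vec!["src/lib.rs".to_string()]))
}

/// Both steps sit inside one timed closure, so the reported discovery
/// duration includes the Git lookup whenever a branch is given.
fn discover_timed(
    git_branch: Option<&str>,
    verbose: bool,
) -> Result<(Vec<String>, Option<Duration>), String> {
    time_result(verbose, || {
        let changed = git_branch.map(changed_files_against_origin).transpose()?;
        discover_source_files(changed)
    })
}
```

Errors from either step propagate through the same `?` path, so moving the lookup into the closure does not change error handling, only what the timer covers.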
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d56e0d07-bfcd-434c-b811-db0d4be33063

📥 Commits

Reviewing files that changed from the base of the PR and between 19878c9 and 2678e44.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • Cargo.toml
  • README.md
  • src/discovery.rs
  • src/language.rs
  • src/lib.rs
  • src/line.rs
  • src/report.rs
📜 Review details
🔇 Additional comments (8)
src/language.rs (1)

20-31: LGTM!

Also applies to: 343-361

README.md (1)

43-49: LGTM!

Also applies to: 68-78, 103-121

Cargo.toml (2)

3-3: LGTM!


12-13: 📐 Maintainability & Code Quality

No Cargo.lock change needed. ignore and rayon are already present in the lockfile, so --locked should not fail.

src/line.rs (1)

4-16: LGTM!

Also applies to: 68-68, 103-128

src/discovery.rs (1)

4-6: LGTM!

Also applies to: 40-129, 237-250

src/lib.rs (1)

14-14: LGTM!

Also applies to: 63-103, 167-169, 213-216

src/report.rs (1)

3-23: LGTM!

Also applies to: 44-70, 100-116, 128-128, 148-148, 180-199, 233-233, 247-270

Comment thread src/language.rs
Comment thread src/lib.rs
b4prog merged commit 56b5970 into main Jun 26, 2026
3 checks passed
b4prog deleted the performance-improve branch June 26, 2026 01:50