fix: classify merged gitlab MRs as MERGE_REQUEST_MERGED (CM-1298) #4271
API ingestion was emitting MERGE_REQUEST_CLOSED for merge requests with a merged_at timestamp, so MERGE_REQUEST_MERGED activities never landed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Pull request overview
This PR fixes GitLab API stream ingestion classification so merge requests that have a merged_at timestamp emit MERGE_REQUEST_MERGED (instead of incorrectly emitting MERGE_REQUEST_CLOSED), aligning API-stream behavior with the existing merged-handling in processData and webhook ingestion.
Changes:
- Update `handleMergeRequestsStream` to emit `GitlabActivityType.MERGE_REQUEST_MERGED` when `item.data.merged_at` is present.
Comments suppressed due to low confidence (1)
services/libs/integrations/src/integrations/gitlab/processStream.ts:199
- Merged merge requests typically have both `merged_at` and `closed_at` set. With this change, merged MRs will emit `MERGE_REQUEST_MERGED` and then also fall through to the `closed_at` branch, producing an extra `MERGE_REQUEST_CLOSED` activity. If `MERGE_REQUEST_CLOSED` is meant to represent “closed without merging”, the closed branch should be `else if` (or otherwise gated) so it doesn’t run for merged MRs.
```ts
type: GitlabActivityType.MERGE_REQUEST_MERGED,
projectId: data.projectId,
pathWithNamespace: data.pathWithNamespace,
})
}
```
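A minimal sketch of the gating the review comment asks for, assuming a simplified MR payload with nullable `merged_at` / `closed_at` strings. `classifyMergeRequest`, `MergeRequestTerminalType`, and its string values are hypothetical stand-ins for illustration, not the actual `processStream.ts` code or the `GitlabActivityType` enum.

```typescript
// Hypothetical stand-in for GitlabActivityType; only the two terminal
// states relevant to this fix are modelled.
const MergeRequestTerminalType = {
  MERGED: 'merge_request-merged',
  CLOSED: 'merge_request-closed',
} as const
type MergeRequestTerminalType =
  (typeof MergeRequestTerminalType)[keyof typeof MergeRequestTerminalType]

// Simplified MR payload: ISO-8601 strings or null.
interface MergeRequestTimestamps {
  merged_at: string | null
  closed_at: string | null
}

interface TerminalActivity {
  type: MergeRequestTerminalType
  timestamp: string
}

// Emit at most one terminal activity per MR. The `else if` means an MR that
// carries both merged_at and closed_at is reported only as merged.
function classifyMergeRequest(mr: MergeRequestTimestamps): TerminalActivity | null {
  if (mr.merged_at) {
    return { type: MergeRequestTerminalType.MERGED, timestamp: mr.merged_at }
  } else if (mr.closed_at) {
    return { type: MergeRequestTerminalType.CLOSED, timestamp: mr.closed_at }
  }
  return null // still open: no terminal activity
}
```

Because the `closed_at` check sits in an `else if`, an MR with both timestamps yields a single merged activity stamped with `merged_at`.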
…M-1298)

Generalize the gerrit one-off cleanup so it can target any platform and any set of activity types via --platform / --types / --before CLI args. Used to purge mislabeled gitlab merge_request-closed rows so re-ingestion can recreate them with the correct type.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
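The flags named in this commit could be parsed and validated along these lines. This is a sketch using Node's built-in `util.parseArgs`; the `CleanupArgs` shape, the comma-separated `--types` format, and the validation rules are assumptions, and `--dry-run` / `--yes` are included only because the PR mentions them elsewhere.

```typescript
import { parseArgs } from 'node:util'

interface CleanupArgs {
  platform: string
  types: string[]
  before?: Date
  dryRun: boolean
  yes: boolean
}

// Parse and validate CLI flags. Throws on missing or malformed input so a
// destructive run never starts with an ambiguous filter.
function parseCleanupArgs(argv: string[]): CleanupArgs {
  const { values } = parseArgs({
    args: argv,
    options: {
      platform: { type: 'string' },
      types: { type: 'string' }, // assumed comma-separated, e.g. "merge_request-closed"
      before: { type: 'string' },
      'dry-run': { type: 'boolean' },
      yes: { type: 'boolean', short: 'y' },
    },
    strict: true, // unknown flags and positionals are rejected
  })

  if (!values.platform) throw new Error('--platform is required')

  const types = (values.types ?? '')
    .split(',')
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
  if (types.length === 0) throw new Error('--types must list at least one activity type')

  let before: Date | undefined
  if (values.before !== undefined) {
    before = new Date(values.before)
    if (Number.isNaN(before.getTime())) {
      throw new Error(`--before is not a valid date: ${values.before}`)
    }
  }

  return {
    platform: values.platform,
    types,
    before,
    dryRun: values['dry-run'] ?? false,
    yes: values.yes ?? false,
  }
}
```

For example, `parseCleanupArgs(process.argv.slice(2))` invoked with `--platform gitlab --types merge_request-closed -y` yields one type and `yes: true`, while omitting `--types` throws before any query runs.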
Force-pushed from b2a7c4a to f408ec7.
…ect PG delete (CM-1298)

Ports safety improvements proposed in #4080 onto the parameterized script:

- Split into separate Tinybird (unquoted) and Postgres (pg-promise params, "updatedAt" double-quoted) filter builders so each store gets its own dialect.
- Add cheap pre-flight Tinybird count() on both datasources to show blast radius before any destructive action.
- Interactive confirmation prompt with --yes / -y bypass for non-interactive runs.
- Switch PG cleanup to direct chunked DELETE (fetch matching IDs from PG, delete by PK) rather than streaming IDs from Tinybird. Decouples PG cleanup throughput from Tinybird.
- Extend Tinybird job wait timeout from 1h to 6h for large bulk deletes.
- Persist result JSON immediately after triggering TB jobs so the job IDs survive a wait timeout.
- Docstring note that derived MVs are not cascaded by raw datasource deletes.

Supersedes #4080.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
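The first bullet's dialect split can be sketched as two builders fed by the same filter. The column names (`platform`, `type`, `updatedAt`), the use of pg-promise `$(name)` placeholders with the `:csv` filter, and the ClickHouse escaping helper are assumptions for illustration; the script's actual builders are not shown on this page.

```typescript
interface CleanupFilter {
  platform: string
  types: string[]
  before?: Date
}

interface PgFilter {
  where: string
  params: Record<string, unknown>
}

// Postgres: values travel as pg-promise named parameters. "updatedAt" is
// double-quoted because unquoted identifiers are folded to lower case.
function buildPgFilter(f: CleanupFilter): PgFilter {
  const clauses = ['platform = $(platform)', 'type IN ($(types:csv))']
  const params: Record<string, unknown> = { platform: f.platform, types: f.types }
  if (f.before) {
    clauses.push('"updatedAt" < $(before)')
    params.before = f.before.toISOString()
  }
  return { where: clauses.join(' AND '), params }
}

// ClickHouse string literal: escape backslashes first, then single quotes.
function chString(value: string): string {
  return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`
}

// Tinybird (ClickHouse): literals are escaped and inlined, identifiers are
// left unquoted.
function buildTinybirdFilter(f: CleanupFilter): string {
  const clauses = [
    `platform = ${chString(f.platform)}`,
    `type IN (${f.types.map(chString).join(', ')})`,
  ]
  if (f.before) {
    clauses.push(`updatedAt < parseDateTime64BestEffort(${chString(f.before.toISOString())})`)
  }
  return clauses.join(' AND ')
}
```

Keeping values out of the Postgres SQL text lets pg-promise handle quoting, while the Tinybird side must escape literals itself because it is built by interpolation.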
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 6c07a9c.
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>

Summary
- API ingestion was emitting `MERGE_REQUEST_CLOSED` for merge requests with a `merged_at` timestamp, so `MERGE_REQUEST_MERGED` activities never landed in the database. Switches the API path to emit `MERGE_REQUEST_MERGED` for merged MRs; `MERGE_REQUEST_CLOSED` continues to cover MRs closed without merging. The webhook path was already correct.
- Generalizes the Gerrit one-off cleanup script (`cleanup-gerrit-activities.ts` → `cleanup-activities-by-platform-and-type.ts`) so it accepts `--platform`, `--types`, and optional `--before` CLI args. We'll use it to purge the mislabeled gitlab `merge_request-closed` rows in Postgres and Tinybird so re-ingestion can recreate them with the correct type.
  - Pre-flight Tinybird `count()` on both datasources before any destructive action, to show the blast radius.
  - Interactive confirmation prompt with `--yes` / `-y` bypass for non-interactive runs.
  - Separate filter builders for Postgres (pg-promise params, `"updatedAt"` double-quoted) and Tinybird (unquoted ClickHouse). PG cleanup now uses a direct chunked DELETE driven by the PG filter itself, decoupled from Tinybird query throughput.

Fixes CM-1298.
Supersedes #4080.
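The direct chunked DELETE described in the summary (fetch matching IDs with the PG filter, delete them by primary key, repeat until nothing matches) can be sketched with injected query functions so the loop does not depend on a particular client. `chunkedDelete`, `FetchIds`, `DeleteByIds`, and the default chunk size are hypothetical.

```typescript
// Returns up to `limit` primary keys still matching the Postgres filter.
type FetchIds = (limit: number) => Promise<string[]>
// Deletes the given primary keys and returns how many rows were removed.
type DeleteByIds = (ids: string[]) => Promise<number>

// Repeatedly fetch a chunk of matching IDs and delete them by primary key
// until the filter matches nothing. Each round is a small statement, so no
// single DELETE has to touch the whole result set at once.
async function chunkedDelete(
  fetchIds: FetchIds,
  deleteByIds: DeleteByIds,
  chunkSize = 1000,
): Promise<number> {
  let total = 0
  for (;;) {
    const ids = await fetchIds(chunkSize)
    if (ids.length === 0) break
    const deleted = await deleteByIds(ids)
    if (deleted === 0) {
      // Guard against looping forever if fetched rows cannot be deleted.
      throw new Error(`no rows deleted for ${ids.length} fetched ids`)
    }
    total += deleted
  }
  return total
}
```

In a pg-promise setup, `fetchIds` might run a `SELECT` using the Postgres filter plus a `LIMIT`, and `deleteByIds` a `DELETE ... WHERE id IN (...)`; the `id` column and both queries are assumptions about the schema.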
Test plan
- … `services/libs/integrations` and `services/apps/script_executor_worker`.
- … `--dry-run` mode against staging and confirm the Tinybird row counts match expectations.
- … `MERGE_REQUEST_MERGED` activities appear with timestamp matching `merged_at`.
- … `MERGE_REQUEST_CLOSED`.

🤖 Generated with Claude Code
Note
Medium Risk
The GitLab type fix is low risk; the new operational script performs irreversible bulk deletes in Postgres and Tinybird, so mis-specified filters or skipped confirmation could cause significant data loss.
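The two guards this risk note relies on, showing counts first and requiring confirmation unless `--yes` is passed, could look roughly like this. It is a sketch using Node's `readline/promises`; `confirmDeletion`, `PreflightCounts`, and the prompt text are hypothetical.

```typescript
import { createInterface } from 'node:readline/promises'
import { stdin, stdout } from 'node:process'

// Row counts from the pre-flight Tinybird count() on each datasource.
interface PreflightCounts {
  activities: number
  activityRelations: number
}

// Print the blast radius, then require an explicit "yes" unless the caller
// passed --yes / -y. Returns true only when deletion should proceed.
async function confirmDeletion(counts: PreflightCounts, skipPrompt: boolean): Promise<boolean> {
  console.log(`Tinybird activities: ${counts.activities} matching rows`)
  console.log(`Tinybird activityRelations: ${counts.activityRelations} matching rows`)

  if (skipPrompt) return true

  const rl = createInterface({ input: stdin, output: stdout })
  try {
    const answer = await rl.question('Type "yes" to delete these rows: ')
    return answer.trim().toLowerCase() === 'yes'
  } finally {
    rl.close()
  }
}
```

Requiring a typed word rather than a single keypress makes an accidental confirmation less likely; `--yes` exists for non-interactive runs where no terminal is attached.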
Overview
GitLab ingestion fix: API polling in `processStream.ts` now emits `MERGE_REQUEST_MERGED` when a merge request has `merged_at`, instead of mislabeling those rows as `MERGE_REQUEST_CLOSED`. MRs with only `closed_at` still use `MERGE_REQUEST_CLOSED`, aligning the API path with existing webhook handling and `processData` parsers.

Cleanup tooling: The hard-coded Gerrit cleanup script is removed and replaced by `cleanup-activities-by-platform-and-type.ts`, wired via a new `pnpm` script. Operators pass `--platform`, `--types`, and optional `--segment-id`, `--before`, `--dry-run`, `--yes`, and `--tb-token` to purge matching rows from Postgres `activityRelations` and Tinybird `activities`/`activityRelations`.

Safety and behavior changes in the new script: pre-flight Tinybird `count()` on both datasources, interactive confirmation (skippable with `--yes`), validated CLI inputs, separate Postgres (parameterized) vs Tinybird (interpolated) filter builders, Postgres deletes via chunked ID fetch/delete on the PG filter (not Tinybird-driven batches), a 6h Tinybird job wait with the results JSON written before the wait so job IDs survive timeouts, and documentation that raw datasource deletes do not cascade to materialized views.

Reviewed by Cursor Bugbot for commit 24e4b7e.
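The ordering in the last safety item (write the results JSON first, then wait up to 6h) can be sketched as below. `persistThenWait`, `getJobStatus`, the `JobStatus` values, the result shape, and the 30s poll interval are assumptions; only the persist-before-wait order and the 6h ceiling come from the PR description.

```typescript
import { writeFileSync } from 'node:fs'

// Assumed job states; terminal states are 'done' and 'error'.
type JobStatus = 'waiting' | 'working' | 'done' | 'error'

interface CleanupResult {
  startedAt: string
  jobIds: string[]
  finished: Record<string, JobStatus>
}

const SIX_HOURS_MS = 6 * 60 * 60 * 1000

// Persist the job IDs before waiting, so a timeout or crash still leaves a
// record an operator can use to check the jobs by hand.
async function persistThenWait(
  resultPath: string,
  jobIds: string[],
  getJobStatus: (id: string) => Promise<JobStatus>,
  timeoutMs = SIX_HOURS_MS,
  pollMs = 30_000,
): Promise<CleanupResult> {
  const result: CleanupResult = { startedAt: new Date().toISOString(), jobIds, finished: {} }
  writeFileSync(resultPath, JSON.stringify(result, null, 2))

  const deadline = Date.now() + timeoutMs
  const pending = new Set(jobIds)
  while (pending.size > 0) {
    for (const id of [...pending]) {
      const status = await getJobStatus(id)
      if (status === 'done' || status === 'error') {
        result.finished[id] = status
        pending.delete(id)
      }
    }
    if (pending.size === 0) break
    if (Date.now() >= deadline) {
      throw new Error(`timed out waiting for jobs: ${[...pending].join(', ')}`)
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs))
  }

  // Overwrite with final statuses once every job has settled.
  writeFileSync(resultPath, JSON.stringify(result, null, 2))
  return result
}
```

If the wait times out, the function throws, but the file written at the start still lists every job ID.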