PYTHON-5903 : Retry attempts should share a single stable operation_id by shrey-mongo · Pull Request #2901 · mongodb/mongo-python-driver

shrey-mongo · 2026-06-29T15:26:07Z

PYTHON-5903: Retry attempts should share a single stable operation_id

Changes in this PR

Command-monitoring events carry an operation_id to correlate the events of one logical operation. For single-document CRUD and cursor operations, each retry attempt got a different operation_id (it defaulted to the per-attempt request_id), so consumers couldn't group a retried operation's events. Bulk writes already shared one across batches.

_ClientConnectionRetryable mints the operation_id once and stamps it on the checked-out connection (conn.op_id) on every attempt
Connection gains an op_id attribute, reset on check-in.
pool.command and server.run_operation source op_id from the connection and pass it to command_runner.run_command / run_cursor_command, which thread it through to the publish_command_* events.

Test Plan

New test/asynchronous/test_operation_id_retry.py forces multiple retries via a failCommand failpoint across a read/write matrix and asserts all command events share one operation_id; verifies non-retryable multi-writes don't retry.
Ran the new tests plus the existing command-monitoring and retryable reads/writes suites against a single-node replica set which passed.

Checklist

Checklist for Author

Did you update the changelog (if necessary)?
Is there test coverage?
Is any followup work tracked in a JIRA ticket? If so, add link(s).

Checklist for Reviewer

Does the title of the PR reference a JIRA Ticket?
Do you fully understand the implementation? (Would you be comfortable explaining how this code works to someone else?)
Is all relevant documentation (README or docstring) updated?

NoahStapp · 2026-06-29T16:46:08Z

Please fill out the pull request template completely.

Copilot

Pull request overview

This PR ensures command-monitoring operation_id remains stable across retry attempts for retryable reads/writes and relevant cursor operations, improving APM correlation for a single logical operation.

Changes:

Mint a single per-operation operation_id in the retryable-operation wrapper and stamp it onto the checked-out connection for each attempt.
Thread the connection’s op_id through pool/server command execution into command_runner so published command-monitoring events use it.
Add sync + async integration tests that force multiple retries via failCommand and assert all related command events share one operation_id.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
`tools/synchro.py`	Registers the new test file in the synchro conversion list.
`test/test_operation_id_retry.py`	Sync integration coverage asserting `operation_id` stability across retries (and no retry for non-retryable multi-writes).
`test/asynchronous/test_operation_id_retry.py`	Async equivalent of the retry `operation_id` stability tests.
`pymongo/synchronous/server.py`	Passes `conn.op_id` into cursor command execution for APM publishing.
`pymongo/synchronous/pool.py`	Adds `Connection.op_id`, forwards it into `run_command`, and clears it on check-in.
`pymongo/synchronous/mongo_client.py`	Mints/stabilizes per-operation ids across retries and stamps them on checked-out connections.
`pymongo/synchronous/command_runner.py`	Accepts/threads `op_id` through `run_command` / `run_cursor_command` into `_run_command`.
`pymongo/asynchronous/server.py`	Async counterpart: passes `conn.op_id` for cursor command execution.
`pymongo/asynchronous/pool.py`	Async counterpart: adds/forwards/clears `op_id` on connections.
`pymongo/asynchronous/mongo_client.py`	Async counterpart: stable per-operation id and per-attempt stamping onto connections.
`pymongo/asynchronous/command_runner.py`	Async counterpart: threads `op_id` through command execution to APM event publishing.

shrey-mongo · 2026-06-29T17:12:33Z

Updated description to include full template

codecov-commenter · 2026-06-29T18:21:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

NoahStapp · 2026-06-29T18:57:01Z

+            "data": {
+                "failCommands": [command_name],
+                "closeConnection": True,
+                "appName": _APP_NAME,


This class needs to add the @async_client_context.require_failCommand_appName annotation to safely use appName.

NoahStapp · 2026-06-29T19:02:21Z

This needs to be updated to use the new operation_id, if present.

NoahStapp · 2026-06-30T15:05:07Z

There are merge conflicts, please resolve.

shrey-mongo · 2026-06-30T16:38:12Z

I fixed the merge conflict and the typing error that was causing the workflow to fail

NoahStapp · 2026-07-01T14:31:41Z

            func=func,
        )

-    def require_failCommand_appName(self, func):


What's the purpose of this change?

It was needed to fix an untyped-decorator mypy error that was occurring in my test. This matches the signature of require_failCommand_fail_point.

You don't need both the require_failCommand_fail_point and require_failCommand_appName annotations, the latter is a strict subset of the former.

NoahStapp · 2026-07-01T14:37:58Z

        # For gossiping $clusterTime from the connection handshake to the client.
        self._cluster_time = None
+        # Stable operation id for the operation currently using this connection.
+        self.op_id: Optional[int] = None


I don't think we want to track op_id on connections. It introduces more risk of accidental inclusion or staleness on operations like auth compared to explicitly passing it through each operation. What about passing op_id as a kwarg on Connection.command() and on _Query/_GetMore where operations are actually initiated?

shrey-mongo requested a review from a team as a code owner June 29, 2026 15:26

shrey-mongo requested a review from aclark4life June 29, 2026 15:26

shrey-mongo force-pushed the shrey-mongo/PYTHON-5903 branch 2 times, most recently from 247b26f to 682dbac Compare June 29, 2026 15:56

NoahStapp requested review from NoahStapp and Copilot and removed request for aclark4life June 29, 2026 16:44

Copilot started reviewing on behalf of NoahStapp June 29, 2026 16:44 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

NoahStapp requested changes Jun 29, 2026

View reviewed changes

shrey-mongo requested a review from NoahStapp June 30, 2026 15:04

fixed merge conflict

75f4e21

shrey-mongo force-pushed the shrey-mongo/PYTHON-5903 branch from 5546477 to 75f4e21 Compare June 30, 2026 15:46

NoahStapp requested changes Jul 1, 2026

View reviewed changes

Uh oh!

Conversation

shrey-mongo commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes in this PR

Test Plan

Checklist

Checklist for Author

Checklist for Reviewer

Uh oh!

NoahStapp commented Jun 29, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

shrey-mongo commented Jun 29, 2026

Uh oh!

codecov-commenter commented Jun 29, 2026

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NoahStapp commented Jun 30, 2026

Uh oh!

shrey-mongo commented Jun 30, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

shrey-mongo commented Jun 29, 2026 •

edited

Loading