
fix(dsl): compute cron wait at execution time, not at df.start() time #183

Open: Copilot wants to merge 3 commits into main from copilot/bugfix-df-wait-for-schedule


Copilot AI commented May 27, 2026


Fixes #130.

df.wait_for_schedule() pre-computed wait_seconds at graph construction time. Any delay between df.start() and the moment the BGW (background worker) actually runs the WAIT_SCHEDULE node shifted the timer off the cron schedule by that delay, and could even make it fire before the intended cron tick. Worse, in a recurring @> loop the offset was baked once and reused on every continue_as_new iteration, so after the first tick the wait collapsed to ~0 and the loop busy-spun.
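A minimal sketch of the timing arithmetic behind the bug, assuming a hypothetical simplified cron that only supports `*/N * * * *` (every N minutes) and times expressed as Unix seconds. `next_tick`, `old_fire_time` and `new_fire_time` are illustrative names, not functions from the codebase, and the real implementation uses a full cron parser.

```rust
/// Hypothetical simplified cron: only `*/step * * * *` (every `step` minutes),
/// with times as Unix seconds in UTC. Returns the first tick strictly after `now`.
fn next_tick(now: u64, step_minutes: u64) -> u64 {
    let period = step_minutes * 60;
    (now / period + 1) * period
}

/// Old behaviour: `wait_seconds` is computed at df.start() time, but the
/// relative timer only starts when the node actually executes.
fn old_fire_time(start: u64, exec: u64, step_minutes: u64) -> u64 {
    let wait_seconds = next_tick(start, step_minutes) - start; // baked at DSL time
    exec + wait_seconds
}

/// New behaviour: the clock read and the cron math both happen at execution time.
fn new_fire_time(exec: u64, step_minutes: u64) -> u64 {
    let next = next_tick(exec, step_minutes);
    exec + (next - exec) // a relative timer of (next - now) lands exactly on `next`
}
```

With a 5-minute schedule started at 00:00:10 and executed 100 s later, the old arithmetic fires at 00:06:40 instead of 00:05:00. Started at 00:04:59 and executed at 00:05:30, it fires at 00:05:31, well before the 00:10:00 tick the new arithmetic waits for.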

Approach

The next cron tick is a function of "now", so it must be computed when the node executes, not at df.start() time. Inside the orchestration we read the current time via ctx.utc_now() — duroxide's deterministic clock, whose value is recorded in history and replayed verbatim — and then do pure cron math against it. This is fully replay-safe (the only non-determinism, the clock read, is the recorded syscall) and needs no extra activity.
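The replay-safety argument can be illustrated with a toy model. This is a sketch only: `ToyCtx` is a hypothetical stand-in for an orchestration context, not duroxide's API, and the cron support is the same simplified `*/N * * * *` assumption. The only non-deterministic step is the clock read, which is recorded on the first run and returned verbatim on replay; everything after it is pure arithmetic.

```rust
/// Toy model of an orchestration context, NOT duroxide's API: the first run reads
/// the live clock and records the value in history; a replay returns the recorded
/// value instead of reading the clock again.
struct ToyCtx<'a> {
    history: &'a mut Vec<u64>,
    cursor: usize,
    live_clock: u64, // what the wall clock says during this run (Unix seconds)
}

impl ToyCtx<'_> {
    fn utc_now(&mut self) -> u64 {
        let value = if self.cursor < self.history.len() {
            self.history[self.cursor] // replay: reuse the recorded read
        } else {
            self.history.push(self.live_clock); // first run: record the read
            self.live_clock
        };
        self.cursor += 1;
        value
    }
}

/// Hypothetical simplified cron (`*/step * * * *`), strictly-after semantics.
fn next_tick(now: u64, step_minutes: u64) -> u64 {
    let period = step_minutes * 60;
    (now / period + 1) * period
}

/// Sketch of the WAIT_SCHEDULE step: read the (recorded) clock, then do pure
/// cron math. Returns the relative duration that would be handed to a timer.
fn wait_schedule_duration(ctx: &mut ToyCtx<'_>, step_minutes: u64) -> u64 {
    let now = ctx.utc_now();
    next_tick(now, step_minutes) - now
}
```

If the first run sees the clock at 130 s, the computed wait is 170 s. A replay that happens much later (say the live clock reads 9999) still computes 170 s, because `utc_now()` returns the recorded 130.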

This supersedes the earlier draft of this PR, which stored a target_timestamp at DSL time and added a compute_cron_wait activity. That approach was both unnecessary (ctx.utc_now() already gives a deterministic clock read inside the orchestration) and incorrect for recurring loops (the timestamp went stale across iterations).
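Why a timestamp fixed once goes stale in a recurring loop can be shown with a small simulation. This is a hedged sketch under simplifying assumptions: the same hypothetical `*/N * * * *` cron, a loop body that takes zero time, and illustrative function names that do not exist in the codebase.

```rust
/// Hypothetical simplified cron (`*/step * * * *`), strictly-after semantics.
fn next_tick(now: u64, step_minutes: u64) -> u64 {
    let period = step_minutes * 60;
    (now / period + 1) * period
}

/// Earlier-draft behaviour (simplified): the target is computed once and reused
/// on every iteration. Returns each iteration's timer duration in seconds.
fn stale_target_waits(start: u64, step_minutes: u64, iterations: usize) -> Vec<u64> {
    let target = next_tick(start, step_minutes); // fixed once, never refreshed
    let mut now = start;
    let mut waits = Vec::new();
    for _ in 0..iterations {
        let wait = target.saturating_sub(now); // 0 once `target` is in the past
        waits.push(wait);
        now += wait; // timer fires; the loop body is assumed to take no time
    }
    waits
}

/// Current behaviour: the next tick is recomputed from "now" on every iteration.
fn fresh_waits(start: u64, step_minutes: u64, iterations: usize) -> Vec<u64> {
    let mut now = start;
    let mut waits = Vec::new();
    for _ in 0..iterations {
        let wait = next_tick(now, step_minutes) - now;
        waits.push(wait);
        now += wait;
    }
    waits
}
```

Starting at 130 s with a 5-minute schedule, the stale target yields waits of 170, 0, 0, 0 (the busy-spin), while recomputing yields 170, 300, 300, 300.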

Changes

  • src/dsl.rs: df.wait_for_schedule() now only validates the cron expression eagerly (so a bad expression still fails fast at df.start()) and stores just {"cron_expr": ...}; the DSL-time Utc::now() / wait_seconds computation is removed.
  • src/orchestrations/execute_function_graph.rs: execute_wait_schedule_node reads ctx.utc_now(), computes the next cron tick from that instant, and calls schedule_timer(next - now). A NOTE points at duroxide issue #34 (absolute-deadline timer) for a future simplification to schedule_timer_until(next).
  • src/explain.rs: the WAIT_SCHEDULE display is now WAIT '<cron>'; the precomputed (Ns) suffix is gone, since the wait is no longer known at plan time.
  • src/lib.rs: a unit test asserts that the node config keeps cron_expr and contains neither wait_seconds nor target_timestamp.

Config shape change

Before:

{"cron_expr": "*/5 * * * *", "wait_seconds": 142}

After:

{"cron_expr": "*/5 * * * *"}
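A minimal sketch of the DSL-side behaviour described under Changes: validate eagerly, store only the expression, read no clock. The validation here is deliberately crude (exactly five fields, each using only digits and `* / , -`) and is an assumption for illustration; the real code uses a proper cron parser, and `wait_for_schedule_config` is a hypothetical name.

```rust
/// Hypothetical helper: validate a cron expression and build the node config.
/// Only accepts five numeric/step fields, which also keeps the JSON free of
/// characters that would need escaping.
fn wait_for_schedule_config(expr: &str) -> Result<String, String> {
    let fields: Vec<&str> = expr.split_whitespace().collect();
    if fields.len() != 5 {
        return Err(format!("expected 5 cron fields, got {}", fields.len()));
    }
    for field in &fields {
        if !field.chars().all(|c| c.is_ascii_digit() || "*/,-".contains(c)) {
            return Err(format!("unsupported cron field: {field}"));
        }
    }
    // No Utc::now() here: the wait is computed later, when the node executes.
    Ok(format!("{{\"cron_expr\": \"{}\"}}", fields.join(" ")))
}
```

For `"*/5 * * * *"` this returns exactly the "After" shape above, and a four-field expression fails immediately, mirroring the fail-fast behaviour at df.start().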

Copilot AI changed the title from "[WIP] Fix df.wait_for_schedule to compute wait at execution time" to "fix(dsl): compute cron wait at execution time, not at df.start() time" May 27, 2026
Copilot AI requested a review from pinodeca May 27, 2026 14:40
pinodeca marked this pull request as ready for review May 31, 2026 15:20
df.wait_for_schedule() previously baked the wait duration at df.start()
time via Utc::now(), which meant any delay between start and execution —
and critically every iteration of a recurring `@>` loop — woke at the
wrong moment (the stale, reused target busy-spun with wait=0 after the
first tick).

Now the DSL only validates the cron expression and stores it; the next
tick is computed inside the orchestration using duroxide's deterministic
clock (ctx.utc_now()) plus pure cron math, so it is replay-safe and
correct for both single-shot and recurring waits.

A NOTE references duroxide issue #34 (absolute-deadline timer) so this
can later be simplified to ctx.schedule_timer_until(next).
pinodeca force-pushed the copilot/bugfix-df-wait-for-schedule branch from 63cb0ca to 24b0046 June 22, 2026 21:52
Adds 24_wait_for_schedule_exec_time.sql, which fails under the old
df.start()-time cron computation and passes with the new execution-time
computation. A df.sleep(30) before the wait introduces a start->execution
delay; the new code recomputes the next ':00' tick at execution time
(fires near the minute boundary, second ~= 0) while the old code reused a
fixed offset and would fire ~30s into the minute. Asserts the fire lands
before second 15 (the midpoint) to distinguish the two implementations.
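The arithmetic the test relies on can be sketched as follows, assuming an every-minute schedule (`* * * * *`) and an exact 30 s delay; the actual expression in 24_wait_for_schedule_exec_time.sql is not shown here, and `fire_second` is a hypothetical helper. The old computation always lands `delay` seconds past the minute, the new one lands on second 0, so a threshold at the midpoint (15) separates them even with some scheduling jitter.

```rust
/// First whole minute strictly after `now` (Unix seconds).
fn next_tick_minute(now: u64) -> u64 {
    (now / 60 + 1) * 60
}

/// Second-of-minute at which the timer fires for an every-minute schedule,
/// given the df.start() time and a delay before the WAIT_SCHEDULE node runs.
fn fire_second(start: u64, delay: u64, execution_time_math: bool) -> u64 {
    let exec = start + delay;
    let fire = if execution_time_math {
        next_tick_minute(exec) // new: recompute from the execution-time clock
    } else {
        exec + (next_tick_minute(start) - start) // old: offset baked at start
    };
    fire % 60
}
```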


Development

Successfully merging this pull request may close these issues.

Bug: df.wait_for_schedule computes cron wait at DSL time, not execution time

3 participants