Documentation/Makefile: 1 addition, 0 deletions
@@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/packfile-uri
 TECH_DOCS += technical/pack-heuristics
+TECH_DOCS += technical/paint-down-to-common
 TECH_DOCS += technical/parallel-checkout
 TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/platform-support
Documentation/technical/meson.build: 1 addition, 0 deletions
@@ -18,6 +18,7 @@ articles = [
   'multi-pack-index.adoc',
   'packfile-uri.adoc',
   'pack-heuristics.adoc',
+  'paint-down-to-common.adoc',
   'parallel-checkout.adoc',
   'partial-clone.adoc',
   'platform-support.adoc',
Documentation/technical/paint-down-to-common.adoc: 149 additions, 0 deletions
@@ -0,0 +1,149 @@
Merge-Base Computation and paint_down_to_common()
==================================================

The function `paint_down_to_common()` in `commit-reach.c` computes merge
bases by walking the commit graph backwards from two sets of tips and
finding where their ancestry meets.

Use cases
---------

Merge-base computation serves two purposes:

1. *Finding all merge bases* (`merge-base --all`, `merge-tree`,
`merge`, `rebase`). A merge base is a common ancestor that is
not itself an ancestor of another common ancestor.

2. *Ancestry checks* (`in_merge_bases`, used by `merge-base
--is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A
an ancestor of commit B?" If a common ancestor equals one of the
inputs, that input is necessarily the only merge base: every other
common ancestor is then an ancestor of that input, and therefore redundant.

Both use cases share the same algorithm and implementation.

Algorithm
---------

Given a commit `one` and a set of commits `twos[]`, the walk paints
commits with two colors:

- PARENT1: reachable from `one`
- PARENT2: reachable from any commit in `twos[]`

The walk uses a priority queue ordered by generation number (falling
back to commit date when generation numbers are unavailable). Each
step dequeues the highest-priority commit (this is when we say a
commit is "visited") and propagates its paint flags to its parents,
enqueuing them if they gained new flags. A commit visited with both
PARENT1 and PARENT2 (but not STALE) is recorded as a merge-base
candidate, and STALE is added to the flags it passes on to its
parents. Staleness therefore propagates to every ancestor of the
candidate; any such deeper common ancestor is necessarily redundant.
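To make the walk concrete, the following is a minimal sketch of the painting loop on a toy in-memory graph; it is not git's implementation. The `toy_*` names, the flag values, and the fixed-size arrays are hypothetical. The sketch assumes every commit has a finite generation number (so it only models the finite region), replaces the priority queue with a linear scan for the highest generation, and leaves out the termination optimizations and `remove_redundant()`.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch only; git defines its own values. */
#define TOY_PARENT1  (1u << 0)
#define TOY_PARENT2  (1u << 1)
#define TOY_STALE    (1u << 2)
#define TOY_RESULT   (1u << 3)
#define TOY_ENQUEUED (1u << 4)
#define TOY_MAX_COMMITS 16

/* Toy commit graph: commit i has a generation and up to two parents (-1 = none). */
struct toy_graph {
	unsigned gen[TOY_MAX_COMMITS];
	int parent[TOY_MAX_COMMITS][2];
	unsigned flags[TOY_MAX_COMMITS];
};

static void toy_add(struct toy_graph *g, int c, unsigned gen, int p0, int p1)
{
	assert(c >= 0 && c < TOY_MAX_COMMITS);
	g->gen[c] = gen;
	g->parent[c][0] = p0;
	g->parent[c][1] = p1;
	g->flags[c] = 0;
}

/* Add c to the queue unless it is already queued (ENQUEUED dedup). */
static void toy_enqueue(struct toy_graph *g, int *q, int *len, int c)
{
	if (g->flags[c] & TOY_ENQUEUED)
		return;
	g->flags[c] |= TOY_ENQUEUED;
	q[(*len)++] = c;
}

/* Nonzero while at least one queued commit is not STALE. */
static int toy_has_nonstale(const struct toy_graph *g, const int *q, int len)
{
	for (int i = 0; i < len; i++)
		if (!(g->flags[q[i]] & TOY_STALE))
			return 1;
	return 0;
}

/*
 * Paint from `one` (PARENT1) and twos[] (PARENT2), visiting commits in
 * decreasing generation order. Writes the surviving merge-base
 * candidates to out[] and returns how many there are.
 */
static int toy_paint_down_to_common(struct toy_graph *g, int one,
				    const int *twos, int n, int *out)
{
	int q[TOY_MAX_COMMITS], len = 0, found = 0, kept = 0;

	g->flags[one] |= TOY_PARENT1;
	toy_enqueue(g, q, &len, one);
	for (int i = 0; i < n; i++) {
		g->flags[twos[i]] |= TOY_PARENT2;
		toy_enqueue(g, q, &len, twos[i]);
	}

	while (toy_has_nonstale(g, q, len)) {
		int best = 0, c;
		unsigned f;

		for (int i = 1; i < len; i++)	/* highest generation first */
			if (g->gen[q[i]] > g->gen[q[best]])
				best = i;
		c = q[best];
		q[best] = q[--len];		/* "visit" c: remove it from the queue */
		g->flags[c] &= ~TOY_ENQUEUED;

		f = g->flags[c] & (TOY_PARENT1 | TOY_PARENT2 | TOY_STALE);
		if (f == (TOY_PARENT1 | TOY_PARENT2)) {
			if (!(g->flags[c] & TOY_RESULT)) {
				g->flags[c] |= TOY_RESULT;
				out[found++] = c;
			}
			f |= TOY_STALE;		/* ancestors of a candidate are redundant */
		}
		for (int k = 0; k < 2; k++) {
			int p = g->parent[c][k];

			if (p < 0 || (g->flags[p] & f) == f)
				continue;	/* no parent, or no new paint */
			g->flags[p] |= f;
			toy_enqueue(g, q, &len, p);
		}
	}

	/* Keep only candidates that were not later painted STALE. */
	for (int i = 0; i < found; i++)
		if (!(g->flags[out[i]] & TOY_STALE))
			out[kept++] = out[i];
	return kept;
}
```

In a criss-cross history where commits 3 and 4 both have parents 1 and 2, painting from 3 against `{4}` should report both 1 and 2. Calling it with `one` set to an ancestor of a `twos[]` entry returns just `one`, which is the ancestry-check case described under "Use cases".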

INFINITY and finite generation regions
--------------------------------------

The commit-graph stores a generation number for each commit it
contains. Commits not in the commit-graph have generation
`GENERATION_NUMBER_INFINITY`. The commit-graph is closed under
reachability: if a commit is in it, all of its ancestors are too. This
partitions the commit graph into two regions:

....
+---------------------------------------+
| INFINITY region                       |
| generation = INFINITY                 |
| queue order: heuristic (commit date)  |
+---------------------------------------+
                    |
                    v
+---------------------------------------+
| Finite region                         |
| generation = finite                   |
| queue order: topological              |
+---------------------------------------+
....

When the commit-graph is enabled, the INFINITY region is typically
very small -- it only contains commits added since the last
commit-graph refresh.

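All reachable INFINITY-generation commits are visited before any
finite-generation commit, because INFINITY is larger than any finite
value. Since the commit-graph is closed under reachability, a
finite-generation commit never has an INFINITY-generation parent, so
once the walk crosses into the finite region, it stays there.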

In the finite region, generation ordering guarantees topological
traversal: children are always visited before their parents. This
means that paint on already-visited commits is final -- no future
traversal step can add paint to them.

In the INFINITY region, commit-date ordering can violate this: a
parent with a later date can be visited before a child with an earlier
date. Paint flags are therefore NOT final at visit time, and a
commit visited with only one side's paint may later gain the other.
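The ordering described in this section can be illustrated with a small comparator sketch. It is modeled on the idea behind `compare_commits_by_gen_then_commit_date()` but is not git's code: `struct toy_commit`, `TOY_GEN_INFINITY`, and `toy_dequeue_first()` are hypothetical names, and generation numbers and dates are assumed to be plain unsigned integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for GENERATION_NUMBER_INFINITY: larger than any real generation. */
#define TOY_GEN_INFINITY UINT64_MAX

struct toy_commit {
	uint64_t generation;	/* TOY_GEN_INFINITY if not in the commit-graph */
	uint64_t date;		/* commit date, used only to break ties */
};

/*
 * Return nonzero if `a` should be dequeued before `b`: higher generation
 * first, then later commit date. Every INFINITY commit therefore precedes
 * every finite one, and INFINITY commits are ordered by date alone.
 */
static int toy_dequeue_first(const struct toy_commit *a,
			     const struct toy_commit *b)
{
	if (a->generation != b->generation)
		return a->generation > b->generation;
	return a->date > b->date;
}
```

With clock skew, a parent outside the commit-graph dated 200 is dequeued before its child dated 100, because both have generation INFINITY and only the dates differ. That is exactly the situation in which paint is not yet final at visit time.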

Paint flags are only added, never removed. A commit is enqueued again
only when it gains a flag it did not already have, so the number of
times it can be enqueued is bounded by the number of paint flags.
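The bound follows from flags being monotonic: a commit is enqueued only when OR-ing in new paint changes its flag word, and with three paint bits that can happen at most three times. The sketch below uses hypothetical `toy_*` names and bit values and ignores the ENQUEUED dedup, which can only reduce the count further; it replays a sequence of paint deliveries to a single commit and counts how many of them would enqueue it.

```c
#include <assert.h>

/* Hypothetical paint bits for this sketch only. */
#define TOY_PARENT1 (1u << 0)
#define TOY_PARENT2 (1u << 1)
#define TOY_STALE   (1u << 2)

/* OR `add` into *flags; return nonzero if at least one new bit was set. */
static int toy_add_paint(unsigned *flags, unsigned add)
{
	unsigned before = *flags;

	*flags |= add;
	return *flags != before;
}

/*
 * Replay paint deliveries to one commit and count how many of them
 * would (re-)enqueue it. The result never exceeds the number of
 * distinct paint bits, here three.
 */
static int toy_count_enqueues(const unsigned *deliveries, int n)
{
	unsigned flags = 0;
	int enqueues = 0;

	for (int i = 0; i < n; i++)
		if (toy_add_paint(&flags, deliveries[i]))
			enqueues++;
	return enqueues;
}
```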

Termination
-----------

The walk keeps a count of the non-stale commits in the queue, split
into three buckets: PARENT1-only, PARENT2-only, and pending merge-base
candidates (both flags). The main loop ends when one of the following
conditions holds:

1. The queue is empty.
2. The queue contains only stale entries.
3. Generation cutoff: the dequeued commit's generation is below
a caller-supplied `min_generation` threshold.
4. Single result: the caller only needs one merge base, one has
been found, and the walk has entered the finite-generation
region.
5. Side exhaustion: either the PARENT1-only or the PARENT2-only
count has dropped to zero, no pending merge-base candidates exist,
and the walk has entered the finite-generation region.
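The bucket bookkeeping and the counter-based checks can be sketched as follows. This is a simplified model of the idea, not git's `paint_count_update()` or `paint_queue_get()`: the `toy_*` names and flag values are hypothetical, the counts are assumed to still include the commit that was just dequeued, and the empty-queue and single-result conditions are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical paint bits and INFINITY value for this sketch only. */
#define TOY_PARENT1 (1u << 0)
#define TOY_PARENT2 (1u << 1)
#define TOY_STALE   (1u << 2)
#define TOY_GEN_INFINITY UINT64_MAX

/* Number of queued non-stale commits in each bucket. */
struct toy_counts {
	size_t parent1;		/* PARENT1 only */
	size_t parent2;		/* PARENT2 only */
	size_t candidates;	/* PARENT1 | PARENT2, not STALE */
};

/* Add (delta > 0) or remove (delta <= 0) a queued commit from its bucket. */
static void toy_count(struct toy_counts *c, unsigned flags, int delta)
{
	size_t *bucket;

	switch (flags & (TOY_PARENT1 | TOY_PARENT2 | TOY_STALE)) {
	case TOY_PARENT1:
		bucket = &c->parent1;
		break;
	case TOY_PARENT2:
		bucket = &c->parent2;
		break;
	case TOY_PARENT1 | TOY_PARENT2:
		bucket = &c->candidates;
		break;
	case TOY_PARENT1 | TOY_PARENT2 | TOY_STALE:
		return;			/* stale entries are not counted */
	default:
		assert(!"unexpected paint state");
		return;
	}
	if (delta > 0)
		(*bucket)++;
	else
		(*bucket)--;
}

/*
 * Return nonzero if the walk can stop instead of processing the commit
 * just dequeued (generation `gen`); the counts still include that commit.
 */
static int toy_should_stop(const struct toy_counts *c, uint64_t gen,
			   uint64_t min_generation)
{
	if (gen < min_generation)
		return 1;		/* generation cutoff */
	if (c->candidates)
		return 0;		/* a pending candidate must still be visited */
	if (!c->parent1 && !c->parent2)
		return 1;		/* only stale entries remain */
	if ((!c->parent1 || !c->parent2) && gen < TOY_GEN_INFINITY)
		return 1;		/* side exhaustion, finite region only */
	return 0;
}
```

When a queued commit gains paint, it is first removed from its old bucket and then added to its new one, so each non-stale commit is counted exactly once.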

Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
Once all queued entries are stale, no new merge-base candidates can
be discovered: a new candidate can only appear where non-stale paint
from both sides meets. Continuing the walk could still invalidate
existing candidates by proving that one is an ancestor of another,
but `remove_redundant()` handles that as a post-processing step, so
it is safe to exit early.

Side-exhaustion condition
~~~~~~~~~~~~~~~~~~~~~~~~~
A new merge base requires paint from both sides to meet. When one
side's exclusive counter reaches zero and there are no pending
merge-base candidates, every non-stale commit left in the queue
carries only the other side's paint, so no future traversal step can
produce a new candidate.

This optimization only activates in the finite-generation region
where topological ordering holds. In that region, children are
always visited before parents, so paint flags are final at visit
time and an exhausted side cannot reappear. In the INFINITY region,
commit-date ordering can violate this guarantee, so the check is
skipped.

Generation cutoff
~~~~~~~~~~~~~~~~~
Some callers (notably `remove_redundant()`) supply a `min_generation`
threshold: the minimum generation of the input commits. These callers
only need to learn which input commits are reachable from the others,
and no commit below the threshold can be one of them, so the walk
terminates as soon as it dequeues a commit whose generation is below
`min_generation`.

Single result
~~~~~~~~~~~~~
When only one merge base is needed and the walk is in the
finite-generation region, the first candidate found is necessarily
the highest-generation common ancestor. No remaining commit in the
queue can be a descendant of this candidate (generation ordering
guarantees children are visited first), so it cannot be redundant
and the walk can stop immediately.

Related documentation
---------------------

- `Documentation/technical/commit-graph.adoc` -- generation numbers
and the reachability closure property.
commit-reach.c: 110 additions, 37 deletions
@@ -11,6 +11,7 @@
 #include "tag.h"
 #include "commit-reach.h"
 #include "ewah/ewok.h"
+#include "trace2.h"
 
 /* Remember to update object flag allocation in object.h */
 #define PARENT1 (1u<<16)
@@ -78,68 +79,139 @@ static void clear_nonstale_queue(struct nonstale_queue *queue)
 	queue->max_nonstale = NULL;
 }
 
-static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
-				     struct commit *c)
+/*
+ * Priority queue with per-side commit counters for paint_down_to_common().
+ * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
+ * PARENT2-only, or both (a pending merge-base candidate).
+ */
+struct paint_state {
+	struct prio_queue queue;
+	size_t parent1_count;
+	size_t parent2_count;
+	size_t mb_candidate_count;
+	timestamp_t min_generation;
+	timestamp_t last_gen;
+};
+
+static void paint_count_update(struct paint_state *state,
+			       unsigned flags, int delta)
 {
-	if (c->object.flags & ENQUEUED)
-		return;
-	c->object.flags |= ENQUEUED;
-	nonstale_queue_put(queue, c);
+	switch (flags & (PARENT1 | PARENT2 | STALE)) {
+	case PARENT1:
+		state->parent1_count += delta;
+		break;
+
+	case PARENT2:
+		state->parent2_count += delta;
+		break;
+
+	case PARENT1 | PARENT2:
+		state->mb_candidate_count += delta;
+		break;
+
+	case PARENT1 | PARENT2 | STALE:
+		break;
+
+	default:
+		BUG("unexpected paint state");
+	}
 }
 
-static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
+static void paint_queue_put(struct paint_state *state,
+			    struct commit *c, unsigned add_flags)
 {
-	struct commit *commit = nonstale_queue_get(queue);
+	unsigned old_flags = c->object.flags;
+	c->object.flags |= add_flags;
+
+	if (old_flags & ENQUEUED) {
+		paint_count_update(state, old_flags, -1);
+		paint_count_update(state, c->object.flags, 1);
+	} else {
+		c->object.flags |= ENQUEUED;
+		prio_queue_put(&state->queue, c);
+		paint_count_update(state, c->object.flags, 1);
+	}
+}
+
+/*
+ * Dequeue the next commit for the paint walk, or return NULL when
+ * no more merge bases can be discovered.
+ */
+static struct commit *paint_queue_get(struct paint_state *state)
+{
+	struct commit *commit = prio_queue_get(&state->queue);
+	timestamp_t generation;
+
+	if (!commit)
+		return NULL;
+
+	commit->object.flags &= ~ENQUEUED;
+	generation = commit_graph_generation(commit);
+
+	if (state->min_generation && generation > state->last_gen)
+		BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
+		    generation, state->last_gen,
+		    oid_to_hex(&commit->object.oid));
+	state->last_gen = generation;
+
+	/* generation cutoff */
+	if (generation < state->min_generation)
+		return NULL;
 
-	if (commit)
-		commit->object.flags &= ~ENQUEUED;
+	if (!state->mb_candidate_count) {
+		/* only stale entries remain */
+		if (!state->parent1_count && !state->parent2_count)
+			return NULL;
+
+		/* one side is exhausted */
+		if ((!state->parent1_count || !state->parent2_count) &&
+		    generation < GENERATION_NUMBER_INFINITY)
+			return NULL;
+	}
+
+	paint_count_update(state, commit->object.flags, -1);
 	return commit;
 }
 
-/* all input commits in one and twos[] must have been parsed! */
+/*
+ * See Documentation/technical/paint-down-to-common.adoc
+ *
+ * All input commits in one and twos[] must have been parsed!
+ */
 static int paint_down_to_common(struct repository *r,
 				struct commit *one, int n,
 				struct commit **twos,
 				timestamp_t min_generation,
 				enum merge_base_flags mb_flags,
 				struct commit_list **result)
 {
-	struct nonstale_queue queue = {
-		{ compare_commits_by_gen_then_commit_date }
+	struct paint_state state = {
+		.queue = { compare_commits_by_gen_then_commit_date }
 	};
+	struct commit *commit;
 	int i;
-	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
+	int steps = 0;
 	struct commit_list **tail = result;
 
+	state.min_generation = min_generation;
+	state.last_gen = GENERATION_NUMBER_INFINITY;
 	if (!min_generation && !corrected_commit_dates_enabled(r))
-		queue.pq.compare = compare_commits_by_commit_date;
+		state.queue.compare = compare_commits_by_commit_date;
 
 	one->object.flags |= PARENT1;
 	if (!n) {
 		commit_list_append(one, result);
 		return 0;
 	}
-	nonstale_queue_put_dedup(&queue, one);
+	paint_queue_put(&state, one, 0);
 
-	for (i = 0; i < n; i++) {
-		twos[i]->object.flags |= PARENT2;
-		nonstale_queue_put_dedup(&queue, twos[i]);
-	}
+	for (i = 0; i < n; i++)
+		paint_queue_put(&state, twos[i], PARENT2);
 
-	while (queue.max_nonstale) {
-		struct commit *commit = nonstale_queue_get_dedup(&queue);
+	while ((commit = paint_queue_get(&state))) {
 		struct commit_list *parents;
 		int flags;
-		timestamp_t generation = commit_graph_generation(commit);
-
-		if (min_generation && generation > last_gen)
-			BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
-			    generation, last_gen,
-			    oid_to_hex(&commit->object.oid));
-		last_gen = generation;
-
-		if (generation < min_generation)
-			break;
+		steps++;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
 		if (flags == (PARENT1 | PARENT2)) {
@@ -152,7 +224,7 @@ static int paint_down_to_common(struct repository *r,
 				 * descendant of this one.
 				 */
 				if (!(mb_flags & MERGE_BASE_FIND_ALL) &&
-				    generation < GENERATION_NUMBER_INFINITY)
+				    state.last_gen < GENERATION_NUMBER_INFINITY)
 					break;
 			}
 			/* Mark parents of a found merge stale */
@@ -165,7 +237,7 @@ static int paint_down_to_common(struct repository *r,
 			if ((p->object.flags & flags) == flags)
 				continue;
 			if (repo_parse_commit(r, p)) {
-				clear_nonstale_queue(&queue);
+				clear_prio_queue(&state.queue);
 				commit_list_free(*result);
 				*result = NULL;
 				/*
@@ -180,12 +252,13 @@ static int paint_down_to_common(struct repository *r,
 				return error(_("could not parse commit %s"),
 					     oid_to_hex(&p->object.oid));
 			}
-			p->object.flags |= flags;
-			nonstale_queue_put_dedup(&queue, p);
+			paint_queue_put(&state, p, flags);
 		}
 	}
 
-	clear_nonstale_queue(&queue);
+	clear_prio_queue(&state.queue);
+	trace2_data_intmax("paint_down_to_common", r,
+			   "steps", steps);
 	commit_list_sort_by_date(result);
 	return 0;
 }
t/meson.build: 1 addition, 0 deletions
@@ -786,6 +786,7 @@ integration_tests = [
   't6041-bisect-submodule.sh',
   't6050-replace.sh',
   't6060-merge-index.sh',
+  't6099-merge-base-side-exhaustion.sh',
   't6100-rev-list-in-order.sh',
   't6101-rev-parse-parents.sh',
   't6102-rev-list-unexpected-objects.sh',