diff: halt tree-diff early after max_changes

When computing the changed-paths bloom filters for the commit-graph,
we limit the size of the filter by restricting the number of paths
in the diff. Instead of computing a large diff and then ignoring the
result, it is better to halt the diff computation early.

Create a new "max_changes" option in struct diff_options. If non-zero,
then halt the diff computation after discovering strictly more changed
paths. This includes paths corresponding to trees that change.

Use this max_changes option in the bloom filter calculations. This
reduces the time taken to compute the filters for the Linux kernel
repo from 2m50s to 2m35s. On a large internal repository with ~500
commits that perform tree-wide changes, the time reduced from
6m15s to 3m48s.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Derrick Stolee
2020-03-30 00:31:27 +00:00
committed by Junio C Hamano
parent ed591febb4
commit e369698016
3 changed files with 14 additions and 1 deletions

View File

@ -434,6 +434,9 @@ static struct combine_diff_path *ll_diff_tree_paths(
if (diff_can_quit_early(opt))
break;
if (opt->max_changes && opt->num_changes > opt->max_changes)
break;
if (opt->pathspec.nr) {
skip_uninteresting(&t, base, opt);
for (i = 0; i < nparent; i++)
@ -518,6 +521,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
/* t↓ */
update_tree_entry(&t);
opt->num_changes++;
}
/* t > p[imin] */
@ -535,6 +539,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
skip_emit_tp:
/* ∀ pi=p[imin] pi↓ */
update_tp_entries(tp, nparent);
opt->num_changes++;
}
}
@ -552,6 +557,7 @@ struct combine_diff_path *diff_tree_paths(
const struct object_id **parents_oid, int nparent,
struct strbuf *base, struct diff_options *opt)
{
opt->num_changes = 0;
p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt);
/*