refs/files: use heuristic to decide whether to repack with --auto

The `--auto` flag for git-pack-refs(1) allows the ref backend to decide
whether or not a repack is in order. This switch has been introduced
mostly with the "reftable" backend in mind, which already knows to
auto-compact its tables during normal operations. When the flag is set,
the "reftable" backend uses the same auto-compaction mechanism and thus
ends up doing nothing in most cases.

The "files" backend does not have any such heuristic yet and instead
packs any loose references unconditionally. So we rewrite the complete
"packed-refs" file even if there's only a single loose reference to be
packed.

Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using
`git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is
unconditionally executed via our auto maintenance, so we end up repacking
references every single time auto maintenance kicks in. And while that
commit already mentioned that the "files" backend unconditionally packs
refs now, the author obviously didn't quite think about the consequences
thereof. So while the idea was sound, we really should have added a
heuristic to the "files" backend before implementing it.

Introduce a heuristic that decides whether or not it is worth packing
loose references. The deciding factor is the number of loose references
relative to the overall size of the "packed-refs" file. The bigger the
"packed-refs" file, the longer it takes to rewrite it, and thus we scale
up the limit of allowed loose references before we repack.
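
The scaling described above can be sketched in isolation. This is only
an illustration, not the code from the patch: `floor_log2()` and
`auto_pack_limit()` are hypothetical names, and the sketch assumes that
the logarithm is rounded down to an integer and that a packed ref takes
roughly 100 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer base-2 logarithm, rounded down; returns 0 for an input of 0. */
static unsigned floor_log2(uintmax_t n)
{
	unsigned l = 0;

	while (n >>= 1)
		l++;
	return l;
}

/*
 * Hypothetical helper: number of loose refs tolerated before an
 * auto-repack, given the size of the packed-refs file in bytes.
 */
static size_t auto_pack_limit(size_t packed_size)
{
	/* Assumption: roughly 100 bytes per packed ref. */
	size_t limit = (size_t)floor_log2(packed_size / 100) * 5;

	/* Never ask for a repack with fewer than 16 loose refs. */
	return limit < 16 ? 16 : limit;
}
```

Under these assumptions a 1MB "packed-refs" file yields a limit of 65
loose refs and a 100MB file a limit of 95. Because of the integer
rounding, these values come out slightly below the ones listed in the
code comment of the patch, which appear to be computed with an unrounded
logarithm.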

As is the nature of heuristics, this mechanism isn't obviously
"correct", but should rather be seen as a tradeoff between how many
resources we spend packing refs and how inefficient the ref store
becomes. For what it's worth, we have successfully been using the exact
same heuristic in Gitaly for several years by now.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patrick Steinhardt
2024-09-04 10:53:08 +02:00
committed by Junio C Hamano
parent bd51dca36e
commit c3459ae9ef
4 changed files with 164 additions and 9 deletions

@@ -1300,6 +1300,68 @@ static int should_pack_ref(struct files_ref_store *refs,
	return 0;
}

static int should_pack_refs(struct files_ref_store *refs,
			    struct pack_refs_opts *opts)
{
	struct ref_iterator *iter;
	size_t packed_size;
	size_t refcount = 0;
	size_t limit;
	int ret;

	if (!(opts->flags & PACK_REFS_AUTO))
		return 1;

	ret = packed_refs_size(refs->packed_ref_store, &packed_size);
	if (ret < 0)
		die("cannot determine packed-refs size");

	/*
	 * Packing loose references into the packed-refs file scales with the
	 * number of references we're about to write. We thus decide whether we
	 * repack refs by weighing the current size of the packed-refs file
	 * against the number of loose references. This is done such that we do
	 * not repack too often on repositories with a huge number of
	 * references, where we can expect a lot of churn in the number of
	 * references.
	 *
	 * As a heuristic, we repack if the number of loose references in the
	 * repository exceeds `log2(nr_packed_refs) * 5`, where we estimate
	 * `nr_packed_refs = packed_size / 100`, which scales as following:
	 *
	 * - 1kB ~ 10 packed refs: 16 refs
	 * - 10kB ~ 100 packed refs: 33 refs
	 * - 100kB ~ 1k packed refs: 49 refs
	 * - 1MB ~ 10k packed refs: 66 refs
	 * - 10MB ~ 100k packed refs: 82 refs
	 * - 100MB ~ 1m packed refs: 99 refs
	 *
	 * We thus allow roughly 16 additional loose refs per factor of ten of
	 * packed refs. This heuristic may be tweaked in the future, but should
	 * serve as a sufficiently good first iteration.
	 */
	limit = log2u(packed_size / 100) * 5;
	if (limit < 16)
		limit = 16;

	iter = cache_ref_iterator_begin(get_loose_ref_cache(refs, 0), NULL,
					refs->base.repo, 0);
	while ((ret = ref_iterator_advance(iter)) == ITER_OK) {
		if (should_pack_ref(refs, iter->refname, iter->oid,
				    iter->flags, opts))
			refcount++;
		if (refcount >= limit) {
			ref_iterator_abort(iter);
			return 1;
		}
	}

	if (ret != ITER_DONE)
		die("error while iterating over references");

	return 0;
}

static int files_pack_refs(struct ref_store *ref_store,
			   struct pack_refs_opts *opts)
{
@@ -1312,6 +1374,9 @@ static int files_pack_refs(struct ref_store *ref_store,
	struct strbuf err = STRBUF_INIT;
	struct ref_transaction *transaction;

	if (!should_pack_refs(refs, opts))
		return 0;

	transaction = ref_store_transaction_begin(refs->packed_ref_store, &err);
	if (!transaction)
		return -1;
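
The counting loop in should_pack_refs() stops as soon as the limit is
reached, so a repository with many loose refs does not need to be
iterated in full. A minimal standalone sketch of that early-exit pattern
follows. It does not use Git's ref iterator API: `struct loose_ref`,
its `packable` flag (standing in for the result of should_pack_ref())
and `reaches_limit()` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a loose ref; `packable` mimics should_pack_ref(). */
struct loose_ref {
	const char *name;
	int packable;
};

/*
 * Return 1 as soon as `limit` packable refs have been seen, 0 otherwise.
 * `*visited` reports how many entries were inspected before returning.
 */
static int reaches_limit(const struct loose_ref *refs, size_t nr,
			 size_t limit, size_t *visited)
{
	size_t count = 0, i;

	for (i = 0; i < nr; i++) {
		if (refs[i].packable)
			count++;
		if (count >= limit) {
			/* Early exit: the remaining entries are never looked at. */
			*visited = i + 1;
			return 1;
		}
	}

	*visited = nr;
	return 0;
}
```

With four refs of which the first, third and fourth are packable and a
limit of 2, the function returns 1 after inspecting three entries. With
a limit of 4 it walks all four entries and returns 0.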