builtin/repack.c: implement --expire-to for storing pruned objects

When pruning objects with `--cruft`, `git repack` offers some
flexibility in selecting which objects are pruned via the
`--cruft-expiration` option.

This is useful for expiring only those objects which are older than the
grace period. Doing so makes a known race substantially less likely [1]:
to-be-pruned objects become reachable and then become ancestors of
freshly pushed objects, leaving the repository in a corrupt state after
pruning.

But in practice, such races are impossible to avoid entirely, no matter
how long the grace period is. To prevent this race, it is often
advisable to temporarily put a repository into a read-only state. That
is not always feasible, though, and so some middle ground would be nice.

This patch introduces a new option, `--expire-to`, which teaches `git
repack` to write an additional cruft pack containing just the objects
which were pruned from the repository. The caller can specify a
directory outside of the current repository as the destination for this
second cruft pack.

This makes it possible to prune objects from a repository, while still
holding onto a supplemental copy of them outside of the original
repository. Having this copy on-disk makes it substantially easier to
recover objects when the aforementioned race is encountered.

`--expire-to` is implemented in a somewhat convoluted manner: it takes
advantage of the fact that the first call to `write_cruft_pack()` adds
the name of the cruft pack to the `names` string list. That means the
second call to `write_cruft_pack()` excludes any objects in the
previously-written cruft pack.

As long as the caller ensures that no objects are expired during the
second pass, this is sufficient to generate a cruft pack containing all
objects which don't appear in any of the new packs written by `git
repack`, including the cruft pack. In other words, all of the objects
which are about to be pruned from the repository.

It is important to note that the destination in `--expire-to` does not
necessarily need to be a Git repository (though it can be). Notably, the
expired packs do not contain all ancestors of expired objects. So if the
source repository contains something like:

              <unreachable>
             /
    C1 --- C2
      \
       refs/heads/master

where C2 is unreachable, but has a parent (C1) which is reachable, and
C2 would be pruned, then the expiry pack will contain only C2, not C1.

[1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau
2022-10-24 14:43:12 -04:00
committed by Junio C Hamano
parent c12cda479e
commit 91badeba32
3 changed files with 167 additions and 0 deletions

@@ -702,6 +702,10 @@ static int write_cruft_pack(const struct pack_objects_args *args,
* By the time it is read here, it contains only the pack(s)
* that were just written, which is exactly the set of packs we
* want to consider kept.
*
* If `--expire-to` is given, the double-use served by `names`
* ensures that the pack written to `--expire-to` excludes any
* objects contained in the cruft pack.
*/
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
@@ -755,6 +759,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
int geometric_factor = 0;
int write_midx = 0;
const char *cruft_expiration = NULL;
const char *expire_to = NULL;
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, &pack_everything,
@@ -804,6 +809,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
N_("find a geometric progression with factor <N>")),
OPT_BOOL('m', "write-midx", &write_midx,
N_("write a multi-pack index of the resulting packs")),
OPT_STRING(0, "expire-to", &expire_to, N_("dir"),
N_("pack prefix to store a pack containing pruned objects")),
OPT_END()
};
@@ -1000,6 +1007,39 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
&existing_kept_packs);
if (ret)
return ret;
if (delete_redundant && expire_to) {
/*
* If `--expire-to` is given with `-d`, it's possible
* that we're about to prune some objects. With cruft
* packs, pruning is implicit: any objects from existing
* packs that weren't picked up by new packs are removed
* when their packs are deleted.
*
* Generate an additional cruft pack, with one twist:
* `names` now includes the name of the cruft pack
* written in the previous step. So the contents of
* _this_ cruft pack exclude everything contained in the
* existing cruft pack (that is, all of the unreachable
* objects which are no older than
* `--cruft-expiration`).
*
* To make this work, cruft_expiration must become NULL
* so that this cruft pack doesn't actually prune any
* objects. If it were non-NULL, this call would always
* generate an empty pack (since every object not in the
* cruft pack generated above will have an mtime older
* than the expiration).
*/
ret = write_cruft_pack(&cruft_po_args, expire_to,
pack_prefix,
NULL,
&names,
&existing_nonkept_packs,
&existing_kept_packs);
if (ret)
return ret;
}
}
string_list_sort(&names);