clone, submodule: pass partial clone filters to submodules

When cloning a repo with a --filter and with --recurse-submodules
enabled, the partial clone filter only applies to the top-level repo.
This can lead to unexpected bandwidth and disk usage for projects which
include large submodules. For example, a user might wish to make a
partial clone of Gerrit and would run:
`git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`.
However, only the superproject would be a partial clone; all the
submodules would have all blobs downloaded regardless of their size.
With this change, the same filter can also be applied to submodules,
meaning the expected bandwidth and disk savings apply consistently.

To avoid changing default behavior, add a new clone flag,
`--also-filter-submodules`. When this is set along with `--filter` and
`--recurse-submodules`, the filter spec is passed along to git-submodule
and git-submodule--helper, such that submodule clones also have the
filter applied.

This applies the same filter to the superproject and all submodules.
Users who need to customize the filter per-submodule would need to clone
with `--no-recurse-submodules` and then manually initialize each
submodule with the proper filter.

Applying filters to submodules should be safe thanks to Jonathan Tan's
recent work [1, 2, 3] eliminating the use of alternates as a method of
accessing submodule objects, so any submodule object access now triggers
a lazy fetch from the submodule's promisor remote if the accessed object
is missing. This patch is a reworked version of [4], which was created
prior to Jonathan Tan's work.

[1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16)
[2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate',
	2021-09-20)
[3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules',
	2021-10-25)
[4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Josh Steadmon
2022-02-04 21:00:49 -08:00
committed by Junio C Hamano
parent 297ca895a2
commit f05da2b48b
8 changed files with 175 additions and 8 deletions

View File

@ -6,3 +6,8 @@ clone.defaultRemoteName::
clone.rejectShallow::
Reject to clone a repository if it is a shallow one, can be overridden by
passing option `--reject-shallow` in command line. See linkgit:git-clone[1]
clone.filterSubmodules::
If a partial clone filter is provided (see `--filter` in
linkgit:git-rev-list[1]) and `--recurse-submodules` is used, also apply
the filter to submodules.

View File

@ -16,7 +16,7 @@ SYNOPSIS
[--depth <depth>] [--[no-]single-branch] [--no-tags]
[--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
[--[no-]remote-submodules] [--jobs <n>] [--sparse] [--[no-]reject-shallow]
[--filter=<filter>] [--] <repository>
[--filter=<filter> [--also-filter-submodules]] [--] <repository>
[<directory>]
DESCRIPTION
@ -182,6 +182,11 @@ objects from the source repository into a pack in the cloned repository.
at least `<size>`. For more details on filter specifications, see
the `--filter` option in linkgit:git-rev-list[1].
--also-filter-submodules::
Also apply the partial clone filter to any submodules in the repository.
Requires `--filter` and `--recurse-submodules`. This can be turned on by
default by setting the `clone.filterSubmodules` config option.
--mirror::
Set up a mirror of the source repository. This implies `--bare`.
Compared to `--bare`, `--mirror` not only maps local branches of the

View File

@ -133,7 +133,7 @@ If you really want to remove a submodule from the repository and commit
that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
options.
update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--[no-]single-branch] [--] [<path>...]::
update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--[no-]single-branch] [--filter <filter spec>] [--] [<path>...]::
+
--
Update the registered submodules to match what the superproject
@ -177,6 +177,10 @@ submodule with the `--init` option.
If `--recursive` is specified, this command will recurse into the
registered submodules, and update any nested submodules within.
If `--filter <filter spec>` is specified, the given partial clone filter will be
applied to the submodule. See linkgit:git-rev-list[1] for details on filter
specifications.
--
set-branch (-b|--branch) <branch> [--] <path>::
set-branch (-d|--default) [--] <path>::