Going into a secondary worktree and asking "is the main worktree
bare?" did not work correctly when per-worktree configuration
option was in use, which has been corrected.
* op/worktree-is-main-bare-fix:
worktree: detect from secondary worktree if main worktree is bare
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.
* tc/clone-single-revision:
builtin/clone: teach git-clone(1) the --revision= option
parse-options: introduce die_for_incompatible_opt2()
clone: introduce struct clone_opts in builtin/clone.c
clone: add tags refspec earlier to fetch refspec
clone: refactor wanted_peer_refs()
clone: make it possible to specify --tags
clone: cut down on global variables in clone.c
"git -c help.autocorrect=0 psuh" shows the suggested typofix,
unlike the previous attempt in the base topic.
* da/help-autocorrect-one-fix:
help: add "show" as a valid configuration value
help: show the suggested command when help.autocorrect is false
Foreign language interface for Rust into our code base has been added.
* js/libgit-rust:
libgit: add higher-level libgit crate
libgit-sys: also export some config_set functions
libgit-sys: introduce Rust wrapper for libgit.a
common-main: split init and exit code into new files
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.
* ps/repack-keep-unreachable-in-unpacked-repo:
builtin/repack: fix `--keep-unreachable` when there are no packs
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
Convert a handful of unit tests to work with the clar framework.
* sk/unit-tests-0130:
t/unit-tests: convert strcmp-offset test to use clar test framework
t/unit-tests: convert strbuf test to use clar test framework
t/unit-tests: adapt example decorate test to use clar test framework
t/unit-tests: convert hashmap test to use clar test framework
Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
"git apply" internally uses unsigned long for line numbers and uses
strtoul() to parse numbers on the hunk headers. It however forgot
to check parse errors.
* pw/apply-ulong-overflow-check:
apply: detect overflow when parsing hunk header
"git init" to reinitialize a repository that already exists cannot
change the hash function and ref backends; such a request is
silently ignored now.
* ps/setup-reinit-fixes:
setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH
setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT
t0001: remove duplicate test
CI updates (containerization, dropping stale ones, etc.).
* ps/ci-misc-updates:
ci: remove stale code for Azure Pipelines
ci: use latest Ubuntu release
ci: stop special-casing for Ubuntu 16.04
gitlab-ci: add linux32 job testing against i386
gitlab-ci: remove the "linux-old" job
github: simplify computation of the job's distro
github: convert all Linux jobs to be containerized
github: adapt containerized jobs to be rootless
t7422: fix flaky test caused by buffered stdout
t0060: fix EBUSY in MinGW when setting up runtime prefix
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.
Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).
Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.
Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.
Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When extensions.worktreeConfig is true and the main worktree is
bare -- that is, its config.worktree file contains core.bare=true
-- commands run from secondary worktrees incorrectly see the main
worktree as not bare. As such, those commands incorrectly think
that the repository's default branch (typically "main" or
"master") is checked out in the bare repository even though it's
not. This makes it impossible, for instance, to checkout or delete
the default branch from a secondary worktree, among other
shortcomings.
This problem occurs because, when extensions.worktreeConfig is
true, commands run in secondary worktrees only consult
$commondir/config and $commondir/worktrees/<id>/config.worktree,
thus they never see the main worktree's core.bare=true setting in
$commondir/config.worktree.
Fix this problem by consulting the main worktree's config.worktree
file when checking whether it is bare. (This extra work is
performed only when running from a secondary worktree.)
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Olga Pilipenco <olga.pilipenco@shopify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.
The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:
- `has_existing_non_kept_packs()` checks whether there are existing
packfiles. This condition makes sense to guard "--keep-pack=",
"--unpack-unreachable" and "--keep-unreachable", because all of
these flags only make sense in combination with existing packfiles.
But it does not make sense to disable `--pack-loose-unreachable`
when there aren't any preexisting packfiles, as loose objects can be
packed into the new packfile regardless of that.
- `delete_redundant` checks whether we want to delete any objects or
packs that are about to become redundant. The documentation of
`--keep-unreachable` explicitly says that `git repack -ad` needs to
be executed for the flag to have an effect.
It is not immediately obvious why such redundant objects need to be
deleted in order for "--pack-unreachable-objects" to be effective.
But as things are working as documented this is nothing we'll change
for now.
- `pack_everything & PACK_CRUFT` checks that we're not creating a
cruft pack. This condition makes sense in the context of
"--pack-loose-unreachable", as unreachable objects would end up in
the cruft pack anyway.
So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.
Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some test in t6423 supress Git's exit code, which can cause test
failures go unnoticed. Specifically using git <subcommand> |
<other-command> masks potential failures of the Git command.
This commit ensures that Git's exit status is correctly propogated by:
- Avoiding pipes that suppress exit codes.
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a literal value for showing the suggested autocorrection
for consistency with the rest of the help.autocorrect options.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the handling of false boolean values for help.autocorrect
consistent with the handling of value 0 by showing the suggested
commands but not running them.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"test -f" does not provide a nice error message when we hit test
failures, so use test_path_is_file instead.
Signed-off-by: ambar chakravartty <amch9605@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.
* ps/3.0-remote-deprecation:
remote: announce removal of "branches/" and "remotes/"
builtin/pack-redundant: remove subcommand with breaking changes
ci: repurpose "linux-gcc" job for deprecations
ci: merge linux-gcc-default into linux-gcc
Makefile: wire up build option for deprecated features
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
Adapt strcmp-offset test script to clar framework by using clar
assertions where necessary. Introduce `test_strcmp_offset__empty()` to
verify `check_strcmp_offset()` behavior when both input strings are
empty. This ensures the function correctly handles edge cases and
returns expected values.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt strbuf test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce `test_example_decorate__initialize()` to explicitly set up
object IDs and retrieve corresponding objects before tests run. This
ensures a consistent and predictable test state without relying on data
from previous tests.
Add `test_example_decorate__cleanup()` to clear decorations after each
test, preventing interference between tests and ensuring each runs in
isolation.
Adapt example decorate test script to clar framework by using clar
assertions where necessary. Previously, tests relied on data written by
earlier tests, leading to unintended dependencies between them. This
explicitly initializes the necessary state within
`test_example_decorate__readd`, ensuring it does not depend on prior
test executions.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapts hashmap test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.
Drop the typedef and adapt all callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
The exact same issue as described in the preceding commit also exists
for GIT_DEFAULT_HASH. Thus, reinitializing a repository that e.g. uses
SHA1 with `GIT_DEFAULT_HASH=sha256 git init` will cause the object
format of that repository to change to SHA256. This is of course bogus
as any existing objects and refs will not be converted, thus causing
repository corruption:
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ git commit --allow-empty -m message
[main (root-commit) 35a7344] message
$ GIT_DEFAULT_HASH=sha256 git init
Reinitialized existing Git repository in /tmp/repo/.git/
$ git show
fatal: your current branch appears to be broken
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with an object hash.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
the default ref format that new repostiories shall be initialized with.
While this is the expected behaviour when creating a new repository, it
is not when reinitializing a repository: we should retain the ref format
currently used by it in that case.
This doesn't work correctly right now:
$ git init --ref-format=files repo
Initialized empty Git repository in /tmp/repo/.git/
$ GIT_DEFAULT_REF_FORMAT=reftable git init repo
fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory
Instead of retaining the current ref format, the reinitialization tries
to reinitialize the repository with the different format. This action
fails when git-init(1) tries to write the ".git/refs/heads" stub, which
in the context of the reftable backend is always written as a file so
that we can detect clients which inadvertently try to access the repo
with the wrong ref format. Seems like the protection mechanism works for
this case, as well.
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with a ref storage format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test in question is an exact copy of the testcase preceding it.
Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply" uses strtoul() to parse the numbers in the hunk header but
silently ignores overflows. As LONG_MAX is a legitimate return value for
strtoul() we need to set errno to zero before the call to strtoul() and
check that it is still zero afterwards. The error message we display is
not particularly helpful as it does not say what was wrong. However, it
seems pretty unlikely that users are going to trigger this error in
practice and we can always improve it later if needed.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The C functions exported by libgit-sys do not provide an idiomatic Rust
interface. To make it easier to use these functions via Rust, add a
higher-level "libgit" crate, that wraps the lower-level configset API
with an interface that is more Rust-y.
This combination of $X and $X-sys crates is a common pattern for FFI in
Rust, as documented in "The Cargo Book" [1].
[1] https://doc.rust-lang.org/cargo/reference/build-scripts.html#-sys-packages
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git refs migrate" command did not migrate the reflog for
refs/stash, which is the contents of the stashes, which has been
corrected.
* ps/reflog-migration-with-logall-fix:
refs: fix migration of reflogs respecting "core.logAllRefUpdates"
The trace2 code was not prepared to show a configuration variable
that is set to true using the valueless true syntax, which has been
corrected.
* am/trace2-with-valueless-true:
trace2: prevent segfault on config collection with valueless true
reflog entries for symbolic ref updates were broken, which has been
corrected.
* kn/reflog-symref-fix:
refs: fix creation of reflog entries for symrefs
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.
* rs/ref-fitler-used-atoms-value-fix:
ref-filter: remove ref_format_clear()
ref-filter: move is-base tip to used_atom
ref-filter: move ahead-behind bases into used_atom
Introduce a new API to visit objects in batches based on a common
path, or by type.
* ds/path-walk-1:
path-walk: drop redundant parse_tree() call
path-walk: reorder object visits
path-walk: mark trees and blobs as UNINTERESTING
path-walk: visit tags and cached objects
path-walk: allow consumer to specify object types
t6601: add helper for testing path-walk API
test-lib-functions: add test_cmp_sorted
path-walk: introduce an object walk by path
Introduce libgit-sys, a Rust wrapper crate that allows Rust code to call
functions in libgit.a. This initial patch defines build rules and an
interface that exposes user agent string getter functions as a proof of
concept. This library can be tested with `cargo test`. In later commits,
a higher-level library containing a more Rust-friendly interface will be
added at `contrib/libgit-rs`.
Symbols in libgit can collide with symbols from other libraries such as
libgit2. We avoid this by first exposing library symbols in
public_symbol_export.[ch]. These symbols are prepended with "libgit_" to
avoid collisions and set to visible using a visibility pragma. In
build.rs, Rust builds contrib/libgit-rs/libgit-sys/libgitpub.a, which also
contains libgit.a and other dependent libraries, with
-fvisibility=hidden to hide all symbols within those libraries that
haven't been exposed with a visibility pragma.
Co-authored-by: Kyle Lippincott <spectral@google.com>
Co-authored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable/ library code has been made -Wsign-compare clean.
* ps/reftable-sign-compare:
reftable: address trivial -Wsign-compare warnings
reftable/blocksource: adjust `read_block()` to return `ssize_t`
reftable/blocksource: adjust type of the block length
reftable/block: adjust type of the restart length
reftable/block: adapt header and footer size to return a `size_t`
reftable/basics: adjust `hash_size()` to return `uint32_t`
reftable/basics: adjust `common_prefix_size()` to return `size_t`
reftable/record: handle overflows when decoding varints
reftable/record: drop unused `print` function pointer
meson: stop disabling -Wsign-compare
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.
* mh/credential-cache-authtype-request-fix:
credential-cache: respect authtype capability
Move a few more unit tests to the clar test framework.
* sk/unit-tests:
t/unit-tests: convert reftable tree test to use clar test framework
t/unit-tests: adapt priority queue test to use clar test framework
t/unit-tests: convert mem-pool test to use clar test framework
t/unit-tests: handle dashes in test suite filenames
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others. The built-in commands
have been fixed to show them on the standard output consistently.
* jc/show-usage-help:
builtin: send usage() help text to standard output
oddballs: send usage() help text to standard output
builtins: send usage_with_options() help text to standard output
usage: add show_usage_if_asked()
parse-options: add show_usage_with_options_if_asked()
t0012: optionally check that "-h" output goes to stdout
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.
Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values. The paths
used for this test are carefully selected to demonstrate some of the
behavior differences of the two current name hash versions, including
which conditions will cause them to collide.
Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.
My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:
Test this tree
--------------------------------------------------
5314.1: paths at head 4.5K
5314.2: distinct hash value: v1 4.1K
5314.3: maximum multiplicity: v1 13
5314.4: distinct hash value: v2 4.2K
5314.5: maximum multiplicity: v2 9
Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.
In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:
Test this tree
--------------------------------------------------
5314.1: paths at head 19.5K
5314.2: distinct hash value: v1 8.2K
5314.3: maximum multiplicity: v1 279
5314.4: distinct hash value: v2 17.8K
5314.5: maximum multiplicity: v2 44
[1] https://github.com/microsoft/fluentui
That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.
With the v2 name hash function, the maximum multiplicity lowers to 44,
leaving some room for further improvement.
In a more extreme example, an internal monorepo had a much worse
collision rate:
Test this tree
--------------------------------------------------
5314.1: paths at head 227.3K
5314.2: distinct hash value: v1 72.3K
5314.3: maximum multiplicity: v1 14.4K
5314.4: distinct hash value: v2 166.5K
5314.5: maximum multiplicity: v2 138
Here, we can see that the v2 name hash function provides somem
improvements, but there are still a number of collisions that could lead
to repacking problems at this scale.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effectiveness in performance and size.
The recently-added --name-hash-version option allows for testing
different name hash functions. Version 2 intends to preserve some of the
locality of version 1 while more often breaking collisions due to long
filenames.
Distinguishing objects by more of the path is critical when there are
many name hash collisions and several versions of the same path in the
full history, giving a significant boost to the full repack case. The
locality of the hash function is critical to compressing something like
a shallow clone or a thin pack representing a push of a single commit.
This can be seen by running pt5313 on the open source fluentui
repository [1]. Most commits will have this kind of output for the thin
and big pack cases, though certain commits (such as [2]) will have
problematic thin pack size for other reasons.
[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875
Checked out at the parent of [2], I see the following statistics:
Test HEAD
---------------------------------------------------------------
5313.2: thin pack with version 1 0.37(0.44+0.02)
5313.3: thin pack size with version 1 1.2M
5313.4: big pack with version 1 2.04(7.77+0.23)
5313.5: big pack size with version 1 20.4M
5313.6: shallow fetch pack with version 1 1.41(2.94+0.11)
5313.7: shallow pack size with version 1 34.4M
5313.8: repack with version 1 95.70(676.41+2.87)
5313.9: repack size with version 1 439.3M
5313.10: thin pack with version 2 0.12(0.12+0.06)
5313.11: thin pack size with version 2 22.0K
5313.12: big pack with version 2 2.80(5.43+0.34)
5313.13: big pack size with version 2 25.9M
5313.14: shallow fetch pack with version 2 1.77(2.80+0.19)
5313.15: shallow pack size with version 2 33.7M
5313.16: repack with version 2 33.68(139.52+2.58)
5313.17: repack size with version 2 160.5M
To make comparisons easier, I will reformat this output into a different
table style:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.37 s | 0.12 s | 1.2 M | 22.0 K |
| Big Pack | 2.04 s | 2.80 s | 20.4 M | 25.9 M |
| Shallow Pack | 1.41 s | 1.77 s | 34.4 M | 33.7 M |
| Repack | 95.70 s | 33.68 s | 439.3 M | 160.5 M |
The v2 hash function successfully differentiates the CHANGELOG.md files
from each other, which leads to significant improvements in the thin
pack (simulating a push of this commit) and the full repack. There is
some bloat in the "big pack" scenario and essentially the same results
for the shallow pack.
In the case of the Git repository, these numbers show some of the issues
with this approach:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.1 K | 1.1 K |
| Big Pack | 1.69 s | 1.95 s | 13.5 M | 14.5 M |
| Shallow Pack | 1.26 s | 1.29 s | 12.0 M | 12.2 M |
| Repack | 29.51 s | 29.01 s | 237.7 M | 238.2 M |
Here, the attempts to remove conflicts in the v2 function seem to cause
slight bloat to these sizes. This shows that the Git repository benefits
a lot from cross-path delta pairs.
The results are similar with the nodejs/node repo:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.6 K | 1.6 K |
| Big Pack | 4.61 s | 3.26 s | 56.0 M | 52.8 M |
| Shallow Pack | 7.82 s | 7.51 s | 104.6 M | 107.0 M |
| Repack | 88.90 s | 73.75 s | 740.1 M | 764.5 M |
Here, the v2 name-hash causes some size bloat more often than it reduces
the size, but it also universally improves performance time, which is an
interesting reversal. This must mean that it is helping to short-circuit
some delta computations even if it is not finding the most efficient
ones. The performance improvement cannot be explained only due to the
I/O cost of writing the resulting packfile.
The Linux kernel repository was the initial target of the default name
hash value, and its naming conventions are practically build to take the
most advantage of the default name hash values:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|----------|----------|---------|---------|
| Thin Pack | 0.17 s | 0.07 s | 4.6 K | 4.6 K |
| Big Pack | 17.88 s | 12.35 s | 201.1 M | 159.1 M |
| Shallow Pack | 11.05 s | 22.94 s | 269.2 M | 273.8 M |
| Repack | 727.39 s | 566.95 s | 2.5 G | 2.5 G |
Here, the thin and big packs gain some performance boosts in time, with
a modest gain in the size of the big pack. The shallow pack, however, is
more expensive to compute, likely because similarly-named files across
different directories are farther apart in the name hash ordering in v2.
The repack also gains benefits in computation time but no meaningful
change to the full size.
Finally, an internal Javascript repo of moderate size shows significant
gains when repacking with --name-hash-version=2 due to it having many name
hash collisions. However, it's worth noting that only the full repack
case has significant differences from the v1 name hash:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|-----------|-----------|----------|---------|---------|
| Thin Pack | 8.28 s | 7.28 s | 16.8 K | 16.8 K |
| Big Pack | 12.81 s | 11.66 s | 29.1 M | 29.1 M |
| Shallow | 4.86 s | 4.06 s | 42.5 M | 44.1 M |
| Repack | 3126.50 s | 496.33 s | 6.2 G | 855.6 M |
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new environment variable to opt-in to different values of the
--name-hash-version=<n> option in 'git pack-objects'. This allows for
extra testing of the feature without repeating all of the test
scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add
this to the linux-TEST-vars CI build as that test run is already
overloaded. The behavior exposed by this test variable is of low risk
and should be sufficient to allow manual testing when an issue arises.
But this option isn't free. There are a few tests that change behavior
with the variable enabled.
First, there are a few tests that are very sensitive to certain delta
bases being picked. These are both involving the generation of thin
bundles and then counting their objects via 'git index-pack --fix-thin'
which pulls the delta base into the new packfile. For these tests,
disable the option as a decent long-term option.
Second, there are some tests that compare the exact output of a 'git
pack-objects' process when using bitmaps. The warning that ignores the
--name-hash-version=2 and forces version 1 causes these tests to fail.
Disable the environment variable to get around this issue.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>