
In7a5d604443
(commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception viaf559d6d45e
(revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas7a5d604443
initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
81 lines
2.1 KiB
Bash
Executable File
81 lines
2.1 KiB
Bash
Executable File
#!/bin/sh
|
|
|
|
test_description='handling of missing objects in rev-list'
|
|
|
|
TEST_PASSES_SANITIZE_LEAK=true
|
|
. ./test-lib.sh
|
|
|
|
# We setup the repository with two commits, this way HEAD is always
|
|
# available and we can hide commit 1.
|
|
test_expect_success 'create repository and alternate directory' '
|
|
test_commit 1 &&
|
|
test_commit 2 &&
|
|
test_commit 3
|
|
'
|
|
|
|
# We manually corrupt the repository, which means that the commit-graph may
|
|
# contain references to already-deleted objects. We thus need to enable
|
|
# commit-graph paranoia to not returned these deleted commits from the graph.
|
|
GIT_COMMIT_GRAPH_PARANOIA=true
|
|
export GIT_COMMIT_GRAPH_PARANOIA
|
|
|
|
for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
|
|
do
|
|
test_expect_success "rev-list --missing=error fails with missing object $obj" '
|
|
oid="$(git rev-parse $obj)" &&
|
|
path=".git/objects/$(test_oid_to_path $oid)" &&
|
|
|
|
mv "$path" "$path.hidden" &&
|
|
test_when_finished "mv $path.hidden $path" &&
|
|
|
|
test_must_fail git rev-list --missing=error --objects \
|
|
--no-object-names HEAD
|
|
'
|
|
done
|
|
|
|
for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
|
|
do
|
|
for action in "allow-any" "print"
|
|
do
|
|
test_expect_success "rev-list --missing=$action with missing $obj" '
|
|
oid="$(git rev-parse $obj)" &&
|
|
path=".git/objects/$(test_oid_to_path $oid)" &&
|
|
|
|
# Before the object is made missing, we use rev-list to
|
|
# get the expected oids.
|
|
git rev-list --objects --no-object-names \
|
|
HEAD ^$obj >expect.raw &&
|
|
|
|
# Blobs are shared by all commits, so evethough a commit/tree
|
|
# might be skipped, its blob must be accounted for.
|
|
if [ $obj != "HEAD:1.t" ]; then
|
|
echo $(git rev-parse HEAD:1.t) >>expect.raw &&
|
|
echo $(git rev-parse HEAD:2.t) >>expect.raw
|
|
fi &&
|
|
|
|
mv "$path" "$path.hidden" &&
|
|
test_when_finished "mv $path.hidden $path" &&
|
|
|
|
git rev-list --missing=$action --objects --no-object-names \
|
|
HEAD >actual.raw &&
|
|
|
|
# When the action is to print, we should also add the missing
|
|
# oid to the expect list.
|
|
case $action in
|
|
allow-any)
|
|
;;
|
|
print)
|
|
grep ?$oid actual.raw &&
|
|
echo ?$oid >>expect.raw
|
|
;;
|
|
esac &&
|
|
|
|
sort actual.raw >actual &&
|
|
sort expect.raw >expect &&
|
|
test_cmp expect actual
|
|
'
|
|
done
|
|
done
|
|
|
|
test_done
|