* ps/reftable-backend:
refs/reftable: fix leak when copying reflog fails
ci: add jobs to test with the reftable backend
refs: introduce reftable backend
"git branch" and friends learned to use the formatted text as
sorting key, not the underlying timestamp value, when the --sort
option is used with author or committer timestamp with a format
specifier (e.g., "--sort=creatordate:format:%H:%M:%S").
* vd/for-each-ref-sort-with-formatted-timestamp:
ref-filter.c: sort formatted dates by byte value
Doc update.
* js/contributor-docs-updates:
SubmittingPatches: hyphenate non-ASCII
SubmittingPatches: clarify GitHub artifact format
SubmittingPatches: clarify GitHub visual
SubmittingPatches: provide tag naming advice
SubmittingPatches: update extra tags list
SubmittingPatches: discourage new trailers
SubmittingPatches: drop ref to "What's in git.git"
CodingGuidelines: write punctuation marks
CodingGuidelines: move period inside parentheses
Doc update.
* ml/doc-merge-updates:
Documentation/git-merge.txt: use backticks for command wrapping
Documentation/git-merge.txt: fix reference to synopsis
Doc updates to clarify what an "unborn branch" means.
* jc/orphan-unborn:
orphan/unborn: fix use of 'orphan' in end-user facing messages
orphan/unborn: add to the glossary and use them consistently
Doc updates.
* jc/doc-most-refs-are-not-that-special:
docs: MERGE_AUTOSTASH is not that special
docs: AUTO_MERGE is not that special
refs.h: HEAD is not that special
git-bisect.txt: BISECT_HEAD is not that special
git.txt: HEAD is not that special
Unlike other environment variables that took the usual
true/false/yes/no as well as 0/1, GIT_FLUSH only understood 0/1,
which has been corrected.
* cp/git-flush-is-an-env-bool:
write-or-die: make GIT_FLUSH a Boolean environment variable
Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.
* js/update-urls-in-doc-and-comment:
doc: refer to internet archive
doc: update links for andre-simon.de
doc: switch links to https
doc: update links to current pages
Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications. The default has been
flipped to disable this pessimization.
* ps/commit-graph-less-paranoid:
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
Newer versions of Getopt::Long started giving warnings against our
(ab)use of it in "git send-email". Bump the minimum version
requirement for Perl to 5.8.1 (from September 2002) to allow
simplifying our implementation.
* tz/send-email-negatable-options:
send-email: avoid duplicate specification warnings
perl: bump the required Perl version to 5.8.1 from 5.8.0
The "--fsck-objects" option of "git index-pack" now can take the
optional parameter to tweak severity of different fsck errors.
* jc/index-pack-fsck-levels:
index-pack: --fsck-objects to take an optional argument for fsck msgs
index-pack: test and document --strict=<msg-id>=<severity>...
Update the ref sorting functions of 'ref-filter.c' so that when date fields
are specified with a format string (such as in 'git for-each-ref
--sort=creatordate:<something>'), they are sorted by their formatted string
value rather than by the underlying numeric timestamp. Currently, date
fields are always sorted by timestamp, regardless of whether formatting
information is included in the '--sort' key.
Leaving the default (unformatted) date sorting unchanged, sorting by the
formatted date string adds some flexibility to 'for-each-ref' by allowing
for behavior like "sort by year, then by refname within each year" or "sort
by time of day". Because the inclusion of a format string previously had no
effect on sort behavior, this change likely will not affect existing usage
of 'for-each-ref' or other ref listing commands.
Additionally, update documentation & tests to document the new sorting
mechanism.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The syntax commonly used for alternatives is --opt-(a|b), not
--opt-{a,b}.
List bad/new and good/old consistently in this order, to be
consistent with the description for "git bisect terms". Clarify
<term> to either <term-old> or <term-new> to make them consistent
with the description of "git bisect (good|bad)" subcommands.
Suggested-by: Matthieu Moy <git@matthieu-moy.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for "git bisect terms", although it did not hide
any information, was a bit incomplete and forced readers to fill in
the blanks to get the complete picture.
Acked-by: Matthieu Moy <git@matthieu-moy.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.
This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:
- It becomes possible to do truly atomic writes where either all refs
are committed to disk or none are. This was not possible with the
"files" backend because ref updates were split across multiple loose
files.
- The disk space required to store many refs is reduced, both compared
to loose refs and packed-refs. This is enabled both by the reftable
format being a binary format, which is more compact, and by prefix
compression.
- We can ignore filesystem-specific behaviour as ref names are not
encoded via paths anymore. This means there is no need to handle
case sensitivity on Windows systems or Unicode precomposition on
macOS.
- There is no need to rewrite the complete refdb anymore every time a
ref is being deleted like it was the case for packed-refs. This
means that ref deletions are now constant time instead of scaling
linearly with the number of refs.
- We can ignore file/directory conflicts so that it becomes possible
to store both "refs/heads/foo" and "refs/heads/foo/bar".
- Due to this property we can retain reflogs for deleted refs. We have
previously been deleting reflogs together with their refs to avoid
file/directory conflicts, which is not necessary anymore.
- We can properly enumerate all refs. With the "files" backend it is
not easily possible to distinguish between refs and non-refs because
they may live side by side in the gitdir.
Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.
Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:
- It makes it significantly easier to land the "reftable" backend as
tests behave the same. It would be tough to argue for each and every
single test that doesn't pass with the "reftable" backend.
- It ensures compatibility between repositories that use the "files"
backend and repositories that use the "reftable" backend. Like this,
hosters can migrate their repositories to use the "reftable" backend
without causing issues for clients that use the "files" backend in
their clones.
It is expected that these artificial limitations may eventually go away
in the long term.
Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:
- Creating N refs in separate transactions shows that the "files"
backend is ~50% faster. This is not surprising given that creating a
ref only requires us to create a single loose ref. The "reftable"
backend will also perform auto compaction on updates. In real-world
workloads we would likely also want to perform pack loose refs,
which would likely change the picture.
Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms]
Range (min … max): 1.8 ms … 4.3 ms 133 runs
Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms]
Range (min … max): 2.4 ms … 2.9 ms 132 runs
Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s]
Range (min … max): 1.969 s … 1.980 s 3 runs
Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s]
Range (min … max): 2.597 s … 2.622 s 3 runs
Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s]
Range (min … max): 198.189 s … 198.670 s 3 runs
Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s]
Range (min … max): 290.223 s … 298.761 s 3 runs
- Creating N refs in a single transaction shows that the "files"
backend is significantly slower once we start to write many refs.
The "reftable" backend only needs to update two files, whereas the
"files" backend needs to write one file per ref.
Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms]
Range (min … max): 1.8 ms … 2.6 ms 151 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms]
Range (min … max): 2.4 ms … 3.4 ms 148 runs
Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms]
Range (min … max): 148.5 ms … 167.8 ms 15 runs
Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms]
Range (min … max): 56.3 ms … 72.9 ms 40 runs
Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s]
Range (min … max): 152.165 s … 153.542 s 3 runs
Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s]
Range (min … max): 51.769 s … 52.012 s 3 runs
- Deleting a ref in a fully-packed repository shows that the "files"
backend scales with the number of refs. The "reftable" backend has
constant-time deletions.
Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms]
Range (min … max): 1.6 ms … 2.1 ms 316 runs
Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms]
Range (min … max): 1.7 ms … 2.1 ms 294 runs
Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms]
Range (min … max): 1.9 ms … 2.5 ms 287 runs
Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms]
Range (min … max): 1.8 ms … 2.1 ms 217 runs
Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms]
Range (min … max): 224.6 ms … 245.2 ms 6 runs
Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms]
Range (min … max): 2.0 ms … 2.1 ms 3 runs
- Listing all refs shows no significant advantage for either of the
backends. The "files" backend is a bit faster, but not by a
significant margin. When repositories are not packed the "reftable"
backend outperforms the "files" backend because the "reftable"
backend performs auto-compaction.
Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.0 ms 1729 runs
Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 1.8 ms 1816 runs
Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms]
Range (min … max): 4.1 ms … 4.6 ms 645 runs
Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms]
Range (min … max): 4.2 ms … 5.9 ms 643 runs
Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s]
Range (min … max): 2.511 s … 2.627 s 10 runs
Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s]
Range (min … max): 2.692 s … 2.752 s 10 runs
Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 1.9 ms 1834 runs
Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.4 ms … 2.0 ms 1840 runs
Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms]
Range (min … max): 13.3 ms … 14.5 ms 208 runs
Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms]
Range (min … max): 4.3 ms … 6.2 ms 624 runs
Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s]
Range (min … max): 11.965 s … 12.370 s 10 runs
Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s]
Range (min … max): 2.769 s … 2.836 s 10 runs
- Printing a single ref shows no real difference between the "files"
and "reftable" backends.
Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms]
Range (min … max): 1.4 ms … 1.8 ms 1779 runs
Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.4 ms … 2.5 ms 1753 runs
Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms]
Range (min … max): 1.4 ms … 1.9 ms 1840 runs
Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.0 ms 1831 runs
Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.1 ms 1848 runs
Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.1 ms 1762 runs
So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.
The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".
There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.
As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.
[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/
Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that multi-pack reuse is supported, enable it via the
feature.experimental configuration in addition to the classic
`pack.allowPackReuse`.
This will allow more users to experiment with the new behavior who might
not otherwise be aware of the existing `pack.allowPackReuse`
configuration option.
The enum with values NO_PACK_REUSE, SINGLE_PACK_REUSE, and
MULTI_PACK_REUSE is defined statically in builtin/pack-objects.c's
compilation unit. We could hoist that enum into a scope visible from the
repository_settings struct, and then use that enum value in
pack-objects. Instead, define a single int that indicates what
pack-objects's default value should be to avoid additional unnecessary
code movement.
Though `feature.experimental` implies `pack.allowPackReuse=multi`, this
can still be overridden by explicitly setting the latter configuration
to either "single" or "false". Tests covering all of these cases are
showin t5332.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for the --exclude-per-directory option marked it
as deprecated, which confused readers into thinking there may be a
plan to remove it in the future, which was not our intention.
* jc/ls-files-doc-update:
ls-files: avoid the verb "deprecate" for individual options
The labels on conflict markers for the common ancestor, our version,
and the other version are available to custom 3-way merge driver
via %S, %X, and %Y placeholders.
* ad/custom-merge-placeholder-for-symbolic-pathnames:
merge-ll: expose revision names to custom drivers
git-index-pack has a --strict option that can take an optional argument
to provide a list of fsck issues to change their severity.
--fsck-objects does not have such a utility, which would be useful if
one would like to be more lenient or strict on data integrity in a
repository.
Like --strict, allow --fsck-objects to also take a list of fsck msgs to
change the severity.
Remove the "For internal use only" note for --fsck-objects, and document
the option. This won't often be used by the normal end user, but it
turns out it is useful for Git forges like GitLab.
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
2015-06-22) allowed a list of fsck msg to downgrade to be passed to
--strict. However this is a hidden argument that was not documented nor
tested. Though it is true that most users would not call this option
directly, (nor use index-pack for that matter) it is still useful to
document and test this feature.
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All conditional "advice" messages show how to turn them off, which
becomes repetitive. Setting advice.* configuration explicitly on
now omits the instruction part.
* rj/advice-disable-how-to-disable:
advice: allow disabling the automatic hint in advise_if_enabled()
Define "special ref" as a very narrow set that consists of
FETCH_HEAD and MERGE_HEAD, and clarify everything else that used to
be classified as such are actually just pseudorefs.
* ps/not-so-many-refs-are-special:
Documentation: add "special refs" to the glossary
refs: redefine special refs
refs: convert MERGE_AUTOSTASH to become a normal pseudo-ref
sequencer: introduce functions to handle autostashes via refs
refs: convert AUTO_MERGE to become a normal pseudo-ref
sequencer: delete REBASE_HEAD in correct repo when picking commits
sequencer: clean up pseudo refs with REF_NO_DEREF