Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
Fix performance regression of a recent "fatten promisor pack with
local objects" protection against an unwanted gc.
* jt/fix-fattening-promisor-fetch:
index-pack --promisor: also check commits' trees
index-pack --promisor: don't check blobs
index-pack --promisor: dedup before checking links
Yet another "pass the repository through the callchain" topic.
* kn/midx-wo-the-repository:
midx: inline the `MIDX_MIN_SIZE` definition
midx: pass down `hash_algo` to functions using global variables
midx: pass `repository` to `load_multi_pack_index`
midx: cleanup internal usage of `the_repository` and `the_hash_algo`
midx-write: pass down repository to `write_midx_file[_only]`
write-midx: add repository field to `write_midx_context`
midx-write: use `revs->repo` inside `read_refs_snapshot`
midx-write: pass down repository to static functions
packfile.c: remove unnecessary prepare_packed_git() call
midx: add repository to `multi_pack_index` struct
config: make `packed_git_(limit|window_size)` non-global variables
config: make `delta_base_cache_limit` a non-global variable
packfile: pass down repository to `for_each_packed_object`
packfile: pass down repository to `has_object[_kept]_pack`
packfile: pass down repository to `odb_pack_name`
packfile: pass `repository` to static function in the file
packfile: use `repository` from `packed_git` directly
packfile: add repository to struct `packed_git`
Commit c08589efdc (index-pack: repack local links into promisor packs,
2024-11-01) seems to contain an oversight in that the tree of a commit
is not checked. Teach git to check these trees.
The fix slows down a fetch from a certain repo at $DAYJOB from 2m2.127s
to 2m45.052s, but in order to make the fetch correct, it seems worth it.
In order to test this, we could create server and client repos as
follows...
C S
\ /
O
(O and C are commits both on the client and server. S is a commit
only on the server. C and S have the same tree but different commit
messages. The diff between O and C is non-zero.)
...and then, from the client, fetch S from the server.
In theory, the client declares "have C" and the server can use this
information to exclude S's tree (since it knows that the client has C's
tree, which is the same as S's tree). However, it is also possible for
the server to compute that it needs to send S and not O, and proceed
from there; therefore the objects of C are not considered at all when
determining what to send in the packfile. In order to prevent a test of
client functionality from having such a dependence on server behavior, I
have not included such a test.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a follow-up to the parent of this commit, it was found that not
checking for the existence of blobs linked from trees sped up the fetch
from 24m47.815s to 2m2.127s. Teach Git to do that.
The tradeoff of not checking blobs is documented in a code comment.
(Blobs may also be linked from tag objects, but it is impossible to know
the type of an object linked from a tag object without looking it up in
the object database, so the code for that is untouched.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit c08589efdc (index-pack: repack local links into promisor packs,
2024-11-01) fixed a bug with what was believed to be a negligible
decrease in performance [1] [2]. But at $DAYJOB, with at least one repo,
it was found that the decrease in performance was very significant.
Looking at the patch, whenever we parse an object in the packfile to
be indexed, we check the targets of all its outgoing links for its
existence. However, this could be optimized by first collecting all such
targets into an oidset (thus deduplicating them) before checking. Teach
Git to do that.
On a certain fetch from the aforementioned repo, this improved
performance from approximately 7 hours to 24m47.815s. This number will
be further reduced in a subsequent patch.
[1] https://lore.kernel.org/git/CAG1j3zGiNMbri8rZNaF0w+yP+6OdMz0T8+8_Wgd1R_p1HzVasg@mail.gmail.com/
[2] https://lore.kernel.org/git/20241105212849.3759572-1-jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kn/the-repository:
packfile.c: remove unnecessary prepare_packed_git() call
midx: add repository to `multi_pack_index` struct
config: make `packed_git_(limit|window_size)` non-global variables
config: make `delta_base_cache_limit` a non-global variable
packfile: pass down repository to `for_each_packed_object`
packfile: pass down repository to `has_object[_kept]_pack`
packfile: pass down repository to `odb_pack_name`
packfile: pass `repository` to static function in the file
packfile: use `repository` from `packed_git` directly
packfile: add repository to struct `packed_git`
The `delta_base_cache_limit` variable is a global config variable used
by multiple subsystems. Let's make this non-global, by adding this
variable independently to the subsystems where it is used.
First, add the setting to the `repo_settings` struct, this provides
access to the config in places where the repository is available. Use
this in `packfile.c`.
In `index-pack.c` we add it to the `pack_idx_option` struct and its
constructor. While the repository struct is available here, it may not
be set because `git index-pack` can be used without a repository.
In `gc.c` add it to the `gc_config` struct and also the constructor
function. The gc functions currently do not have direct access to a
repository struct.
These changes are made to remove the usage of `delta_base_cache_limit`
as a global variable in `packfile.c`. This brings us one step closer to
removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
which we complete in the next patch.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pack_name` currently relies on the global variable
`the_repository`. To eliminate global variable usage in `packfile.c`, we
should progressively shift the dependency on the_repository to higher
layers.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The struct `packed_git` holds information regarding a packed object
file. Let's add the repository variable to this object, to represent the
repository that this packfile belongs to. This helps remove dependency
on the global `the_repository` object in `packfile.c` by simply using
repository information now readily available in the struct.
We do need to consider that a packfile could be part of the alternates
of a repository, but considering that we only have one repository struct
and also that we currently anyways use 'the_repository', we should be
OK with this change.
We also modify `alloc_packed_git` to ensure that the repository is added
to newly created `packed_git` structs. This requires modifying the
function and all its callee to pass the repository object down the
levels.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We now ensure "index-pack" is used with the "--promisor" option
only during a "git fetch".
* jt/index-pack-allow-promisor-only-while-fetching:
index-pack: teach --promisor to forbid pack name
Currently,
- Running "index-pack --promisor" outside a repo segfaults.
- It may be confusing to a user that running "index-pack --promisor"
within a repo may make changes to the repo's object DB, especially
since the packs indexed by the index-pack invocation may not even be
related to the repo.
As discussed in [1] and [2], teaching --promisor to forbid a packfile
name solves both these problems. This combination of arguments requires
a repo (since we are writing the resulting .pack and .idx to it) and it
is clear that the files are related to the repo.
Currently, Git uses "index-pack --promisor" only when fetching into
a repo, so it could be argued that we should teach "index-pack" a
new argument (say, "--fetching-mode") instead of tying --promisor to
a generic argument like the packfile name. However, this --promisor
feature could conceivably be used whenever we have a packfile that is
known to come from the promisor remote (whether obtained through Git's
fetch protocol or through other means) so not using a new argument seems
reasonable - one could envision a user-made script obtaining a packfile
and then running "index-pack --promisor --stdin", for example. In fact,
it might be possible to relax the restriction further (say, by also
allowing --promisor when indexing a packfile that is in the object DB),
but relaxing the restriction is backwards-compatible so we can revisit
that later.
One thing to watch out for is the possibility of a future Git feature
that indexes a pack in the context of a repo, but does not necessarily
write the resulting pack to it (and does not necessarily desire to
make any changes to the object DB). One such feature would be fetch
quarantine, which might need the repo context in order to detect
hash collisions, but would also need to ensure that the object DB
is undisturbed in case the fetch fails for whatever reason, even if
the reason occurs only after the indexing is complete. It may not be
obvious to the implementer of such a feature that "index-pack" could
sometimes write packs other than the indexed pack to the object DB,
but there are already other ways that "fetch" could write to the object
DB (in particular, packfile URIs and bundle URIs), so hopefully the
implementation of this future feature would already include a test that
the object DB be undisturbed.
This change requires the change to t5300 by 1f52cdfacb (index-pack:
document and test the --promisor option, 2022-03-09) to be undone.
(--promisor is already tested indirectly, so we don't need the explicit
test here any more.)
[1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"thread_local" is a keyword in C23. To make sure that our code compiles
on a wide variety of C versions, rename struct thread_local to "struct
thread_local_data" to avoid a conflict.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach index-pack to, when processing the objects in a pack with
--promisor specified on the CLI, repack local objects (and the local
objects that they refer to, recursively) referenced by these objects
into promisor packs.
This prevents the situation in which, when fetching from a promisor
remote, we end up with promisor objects (newly fetched) referring
to non-promisor objects (locally created prior to the fetch). This
situation may arise if the client had previously pushed objects to the
remote, for example. One issue that arises in this situation is that,
if the non-promisor objects become inaccessible except through promisor
objects (for example, if the branch pointing to them has moved to
point to the promisor object that refers to them), then GC will garbage
collect them. There are other ways to solve this, but the simplest
seems to be to enforce the invariant that we don't have promisor objects
referring to non-promisor objects.
This repacking is done from index-pack to minimize the performance
impact. During a fetch, the only time most objects are fully inflated
in memory is when their object ID is computed, so we also scan the
objects (to see which objects they refer to) during this time.
Also to minimize the performance impact, an object is calculated to be
local if it's a loose object or present in a non-promisor pack. (If it's
also in a promisor pack or referred to by an object in a promisor pack,
it is technically already a promisor object. But a misidentification
of a promisor object as a non-promisor object is relatively benign
here - we will thus repack that promisor object into a promisor pack,
duplicating it in the object store, but there is no correctness issue,
just an issue of inefficiency.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the return parameter of `write_rev_file_order()` is a string
constant, the function may indeed return an allocated string when its
first parameter is a `NULL` pointer. This makes for a confusing calling
convention, where callers need to be aware of these intricate ownership
rules and cast away the constness to free the string in some cases.
Adapt the function and its caller `write_rev_file()` to always return an
allocated string and adapt callers to always free the return value.
Note that this requires us to also adapt `rename_tmp_packfile()`, which
compares the pointers to packfile data with each other. Now that the
path of the reverse index file gets allocated unconditionally the check
will always fail. This is fixed by using strcmp(3P) instead, which also
feels way less fragile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The convention to calling into built-in command implementation has
been updated to pass the repository, if known, together with the
prefix value.
* jc/pass-repo-to-builtins:
add: pass in repo variable instead of global the_repository
builtin: remove USE_THE_REPOSITORY for those without the_repository
builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h
builtin: add a repository parameter for builtin functions
Instead of including USE_THE_REPOSITORY_VARIABLE by default on every
builtin, remove it from builtin.h and add it to all the builtins that
include builtin.h (by definition, that means all builtins/*.c).
Also, remove the include statement for repository.h since it gets
brought in through builtin.h.
The next step will be to migrate each builtin
from having to use the_repository.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to reduce the usage of the global the_repository, add a
parameter to builtin functions that will get passed a repository
variable.
This commit uses UNUSED on most of the builtin functions, as subsequent
commits will modify the actual builtins to pass the repository parameter
down.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported that git-verify-pack(1) has started to crash with Git
v2.46.0 when run outside of a repository. This is another fallout from
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), where we have stopped setting the default hash algorithm
for `the_repository`. Consequently, code that relies on `the_hash_algo`
will now crash when it hasn't explicitly been initialized, which may be
the case when running outside of a Git repository.
The crash is not in git-verify-pack(1) but instead in git-index-pack(1),
which gets called by the former. Ideally, both of these programs should
be able to identify the hash algorithm used by the packfile and index
without having to rely on external information. But unfortunately, the
format for neither of them is completely self-describing, so it is not
possible to derive that information. This is a design issue that we
should address by introducing a new packfile version that encodes its
object hash.
For now though the more important fix is to not make either of these
programs crash anymore, which we do by falling back to SHA1 when the
object hash is unconfigured. This pessimizes reading packfiles which
use a different hash than SHA1, but restores previous behaviour.
Reported-by: Ilya K <me@0upti.me>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of our hash functions have two variants, one receiving a `struct
git_hash_algo` and one that derives it via `the_repository`. Adapt all
of those functions to always require the hash algorithm as input and
drop the variants that do not accept one.
As those functions are now independent of `the_repository`, we can move
them from "hash.h" to "hash-ll.h".
Note that both in this and subsequent commits in this series we always
just pass `the_repository->hash_algo` as input even if it is obvious
that there is a repository in the context that we should be using the
hash from instead. This is done to be on the safe side and not introduce
any regressions. All callsites should eventually be amended to use a
repo passed via parameters, but this is outside the scope of this patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two packfile stream consumers, index-pack and
unpack-objects, that allow excess payload after the packfile stream
data. Their code to relay excess data hasn't changed significantly
since their original implementation that appeared in 67e5a5ec
(git-unpack-objects: re-write to read from stdin, 2005-06-28) and
9bee2478 (mimic unpack-objects when --stdin is used with index-pack,
2006-10-25).
These code blocks contain hand-rolled loops using xwrite(), written
before our write_in_full() helper existed. This helper now provides
the same functionality.
Replace these loops with write_in_full() for shorter, clearer
code. Update related variables accordingly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--fsck-objects" option of "git index-pack" now can take the
optional parameter to tweak severity of different fsck errors.
* jc/index-pack-fsck-levels:
index-pack: --fsck-objects to take an optional argument for fsck msgs
index-pack: test and document --strict=<msg-id>=<severity>...
git-index-pack has a --strict option that can take an optional argument
to provide a list of fsck issues to change their severity.
--fsck-objects does not have such a utility, which would be useful if
one would like to be more lenient or strict on data integrity in a
repository.
Like --strict, allow --fsck-objects to also take a list of fsck msgs to
change the severity.
Remove the "For internal use only" note for --fsck-objects, and document
the option. This won't often be used by the normal end user, but it
turns out it is useful for Git forges like GitLab.
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
2015-06-22) allowed a list of fsck msg to downgrade to be passed to
--strict. However this is a hidden argument that was not documented nor
tested. Though it is true that most users would not call this option
directly, (nor use index-pack for that matter) it is still useful to
document and test this feature.
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The t5309 script triggers a racy false positive with SANITIZE=leak on a
multi-core system. Running with "--stress --run=6" usually fails within
10 seconds or so for me, complaining with something like:
+ git index-pack --fix-thin --stdin
fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)
=================================================================
==3904583==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
#1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
#2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
#3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
#4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
#5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
#6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
#7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
Aborted
What happens is this:
0. We construct a bogus pack with a duplicate object in it and trigger
index-pack.
1. We spawn a bunch of worker threads to resolve deltas (on my system
it is 16 threads).
2. One of the threads sees the duplicate object and bails by calling
exit(), taking down all of the threads. This is expected and is the
point of the test.
3. At the time exit() is called, we may still be spawning threads from
the main process via pthread_create(). LSan hooks thread creation
to update its book-keeping; it has to know where each thread's
stack is (so it can find entry points for reachable memory). So it
calls pthread_getattr_np() to get information about the new thread.
That may allocate memory that must be freed with a matching call to
pthread_attr_destroy(). Probably LSan does that immediately, but
if you're unlucky enough, the exit() will happen while it's between
those two calls, and the allocated pthread_attr_t appears as a
leak.
This isn't a real leak. It's not even in our code, but rather in the
LSan instrumentation code. So we could just ignore it. But the false
positive can cause people to waste time tracking it down.
It's possibly something that LSan could protect against (e.g., cover the
getattr/destroy pair with a mutex, and then in the final post-exit()
check for leaks try to take the same mutex). But I don't know enough
about LSan to say if that's a reasonable approach or not (or if my
analysis is even completely correct).
In the meantime, it's pretty easy to avoid the race by making creation
of the worker threads "atomic". That is, we'll spawn all of them before
letting any of them start to work. That's easy to do because we already
have a work_lock() mutex for handing out that work. If the main process
takes it, then all of the threads will immediately block until we've
finished spawning and released it.
This shouldn't make any practical difference for non-LSan runs. The
thread spawning is quick, and could happen before any worker thread gets
scheduled anyway.
Probably other spots that use threads are subject to the same issues.
But since we have to manually insert locking (and since this really is
kind of a hack), let's not bother with them unless somebody experiences
a similar racy false-positive in practice.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The OpenSSL 3+ EVP API for SHA-* cannot support our prior use cases
supported by other SHA-* implementations. It has the following
differences:
1. ->init_fn is required before all use
2. struct assignments don't work and requires ->clone_fn
3. can't support ->update_fn after ->final_*fn
While fixing cases 1 and 2 is merely the matter of calling ->init_fn and
->clone_fn as appropriate, fixing case 3 requires calling ->final_*fn on
a temporary context that's cloned from the primary context.
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/ZPCL11k38PXTkFga@debian.me/
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Fixes: 3e440ea0ab ("sha256: avoid functions deprecated in OpenSSL 3+")
Fixes: bda9c12073 ("avoid SHA-1 functions deprecated in OpenSSL 3+")
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark-up unused parameters in the code so that we can eventually
enable -Wunused-parameter by default.
* jk/unused-parameter:
t/helper: mark unused callback void data parameters
tag: mark unused parameters in each_tag_name_fn callbacks
rev-parse: mark unused parameter in for_each_abbrev callback
replace: mark unused parameter in each_mergetag_fn callback
replace: mark unused parameter in ref callback
merge-tree: mark unused parameter in traverse callback
fsck: mark unused parameters in various fsck callbacks
revisions: drop unused "opt" parameter in "tweak" callbacks
count-objects: mark unused parameter in alternates callback
am: mark unused keep_cr parameters
http-push: mark unused parameter in xml callback
http: mark unused parameters in curl callbacks
do_for_each_ref_helper(): mark unused repository parameter
test-ref-store: drop unimplemented reflog-expire command
Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
There are a few callback functions which are used with the fsck code,
but it's natural that not all callbacks need all parameters. For
reporting, even something as obvious as "the oid of the object which had
a problem" is not always used, as some callers are only checking a
single object in the first place. And for both reporting and walking,
things like void data pointers and the fsck_options aren't always
necessary.
But since each such parameter is used by _some_ callback, we have to
keep them in the interface. Mark the unused ones in specific callbacks
to avoid triggering -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Header files cleanup.
* en/header-split-cache-h-part-3: (28 commits)
fsmonitor-ll.h: split this header out of fsmonitor.h
hash-ll, hashmap: move oidhash() to hash-ll
object-store-ll.h: split this header out of object-store.h
khash: name the structs that khash declares
merge-ll: rename from ll-merge
git-compat-util.h: remove unneccessary include of wildmatch.h
builtin.h: remove unneccessary includes
list-objects-filter-options.h: remove unneccessary include
diff.h: remove unnecessary include of oidset.h
repository: remove unnecessary include of path.h
log-tree: replace include of revision.h with simple forward declaration
cache.h: remove this no-longer-used header
read-cache*.h: move declarations for read-cache.c functions from cache.h
repository.h: move declaration of the_index from cache.h
merge.h: move declarations for merge.c from cache.h
diff.h: move declaration for global in diff.c from cache.h
preload-index.h: move declarations for preload-index.c from elsewhere
sparse-index.h: move declarations for sparse-index.c from cache.h
name-hash.h: move declarations for name-hash.c from cache.h
run-command.h: move declarations for run-command.c from cache.h
...
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.
In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.
Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.
The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure
not to change the message.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several builtins depend on being able to disable the replace references
so we actually operate on each object individually. These currently do
so by directly mutating the 'read_replace_refs' global.
A future change will move this global into a different place, so it will
be necessary to change all of these lines. However, we can simplify that
transition by abstracting the purpose of these global assignments with a
method call.
We will need to keep this read_replace_refs global forever, as we want
to make sure that we never use replace refs throughout the life of the
process if this method is called. Future changes may present a
repository-scoped version of the variable to represent that repository's
core.useReplaceRefs config value, but a zero-valued read_replace_refs
will always override such a setting.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.
* tb/pack-revindex-on-disk:
t: invert `GIT_TEST_WRITE_REV_INDEX`
config: enable `pack.writeReverseIndex` by default
pack-revindex: introduce `pack.readReverseIndex`
pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
pack-revindex: make `load_pack_revindex` take a repository
t5325: mark as leak-free
pack-write.c: plug a leak in stage_tmp_packfiles()
Back in e8c58f894b (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we
added a test knob to conditionally enable writing a ".rev" file when
indexing a pack. At the time, this was used to ensure that the test
suite worked even when ".rev" files were written, which served as a
stress-test for the on-disk reverse index implementation.
Now that reading from on-disk ".rev" files is enabled by default, the
test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning.
We could get rid of the option entirely, but there would be no
convenient way to test Git when ".rev" files *aren't* in place.
Instead of getting rid of the option, invert its meaning to instead
disable writing ".rev" files, thereby running the test suite in a mode
where the reverse index is generated from scratch.
This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
spelling of "true", we are still running and exercising Git's behavior
when forced to generate reverse indexes from scratch. Do so by setting
it in the linux-TEST-vars CI run to ensure that we are maintaining good
coverage of this now-legacy code.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.
A pack's reverse index is a mapping from pack position (that is, the
order that objects appear together in a ".pack") to their position in
lexical order (that is, the order that objects are listed in an ".idx"
file).
Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index index.
They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:
- an ~80% reduction in the time it takes to serve fetches on a popular
repository, Homebrew/homebrew-core.
- a ~60% reduction in the peak memory usage to serve fetches on that
same repository.
- a collective savings of ~35% in CPU time across all pack-objects
invocations serving fetches across all repositories in a single
datacenter.
Reverse indexes are also beneficial to end-users as well as forges. For
example, the time it takes to generate a pack containing the objects for
the 10 most recent commits in linux.git (representing a typical push) is
significantly faster when on-disk reverse indexes are available:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms]
Range (min … max): 521.0 ms … 577.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms]
Range (min … max): 226.0 ms … 259.6 ms 13 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
The same is true of writing a pack containing the objects for the 30
most-recent commits:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms]
Range (min … max): 839.3 ms … 886.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms]
Range (min … max): 567.5 ms … 599.3 ms 10 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:
$ git rev-parse HEAD >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms]
Range (min … max): 290.3 ms … 331.1 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms]
Range (min … max): 1.6 ms … 4.6 ms 1155 runs
Summary
'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'
In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both from users enabling `pack.writeReverseIndexes`, and from GitHub's
deployment of the feature. The latter has been running without incident
for more than two years.
This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.
(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances, that essentially boil
down to a trade-off between time to generate, and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used).
[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers