mirrors/git - git - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Derrick Stolee	b4cf68476a	pack-objects: prevent name hash version change When the --name-hash-version option is used in 'git pack-objects', it can change from the initial assignment to when it is used based on interactions with other arguments. Specifically, when writing or reading bitmaps, we must force version 1 for now. This could change in the future when the bitmap format can store a name hash version value, indicating which was used during the writing of the packfile. Protect the 'git pack-objects' process from getting confused by failing with a BUG() statement if the value of the name hash version changes between calls to pack_name_hash_fn(). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Derrick Stolee	ce961135cc	pack-objects: add GIT_TEST_NAME_HASH_VERSION Add a new environment variable to opt-in to different values of the --name-hash-version=<n> option in 'git pack-objects'. This allows for extra testing of the feature without repeating all of the test scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add this to the linux-TEST-vars CI build as that test run is already overloaded. The behavior exposed by this test variable is of low risk and should be sufficient to allow manual testing when an issue arises. But this option isn't free. There are a few tests that change behavior with the variable enabled. First, there are a few tests that are very sensitive to certain delta bases being picked. These are both involving the generation of thin bundles and then counting their objects via 'git index-pack --fix-thin' which pulls the delta base into the new packfile. For these tests, disable the option as a decent long-term option. Second, there are some tests that compare the exact output of a 'git pack-objects' process when using bitmaps. The warning that ignores the --name-hash-version=2 and forces version 1 causes these tests to fail. Disable the environment variable to get around this issue. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Derrick Stolee	928ef41dd8	repack: add --name-hash-version option The new '--name-hash-version' option for 'git repack' is a simple pass-through to the underlying 'git pack-objects' subcommand. However, this subcommand may have other options and a temporary filename as part of the subcommand execution that may not be predictable or could change over time. The existing test_subcommand method requires an exact list of arguments for the subcommand. This is too rigid for our needs here, so create a new method, test_subcommand_flex. Use it to check that the --name-hash-version option is passing through. Since we are modifying the 'git repack' command, let's bring its usage in line with the Documentation's synopsis. This removes it from the allow list in t0450 so it will remain in sync in the future. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Derrick Stolee	fc62e033cd	pack-objects: add --name-hash-version option The previous change introduced a new pack_name_hash_v2() function that intends to satisfy much of the hash locality features of the existing pack_name_hash() function while also distinguishing paths with similar final components of their paths. This change adds a new --name-hash-version option for 'git pack-objects' to allow users to select their preferred function version. This use of an integer version allows for future expansion and a direct way to later store a name hash version in the .bitmap format. For now, let's consider how effective this mechanism is when repacking a repository with different name hash versions. Specifically, we will execute 'git pack-objects' the same way a 'git repack -adf' process would, except we include --name-hash-version=<n> for testing. On the Git repository, we do not expect much difference. All path names are short. This is backed by our results: \| Stage \| Pack Size \| Repack Time \| \|-----------------------\|-----------\|-------------\| \| After clone \| 260 MB \| N/A \| \| --name-hash-version=1 \| 127 MB \| 129s \| \| --name-hash-version=2 \| 127 MB \| 112s \| This example demonstrates how there is some natural overhead coming from the cloned copy because the server is hosting many forks and has not optimized for exactly this set of reachable objects. But the full repack has similar characteristics for both versions. Let's consider some repositories that are hitting too many collisions with version 1. First, let's explore the kinds of paths that are commonly causing these collisions: * "/CHANGELOG.json" is 15 characters, and is created by the beachball [1] tool. Only the final character of the parent directory can differentiate different versions of this file, but also only the two most-significant digits. If that character is a letter, then this is always a collision. Similar issues occur with the similar "/CHANGELOG.md" path, though there is more opportunity for differences In the parent directory. * Localization files frequently have common filenames but differentiates via parent directories. In C#, the name "/strings.resx.lcl" is used for these localization files and they will all collide in name-hash. [1] https://github.com/microsoft/beachball I've come across many other examples where some internal tool uses a common name across multiple directories and is causing Git to repack poorly due to name-hash collisions. One open-source example is the fluentui [2] repo, which uses beachball to generate CHANGELOG.json and CHANGELOG.md files, and these files have very poor delta characteristics when comparing against versions across parent directories. \| Stage \| Pack Size \| Repack Time \| \|-----------------------\|-----------\|-------------\| \| After clone \| 694 MB \| N/A \| \| --name-hash-version=1 \| 438 MB \| 728s \| \| --name-hash-version=2 \| 168 MB \| 142s \| [2] https://github.com/microsoft/fluentui In this example, we see significant gains in the compressed packfile size as well as the time taken to compute the packfile. Using a collection of repositories that use the beachball tool, I was able to make similar comparisions with dramatic results. While the fluentui repo is public, the others are private so cannot be shared for reproduction. The results are so significant that I find it important to share here: \| Repo \| --name-hash-version=1 \| --name-hash-version=2 \| \|----------\|-----------------------\|-----------------------\| \| fluentui \| 440 MB \| 161 MB \| \| Repo B \| 6,248 MB \| 856 MB \| \| Repo C \| 37,278 MB \| 6,755 MB \| \| Repo D \| 131,204 MB \| 7,463 MB \| Future changes could include making --name-hash-version implied by a config value or even implied by default during a full repack. It is important to point out that the name hash value is stored in the .bitmap file format, so we must force --name-hash-version=1 when bitmaps are being read or written. Later, the bitmap format could be updated to be aware of the name hash version so deltas can be quickly computed across the bitmapped/not-bitmapped boundary. To promote the safety of this parameter, the validate_name_hash_version() method will die() if the given name-hash version is incorrect and will disable newer versions if not yet compatible with other features, such as --write-bitmap-index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:41 -08:00
Bence Ferdinandy	93dc16483a	fetch set_head: fix non-mirror remotes in bare repositories In `b1b713f722` (fetch set_head: handle mirrored bare repositories, 2024-11-22) it was implicitly assumed that all remotes will be mirrors in a bare repository, thus fetching a non-mirrored remote could lead to HEAD pointing to a non-existent reference. Make sure we only overwrite HEAD if we are in a bare repository and fetching from a mirror. Otherwise, proceed as normally, and create refs/remotes/<nonmirrorremote>/HEAD instead. Reported-by: Christian Hesse <list@eworm.de> Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 08:16:47 -08:00
Bence Ferdinandy	638060dcb9	fetch set_head: refactor to use remote directly As a preparatory step to use even more properties from the remote struct, refactor set_head to take the entire struct as a parameter, instead of the necessary bits. This also allows consolidating the use of gtransport->remote in set_head, making the access of the remote's properties consistent in the function. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 08:16:45 -08:00
ZheNing Hu	08032fa30f	gc: add `--expire-to` option This commit extends the functionality of `git gc` by adding a new option, `--expire-to=<dir>`. Previously, this feature was implemented in `91badeba32` (builtin/repack.c: implement `--expire-to` for storing pruned objects, 2022-10-24), which allowing users to specify a directory where unreachable and expired cruft packs are stored during garbage collection. However, users had to run `git repack --cruft --expire-to=<dir>` followed by `git prune` to achieve similar results within `git gc`. By introducing `--expire-to=<dir>` directly into `git gc`, we simplify the process for users who wish to manage their repository's cleanup more efficiently. This change involves passing the `--expire-to=<dir>` parameter through to `git repack`, making it easier for users to set up a backup location for cruft packs that will be pruned. Due to the original `git gc --prune=now` deleting all unreachable objects by passing the `-a` parameter to git repack. With the addition of the `--cruft` and `--expire-to` options, it is necessary to modify this default behavior: instead of deleting these unreachable objects, they should be merged into a cruft pack and collected in a specified directory. Therefore, we do not pass `-a` to the repack command but instead pass `--cruft`, `--expire-to`, and `--cruft-expiration=now` to repack. Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-24 14:32:28 -08:00
Patrick Steinhardt	8ccc75c245	remote: announce removal of "branches/" and "remotes/" Back when Git was in its infancy, remotes were configured via separate files in "branches/" (back in 2005). This mechanism was replaced later that year with the "remotes/" directory. Both mechanisms have eventually been replaced by config-based remotes, and it is very unlikely that anybody still uses these directories to configure their remotes. Both of these directories have been marked as deprecated, one in 2005 and the other one in 2011. Follow through with the deprecation and finally announce the removal of these features in Git 3.0. Signed-off-by: Patrick Steinhardt <ps@pks.im> [jc: with a small tweak to the help message] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-24 08:08:56 -08:00
Taylor Blau	a8dd3821fe	csum-file: introduce hashfile_checkpoint_init() In `106140a99f` (builtin/fast-import: fix segfault with unsafe SHA1 backend, 2024-12-30) and `9218c0bfe1` (bulk-checkin: fix segfault with unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to initialize a hashfile_checkpoint with the same hash function implementation as is used by the hashfile it is used to checkpoint. While both `106140a99f` and `9218c0bfe1` work around the immediate crash, changing the hash function implementation within the hashfile API to, for example, the non-unsafe variant would re-introduce the crash. This is a result of the tight coupling between initializing hashfiles and hashfile_checkpoints. Introduce and use a new function which ensures that both parts of a hashfile and hashfile_checkpoint pair use the same hash function implementation to avoid such crashes. A few things worth noting: - In the change to builtin/fast-import.c::stream_blob(), we can see that by removing the explicit reference to 'the_hash_algo->unsafe_init_fn()', we are hardened against the hashfile API changing away from the_hash_algo (or its unsafe variant) in the future. - The bulk-checkin code no longer needs to explicitly zero-initialize the hashfile_checkpoint, since it is now done as a result of calling 'hashfile_checkpoint_init()'. - Also in the bulk-checkin code, we add an additional call to prepare_to_stream() outside of the main loop in order to initialize 'state->f' so we know which hash function implementation to use when calling 'hashfile_checkpoint_init()'. This is OK, since subsequent 'prepare_to_stream()' calls are noops. However, we only need to call 'prepare_to_stream()' when we have the HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling 'prepare_to_stream()' does not assign 'state->f', so we have nothing to initialize. - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are appropriately guarded. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-23 10:28:17 -08:00
Karthik Nayak	6b2aa7fd37	pack-write: pass hash_algo to `write_rev_file()` The `write_rev_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Also modify children functions `write_rev_file_order()` and `write_rev_header()` to accept 'the_hash_algo'. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. However, in `midx-write.c`, since all usage of global variables is removed, don't reintroduce them and instead use the `repo` available in the context. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
Karthik Nayak	7653e9af9b	pack-write: pass hash_algo to `write_idx_file()` The `write_idx_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls `write_idx_file()`, update it to accept a `struct git_hash_algo` as a parameter and pass it through to the callee. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
Karthik Nayak	e2f6f76585	pack-write: pass repository to `index_pack_lockfile()` The `index_pack_lockfile()` function uses the global `the_repository` variable to access the repository. To avoid global variable usage, pass the repository from the layers above. Altough the layers above could have access to the repository internally, simply pass in `the_repository`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
Karthik Nayak	8244d01de6	pack-write: pass hash_algo to `fixup_pack_header_footer()` The `fixup_pack_header_footer()` function uses the global `the_hash_algo` variable to access the repository's hash function. To avoid global variable usage, pass a hash_algo from the layers above. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
René Scharfe	c5490ce9d1	ref-filter: remove ref_format_clear() Now that ref_format_clear() no longer releases any memory we don't need it anymore. Remove it and its counterpart, ref_format_init(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 09:06:24 -08:00
René Scharfe	5e58db6575	ref-filter: move ahead-behind bases into used_atom verify_ref_format() parses a ref-filter format string and stores recognized items in the static array "used_atom". For "ahead-behind:<committish>" it stores the committish part in a string_list member "bases" of struct ref_format. ref_sorting_options() also parses bare ref-filter format items and stores stores recognized ones in "used_atom" as well. The committish parts go to a dummy struct ref_format in parse_sorting_atom(), though, and are leaked and forgotten. If verify_ref_format() is called before ref_sorting_options(), like in git for-each-ref, then all works well if the sort key is included in the format string. If it isn't then sorting cannot work as the committishes are missing. If ref_sorting_options() is called first, like in git branch, then we have the additional issue that if the sort key is included in the format string then filter_ahead_behind() can't see its committish, will not generate any results for it and thus it will be expanded to an empty string. Fix those issues by replacing the string_list with a field in used_atom for storing the committish. This way it can be shared for handling both ref-filter format strings and sorting options in the same command. Reported-by: Ross Goldberg <ross.goldberg@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 09:06:15 -08:00
Junio C Hamano	7b39a128c8	Merge branch 'ps/the-repository' More code paths have a repository passed through the callchain, instead of assuming the primary the_repository object. * ps/the-repository: match-trees: stop using `the_repository` graph: stop using `the_repository` add-interactive: stop using `the_repository` tmp-objdir: stop using `the_repository` resolve-undo: stop using `the_repository` credential: stop using `the_repository` mailinfo: stop using `the_repository` diagnose: stop using `the_repository` server-info: stop using `the_repository` send-pack: stop using `the_repository` serve: stop using `the_repository` trace: stop using `the_repository` pager: stop using `the_repository` progress: stop using `the_repository`	2025-01-21 08:44:54 -08:00
Junio C Hamano	d6a7cace21	Merge branch 'jt/fsck-skiplist-parse-fix' A misconfigured "fsck.skiplist" configuration variable was not diagnosed as an error, which has been corrected. * jt/fsck-skiplist-parse-fix: fsck: reject misconfigured fsck.skipList	2025-01-21 08:44:53 -08:00
Junio C Hamano	cb441e1ec3	Merge branch 'ps/reftable-get-random-fix' The code to compute "unique" name used git_rand() which can fail or get stuck; the callsite does not require cryptographic security. Introduce the "insecure" mode and use it appropriately. * ps/reftable-get-random-fix: reftable/stack: accept insecure random bytes wrapper: allow generating insecure random bytes	2025-01-21 08:44:53 -08:00
Jeff King	98046591b9	index-pack, unpack-objects: use skip_prefix to avoid magic number When parsing --pack_header=, we manually skip 14 bytes to the data. Let's use skip_prefix() to do this automatically. Note that we overwrite our pointer to the front of the string, so we have to add more context to the error message. We could avoid this by declaring an extra pointer to hold the value, but I think the modified message is actually preferable; it should give translators a bit more context. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:56 -08:00
Jeff King	f1299bff26	index-pack, unpack-objects: use get_be32() for reading pack header Both of these commands read the incoming pack into a static unsigned char buffer in BSS, and then parse it by casting the start of the buffer to a struct pack_header. This can result in SIGBUS on some platforms if the compiler doesn't place the buffer in a position that is properly aligned for 4-byte integers. This reportedly happens with unpack-objects (but not index-pack) on sparc64 when compiled with clang (but not gcc). But we are definitely in the wrong in both spots; since the buffer's type is unsigned char, we can't depend on larger alignment. When it works it is only because we are lucky. We'll fix this by switching to get_be32() to read the headers (just like the last few commits similarly switched us to put_be32() for writing into the same buffer). It would be nice to factor this out into a common helper function, but the interface ends up quite awkward. Either the caller needs to hardcode how many bytes we'll need, or it needs to pass us its fill()/use() functions as pointers. So I've just fixed both spots in the same way; this is not code that is likely to be repeated a third time (most of the pack reading code uses an mmap'd buffer, which should be properly aligned). I did make one tweak to the shared code: our pack_version_ok() macro expects us to pass the big-endian value we'd get by casting. We can introduce a "native" variant which uses the host integer ordering. Reported-by: Koakuma <koachan@protonmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:56 -08:00
Jeff King	798e0f4516	packfile: factor out --pack_header argument parsing Both index-pack and unpack-objects accept a --pack_header argument. This is an undocumented internal argument used by receive-pack and fetch to pass along information about the header of the pack, which they've already read from the incoming stream. In preparation for a bugfix, let's factor the duplicated code into a common helper. The callers are still responsible for identifying the option. While this could likewise be factored out, it is more flexible this way (e.g., if they ever started using parse-options and wanted to handle both the stuck and unstuck forms). Likewise, the callers are responsible for reporting errors, though they both just call die(). I've tweaked unpack-objects to match index-pack in marking the error for translation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:55 -08:00
Junio C Hamano	f66d1423f5	builtin: send usage() help text to standard output Using the show_usage_and_exit_if_asked() helper we introduced earlier, fix callers of usage() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. These are the bog standard "if we got only '-h', then that is a request for help" callers. Their if (argc == 2 && !strcmp(argv[1], "-h")) usage(message); are simply replaced with show_usage_and_exit_if_asked(argc, argv, message); With this, the built-ins tested by t0012 all send their help text to their standard output stream, so the check in t0012 that was half tightened earlier is now fully tightened to insist on standard error stream being empty. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-17 13:30:03 -08:00
Junio C Hamano	a36a822d7d	oddballs: send usage() help text to standard output Using the show_usage_if_asked() helper we introduced earlier, fix callers of usage() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. The callers in this step are oddballs in that their invocations of usage() are not guarded by if (argc == 2 && !strcmp(argv[1], "-h") usage(...); There are (unnecessarily) being clever ones that do things like if (argc != 2 \|\| !strcmp(argv[1], "-h") usage(...); to say "I know I take only one argument, so argc != 2 is always an error regardless of what is in argv[]. Ah, by the way, even if argc is 2, "-h" is a request for usage text, so we do the same". Some like "git var -h" just do not treat "-h" any specially, and let it take the same error code paths as a parameter error. Now we cannot do the same, so these callers are rewrittin to do the show_usage_and_exit_if_asked() first and then handle the usage error the way they used to. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-17 13:30:03 -08:00
Junio C Hamano	b821c999ca	builtins: send usage_with_options() help text to standard output Using the show_usage_with_options_if_asked() helper we introduced earlier, fix callers of usage_with_options() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. The test in t7600 for "git merge -h" may want to be retired, as the same is covered by t0012 already, but it is specifically testing that the "-h" option gets a response even with a corrupt index file, so for now let's leave it there. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-17 13:30:03 -08:00
Junio C Hamano	564b907c8a	Merge branch 'ps/more-sign-compare' More -Wsign-compare fixes. * ps/more-sign-compare: sign-compare: avoid comparing ptrdiff with an int/unsigned commit-reach: use `size_t` to track indices when computing merge bases shallow: fix -Wsign-compare warnings builtin/log: fix remaining -Wsign-compare warnings builtin/log: use `size_t` to track indices commit-reach: use `size_t` to track indices in `get_reachable_subset()` commit-reach: use `size_t` to track indices in `remove_redundant()` commit-reach: fix type of `min_commit_date` commit-reach: fix index used to loop through unsigned integer prio-queue: fix type of `insertion_ctr`	2025-01-16 16:35:14 -08:00
Jean-Noël Avila	d533c10697	doc: the mode param of -u of git commit is optional Fix the synopsis to reflect the option description. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-15 14:43:36 -08:00
Junio C Hamano	b28fb93e51	Merge branch 'ps/build-sign-compare' Last-minute fix for a regression in "git blame --abbrev=<length>" when insane <length> is specified; we used to correctly cap it to the hash output length but broke it during the cycle. * ps/build-sign-compare: builtin/blame: fix out-of-bounds write with blank boundary commits builtin/blame: fix out-of-bounds read with excessive `--abbrev`	2025-01-10 09:19:34 -08:00
Patrick Steinhardt	e7fb2ca945	builtin/blame: fix out-of-bounds write with blank boundary commits When passing the `-b` flag to git-blame(1), then any blamed boundary commits which were marked as uninteresting will not get their actual commit ID printed, but will instead be replaced by a couple of spaces. The flag can lead to an out-of-bounds write as though when combined with `--abbrev=` when the abbreviation length is longer than `GIT_MAX_HEXSZ` as we simply use memset(3p) on that array with the user-provided length directly. The result is most likely that we segfault. An obvious fix would be to cull `length` to `GIT_MAX_HEXSZ` many bytes. But when the underlying object ID is SHA1, and if the abbreviated length exceeds the SHA1 length, it would cause us to print more bytes than desired, and the result would be misaligned. Instead, fix the bug by computing the length via strlen(3p). This makes us write as many bytes as the formatted object ID requires and thus effectively limits the length of what we may end up printing to the length of its hash. If `--abbrev=` asks us to abbreviate to something shorter than the full length of the underlying hash function it would be handled by the call to printf(3p) correctly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-10 06:56:55 -08:00
Patrick Steinhardt	1fbb8d7ecb	builtin/blame: fix out-of-bounds read with excessive `--abbrev` In `6411a0a896` (builtin/blame: fix type of `length` variable when emitting object ID, 2024-12-06) we have fixed the type of the `length` variable. In order to avoid a cast from `size_t` to `int` in the call to printf(3p) with the "%.*s" formatter we have converted the code to instead use fwrite(3p), which accepts the length as a `size_t`. It was reported though that this makes us read over the end of the OID array when the provided `--abbrev=` length exceeds the length of the object ID. This is because fwrite(3p) of course doesn't stop when it sees a NUL byte, whereas printf(3p) does. Fix the bug by reverting back to printf(3p) and culling the provided length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an `int`. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-10 06:56:54 -08:00
M Hickford	0b43274850	credential-cache: respect authtype capability Previously, credential-cache populated authtype regardless whether "get" request had authtype capability. As documented in git-credential.txt, authtype "should not be sent unless the appropriate capability ... is provided". Add test. Without this change, the test failed because "credential fill" printed an incomplete credential with only protocol and host attributes (the unexpected authtype attribute was discarded by credential.c). Signed-off-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-09 15:04:15 -08:00
Justin Tobler	ca7158076f	fsck: reject misconfigured fsck.skipList In Git, fsck operations can ignore known broken objects via the `fsck.skipList` configuration. This option expects a path to a file with the list of object names. When the configuration is specified without a path, an error message is printed, but the command continues as if the configuration was not set. Configuring `fsck.skipList` without a value is a misconfiguration so config parsing should be more strict and reject it. Update `git_fsck_config()` to no longer ignore misconfiguration of `fsck.skipList`. The same behavior is also present for `fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration parsers for these are updated to ensure the related operations remain consistent. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-07 09:22:25 -08:00
Patrick Steinhardt	1568d1562e	wrapper: allow generating insecure random bytes The `csprng_bytes()` function generates randomness and writes it into a caller-provided buffer. It abstracts over a couple of implementations, where the exact one that is used depends on the platform. These implementations have different guarantees: while some guarantee to never fail (arc4random(3)), others may fail. There are two significant failures to distinguish from one another: - Systemic failure, where e.g. opening "/dev/urandom" fails or when OpenSSL doesn't have a provider configured. - Entropy failure, where the entropy pool is exhausted, and thus the function cannot guarantee strong cryptographic randomness. While we cannot do anything about the former, the latter failure can be acceptable in some situations where we don't care whether or not the randomness can be predicted. Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt into weak cryptographic randomness. The exact behaviour of the flag depends on the underlying implementation: - `arc4random_buf()` never returns an error, so it doesn't change. - `getrandom()` pulls from "/dev/urandom" by default, which never blocks on modern systems even when the entropy pool is empty. - `getentropy()` seems to block when there is not enough randomness available, and there is no way of changing that behaviour. - `GtlGenRandom()` doesn't mention anything about its specific failure mode. - The fallback reads from "/dev/urandom", which also returns bytes in case the entropy pool is drained in modern Linux systems. That only leaves OpenSSL with `RAND_bytes()`, which returns an error in case the returned data wouldn't be cryptographically safe. This function is replaced with a call to `RAND_pseudo_bytes()`, which can indicate whether or not the returned data is cryptographically secure via its return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag is set, then we ignore the insecurity and return the data regardless. It is somewhat questionable whether we really need the flag in the first place, or whether we wouldn't just ignore the potentially-insecure data. But the risk of doing that is that we might have or grow callsites that aren't aware of the potential insecureness of the data in places where it really matters. So using a flag to opt-in to that behaviour feels like the more secure choice. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-07 09:04:18 -08:00
Junio C Hamano	a41e394e21	Merge branch 'bf/fetch-set-head-config' A hotfix on an advice messagge added during this cycle. * bf/fetch-set-head-config: fetch: fix erroneous set_head advice message	2025-01-06 12:02:21 -08:00
Bence Ferdinandy	233d48f5de	fetch: fix erroneous set_head advice message `9e2b7005be` (fetch set_head: add warn-if-not-$branch option, 2024-12-05) tried to expand the advice message for set_head with the new option, but unfortunately did not manage to add the right incantation. Fix the advice message with the correct usage of warn-if-not-$branch. Reported-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-06 06:50:03 -08:00
Junio C Hamano	effbef2beb	Merge branch 'jk/lsan-race-ignore-false-positive' CI jobs that run threaded programs under LSan has been giving false positives from time to time, which has been worked around. This is an alternative to the jk/lsan-race-with-barrier topic with much smaller change to the production code. * jk/lsan-race-ignore-false-positive: test-lib: ignore leaks in the sanitizer's thread code test-lib: check leak logs for presence of DEDUP_TOKEN test-lib: simplify leak-log checking test-lib: rely on logs to detect leaks Revert barrier-based LSan threading race workaround	2025-01-02 13:37:08 -08:00
Junio C Hamano	fc89d14c63	Revert barrier-based LSan threading race workaround The extra "barrier" approach was too much code whose sole purpose was to work around a race that is not even ours (i.e. in LSan's teardown code). In preparation for queuing a solution taking a much-less-invasive approach, let's revert them.	2025-01-01 14:13:01 -08:00
Junio C Hamano	d893741e02	Merge branch 'jk/lsan-race-with-barrier' CI jobs that run threaded programs under LSan has been giving false positives from time to time, which has been worked around. * jk/lsan-race-with-barrier: grep: work around LSan threading race with barrier index-pack: work around LSan threading race with barrier thread-utils: introduce optional barrier type Revert "index-pack: spawn threads atomically" test-lib: use individual lsan dir for --stress runs	2025-01-01 09:21:15 -08:00
Junio C Hamano	98422943f0	Merge branch 'ps/weak-sha1-for-tail-sum-fix' An earlier "csum-file checksum does not have to be computed with sha1dc" topic had a few code paths that had initialized an implementation of a hash function to be used by an unmatching hash by mistake, which have been corrected. * ps/weak-sha1-for-tail-sum-fix: ci: exercise unsafe OpenSSL backend builtin/fast-import: fix segfault with unsafe SHA1 backend bulk-checkin: fix segfault with unsafe SHA1 backend	2025-01-01 09:21:14 -08:00
Patrick Steinhardt	106140a99f	builtin/fast-import: fix segfault with unsafe SHA1 backend Same as with the preceding commit, git-fast-import(1) is using the safe variant to initialize a hashfile checkpoint. This leads to a segfault when passing the checkpoint into the hashfile subsystem because it would use the unsafe variants instead: ++ git --git-dir=R/.git fast-import --big-file-threshold=1 AddressSanitizer:DEADLYSIGNAL ================================================================= ==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0) ==577126==The signal is caused by a READ memory access. ==577126==Hint: address points to the zero page. #0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) #1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2 #2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2 #3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2 #4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2 #5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3 #6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5 #7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4 #8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4 #9 0x555555b1f493 in run_builtin ../git.c:480:11 #10 0x555555b1bfef in handle_builtin ../git.c:740:9 #11 0x555555b1e6f4 in run_argv ../git.c:807:4 #12 0x555555b1b87a in cmd_main ../git.c:947:19 #13 0x5555561649e6 in main ../common-main.c:64:11 #14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4) #15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4) #16 0x555555772c84 in _start (git+0x21ec84) ==577126==Register values: rax = 0x0000511000000cc0 rbx = 0x0000000000000000 rcx = 0x000000000000000c rdx = 0x0000000000000000 rdi = 0x0000000000000000 rsi = 0x00005070000009c0 rbp = 0x00005070000009c0 rsp = 0x00007fffffff5b30 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x00007ffff7a01a30 r12 = 0x0000000000000000 r13 = 0x00007fffffff6b60 r14 = 0x00007ffff7ffd000 r15 = 0x00005555563b9910 AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex ==577126==ABORTING ./test-lib.sh: line 1039: 577126 Aborted git --git-dir=R/.git fast-import --big-file-threshold=1 < input error: last command exited with $?=134 not ok 167 - R: blob bigger than threshold The segfault is only exposed in case the unsafe and safe backends are different from one another. Fix the issue by initializing the context with the unsafe SHA1 variant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:46:30 -08:00
Jeff King	7a8d9efc26	grep: work around LSan threading race with barrier There's a race with LSan when spawning threads and one of the threads calls die(). We worked around one such problem with index-pack in the previous commit, but it exists in git-grep, too. You can see it with: make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux cd t ./t0003-attributes.sh --stress which fails pretty quickly with: ==git==4096424==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7f906de14556 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98 #1 0x7f906dc9d2c1 in __pthread_getattr_np nptl/pthread_getattr_np.c:180 #2 0x7f906de2500d in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150 #3 0x7f906de25187 in __sanitizer::GetThreadStackAndTls(bool, unsigned long, unsigned long, unsigned long, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614 #4 0x7f906de17d18 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53 #5 0x7f906de143a9 in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431 #6 0x7f906dc9bf51 in start_thread nptl/pthread_create.c:447 #7 0x7f906dd1a677 in __clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 As with the previous commit, we can fix this by inserting a barrier that makes sure all threads have finished their setup before continuing. But there's one twist in this case: the thread which calls die() is not one of the worker threads, but the main thread itself! So we need the main thread to wait in the barrier, too, until all threads have gotten to it. And thus we initialize the barrier for num_threads+1, to account for all of the worker threads plus the main one. If we then test as above, t0003 should run indefinitely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:18:58 -08:00
Jeff King	526c0a851b	index-pack: work around LSan threading race with barrier We sometimes get false positives from our linux-leaks CI job because of a race in LSan itself. The problem is that one thread is still initializing its stack in LSan's code (and allocating memory to do so) while anothe thread calls die(), taking down the whole process and triggering a leak check. The problem is described in more detail in `993d38a066` (index-pack: spawn threads atomically, 2024-01-05), which tried to fix it by pausing worker threads until all calls to pthread_create() had completed. But that's not enough to fix the problem, because the LSan setup code runs in the threads themselves. So even though pthread_create() has returned, we have no idea if all threads actually finished their setup before letting any of them do real work. We can fix that by using a barrier inside the threads themselves, waiting for all of them to hit the start of their main function before any of them proceed. You can test for the race by running: make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux cd t ./t5309-pack-delta-cycles.sh --stress which fails quickly before this patch, and should run indefinitely without it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:18:58 -08:00
Jeff King	ca9d60f246	Revert "index-pack: spawn threads atomically" This reverts commit `993d38a066`. That commit was trying to solve a race between LSan setting up the threads stack and another thread calling exit(), by making sure that all pthread_create() calls have finished before doing any work that might trigger the exit(). But that isn't sufficient. The setup code actually runs in the individual threads themselves, not in the spawning thread's call to pthread_create(). So while it may have improved the race a bit, you can still trigger it pretty quickly with: make SANITIZE=leak cd t ./t5309-pack-delta-cycles.sh --stress Let's back out that failed attempt so we can try again. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:18:57 -08:00
Patrick Steinhardt	5e7fe8a7b8	commit-reach: use `size_t` to track indices when computing merge bases The functions `repo_get_merge_bases_many()` and friends accepts an array of commits as well as a parameter that indicates how large that array is. This parameter is using a signed integer, which leads to a couple of warnings with -Wsign-compare. Refactor the code to use `size_t` to track indices instead and adapt callers accordingly. While most callers are trivial, there are two callers that require a bit more scrutiny: - builtin/merge-base.c:show_merge_base() subtracts `1` from the `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if the variable was `0` it would wrap. This code is fine though because its only caller will execute that code only when `argc >= 2`, and it follows that `rev_nr >= 2`, as well. - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`. Again, there is only a single caller that populates `rev_nr` with `good_revs.nr`. And because a bisection always requires at least one good revision it follws that `rev_nr >= 1`. Mark the file as -Wsign-compare-clean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:12:40 -08:00
Patrick Steinhardt	1ab5948141	builtin/log: fix remaining -Wsign-compare warnings Fix remaining -Wsign-compare warnings in "builtin/log.c" and mark the file as -Wsign-compare-clean. While most of the fixes are obvious, one fix requires us to use `cast_size_t_to_int()`, which will cause us to die in case the `size_t` cannot be represented as `int`. This should be fine though, as the data would typically be set either via a config key or via the command line, neither of which should ever exceed a couple of kilobytes of data. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:46 -08:00
Patrick Steinhardt	0905ed201a	builtin/log: use `size_t` to track indices Similar as with the preceding commit, adapt "builtin/log.c" so that it tracks array indices via `size_t` instead of using signed integers. This fixes a couple of -Wsign-compare warnings and prepares the code for a similar refactoring of `repo_get_merge_bases_many()` in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:46 -08:00
Junio C Hamano	88e59f8027	Merge branch 'js/range-diff-diff-merges' "git range-diff" learned to optionally show and compare merge commits in the ranges being compared, with the --diff-merges option. * js/range-diff-diff-merges: range-diff: introduce the convenience option `--remerge-diff` range-diff: optionally include merge commits' diffs in the analysis	2024-12-23 09:32:17 -08:00
Junio C Hamano	002a8a9d36	Merge branch 'as/show-index-uninitialized-hash' Regression fix for 'show-index' when run outside of a repository. * as/show-index-uninitialized-hash: t5300: add test for 'show-index --object-format' show-index: fix uninitialized hash function	2024-12-23 09:32:12 -08:00
Junio C Hamano	4156b6a741	Merge branch 'ps/build-sign-compare' Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings	2024-12-23 09:32:11 -08:00
Junio C Hamano	49edce4ff9	show-index: the short help should say the command reads from its input The short help text given by "git show-index -h" says $ git show-index -h usage: git show-index [--object-format=<hash-algorithm>] --[no-]object-format <hash-algorithm> specify the hash algorithm to use The command takes a pack .idx file from its standard input. The user has to _know_ this, as there is no indication from this output. Give a hint that the data to work on is fed from its standard input. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-20 17:30:57 -08:00
Junio C Hamano	cb89eebf3b	Merge branch 'js/log-remerge-keep-ancestry' "git log -p --remerge-diff --reverse" was completely broken. * js/log-remerge-keep-ancestry: log: --remerge-diff needs to keep around commit parents	2024-12-19 10:58:31 -08:00

1 2 3 4 5 ...

12555 Commits